Summary:
we have a long-standing problem with all the tests that hit
the repositories. The tests are triggered as soon as a compose
completes. At this point in time, the compose is not synced to
the mirrors, where the default 'fedora' repo definition looks;
the sync happens after the compose completes, and there is also
a metadata sync step that must happen after *that* before any
operation that uses the 'fedora' repository definition will
actually use the packages from the new compose. Thus all net
install tests and tests that installed packages have been
effectively testing the previous compose, not the current one.
We have some thoughts about how to fix this 'properly' (such
that the openQA tests wouldn't have to do anything special,
but their 'fedora' repository would somehow reflect the compose
under test), but none of them is in place right now or likely
to happen in the short term, so in the mean time this should
deal with most of the issues. With this change, everything but
the default_install tests for the netinst images should use
the compose-under-test's Everything tree instead of the 'fedora'
repository, and thus should install and test the correct
packages.
This relies on a corresponding change to openqa_fedora_tools
to set the LOCATION openQA setting (which is simply the base
location of the compose under test).
Test Plan:
Do a full test run, check (as far as you can) tests run sensibly
and use appropriate repositories.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D989
Summary:
Except when running on the pre-upgrade release in the upgrade
tests (where GPG check should always be OK).
Currently we always need to use --nogpgcheck on Rawhide, and we
must also use it on Branched prior to the Bodhi activation
point. At present we don't really have any simple way to know
when the Bodhi activation point has kicked in. We could assume
that it's safe to do GPG checking for 'candidate' (not nightly)
composes, but even that isn't 100% safe and isn't really the
*right* thing to do. So I think for now it's best to just always
use --nogpgcheck , until we come up with a decent way to check
for Bodhi enablement, or releng figures things out so we can
rely on packages being signed in Rawhide and in Branched before
Bodhi enablement.
Test Plan:
Check the tests all still run, make sure I didn't
miss any dnf calls.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D964
Summary:
again, added as a non-fatal module for realmd_join_cockpit as
it's convenient to do it here. Also abstract a couple of ipa
bits into a new exporter package in the style of SUSE's
mm_network, rather than using ill-fitting class inheritance as
we have before - we should probably convert our existing class
based stuff to work this way.
Also a few minor tweaks and clean-ups of the other tests:
The path in console_login() where we detect login of a regular
user when we want root or vice versa and log out was actually
broken because it would 'wait' for the result of the 'exit'
command, which obviously doesn't work (as it relies on running
another command afterwards, and we're no longer at a shell).
This commit no longer actually uses that path, but I spotted
the bug with an earlier version of this which did, and we may
as well keep the fix.
/var/log/lastlog is an apparently-extremely-large sparse file.
A couple of times it seemed to cause tar to run very slowly
while creating the /var/log archive for upload on failure. It's
no use for diagnosing bugs, so we may as well exclude it from
the archive.
I caught cockpit webUI login failing one time when testing the
test, so threw in a wait_still_screen before starting to type
the URL, as we have for the FreeIPA webUI.
I also caught a timing issue with the openQA webUI policy add
step; the test flips from the Users screen to the HBAC screen
then clicks the 'add' button, but there's actually an identical
'add' button on *both* screens, so it could wind up trying to
click the one on the Users screen instead, if the web UI took
a few milliseconds to switch. So we throw in a needle match to
make sure we're actually on the HBAC screen before clicking the
button.
We make the freeipa_webui test a 'milestone' so that if the
new test fails, restoring to the last-known-good milestone
doesn't take so long; it actually seems like openQA can get
confused and try to cancel the test if restoring the milestone
takes a *really* long time, and wind up with a zombie qemu
process, which isn't good. This seems to avoid that happening.
Test Plan:
In the simple case, just run all the FreeIPA-related
tests on Fedora 24 (as Rawhide is broken) and make sure they all
work properly. To get a bit more advanced you can throw in an
`assert_script_run 'false'` in either of the non-fatal tests to
break it and make sure things go properly when that happens (the
last milestone should be restored - which should be right after
freeipa_webui, sitting at tty1 - and run properly; things are
set up so each test starts with root logged in on tty1).
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D935
Summary:
this is following a SUSE model for tests where we need a server
end but don't want setting up the server to constitute a real
test in itself, we want it to be stable. The 'support_server'
test just boots a pre-built (by createhdds) disk image, sets up
networking, and runs the iSCSI server.
To run the iSCSI test we need to handle networking config in
anaconda (or we would need to set the support server up as a
DHCP server, which may be worth considering), so this adds that.
We also need to be able to specify the target device for a
volume in custom partitioning, so this adds that too.
Test Plan:
Build the necessary support server disk image (use
D883), then run the test and make sure it works. Also make sure
all other tests continue to work.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D884
Summary:
This requires a few other changes:
* turn clone_host_resolv into clone_host_file, letting you clone
any given host file (cloning /etc/hosts seems to make both
server deployment and client enrolment faster/more reliable)
* allow loading of multiple POSTINSTALL tests (so we can share
the freeipa_client_postinstall test). Note this is compatible,
existing uses will work fine
* move initial password change for the IPA test users into the
server deployment test (so the client tests don't conflict over
doing that)
* add GRUB_POSTINSTALL, for specifying boot parameters for boot of
the installed system, and make it work by tweaking _console_wait
_login (doesn't work for _graphical_wait_login yet, as I didn't
need that)
* make the static networking config for tap tests into a library
function so the tests can share it
* handle ABRT problem dirs showing up in /var/spool/abrt as well
as /var/tmp/abrt (because the enrol attempt hits #1330766 and
the crash report shows up in /var/spool/abrt, don't ask me why
the difference, I just work here)
* specify the DNS servers from the worker host's resolv.conf as
the forwarders for the FreeIPA server when deploying it; if we
don't do this, rolekit defaults to using the root servers as
forwarders(!) and thus we get the public, not phx2-appropriate,
results for e.g. mirrors.fedoraproject.org, some of which the
workers can't reach, so PackageKit package install always fails
(boy, was it fun figuring THAT mess out)
Even after all that, the test still doesn't actually pass, but
I'm reasonably confident this is because it's hitting actual bugs,
not because it's broken. It runs into #1330766 nearly every time
(I think I saw *one* time the enrolment actually succeeded), and
seems to run into a subsequent bug I hadn't seen before when
trying to work around that by trying the join again (see
https://bugzilla.redhat.com/show_bug.cgi?id=1330766#c37 ).
Test Plan:
Run the test, see what happens. If you're really lucky,
it'll actually pass. But you'll probably run into #1330766#c37,
I'm mostly posting for comment. You'll need a tap-capable openQA
instance to test this.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D880
Summary:
These require openQA tap networking to allow the server and
client boxes to communicate, and require masquerading (NAT) so
the server at least can reach a repository (dnf/rolekit really,
really do not want to work without a repo connection).
They use the 'parallel' test support to have the server deploy
run first while the client enrol test waits at the grub menu
until the server is done before it goes ahead.
This is all deployed and working on stg. The really tricky bit
was getting all the openvswitch and firewall config right in
ansible.
We *could* do the server deploy test as a follow-on from the
default install test to save the install, but then we'd have to
teach it to change the hostname and set up static networking
post-install. I'm not sure if it's worth doing that.
This requires the corresponding openqa_fedora_tools commit that
adds the hard disks (containing the kickstarts - it's possible
to get them from remote during install, but we have to set up
name resolution or hard code the IP of the server).
Test Plan:
Deploy this and the openqa_fedora_tools commit,
generate the disks, configure the networking (good luck! See
the docs in openqa_fedora_tools) and see if you can run the
tests. If you're using Docker, uh...sorry. You somehow need to
set things up so the workers can use tap interfaces that can
talk to each other and are NATed to the outside world. Have fun.
I can talk you through it on IRC...
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D831