I noticed a pattern lately of VNC tests failing on Rawhide when
we have a debug kernel (but passing with a regular kernel). On
closer investigation I think there's simply a screen blank
happening if the install process takes more than five minutes,
and that's more likely with a debug kernel. This extends the
loop we use to move the mouse every so often while waiting for
the install to complete (which is meant to defeat this sort of
thing) to also click the mouse, when we're a VNC client test. In
a quick check this seemed to help.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Update a needle with slightly different text rendering, and add
a workaround to hit tab three times rather than once on entering
the "Join a domain" screen, see
https://github.com/cockpit-project/cockpit/issues/14895 .
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The weird bug turned out to be caused by an internal DNS zone
in the new infra not being signed:
https://pagure.io/fedora-infrastructure/issue/9411
This is now resolved, so we can drop the workaround.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Revert "Regenerate grub.cfg for ppc64le Silverblue to boot (step 2)"
This reverts commit d384cfed30.
Revert "Regenerate grub.cfg for ppc64le Silverblue to boot, brc#1817004"
This reverts commit 8d7be9a227.
Not required anymore for f33 (only for f32)
And bad side effect for f33 (failure not analysed)
eg: https://openqa.stg.fedoraproject.org/tests/949783#step/_do_install_and_reboot/32
Keep correction to avoid warning in autoinst-log when ABRT var not defined.
Signed-off-by: Guy Menanteau <menantea@linux.vnet.ibm.com>
Signed-off-by: Michel Normand <normand@linux.vnet.ibm.com>
We've had this 'exception' for mcelog.service failing in here for
years. Looking into it, it seems to now be fixed:
https://bugzilla.redhat.com/show_bug.cgi?id=1526725
and hasn't happened in our official instances for years (I guess
because they're all Intel boxes). However, we have a similar case
on ppc64le with hcn-init.service failing spuriously:
https://bugzilla.redhat.com/show_bug.cgi?id=1894654
so I'm just converting it into a workaround for that instead. We
could wire this up to be more sophisticated, with some kind of
array or hash of services that are allowed to fail and more
complex checking code, but let's not bother unless/until it's
necessary.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
So, there's a problem with how we figure out the NetworkManager
connection to use in setup_tap_static: it expects the first
connection in the list to be the right one, but this is only
actually true so long as it's *active*. When we're in the tap
case, it's usually not going to actually *work* out of the box
on boot (or else we wouldn't need setup_tap_static at all...),
so some time after boot, NetworkManager gives up on it and marks
it as inactive. And after that, setup_tap_static won't work any
more.
I never noticed this as a problem before because usually we do
setup_tap_static before that point. But it seems in the vnc
client tests, on aarch64, desktop boot and login is slow enough
that by the time we switch to a VT and try to setup the network,
we're very close to that cutoff, and sometimes miss it.
This, I hope, avoids the problem by doing the network setup in
that test before we deal with the desktop login, then doing the
desktop login, then doing the actual VNC bits.
The alternative here would be to figure out a better way to do
setup_tap_static, but I can't.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Seems like this often fails when booting the desktop disk image
on aarch64 if we start typing right away.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
It seems like it can be *really* slow on aarch64, since 218:
https://github.com/cockpit-project/cockpit/issues/14840
this should give it a total of 180 seconds on aarch64 (90 second
still screen timeout plus 30 second assert_screen timeout, with
1.5x scale).
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This sets us up to test the release-blocking aarch64 disk images
(Minimal, Server and Workstation). It also allows for testing
armhfp disk images on aarch64 worker hosts (though my testing of
that isn't going too well so far), and fixes the initial-setup
handling for a change upstream ('use password' is now the default
so we don't need to choose it). We rewire disk image deployment
test loading to work through the generic loader code rather than
using ENTRYPOINT, as it allows us to more gracefully handle
graphical (Workstation) vs. console (Server, Minimal), moving
the code for handling console initial-setup to a helper function
just like the code for gnome-initial-setup and having _console_
wait_login call it when appropriate. We also tweak desktop_vt a
bit because now we need to switch from a console running as test
to a desktop, which breaks the assumption that the highest
numbered session of user test is the desktop...
Signed-off-by: Adam Williamson <awilliam@redhat.com>
I noticed today that if we deploy FreeIPA with dnssec validation
enabled, dnf can't resolve dl.fedoraproject.org afterwards, which
is a problem because it means we wind up falling through to
random mirrors for metadata and package download once the server
is deployed, which can be slow and give old packages. This seems
to be why the server upgrade test on F33 is sometimes failing
because we get an older FreeIPA package on upgrade, even though
the newer one has been stable for a week.
It's difficult to pin down exactly where this bug is and fix it,
I've mailed some folks to try and work it out, but until that's
figured out, let's just disable dnssec validation.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We've been getting failures lately on the first page load, I
think because Firefox is getting even more grindy on startup. So
turn the 'sleep' into a 'wait_still_screen', extend another wait,
and tweak the 'browser' needle so it only matches after the
bookmark bar has loaded rather than as soon as half the chrome
appears. Also make all the wait_still_screens use similarity 45
for consistency (flashing cursor could be there on any of them).
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This check wasn't working, the test passed whatever wait_serial
found. This version suggested by defolos works, I checked.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This is the best option I can come up with to deal with #195.
Update notifications seem to have become transient in KDE lately
(even in F31 and F32, if I'm looking at these screenshots right).
This actually simplifies things a lot to do more or less the
same in the KDE and GNOME paths: open the 'permanent' store of
notifications (in GNOME you get to it by clicking on the clock,
in KDE via the systray) and then look for no notifications (live
path) or only an update notification (post-install path). We
only run this test for composes so we shouldn't need to worry
about anything older than F32, and I believe this should work
for KDE in F32 and F33. I left out click_unwanted_notifications
for now as I'm hoping it should be unnecessary, but we can add
it back in if necessary.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This does some of the things suggested by cheimes in
https://bugzilla.redhat.com/show_bug.cgi?id=1880628#c24 . It
seems to make the replica tests work with resolved, still work
with pre-F33 resolving, and not break anything. Also remove the
workaround to disable resolved if it's running, as we can now
work with it.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We don't need a separate 'welcome' needle because it just matches
on an OK button anyway. So turn that needle into an OK needle
(we don't have any existing 'blue OK button' needle) and simplify
the logic to a single loop for kde_ok and krusader_settings_close.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
On ppc64le it looks like this test is often failing because it
takes a second or two to update the partition list after we
click update settings, but we're not waiting for that, so we
wind up clicking in the wrong place because we match the next
partition needle before the list is refreshed but click after
it's refreshed. Let's hope these waits solve it.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Having systemd-resolved in use seems to cause problems for
FreeIPA servers:
https://bugzilla.redhat.com/show_bug.cgi?id=1880628
until the scripts are enhanced to do this or something, let's
disable it before server/replica deployment.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
ipa-replica-install already changes the DNS config to use the
local bind instance, we don't need to do this and it's actually
wrong (as it bypasses the local BIND we should use and uses
the VM host's DNS servers instead).
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Seems what they had before worked until systemd-resolved became
the default; now we need to make sure we do nmcli mod and then
bring the connection down and up, as we do in tapnet.pm. Writing
to resolv.conf is kinda "wrong" for resolved but I don't think
it really breaks anything so I think I'll just leave those bits
in until F32 goes EOL just in case they're still somehow needed
on F31 or F32.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
https://openqa.fedoraproject.org/tests/667693#step/desktop_browser/8
shows us matching on Save File when the window is in kind of a
borked state; we'd probably wind up clicking on Open with,
because by the time we click the content of the window will have
moved to where it's actually supposed to be...so let's try this
to slow it down a bit.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Getting some odd failures where the downloaded file doesn't show
up in the right place which I think might be due to over rapid
clicking here. Try and slow it down a bit.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This should work even if the ifcfg plugin is not present (hi,
CoreOS) or 'predictable' (har) network names are on.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
There's a complex bug in current Rawhide affecting the database
server test; it boils down to "deployment fails because LANG is
set to a locale for which the corresponding langpack is not
installed". As we know broadly what's going on there now, let's
work around it with a soft failure so we catch any later bugs in
the process.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The time before the ssh key provision request goes through turns
out to be kinda unpredictable, so instead of just a hardcoded
wait then assuming it should succeed, let's do a loop-y retry
thing instead.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The FreeIPA UI change that the previous commit adapted to is in
4.8.9. That's stable for Rawhide and F33 already, but still in
testing for F32, and won't go to F31. So we need to make the
change conditional on release number, and we also add the update
to workarounds for F32 so we don't have to do something awkward
while we wait for it to go stable.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
OTP field was moved into the last position in the password change dialog
to prevent issues with OTP code expiring while users enter their
passwords.
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
GNOME now also splits 'Restart...' and 'Power Off...' as KDE
does, so we need to tweak the conditional and add some needles.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
In Fedora 33, we generally no longer include a disk-based swap
partition by default (instead swap-on-ZRAM is used, see
https://fedoraproject.org/wiki/Changes/SwapOnZRAM ). This tweaks
our tests to account for that. In tests that aren't to do with
swap at all, we stop including a swap partition in order to be
closer to the default layout. We replace the old _no_swap blivet
and custom tests with _with_swap tests that, as the name implies,
*explicitly include* a swap partition, and adjust the postinstall
test to check the disk swap partition is there.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
30 seconds doesn't seem to be reliable enough. Let's try 60, if
that's not enough I'll try and think of something smarter.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Just repositioning the mouse appears not to be enough to prevent
the sesssion going idle any more, since the 20200731.n.0 compose.
Not sure what causes this, probably the kernel. Adding a space
keypress seems to help in both KDE and GNOME.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This is a bit complex to automate, because we cannot really use
the production Zezere server (provision.fedoraproject.org) as
the test case shows, as we'd have to solve authentication and
we also don't really want to constantly keep registering new
hosts to it that are going to disappear and never be seen again.
So, instead we'll do it by setting up our *own* Zezere, and
provisioning our IoT system in that. We run two tests. The
'ignition' test is the actual IoT 'device'; all it really does
is boot up, sit around, and wait to be provisioned. The 'server'
test first sets up a Zezere server, then logs into it, adds an
ssh key, claims the IoT device, provisions it, and connects to
it to create a special file which tells the 'ignition' test
everything worked and it can close out.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This is to make the infra folks happy, apparently using 10.0.x.x
and 10.1.x.x is causing conflicts since our actual infra network
uses those ranges too.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
It actually is supposed to be installed by default, so if it's
missing that's a bug. It's been added to comps now so it should
be there from now on.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
It got split out and is not installed by default in 33-0.8. This
is intentional, not a bug, see https://pagure.io/fesco/issue/2114
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This adds a test that automates
https://fedoraproject.org/wiki/QA:Testcase_Clevis. It requires
os-autoinst-4.6-18.20200623git5038d8c or newer, and a worker
host in the 'tpm' class which is set up to have an instance of
swtpm running at /tmp/mytpmX , where X is the worker instance
number, for each worker. The Fedora infrastructure ansible
plays have been updated to handle this via an instantiated
systemd service, which other instances can also adopt.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
`cp -R foo/*` doesn't get all files in `foo/`, it misses hidden
files. This turns out to be a problem with recent anaconda, as
it expects to find a .treeinfo file here. So, let's use rsync.
We could probably do this with cp too but I can't think of the
right arguments right now...
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This adds a pair of tests, one which does almost all the work
from the test case, the other just a client test to check that
we can connect to an HTTP server running in a container on the
host. We also have to bump the _console_wait_login timeout on
this path a bit as we're booting a disk image that was installed
with DHCP working, but we change the network setup so DHCP does
not work any more, and the system spends quite some time trying
to bring the network up on boot before eventually giving up and
proceeding.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Here we are creating ~/.config for a newly-created user with root
ownership. We can't leave it that way, as commands run as the
user account won't be able to change it, as they should be able
to. So we need to change the ownership (and, just in case, fix
SELinux contexts) afterwards.
This was the real source of the problem we were seeing (the test
failing early due to the gsettings command which should turn the
screen background black failing).
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We're trying to launch stuff the instant we see a desktop, and
it seems to be failing quite often in GNOME. Let's give it a few
seconds.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
IoT does nightly Branched and Rawhide composes that are built
as RC candidates, for some reason. So let's except it from this
check.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
I started out trying to fix os-release for the recent change to
add "Prerelease" tags to the VERSION and PRETTY_NAME fields, then
things spiralled. It got me thinking about the awkward DEVELOPMENT
variable we use, so I decided to get rid of it and refactor the
few things that use it. I refactored the anaconda prerelease tag
check, and wrote a new giant comment that gives details about
exactly how anaconda decides whether to show those tags, to give
context to our choices about when to expect them. This check now
uses a new LABEL variable the scheduler now sets. I also wound up
creating new UP1REL and UP2REL vars to define the 'source' release
for upgrade tests, separate from CURRREL and PREVREL, which are
now never lies - they really are the current stable and previous
stable release, even for update upgrade tests.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The update live image build test keeps running out of disk space.
We've bumped the minimal disk image from 12GB all the way up to
20GB so far but it keeps happening. So let's try a different
strategy: use a scratch disk to mount /var/lib/mock. That's where
all the space gets used. This should allow us to reduce the size
of the minimal disk image again, and giving it 25GB of empty disk
should avoid it running out of space again for a while.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This is failing on every update and that's not telling us anything
useful - we already know about the bug - so let's work around it.
Not adding a softfail as it's a bit more awkward to do that.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We can get tripped up by the tutorial screen when launching
Boxes; borrow some code from the app start/stop test to check for
and handle it.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The default timeout for check_screen is 0, so we were only giving
the enter key press a fraction of a second to take effect before
expecting to see locked_screen_switch_user. This is too tight,
see https://openqa.fedoraproject.org/tests/586257 . Let's give it
five seconds before we give up.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Since GDM shows the "system-menu-button", it could not correctly
switch users on a locked screen. I added a check to see
if we are on a locked screen and behave accordingly.
This adds a new test that implementsQA:Testcase_desktop_login
on both GNOME and KDE.
While working on this, we realized that the "desktop_clean"
needles were really "app menu" needles, and for KDE, this was
a duplication with the new "system menu" needles, because on KDE
the app menu and the system menu are the same. So I (Adam)
started to de-duplicate that, but also realized that "app menu
button" is a much more accurate name for these needles, so I was
renaming the old desktop_clean needles to app_menu_button. That
led me to the realization that "check_desktop_clean" is itself a
dumb name, because we don't (at least, any more, way back in the
mists of time we may have done) do anything to check that the
desktop is "clean" - we're really just asserting that we're at a
desktop *at all*. While thinking *that* through, I *also* realized
that the whole "open the overview and look for the app grid icon"
workaround it did is no longer necessary, because GNOME doesn't
use a translucent top bar any more. That went away in GNOME 3.32,
which is in Fedora 30, our oldest supported release.
So I threw that away, renamed the function "check_desktop",
cleaned up all the needle naming and tagging, and also added an
app menu needle for GNOME in Japanese because we were missing
one (the Japanese tests have been using the "app grid icon"
workaround the whole time).
This stuff is kinda broken in various ways and halfline thinks
he can fix the underlying bug anyway. So let's go back to just
the GNOME live test being broken for now.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
I tested this workaround on staging before pushing it to git and
it worked, but then when I pushed it to prod it didn't work. On
stg I also had this to set GDM to debugging mode, so maybe this
is also needed for the workaround to work for some reason?
Signed-off-by: Adam Williamson <awilliam@redhat.com>
A GNOME bug seems to result in us getting to GDM, not a liveuser
desktop, after running 'systemctl isolate graphical.target' from
a live boot to runlevel 3 since the end of March. This works
around that to let the test run, as it's not really a failure of
the test per se.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Previous commit same summary had some side effect
solved by this new one.
And avoid a warning in autoinst-log when ABRT var not defined.
Signed-off-by: Michel Normand <normand@linux.vnet.ibm.com>
I merged the previous commit before realizing the ordering was
wrong. All other 'actions' lines have to come *before* the one
that adds 'reboot', because one of the conditions for that is
whether @actions is populated - basically, if we're taking any
actions, we also have to reboot afterwards. If we add an action
*after* that line (but no actions were added before that line),
we'll do it but then not reboot and the test will break.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This reverts commit 00b756f0e2.
Unfortunately, I made a typo in the script and the fix did not
work. I do not want to rebase the master (in order not to break
things for everyone) so I am reverting again.
Sorry.
This reverts commit d784bf54ca.
It turned out that Locations are not connected to Konqueror
at all. The reason why the test is failing is that the
application has been removed to limit the number of
web browsers.
We seem to be seeing the bug this works around:
https://bugzilla.redhat.com/show_bug.cgi?id=1765685
in F30 and F31 update tests even with this wait. At least, it
looks that way. Trying this to see if a longer wait helps.
Signed-off-by: Adam Williamson <awilliam@redhat.com>