Summary:
With the previous code, the temporary file wasn't closed before
we tried to upload its contents to the disk image. This seems to
cause problems when run in Python 2; the file is truncated on
upload. So rejig things a bit so we use tempfile.mkstemp, close
the file after writing to it, then remove it ourselves after
doing the upload. Tested with Python 2 and Python 3 that this
creates a correct image.
If something goes wrong at the wrong time the temp file will
be left around, but hey, it's a temp file - they get cleaned
up on system shutdown. So doesn't seem like a big deal.
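A rough sketch of the pattern described above, not the actual createhdds code. `upload_via_tempfile` and the `upload` callable are made-up names standing in for whatever really copies the file into the disk image:

```python
import os
import tempfile


def upload_via_tempfile(content, upload):
    """Write content to a temp file, close it, upload it, then remove it.

    `upload` is a hypothetical callable that takes a file path.
    """
    # mkstemp returns an already-open OS-level file descriptor and a path.
    fd, path = tempfile.mkstemp()
    # Wrap the descriptor so leaving the 'with' block flushes and closes it.
    with os.fdopen(fd, "w") as tmp:
        tmp.write(content)
    # The file is completely written and closed before anything reads it.
    upload(path)
    # mkstemp files are never removed automatically, so do it ourselves.
    # If upload() raises, this is skipped and the temp file is left behind.
    os.remove(path)
    return path
```

Wrapping the upload in try/finally would close the 'left around' hole, at the cost of a little more code.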
Test Plan:
Check building the two images that involve a file
upload in both Python 2 and Python 3, make sure the file is
complete and correct in all cases.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D786
Summary:
createhdds.sh was just too damn simple and understandable, so
I thought I'd make it three times longer, object oriented,
and hard to understand!
OK, OK, that's not great sales. Alright. The main thing was to
make it smarter. This rewrite lets it do these things:
* Only create the images that are missing (not rebuild all)
* Work out the releases to build images for
* Rename images when appropriate
* Rebuild images when they need rebuilding
* Remove old / abandoned images
It can figure out what images ought to be present - including
working out the 'next' release and figuring out from that what
releases it needs images for - and build only the missing ones.
There's a 'version' concept for images; if the existing image
is older than the version given in the data file, it'll be
rebuilt. The data file can list 'rename' pairs, allowing images
to be renamed (like when we move from a single image to multiple
label/filesystem variants). This code uses fedfind's ability
to find the current release version to figure out what releases
we need virtbuilder images for (so you don't have to pass it
in). And it can find image files that aren't in the 'currently
expected' set and wipe them. Images can also have a 'maxage',
triggering a rebuild when they exceed it - this is intended
for the virtbuilder images, so we get a rebuild with the
latest updates every so often (default is two weeks).
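The rebuild decision could look something like this. It's only a sketch: the function name, the integer versions and the use of a file mtime are assumptions for illustration, and how the real code records an image's version isn't shown here.

```python
import time

TWO_WEEKS = 14 * 24 * 60 * 60  # the default maxage mentioned above, in seconds


def needs_rebuild(current_version, wanted_version, mtime, maxage=None, now=None):
    """Return True if an existing image should be rebuilt.

    current_version: version of the image on disk (assumed to be an int)
    wanted_version: version given in the data file
    mtime: when the image was last built, as a Unix timestamp
    maxage: maximum age in seconds, or None for images that never expire
    """
    if now is None:
        now = time.time()
    # Rebuild when the image is older than the version in the data file.
    if current_version < wanted_version:
        return True
    # Rebuild when a maxage is set and the image has exceeded it.
    if maxage is not None and now - mtime > maxage:
        return True
    return False
```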
The point of all this is to help with unattended deployment/
maintenance, i.e. the ansible deployment we have in infra;
the idea is that we can just set that up to run the 'all'
subcommand every so often, and it'll remove old images, create
new ones, and rebuild ones that are outdated.
I kept the ability to build a single image (or a whole image
'group'), and included the ability to just run a check without
actually doing a rebuild. There are a few little weird things
and holes in that part, as it's not really the focus of the tool.
Test Plan:
Build all images, do a full test run, and see if
it works OK. Test out all the variations of building single
images / image groups, and using the 'check' command.
Reviewers: jskladan, garretraziel
Reviewed By: jskladan, garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D687
Summary:
For F22 we don't need to remove firewalld, as the approach to
flavor configurations was changed and the conflict wasn't
there any more. On the other hand for F21 we still do, and
we need to use yum, not dnf. So make the command run inside
virt-builder conditional on $VERSION. I figure we still might
want to create F21 images if we want to test F21->F23 upgrades.
Test Plan:
I built the new set of HDDs on BOS with this change,
so we can see if the upgrade tests all run as expected. To
check the F21 images I guess we'd have to poke at 'em with
guestfish or something.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D572
Summary:
This converts openqa_trigger into a fedora_openqa_schedule
package which is properly modularized: there's a CLI module,
a schedule module, a report module, and for now
conf_test_suites is its own module (though I think it's kind of
ugly and we should turn it into a JSON file or something).
ISO file download location configuration is now done with
an optional config file, as with the splits it becomes a
mess to try and pass it through from the CLI args. This also
means custom ISO locations will be respected by other things
we write which use the 'schedule' module.
This includes a setup.py so the package and the
fedora-openqa-schedule command can be installed systemwide. We could now
extend this to install stuff like the systemd services and
little scripts like run-nightly.sh.
Test Plan:
Check that things work more or less as before. New CLI command
is 'fedora-openqa-schedule'; it has the 'current' and 'compose'
sub-commands, plus a new 'report' sub-command which works like calling
report-job-results.py directly used to. Check that installing systemwide
works properly. Check that ISO download location configuration works as
expected. Running './fedora-openqa-schedule' from within the git checkout
should also work.
Reviewers: garretraziel, jskladan
Reviewed By: garretraziel, jskladan
Maniphest Tasks: T541
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D547
Summary:
Have the backing functions always raise TriggerExceptions if
no jobs could be scheduled, and have the CLI command functions
just quit (with exit code 1) immediately when this happens.
I don't see any value to logging the errors and continuing to
run, nothing useful is going to happen with no jobs.
This allows us to have the 'current' cronjob run the compose
report: we just have it do:
openqa_trigger.py current && check-compose
then it won't generate a compose report every time it runs, but
only when the current compose changed and some jobs ran.
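Roughly, the shape is as follows. This is a simplified sketch: `schedule_jobs` and `run_current` are stand-in names, and the fake 'one job per image' scheduling is only there to make it runnable.

```python
import sys


class TriggerException(Exception):
    """Raised when no jobs could be scheduled."""


def schedule_jobs(images):
    """Stand-in backing function: pretend to schedule one job per image."""
    jobs = list(range(1, len(images) + 1))
    if not jobs:
        raise TriggerException("No jobs could be scheduled")
    return jobs


def run_current(images):
    """Stand-in CLI command function: quit with exit code 1 if nothing ran."""
    try:
        jobs = schedule_jobs(images)
    except TriggerException as err:
        sys.stderr.write("{0}\n".format(err))
        sys.exit(1)
    print(" ".join(str(job) for job in jobs))
    return jobs
```

A non-zero exit code is what makes the `&&` chain stop before check-compose.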
Test Plan:
Well, just make sure things quit or run as intended,
I guess.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D541
Summary:
So the first attempt to use the waiting stuff in production
failed because, at some point, koji_done got a socket.error
from the server. Not sure if that was a Koji outage or some
kind of rate control, but even if it's rate control and we need
to tweak the wait interval, this seems advisable: we shouldn't
die the first time we hit any kind of error while waiting, we
should retry a few times first (with increasingly long delays
between the retry attempts). I know bare except clauses are BAD,
but I think it's OK here as we can't really cover every possible
exception which might get raised in any module during a 'go hit
a server and do a bunch of stuff' operation, and if the error
keeps happening we *are* going to raise it eventually.
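The retry idea, sketched in isolation. The names and the doubling delay are illustrative assumptions, not the values used in the real code. This version catches `Exception` rather than using a bare except, so Ctrl-C still gets through:

```python
import time


def retry(func, attempts=4, first_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any error with increasingly long delays."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts:
                raise
            sleep(delay)
            # Wait twice as long before the next attempt.
            delay *= 2
```

Passing `sleep` in as a parameter is only there so the waiting can be faked when testing.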
Test Plan:
Check I didn't break the 'normal' case, and try causing
an error to appear somehow (e.g. disconnect from the network or
hack up the 'client' instantiation in fedfind to use the wrong URL)
and see if the error handling works as intended.
Reviewers: garretraziel, jskladan
Reviewed By: garretraziel, jskladan
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D532
Summary:
This is an alternative to D516. It drops the 'all' subcommand
entirely and adds some capabilities to 'compose' to make it
suitable for scripting. --ifnotcurrent will check if the compose
is the same as the current validation event, and bail out if it
is. --wait waits for the compose to be available. I've also
enhanced fedfind (in 1.4.1) to allow passing just 'Branched'
as the milestone; it will now guess the release and date (it
didn't before).
With all of that taken together, we could have three cron jobs,
one for 'current', one for 'branched', and one for 'rawhide'.
That way we don't have to do any clever multiprocess/round-robin
waiting stuff, and the jobs for each release will be run and
reported as fast as possible (and we can run the compose report
right after the trigger script and have that sent out
efficiently too).
Test Plan:
Try all the possibilities with 'compose' - especially
check that it works with just '-m Rawhide' and '-m Branched',
and that the --wait and --ifnotcurrent args work as intended.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D525
Summary:
We can just parse the release out of the 'build' value, here.
All tests still run properly because we use the * wildcard
in the job groups and so on. Passing a proper 'version' value
prevents T581: when you schedule jobs from an ISO, openQA
will obsolete any running or scheduled jobs with the same
DISTRI, FLAVOR, VERSION and ARCH.
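As an illustration only: assuming the build value is underscore-separated with the release first (e.g. '23_Branched_20150915'; the exact format is an assumption here), the parsing amounts to this. The function name is made up.

```python
def version_from_build(build):
    """Return the release part of a build string.

    Assumes the release is the first underscore-separated field, which
    is an assumption about the build format, not something shown above.
    """
    return build.split("_")[0]
```

With that, a Branched and a Rawhide compose get different VERSION values, so scheduling one no longer obsoletes the other's jobs.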
Test Plan:
Schedule jobs for two composes at the same time
(e.g. Branched and Rawhide), see that the second set does
not obsolete the first. Make sure setting VERSION has no
unexpected consequences.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D522
Summary:
This goes along with the openqa_fedora diff to add a no_swap
test.
Test Plan:
If the new test runs (see other diff), check its results
can be submitted to the wiki correctly.
Reviewers: garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D504
Summary:
D500 adds 32-bit test definitions to the test templates, so we
can schedule that arch again.
Test Plan: Goes along with D500, same test plan.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D501
Summary:
This way you can just copy/paste the damn list for
report_job_results instead of reformatting it, if you want to
feed it in manually for any reason.
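In other words, something along these lines (the function name is made up for the example):

```python
def format_job_ids(jobs):
    """Render job IDs space-separated, ready to paste on a command line."""
    return " ".join(str(job) for job in jobs)
```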
Test Plan:
Run some commands that schedule jobs and make sure they're
printed as '1 2 3' not '[1, 2, 3]'
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D493
Summary:
Cancelled jobs have their own state (at least in the openQA
running on happyassassin, they do). This causes
report_job_results to get stuck forever if any of the jobs are
cancelled. Consider a cancelled job 'done'.
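A sketch of the check, assuming each job is a dict with a 'state' key; the helper name and the dict shape are assumptions for the example:

```python
# States in which a job will make no further progress.
DONE_STATES = ("done", "cancelled")


def all_finished(jobs):
    """Return True once every job is in a finished state."""
    return all(job["state"] in DONE_STATES for job in jobs)
```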
Test Plan:
Try submitting results for a set of jobs that includes one or
more cancelled jobs. It should now work (before it would get
stuck).
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D492
Summary:
This adds the three new custom storage tests from D490 to the
conf_test_suites table. It also renames the software RAID test,
as I changed the test case names upstream (in the wiki) to be
more consistent.
Test Plan:
After running the new tests, try submitting the results to the
wiki. It should work correctly.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D491
Summary: This broke result reporting if any dupes were present.
Test Plan: Try reporting some results with dupes.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D477
Summary:
We've never quite set things up to run 32-bit tests. We should,
at some point, but for now we're just wasting a bunch of time/
bandwidth downloading them and never testing them.
Test Plan:
Schedule a run and make sure all the same tests are still run,
but half as many images are downloaded...
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D455
Summary:
Since adamw created a Python client for OpenQA, we
can use it instead of calling Perl in a subprocess. This simplifies
usage, and special code for running in Docker is no longer needed.
This version requires the user to create a configuration file, either
`/etc/openqa/client.conf` or `~/.config/openqa/client.conf`,
with the same KEY and SECRET as on the host machine.
To execute jobs in Docker, just specify the correct server and
port (probably `[localhost:8080]`) in the configuration file.
One problem remains, with self-signed certificates. It's
necessary to either disable SSL certificate verification, import
the self-signed certificate, or use HTTP instead of HTTPS in Docker;
see https://github.com/os-autoinst/openQA-python-client/pull/1.
Test Plan: Tested by running tests for compose F22 Final RC1.
Reviewers: adamwill, jskladan
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D425
Summary:
Logging replaces output to stdout, and several new options are
added: the user can specify a directory for downloading ISOs
and the Docker container where openQA is running. Info about
the newest tested version is not written if no images were
found.
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D420
Summary:
Wrong brace used for a str.format() call - caused a crash when
no universal test image is found for a compose.
Test Plan:
Check that tests still run. Ideally try testing a compose with
no universal image (e.g. a Rawhide nightly with no boot.iso).
Reviewers: garretraziel, kparal
Reviewed By: kparal
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D384
Summary:
Use the new wikitcms / Wikitcms feature that lets results be
marked as 'bot' (indicating they're from an automated test
system, not a human). Requires python-wikitcms >= 1.1.4.
Test Plan:
Update to wikitcms 1.1.4, report some results; they should
appear with bot=true and show the bot head on the result page.
Reviewers: kparal, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D349
Also refactor the result submission a bit to share the code
between CLI and module modes. This requires python-wikitcms
1.11+.
This should be substantially more efficient - it should make
only two wiki roundtrips per result page, and only init and
login to the wiki once. Going via relval requires a wiki init
and then two roundtrips for *each result*.
https://phab.qadevel.cloud.fedoraproject.org/D316
This handles scheduling of jobs for more than one type of
image; currently we'll run tests for Workstation live as well.
It requires some cleverness to run some tests for *all* images
(currently just default_boot_and_install) but run all the tests
that can be run with any non-live installer image with the best
image available for the compose. We introduce a special (openQA,
not fedfind) 'flavor' called 'universal'; we run a couple of
checks to find the best image in the compose for running the
universal tests, and schedule tests for the 'universal' flavor
with that image. The 'best' image is a server or 'generic' DVD
if possible, and if not, a server or 'generic' boot.iso.
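The 'best image' preference can be sketched like so. Images are plain dicts with made-up 'payload' and 'imagetype' keys here; the real code works on fedfind image objects and its exact checks may differ.

```python
def find_universal_image(images):
    """Pick the best image for the 'universal' flavor, or None.

    Each image is a dict with hypothetical 'payload' and 'imagetype' keys.
    """
    wanted_payloads = ("server", "generic")
    # Prefer a DVD; only fall back to a boot.iso if there is no DVD.
    for imagetype in ("dvd", "boot"):
        for image in images:
            if image["imagetype"] == imagetype and image["payload"] in wanted_payloads:
                return image
    return None
```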
ISO files have the compose's version identifier prepended to
their names. Otherwise they retain their original names, which
should usually be unique within a given compose, except for
boot.iso files, which have their payload and arch added into
their names to ensure they don't overwrite each other.
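For illustration, the naming rule might be written like this. The function name and the exact separator and field order are assumptions; only the rule itself comes from the description above.

```python
def local_iso_name(version, url, payload, arch):
    """Work out the local file name for a downloaded ISO."""
    name = url.rsplit("/", 1)[-1]
    # Every boot.iso has the same name, so make it unique per payload/arch.
    if name == "boot.iso":
        name = "{0}_{1}_boot.iso".format(payload, arch)
    # Prepend the compose's version identifier in all cases.
    return "{0}_{1}".format(version, name)
```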
This also adds a mechanism for TESTCASES (in conf_test_suites)
to define a callback which will be called with the flavor of
the image being tested; the result of the callback will be used
as the 'test name' for relval result reporting purposes. This
allows us to report results against the correct 'test instance'
for the image being tested, for tests like Boot_default_install
which have 'test instances' for each image. We can extend this
general approach in future for other cases where we have
multiple 'test instances' for a single test case.
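A toy version of the callback mechanism. The dict layout, the 'name_cb' key, the second test case and the flavor-to-name mapping are all invented for the example; the real conf_test_suites structure may look different.

```python
# Hypothetical TESTCASES table: an entry may carry a callback that turns
# the flavor of the image under test into the name to report against.
TESTCASES = {
    "Boot_default_install": {
        "name_cb": lambda flavor: flavor.replace("_", " "),
    },
    "Some_other_test": {},
}


def relval_test_name(testcase, flavor):
    """Return the 'test name' to use when reporting a result."""
    callback = TESTCASES[testcase].get("name_cb")
    if callback is None:
        # No callback defined: report against the test case name itself.
        return testcase
    return callback(flavor)
```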
The patch jskladan applied was an older broken one I sent
accidentally; apologies. This is more or less my intended
version, with some of the cleanups from jskladan preserved
and a couple of his suggestions added (!= instead of not ==,
and a bit of just-in-case exception handling).
The basic approach is that openqa_trigger gets a ValidationEvent from
python-wikitcms - either the Wiki.current_event property for
'current', or the event specified, obtained via the newly-added
Wiki.get_validation_event(), for 'event'. For 'event' it then just
goes ahead and runs the jobs and prints the IDs. For 'current' it
checks the last run compose version for each arch and runs if needed,
as before. The ValidationEvent's 'sortname' property is the value
written out to PERSISTENT to track the 'last run' - this property is
intended to always sort compose events 'correctly', so we should
always run when appropriate even when going from Rawhide to Branched,
Branched to a TC, TC to RC, RC to (next milestone) TC.
On both paths it gets a fedfind.Release object via the ValidationEvent
- ValidationEvents have a ff_release property which is the
fedfind.Release object that matches that event. It then queries
fedfind for image locations using a query that tries to get just *one*
generic-ish network install image for each arch. It passes the
location to download_image(), which is just download_rawhide_iso()
renamed and does the same job, only it can be simpler now.
From there it works pretty much as before, except we use the
ValidationEvent's 'version' property as the BUILD setting for OpenQA,
and report_job_results get_relval_commands() is tweaked slightly to
parse this properly to produce a correct report-auto command.
Probably the most likely bits to break here are the sortname thing
(see wikitcms helpers.py fedora_release_sort(), it's pretty stupid, I
should re-write it) and the image query, which might wind up getting
more than one image depending on how exactly the F22 Alpha composes
look. I'll keep a close eye on that. We can always take the list from
fedfind and further filter it so we have just one image per arch.
Image objects have a .arch attribute so this will be easy to do if
necessary. I *could* give the fedfind query code a 'I'm feeling lucky'-
ish mode to only return one image per (whatever), but not sure if that
would be too specialized, I'll think about it.
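The 'further filter it' fallback would be something like the following. `Image` here is a stand-in namedtuple, not the fedfind class; all it shares with the real thing is the .arch attribute.

```python
from collections import namedtuple

# Stand-in for fedfind's image objects; only .arch matters here.
Image = namedtuple("Image", ["url", "arch"])


def one_per_arch(images):
    """Keep only the first image seen for each arch."""
    chosen = {}
    for image in images:
        # setdefault leaves an existing entry alone, so the first one wins.
        chosen.setdefault(image.arch, image)
    return list(chosen.values())
```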