Summary:
Have the backing functions always raise TriggerExceptions if
no jobs could be scheduled, and have the CLI command functions
just quit (with exit code 1) immediately when this happens.
I don't see any value in logging the errors and continuing to
run; nothing useful is going to happen with no jobs.
This allows the 'current' cron job to run the compose report;
we just have it do:
openqa_trigger.py current && check-compose
Then it won't generate a compose report every time it runs,
only when the current compose has changed and some jobs ran.
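A minimal sketch of the raise-then-quit pattern described above. The TriggerException name comes from this change, but `run_current()` and `command_current()` are hypothetical stand-ins, not the real openqa_trigger.py functions, and "scheduling" is faked by passing in the resulting job IDs so only the control flow is shown:

```python
import logging
import sys


class TriggerException(Exception):
    """Raised when no jobs could be scheduled."""


def run_current(scheduled_ids):
    """Hypothetical backing function.

    The real one would schedule openQA jobs; here the caller passes in
    the IDs that scheduling produced, so only the control flow is shown.
    """
    if not scheduled_ids:
        raise TriggerException("No jobs scheduled!")
    return scheduled_ids


def command_current(scheduled_ids):
    """Hypothetical CLI wrapper: quit with exit code 1 on failure."""
    try:
        jobs = run_current(scheduled_ids)
    except TriggerException as err:
        logging.error("%s", err)
        sys.exit(1)
    # Print the IDs space-separated so they are easy to copy/paste.
    print(" ".join(str(job) for job in jobs))
    return jobs
```

With the cron line `openqa_trigger.py current && check-compose`, the shell only runs check-compose when the trigger exits with status 0, i.e. when some jobs were actually scheduled.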
Test Plan:
Well, just make sure things quit or run as intended,
I guess.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D541
Summary:
So the first attempt to use the waiting stuff in production
failed because, at some point, koji_done got a socket.error
from the server. Not sure if that was a Koji outage or some
kind of rate control, but even if it's rate control and we need
to tweak the wait interval, this seems advisable: we shouldn't
die the first time we hit any kind of error while waiting, we
should retry a few times first (with increasingly long delays
between the retry attempts). I know bare except clauses are BAD,
but I think it's OK here as we can't really cover every possible
exception which might get raised in any module during a 'go hit
a server and do a bunch of stuff' operation, and if the error
keeps happening we *are* going to raise it eventually.
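A minimal sketch of retrying with increasingly long delays, assuming a hypothetical `retry_call()` helper that wraps any zero-argument callable (the waiting check could be wrapped in a lambda). The attempt count and base delay are illustrative, not values from this change. It catches `Exception` rather than using a truly bare `except:`, which is slightly narrower (KeyboardInterrupt and SystemExit still get through), and it re-raises once the last attempt fails:

```python
import logging
import time


def retry_call(func, attempts=5, base_delay=10, sleep=time.sleep):
    """Call func(); on any error, wait and retry with growing delays.

    Delays double each time (base_delay, 2*base_delay, 4*base_delay...).
    If the final attempt still fails, its exception is re-raised.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception as err:  # deliberately broad, see above
            if attempt == attempts - 1:
                raise
            delay = base_delay * 2 ** attempt
            logging.warning("Attempt %d failed (%s), retrying in %ds",
                            attempt + 1, err, delay)
            sleep(delay)
```

Passing `sleep` as a parameter is just a convenience so the back-off can be exercised without actually waiting.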
Test Plan:
Check I didn't break the 'normal' case, and try causing
an error to appear somehow (e.g. disconnect from the network or
hack up the 'client' instantiation in fedfind to use the wrong URL)
and see if the error handling works as intended.
Reviewers: garretraziel, jskladan
Reviewed By: garretraziel, jskladan
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D532
Summary:
This is an alternative to D516. It drops the 'all' subcommand
entirely and adds some capabilities to 'compose' to make it
suitable for scripting. --ifnotcurrent will check if the compose
is the same as the current validation event, and bail out if it
is. --wait waits for the compose to be available. I've also
enhanced fedfind (in 1.4.1) to allow passing just 'Branched'
as the milestone; it now guesses the release and date (it
didn't before).
With all of that taken together, we could have three cron jobs,
one for 'current', one for 'branched', and one for 'rawhide'.
That way we don't have to do any clever multiprocess/round-robin
waiting stuff, and the jobs for each release will be run and
reported as fast as possible (and we can run the compose report
right after the trigger script and have that sent out
efficiently too).
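A rough sketch of how the new `compose` flags could be wired up with argparse. Only `-m`, `--wait` and `--ifnotcurrent` come from this change; the `dest` names and the `should_skip()` helper are hypothetical, and the real script takes other arguments not shown here:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="openqa_trigger.py")
    subparsers = parser.add_subparsers(dest="command")
    compose = subparsers.add_parser("compose", help="run jobs for a compose")
    compose.add_argument("-m", dest="milestone",
                         help="milestone, e.g. 'Rawhide' or 'Branched'")
    compose.add_argument("--wait", action="store_true",
                         help="wait for the compose to be available")
    compose.add_argument("--ifnotcurrent", action="store_true",
                         help="bail out if the compose is the current "
                              "validation event")
    return parser


def should_skip(args, compose_version, current_version):
    """Hypothetical check: skip when --ifnotcurrent is set and the
    compose is the same as the current validation event."""
    return args.ifnotcurrent and compose_version == current_version
```

Under these assumptions, the 'branched' cron job could run something like `openqa_trigger.py compose -m Branched --wait --ifnotcurrent`.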
Test Plan:
Try all the possibilities with 'compose' - especially
check that it works with just '-m Rawhide' and '-m Branched',
and that the --wait and --ifnotcurrent args work as intended.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D525
Summary:
We can just parse the release out of the 'build' value here.
All tests still run properly because we use the * wildcard
in the job groups and so on. Passing a proper 'version' value
prevents T581: when you schedule jobs from an ISO, openQA
will obsolete any running or scheduled jobs with the same
DISTRI, FLAVOR, VERSION and ARCH.
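To illustrate why a real VERSION matters, here is a simplified model (not openQA's actual code) of the obsoletion rule described above, together with a hypothetical parser. The parser assumes the BUILD value starts with the release followed by an underscore (e.g. 'Rawhide_20150301'); the real BUILD format may differ:

```python
def release_from_build(build):
    """Hypothetical: assumes BUILD looks like '<release>_<rest>'."""
    return build.split("_", 1)[0]


def obsoletes(new_job, old_job):
    """Simplified model: scheduling from an ISO obsoletes running or
    scheduled jobs that share DISTRI, FLAVOR, VERSION and ARCH."""
    keys = ("DISTRI", "FLAVOR", "VERSION", "ARCH")
    return all(new_job.get(key) == old_job.get(key) for key in keys)
```

If VERSION is unset (or identical) for Branched and Rawhide, the two sets of jobs look the same under this rule and the second obsoletes the first; with distinct releases they no longer collide.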
Test Plan:
Schedule jobs for two composes at the same time
(e.g. Branched and Rawhide), see that the second set does
not obsolete the first. Make sure setting VERSION has no
unexpected consequences.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D522
Summary:
This goes along with the openqa_fedora diff to add a no_swap
test.
Test Plan:
If the new test runs (see other diff), check that its results
can be submitted to the wiki correctly.
Reviewers: garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D504
Summary:
D500 adds 32-bit test definitions to the test templates, so we
can schedule that arch again.
Test Plan: Goes along with D500, same test plan.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D501
Summary:
This way you can just copy/paste the damn list for
report_job_results instead of reformatting it, if you want to
feed it in manually for any reason.
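The change amounts to joining the IDs with spaces instead of printing the list's repr. A minimal sketch (`format_job_ids` is an illustrative name, not the script's):

```python
def format_job_ids(job_ids):
    """Render job IDs as '1 2 3' so they can be pasted straight into
    report_job_results, rather than as the list repr '[1, 2, 3]'."""
    return " ".join(str(job_id) for job_id in job_ids)


print(format_job_ids([1, 2, 3]))  # prints: 1 2 3
```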
Test Plan:
Run some commands that schedule jobs and make sure they're
printed as '1 2 3' not '[1, 2, 3]'
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D493
Summary:
Cancelled jobs have their own state (at least in the openQA
running on happyassassin, they do). This causes
report_job_results (r_j_r) to get stuck forever if any of the
jobs are cancelled. So, treat a cancelled job as 'done'.
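A minimal sketch of a completion check that treats cancelled jobs as finished. It assumes each job is a dict with a 'state' key, as in openQA's job JSON; `jobs_finished()` and `wait_for_jobs()` are hypothetical, not report_job_results' actual code:

```python
import time

# States after which a job will not change any more. If only 'done'
# counted as finished, a single cancelled job would keep the wait loop
# spinning forever.
FINISHED_STATES = {"done", "cancelled"}


def jobs_finished(jobs):
    """Return True if every job is in a finished state."""
    return all(job["state"] in FINISHED_STATES for job in jobs)


def wait_for_jobs(fetch_jobs, interval=300, sleep=time.sleep):
    """Poll fetch_jobs() until all jobs are finished, then return them."""
    jobs = fetch_jobs()
    while not jobs_finished(jobs):
        sleep(interval)
        jobs = fetch_jobs()
    return jobs
```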
Test Plan:
Try submitting results for a set of jobs that includes one or
more cancelled jobs. It should now work (before it would get
stuck).
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D492
Summary:
This adds the three new custom storage tests from D490 to the
conf_test_suites table. It also renames the software RAID test,
as I changed the test case names upstream (in the wiki) to be
more consistent.
Test Plan:
After running the new tests, try submitting the results to the
wiki. It should work correctly.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D491
Summary: This broke result reporting if any dupes were present.
Test Plan: Try reporting some results with dupes.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D477
Summary:
We've never quite set things up to run 32-bit tests. We should,
at some point, but for now we're just wasting a bunch of
time/bandwidth downloading them and never testing them.
Test Plan:
Schedule a run and make sure all the same tests are still run,
but half as many images are downloaded...
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D455
Summary:
Logging is now used instead of output to stdout, and several new
options are added: the user can specify a directory for
downloading ISOs and can also specify the Docker container where
openQA is running. Info about the newest tested version is no
longer written if no images were found.
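A minimal sketch of the three changes using the standard argparse and logging modules. The option names `--isodir` and `--docker-container` are hypothetical placeholders (this message does not give the real ones), as is `record_tested_version()`:

```python
import argparse
import logging

logger = logging.getLogger("openqa_trigger")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="openqa_trigger.py")
    # Hypothetical option names; the real script's names may differ.
    parser.add_argument("--isodir", default=".",
                        help="directory to download ISOs into")
    parser.add_argument("--docker-container", default=None,
                        help="Docker container where openQA is running")
    return parser.parse_args(argv)


def record_tested_version(path, version, images):
    """Write the newest tested version only if images were found."""
    if not images:
        logger.warning("No images found, not recording version %s",
                       version)
        return False
    with open(path, "w") as out:
        out.write(version)
    return True


def main(argv=None):
    # Log messages (with timestamps) instead of printing to stdout.
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    args = parse_args(argv)
    logger.info("Downloading ISOs to %s", args.isodir)
    if args.docker_container:
        logger.info("Using openQA in container %s", args.docker_container)
    return args
```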
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D420
Summary:
The wrong brace was used in a str.format() call, which caused a
crash when no universal test image was found for a compose.
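For context, str.format() raises ValueError on a malformed replacement field, so a typo like this only blows up when that particular message is actually built, here on the rarely-taken 'no universal image' path. A small illustration (the message text is made up):

```python
def format_ok(template, *args):
    """Return True if template.format(*args) succeeds."""
    try:
        template.format(*args)
    except ValueError:
        return False
    return True


GOOD = "No universal image found for compose {0}"
BAD = "No universal image found for compose {0]"  # wrong closing brace

print(format_ok(GOOD, "Rawhide 20150301"))  # True
print(format_ok(BAD, "Rawhide 20150301"))   # False: ValueError raised
```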
Test Plan:
Check that tests still run. Ideally try testing a compose with
no universal image (e.g. a Rawhide nightly with no boot.iso).
Reviewers: garretraziel, kparal
Reviewed By: kparal
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D384
Summary:
Use the new wikitcms / Wikitcms feature that allows results to be
marked as 'bot' (indicating they're from an automated test
system, not a human). Requires python-wikitcms >= 1.1.4.
Test Plan:
Update to wikitcms 1.1.4, report some results; they should
appear with bot=true and show the bot head on the result page.
Reviewers: kparal, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D349
Also refactor the result submission a bit to share the code
between CLI and module modes. This requires python-wikitcms
1.11+.
This should be substantially more efficient - it should make
only two wiki roundtrips per result page, and only init and
login to the wiki once. Going via relval requires a wiki init
and then two roundtrips for *each result*.
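The saving comes from grouping results by page before touching the wiki. A hypothetical sketch of the grouping and the roundtrip arithmetic; the (page, test, status) tuples and counting functions are illustrative only, not python-wikitcms API:

```python
from collections import defaultdict


def group_by_page(results):
    """Group (page, test, status) tuples so each page is edited once."""
    pages = defaultdict(list)
    for page, test, status in results:
        pages[page].append((test, status))
    return dict(pages)


def roundtrips_batched(results):
    # One fetch and one save per result page.
    return 2 * len(group_by_page(results))


def roundtrips_via_relval(results):
    # Two roundtrips for each individual result (wiki init not counted).
    return 2 * len(results)
```

With three results spread over two pages, that is 4 roundtrips instead of 6, and the gap grows with the number of results per page.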
https://phab.qadevel.cloud.fedoraproject.org/D316
This handles scheduling of jobs for more than one type of
image; currently we'll run tests for Workstation live as well.
It requires some cleverness to run some tests for *all* images
(currently just default_boot_and_install) while running the tests
that can use any non-live installer image on just the best
image available for the compose. We introduce a special (openQA,
not fedfind) 'flavor' called 'universal'; we run a couple of
checks to find the best image in the compose for running the
universal tests, and schedule tests for the 'universal' flavor
with that image. The 'best' image is a server or 'generic' DVD
if possible, and if not, a server or 'generic' boot.iso.
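The 'best image' rule can be sketched as an ordered preference list. This assumes each image is a dict with 'imagetype' and 'payload' keys; those field names and value strings are assumptions for illustration, not fedfind's actual attributes:

```python
# Ordered preferences: a DVD first, then a boot.iso; either may be a
# server or 'generic' image.
PREFERENCES = [
    ("dvd", {"server", "generic"}),
    ("boot", {"server", "generic"}),
]


def best_universal_image(images):
    """Return the best image for the 'universal' tests, or None."""
    for imagetype, payloads in PREFERENCES:
        for image in images:
            if image["imagetype"] == imagetype and image["payload"] in payloads:
                return image
    return None
```

Returning None leaves the caller free to skip the 'universal' flavor for composes that have no suitable image.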
ISO files have the compose's version identifier prepended to
their names. Otherwise they retain their original names, which
should usually be unique within a given compose, except for
boot.iso files, which have their payload and arch added into
their names to ensure they don't overwrite each other.
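A sketch of the naming rule with a hypothetical `iso_filename()` helper; the underscore separator and the order of the payload and arch parts are assumptions:

```python
def iso_filename(version, url, payload, arch):
    """Hypothetical helper: local filename for a downloaded ISO."""
    name = url.rsplit("/", 1)[-1]
    if name == "boot.iso":
        # boot.iso names collide across payloads/arches, so add both.
        name = "{0}_{1}_boot.iso".format(payload, arch)
    # Every file gets the compose's version identifier prepended.
    return "{0}_{1}".format(version, name)
```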
This also adds a mechanism for TESTCASES (in conf_test_suites)
to define a callback which will be called with the flavor of
the image being tested; the result of the callback will be used
as the 'test name' for relval result reporting purposes. This
allows us to report results against the correct 'test instance'
for the image being tested, for tests like Boot_default_install
which have 'test instances' for each image. We can extend this
general approach in future for other cases where we have
multiple 'test instances' for a single test case.
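A hypothetical sketch of the callback mechanism. The 'name_cb' and 'name' keys, the placeholder test case and the flavor-to-name conversion are illustrative assumptions, not the real conf_test_suites contents:

```python
def boot_instance_name(flavor):
    """Hypothetical callback: turn an openQA flavor such as
    'workstation_live' into a wiki test instance name."""
    return flavor.replace("_", " ").capitalize()


# Illustrative TESTCASES entries; the real table has more keys.
TESTCASES = {
    "QA:Testcase_Boot_default_install": {
        # Called with the flavor of the image under test.
        "name_cb": boot_instance_name,
    },
    "QA:Testcase_example": {
        # Test cases with a single instance use a fixed name.
        "name": "default",
    },
}


def result_test_name(testcase, flavor):
    """Return the 'test name' to report the result against."""
    entry = TESTCASES[testcase]
    if "name_cb" in entry:
        return entry["name_cb"](flavor)
    return entry.get("name", "")
```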
The patch jskladan applied was an older broken one I sent
accidentally; apologies. This is more or less my intended
version, with some of the cleanups from jskladan preserved
and a couple of his suggestions added (!= instead of not ==,
and a bit of just-in-case exception handling).
The basic approach is that openqa_trigger gets a ValidationEvent from
python-wikitcms - either the Wiki.current_event property for
'current', or the event specified, obtained via the newly-added
Wiki.get_validation_event(), for 'event'. For 'event' it then just
goes ahead and runs the jobs and prints the IDs. For 'current' it
checks the last run compose version for each arch and runs if needed,
as before. The ValidationEvent's 'sortname' property is the value
written out to PERSISTENT to track the 'last run' - this property is
intended to always sort compose events 'correctly', so we should
always run when appropriate even when going from Rawhide to Branched,
Branched to a TC, TC to RC, RC to (next milestone) TC.
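A minimal sketch of the 'current' check. It assumes PERSISTENT holds a JSON object mapping each arch to the last-run sortname (the real file format may differ), and that sortname values compare correctly as plain strings, which is what the sortname property is meant to guarantee:

```python
import json
import os


def load_last_run(path):
    """Read the arch -> last-run sortname mapping (empty if missing)."""
    if not os.path.exists(path):
        return {}
    with open(path) as infile:
        return json.load(infile)


def save_last_run(path, last_run):
    """Write the arch -> last-run sortname mapping back out."""
    with open(path, "w") as outfile:
        json.dump(last_run, outfile)


def arches_to_run(last_run, sortname, arches):
    """Arches whose recorded sortname sorts before the current one."""
    return [arch for arch in arches if last_run.get(arch, "") < sortname]
```

After scheduling, the caller would update the mapping for the arches it ran (e.g. `last_run.update(dict.fromkeys(ran, sortname))`) and save it.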
On both paths it gets a fedfind.Release object via the ValidationEvent
- ValidationEvents have a ff_release property which is the
fedfind.Release object that matches that event. It then queries
fedfind for image locations using a query that tries to get just *one*
generic-ish network install image for each arch. It passes the
location to download_image(), which is just download_rawhide_iso()
renamed; it does the same job, but can be simpler now.
From there it works pretty much as before, except we use the
ValidationEvent's 'version' property as the BUILD setting for openQA,
and report_job_results' get_relval_commands() is tweaked slightly to
parse this properly to produce a correct report-auto command.
Probably the most likely bits to break here are the sortname thing
(see wikitcms helpers.py fedora_release_sort(), it's pretty stupid, I
should re-write it) and the image query, which might wind up getting
more than one image depending on how exactly the F22 Alpha composes
look. I'll keep a close eye on that. We can always take the list from
fedfind and further filter it so we have just one image per arch.
Image objects have a .arch attribute so this will be easy to do if
necessary. I *could* give the fedfind query code an 'I'm feeling
lucky'-ish mode to only return one image per (whatever), but I'm
not sure whether that would be too specialized; I'll think about it.
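If the query ever returns more than one image per arch, the post-filter mentioned above could look like this minimal sketch. It relies only on the `.arch` attribute (a namedtuple stands in for fedfind's image objects) and keeps the first image seen for each arch, which assumes fedfind's result order is an acceptable tie-breaker:

```python
from collections import namedtuple

# Stand-in for a fedfind image object; only .arch matters here.
Image = namedtuple("Image", ["arch", "url"])


def one_per_arch(images):
    """Keep the first image seen for each distinct .arch value."""
    chosen = {}
    for image in images:
        chosen.setdefault(image.arch, image)
    # dicts keep insertion order (Python 3.7+), so output follows input.
    return list(chosen.values())
```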