handle errors while waiting for compose smartly

Summary:
The first attempt to use the wait feature in production failed
because, at some point, koji_done got a socket.error from the
server. I'm not sure whether that was a Koji outage or some kind
of rate control, but even if it's rate control and we need to
tweak the wait interval, this change seems advisable: we shouldn't
die the first time we hit any kind of error while waiting; we
should retry a few times first, with increasingly long delays
between attempts. I know bare except clauses are BAD, but I think
one is OK here: we can't really cover every possible exception
that might get raised in any module during a 'go hit a server and
do a bunch of stuff' operation, and if the error keeps happening
we *are* going to raise it eventually.
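
For illustration only, here is a minimal sketch of the retry idea
described above; it is not the code from this commit or from fedfind.
The helper name retry_with_backoff and its parameters (attempts,
base_delay, sleep) are hypothetical. It also catches Exception rather
than using a truly bare except, so that Ctrl-C still interrupts the
wait; that is a deliberate substitution, not what the summary says.

```python
import logging
import time


def retry_with_backoff(check, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call check() until it returns without raising, at most `attempts` times.

    Hypothetical helper: the delay doubles after every failure, and the
    last failure is re-raised so a persistent error still surfaces.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except Exception:  # deliberately broad; see the summary above
            if attempt == attempts:
                raise  # out of retries: surface the original error
            logging.warning(
                "Check failed (attempt %d of %d), retrying in %.0fs",
                attempt, attempts, delay,
            )
            sleep(delay)
            delay *= 2
```

Passing sleep as a parameter keeps the sketch testable without real
delays; a caller would normally leave it at time.sleep.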

Test Plan:
Check I didn't break the 'normal' case, and try causing
an error to appear somehow (e.g. disconnect from the network or
hack up the 'client' instantiation in fedfind to use the wrong URL)
and see if the error handling works as intended.

Reviewers: garretraziel, jskladan

Reviewed By: garretraziel, jskladan

Subscribers: tflink

Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D532
Adam Williamson 2015-08-27 15:51:15 -07:00
parent 75936a2870
commit 0396d70de0


@@ -13,6 +13,7 @@ try:
     import wikitcms.wiki
 except ImportError:
     wikitcms = None
+import fedfind.exceptions
 import fedfind.release
 from openqa_client.client import OpenQA_Client
@@ -247,16 +248,10 @@ def run_compose(args, client, wiki=None):
     if args.wait:
         logging.info("Waiting up to %s mins for compose", str(args.wait))
-        waitstart = time.time()
-        while True:
-            if time.time() - waitstart > args.wait * 60:
-                sys.exit("Wait timer expired! No jobs run.")
-            logging.debug("Checking for compose...")
-            if ff_release.koji_done and ff_release.pungi_done:
-                logging.info("Compose complete! Scheduling jobs.")
-                break
-            else:
-                time.sleep(120)
+        try:
+            ff_release.wait(waittime=args.wait)
+        except fedfind.exceptions.WaitError:
+            sys.exit("Waited too long for compose to appear!")
     jobs = []
     try: