Per the bug mentioned upstream, grub2-mkconfig will currently not set
the kernel options for BLS entries prefixed with a machine-id
different to the running system.
This affects the centos element, as the upstream .qcow2 comes with a
pre-existing BLS entry but a blank machine-id. This only affects
9-stream -- prior releases either don't use BLS or have entries
configured to use a common variable from grubenv which is updated
correctly.
We currently cannot end-to-end test this in OpenDev because we run
our functional tests on Ubuntu Focal (they use devstack), whose kernel
cannot read the XFS format on the 9-stream .qcow2. This expands the
functional tests (that run on Debian Buster, with a later kernel) to
add the vm element, so the bootloader path is exercised (this requires
a block-device too). This at least runs the bootloader install; we
can confirm the kernel options look right from the dump provided in
the logs.
Change-Id: I327f5e7a95e47905c01138c8c4483f3f03e8efff
This reverts I2701260d54cf6bc79f1ac765b512d99d799e8c43,
Idf2a471453c5490d927979fb97aa916418172153 and part of
Iecf7f7e4c992bb23437b6461cdd04cdca96aafa6 which added special flags to
update kernels via grubby.
These changes actually ended up reverting the behaviour on Fedora 35,
which is what led me to investigate what was going on more fully.
All distros still support setting GRUB_DEVICE in /etc/default/grub;
even the BLS based ones (i.e. everything !centos7).
The implementation *is* confusing -- in earlier distros each BLS entry
would refer to the variable $kernelopts, which grub2-mkconfig would
write into /boot/grub2/grubenv. After commit [1] this was reverted,
and the kernel options are now written directly into the BLS entry.
But the real problem is this bit from [2]:
    get_sorted_bls()
    {
        if ! [ -d "${blsdir}" ] || ! [ -e /etc/machine-id ]; then
            return
        fi
        ...
        files=($(for bls in ${blsdir}/${machine_id}-*.conf; do
        ...
    }
i.e., to avoid overwriting BLS entries for other OS-boots (?),
grub2-mkconfig will only update those BLS entries that match the
current machine-id.
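To illustrate the matching rule, here is a minimal sketch (not the
grub2 code itself). The helper name matching_bls_entries, the demo
directory and the "abc123" prefix are made up for illustration; only
the "${blsdir}/${machine_id}-*.conf" glob comes from the patch quoted
above.

```shell
# Sketch only: list the BLS entries that a "${machine_id}-*.conf" glob selects.
# matching_bls_entries is a hypothetical helper, not part of grub2.
matching_bls_entries() {
    blsdir=$1
    machine_id=$2
    for bls in "${blsdir}/${machine_id}"-*.conf; do
        # an unmatched glob is left as a literal string, so check it exists
        if [ -e "$bls" ]; then
            basename "$bls"
        fi
    done
}

# Demo with a throwaway directory and a made-up machine-id prefix
demo_dir=$(mktemp -d)
: > "${demo_dir}/abc123-5.14.0.conf"
echo "same id:    $(matching_bls_entries "$demo_dir" abc123)"
echo "cleared id: $(matching_bls_entries "$demo_dir" "")"
rm -rf "$demo_dir"
```

With a cleared (empty) machine-id the glob degenerates to "-*.conf",
so an entry written under the old id is never selected for update.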
The problem for DIB is that we are clearing the machine-id early in
finalise.d/01-clear-machine-id, but then running the bootloader update
later in finalise.d/50-bootloader.
The result is that the bootloader entry generated when we installed
the kernel (which guessed at the root= device, etc.) is *not* updated.
Even more annoyingly, the gate doesn't pick this up -- because the
gate tests run on a DIB image that was booted with
"root=LABEL=cloudimg-rootfs", the kernel initially installed with
"install-kernel" (that we never updated) is actually correct. But
this fails when built on a production host.
Thus we don't need any of the explicit grubby updates; these are
reverted here. This moves the machine-id clearing to after the
bootloader setup, which allows grub2-mkconfig to set up the BLS
entries correctly.
[1] 4a742183a3
[2] https://src.fedoraproject.org/rpms/grub2/blob/rawhide/f/0062-Add-BLS-support-to-grub-mkconfig.patch
Depends-On: https://review.opendev.org/c/zuul/nodepool/+/818705
Change-Id: Ia0e49980eb50eae29a5377d24ef0b31e4d78d346
This patch allows setting a path to a local image source, instead of
downloading the latest image or using the cached image.
This permits building images in environments without internet access.
This re-proposes the patch: https://review.opendev.org/c/openstack/diskimage-builder/+/809009
Change-Id: I54395b09af339caee040326b809e8fbf8b0e7d6a
A recent(-ish) change in git [1] has exposed a bug in caching that
appears in one very specific circumstance -- updating the
openstack/openstack super-repo [2].
This repo gets a submodule update every time something is pushed. By
using "--git-dir" while the cwd is one level above the actual repo, we
are confusing the code from [1], which does not find the submodule
directories correctly and gives us an error:
Could not access submodule 'foo'
for every submodule that has updated between now and the last time we
updated the cache. [3]
The git manual does warn about this:
    If you just want to run git as if it was started in <path> then use
    git -C <path>.
Indeed, that is what we want to do in this path. Modify the calls to
use -C.
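The difference can be seen with a small sketch. The function name
show_toplevels and the throwaway "repo" directory are assumptions for
illustration; the git options themselves ("--git-dir", "-C",
"rev-parse --show-toplevel") are standard.

```shell
# Sketch only: print what git regards as the top of the work tree when it
# is invoked from the parent directory of a repository.
# show_toplevels is a hypothetical helper; "$1" is an empty scratch directory.
show_toplevels() {
    parent=$1
    git init -q "$parent/repo"
    (
        cd "$parent" || exit 1
        # --git-dir only locates the repository; without --work-tree the
        # current directory is taken as the top of the work tree
        echo "--git-dir: $(git --git-dir=repo/.git rev-parse --show-toplevel)"
        # -C runs git as if it had been started inside the repository
        echo "-C:        $(git -C repo rev-parse --show-toplevel)"
    )
}
```

Calling show_toplevels "$(mktemp -d)" should report the scratch
directory itself for "--git-dir" and the "repo" directory below it for
"-C", which is why paths relative to the work tree (such as submodule
directories) resolve differently between the two.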
[1] 505a276596
[2] https://opendev.org/openstack/openstack/
[3] The result for opendev production is that image builds fail every
time an openstack/* project is checked in; we then race to retry
the build before another commit lands and updates the submodules
again.
Change-Id: Iadb23454e29d8869e11407e1592007b0f0963e17
Refactor things to use explicit names, and put in a trap to clean up
after any errors.
Currently, if the build/run/export steps fail, they leave behind images
which eventually clog things to the point podman won't run any more
(see also https://github.com/containers/podman/pull/12233 about errors
seen due to this).
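The general pattern is sketched below; it is not the element's actual
code. A marker file stands in for the named podman image, and the
names build_with_cleanup, image_marker and fail_step are hypothetical.
The sketch also assumes cleanup should run on success as well as on
failure.

```shell
# Sketch only: a marker file stands in for a named podman image/container.
# build_with_cleanup and its arguments are hypothetical.
build_with_cleanup() (
    image_marker=$1   # explicit name, so the trap knows exactly what to remove
    fail_step=$2      # "yes" simulates a failing build/run/export step
    # the body runs in a subshell; EXIT fires on both success and failure
    trap 'rm -f "$image_marker"' EXIT
    : > "$image_marker"
    if [ "$fail_step" = "yes" ]; then
        echo "step failed" >&2
        exit 1
    fi
    echo "exported"
)
```

Because the resource is referred to by an explicit name, the trap can
remove exactly that resource regardless of which step failed.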
Change-Id: Ib328a07ad67e3f71f379fbf34ae7ef74e212ef1c
Ic68e8c5b839cbc2852326747c68ef89f630f26a3 removed the sudo from the
tar extraction here, meaning that production is failing to create the
chroot. This is hidden in testing because
DIB_CONTAINERFILE_PODMAN_ROOT is set. Make the sudo here
unconditional.
Change-Id: I6e36e3fc65981f85fad12ea2cd10780fde9c37da
CentOS Stream 9 is close to being released, and the official mirrors
are already populated. This patch adds CS9 support to centos-minimal.
It also enables the centos-minimal/[8,9]-stream-build-succeeds tests.
This patch is being tested together with [1], applying the following list of elements:
vm centos-minimal simple-init growroot nodepool-base openstack-repos infra-package-needs
[1] https://review.opendev.org/c/openstack/project-config/+/811442
Change-Id: Iecf7f7e4c992bb23437b6461cdd04cdca96aafa6
The if/elif block added in [0] doesn't work for gentoo, let's hope
that we can get along with an easy fix.
[0] https://review.opendev.org/c/openstack/diskimage-builder/+/804000
Signed-off-by: Dr. Jens Harbott <harbott@osism.tech>
Change-Id: I543e04d2d7efea3e718bae31aa1cc4767bd359f8
This more closely matches the nodepool-builder container, which is
Bullseye based.
Refactor to remove unnecessary abstract job.
Change-Id: I34822608f19e1ce9ef781034ff831d6359ed8e15
This adds 9-stream support to the centos element.
See https://review.opendev.org/q/topic:cs9 for related patches.
Change-Id: Ib80fbd21edb77c25764eff2c0d66e55bde7a90af
We need to update the base reference platform we perform the
functional tests on. Debian bullseye seems like the best choice -- it
is recent enough to last for a while, and will match the
nodepool-builder container environment.
Depends-On: https://review.opendev.org/c/zuul/zuul-jobs/+/814088
Change-Id: Ic68e8c5b839cbc2852326747c68ef89f630f26a3
At one point we had an array of functest jobs; we were testing
building on trusty, bionic, maybe centos, with python2 and python3.
The only thing left of all these combos these days is "bionic-python3"
(that's where this naming comes from).
We can remove the levels of abstraction now and just have the one job
to avoid confusion.
Change-Id: Id37d62a17f9b7f6dc6dc35585c29eddd435ce913
The only job left in the "extras" functional test is for gentoo. We
might as well just drop this and put the gentoo devstack test in the
gate -- in terms of resources it's a 1:1 swap now.
Clean up the ordering of the check queue list.
Change-Id: I2af8f9235131cd0cce33e67c8d0d05c3b357320d
We test the minimal builds in the devstack based end-to-end tests.
There is no need to also run these functional tests; they test *less*
than the full build/upload/boot testing done in the devstack test.
We consolidate the separate "image" based job
Change-Id: I73ca83bedb4b0c40af5209f9c93a0e657c152591
This element has been deprecated since Sep 2019. At this point we
don't need to keep gate testing it.
Change-Id: I2851b9be93effac30ca7c97ef8fd17e8641eeffa
This builds ubuntu-minimal to test apt-sources, an element that
appears unused. We don't need this as a gate test.
Change-Id: I5310c0a38a19a7c00764c1057b0a26aafdbacccc
I'm not aware that this element is used, or was ever used. It has
never been updated to Focal. To reduce our testing footprint remove
this test, and note in the element that it's probably broken.
Change-Id: I17cd3b13948287fe78990cfbe16a22919a329ba9
This reverts commit 1f4fb1d7a5.
This unfortunately wasn't actually tested. Because the image-based
tests run sequentially, a prior failure in the centos-8 job meant the
ubuntu job never ran.
This is failing with:
    10-cache-ubuntu-tarball: line 28: DIB_LOCAL_IMAGE: unbound variable
There is also a seemingly unused variable DIB_IMAGE_LOCAL_FILE; I'm
not sure what this is doing.
For now revert, and it can be re-proposed with appropriate testing.
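For reference, one common way to avoid this class of failure under
"set -u" is a defaulted expansion. This is only a sketch of the shell
idiom, not the content of any re-proposed patch; describe_image_source
is a hypothetical helper and its output strings are made up.

```shell
# Sketch only: DIB_LOCAL_IMAGE is the variable named in the error above;
# describe_image_source is a hypothetical helper.
describe_image_source() {
    # ${VAR:-} expands to an empty string instead of aborting under "set -u"
    if [ -n "${DIB_LOCAL_IMAGE:-}" ]; then
        echo "local:${DIB_LOCAL_IMAGE}"
    else
        echo "download"
    fi
}
```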
Change-Id: I0f3897c90dc863ee04c3295b9cb094f02d8658e3
It looks like upstream have changed this line to "download.example",
breaking our substitution. Let's do a generic match.
Change-Id: I8e443022a5f239b98ccefe73a9abf8cf259dc8e9
At this point py35 practically means Xenial, and we are not interested
in having that as a build platform. Drop the py35 jobs.
Update python setup.cfg metadata; we are building on 3.9 so add that.
Change-Id: I981f0f67a6fd809af1ab70934358dc3404890f35