It turns out we do need to create the machine-id for the same reason
as on 8. This was being hidden by the bootloader choosing the root
disk label from the host (see the dependent change).
Change I3b518802d681b888916a5cc6a3dcf7e1b537da1e has modified the
testing to use a different root-disk label, which should help catch
this in the fututure.
Depends-On: https://review.opendev.org/c/zuul/nodepool/+/853574
Change-Id: I64de66cac25fd2e051780fb4812e075c647eb76e
--root-label was added with I596104d1a63b5dc6549e8460a1ae3da00165ef04
This sets the ROOT_LABEL environment variable.
Over the years how this deploys has become more complex; now this
value gets written into DIB_BLOCK_DEVICE_PARAMS_YAML default values,
which is then loaded into DIB_ROOT_LABEL.
To override this from the environment you need to specify a full
DIB_BLOCK_DEVICE_CONFIG -- we don't have a way to just merge in the
root label setting.
Using the command-line argument is difficult with tools like nodepool
where the command-line is baked into something else. However we
already have methods for overriding environment variables on dib
calls.
Several of the other variables here accept default values from the
environment, so this is not an outlier. Making ROOT_LABEL also do
this allows us to test with non-default root devices in the gate (see
the linked change).
Change-Id: Ia1ef48c24841a86f387ff9603c64fd23d8670193
Needed-By: https://review.opendev.org/c/zuul/nodepool/+/853574
Without this change, the final unmount will timeout after the
rollbacks are called when the partitioning fails due to a user error.
dmsetup remove is called both for partition and LVM volume devices.
Change-Id: I99679ea00338d4018a95d4da9b21685161cd5049
This commit fixes wrong name of yum.conf files
on CentOS 9 Stream in centos element to correct ones.
Change-Id: I25f0661fa79b7bc8ac1b8e3b2831a413c4161d1d
openEuler 20.03-LTS-SP2 was out of date in May 2022. 22.03 LTS
is the newest LTS version. It was release in March 2022 and
will be maintained for 2 years. This patch upgrades the LTS
version. It'll be used in Devstack, Kolla-ansible and so on
in CI jobs.
This patch also enables the YUM mirror to speed up the package
download.
Change-Id: Iba38570d96374226b924db3aca305f7571643823
Somewhere between the upstream container
rockylinux/rockylinux:8.6.20220515 and the latest release, systemd
started to be pre-installed in the container.
With <= 20220515 installing the kernel-core package would end up
pulling in systemd. As part of the systemd package installation, the
/etc/machine-id file is created and populated.
The kernel package post-install steps install the kernel with
/bin/kernel-install; this is responsible for copying the kernel
binaries into /boot. It does this based on the machine-id, and it
seems its failure case with a blank machine-id is to simply skip
copying the kernels into /boot. To compound this problem, it seems
our bootloader installation doesn't notice that we don't have a kernel
installed, so we end up building an unbootable image.
Testing is/was showing us this; but as rocky is non-voting and this
occured at a random time (rather than in response to a dib change) I
think it slipped by us.
To work around this, create the machine-id early in the container. We
already have paths that remove the machine-id from final images.
Change-Id: I07e8262102d4e76c861667a98ded9fc3f4f4b82d
I think that generally this is a lot of noise in the logs, as the
internals of cache-url is well tested, so we don't need to trace log
by default.
Change-Id: I25b5a1ec0d8f99691b2b4b62b9fdd537e5a773e4
This is a squash of two changes that have unfortunately simultaneously
broken the gate.
The functests are failing with
sha256sum: bionic-server-cloudimg-amd64.squashfs.manifest: No such file or directory
I think what has happened here is that the SHA256 sums file being used
has got a new entry "bionic-server-cloudimg-amd64.squashfs.manifest"
which is showing up in a grep for
"bionic-server-cloudimg-amd64.squashfs". sha256 then tries to also
check this hash, and has started failing.
To avoid this, add an EOL marker to the grep so it only matches the
exact filename.
Change I7fb585bc5ccc52803eea107e76dddf5e9fde8646 updated the
containerfile tests to Jammy and it seems that cgroups v2 prevents
podman running inside docker [1]. While we investigate, move this
testing back to focal.
[1] https://github.com/containers/podman/issues/14884
Change-Id: I1af9f5599168aadc1e7fcdfae281935e6211a597
Gentoo can manage python versions itself. Before this commit users were
forced to set python versions themselves. Now they have the option to
set it if they wish.
The workaround needed for git is also no longer needed, so it's been
removed.
Change-Id: I06b259ef73a40df6b8ab92a5b424bffcf4ef764d
Signed-off-by: Matthew Thode <mthode@mthode.org>
The block device lvm lvs `size` attribute was passed directly to
lvcreate, so using units M, G means base 2. All other block device
size values are parsed with accepted conventions of M, B being base 10
and MiB, GiB being base 2.
lvm lvs `size` attributes are now parsed the same as other size
attributes. This improves consistency and makes it practical to
calculate volume sizes to fill the partition size. This means existing
size values will now create slightly smaller volumes. Previous sizes
can be restored by changing the unit to MiB, GiB, or increasing the
value for a base 10 unit.
The impact on this change should be minimal, the only known uses of lvm
volumes (TripleO, and element block-device-efi-lvm) uses extents
percentage instead of size. The smaller sizes can always be increased
after deployment.
Requested sizes will also be rounded down to align with physical
extents (4MiB). Previously specifying a value which did not align on
4MiB would consume an extra extent which could unexpectedly consume
more than the partition size.
Change-Id: Ia109cc5105071d82cc895d8d9cb85bc47da20a7a
All indication in CI is that Centos Stream9's use of dhclient
appears to point to compatability issues when interacting with
dnsmasq. However, this doesn't appear to be the issue with the
internal dhcp client. As such, lets constraint the RH default
so that it no longer applies to Centos 9-stream.
I've also added a documentation entry for DIB_DHCP_CLIENT which
was previously undocumented.
As an aside, I've already reached out to RH's NetworkManager team
regarding this, but root cause is not entirely understood at this
point.
Change-Id: I235f75b385a8b0348c8fe064038c51409f8722c4
Story: 2010109
Task: 45677
Creating a separate /boot partition is desirable in some cases[1].
This change detects if /boot is a partition, and ensures that the
kernel/ramdisk paths are correct in either case. This is applied to
all BLS entries files, whether they were generated by the previous
grub2-mkconfig call or in the source image.
This means the rhel9 specific workaround can be removed since all
paths are now normalised at this stage.
[1] https://review.opendev.org/c/openstack/tripleo-image-elements/+/846807
Change-Id: I62120ec8c65876e451532d2654d37435eb3606a6
Resolves: rhbz#2101514
Currently if no Dockerfile is specified or found, we exit later with
an obscure error. Check this after the element search; if we still
don't have something to build then we can't continue.
Change-Id: Ifb17a0995fab0ccfe7ee08363676c1fa57e37592
'9-stream' was being matched against the regex '9',
causing builds on RHEL9 to try to install C9S RPMs.
We want this the other way so that DIB_RELEASE=9
will not match the regex '9-stream'.
Resolves: rhbz#2097443
Signed-off-by: Lon Hohberger <lhh@redhat.com>
Change-Id: Iefd7c23512c460e33117d12bbc33606134daa9e2
Add a warning in satellite configuration as when no activation_key
is provided and no environment is configured, subscription-manager
might hang as it's prompting the user to provide the missing
parameter.
Change-Id: I9564841ca845eafc2bd39be6b05bef62e8062f28
Due to the referenced inline issue, 9-stream currently fails running
setfiles in a chroot without /proc. Since we want to actually label
/proc, we don't want it mounted. This pulls in the fixed packages to
get things going until the fix is rolled out.
Change-Id: Id41c16130e975779cb70e2ab19807a689450d026
When building an image, say RHEL9, on a host installed with that
same image, you will be blocked from mounting the filesystems to
extract contents, as the host OS kernel will identify the duplicate
UUIDs and error accordingly.
This was previously fixed for the root filesystem, but not the boot
filesystem.
Change-Id: I63a34fba033ed1c459aeb9c201c8821fa38a36e9
the command had one error in it (missing one backslash)
and was rendered wrong, w/o any backslashes at all.
Change-Id: If187f645b818f47d10b602ccee12c29892a8d88d
After some recent reordering[0], the /boot/grub directory isn't created
early enough on Gentoo any more, let us just ensure ourselves that it is
in place when we create the grub config.
[0] I8cb34914bbbfa05521bbb71cc6637368b980358f
Change-Id: I8a84d08c3090e46b00d1d626fb984f66ea33f256
Previously a module version was splitted from the module name:
nvidia, 510.47.03, 5.4.0-109-generic, x86_64: installed
In Jammy it is now a part of the name:
nvidia/510.47.03, 5.15.0-27-generic, x86_64: installed
Assuming the fact that it would be threatted as a path this change
doesn't brake anything which was working before. But at the same
time it allows to pass last step where dkms is requested to build all
modules.
Change-Id: Ic1bb2b45f9db906b64ca03ae5c4e05b2114f2a74
It may happen a base image has an edited version of cloud-init
"cloud.cfg" that prevents the host keys to be generated.
While it didn't represent an issue with older releases of cloud-init,
starting cloud-init-22 this isn't true anymore.
Before that release, an sshd-keygen@.service was present and called by
sshd-keygen.target (which was called by sshd.service), and we ended up
with ssh host keys in any cases - either generated from cloud-init, or
generated by sshd-keygen.service.
But cloud-init-22 introduced an edition to the sshd-keygen.service,
making it check for the presence of cloud-init service, and preventing
this sshd-keygen to kick in this case.
So we'd better ensure cloud-init is able to generate the keys, else
we'll be in a bad state, since it's instructed to remove the ones
present.
Closes-Bug: #1971751
Change-Id: I37b2f3e9d57a86544ef14e74a4a927309c18bbf0
This adds arm64 ubuntu-minimal Jammy functests and x86 ubuntu image
based Jammy functests. To make this happen we have to install
debootstrap from debian unstable on the functest nodes in order to get
access to a debootstrap that knows what jammy is.
As we ramp up Jammy support in our tools having good testing will be
helpful.
Change-Id: I1d1dc752ce176457d0656cbd50e27a2721ca9856
This makes 03-reset-bls-entries consistent with rhel so that the glob
match is *.conf, and a check is added to ensure that a rename is
actually required.
Change-Id: I4adff43cf7d4f31d939e6ddf37ac8d162ccd0db7
This reverts commit 8401290976.
We are reverting this because some users may want to use predictable
device names and may not even use Debian. However, after some
investigation we have found a couple of bugs in dhcp-all-interfaces on
Debuntu distros. The parent change corrects those bugs. Additionally new
Linux kernels emit "move" events to udev when interfaces are renamed to
their predictable name. Support this "move" in the dhcp-all-interfaces
udev rules. Making these changes appaers to produce functional images
for Debian users using predictable device names. If predictable device
names are not desired turning them off is straightforward and release
notes are updated to give users the info they need to do that outside of
this element.
Change-Id: I125f1a0c78a103b51bda961528c3e66c345bf604
Co-Authored-By: Clark Boylan <clark.boylan@gmail.com>
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
There are two issues with dhcp-all-interfaces on debuntu interfaces
addressed here. First is the path to dhclient lease files is
/var/lib/dhcp not /var/lib/dhclient. Second there is a missing newline
in the ENI interface file which causes a parse error.
Change-Id: Ice83e0d49a4234301dc12daf828ba80fef414cdb
I just saw in the trace output of a failure
> grep -o 'CentOS-.[^>]*GenericCloud-.[^>]*.qcow2'
> sort -r
> head -1
sort: fflush failed: 'standard output': Broken pipe
sort: write error
i.e. the "head -1" has exited after reading one line, but "sort -r"
still wants to write and thus has hit a pipe failure, and because we
run with "-o pipefail" this has halted the script.
This seems like it has been there more or less forever, maybe we just
got lucky hitting it now? Anyway, we can work around this by using a
process substitution and passing the output of this into head, this
way we won't hit a pipe failure.
I also updated the fedora path as it does the same thing.
Change-Id: I44d97e5bb31702aacf396e0229329a2ef9c64f2f
We've not really been using the Focal containerfile, as we move
forward jammy is a better choice for keeping stable as we might find
some new users for it.
Also add binutils to bindep for native bullseye builds (see
Icb0e40827c9f8ac583fa143545e6bed9641bf613)
Change-Id: I22ebe2bbccaec34180e58996b21e47bfc4f36055
03-reset-bls-entries was previously a pre-install script to run after
the machine-id was set, but a new kernel may be installed during the
install phase, which will install another bls entry file with a
filename which differs from the machine-id.
This means this package installed bls file won't be updated when
grub2-mkconfig is called, resulting in incorrect kernel args and boot
device in the entry file that will get booted by default.
By fixing the filenames after the new kernel is installed,
grub2-mkconfig will update the bls file that actually gets used on
boot.
Change-Id: I653bef9638e38ded68458fd40d90e30e5206caad
According to the systemd documentation[1], if /etc/machine-id is empty
it will be populated with a unique value, but not in a way which
triggers an actual first boot event (running units with
ConditionFirstBoot=yes set)
This change writes "uninitialized" to /etc/machine-id to ensure that
systemd-firstboot.service actually runs, and other units can use
first-boot-complete.target as a dependency to trigger on first boot.
Since /var/lib/dbus/machine-id is sometimes a symlink to
/etc/machine-id, it is truncated before writing to /etc/machine-id.
On older versions of systemd before first boot semantics were
formalised, any non-uuid value will trigger a new machine-id to be
generated, so "uninitialized" also works.
[1] https://www.freedesktop.org/software/systemd/man/machine-id.html#First%20Boot%20Semantics
Change-Id: I77c35e51a3da2e8a6b5a2c80d033a159b303c9af
This started a long way from here, when I noticed that "top" on centos
9-stream images wasn't working because ncurses-base wasn't installed.
This led me to the extant install of bash/glibc/ncurses-libs from
Iecf7f7e4c992bb23437b6461cdd04cdca96aafa6. However it didn't really
explain why these are brought in here.
Reading further it became clearer that over the years of distribution
additions, Fedora updates, etc. this has grown into a bit of a mess.
Refactor the release package installs into a more logical flow,
pulling out checks/comments for Fedora's of ancient history, etc.
Remove the 9-stream package installs; this isn't the place for them,
and the should be brought in by the base packages.
Ultimately, this is intendend to a be a no-op refactor.
Change-Id: Ie7d9a6497d0d20a3303ec0da3d0668c74efa2c3d
The recent git ownership-checking changes (see related bug for full
details) mean we can not run git in non-owned directories.
We have a couple of cases here where we have done a "pushd" to work in
the REPO_DEST context; this is the destination directory that is
inside the chroot so needs to be operated on as "root" (via sudo
calls). This certainly makes sense -- but given the new way of things
it can hide what context each call is working in, which is now very
important. Previously this worked because you could read it; now it's
doing the UID check too, calls in here without sudo now fail.
Remvoe the pushd's and make every call that works in REPO_DEST
explicit with -C, and add sudo calls around it.
Change-Id: Id1f6bd94c9c77ef6ab2b562a7e0bc48f749c58ac
Related-Bug: https://bugs.launchpad.net/devstack/+bug/1968798
The bootloader element installs the grub bootloader for whole-disk
images, but it also correctly sets values in /etc/default/grub and BLS
entries.
This value setting is useful even if the bootloader isn't installed.
For example, the overcloud-full partition image benefits from a
correct /etc/default/grub and BLS entries which ironic-python-agent
will use when it installs grub on the disk during baremetal deploy.
This change moves the actual grub install to the end of the script,
and if there is no $DIB_BLOCK_DEVICE set then install is skipped.
This allows overcloud-full to use the bootloader element instead of
the grub2 element, so the correct grub defaults are set on centos9,
including the correct root device on centos9.
Change-Id: I8cb34914bbbfa05521bbb71cc6637368b980358f
This reverts the mirror removal in
I817b412b7f06523df635e8b16111bc1081b40f66 and updates the test to F35,
which is mirrored.
Change-Id: I00d24690f57bedd3fc5ebbc18de0ed874ad1e4ef
In some build environments Docker is already installed - and adding
podman is not an option. Add a new variable to toggle this, and
rename the now incorrectly titled DIB_CONTAINERFILE_PODMAN_ROOT to
just ...RUNTIME_ROOT to match.
Change-Id: I677e4f491b40360dceabdf4f2a9e64c7cb493dc7
This adds a check for the root device having filesystem type btrfs,
and when it is assume there is a subvolume called "root". This fixes
extract-image when using Fedora-Cloud-Base btrfs images.
This should be sufficient until there is another btrfs base image with
a different subvolume layout.
Change-Id: Ib18979090585ba92566e523951b521b9d902fcb7
This change is proposed again, avoiding lsblk features missing from
older distros:
- lsblk is avoided entirely for a whole-disk image with a single
partition, which would be the majority of old image building jobs
- Field PARTTYPENAME not available on the lsblk in CentOS-8, instead
rely on the GUID being correct for EFI partitions
- Argument --output-all not available on the lsblk in CentOS-7, this
is just for logging debug, so can be removed
This reverts commit b06bac734c.
Change-Id: Ib0d4e7751fd968511fc7f672d524e58d1488ae11
This reverts commit 0630b3cb69.
Reason for revert: breaks compatibility with CentOS Stream 8, lsblk does not have PARTTYPENAME until version 2.35 and CS8 has version 2.32.1 installed
Change-Id: I7fc0e76f0eeb8594d8a0d57629b2c67526b961ad
Rocky Linux is very similar to CentOS 8. CentOS 8 required and forced
NetworkManager with glean so we update dib to do the same for Rocky.
Change-Id: I145e57d61059c2f34dc2d4810e83809b71c6aade
RHEL-9 base images are whole-disk images with the /boot/efi partition
correctly set up for EFI Secure Boot. This doesn't work with
extract-image because it only mounts the root partition, leaving
/boot/efi empty even though grub2-efi & shim packages are "installed".
This change mounts discovered partitions to mnt/boot, mnt/boot/efi so
all content can be extracted from the image.
Partition detection is done by reading block device attributes and
matching on Boot Loader Specification[1] UIDs or labels as observed in
supported base images.
[1] https://systemd.io/BOOT_LOADER_SPECIFICATION/
Change-Id: I8487002a18ae6ca98609ab68d92ae9173a2b864f
Similar to the CentOS-9-Stream fix [1] this change renames the default
BLS entry to match the current machine-id so that grub2-mkconfig calls
will refresh the kernel options.
However there is an additional issue with the rhel-9 base image. It is
unique in having a dedicated boot partition, so the path to the kernel
and initramfs don't include /boot. This results in an unbootable image
when /boot is a directory of the root partition.
These paths do not get corrected by calling grub2-mkconfig, so this
change performs a sed on the paths to fix them for a root partition
/boot.
[1] I327f5e7a95e47905c01138c8c4483f3f03e8efff
Change-Id: I37a1d310e1854f4a49725e355d484e456ea4fc7a
The check removed here came in with
I4481b43e4a8fe4144be9c7eb9d9c618bbb2df21e a long time ago. At that
time we were not building EFI images, and were building i386 images;
both of which are now untrue.
We can simplify this now by merging it into the gpt/mbr path. If we
are in there we know that we should set --target=i386-pc for BIOS
boot. For sanity check that we are x86 in this path -- PPC is handled
separately (although it's probably bit-rotted) and ARM64 is EFI.
Change-Id: Ie9839c9adc642b0dd688bced3faa46e9314e9799
Co-Authored-By: Clark Boylan <clark.boylan@gmail.com>
OpenDev relies on the epel role to configure the epel repository for our
image builds. Specifically we need epel to pull in haveged. Update the
epel role to recognize rocky and configure it properly.
Change-Id: I968d4702ef39590e972b782a09e18a5db40703ad
This fixes a regression introduced by
Ia99687815667c3cf5e82cf21d841d3b1008b8fa9
It turns out that [[ -d /usr/lib/grub/*-efi ]] is not a good check,
because [[ doesn't split that and try to glob match ( [ would ). This
has resulted in us triggering this path on ARM64.
This is an x86-64 only check, because on other platforms we either
don't support EFI or are EFI only. Restrict this check to get arm64
working again.
Change-Id: I6a75f8504826bcb0ac122d53dfb9faff975077f4
Gentoo updated the layout and files for vaidating stages
At least we can validate cryptographically and infer valid checksum now.
https://www.gentoo.org/news/2022/02/17/changed-signatures.html
Change-Id: I708b44419ae53dec2c19a2210ef427dcd2eb6002
Signed-off-by: Matthew Thode <mthode@mthode.org>
The fedora and fedora-minimal elements currently test Fedora <=34. The
opendev ci mirrors no longer mirror Fedora 34 which means we cannot rely
on those mirrors for these tests. Remove the mirror configuration when
testing these older Fedora builds.
Change-Id: I817b412b7f06523df635e8b16111bc1081b40f66