When you're booting a Linux system using dracut, i.e. with any
Red Hat style distribution, dracut's internal code tries to validate
the kernel HMAC signature before proceeding to userspace.
It does this by looking in the /boot directory for the kernel
HMAC file.
It normally does this on the root filesystem. However, if the
kernel is not on the root filesystem and is instead on a separate /boot
filesystem, this breaks horribly. The problem is compounded because
DIB enables the operator to restructure the OS image/layout
to fit their needs. To navigate this, as dracut
is written, we need to pass a "boot=" argument to the kernel.
So now we attempt to purge any prior boot entry in the disk image
content, which is good because any filesystem operations invalidate
it, and then we attempt to identify the boot filesystem, and save a
boot kernel command line parameter so the resulting image can
boot properly if FIPS was enabled in the prior image.
Regex developed with https://sed.js.org utilizing stdin:
VAR="quiet boot=UUID=173c759f-1302-48a3-9d51-a17784c21e03 text"
VAR="quiet boot=PARTUUID=173c759f-1302-48a3-9d51-a17784c21e03"
VAR="quiet boot=PARTUUID=173c759f-1302-48a3-9d51-a17784c21e03 reboot=meow"
VAR="quiet boot=UUID=/dev/sda1 text"
VAR="quiet boot=/dev/sda1"
VAR="quiet boot=/dev/sda1 reboot=meow"
VAR="quiet after_boot=1 reboot=meow boot=/dev/sda1"
VAR="quiet after_boot=1 reboot=meow"
Which resulted in stdout:
VAR="quiet text"
VAR="quiet"
VAR="quiet reboot=meow"
VAR="quiet text"
VAR="quiet"
VAR="quiet reboot=meow"
VAR="quiet after_boot=1 reboot=meow"
VAR="quiet after_boot=1 reboot=meow"
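For illustration, the same stripping behaviour can be sketched in Python. This is not the sed expression used in the change; the function name `strip_boot_arg` and the pattern are assumptions chosen to reproduce the input/output pairs listed above, and it operates on the bare command line rather than the `VAR="..."` wrapper.

```python
import re

# Matches a standalone "boot=<value>" token. The leading (^|\s) keeps
# "reboot=" and "after_boot=" from matching.
_BOOT_ARG = re.compile(r"(^|\s)boot=\S+")


def strip_boot_arg(cmdline: str) -> str:
    """Remove any standalone boot= argument from a kernel command line."""
    stripped = _BOOT_ARG.sub("", cmdline)
    # Collapse any whitespace left behind by the removal.
    return " ".join(stripped.split())


if __name__ == "__main__":
    print(strip_boot_arg("quiet boot=/dev/sda1 reboot=meow"))
```

Anchoring on start-of-string or whitespace is what lets `reboot=meow` and `after_boot=1` survive untouched.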
Change-Id: I9034c21e84deda2ba2c0ec0d1d6d6595ed10bed4
By default [1] the `aarch64` ARCH value is converted to `arm64`. But
Fedora uses `aarch64` to refer to the architecture.
Convert incoming ARCH values of `arm64` into `aarch64` as is already
done for `amd64` -> `x86_64`
[1] 174089a6a5/diskimage_builder/lib/common-defaults (L29-L30)
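As a rough sketch of the conversion described above (the real element is shell; the names `ARCH_ALIASES` and `fedora_arch` are hypothetical), the mapping amounts to a lookup that leaves unknown values alone:

```python
# Hypothetical illustration of the ARCH normalisation for Fedora.
ARCH_ALIASES = {
    "amd64": "x86_64",   # conversion that already existed
    "arm64": "aarch64",  # conversion added by this change
}


def fedora_arch(arch: str) -> str:
    """Return the architecture name Fedora uses for an incoming ARCH value."""
    return ARCH_ALIASES.get(arch, arch)


if __name__ == "__main__":
    for value in ("amd64", "arm64", "aarch64"):
        print(value, "->", fedora_arch(value))
```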
Change-Id: I6d9698e45b1183007bac49544da196ec78a7ac6a
Red Hat platforms vary in whether they come pre-installed with curl or
curl-minimal. For example, Fedora 37 container images have curl, while
CentOS 9-stream and Rocky images have curl-minimal. If you try to
install curl when curl-minimal is installed, you get an error, and
vice-versa.
Unfortunately package-installs can't really sort this out; we're just
passing a package list to the system package manager. We don't have a
way to say "the curl OR curl-minimal package is fine". As this breaks
builds and is such a common dependency that it's already there, let's
just add a note that curl is required and blank out the package-map.
Change-Id: I9ccebe2dbf3a8682dab60c2070c5f78849e01446
skipsdist now basically means don't install the project at all
(regardless of the usedevelop setting) which creates problems for dib's
entrypoints. Remove skipsdist so that entrypoints can be found. Also, we
remove basepython because this confuses tox v4 on whether or not the
python it wants is present.
Change-Id: I16388a8ad50483228d0b71745f11563f891249c0
The previous commit was tested on 2TB without issue, but testing on a
very small volume (80GB) resulted in the thin pool lvextend failing
for being one extent too large.
This change reduces the pool size by one extent.
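The arithmetic can be sketched as follows. This is an illustration only, not the growvols code: the function name is hypothetical, and the 4 MiB extent size is LVM's default physical extent size, assumed here rather than read from the volume group.

```python
# Assumption: LVM's default physical extent size of 4 MiB.
EXTENT_BYTES = 4 * 1024 * 1024


def pool_growth_bytes(free_bytes: int, extent_bytes: int = EXTENT_BYTES) -> int:
    """Round free space down to whole extents, then hold back one extent."""
    whole_extents = free_bytes // extent_bytes
    return max(whole_extents - 1, 0) * extent_bytes


if __name__ == "__main__":
    # Ten extents free: grow the pool by nine and leave one spare.
    print(pool_growth_bytes(10 * EXTENT_BYTES) // EXTENT_BYTES)
```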
Change-Id: I7ca002783f8f15946bc84af95eecaa097e70aaf1
Related: rhbz#2149586
An LVM thin pool has an associated metadata volume, and it can be
assumed that the size of this volume on the image is minimized for
distribution.
This change grows the metadata volume by 1GiB, which is recommended[1] as
a reasonable default. This fixes a specific issue with the metadata
volume being exhausted when growing into a 2TB drive.
Other minor changes include:
- Human readable printed values have switched to GiB, MiB, KiB, B
- Growth percentage volumes are adjusted down to not over-provision
the thin volume
[1] https://access.redhat.com/solutions/6318131
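A minimal sketch of printing sizes in GiB, MiB, KiB and B is shown below. The helper name `human_size` and the choice to use the largest unit that divides the value exactly are assumptions for illustration; the script's actual formatting may differ.

```python
# Binary units, largest first.
UNITS = (("GiB", 1024 ** 3), ("MiB", 1024 ** 2), ("KiB", 1024))


def human_size(num_bytes: int) -> str:
    """Format a byte count with the largest unit that divides it exactly."""
    for suffix, factor in UNITS:
        if num_bytes >= factor and num_bytes % factor == 0:
            return f"{num_bytes // factor}{suffix}"
    return f"{num_bytes}B"


if __name__ == "__main__":
    # The metadata volume growth described above is 1 GiB.
    print(human_size(1024 ** 3))
```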
Change-Id: I1dd6dd932bb5f5d9adac9b78a026569165bd4ea9
Resolves: rhbz#2149586
If the client has no internet access, or is restricted by a firewall, proxy, etc., this step stops the image build with an error. The client must be able to override the URL in order to get past this step.
Change-Id: Iafe3283665a437d0a9cf83a93ff66c0613310b69
These must have broken when we switched the base nodes to Jammy.
Update to use compatible versions of distros.
We need to squish another gate-breaking change in here to update the
containerfile "podman build" calls to use "--network host". We added
this with Ia885237406bf4c7b9d49b349f374558ae746401f and the only
external user I can find is kayobe, which is setting this anyway.
I honestly haven't 100% root-caused what changed to require this; the
last time our containerfile jobs ran and worked has unfortunately been
purged so I can't compare versions to try and pinpoint something;
i.e. this may be a podman bug or feature. At first I thought it
related to the networking plugin package from the Depends-On (which is
still useful for the right packages) but that didn't help get the
bridge networking working.
Depends-On: https://review.opendev.org/c/zuul/nodepool/+/867590
Change-Id: I23f091654cb212e8bdd908664b262de9bfe98cef
The problem lies with the 'extract-image' script, which
uses lsblk commands to identify the image's partitions
(root/efi/boot, lines 100-102); the output
is empty inside a container.
lsblk gives empty output for FSTYPE, LABEL, GUID...
The fix is to use blkid.
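As an illustration of reading partition details from blkid rather than lsblk, the sketch below parses the KEY=value output of `blkid -o export`. It is not the shell code in the element; the function name and the sample text (device names, labels, types) are made up for the example.

```python
from typing import Dict, List


def parse_blkid_export(text: str) -> List[Dict[str, str]]:
    """Parse `blkid -o export` style output into one dict per device."""
    devices: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line separates one device record from the next.
            if current:
                devices.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key] = value
    if current:
        devices.append(current)
    return devices


# Illustrative sample only; real output depends on the image.
SAMPLE = """DEVNAME=/dev/loop0p1
LABEL=img-rootfs
TYPE=xfs

DEVNAME=/dev/loop0p2
LABEL=MKFS_ESP
TYPE=vfat
"""

if __name__ == "__main__":
    for dev in parse_blkid_export(SAMPLE):
        print(dev["DEVNAME"], dev.get("LABEL"), dev.get("TYPE"))
```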
Closes-Bug: 1974350
Change-Id: I3b460c6dd9caa519c55327c5bd4b7e4585a8bd22
Added growpart element. It allows for growing specific partitions
during the deployment, which will result in less post deploy actions
needed for the server to be ready for use.
Change-Id: I6519fba3e8f1d078b99d3c03f2ac85f7b6e37d8a
A recent change that didn't fail with hard-tabs made me realise we're
not running tox -e pep8 ... which means we're not running dib-lint
which should find this (and other things).
I couldn't pinpoint when this happened; maybe job config was never in
this repo.
Anyway, move the pylint and dib-lint/flake8 testing to the now
standard "linters" and update the linting job to
openstack-tox-linters.
It looks like pylint is very lightly used (came in with
I7e24d8348db3aef79e1395d12692199a1f80161a and we've never expanded any
testing). Leave this alone for now, but probably it is not important
any more.
This revealed some issues; updated flake8
(Iaa19c36f8cab8482a01f764c588375db8e7d8be3) found some spacing issues
with keywords and an update to elrepo to match our standard bash
flags.
Change-Id: I45bf108c467f7c8190ca252e6c48450c2622aaf8
Starting with Fedora 36 the NetworkManager package no longer includes
ifcfg support by default. You need an additional package
"NetworkManager-initscripts-ifcfg-rh" to pull in the compatibility
plugin. Glean's support for Fedora relies on this compatibility system
so we install this package via the simple-init element package deps.
Change-Id: I76ac39b8dedcb1c5bc4595aedc0a732c99c8721e
This change extends the block device lvs attributes to allow creating
a volume which represents a thin pool, and to create volumes which are
allocated from this pool.
Change-Id: Ic58f55c36236cc8c6279fbcb708e27dc2982f2d5
This change enhances the growvols script to support all volumes being
backed by one thin provisioning pool.
If a pool is detected, the following occurs:
- validation to confirm every volume is backed by the pool
- only the pool is extended into the new partition
- volumes are extended by the same amount as the non thin-provisioned
case
This results in no volumes being over-provisioned, so
out-of-space behaviour will be the same as the non thin-provisioned
case.
This change also switches to using /dev/mapper device mapper paths for
volume block devices, since that is the only path the thin pool is
mapped to.
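The validation step could look roughly like the sketch below. It assumes each volume is described by a dict with the `lv_name` and `pool_lv` fields that `lvs` can report; the function name and error messages are hypothetical and this is not the growvols implementation.

```python
from typing import Dict, List, Optional


def find_thin_pool(volumes: List[Dict[str, str]]) -> Optional[str]:
    """Return the thin pool name, or None when no volume uses a pool.

    Raises ValueError unless every volume other than the pool itself
    is backed by the same single pool.
    """
    pools = {v["pool_lv"] for v in volumes if v.get("pool_lv")}
    if not pools:
        return None  # Non thin-provisioned case.
    if len(pools) > 1:
        raise ValueError("more than one thin pool: %s" % sorted(pools))
    pool = pools.pop()
    for vol in volumes:
        if vol["lv_name"] == pool:
            continue
        if vol.get("pool_lv") != pool:
            raise ValueError("%s is not backed by %s" % (vol["lv_name"], pool))
    return pool


if __name__ == "__main__":
    lvs = [
        {"lv_name": "lv_thinpool", "pool_lv": ""},
        {"lv_name": "lv_root", "pool_lv": "lv_thinpool"},
        {"lv_name": "lv_var", "pool_lv": "lv_thinpool"},
    ]
    print(find_thin_pool(lvs))
```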
Change-Id: I96085fc889e72c942cfef7e3acb6f6cd73f606dd
It turns out we do need to create the machine-id for the same reason
as on 8. This was being hidden by the bootloader choosing the root
disk label from the host (see the dependent change).
Change I3b518802d681b888916a5cc6a3dcf7e1b537da1e has modified the
testing to use a different root-disk label, which should help catch
this in the future.
Depends-On: https://review.opendev.org/c/zuul/nodepool/+/853574
Change-Id: I64de66cac25fd2e051780fb4812e075c647eb76e
--root-label was added with I596104d1a63b5dc6549e8460a1ae3da00165ef04
This sets the ROOT_LABEL environment variable.
Over the years how this deploys has become more complex; now this
value gets written into DIB_BLOCK_DEVICE_PARAMS_YAML default values,
which is then loaded into DIB_ROOT_LABEL.
To override this from the environment you need to specify a full
DIB_BLOCK_DEVICE_CONFIG -- we don't have a way to just merge in the
root label setting.
Using the command-line argument is difficult with tools like nodepool
where the command-line is baked into something else. However we
already have methods for overriding environment variables on dib
calls.
Several of the other variables here accept default values from the
environment, so this is not an outlier. Making ROOT_LABEL also do
this allows us to test with non-default root devices in the gate (see
the linked change).
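The lookup order this enables can be sketched as below. Everything here is an assumption for illustration: the function name, the precedence of the command-line value over the environment, and the `cloudimg-rootfs` default are not taken from this change.

```python
import os
from typing import Mapping, Optional

# Assumed default label, for illustration only.
DEFAULT_ROOT_LABEL = "cloudimg-rootfs"


def resolve_root_label(cli_value: Optional[str] = None,
                       environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the root label: --root-label, then ROOT_LABEL, then the default."""
    env = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    return env.get("ROOT_LABEL") or DEFAULT_ROOT_LABEL


if __name__ == "__main__":
    print(resolve_root_label(environ={"ROOT_LABEL": "gate-rootfs"}))
```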
Change-Id: Ia1ef48c24841a86f387ff9603c64fd23d8670193
Needed-By: https://review.opendev.org/c/zuul/nodepool/+/853574
Without this change, the final unmount will timeout after the
rollbacks are called when the partitioning fails due to a user error.
dmsetup remove is called both for partition and LVM volume devices.
Change-Id: I99679ea00338d4018a95d4da9b21685161cd5049
This commit corrects the names of the yum.conf files
for CentOS 9 Stream in the centos element.
Change-Id: I25f0661fa79b7bc8ac1b8e3b2831a413c4161d1d
The opensuse qcow2 image seems to regularly have a mismatch with its
sha256sum file. Possibly because these are being served by different
mirrors and are out of sync with each other? I have been able to
reproduce this locally by downloading each file and comparing the resulting
hash.
Change-Id: Ic849f5b2afa488d9518065084112f8fc6e3083b2
openEuler 20.03-LTS-SP2 went out of date in May 2022. 22.03 LTS
is the newest LTS version. It was released in March 2022 and
will be maintained for 2 years. This patch upgrades the LTS
version. It will be used by Devstack, Kolla-ansible and so on
in CI jobs.
This patch also enables the YUM mirror to speed up the package
download.
Change-Id: Iba38570d96374226b924db3aca305f7571643823
Somewhere between the upstream container
rockylinux/rockylinux:8.6.20220515 and the latest release, systemd
started to be pre-installed in the container.
With <= 20220515 installing the kernel-core package would end up
pulling in systemd. As part of the systemd package installation, the
/etc/machine-id file is created and populated.
The kernel package post-install steps install the kernel with
/bin/kernel-install; this is responsible for copying the kernel
binaries into /boot. It does this based on the machine-id, and it
seems its failure case with a blank machine-id is to simply skip
copying the kernels into /boot. To compound this problem, it seems
our bootloader installation doesn't notice that we don't have a kernel
installed, so we end up building an unbootable image.
Testing is/was showing us this; but as rocky is non-voting and this
occurred at a random time (rather than in response to a dib change) I
think it slipped by us.
To work around this, create the machine-id early in the container. We
already have paths that remove the machine-id from final images.
Change-Id: I07e8262102d4e76c861667a98ded9fc3f4f4b82d
I think that generally this is a lot of noise in the logs; the
internals of cache-url are well tested, so we don't need to trace log
by default.
Change-Id: I25b5a1ec0d8f99691b2b4b62b9fdd537e5a773e4
This is a squash of two changes that have unfortunately simultaneously
broken the gate.
The functests are failing with
sha256sum: bionic-server-cloudimg-amd64.squashfs.manifest: No such file or directory
I think what has happened here is that the SHA256 sums file being used
has got a new entry "bionic-server-cloudimg-amd64.squashfs.manifest"
which is showing up in a grep for
"bionic-server-cloudimg-amd64.squashfs". sha256sum then tries to also
check this hash, and has started failing.
To avoid this, add an EOL marker to the grep so it only matches the
exact filename.
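The effect of the end-of-line anchor can be shown with a small Python sketch. The fix itself is a grep pattern change; the function name and the placeholder hashes below are made up for the example.

```python
import re
from typing import List


def matching_sum_lines(sums_text: str, filename: str, exact: bool = True) -> List[str]:
    """Return checksum lines mentioning filename, optionally anchored at EOL."""
    pattern = re.escape(filename) + ("$" if exact else "")
    regex = re.compile(pattern)
    return [line for line in sums_text.splitlines() if regex.search(line)]


# Placeholder hashes, for illustration only.
SUMS = (
    "aaaa *bionic-server-cloudimg-amd64.squashfs\n"
    "bbbb *bionic-server-cloudimg-amd64.squashfs.manifest\n"
)

if __name__ == "__main__":
    name = "bionic-server-cloudimg-amd64.squashfs"
    print(len(matching_sum_lines(SUMS, name, exact=False)))  # both lines match
    print(len(matching_sum_lines(SUMS, name, exact=True)))   # only the image
```

Without the anchor, the `.manifest` entry is passed along for checking even though that file was never downloaded.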
Change I7fb585bc5ccc52803eea107e76dddf5e9fde8646 updated the
containerfile tests to Jammy and it seems that cgroups v2 prevents
podman running inside docker [1]. While we investigate, move this
testing back to focal.
[1] https://github.com/containers/podman/issues/14884
Change-Id: I1af9f5599168aadc1e7fcdfae281935e6211a597