When your booting a Linux system using dracut, i.e. with any
redhat style distribution, dracut's internal code looks to validate
the kernel hmac signature in before proceeding to userspace.
It does this by looking at the /boot/ folder file for the kernel
hmac file.
And it normally does this with the root filesystem. Except if the
kernel is not on the root filesystem and is instead on a /boot
filesystem, this breaks horribly. This is compounded because
DIB enables the operator to restructure the OS image/layout
to fit their needs. In order for this to be navigated, as dracut
is written, we need to pass a "boot=" argument to the kernel.
So now we attempt to purge any prior boot entry in the disk image
content, which is good because any filesystem operations invalidate
it, and then we attempt to identify the boot filesystem, and save a
boot kernel command line parameter so the resulting image can
boot properly if FIPS was enabled in the prior image.
Regex developed with https://sed.js.org utilizing stdin:
VAR="quiet boot=UUID=173c759f-1302-48a3-9d51-a17784c21e03 text"
VAR="quiet boot=PARTUUID=173c759f-1302-48a3-9d51-a17784c21e03"
VAR="quiet boot=PARTUUID=173c759f-1302-48a3-9d51-a17784c21e03 reboot=meow"
VAR="quiet boot=UUID=/dev/sda1 text"
VAR="quiet boot=/dev/sda1"
VAR="quiet boot=/dev/sda1 reboot=meow"
VAR="quiet after_boot=1 reboot=meow boot=/dev/sda1"
VAR="quiet after_boot=1 reboot=meow"
Which resulted in stdout:
VAR="quiet text"
VAR="quiet"
VAR="quiet reboot=meow"
VAR="quiet text"
VAR="quiet"
VAR="quiet reboot=meow"
VAR="quiet after_boot=1 reboot=meow"
VAR="quiet after_boot=1 reboot=meow"
Change-Id: I9034c21e84deda2ba2c0ec0d1d6d6595ed10bed4
The `diskimage-builder` command provides a yaml file based interface
to `disk-image-create` and `ramdisk-image-create`. Every argument to
these scripts has a YAML equivalent. The command has the following
features:
- Environment values can be provided from the calling environment as
well as YAML
- All arguments are validated with jsonschema in the most appropriate
YAML type
- Schema is self-documenting and printed when running with --help
- Multiple YAML files can be specified and each file can have multiple
images defined
- Entries with duplicate image names will be merged into a single
image build, with attributes overwritten, elements appended, and
environment values updated/overwritten. A missing image name implies
the same image name as the previous entry.
- --dry-run and --stop-on-failure flags
A simple YAML defintion would resemble:
- imagename: centos-minimal
checksum: true
install-type: package
elements: [centos, vm]
- imagename: ironic-python-agent
elements:
- ironic-python-agent-ramdisk
- extra-hardware
The TripleO project has managed image build options with YAML files
and it has proved useful having git history and a diff friendly
format, specifically for the following situations:
- Managing differences between distros (centos, rhel)
- Managing changes in major distro releases (centos-8, centos-9-stream)
- Managing the python2 to python3 transition, within and across major
distro releases
Now that the TripleO toolchain is being retired this tool is being
proposed to be used for the image builds of TripleO's successor, as
well as the rest of the community.
Subsequent commits will add documentation and switch some tests to
using `diskimage-builder`.
Change-Id: I95cba3530d1b1c6c52cf547338762e33738f7225
By default [1] the `aarch64` ARCH value is converted to `arm64`. But
Fedora uses `aarch64` to refer to the architecture.
Convert incoming ARCH values of `arm64` into `aarch64` as is already
done for `amd64` -> `x86_64`
[1] 174089a6a5/diskimage_builder/lib/common-defaults (L29-L30)
Change-Id: I6d9698e45b1183007bac49544da196ec78a7ac6a
The second release masks the first release, which is probably a
mistake.
Order them from most significant to least;
release > distro > family > default
And fix up the indentation.
Change-Id: I54a6a49d4fe001b1a16ab38637cb55542ce96cdb
Sometimes umount doesn't have much time to finish and failed with
error 'target is busy', but this is not an actual error in some cases
and the operation should be repeated again with some timeout.
This solves the issue and raise actual exception only after several
tries with timeout.
Closes-Bug: #2004492
Change-Id: I069af85b52e20e9fd688f9ae07e66beb2179f3e1
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
The RedHat platforms vary if they come pre-installed with curl or
curl-minimal. For example, Fedora 37 container images have curl, and
centos 9-stream and Rocky images have curl-minimal. If you try and
install curl when curl-minimal is installed, you get an error, and
vice-versa.
Unfortunately package-installs can't really sort this out; we're just
passing a package list to the system package manager. We don't have a
way to say "the curl OR curl-minimal package is fine". As this breaks
builds and is such a common dependency that it's already there, let's
just add a note that curl is required and blank out the package-map.
Change-Id: I9ccebe2dbf3a8682dab60c2070c5f78849e01446
The previous commit was tested on 2TB without issue, but testing on a
very small volume (80GB) resulted in the thin pool lvextend failing
for being one extent too large.
This change reduces the pool size by one extent.
Change-Id: I7ca002783f8f15946bc84af95eecaa097e70aaf1
Related: rhbz#2149586
An LVM thin pool has an associated metadata volume, and it can be
assumed that the size of this volume on the image is minimized for
distribution.
This change grows the metadata volume by 1GiB, which is recommended[1] as
a reasonable default. This fixes a specific issue with the metadata
volume being exausted when growing into a 2TB drive.
Other minor changes include:
- Human readable printed values have switched to GiB, MiB, KiB, B
- Growth percentage volumes are adjusted down to not over-provision
the thin volume
[1] https://access.redhat.com/solutions/6318131
Change-Id: I1dd6dd932bb5f5d9adac9b78a026569165bd4ea9
Resolves: rhbz#2149586
If client have not internet or have some limitation, such as firewall/proxy/etc. this step will stop build image with error. Client must have possible override of URL for pass this step.
Change-Id: Iafe3283665a437d0a9cf83a93ff66c0613310b69
These must have broken when we switched the base nodes to Jammy.
Update to use compatible versions of distros.
We need to squish another gate-breaking change in here to update the
containerfile "podman build" calls to use "--network host". We added
this with Ia885237406bf4c7b9d49b349f374558ae746401f and the only
external user I can find is kayobe, which is setting this anyway.
I honestly haven't 100% root-caused what changed to require this; the
last time our containerfile jobs ran and worked has unfortunately been
purged so I can't compare versions to try and pinpoint something;
i.e. this may be a podman bug or feature. At first I thought it
related to the networking plugin package from the Depends-On (which is
still useful for the right packages) but that didn't help get the
bridge networking working.
Depends-On: https://review.opendev.org/c/zuul/nodepool/+/867590
Change-Id: I23f091654cb212e8bdd908664b262de9bfe98cef
The problem lays with the 'extract-image' script as
it is using lsblk commands to extract image's partition
(find out root/efi/boot, lines:100-102) but the output
is empty inside a container.
lsblk gives empty output for FSTYPE, LABEL, GUID..
the fix is to use blkid.
Closes-Bug: 1974350
Change-Id: I3b460c6dd9caa519c55327c5bd4b7e4585a8bd22
Added growpart element. It allows for growing specific partitions
during the deployment, which will result in less post deploy actions
needed for the server to be ready for use.
Change-Id: I6519fba3e8f1d078b99d3c03f2ac85f7b6e37d8a
A recent change that didn't fail with hard-tabs made me realise we're
not running tox -e pep8 ... which means we're not running dib-lint
which should find this (and other things).
I couldn't pinpoint when this happened; maybe job config was never in
this repo.
Anyway, move the pylint and dib-lint/flake8 testing to the now
standard "linters" and update the linting job to
openstack-tox-linters.
It looks like pylint is very lightly used (came in with
I7e24d8348db3aef79e1395d12692199a1f80161a and we've never expanded any
testing). Leave this alone for now, but probably it is not important
any more.
This revealed some issues; updated flake8
(Iaa19c36f8cab8482a01f764c588375db8e7d8be3) found some spacing issues
with keywords and an update to elrepo to match our standard bash
flags.
Change-Id: I45bf108c467f7c8190ca252e6c48450c2622aaf8
Starting with Fedora 36 the NetworkManager package no longer includes
ifcfg support by default. You need an additional package
"NetworkManager-initscripts-ifcfg-rh" to pull in the compatibility
plugin. Glean's support for Fedora relies on this compatibility system
so we install this package via the simple-init element package deps.
Change-Id: I76ac39b8dedcb1c5bc4595aedc0a732c99c8721e
The utility `passwd` is currenly missing from the images built
with the rocky-container image due to its container lineage.
Change-Id: If80c202c8adab6c5b750c54da5784b5afcd6bf19
This change extends the block device lvs attributes to allow creating
a volume which represents a thin pool, and to create volumes which are
allocated from this pool.
Change-Id: Ic58f55c36236cc8c6279fbcb708e27dc2982f2d5
This change enhances the growvols script to support all volumes being
backed by one thin provisioning pool.
If a pool is detected, the following occurs:
- validation to confirm every volume is backed by the pool
- only the pool is extended into the new partition
- volumes are extended by the same amount as the non thin-provisioned
case
This results in no volumes being over-provisioned, so
out-of-space behaviour will be the same as the non thin-provisioned
case.
This change also switches to using /dev/mapper device mapper paths for
volume block devices, since that is the only path the thin pool is
mapped to.
Change-Id: I96085fc889e72c942cfef7e3acb6f6cd73f606dd
It turns out we do need to create the machine-id for the same reason
as on 8. This was being hidden by the bootloader choosing the root
disk label from the host (see the dependent change).
Change I3b518802d681b888916a5cc6a3dcf7e1b537da1e has modified the
testing to use a different root-disk label, which should help catch
this in the fututure.
Depends-On: https://review.opendev.org/c/zuul/nodepool/+/853574
Change-Id: I64de66cac25fd2e051780fb4812e075c647eb76e
--root-label was added with I596104d1a63b5dc6549e8460a1ae3da00165ef04
This sets the ROOT_LABEL environment variable.
Over the years how this deploys has become more complex; now this
value gets written into DIB_BLOCK_DEVICE_PARAMS_YAML default values,
which is then loaded into DIB_ROOT_LABEL.
To override this from the environment you need to specify a full
DIB_BLOCK_DEVICE_CONFIG -- we don't have a way to just merge in the
root label setting.
Using the command-line argument is difficult with tools like nodepool
where the command-line is baked into something else. However we
already have methods for overriding environment variables on dib
calls.
Several of the other variables here accept default values from the
environment, so this is not an outlier. Making ROOT_LABEL also do
this allows us to test with non-default root devices in the gate (see
the linked change).
Change-Id: Ia1ef48c24841a86f387ff9603c64fd23d8670193
Needed-By: https://review.opendev.org/c/zuul/nodepool/+/853574
Without this change, the final unmount will timeout after the
rollbacks are called when the partitioning fails due to a user error.
dmsetup remove is called both for partition and LVM volume devices.
Change-Id: I99679ea00338d4018a95d4da9b21685161cd5049
This commit fixes wrong name of yum.conf files
on CentOS 9 Stream in centos element to correct ones.
Change-Id: I25f0661fa79b7bc8ac1b8e3b2831a413c4161d1d
openEuler 20.03-LTS-SP2 was out of date in May 2022. 22.03 LTS
is the newest LTS version. It was release in March 2022 and
will be maintained for 2 years. This patch upgrades the LTS
version. It'll be used in Devstack, Kolla-ansible and so on
in CI jobs.
This patch also enables the YUM mirror to speed up the package
download.
Change-Id: Iba38570d96374226b924db3aca305f7571643823
Somewhere between the upstream container
rockylinux/rockylinux:8.6.20220515 and the latest release, systemd
started to be pre-installed in the container.
With <= 20220515 installing the kernel-core package would end up
pulling in systemd. As part of the systemd package installation, the
/etc/machine-id file is created and populated.
The kernel package post-install steps install the kernel with
/bin/kernel-install; this is responsible for copying the kernel
binaries into /boot. It does this based on the machine-id, and it
seems its failure case with a blank machine-id is to simply skip
copying the kernels into /boot. To compound this problem, it seems
our bootloader installation doesn't notice that we don't have a kernel
installed, so we end up building an unbootable image.
Testing is/was showing us this; but as rocky is non-voting and this
occured at a random time (rather than in response to a dib change) I
think it slipped by us.
To work around this, create the machine-id early in the container. We
already have paths that remove the machine-id from final images.
Change-Id: I07e8262102d4e76c861667a98ded9fc3f4f4b82d
I think that generally this is a lot of noise in the logs, as the
internals of cache-url is well tested, so we don't need to trace log
by default.
Change-Id: I25b5a1ec0d8f99691b2b4b62b9fdd537e5a773e4
This is a squash of two changes that have unfortunately simultaneously
broken the gate.
The functests are failing with
sha256sum: bionic-server-cloudimg-amd64.squashfs.manifest: No such file or directory
I think what has happened here is that the SHA256 sums file being used
has got a new entry "bionic-server-cloudimg-amd64.squashfs.manifest"
which is showing up in a grep for
"bionic-server-cloudimg-amd64.squashfs". sha256 then tries to also
check this hash, and has started failing.
To avoid this, add an EOL marker to the grep so it only matches the
exact filename.
Change I7fb585bc5ccc52803eea107e76dddf5e9fde8646 updated the
containerfile tests to Jammy and it seems that cgroups v2 prevents
podman running inside docker [1]. While we investigate, move this
testing back to focal.
[1] https://github.com/containers/podman/issues/14884
Change-Id: I1af9f5599168aadc1e7fcdfae281935e6211a597
Gentoo can manage python versions itself. Before this commit users were
forced to set python versions themselves. Now they have the option to
set it if they wish.
The workaround needed for git is also no longer needed, so it's been
removed.
Change-Id: I06b259ef73a40df6b8ab92a5b424bffcf4ef764d
Signed-off-by: Matthew Thode <mthode@mthode.org>
The block device lvm lvs `size` attribute was passed directly to
lvcreate, so using units M, G means base 2. All other block device
size values are parsed with accepted conventions of M, B being base 10
and MiB, GiB being base 2.
lvm lvs `size` attributes are now parsed the same as other size
attributes. This improves consistency and makes it practical to
calculate volume sizes to fill the partition size. This means existing
size values will now create slightly smaller volumes. Previous sizes
can be restored by changing the unit to MiB, GiB, or increasing the
value for a base 10 unit.
The impact on this change should be minimal, the only known uses of lvm
volumes (TripleO, and element block-device-efi-lvm) uses extents
percentage instead of size. The smaller sizes can always be increased
after deployment.
Requested sizes will also be rounded down to align with physical
extents (4MiB). Previously specifying a value which did not align on
4MiB would consume an extra extent which could unexpectedly consume
more than the partition size.
Change-Id: Ia109cc5105071d82cc895d8d9cb85bc47da20a7a
All indication in CI is that Centos Stream9's use of dhclient
appears to point to compatability issues when interacting with
dnsmasq. However, this doesn't appear to be the issue with the
internal dhcp client. As such, lets constraint the RH default
so that it no longer applies to Centos 9-stream.
I've also added a documentation entry for DIB_DHCP_CLIENT which
was previously undocumented.
As an aside, I've already reached out to RH's NetworkManager team
regarding this, but root cause is not entirely understood at this
point.
Change-Id: I235f75b385a8b0348c8fe064038c51409f8722c4
Story: 2010109
Task: 45677
Creating a separate /boot partition is desirable in some cases[1].
This change detects if /boot is a partition, and ensures that the
kernel/ramdisk paths are correct in either case. This is applied to
all BLS entries files, whether they were generated by the previous
grub2-mkconfig call or in the source image.
This means the rhel9 specific workaround can be removed since all
paths are now normalised at this stage.
[1] https://review.opendev.org/c/openstack/tripleo-image-elements/+/846807
Change-Id: I62120ec8c65876e451532d2654d37435eb3606a6
Resolves: rhbz#2101514
Currently if no Dockerfile is specified or found, we exit later with
an obscure error. Check this after the element search; if we still
don't have something to build then we can't continue.
Change-Id: Ifb17a0995fab0ccfe7ee08363676c1fa57e37592
'9-stream' was being matched against the regex '9',
causing builds on RHEL9 to try to install C9S RPMs.
We want this the other way so that DIB_RELEASE=9
will not match the regex '9-stream'.
Resolves: rhbz#2097443
Signed-off-by: Lon Hohberger <lhh@redhat.com>
Change-Id: Iefd7c23512c460e33117d12bbc33606134daa9e2
Add a warning in satellite configuration as when no activation_key
is provided and no environment is configured, subscription-manager
might hang as it's prompting the user to provide the missing
parameter.
Change-Id: I9564841ca845eafc2bd39be6b05bef62e8062f28
Due to the referenced inline issue, 9-stream currently fails running
setfiles in a chroot without /proc. Since we want to actually label
/proc, we don't want it mounted. This pulls in the fixed packages to
get things going until the fix is rolled out.
Change-Id: Id41c16130e975779cb70e2ab19807a689450d026
When building an image, say RHEL9, on a host installed with that
same image, you will be blocked from mounting the filesystems to
extract contents, as the host OS kernel will identify the duplicate
UUIDs and error accordingly.
This was previously fixed for the root filesystem, but not the boot
filesystem.
Change-Id: I63a34fba033ed1c459aeb9c201c8821fa38a36e9
the command had one error in it (missing one backslash)
and was rendered wrong, w/o any backslashes at all.
Change-Id: If187f645b818f47d10b602ccee12c29892a8d88d
After some recent reordering[0], the /boot/grub directory isn't created
early enough on Gentoo any more, let us just ensure ourselves that it is
in place when we create the grub config.
[0] I8cb34914bbbfa05521bbb71cc6637368b980358f
Change-Id: I8a84d08c3090e46b00d1d626fb984f66ea33f256
Previously a module version was splitted from the module name:
nvidia, 510.47.03, 5.4.0-109-generic, x86_64: installed
In Jammy it is now a part of the name:
nvidia/510.47.03, 5.15.0-27-generic, x86_64: installed
Assuming the fact that it would be threatted as a path this change
doesn't brake anything which was working before. But at the same
time it allows to pass last step where dkms is requested to build all
modules.
Change-Id: Ic1bb2b45f9db906b64ca03ae5c4e05b2114f2a74
It may happen a base image has an edited version of cloud-init
"cloud.cfg" that prevents the host keys to be generated.
While it didn't represent an issue with older releases of cloud-init,
starting cloud-init-22 this isn't true anymore.
Before that release, an sshd-keygen@.service was present and called by
sshd-keygen.target (which was called by sshd.service), and we ended up
with ssh host keys in any cases - either generated from cloud-init, or
generated by sshd-keygen.service.
But cloud-init-22 introduced an edition to the sshd-keygen.service,
making it check for the presence of cloud-init service, and preventing
this sshd-keygen to kick in this case.
So we'd better ensure cloud-init is able to generate the keys, else
we'll be in a bad state, since it's instructed to remove the ones
present.
Closes-Bug: #1971751
Change-Id: I37b2f3e9d57a86544ef14e74a4a927309c18bbf0
This adds arm64 ubuntu-minimal Jammy functests and x86 ubuntu image
based Jammy functests. To make this happen we have to install
debootstrap from debian unstable on the functest nodes in order to get
access to a debootstrap that knows what jammy is.
As we ramp up Jammy support in our tools having good testing will be
helpful.
Change-Id: I1d1dc752ce176457d0656cbd50e27a2721ca9856
This makes 03-reset-bls-entries consistent with rhel so that the glob
match is *.conf, and a check is added to ensure that a rename is
actually required.
Change-Id: I4adff43cf7d4f31d939e6ddf37ac8d162ccd0db7
This reverts commit 8401290976.
We are reverting this because some users may want to use predictable
device names and may not even use Debian. However, after some
investigation we have found a couple of bugs in dhcp-all-interfaces on
Debuntu distros. The parent change corrects those bugs. Additionally new
Linux kernels emit "move" events to udev when interfaces are renamed to
their predictable name. Support this "move" in the dhcp-all-interfaces
udev rules. Making these changes appaers to produce functional images
for Debian users using predictable device names. If predictable device
names are not desired turning them off is straightforward and release
notes are updated to give users the info they need to do that outside of
this element.
Change-Id: I125f1a0c78a103b51bda961528c3e66c345bf604
Co-Authored-By: Clark Boylan <clark.boylan@gmail.com>
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
There are two issues with dhcp-all-interfaces on debuntu interfaces
addressed here. First is the path to dhclient lease files is
/var/lib/dhcp not /var/lib/dhclient. Second there is a missing newline
in the ENI interface file which causes a parse error.
Change-Id: Ice83e0d49a4234301dc12daf828ba80fef414cdb
I just saw in the trace output of a failure
> grep -o 'CentOS-.[^>]*GenericCloud-.[^>]*.qcow2'
> sort -r
> head -1
sort: fflush failed: 'standard output': Broken pipe
sort: write error
i.e. the "head -1" has exited after reading one line, but "sort -r"
still wants to write and thus has hit a pipe failure, and because we
run with "-o pipefail" this has halted the script.
This seems like it has been there more or less forever, maybe we just
got lucky hitting it now? Anyway, we can work around this by using a
process substitution and passing the output of this into head, this
way we won't hit a pipe failure.
I also updated the fedora path as it does the same thing.
Change-Id: I44d97e5bb31702aacf396e0229329a2ef9c64f2f
We've not really been using the Focal containerfile, as we move
forward jammy is a better choice for keeping stable as we might find
some new users for it.
Also add binutils to bindep for native bullseye builds (see
Icb0e40827c9f8ac583fa143545e6bed9641bf613)
Change-Id: I22ebe2bbccaec34180e58996b21e47bfc4f36055
03-reset-bls-entries was previously a pre-install script to run after
the machine-id was set, but a new kernel may be installed during the
install phase, which will install another bls entry file with a
filename which differs from the machine-id.
This means this package installed bls file won't be updated when
grub2-mkconfig is called, resulting in incorrect kernel args and boot
device in the entry file that will get booted by default.
By fixing the filenames after the new kernel is installed,
grub2-mkconfig will update the bls file that actually gets used on
boot.
Change-Id: I653bef9638e38ded68458fd40d90e30e5206caad
According to the systemd documentation[1], if /etc/machine-id is empty
it will be populated with a unique value, but not in a way which
triggers an actual first boot event (running units with
ConditionFirstBoot=yes set)
This change writes "uninitialized" to /etc/machine-id to ensure that
systemd-firstboot.service actually runs, and other units can use
first-boot-complete.target as a dependency to trigger on first boot.
Since /var/lib/dbus/machine-id is sometimes a symlink to
/etc/machine-id, it is truncated before writing to /etc/machine-id.
On older versions of systemd before first boot semantics were
formalised, any non-uuid value will trigger a new machine-id to be
generated, so "uninitialized" also works.
[1] https://www.freedesktop.org/software/systemd/man/machine-id.html#First%20Boot%20Semantics
Change-Id: I77c35e51a3da2e8a6b5a2c80d033a159b303c9af