This is a partial revert of I1af9f5599168aadc1e7fcdfae281935e6211a597.
I believe we worked around this issue with podman and cgroups with
Ie663d01d77e17f560a92887cba1e2c86b421b24d in the nodepool-builder
container. So we can unpin this.
Change-Id: I6a818999006c539e84aae8b59d5055c2f3aa25ca
This provides some test coverage on the new diskimage-builder.
It also makes the logfile argument handling a bit simpler.
Change-Id: Iecba581a00ba26131248566cb3088a1566dde00d
The `diskimage-builder` command provides a YAML-file-based interface
to `disk-image-create` and `ramdisk-image-create`. Every argument to
these scripts has a YAML equivalent. The command has the following
features:
- Environment values can be provided from the calling environment as
well as YAML
- All arguments are validated with jsonschema in the most appropriate
YAML type
- Schema is self-documenting and printed when running with --help
- Multiple YAML files can be specified and each file can have multiple
images defined
- Entries with duplicate image names will be merged into a single
image build, with attributes overwritten, elements appended, and
environment values updated/overwritten. A missing image name implies
the same image name as the previous entry.
- --dry-run and --stop-on-failure flags
A simple YAML definition would resemble:
- imagename: centos-minimal
  checksum: true
  install-type: package
  elements: [centos, vm]
- imagename: ironic-python-agent
  elements:
  - ironic-python-agent-ramdisk
  - extra-hardware
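The merge rules listed above (attributes overwritten, elements appended,
environment values updated, a missing image name inheriting the previous
one) could be sketched roughly as follows. This is an illustration only,
not the code in this change: `merge_image_entries` is a hypothetical
name, the `environment` key is an assumption, and whether appended
elements are de-duplicated is not stated here, so the sketch keeps them
as-is.

```python
def merge_image_entries(entries):
    """Merge image definitions that share an image name.

    Hypothetical helper; assumes each entry is a dict that may carry
    'imagename', 'elements' (list) and 'environment' (dict) keys.
    """
    merged = {}           # imagename -> merged definition, in first-seen order
    previous_name = None
    for entry in entries:
        # A missing image name implies the name of the previous entry.
        name = entry.get("imagename", previous_name)
        if name is None:
            raise ValueError("the first entry must define an imagename")
        previous_name = name
        image = merged.setdefault(
            name, {"imagename": name, "elements": [], "environment": {}})
        for key, value in entry.items():
            if key == "elements":
                image["elements"].extend(value)       # elements are appended
            elif key == "environment":
                image["environment"].update(value)    # values updated/overwritten
            else:
                image[key] = value                    # attributes are overwritten
    return list(merged.values())
```

With this sketch, two entries for `centos-minimal` separated by an
entry for another image still collapse into one build definition.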
The TripleO project has managed image build options with YAML files
and it has proved useful having git history and a diff friendly
format, specifically for the following situations:
- Managing differences between distros (centos, rhel)
- Managing changes in major distro releases (centos-8, centos-9-stream)
- Managing the python2 to python3 transition, within and across major
distro releases
Now that the TripleO toolchain is being retired this tool is being
proposed to be used for the image builds of TripleO's successor, as
well as the rest of the community.
Subsequent commits will add documentation and switch some tests to
using `diskimage-builder`.
Change-Id: I95cba3530d1b1c6c52cf547338762e33738f7225
By default [1] the `aarch64` ARCH value is converted to `arm64`. But
Fedora uses `aarch64` to refer to the architecture.
Convert incoming ARCH values of `arm64` into `aarch64`, as is already
done for `amd64` -> `x86_64`.
[1] 174089a6a5/diskimage_builder/lib/common-defaults (L29-L30)
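The conversion described above amounts to a small alias table. A minimal
sketch, assuming a hypothetical helper name `fedora_arch` (the element
itself is not written in Python, so this only illustrates the mapping):

```python
# Aliases that arrive in ARCH mapped to the names Fedora uses.
ARCH_ALIASES = {
    "amd64": "x86_64",    # conversion that already existed
    "arm64": "aarch64",   # conversion added by this change
}


def fedora_arch(arch):
    """Return the Fedora name for arch, leaving unknown values untouched."""
    return ARCH_ALIASES.get(arch, arch)
```

Values that are already in Fedora's spelling, such as `aarch64`, pass
through unchanged.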
Change-Id: I6d9698e45b1183007bac49544da196ec78a7ac6a
The second release masks the first release, which is probably a
mistake.
Order them from most significant to least;
release > distro > family > default
And fix up the indentation.
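To illustrate why the ordering matters, here is a rough sketch of a
lookup that honours release > distro > family > default. The function
name `resolve_package`, the nested layout of the map and the package
names in the usage note are assumptions for illustration, not taken
from this change.

```python
def resolve_package(pkg_map, name, release=None, distro=None, family=None):
    """Return the mapped package name, trying the most specific section first.

    Assumed layout: {"release": {distro: {release: {...}}},
                     "distro": {distro: {...}},
                     "family": {family: {...}},
                     "default": {...}}
    """
    sections = [
        pkg_map.get("release", {}).get(distro, {}).get(release, {}),
        pkg_map.get("distro", {}).get(distro, {}),
        pkg_map.get("family", {}).get(family, {}),
        pkg_map.get("default", {}),
    ]
    for section in sections:
        if name in section:
            return section[name]
    # Nothing matched: fall back to the name as given.
    return name
```

For example, a map with both a `release` entry for centos 9-stream and
a `family` entry for redhat resolves to the release entry on 9-stream
and to the family entry elsewhere.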
Change-Id: I54a6a49d4fe001b1a16ab38637cb55542ce96cdb
Sometimes umount does not have enough time to finish and fails with
the error 'target is busy'. In some cases this is not an actual error,
and the operation should be retried after a delay.
This change retries the unmount and raises the actual exception only
after several attempts, each separated by a timeout.
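The retry behaviour can be sketched as below. This is not the code from
this change: the function name, the attempt count and the delay are
assumed values, and the `run` and `sleep` parameters exist only so the
sketch can be exercised without a real mount.

```python
import subprocess
import time


def umount_with_retry(mount_point, attempts=5, delay=1.0,
                      run=subprocess.run, sleep=time.sleep):
    """Unmount mount_point, retrying while umount reports 'target is busy'.

    Returns the number of attempts used. Raises RuntimeError for any other
    failure, or if the target is still busy after the last attempt.
    """
    for attempt in range(1, attempts + 1):
        result = run(["umount", mount_point], capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        busy = "target is busy" in result.stderr
        if not busy or attempt == attempts:
            raise RuntimeError(
                "umount %s failed: %s" % (mount_point, result.stderr.strip()))
        # Give whatever still holds the mount some time to let go.
        sleep(delay)
```

Errors other than 'target is busy' are raised immediately, so a genuine
failure is not hidden behind the retries.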
Closes-Bug: #2004492
Change-Id: I069af85b52e20e9fd688f9ae07e66beb2179f3e1
Signed-off-by: Maksim Malchuk <maksim.malchuk@gmail.com>
Red Hat platforms vary in whether they come pre-installed with curl or
curl-minimal. For example, Fedora 37 container images have curl, and
CentOS 9-stream and Rocky images have curl-minimal. If you try to
install curl when curl-minimal is installed, you get an error, and
vice-versa.
Unfortunately package-installs can't really sort this out; we're just
passing a package list to the system package manager. We don't have a
way to say "the curl OR curl-minimal package is fine". As this breaks
builds and is such a common dependency that it's already there, let's
just add a note that curl is required and blank out the package-map.
Change-Id: I9ccebe2dbf3a8682dab60c2070c5f78849e01446
skipsdist now basically means don't install the project at all
(regardless of the usedevelop setting) which creates problems for dib's
entrypoints. Remove skipsdist so that entrypoints can be found. Also, we
remove basepython because this confuses tox v4 on whether or not the
python it wants is present.
Change-Id: I16388a8ad50483228d0b71745f11563f891249c0
The previous commit was tested on 2TB without issue, but testing on a
very small volume (80GB) resulted in the thin pool lvextend failing
for being one extent too large.
This change reduces the pool size by one extent.
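The arithmetic is roughly the following. A minimal sketch, assuming a
hypothetical helper `thin_pool_size` and that the available space is
rounded down to whole extents before one extent is dropped; the 4MiB
extent size in the usage note is LVM's usual default, not a value taken
from this change.

```python
def thin_pool_size(available_bytes, extent_bytes):
    """Return a pool size, in bytes, that is one extent short of the space."""
    whole_extents = available_bytes // extent_bytes
    # Leave one extent unused so the lvextend of the pool cannot overshoot.
    return max(whole_extents - 1, 0) * extent_bytes
```

For an 80GiB volume with 4MiB extents this gives 20479 extents rather
than 20480.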
Change-Id: I7ca002783f8f15946bc84af95eecaa097e70aaf1
Related: rhbz#2149586
An LVM thin pool has an associated metadata volume, and it can be
assumed that the size of this volume on the image is minimized for
distribution.
This change grows the metadata volume by 1GiB, which is recommended[1] as
a reasonable default. This fixes a specific issue with the metadata
volume being exhausted when growing into a 2TB drive.
Other minor changes include:
- Human readable printed values have switched to GiB, MiB, KiB, B
- Growth percentage volumes are adjusted down to not over-provision
the thin volume
[1] https://access.redhat.com/solutions/6318131
Change-Id: I1dd6dd932bb5f5d9adac9b78a026569165bd4ea9
Resolves: rhbz#2149586
If the client has no internet access or is subject to restrictions
such as a firewall or proxy, this step stops the image build with an
error. The client must be able to override the URL so that this step
can pass.
Change-Id: Iafe3283665a437d0a9cf83a93ff66c0613310b69
These must have broken when we switched the base nodes to Jammy.
Update to use compatible versions of distros.
We need to squish another gate-breaking change in here to update the
containerfile "podman build" calls to use "--network host". We added
this with Ia885237406bf4c7b9d49b349f374558ae746401f and the only
external user I can find is kayobe, which is setting this anyway.
I honestly haven't 100% root-caused what changed to require this; the
last time our containerfile jobs ran and worked has unfortunately been
purged so I can't compare versions to try and pinpoint something;
i.e. this may be a podman bug or feature. At first I thought it
related to the networking plugin package from the Depends-On (which is
still useful for the right packages) but that didn't help get the
bridge networking working.
Depends-On: https://review.opendev.org/c/zuul/nodepool/+/867590
Change-Id: I23f091654cb212e8bdd908664b262de9bfe98cef
The problem lies with the 'extract-image' script: it uses lsblk
commands to identify the image's partitions (root/efi/boot,
lines 100-102), but the output is empty inside a container.
lsblk returns empty output for FSTYPE, LABEL, GUID and so on.
The fix is to use blkid.
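As an illustration of reading partition details from blkid instead of
lsblk, the sketch below parses the KEY=value lines printed by
`blkid -o export`. The helper names are hypothetical, the script itself
is not Python, and the sketch assumes values need no unescaping.

```python
import subprocess


def parse_blkid_export(output):
    """Turn KEY=value lines from 'blkid -o export' into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def probe_partition(device, run=subprocess.run):
    """Return the blkid fields (for example TYPE or LABEL) for one device."""
    result = run(["blkid", "-o", "export", device],
                 capture_output=True, text=True, check=True)
    return parse_blkid_export(result.stdout)
```

The `TYPE` field then stands in for what lsblk reports as FSTYPE.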
Closes-Bug: 1974350
Change-Id: I3b460c6dd9caa519c55327c5bd4b7e4585a8bd22
Add a growpart element. It allows specific partitions to be grown
during the deployment, so that fewer post-deploy actions are needed
before the server is ready for use.
Change-Id: I6519fba3e8f1d078b99d3c03f2ac85f7b6e37d8a
A recent change that didn't fail with hard-tabs made me realise we're
not running tox -e pep8 ... which means we're not running dib-lint
which should find this (and other things).
I couldn't pinpoint when this happened; maybe job config was never in
this repo.
Anyway, move the pylint and dib-lint/flake8 testing to the now
standard "linters" and update the linting job to
openstack-tox-linters.
It looks like pylint is very lightly used (came in with
I7e24d8348db3aef79e1395d12692199a1f80161a and we've never expanded any
testing). Leave this alone for now, but probably it is not important
any more.
This revealed some issues; updated flake8
(Iaa19c36f8cab8482a01f764c588375db8e7d8be3) found some spacing issues
with keywords and an update to elrepo to match our standard bash
flags.
Change-Id: I45bf108c467f7c8190ca252e6c48450c2622aaf8
Starting with Fedora 36 the NetworkManager package no longer includes
ifcfg support by default. You need an additional package
"NetworkManager-initscripts-ifcfg-rh" to pull in the compatibility
plugin. Glean's support for Fedora relies on this compatibility system
so we install this package via the simple-init element package deps.
Change-Id: I76ac39b8dedcb1c5bc4595aedc0a732c99c8721e
The utility `passwd` is currently missing from the images built
with the rocky-container image due to its container lineage.
Change-Id: If80c202c8adab6c5b750c54da5784b5afcd6bf19
This change extends the block device lvs attributes to allow creating
a volume which represents a thin pool, and to create volumes which are
allocated from this pool.
Change-Id: Ic58f55c36236cc8c6279fbcb708e27dc2982f2d5
This change enhances the growvols script to support all volumes being
backed by one thin provisioning pool.
If a pool is detected, the following occurs:
- validation to confirm every volume is backed by the pool
- only the pool is extended into the new partition
- volumes are extended by the same amount as the non thin-provisioned
case
This results in no volumes being over-provisioned, so
out-of-space behaviour will be the same as the non thin-provisioned
case.
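The thin pool handling described above could be planned along these
lines. This is a rough sketch under assumptions, not the growvols
implementation: the function name, the shape of the volume records and
the step tuples are all hypothetical.

```python
def plan_growvols(volumes, growth_bytes, pool_name=None):
    """Return ordered (action, target, bytes) steps for growing volumes.

    volumes: list of dicts with a 'name' and, for thin volumes, a 'pool'.
    growth_bytes: dict mapping volume name to the amount to grow by.
    """
    steps = []
    if pool_name is not None:
        # Validation: every volume must be backed by the detected pool.
        unbacked = [v["name"] for v in volumes if v.get("pool") != pool_name]
        if unbacked:
            raise ValueError(
                "volumes not backed by %s: %s" % (pool_name, ", ".join(unbacked)))
        # Only the pool is extended into the new partition. Sizing it to the
        # sum of the volume growth keeps the volumes from being over-provisioned.
        steps.append(("extend_pool", pool_name, sum(growth_bytes.values())))
    for volume in volumes:
        # Volumes grow by the same amount as in the non thin-provisioned case.
        steps.append(
            ("extend_volume", volume["name"], growth_bytes[volume["name"]]))
    return steps
```

Without a pool the plan contains only the per-volume steps, which
matches the non thin-provisioned case.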
This change also switches to using /dev/mapper device mapper paths for
volume block devices, since that is the only path the thin pool is
mapped to.
Change-Id: I96085fc889e72c942cfef7e3acb6f6cd73f606dd
It turns out we do need to create the machine-id for the same reason
as on 8. This was being hidden by the bootloader choosing the root
disk label from the host (see the dependent change).
Change I3b518802d681b888916a5cc6a3dcf7e1b537da1e has modified the
testing to use a different root-disk label, which should help catch
this in the future.
Depends-On: https://review.opendev.org/c/zuul/nodepool/+/853574
Change-Id: I64de66cac25fd2e051780fb4812e075c647eb76e
--root-label was added with I596104d1a63b5dc6549e8460a1ae3da00165ef04
This sets the ROOT_LABEL environment variable.
Over the years how this deploys has become more complex; now this
value gets written into DIB_BLOCK_DEVICE_PARAMS_YAML default values,
which is then loaded into DIB_ROOT_LABEL.
To override this from the environment you need to specify a full
DIB_BLOCK_DEVICE_CONFIG -- we don't have a way to just merge in the
root label setting.
Using the command-line argument is difficult with tools like nodepool
where the command-line is baked into something else. However we
already have methods for overriding environment variables on dib
calls.
Several of the other variables here accept default values from the
environment, so this is not an outlier. Making ROOT_LABEL also do
this allows us to test with non-default root devices in the gate (see
the linked change).
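The intended precedence can be sketched as follows: an explicit
--root-label wins, otherwise ROOT_LABEL from the calling environment is
used. The helper name is hypothetical, the real logic lives in the
shell entry point rather than in Python, and returning None when
neither is set is an assumption of this sketch.

```python
import os


def resolve_root_label(cli_value=None, environ=None):
    """Pick the root label from the command line, else from the environment."""
    if environ is None:
        environ = os.environ
    if cli_value:
        # --root-label was given explicitly, so it takes priority.
        return cli_value
    # Otherwise accept a default from the calling environment, if any.
    return environ.get("ROOT_LABEL") or None
```

A tool such as nodepool, which cannot easily change the command line,
would then only need to export ROOT_LABEL.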
Change-Id: Ia1ef48c24841a86f387ff9603c64fd23d8670193
Needed-By: https://review.opendev.org/c/zuul/nodepool/+/853574
Without this change, the final unmount will time out after the
rollbacks are called when the partitioning fails due to a user error.
dmsetup remove is called both for partition and LVM volume devices.
Change-Id: I99679ea00338d4018a95d4da9b21685161cd5049
This commit corrects the names of the yum.conf files used for
CentOS 9 Stream in the centos element.
Change-Id: I25f0661fa79b7bc8ac1b8e3b2831a413c4161d1d
The opensuse qcow2 image seems to regularly have a mismatch with its
sha256sum file. Possibly because these are being served by different
mirrors and are out of sync with each other? I have been able to
reproduce this locally by downloading each file and comparing the
resulting hash.
Change-Id: Ic849f5b2afa488d9518065084112f8fc6e3083b2
openEuler 20.03-LTS-SP2 was out of date in May 2022. 22.03 LTS
is the newest LTS version. It was released in March 2022 and
will be maintained for 2 years. This patch upgrades the LTS
version. It will be used in CI jobs for Devstack, Kolla-ansible
and so on.
This patch also enables the YUM mirror to speed up the package
download.
Change-Id: Iba38570d96374226b924db3aca305f7571643823
Somewhere between the upstream container
rockylinux/rockylinux:8.6.20220515 and the latest release, systemd
started to be pre-installed in the container.
With <= 20220515 installing the kernel-core package would end up
pulling in systemd. As part of the systemd package installation, the
/etc/machine-id file is created and populated.
The kernel package post-install steps install the kernel with
/bin/kernel-install; this is responsible for copying the kernel
binaries into /boot. It does this based on the machine-id, and it
seems its failure case with a blank machine-id is to simply skip
copying the kernels into /boot. To compound this problem, it seems
our bootloader installation doesn't notice that we don't have a kernel
installed, so we end up building an unbootable image.
Testing is/was showing us this; but as rocky is non-voting and this
occurred at a random time (rather than in response to a dib change) I
think it slipped by us.
To work around this, create the machine-id early in the container. We
already have paths that remove the machine-id from final images.
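The workaround amounts to making sure a populated machine-id exists
before the kernel package is installed. A minimal sketch, assuming a
hypothetical helper `ensure_machine_id` that writes a random id itself;
the element may well rely on a systemd tool for this instead, so treat
the generation step as a stand-in.

```python
import uuid
from pathlib import Path


def ensure_machine_id(root):
    """Create <root>/etc/machine-id if it is missing or blank.

    Returns the machine id that ends up in the file.
    """
    path = Path(root) / "etc" / "machine-id"
    if path.exists():
        existing = path.read_text().strip()
        if existing:
            return existing
    path.parent.mkdir(parents=True, exist_ok=True)
    # machine-id is 32 lowercase hex characters followed by a newline.
    machine_id = uuid.uuid4().hex
    path.write_text(machine_id + "\n")
    return machine_id
```

An existing non-blank id is left alone, and the paths that strip the
machine-id from final images still apply afterwards.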
Change-Id: I07e8262102d4e76c861667a98ded9fc3f4f4b82d