When you're booting a Linux system using dracut, i.e. with any
Red Hat-style distribution, dracut's internal code looks to validate
the kernel hmac signature before proceeding to userspace.
It does this by looking in the /boot/ directory for the kernel
hmac file.
It normally does this on the root filesystem. However, if the
kernel is not on the root filesystem and is instead on a separate
/boot filesystem, this breaks horribly. This is compounded because
DIB enables the operator to restructure the OS image/layout
to fit their needs. To navigate this, as dracut is written, we
need to pass a "boot=" argument to the kernel.
So now we attempt to purge any prior boot entry in the disk image
content, which is good because any filesystem operations invalidate
it. We then attempt to identify the boot filesystem and save a
boot kernel command line parameter so the resulting image can
boot properly if FIPS was enabled in the prior image.
Regex developed with https://sed.js.org utilizing stdin:
VAR="quiet boot=UUID=173c759f-1302-48a3-9d51-a17784c21e03 text"
VAR="quiet boot=PARTUUID=173c759f-1302-48a3-9d51-a17784c21e03"
VAR="quiet boot=PARTUUID=173c759f-1302-48a3-9d51-a17784c21e03 reboot=meow"
VAR="quiet boot=UUID=/dev/sda1 text"
VAR="quiet boot=/dev/sda1"
VAR="quiet boot=/dev/sda1 reboot=meow"
VAR="quiet after_boot=1 reboot=meow boot=/dev/sda1"
VAR="quiet after_boot=1 reboot=meow"
Which resulted in stdout:
VAR="quiet text"
VAR="quiet"
VAR="quiet reboot=meow"
VAR="quiet text"
VAR="quiet"
VAR="quiet reboot=meow"
VAR="quiet after_boot=1 reboot=meow"
VAR="quiet after_boot=1 reboot=meow"
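For illustration, the same behaviour can be sketched in Python. This is
not the sed expression used in the change; the function name
strip_boot_arg and the pattern are assumptions chosen only to reproduce
the inputs and outputs listed above.

```python
import re

# Match "boot=<value>" only at the start of the string or after
# whitespace, so "reboot=" and "after_boot=" are left untouched.
_BOOT_ARG = re.compile(r"(^|\s)boot=\S+")


def strip_boot_arg(cmdline):
    """Return cmdline with any standalone boot=... token removed."""
    return _BOOT_ARG.sub("", cmdline).strip()


print(strip_boot_arg("quiet boot=/dev/sda1 reboot=meow"))
```

Anchoring on whitespace rather than on the literal "boot=" is what keeps
the last two sample inputs unchanged.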
Change-Id: I9034c21e84deda2ba2c0ec0d1d6d6595ed10bed4
--root-label was added with I596104d1a63b5dc6549e8460a1ae3da00165ef04
This sets the ROOT_LABEL environment variable.
Over the years how this deploys has become more complex; now this
value gets written into DIB_BLOCK_DEVICE_PARAMS_YAML default values,
which is then loaded into DIB_ROOT_LABEL.
To override this from the environment you need to specify a full
DIB_BLOCK_DEVICE_CONFIG -- we don't have a way to just merge in the
root label setting.
Using the command-line argument is difficult with tools like nodepool
where the command-line is baked into something else. However we
already have methods for overriding environment variables on dib
calls.
Several of the other variables here accept default values from the
environment, so this is not an outlier. Making ROOT_LABEL also do
this allows us to test with non-default root devices in the gate (see
the linked change).
Change-Id: Ia1ef48c24841a86f387ff9603c64fd23d8670193
Needed-By: https://review.opendev.org/c/zuul/nodepool/+/853574
GRUB_OPTS has never been documented as externally available, and is
not used. Assume its value to simplify the code.
Move the grub version check separately, as we only support grub2.
Remove references to building i386 images. I don't imagine it works in
any way.
Remove ci.md, which is no longer relevant.
Refactor the test for "building BIOS image on EFI system" considerably
after these changes.
Change-Id: Ia99687815667c3cf5e82cf21d841d3b1008b8fa9
As noted inline, this works around potential issues by being a strong
indication you are in a container (e.g. [1]). Since nothing should be
changing anything on the host/build system, this is a generically
safer way to operate.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1975588
Change-Id: Ic6802c4ffc2e825f129af10717860a2d1770fe80
Element block-device-efi-lvm has been added which is like
block-device-efi but defines an LVM logical group in the root
partition. Three logical volumes are defined in that group, mounted to
/, /var, and /home.
This volume layout will not meet all requirements, but this is more of
an example demonstrating the capability to encourage more usage of
this existing feature.
This is based on the overcloud-partition-uefi element in
tripleo-image-elements, and I believe this capability is too useful to
have the only working example buried in a related project repo.
This change also fixes the element string matching in
_arg_defaults_hack, the 'vm' test was also matching against 'lvm' and
'block-device-efi-lvm' elements. Also the 'block-device-' test now
properly tests for this being the prefix of the block-device element.
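The matching problem can be sketched as follows. This is an
illustration only, assuming the element list is a whitespace-separated
string; the helper names are hypothetical and this is not the code in
_arg_defaults_hack.

```python
def has_element(elements, name):
    """True if name is exactly one of the whitespace-separated elements."""
    return name in elements.split()


def has_block_device_element(elements):
    """True if any element name starts with the 'block-device-' prefix."""
    return any(e.startswith("block-device-") for e in elements.split())


elements = "centos block-device-efi-lvm"
# A plain substring test finds 'vm' inside 'lvm', which is the bug.
print("vm" in elements)
# Comparing whole element names does not.
print(has_element(elements, "vm"))
print(has_block_device_element(elements))
```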
This change also makes block-device-efi fsck-passno compliant with the
documentation[1] so that / has value 1 and all other mounts are set to
2.
[1] https://www.man7.org/linux/man-pages/man5/fstab.5.html
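The rule from fstab(5) is small enough to state as code. The function
name below is hypothetical and only illustrates the value chosen for
each mount point.

```python
def fsck_passno(mount_point):
    """Return the sixth fstab field: 1 for the root filesystem, else 2."""
    return 1 if mount_point == "/" else 2


for mount_point in ["/", "/var", "/home"]:
    print(mount_point, fsck_passno(mount_point))
```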
Change-Id: If86a0e49186ce5a65cc0084101d31ce59a97b854
Blueprint: whole-disk-default
When a build fails, we can exit and leave ${PROFILE_DIR} behind. Make
sure this is cleaned up with an exit trap.
While we're adding a function, update the syntax of the others for
consistency.
Change-Id: I14499b5ebaaa30126aaa6b3d1bd86ed64f110fda
At the moment, building the IPA image with openSUSE is broken. It was
failing during the release check for openSUSE because /etc/SuSE-release
is no longer valid and is deprecated for openSUSE; it was replaced by
/etc/os-release as of openSUSE release 15. This PR fixes building the
IPA image with an openSUSE base image. There is another PR open in
ironic-python-agent-builder, which adds all the missing packages, the
setuptools upgrade and the svc mapping needed for the build to succeed:
https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/778726
Bug-Report: https://bugs.launchpad.net/diskimage-builder/+bug/1921510
Change-Id: Id2759be29bfcbf2ecf1ce67e171686924b506b1a
This is never called externally to dib, so doesn't need to be an
entrypoint. Call it from within dib using the running python
executable and from the lib/ directory; this means we do not need to
have the virtualenv activated to run disk-image-create.
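A minimal sketch of the idea, assuming nothing about the real call
site: sys.executable is the interpreter running the current process, so
a child started with it normally resolves the same installed packages
without any virtualenv activation. The helper name is hypothetical.

```python
import subprocess
import sys


def run_with_current_python(args):
    """Run a Python command line with the interpreter running this code."""
    # sys.executable is the path of the current interpreter; using it
    # avoids depending on whatever "python" happens to be first in PATH.
    return subprocess.run(
        [sys.executable] + list(args),
        check=True,
        capture_output=True,
        text=True,
    )


result = run_with_current_python(["-c", "print(1 + 1)"])
print(result.stdout.strip())
```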
Change-Id: Ie9b551824792864402b0c63ccc350dc5c92dcc3f
This is really an internal dib tool. Move it to the lib directory,
and call it with the python we are running under.
This is one less reason to require the virtualenv to be activated when
you run 'disk-image-create'.
Change-Id: Id689683a0b1fdcb446b04ba967284a216133d743
Update an rc-update call to only be made if running openrc instead of
all gentoo profiles (systemd does not have rc-update).
Add python3-pyyaml package mappings.
Update serial console to support multiple arches.
Update open-iscsi and open-isns keywords (looks like upstream merged
some musl fixes).
Update the kernel and initramfs file name globs for the
gentoo-kernel-bin usage.
Change-Id: I259bffed3a3e3f92be2210ead6bdfa383917d457
Signed-off-by: Matthew Thode <mthode@mthode.org>
Now that DIB is python3 only we can remove a hack that made sure
scripts outside the chroot ran with the correct version of python.
This is necessary as python3 does not resolve symbolic links to the
binary like python2.x did, which causes element scripts to fail finding
modules when DIB was run from inside a venv.
This patch does the following:
1. Reverts 9c7b8d1714 which was the
workaround for mixed python2/3 environments.
2. Updates the scripts to use "python3" instead of "python".
Change-Id: If2402bb02fc8a4778fa9434fa167ea1fafd87c28
When I tried to build a CentOS 8 image for AArch64 I got an error saying
that MBR is not supported. So make sure that it will not be used by default.
Change-Id: Ib67ab7f808d727c3c61932c540d398dbe723972f
Some phases of diskimage-builder run outside the chroot environment,
such as the extra-data.d scripts, and don't have access to dib-python.
This means these scripts may choose the wrong python version by using
"#!/usr/bin/env python" to execute. The svc-map element is an example.
This patch creates a temporary directory and symbolic link for the
correct version of python, then manipulates the environment PATH
to preference the symbolic link "python" command.
This will allow elements with these scripts to work correctly with
the version of python diskimage-builder is running under.
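The approach can be sketched in Python as below. This is an
illustration under assumptions, not the shell code in the patch: the
helper name and the temporary directory prefix are made up.

```python
import os
import shutil
import sys
import tempfile


def make_python_shim(python_exe=sys.executable):
    """Create a temp dir holding a 'python' symlink; return (dir, PATH)."""
    shim_dir = tempfile.mkdtemp(prefix="python-shim-")
    os.symlink(python_exe, os.path.join(shim_dir, "python"))
    # Put the shim directory first so "env python" resolves to the link.
    new_path = shim_dir + os.pathsep + os.environ.get("PATH", "")
    return shim_dir, new_path


shim_dir, new_path = make_python_shim()
print(shutil.which("python", path=new_path))
shutil.rmtree(shim_dir)
```

Scripts started with this PATH in their environment pick up the
symlink before any system-wide "python".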
Change-Id: I289d621e1bfbba0eb174dff977d1a5c92c04e4fa
Co-Authored-By: Ian Wienand <iwienand@redhat.com>
As described inline, Bionic hosts will build invalid Trusty images.
Hack around this by disabling metadata_csum in the ext4 mkfs.
Change-Id: Ibd67d58ca830a9e60605d0700ee2b17906c804e6
After the introduction of 'Add output for mis-configured element
scripts' we started seeing CI failures in tripleo where
instack-undercloud is being used (rocky/queens):
/usr/lib/python2.7/site-packages/diskimage_builder/lib/dib-run-parts: line 108: DIB_DEBUG_TRACE: unbound variable
INFO: 2019-12-02 16:24:33,423 -- ############### End stdout/stderr logging ###############
ERROR: 2019-12-02 16:24:33,423 -- Hook FAILED.
Let's make sure that by default the env variable is set
to 0.
Change-Id: I38c76c0edee436f1e7dd0c9a868cea1e6ee3271d
Closes-Bug: #1854904
When running under nodepool in a foreground, non-daemonized situation
without a tty (i.e. within a container) we're seeing this "wait" hang
indefinitely.
It is probably related to "outfilter.py" and output file descriptors,
although TBH we haven't completely root-caused it. I won't claim this
is a great solution, but it should hopefully let the dib process
finish and just die, where outfilter will disappear.
Change-Id: If78da54df3d4c240fee16aee4413ec554b37c1d6
I commonly get asked for help when people are attempting to create
local image elements and they cannot get them to work.
diskimage-builder silently ignores element scripts that it doesn't
find to its liking, such as non-executable files or files with
extensions (.sh is a common mistake).
This patch extends the '-x' tracing flag down to dib-run-parts and
will cause it to print out helpful messages when these files would
otherwise be silently ignored.
Examples:
Ignoring non-executable files: 10-do-not-run-me
Ignoring non-conforming filenames: 10-I-can-run.sh
I am not enabling these by default as they can create extra noise
and require additional filesystem IO to produce.
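A rough Python sketch of the classification follows. The filename rule
used here (letters, digits, underscore and hyphen only) is an
assumption for illustration, as are the function and variable names;
dib-run-parts itself is a shell script.

```python
import os
import re

# Assumed naming rule: no dots, so "10-I-can-run.sh" does not conform.
ALLOWED_NAME = re.compile(r"^[0-9A-Za-z_-]+$")


def classify_scripts(directory):
    """Split files into (runnable, non_executable, non_conforming) names."""
    runnable, non_executable, non_conforming = [], [], []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        if not ALLOWED_NAME.match(name):
            non_conforming.append(name)
        elif not os.access(path, os.X_OK):
            non_executable.append(name)
        else:
            runnable.append(name)
    return runnable, non_executable, non_conforming
```

The two ignored lists correspond to the two example messages above.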
Change-Id: Ic804efca3015c199440b4b10da951d71a815c64f
This reverts commit a3e9e7f89e.
We still have some issues with vhd creation on RAX.
In short, it appears that images fail to resize unless they have a
specific "creator" field. Revert this while we consider the options.
[1] https://bugs.launchpad.net/nova/+bug/862653
Change-Id: I2b6a3bfbfe28432fbb6a2ce4a0211939d224b8d5
The vhdutil utility is completely dead; the whole subsystem it relies
on was removed with [1] so it's not even vaguely possible to keep it
up-to-date.
I took the .raw images on a nb and used the qemu-img there (so Xenial)
and generated some VPC images; uploaded them to rackspace and they all
seemed to boot fine. If there was a problem, maybe it's been fixed on
either the qemu or RAX side in the previous few years.
Thus switch to qemu-img to generate the vhd images too.
[1] https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=5c883cf036cf5ab8b1b79390549e2475f7a568dd
Change-Id: I3099d2ebb958370fcec623087a093b2c8dbdc6c4
Add a new environment variable $DIB_GZIP_BIN allowing builders to
specify a different gzip (such as pigz) to be used when compressing
tgz images.
Change-Id: Ifb617568140a149e2fda241e07ff8a59429e6697
As noted in the change, 7fd52ba841
increased the size of the EFI partition considerably. This has meant
that our padding upwards of the disk size is insufficient and EFI
builds (arm64 in particular) are failing due to out-of-disk errors
during final image operations like installing kernels.
Similar to the discussion we had in
I65fa13a088eecdfe61636678578577ea2cfb3c0c, this feels a bit ugly
because we're mixing logic here with sizes specified in block-device
config files. But it boils down to the same problem; we are
calculating the disk size here and passing it to the block-layer, so
unless we want to make large changes to the status quo about where
these sizes are calculated, small adjustments here are the most KISS
solution.
Thus we check if we have selected the EFI bootloader element, and thus
assume there will be a large system EFI partition and expand the disk
size accordingly.
Change-Id: Ifa05366c2f2b95259f3312e4dde8c85347075ba1
python 3.6 warns about regexes like:
DeprecationWarning: invalid escape sequence \+
I noticed this while debugging a trove job and it really led me down
the wrong path. Fix this by making it a raw string.
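As a small illustration (the pattern here is an example, not the regex
touched by this change), a raw string passes the backslash through to
the regex engine unchanged:

```python
import re

# In a normal string literal "\+" is an invalid escape sequence; with
# the r prefix the two characters reach the regex engine as written.
PLUS = re.compile(r"\+")

print(PLUS.split("1+2+3"))
```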
Change-Id: I58ee1a49d62316c6c3f0588832c97f659f7e460b
This patch removes the check and default for rhel 8 requiring
xfs filesystem as rhel 8 images can successfully be built with
ext4 filesystems.
Change-Id: I1a6bfa26324fd43ae0c77c2c977dda0dd56e26e5
Make a version-less RHEL element to handle both '7' and '8' DIB_RELEASE.
The element usage should align with other elements which operate in the
same way such as the Fedora element.
Additionally, this patch adds support for RHEL8 that operates with
Python 3.
As of now, users of diskimage-builder will still be able to use the
'rhel7' element, or migrate to 'rhel' and specify their respective
DIB_RELEASE value.
* Mount the xfs file-system for extraction as read-only. This is vaguely
based on the explanation in [1] and the fact we only read the image
data into a tar, so we can ignore this:
XFS (dm-1): Superblock has unknown read-only compatible features (0x4) enabled.
* Use the Red Hat system python as the dib-python version. dib was
ahead of its time making an abstracted python interpreter for
system work ;) The system python should work for running the various
dib element scripts.
[1] https://unix.stackexchange.com/questions/247550/unmountable-xfs-filesystem
Redhat-Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1700253
Co-Authored-By: Ian Wienand <iwienand@redhat.com>
Change-Id: I90540675c70bb475d9db2ae24f81c648a31f3f95
I want to use the new --image-extra-size flag[1] but my use-case
calls for megabyte granularity of this value. Rather than adding
60% to an 800MB image, maybe I only want to add 100 or 200MB, etc.
[1] https://review.opendev.org/#/c/655127/
Change-Id: I8fb9685d60ebb1260d5efcf03c5c23c561c24384
Currently diskimage-builder supports two ways to specify the image
size. One is defining a fixed image size using DIB_IMAGE_SIZE, the
other one is auto-detection while adding a security margin of 60% as
free space. This means when building larger images (e.g. >100GB) with
unknown size upfront we end up with much wasted space, IO and network
traffic when uploading the images to several cloud providers. This can
be optimized by adding a third way by defining DIB_IMAGE_EXTRA_SIZE to
specify the free space in GB. This makes it possible to easily build
images of varying sizes while still minimizing the overhead by keeping
the free space constant to e.g. 1GB.
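The difference between the two automatic modes can be sketched as
below. The function and parameter names are hypothetical, and sizes are
in whatever single unit the caller uses; only the 60% margin and the
fixed extra amount come from the description above.

```python
def image_size(content_size, extra_size=None):
    """Return the image size for a given content size.

    Without extra_size a 60% margin is added; with it, a fixed amount.
    """
    if extra_size is None:
        # Integer arithmetic for "content * 1.6", rounded down.
        return content_size * 16 // 10
    return content_size + extra_size


print(image_size(100))
print(image_size(100, 1))
```

For 100 units of content the margin mode reserves 60 units of free
space, while a fixed extra size of 1 reserves exactly 1.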
Change-Id: I114c739d11d0cfe3b8d8abc6df5ff989edfb67f2
In many cases, the statically sized 64MB journal is far below the
e2fsprogs default calculation[0] which calls for a 64MB journal only
on filesystems smaller than 16GB. On bare metal in particular, the
correct default journal size will often be in the 512MB-1GB range.
Since we cannot know what the target system is, this should be a
tunable parameter that the user can set depending on the intended
image usage.
Add a DIB_JOURNAL_SIZE envvar and --mkfs-journal-size parameter
to the image creation so users can override the default journal
size.
[0] https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/lib/ext2fs/mkjournal.c#n333
Change-Id: I65fa13a088eecdfe61636678578577ea2cfb3c0c
This is only one line, but it takes a lot to untangle ... basically
the current "correct" path is:
---
mk_build_dir()
-> sets trap trap_cleanup EXIT
... stuff ..
mount_proc_dev_sys
-> mounts $TMP_MOUNT_PATH/<proc,dev,sys>
pre-finalise.d
finalise.d
unmount_image $TMP_BUILD_DIR/mnt # nb == $TMP_MOUNT_PATH
-> unmount_dir()
-> recursive unmount everything inside TMP_MOUNT_PATH
TMP_IMAGE_PATH=$(dib-block-device getval image-path)
export TMP_IMAGE_PATH
dib-block-device umount
dib-block-device cleanup
... actually cleanup directories ...
---
Our current failure exit trap does:
---
dib-block-device umount
unmount_image
...
---
Note this is the *opposite* of what is done in the correct exit path.
In the failure case, if a script fails in the finalise stages it leads
to /proc, /sys, /dev etc. still being mounted inside the image; the
"dib-block-device umount" call doesn't know anything about these
mounts and tries to unmount the parent directory, and we get a hard
failure with a busy mount, and all the mounts are subsequently leaked.
Note that "unmount_dir", which is ultimately called by
"unmount_image", already knows to skip those mounts that
"dib-block-device umount" manages (this is the DIB_MOUNTPOINTS list).
This is further evidence it should be called *before* the
dib-block-device umount.
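The ordering problem can be shown with a toy model. Everything here is
hypothetical (class, method names and paths) and it ignores the
DIB_MOUNTPOINTS skip list; it only demonstrates that a parent mount
cannot be removed while mounts still exist below it.

```python
class BusyError(Exception):
    """Raised when a mount still has other mounts below it."""


class MountTable:
    """Toy mount table: parents are busy while children are mounted."""

    def __init__(self):
        self.mounts = []

    def mount(self, path):
        self.mounts.append(path)

    def umount(self, path):
        below = [m for m in self.mounts if m.startswith(path + "/")]
        if below:
            raise BusyError(f"{path} is busy: {below}")
        self.mounts.remove(path)

    def umount_below(self, path):
        """Unmount everything inside path, deepest paths first."""
        for m in sorted(self.mounts, key=len, reverse=True):
            if m.startswith(path + "/"):
                self.mounts.remove(m)


table = MountTable()
for p in ["/build/mnt", "/build/mnt/proc", "/build/mnt/sys", "/build/mnt/dev"]:
    table.mount(p)

try:
    table.umount("/build/mnt")      # failure-trap order: parent first
except BusyError as exc:
    print("failed:", exc)

table.umount_below("/build/mnt")    # correct order: nested mounts first
table.umount("/build/mnt")
print(table.mounts)
```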
Change-Id: Ibef3ce9d1167b9c4ff3d5717b113cd3ed374f5e3
The path $TMP_BUILD_DIR/mnt becomes the / inside the chroot during
the chroot phases of diskimage-builder. Previously this path was being
created using the account running diskimage-builder. This account may
not be valid inside the chroot. This causes path validation, when running
on an Ubuntu bionic host, to fail.
This patch chown's the $TMP_BUILD_DIR/mnt to root.root to make sure
that / is owned by a valid account inside the chroot.
Change-Id: Ifedc136baa67c7952942aed2c8cb1041902fef91
Closes-Bug: 1811113
It looks like we dropped running these probably when we moved the
elements around. For testtools to find the test scripts we need to
add the __init__.py files to make the directories look like modules.
Also prevent copying any .pyc or cache files in as hooks.
Change-Id: I66d5f6ee62cc4d9ee14c64e819b4db57d035d09f
I'm not really sure why I originally had --logfile also log to stdout
in I202e1cb200bde17f6d7770cf1e2710bbf4cca64c, but it seems
counter-intuitive (indeed, I just tripped myself up thinking that in a
devstack job "--logfile" would put the logs into a separate file and
avoid the stdout logging, and I wrote it!).
Make it so specifying a --logfile puts dib into quiet mode for stdout.
Explicitly overriding DIB_QUIET will allow both if someone wants that.
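The resulting decision can be sketched as follows. The function name
and the assumption that DIB_QUIET uses "1" to mean quiet are
illustrative only.

```python
def stdout_quiet(logfile, env):
    """Decide whether stdout logging should be suppressed.

    An explicit DIB_QUIET in env wins; otherwise a logfile implies quiet.
    """
    if "DIB_QUIET" in env:
        return env["DIB_QUIET"] == "1"
    return logfile is not None


print(stdout_quiet("build.log", {}))
print(stdout_quiet("build.log", {"DIB_QUIET": "0"}))
```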
Change-Id: I3279c9253eee1c9db69c958b87a0ce73efc0be9b
While trying to get docker image pre-caching to work we couldn't get a
docker daemon to run within the chrooted environment. However we got
docker running with the help of bwrap outside of the chrooted
environment. The only option so far for this is the block-device.d
phase. But this has the problem that it runs after the image size has
been calculated. This leads to broken builds if the docker images
being pulled are big.
This can be solved by adding a post-root.d phase that runs outside the
chroot but before the image size calculation.
Change-Id: I36c2a81e2d9f5069f18ce5b0d52c5f1c7212c3ae
This is a lot of very low value noise in the logs as these iterate
through all the elements (often doing nothing). Turn it down and add
an echo so we just see what elements it is working on.
Change-Id: I0687de4722766189db9d4a7bd7d3cfb45d387b62
In exploring Gentoo caching, it was realised that we have no way to
bind mount the cache into the finalised image for the finalise.d
phases.
By adding a pre-finalise.d phase that runs outside the chroot, we can
mount outside things into the hierarchy at $TMP_BUILD_DIR/mnt which
are then seen by the in-chroot finalise.d phase.
This is similar to the pre-install phase.
Change-Id: I9d782994843383ddf90f62c40498af9925fd9558
Some minor things after looking at these parts.
The dib-run-parts element doesn't do any of the copying any more, so
these comments are wrong.
The reason for the multiple mounts in the bind mount was non-obvious
to modern eyes (as util-linux has handled this for some time).
Formatting fix for the rst
Change-Id: Idb4c9ff32c49aced2c68a5c905bf7a8b2832a5a2
Redirecting our output through outfilter.py is inherently a bit racy,
since the disk-image-create process will exit, and then you might get
outfilter.py flushing any remaining output as it closes.
On an interactive prompt this might lead to final output overwriting
the prompt, etc. This can be a bit confusing when you start running
things in a loop.
If we save the original fd, then on the exit path close the redirected
fd's and wait a little bit for final output (as a result of the
close), we get a more consistent output.
Change-Id: I8efe57ab421c1941e99bdecab62c6e21a87e4584
Strip everything before "site-packages" in the output filename for the
PS4 prompt. This makes the line in debug logs significantly shorter
as we don't have the full virtualenv path every single time. The
important thing -- the file being called in the lib/ dir, is retained.
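A sketch of the trimming in Python, for illustration only. The real
change is to the shell PS4 prompt; the helper name and the sample path
are made up, and dropping the "site-packages/" component itself is an
assumption.

```python
def short_source(path, marker="site-packages/"):
    """Drop everything up to and including marker, if it is present."""
    _, found, tail = path.rpartition(marker)
    return tail if found else path


print(short_source(
    "/home/user/venv/lib/python3/site-packages/"
    "diskimage_builder/lib/common-functions"
))
```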
Change-Id: I00706b6f6c0425c7795f997c08ceda3374dc84b5
When switching to using log-file capture, we're getting
[gentoo/build-succeeds] outfile.write(ts_line.encode('utf-8'))
[gentoo/build-succeeds] UnicodeEncodeError: 'utf-8' codec can't
encode character '\udcc5' in position 59: surrogates not allowed
Use surrogateescape [1] on the output to avoid this
[1] https://www.python.org/dev/peps/pep-0383/
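The behaviour can be reproduced in a few lines. The sample bytes are
made up; only the error handler name comes from PEP 383.

```python
raw = b"na\xc5ve"  # 0xC5 followed by "v" is not valid UTF-8

# surrogateescape maps the undecodable byte 0xC5 to the lone
# surrogate U+DCC5 instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)

# Encoding with the same handler restores the original byte.
print(text.encode("utf-8", errors="surrogateescape") == raw)
```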
Change-Id: I2c2c537296edfa5a8fe661a41bd5bfb3bfcf57e3