From bfd3959b08c7a65ac76b8ee3cf7229535ce63b81 Mon Sep 17 00:00:00 2001
From: Neil Hanlon
Date: Thu, 7 Mar 2024 16:36:13 -0500
Subject: [PATCH] add meeting minutes for 2024-03-07

---
 docs/events/meeting-notes/2024-03-07.md | 66 +++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
 create mode 100644 docs/events/meeting-notes/2024-03-07.md

diff --git a/docs/events/meeting-notes/2024-03-07.md b/docs/events/meeting-notes/2024-03-07.md
new file mode 100644
index 0000000..e74e0c7
--- /dev/null
+++ b/docs/events/meeting-notes/2024-03-07.md
@@ -0,0 +1,66 @@
+# SIG/HPC Meeting 2024-03-07
+
+## Attendees
+
+* Forrest Burt
+* Brian Phan
+* Sherif Nagy
+* Enrico Billi
+* Neil Hanlon
+* Jeremy Siadal
+* Chris Stackpole
+
+## Old Business
+
+* Intel Driver -
+  * Sherif is working on this; he has a prototype, but it needs DKMS
+  * Used the `make spec` script in the branch to create the spec, and imported from there
+  * We think that upstream should adopt a different format/packaging methodology
+    * Perhaps [packit](https://packit.dev) could be helpful?
+  * What branch/version should we use?
+    * The RHEL-specific branches say not to use them; use the 'backports' branches instead
+    * Sherif appears to be in the right place
+  * Next steps:
+    * Neil to bring `dkms` from EPEL into projects
+    * Sherif to upload to a public location for review and testing
+    * Jeremy to work on testing with the latest hardware
+  * AI SIG
+    * Where will the userspace tools live? HPC? AI? Both?
+    * Neil: it should be reasonable for us to be able to easily release a package in multiple SIGs
+* NVIDIA GPU driver testing -
+  * Did not get time to review [Chris's work](https://github.com/mghpcsim/gpu-testing/tree/master); will try to review it this cycle
+* Kernel Cnode / MoS
+  * Re-actioning: Jeremy to work on this once he has some time
+
+## New Business
+
+* Testing Warewulf - Brian
+  * Current plan: put the tests upstream in the Warewulf repo; the Testing team can pull from and engage with upstream
+  * What precisely are we going to test?
+ * Functional/E2E tests -- provision a small cluster, etc (see last week's [discussions](https://sig-hpc.rocky.page/events/meeting-notes/2024-02-22/#discussions)) + * Future work can include e.g. slurm + * Chris to check on status of slurm +* Packages to bring in + * [List](https://sig-hpc.rocky.page/packages/) on the wiki; needs updating (along with the rest of the wiki) + * if anyone wants to bring something in, has questions, etc. Please ask/get in touch! +* Neil to update the wiki + +## Open Floor + +* Vulnerability in [lustre](http://lists.lustre.org/pipermail/lustre-announce-lustre.org/2024/000270.html) - related to user namespaces + * Sherif was working on lustre-server, but it's a beast + * DDN already builds RPMS, but... is it worth it to rebuild vs just use upstream? + * Sherif: thinks it makes sense to rebuild against our specific user/kernel space + * there are lustre-server for 8, but not 9, it appears.. why? + * documentation supports this but again.. why? + * Sherif to look into why lustre-server exists for 8 but not 9 +* Next meeting in two weeks on Thursday, March 1 + +## Action Items + +* [ ] Chris to check on status of slurm +* [ ] Neil to update the wiki +* [ ] Sherif to look into why lustre-server exists for 8 but not 9 +* [ ] Neil to bring dkms from epel into projects +* [ ] Sherif to upload to public location for review and testing +* [ ] Jeremy to work on testing with some latest hardware