# SIG/HPC Meeting 2024-03-07 ## Attendees * Forrest Burt * Brian Phan * Sherif Nagy * Enrico Billi * Neil Hanlon * Jeremy Siadal * Chris Stackpole ## Old Business * Intel Driver - * Sherif is working on this, has a prototype, needs DKMS * Used `make spec` script in the branch to create spec, and import from there * We think that upstream should adopt a different format/packaging methodology * Perhaps [packit](https://packit.dev) could be helpful? * What branch/version to use? * rhel-specific branches say not to use them; use the 'backports' branches instead * sherif appears to be in the right place * Next steps: * Neil to bring dkms from epel into projects * Sherif to upload to public location for review and testing * Jeremy to work on testing with some latest hardware * AI SIG * where will userspace tools live? HPC? AI? Both? * Neil: it should be reasonable for us to have the ability to easily release a package in multiple SIGs * NVidia GPU driver Testing - * Did not get time to review [Chris's work](https://github.com/mghpcsim/gpu-testing/tree/master) - will try to review this cycle * Kernel Cnode / MoS * re-actioning - Jeremy to work on once he has some time ## New Business * Testing Warewulf - Brian * Current plan: put the tests upstream into Warewulf repo, Testing team can pull from / engage with upstream * What precisely are we going to test? * Functional/E2E tests -- provision a small cluster, etc (see last week's [discussions](https://sig-hpc.rocky.page/events/meeting-notes/2024-02-22/#discussions)) * Future work can include e.g. slurm * Chris to check on status of slurm * Packages to bring in * [List](https://sig-hpc.rocky.page/packages/) on the wiki; needs updating (along with the rest of the wiki) * if anyone wants to bring something in, has questions, etc. Please ask/get in touch! * Neil to update the wiki ## Open Floor * Vulnerability in [lustre](http://lists.lustre.org/pipermail/lustre-announce-lustre.org/2024/000270.html) - related to user namespaces * Sherif was working on lustre-server, but it's a beast * DDN already builds RPMS, but... is it worth it to rebuild vs just use upstream? * Sherif: thinks it makes sense to rebuild against our specific user/kernel space * there are lustre-server for 8, but not 9, it appears.. why? * documentation supports this but again.. why? * Sherif to look into why lustre-server exists for 8 but not 9 * Next meeting in two weeks on Thursday, March 1 ## Action Items * [ ] Chris to check on status of slurm * [ ] Neil to update the wiki * [ ] Sherif to look into why lustre-server exists for 8 but not 9 * [ ] Neil to bring dkms from epel into projects * [ ] Sherif to upload to public location for review and testing * [ ] Jeremy to work on testing with some latest hardware