add meeting minutes for 2024-03-07

Neil Hanlon 2024-03-07 16:36:13 -05:00
parent b3ef86a981
commit bfd3959b08
Signed by: neil
GPG Key ID: 705BC21EC3C70F34

@ -0,0 +1,66 @@
# SIG/HPC Meeting 2024-03-07
## Attendees
* Forrest Burt
* Brian Phan
* Sherif Nagy
* Enrico Billi
* Neil Hanlon
* Jeremy Siadal
* Chris Stackpole
## Old Business
* Intel Driver
    * Sherif is working on this and has a prototype; it needs DKMS
    * Used the `make spec` script in the branch to create the spec, and imported from there
    * We think that upstream should adopt a different format/packaging methodology
        * Perhaps [packit](https://packit.dev) could be helpful?
    * What branch/version to use?
        * The RHEL-specific branches say not to use them; use the 'backports' branches instead
        * Sherif appears to be in the right place
    * Next steps:
        * Neil to bring DKMS from EPEL into projects
        * Sherif to upload the prototype to a public location for review and testing
        * Jeremy to work on testing with some of the latest hardware
* AI SIG
    * Where will userspace tools live? HPC? AI? Both?
        * Neil: it should be reasonable for us to be able to easily release a package in multiple SIGs
* NVIDIA GPU driver testing
    * Did not get time to review [Chris's work](https://github.com/mghpcsim/gpu-testing/tree/master); will try to review this cycle
* Kernel Cnode / MoS
    * Re-actioning: Jeremy to work on this once he has some time
## New Business
* Testing Warewulf - Brian
    * Current plan: put the tests upstream in the Warewulf repo; the Testing team can pull from / engage with upstream
    * What precisely are we going to test?
        * Functional/E2E tests: provision a small cluster, etc. (see the previous meeting's [discussions](https://sig-hpc.rocky.page/events/meeting-notes/2024-02-22/#discussions))
        * Future work can include e.g. Slurm
            * Chris to check on status of Slurm
* Packages to bring in
    * [List](https://sig-hpc.rocky.page/packages/) on the wiki; needs updating (along with the rest of the wiki)
    * If anyone wants to bring something in, has questions, etc., please ask/get in touch!
    * Neil to update the wiki
## Open Floor
* Vulnerability in [Lustre](http://lists.lustre.org/pipermail/lustre-announce-lustre.org/2024/000270.html), related to user namespaces
    * Sherif was working on lustre-server, but it's a beast
    * DDN already builds RPMs, but is it worth rebuilding vs. just using upstream?
        * Sherif: thinks it makes sense to rebuild against our specific user/kernel space
    * lustre-server packages appear to exist for 8 but not for 9. Why?
        * The documentation supports this, but again, why?
        * Sherif to look into why lustre-server exists for 8 but not 9
* Next meeting in two weeks on Thursday, March 21
## Action Items
* [ ] Chris to check on status of Slurm
* [ ] Neil to update the wiki
* [ ] Sherif to look into why lustre-server exists for 8 but not 9
* [ ] Neil to bring DKMS from EPEL into projects
* [ ] Sherif to upload the Intel driver prototype to a public location for review and testing
* [ ] Jeremy to work on testing the Intel driver with some of the latest hardware