SIG/HPC Meeting 2024-03-07

Attendees

  • Forrest Burt
  • Brian Phan
  • Sherif Nagy
  • Enrico Billi
  • Neil Hanlon
  • Jeremy Siadal
  • Chris Stackpole

Old Business

  • Intel Driver
    • Sherif is working on this and has a prototype; it needs DKMS
      • Used the make spec script in the branch to create the spec, and imported from there
      • We think that upstream should adopt a different format/packaging methodology
        • Perhaps packit could be helpful?
    • What branch/version to use?
      • The RHEL-specific branches say not to use them; use the 'backports' branches instead
      • Sherif appears to be in the right place
    • Next steps:
      • Neil to bring DKMS from EPEL into projects
      • Sherif to upload the prototype to a public location for review and testing
      • Jeremy to work on testing with some of the latest hardware
    • AI SIG
      • Where will userspace tools live? HPC? AI? Both?
        • Neil: it should be reasonable for us to have the ability to easily release a package in multiple SIGs
  • NVIDIA GPU driver testing
    • Did not get time to review Chris's work; will try to review it this cycle
  • Kernel Cnode / MoS
    • Re-actioning: Jeremy to work on this once he has some time

New Business

  • Testing Warewulf - Brian
    • Current plan: put the tests upstream in the Warewulf repo; the Testing team can pull from and engage with upstream
      • What precisely are we going to test?
        • Functional/E2E tests -- provision a small cluster, etc. (see last week's discussions)
        • Future work can include e.g. Slurm
      • Chris to check on the status of Slurm
  • Packages to bring in
    • List on the wiki; needs updating (along with the rest of the wiki)
    • If anyone wants to bring something in or has questions, please ask or get in touch!
  • Neil to update the wiki

Open Floor

  • Vulnerability in Lustre related to user namespaces
    • Sherif was working on lustre-server, but it's a beast
    • DDN already builds RPMs, but is it worth rebuilding versus just using upstream?
      • Sherif thinks it makes sense to rebuild against our specific user/kernel space
      • There appear to be lustre-server packages for 8 but not for 9. Why?
        • The documentation supports this, but again, why?
        • Sherif to look into why lustre-server exists for 8 but not 9
  • Next meeting in two weeks on Thursday, March 21

Action Items

  • Chris to check on the status of Slurm
  • Neil to update the wiki
  • Sherif to look into why lustre-server exists for 8 but not 9
  • Neil to bring DKMS from EPEL into projects
  • Sherif to upload the Intel driver prototype to a public location for review and testing
  • Jeremy to work on testing the Intel driver with some of the latest hardware