generated from sig_core/wiki-template
Merge pull request 'meeting notes 2023-12-14' (#22) from meeting/2023-12-14 into main
All checks were successful
mkdocs build / build (push) Successful in 26s
All checks were successful
mkdocs build / build (push) Successful in 26s
Reviewed-on: #22
This commit is contained in:
commit
7656c4dc26
1 changed files with 161 additions and 0 deletions
161
docs/events/meeting-notes/2023-12-14.md
Normal file
161
docs/events/meeting-notes/2023-12-14.md
Normal file
|
@ -0,0 +1,161 @@
|
|||
# SIG/HPC meeting 2023-12-14
|
||||
|
||||
## Attendees:
|
||||
* Sherif Nagy
|
||||
* Neil Hanlon
|
||||
* Matt Bidwell
|
||||
* Rich Adams
|
||||
* Chris Simmons
|
||||
* Jeremy Siadal
|
||||
|
||||
## Follow ups
|
||||
|
||||
* No movement on wiki yet, maybe over break
|
||||
* Cnode Kernel - no movement
|
||||
* Compare spec files for Warewulf vs OpenHPC - Done!
|
||||
* Thank you Sherif
|
||||
* Building warewulf4 for rocky 8 and rocky 9
|
||||
* Can we keep this name w/ openhpc?
|
||||
* Chris - next release will also rename warewulf to warewulf3 to distinguish
|
||||
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||
* Requested - No update
|
||||
* Investigate what resources are required for testing - Neil
|
||||
* Not done yet
|
||||
* Create POC script to create tickets when new slurm is available - Neil
|
||||
* No movement
|
||||
* Sherif waiting to hear from Jeremy about Intel GPU drivers
|
||||
* Should have heard from them. Jeremy will follow up with them to see what happened
|
||||
|
||||
## Discussions
|
||||
|
||||
* last meeting of 2023; skip Dec 28th. next meeting will be Jan 11 -- needs announcing
|
||||
* Happy holidays!
|
||||
* slurm naming - slurm22 / slurm23 / slurm24
|
||||
* slurm24 is coming out soon
|
||||
* plan to support whatever schedmd is supporting -- two most recent releases
|
||||
* Testing resources for SIG/HPC
|
||||
* NVidia V100s - OK?
|
||||
* Cannot test MIG (multi-instance GPU) with that device
|
||||
* Sherif can take some of these after they decomm their current HPC, but not sure on timeframe
|
||||
* Need a place to host these, maybe RESF can do something
|
||||
* Neil is tracking this, to have better update in January
|
||||
* chris is working on testing for different schedulers
|
||||
* Warewulf -- OpenHPC needs to make naming more consistent
|
||||
* will remove warewulf4 from builds once it's in Rocky and other openhpc distros
|
||||
* Rocky not worrying about v3, openhpc will continue providing that
|
||||
* Slurm / pmix support
|
||||
* on for rocky 9 branch
|
||||
* there is a pmix5, but ... it's broken. Chris is looking at this over holiday break
|
||||
* rocky only has pmix 3.2, so if we need new features we may need to build and release in the SIG
|
||||
* newer versions (4) are backwards compatible, or, are supposed to be
|
||||
|
||||
## Action items
|
||||
|
||||
* Sherif to finish/complete work on the wiki
|
||||
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||
* Requested
|
||||
* Create POC script to create tickets when new slurm is available - Neil
|
||||
* Change warewulf -> warewulf3 in next openhpc release - Chris
|
||||
* Announce meeting cancelations for December - Neil/Sherif
|
||||
* Look into building pmix4 for rocky and building slurm23.11 w/ pmix support - Sherif
|
||||
* Follow up with Intel Driver team - Jeremy
|
||||
|
||||
## Old business
|
||||
|
||||
### 2023-11-30
|
||||
|
||||
* Sherif to finish/complete work on the wiki
|
||||
* Not done
|
||||
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||
* No updates
|
||||
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||
* Requested
|
||||
* Compare spec files for Warewulf vs OpenHPC - Sherif
|
||||
* Not done yet
|
||||
* Investigate what resources are required for testing - Neil
|
||||
* Not done yet
|
||||
* Create POC script to create tickets when new slurm is available - Neil
|
||||
|
||||
### 2023-11-15
|
||||
|
||||
* Sherif to finish/complete work on the wiki
|
||||
* Not done
|
||||
* Sherif to add Jeremy and Chris to gitusers and sig_hpc - Done
|
||||
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||
* No updates
|
||||
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||
* Requested
|
||||
* Compare spec files for Warewulf vs OpenHPC - Sherif
|
||||
* Not done yet
|
||||
* Investigate what resources are required for testing - Neil
|
||||
* Not done yet
|
||||
|
||||
### 2023-11-02
|
||||
* Sherif to work on abit on the wiki - Not done
|
||||
* Sherif to add Jeremy and Chris to the git user groups
|
||||
|
||||
### 2023-10-19:
|
||||
* Sherif to create kernel repo for kernel HPC, kernel-hpc-node, called now kernel-cnode - Done -
|
||||
* Jeermy, to get the ball rolling with intel GPU driver
|
||||
* Stack, Fix the slurm rest daemon and integrated it with openQA
|
||||
|
||||
### 2023-10-05:
|
||||
* None for this meeting, however we should be working on old business action items
|
||||
|
||||
### 2023-09-21:
|
||||
* Sherif: Get the SIG for drivers
|
||||
* Sherif: Check the names of nvidia drivers "open , dkms and closed source"
|
||||
* Chris: Bench mark nvidia open vs closed source
|
||||
|
||||
### 2023-09-07:
|
||||
* Sherif: Reaching out to AI SIG to check on hosting nvida that drivers that CIQ would like to contribute - Done and waiting to hear from them -
|
||||
|
||||
### 2023-08-24:
|
||||
* Sherif: To push the testing repo file to release package
|
||||
* Sherif: testing / merging the_real_swa scripts
|
||||
|
||||
### 2023-08-10:
|
||||
* Sherif: Looking into the openQA testing - Pending
|
||||
|
||||
### 2023-07-27:
|
||||
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||
* Greg: to reach out to openPBS and cloud charly
|
||||
* Sherif: To update slurm23 to latest - Done -
|
||||
|
||||
### 2023-07-13:
|
||||
* Sherif needs to update the wiki - Done
|
||||
* Sherif to look into MPI stack
|
||||
* Chris will send Sherif a link with intro
|
||||
|
||||
### 2023-06-29:
|
||||
* Sherif release slurm23 sources - Done
|
||||
* Stack and Sherif working on the HPC list
|
||||
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||
|
||||
### 2023-06-15:
|
||||
* Sherif to look int openHPC slurm spec file - Pending on Sherif
|
||||
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||
|
||||
### 2023-06-01:
|
||||
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||
* Plan the openHPC demo Chris / Sherif - Done
|
||||
* Finlise the slurm package with naming / configuration - Done
|
||||
|
||||
### 2023-05-18:
|
||||
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||
|
||||
### 2023-05-04
|
||||
* Start building slurm - On going, a bit slowing down with R9.2 and R8.8 releases, however packages are built, some minor configurations needs to be fixed -
|
||||
* Start building apptainer - on hold -
|
||||
* Start building singulartiry - on hold -
|
||||
* Start building warewulf - on hold -
|
||||
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||
|
||||
### 2023-04-20
|
||||
* Reach out to other communities “Greg” - on going -
|
||||
* Reaching out for different sites that uses Rocky for HPC “Stack will ping few of them and others as well -Group effort-”
|
||||
* Reaching out to hardware vendors - nothing done yet -
|
||||
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
Loading…
Reference in a new issue