# SIG/HPC meeting 2024-01-11

## Attendees:

* Sherif Nagy
* Neil Hanlon
* Matt Bidwell
* Brian Phan
* Forrest Burt

## Follow ups

* Sherif to finish/complete work on the wiki
    * Still working
* Decide what is being put into the cnode kernel and what is being removed - Jeremy
    * Still working
* Request SB Certs for the HPC cnode kernel from Security - Requested
    * Requested
* Create POC script to create tickets when a new slurm release is available - Neil
    * Neil will work on it this month
* Change warewulf -> warewulf3 in the next OpenHPC release - Chris
    * No update
* Announce meeting cancellations for December - Neil/Sherif
    * Done
* Look into building pmix4 for Rocky and building slurm23.11 with pmix support - Sherif
* Follow up with the Intel driver team - Jeremy
    * No updates on Intel drivers

## Discussions

* slurm23 will have two packages for the different versions
    * slurm22 will probably be declared EOL by upstream
    * Will create a slurm23.11 package to differentiate it from slurm, as slurm23.05 is stable
* Nvidia drivers - check on the status of open vs. closed ones; what can we distribute?
    * Intersection with SIG/AI
* Testing of HPC packages - work with the Testing team
    * Smoke tests, ensure clusters work, etc. (see the sketch after this list)
    * Reach out to Sherif if interested in volunteering to work on this
* slurm / pmix5
    * There is interest in building against the latest PMIx, but the latest version (5) is broken
    * This sounds like a fairly widespread problem
    * No update on this just yet
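As a concrete example of what the smoke tests mentioned under "Testing of HPC packages" could look like, here is a minimal sketch. It assumes only the standard Slurm client commands (`sinfo`, `srun`) on the test host and a pytest-style runner; the partition name is a placeholder, and nothing here is an agreed-upon test plan.

```python
# Minimal smoke-test sketch (not the SIG's agreed plan): assumes the Slurm client
# tools (sinfo, srun) are on PATH on the test host, and that a partition named
# "normal" exists -- the partition name is a placeholder, not from the meeting.
import subprocess


def run(cmd):
    """Run a command and return its stdout, raising if it exits non-zero."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def test_sinfo_reports_usable_nodes():
    # The controller answers and at least one node is idle/allocated/mixed,
    # i.e. not everything is down or drained.
    states = run(["sinfo", "--noheader", "--format=%T"]).split()
    assert states, "sinfo returned nothing"
    assert any(s.rstrip("*~#").startswith(("idle", "alloc", "mix")) for s in states)


def test_srun_single_task():
    # A trivial one-task job runs to completion and prints a hostname.
    out = run(["srun", "--partition=normal", "--ntasks=1", "hostname"])
    assert out.strip(), "srun produced no output"
```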
## Action items

* Update wiki
    * Refine the package list of what the SIG publishes and how to use it
    * Some packages are up for grabs; recruit folks to contribute
    * Maybe make tickets for these so people can claim them?
* Cnode kernel - no movement
* Request SB Certs for the HPC cnode kernel from Security - Requested
* Investigate what resources are required for testing - Neil
* Create POC script to create tickets when a new slurm release is available - Neil (a rough sketch follows this list)
* Sherif waiting to hear from Jeremy about Intel GPU drivers
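For the POC script item above, one possible starting point is sketched below. It uses only the public GitHub tags endpoint for SchedMD/slurm; the ticket-filing step is a stub because the SIG has not picked a tracker or API for it, and the state-file name is a placeholder.

```python
# Rough POC sketch only: polls the public GitHub tags endpoint for SchedMD/slurm
# and reports tags it has not seen before. Filing an actual ticket is stubbed out
# because the tracker (and its API) was not decided in the meeting; the state
# file name is a placeholder.
import json
import pathlib
import urllib.request

TAGS_URL = "https://api.github.com/repos/SchedMD/slurm/tags"
STATE_FILE = pathlib.Path("seen_slurm_tags.json")  # placeholder local cache


def fetch_tags():
    """Return the set of release tag names currently published upstream."""
    with urllib.request.urlopen(TAGS_URL) as resp:
        return {tag["name"] for tag in json.load(resp)}


def load_seen():
    return set(json.loads(STATE_FILE.read_text())) if STATE_FILE.exists() else set()


def file_ticket(tag):
    # Stub: replace with a call to whatever tracker the SIG settles on.
    print(f"Would open a ticket for new Slurm tag: {tag}")


def main():
    seen = load_seen()
    current = fetch_tags()
    for tag in sorted(current - seen):
        file_ticket(tag)
    STATE_FILE.write_text(json.dumps(sorted(current)))


if __name__ == "__main__":
    main()
```

Run from cron or a CI job; swap the stub for whatever ticketing call the SIG decides on.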
## Old business

### 2023-12-14

* No movement on the wiki yet, maybe over the break
* Cnode kernel - no movement
* Compare spec files for Warewulf vs. OpenHPC - Done!
    * Thank you Sherif
    * Building warewulf4 for Rocky 8 and Rocky 9
    * Can we keep this name with OpenHPC?
        * Chris - the next release will also rename warewulf to warewulf3 to distinguish them
* Request SB Certs for the HPC cnode kernel from Security - Requested
    * Requested - no update
* Investigate what resources are required for testing - Neil
    * Not done yet
* Create POC script to create tickets when a new slurm release is available - Neil
    * No movement
* Sherif waiting to hear from Jeremy about Intel GPU drivers
    * Should have heard from them; Jeremy will follow up with them to see what happened

### 2023-11-30

* Sherif to finish/complete work on the wiki
    * Not done
* Decide what is being put into the cnode kernel and what is being removed - Jeremy
    * No updates
* Request SB Certs for the HPC cnode kernel from Security - Requested
    * Requested
* Compare spec files for Warewulf vs. OpenHPC - Sherif
    * Not done yet
* Investigate what resources are required for testing - Neil
    * Not done yet
* Create POC script to create tickets when a new slurm release is available - Neil

### 2023-11-15

* Sherif to finish/complete work on the wiki
    * Not done
* Sherif to add Jeremy and Chris to gitusers and sig_hpc - Done
* Decide what is being put into the cnode kernel and what is being removed - Jeremy
    * No updates
* Request SB Certs for the HPC cnode kernel from Security - Requested
    * Requested
* Compare spec files for Warewulf vs. OpenHPC - Sherif
    * Not done yet
* Investigate what resources are required for testing - Neil
    * Not done yet

### 2023-11-02

* Sherif to work a bit on the wiki - Not done
* Sherif to add Jeremy and Chris to the git user groups

### 2023-10-19

* Sherif to create a kernel repo for the HPC kernel (kernel-hpc-node, now called kernel-cnode) - Done
* Jeremy to get the ball rolling with the Intel GPU driver
* Stack to fix the slurm REST daemon and integrate it with openQA

### 2023-10-05

* None for this meeting; however, we should be working on old business action items

### 2023-09-21

* Sherif: Get the SIG for drivers
* Sherif: Check the names of the Nvidia drivers (open, DKMS, and closed source)
* Chris: Benchmark the Nvidia open vs. closed source drivers

### 2023-09-07

* Sherif: Reach out to the AI SIG to check on hosting the Nvidia drivers that CIQ would like to contribute - Done, waiting to hear from them

### 2023-08-24

* Sherif: Push the testing repo file to the release package
* Sherif: Testing / merging the_real_swa scripts

### 2023-08-10

* Sherif: Looking into the openQA testing - Pending

### 2023-07-27

* Sherif: Reach out to jose-d about pmix - Done, no feedback yet
* Greg: Reach out to openPBS and cloud charly
* Sherif: Update slurm23 to the latest release - Done

### 2023-07-13

* Sherif needs to update the wiki - Done
* Sherif to look into the MPI stack
    * Chris will send Sherif a link with an intro

### 2023-06-29

* Sherif to release the slurm23 sources - Done
* Stack and Sherif working on the HPC list
* Sherif to email Jeremy the slurm23 source URL - Done

### 2023-06-15

* Sherif to look into the OpenHPC slurm spec file - Pending on Sherif
* We need to get a list of centres and HPC sites that are moving to Rocky, to make a blog post and do PR

### 2023-06-01

* Get a list of packages from Jeremy to pick up from OpenHPC - Done
* Greg / Sherif to talk in Rocky / RESF about a generic SIG for common packages such as chaintools
* Plan the OpenHPC demo, Chris / Sherif - Done
* Finalise the slurm package naming / configuration - Done

### 2023-05-18

* Get a demo / technical talk after 4 weeks ("Sherif can arrange that with Chris") - Done
* Get a list of packages that OpenHPC would like to move to the distros ("Jeremy will be the point of contact if we need those in a couple of weeks") - Done

### 2023-05-04

* Start building slurm - Ongoing; slowed down a bit by the R9.2 and R8.8 releases, but packages are built and some minor configuration still needs to be fixed
* Start building apptainer - on hold
* Start building singularity - on hold
* Start building warewulf - on hold
* Sherif: Check about forums - done; we can have our own section if we want, can be discussed over chat

### 2023-04-20

* Reach out to other communities (Greg) - ongoing
* Reach out to the different sites that use Rocky for HPC (Stack will ping a few of them, and others as well - group effort)
* Reach out to hardware vendors - nothing done yet
* Statistics / public registry for sites / HPC clusters to add themselves if they want - nothing done yet