wiki/docs/events/meeting-notes/2023-11-30.md

4.8 KiB

SIG/HPC meeting 2023-11-30

Attendees:

* Sherif Nagy
* Neil Hanlon
* Matt Bidwell

Discussions:

  • Testing infrastructure
    • We can get a couple graphics cards, donated from ICHEC, needs to wait for decom
    • Neil will follow up and start creating a plan
  • Automation for slurm and other packages
    • Create a script to create a ticket when an update is available for a package
      • Query release-monitoring.org API for available updates
  • No updates on cnode kernel yet
  • Question on EoL/unsupported artifacts -- would we remove RPM/sources which we know have security vulnerabilities?
    • schedmd does pull the bad versions' source and rpms, e.g.
    • A: we don't have really any obligation to remove old artifacts which might have very vulnerable code, as we can't really control anything about the user's system aside from providing the latest, fixed artifacts

Action items:

  • Sherif to finish/complete work on the wiki
    • Not done
  • Decide what is being put into cnode kernel, what is being removed - Jeremy
    • No updates
  • Request SB Certs for HPC cnode kernel from Security - Requested
    • Requested
  • Compare spec files for Warewulf vs OpenHPC - Sherif
    • Not done yet
  • Investigate what resources are required for testing - Neil
    • Not done yet
  • Create POC script to create tickets when new slurm is available - Neil

Old business:

2023-11-15

  • Sherif to finish/complete work on the wiki
    • Not done
  • Sherif to add Jeremy and Chris to gitusers and sig_hpc - Done
  • Decide what is being put into cnode kernel, what is being removed - Jeremy
    • No updates
  • Request SB Certs for HPC cnode kernel from Security - Requested
    • Requested
  • Compare spec files for Warewulf vs OpenHPC - Sherif
    • Not done yet
  • Investigate what resources are required for testing - Neil
    • Not done yet

2023-11-02

* Sherif to work on abit on the wiki - Not done
* Sherif to add Jeremy and Chris to the git user groups

2023-10-19:

* Sherif to create kernel repo for kernel HPC, kernel-hpc-node, called now kernel-cnode - Done -
* Jeermy, to get the ball rolling with intel GPU driver
* Stack, Fix the slurm rest daemon and integrated it with openQA

2023-10-05:

* None for this meeting, however we should be working on old business action items

2023-09-21:

* Sherif: Get the SIG for drivers
* Sherif: Check the names of nvidia drivers "open , dkms and closed source"
* Chris: Bench mark nvidia open vs closed source

2023-09-07:

* Sherif: Reaching out to AI SIG to check on hosting nvida that drivers that CIQ would like to contribute - Done and waiting to hear from them -

2023-08-24:

* Sherif: To push the testing repo file to release package
* Sherif: testing / merging the_real_swa scripts

2023-08-10:

* Sherif: Looking into the openQA testing - Pending

2023-07-27:

* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
* Greg: to reach out to openPBS and cloud charly
* Sherif: To update slurm23 to latest - Done -

2023-07-13:

* Sherif needs to update the wiki - Done
* Sherif to look into MPI stack
* Chris will send Sherif a link with intro

2023-06-29:

* Sherif release slurm23 sources - Done
* Stack and Sherif working on the HPC list
* Sherif email Jeremy, the slurm23 source URL - Done

2023-06-15:

* Sherif to look int openHPC slurm spec file - Pending on Sherif
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR

2023-06-01:

* Get a list of packages from Jeremy to pick up from openHPC - Done
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
* Plan the openHPC demo Chris / Sherif - Done
* Finlise the slurm package with naming / configuration - Done

2023-05-18:

* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done

2023-05-04

* Start building slurm - On going, a bit slowing down with R9.2 and R8.8 releases, however packages are built, some minor configurations needs to be fixed -
* Start building apptainer - on hold -
* Start building singulartiry - on hold -
* Start building warewulf - on hold -
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -

2023-04-20

* Reach out to other communities “Greg” - on going -
* Reaching out for different sites that uses Rocky for HPC “Stack will ping few of them and others as well -Group effort-”
* Reaching out to hardware vendors - nothing done yet -
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -