generated from sig_core/wiki-template
Compare commits
61 Commits
Author | SHA1 | Date | |
---|---|---|---|
c7fab19faa | |||
c7605f8034 | |||
0815462440 | |||
bfd3959b08 | |||
b3ef86a981 | |||
47af201e76 | |||
6f3fe414da | |||
7f5c79ba52 | |||
8851c71de1 | |||
f86483a8cd | |||
7656c4dc26 | |||
3ccb7cc9ef | |||
5706ef5b93 | |||
5ca159ce0f | |||
3cd70d1bb6 | |||
cfa0e4d714 | |||
67b68bc72e | |||
dbbabacbd8 | |||
ded39ddd6c | |||
31d7988504 | |||
e7dc9264e5 | |||
6fb857d722 | |||
dcaa1434ec | |||
cb33b03fe0 | |||
5dd69a0fbb | |||
f0a1fee457 | |||
da85f54b63 | |||
d8a1814e61 | |||
12e96a59bd | |||
70e2053ddd | |||
b76f5c001f | |||
82873c9759 | |||
81f7428fa2 | |||
5c2d652f97 | |||
b29806d51f | |||
53aec7fed4 | |||
d9bb59459e | |||
d75b49774a | |||
843bbbc1c7 | |||
56b55222f9 | |||
0787b5a86a | |||
211aec1f0d | |||
bcec901f3f | |||
c6dbf49aac | |||
077427b34d | |||
1f8e6fed60 | |||
3672ee17d5 | |||
2f7d5d4929 | |||
3ac9e13af3 | |||
eb6fa8489b | |||
2f5ef24b28 | |||
36432666a4 | |||
6bc29dce9e | |||
e31cf98718 | |||
6e8daa328a | |||
c869953ec0 | |||
78e9442fcc | |||
461e9d16bd | |||
a4d8111e7a | |||
68e1f239da | |||
821564dea8 |
1
docs/about.md
Normal file
@ -0,0 +1 @@
TBD
2
docs/contact.md
Normal file
@ -0,0 +1,2 @@
# Contact Us
We hang out in our [SIG/HPC Mattermost channel](https://chat.rockylinux.org/rocky-linux/channels/sig-hpc) and in #rockylinux-sig-hpc on irc.libera.chat, which is bridged to our Mattermost channel. Our [SIG forums are located here](https://forums.rockylinux.org/c/sig/hpc/61).
3
docs/events.md
Normal file
@ -0,0 +1,3 @@
# SIG/HPC Meeting

We meet every two weeks on Thursday at 9:00 PM UTC, for now on [Google Meet](https://meet.google.com/hsy-qnoe-dxx).

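The two-week cadence can be sketched as simple date arithmetic. This is only an illustration: it assumes the series starts from the first recorded meeting (2023-04-20 at 9:00 PM UTC, taken from the meeting notes), and the helper name `biweekly_meetings` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def biweekly_meetings(first, count):
    """Return `count` meeting datetimes spaced 14 days apart, starting at `first`."""
    return [first + timedelta(days=14 * i) for i in range(count)]


# Assumption: the first meeting in the notes, Thursday 2023-04-20 at 21:00 UTC.
first_meeting = datetime(2023, 4, 20, 21, 0, tzinfo=timezone.utc)
schedule = biweekly_meetings(first_meeting, 10)

for meeting in schedule:
    print(meeting.strftime("%Y-%m-%d %H:%M UTC"))
```

Stepping by a fixed 14 days, rather than picking two Thursdays per calendar month, keeps every meeting on a Thursday and matches the dates of the meeting notes published so far.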
71
docs/events/meeting-notes/2023-04-20.md
Normal file
@ -0,0 +1,71 @@
# SIG/HPC meeting 2023-04-20

## Attendees:

* Alan Marshall
* Nje
* Neil Hanlon
* Matt Bidwell
* David (NezSez)
* Jonathan Andreson
* Stack
* Balaji
* Sherif
* Gregory Kurtzer
* David DeBonis

## Quick round of introduction

Everyone introduced themselves.

## Definition of stakeholders

"This still needs a lot of clarification and classification, since these are very broad terms."

* HPC end users (maybe?)
* HPC system admins and engineers, to provide them with the tools and know-how to build HPC clusters using Rocky Linux
* HPC vendors; however, the SIG has to be vendor-neutral and agnostic

## Discussions:

Stack: We need to make sure that we are not redoing work that other groups have already done.

Greg: Is engaged with the OpenHPC community and is providing some core packages such as Apptainer, MPI, and OpenHPC.

Sherif: We need one hat that fits most, but we can't have one hat that fits all.

Stack: Feedback on Sherif's idea: generic solutions are not a great idea and tend to perform badly.

Greg: We need to put building blocks such as Spack, Slurm, and EasyBuild in this repo; that will make life easier and lower the barriers.

David (NezSez): Some end users won't know anything about HPC and just need to use it, for example for Maya or fluid dynamics.

Neil: Some tools, like Jupyter playbook, can be an easy entry point for organizations and teams to use HPC.

Stack: HPC is usually tuned to different needs. We can reach out to other HPC sites that are running Rocky, ask them to promote Rocky, and establish a dialog to learn what they are running into with Rocky.

Matt: There are a few projects already doing HPC out of the box, and we don't need to run in circles about what we are going to do.

Balaji: A SIG for scientific applications could focus on supporting and optimizing the applications, while HPC suggests the architecture to reach maximum capability.

Greg: Agreeing with Stack, we don't want to provide applications, since there are tools that already do that.

Gregory Kurtzer (Chat):
A simple strategy might be just to start assembling a list of packages we want to include as part of SIG/HPC, and be open minded as this list expands.

Neil Hanlon (Chat):
actually have to leave now, but, if we make some sort of toolkit, it has to be quite unopinionated... OpenStack-Ansible is a good example of being unopinionated about how you run your openstack cluster(s), but give you all the tools to customize and tune to your unique situation, too

## Remarks:

* A point was raised: should we rebuild some packages that are already in EPEL or not, and should our repo have a higher priority or not?
* We need to think more about conflicts with other SIGs, for example Lustre and SIG/Storage

## Action items:

* List of applications “Thread on MM to post pkgs”
* Building blocks, with each package as a building block, such as Lustre, OpenHPC, Slurm, etc.
* Reach out to other communities “Greg”
* Reach out to different sites that use Rocky for HPC “Stack will ping a few of them, and others as well -Group effort-”
* Reach out to hardware vendors
* Statistics / public registry where sites / HPC centres can add themselves if they want
* Meeting will be bi-weekly “Tentative Thursday 9:00PM UTC”
* Documentation
71
docs/events/meeting-notes/2023-05-04.md
Normal file
@ -0,0 +1,71 @@
# SIG/HPC meeting 2023-05-04

## Attendees:

* Neil Hanlon
* Matt Bidwell
* Stack
* Sherif
* Nick Eggleston
* Gregory Kurtzer
* Forrest Burt

## Package lists

* Slurm - EPEL
* Apptainer - EPEL
* Lustre - lustre.org, no server for EL9
* Warewulf - HPCNG GitHub, EL8 only
* EasyBuild
* OpenHPC
* Spack
* openmpi *with Slurm support*
* glusterfs-server, gluster-selinux
* NIS: ypserv, ypbind, yptools, nss_nis
* fail2ban
* Lmod
* conda
* sstack

## Discussions:

Greg: Suggested having our own Slurm, Apptainer, Singularity, and Warewulf packages.

Greg: We can reach out to DDN about anything related to Lustre.

Sherif: Suggested that we start building packages.

Nick: To build the community, we need to start looking into documentation and forums.

Stack: We need to be careful and have strong justification for rebuilding anything that already exists in EPEL.

Greg: Asked how HPC centres prefer to manage, or already manage, their Slurm setup.

A few members mentioned one of the following two methods:

* Keep upgrading within minor versions of Slurm
* Keep upgrading within minor versions of Slurm, then do a major upgrade in a scheduled maintenance window

Greg and Nick: Suggested adding the major-minor version to the package name, something like python2/3.

Sherif: Asked about the testing methodology with the testing team.

Stack: The testing team hopes to be able to test all SIGs at some point and is working on getting an openQA build for this.

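The versioned package naming that Greg and Nick suggested in the discussion can be sketched as follows. This is a minimal illustration, not an agreed SIG convention: the function name `versioned_package_name` and its `parts` parameter are hypothetical, and the version strings are only sample inputs. Later notes do mention packages named `slurm22` and `slurm23`, which is the single-component form shown here.

```python
def versioned_package_name(base, version, parts=1):
    """Append the leading `parts` components of `version` to `base`.

    Hypothetical helper for illustration only, e.g. ("slurm", "23.02.3")
    becomes "slurm23" with the default of one component.
    """
    components = version.split(".")
    if parts < 1 or parts > len(components):
        raise ValueError(f"cannot take {parts} component(s) from {version!r}")
    selected = components[:parts]
    if not all(component.isdigit() for component in selected):
        raise ValueError(f"non-numeric version component in {version!r}")
    return base + ".".join(selected)


print(versioned_package_name("slurm", "23.02.3"))
print(versioned_package_name("python", "3.11.2", parts=2))
```

Putting the version in the name lets two release series be published side by side, so a site can stay on one series and move to the next during a scheduled maintenance window.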
## Action items:

* Start building Slurm
* Start building Apptainer
* Start building Singularity
* Start building Warewulf
* Greg: reach out to OpenHPC - done
* Sherif: check about forums

## Old business:

* List of applications “Thread on MM to post pkgs” - We now have an idea of which packages we need to build -
* Building blocks, with each package as a building block, such as Lustre, OpenHPC, Slurm, etc. - We have an idea of what we need to do -
* Reach out to other communities “Greg” - ongoing -
* Reach out to different sites that use Rocky for HPC “Stack will ping a few of them, and others as well -Group effort-”
* Reach out to hardware vendors - nothing done yet -
* Statistics / public registry where sites / HPC centres can add themselves if they want - nothing done yet -
* Meeting will be bi-weekly “Tentative Thursday 9:00PM UTC” - Agreed -
* Documentation - Wiki is in place but still needs some work -
51
docs/events/meeting-notes/2023-05-18.md
Normal file
@ -0,0 +1,51 @@
# SIG/HPC meeting 2023-05-18

## Attendees:

* Stack
* Forrest Burt
* Nick Eggleston
* David H
* Jeremy Siadal
* Al Bowles
* Chris Simmons
* Sherif

## Discussions:

Chris: Are we willing to support the whole OpenHPC stack or just the modules, and how do we imagine achieving this?

Jeremy: It would be great to clear some distro-related stuff, such as automake / autoconf, out of OpenHPC.

Stack: We need a baseline so people can start using Rocky for HPC, and to make Rocky accessible.

Chris: A demo / technical talk in 4 weeks.

Chris: Are we going to focus on 8 and 9?

Stack and Chris: It would be great if we could focus on 9.

Sherif: I hope we can do both, but with 9 in the spotlight. "This needs to be a SIG decision."

Stack: Question: if we start moving OpenHPC within the HPC SIG, are they going to support more distros? We don't want to break packages for other EL distros.

Chris: So far, testing is on Rocky as the only supported EL distro.

## Action items:

* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris"
* Get a list of packages that OpenHPC would like to move to distros "Jeremy will be the point of contact if we need those in a couple of weeks"

## Old business:

## 2023-05-04

* Start building Slurm - Ongoing; slowed down a bit by the R9.2 and R8.8 releases. Packages are built, but some minor configuration needs to be fixed -
* Start building Apptainer - on hold -
* Start building Singularity - on hold -
* Start building Warewulf - on hold -
* Sherif: check about forums - done; we can have our own section if we want, which can be discussed over the chat -

## 2023-04-20

* Reach out to other communities “Greg” - ongoing -
* Reach out to different sites that use Rocky for HPC “Stack will ping a few of them, and others as well -Group effort-”
* Reach out to hardware vendors - nothing done yet -
* Statistics / public registry where sites / HPC centres can add themselves if they want - nothing done yet -
56
docs/events/meeting-notes/2023-06-01.md
Normal file
@ -0,0 +1,56 @@
# SIG/HPC meeting 2023-06-01

## Attendees:

* Jeremy Siadal
* Sherif
* Gregory Kurtzer
* David DeBonis
* Chris Simmons

## Discussions:

Getting toolchains, such as automake, outside of OpenHPC.

Greg: We need to discuss whether we need a generic SIG for toolchains.

Greg: We need to look into adding more release packages, such as the Intel compiler.

Brainstormed ideas about optimizing binaries.

David: What would be the interest in having a lightweight kernel for HPC?

Jeremy: Mentioned Intel's lightweight kernel, https://github.com/intel/mos

Chris: Asked whether there are any benchmarks or hard numbers comparing the shipped kernel with a lightweight kernel. So far, nothing solid.

Sherif: Slurm is now built, but not in the standard path; we agreed that we are going to move it to the standard path.

Greg: Make sure you have the provide type.

Chris: Also make sure that downgrade works.

Greg and Chris: We can also contribute to the OpenHPC documentation.

## Action items:

* Get a list of packages from Jeremy to pick up from OpenHPC
* Greg / Sherif to talk in Rocky / RESF about a generic SIG for common packages such as toolchains
* Plan the OpenHPC demo, Chris / Sherif
* Finalise the Slurm package naming / configuration

## Old business:

## 2023-05-18:

* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris"
* Get a list of packages that OpenHPC would like to move to distros "Jeremy will be the point of contact if we need those in a couple of weeks"

## 2023-05-04

* Start building Slurm - Ongoing; slowed down a bit by the R9.2 and R8.8 releases. Packages are built, but some minor configuration needs to be fixed -
* Start building Apptainer - on hold -
* Start building Singularity - on hold -
* Start building Warewulf - on hold -
* Sherif: check about forums - done; we can have our own section if we want, which can be discussed over the chat -

## 2023-04-20

* Reach out to other communities “Greg” - ongoing -
* Reach out to different sites that use Rocky for HPC “Stack will ping a few of them, and others as well -Group effort-”
* Reach out to hardware vendors - nothing done yet -
* Statistics / public registry where sites / HPC centres can add themselves if they want - nothing done yet -
59
docs/events/meeting-notes/2023-06-15.md
Normal file
@ -0,0 +1,59 @@
# SIG/HPC meeting 2023-06-15

## Attendees:

* Chris Simmons
* Nick Eggleston
* Forrest Burt
* Stack
* David DeBonis
* Jeremy Siadal
* Greg Kurtzer
* Sherif

## Discussions:

Chris gave a quick demo / presentation about OpenHPC.

Jeremy sent the packages.

Greg: Asked how compatible the SIG's Slurm is with OpenHPC.

Sherif needs to look at the OpenHPC Slurm packages.

Chris: We need to look at how to build EasyBuild and how to improve it.

Chris and Greg discussed whether there is any standard that explains how to build systems that are compatible with each other. OpenHPC does follow best practices from different entities.

Chris provided https://github.com/holgerBerger/hpc-workspace, which is now a part of OpenHPC.

Sherif mentioned that the forums category is now in place: https://forums.rockylinux.org/c/sig/hpc/61

## Action items:

* Sherif to look into the OpenHPC Slurm spec file
* We need to get lists of centres and HPC sites that are moving to Rocky to make a blog post and PR

## Old business:

## 2023-06-01:

* Get a list of packages from Jeremy to pick up from OpenHPC - Done
* Greg / Sherif to talk in Rocky / RESF about a generic SIG for common packages such as toolchains
* Plan the OpenHPC demo, Chris / Sherif - Done
* Finalise the Slurm package naming / configuration - Done

## 2023-05-18:

* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
* Get a list of packages that OpenHPC would like to move to distros "Jeremy will be the point of contact if we need those in a couple of weeks" - Done

## 2023-05-04

* Start building Slurm - Ongoing; slowed down a bit by the R9.2 and R8.8 releases. Packages are built, but some minor configuration needs to be fixed -
* Start building Apptainer - on hold -
* Start building Singularity - on hold -
* Start building Warewulf - on hold -
* Sherif: check about forums - done; we can have our own section if we want, which can be discussed over the chat -

## 2023-04-20

* Reach out to other communities “Greg” - ongoing -
* Reach out to different sites that use Rocky for HPC “Stack will ping a few of them, and others as well -Group effort-”
* Reach out to hardware vendors - nothing done yet -
* Statistics / public registry where sites / HPC centres can add themselves if they want - nothing done yet -
65
docs/events/meeting-notes/2023-06-29.md
Normal file
@ -0,0 +1,65 @@
# SIG/HPC meeting 2023-06-29

## Attendees:

* Matt Bidwell
* Al Bowles
* Forrest Burt
* David H
* Trevor Cooper
* Jeremy Siadal
* David DeBonis
* Sherif
* Brian Clemens

## Discussions:

Sherif: Explained how Slurm packaging has been done so far.

Sherif: Recapped what we are doing in the SIG in terms of packages and so on.

Al: Do we have a testing plan? Are we going to use QA?

Sherif: No testing documentation yet, but we are working on getting this done.

Jeremy: Asked about Red Hat closing its sources.

Sherif: Explained the UBI and cloud methods, based on the latest Rocky blog post: https://rockylinux.org/news/keeping-open-source-open/

Jeremy: Maybe in the future we will diverge from Red Hat, for example with an HPC-optimized kernel.

Sherif: We will release the SIG repo packages today.

## Action items:

* Sherif to release slurm23 sources
* Stack and Sherif working on the HPC list
* Sherif to email Jeremy the slurm23 source URL

## Old business:

## 2023-06-15:

* Sherif to look into the OpenHPC Slurm spec file - Pending on Sherif
* We need to get lists of centres and HPC sites that are moving to Rocky to make a blog post and PR

## 2023-06-01:

* Get a list of packages from Jeremy to pick up from OpenHPC - Done
* Greg / Sherif to talk in Rocky / RESF about a generic SIG for common packages such as toolchains
* Plan the OpenHPC demo, Chris / Sherif - Done
* Finalise the Slurm package naming / configuration - Done

## 2023-05-18:

* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
* Get a list of packages that OpenHPC would like to move to distros "Jeremy will be the point of contact if we need those in a couple of weeks" - Done

## 2023-05-04

* Start building Slurm - Ongoing; slowed down a bit by the R9.2 and R8.8 releases. Packages are built, but some minor configuration needs to be fixed -
* Start building Apptainer - on hold -
* Start building Singularity - on hold -
* Start building Warewulf - on hold -
* Sherif: check about forums - done; we can have our own section if we want, which can be discussed over the chat -

## 2023-04-20

* Reach out to other communities “Greg” - ongoing -
* Reach out to different sites that use Rocky for HPC “Stack will ping a few of them, and others as well -Group effort-”
* Reach out to hardware vendors - nothing done yet -
* Statistics / public registry where sites / HPC centres can add themselves if they want - nothing done yet -
67
docs/events/meeting-notes/2023-07-13.md
Normal file
@ -0,0 +1,67 @@
# SIG/HPC meeting 2023-07-13

## Attendees:

* Al Bowles
* Sherif
* Chris S
* Mustafa
* Forrest Burt
* Jeremy Siadal
* Gregory Kurtzer

## Discussions:

Sherif mentioned the release of the HPC SIG repos and of slurm22 and slurm23 for Rocky 8 and 9.

Chris sent the link to the OpenHPC Slurm spec files: https://github.com/openhpc/ohpc/tree/3.x/components/rms/slurm/SPECS

Sherif: Most likely we will need Warewulf 3 and 4 to be built.

Sherif is thinking about reaching out to the EPEL maintainers to explore collaboration.

Sherif: What should we look at next?

Chris: Maybe the MPI stack, OpenPBS, and MPI integration with Slurm.

Sherif asked for the OpenHPC unit tests.

## Action items:

* Sherif needs to update the wiki
* Sherif to look into the MPI stack
* Chris will send Sherif a link with an intro

## Old business:

## 2023-06-29:

* Sherif to release slurm23 sources - Done
* Stack and Sherif working on the HPC list
* Sherif to email Jeremy the slurm23 source URL - Done

## 2023-06-15:

* Sherif to look into the OpenHPC Slurm spec file - Pending on Sherif
* We need to get lists of centres and HPC sites that are moving to Rocky to make a blog post and PR

## 2023-06-01:

* Get a list of packages from Jeremy to pick up from OpenHPC - Done
* Greg / Sherif to talk in Rocky / RESF about a generic SIG for common packages such as toolchains
* Plan the OpenHPC demo, Chris / Sherif - Done
* Finalise the Slurm package naming / configuration - Done

## 2023-05-18:

* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
* Get a list of packages that OpenHPC would like to move to distros "Jeremy will be the point of contact if we need those in a couple of weeks" - Done

## 2023-05-04

* Start building Slurm - Ongoing; slowed down a bit by the R9.2 and R8.8 releases. Packages are built, but some minor configuration needs to be fixed -
* Start building Apptainer - on hold -
* Start building Singularity - on hold -
* Start building Warewulf - on hold -
* Sherif: check about forums - done; we can have our own section if we want, which can be discussed over the chat -

## 2023-04-20

* Reach out to other communities “Greg” - ongoing -
* Reach out to different sites that use Rocky for HPC “Stack will ping a few of them, and others as well -Group effort-”
* Reach out to hardware vendors - nothing done yet -
* Statistics / public registry where sites / HPC centres can add themselves if they want - nothing done yet -
82
docs/events/meeting-notes/2023-07-27.md
Normal file
@ -0,0 +1,82 @@
# SIG/HPC meeting 2023-07-27

## Attendees:

* Sherif
* Jeremy Siadal
* Stack
* Scott Groel
* Gregory Kurtzer

## Discussions:

Chris: Talked about the Slurm system unit bug upstream.

Sherif: Asked about the openQA status.

Stack: They are working on it and still improving it.

Sherif: Asked about PMIx support and how to do it.

Jeremy: It is a bit more complex than it seems, but worth doing.

David: What is the added value of moving PMIx to a newer version?

Jeremy: We need to look at the user base and see if this is needed.

Stack: Asked what PMIx is.

David: The PMIx extension is needed mostly when you deploy at scale and performance becomes an issue within Slurm.

Greg: We can have PMIx in the SIG; that's not a bad idea.

Jeremy: We will also need the runtime aspects of it.

David: Yes, we will need both.

Jeremy: Mentioned a package they would like to have in SIG/HPC; they will send it to us.

Greg: Will reach out to OpenPBS and cloud charly.

## Action items:

* Sherif: Reach out to jose-d about PMIx - Done, no feedback yet -
* Greg: Reach out to OpenPBS and cloud charly
* Sherif: Update slurm23 to the latest version

## Old business:

## 2023-07-13:

* Sherif needs to update the wiki - Done
* Sherif to look into the MPI stack
* Chris will send Sherif a link with an intro

## 2023-06-29:

* Sherif to release slurm23 sources - Done
* Stack and Sherif working on the HPC list
* Sherif to email Jeremy the slurm23 source URL - Done

## 2023-06-15:

* Sherif to look into the OpenHPC Slurm spec file - Pending on Sherif
* We need to get lists of centres and HPC sites that are moving to Rocky to make a blog post and PR

## 2023-06-01:

* Get a list of packages from Jeremy to pick up from OpenHPC - Done
* Greg / Sherif to talk in Rocky / RESF about a generic SIG for common packages such as toolchains
* Plan the OpenHPC demo, Chris / Sherif - Done
* Finalise the Slurm package naming / configuration - Done

## 2023-05-18:

* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
* Get a list of packages that OpenHPC would like to move to distros "Jeremy will be the point of contact if we need those in a couple of weeks" - Done

## 2023-05-04

* Start building Slurm - Ongoing; slowed down a bit by the R9.2 and R8.8 releases. Packages are built, but some minor configuration needs to be fixed -
* Start building Apptainer - on hold -
* Start building Singularity - on hold -
* Start building Warewulf - on hold -
* Sherif: check about forums - done; we can have our own section if we want, which can be discussed over the chat -

## 2023-04-20

* Reach out to other communities “Greg” - ongoing -
* Reach out to different sites that use Rocky for HPC “Stack will ping a few of them, and others as well -Group effort-”
* Reach out to hardware vendors - nothing done yet -
* Statistics / public registry where sites / HPC centres can add themselves if they want - nothing done yet -
70
docs/events/meeting-notes/2023-08-10.md
Normal file
@ -0,0 +1,70 @@
# SIG/HPC meeting 2023-08-10

## Attendees:

* Scott Groel
* Alan Marshall
* Nick Eggleston
* Stack
* Jeremy Siadal
* Sherif
* Maxine Hayes

## Discussions:

Sherif: Summarized the action items from previous meetings.

Jeremy: Talked about some of the packages that need to be within Rocky.

Sherif: Asked for a testing summary.

Alan and Stack: We do have automated testing now, and we are working on fixing openQA multi-VM issues.

Sherif: Spoke about the package life cycle, including testing and releasing.

## Action items:

* Sherif: Look into the openQA testing
* Sherif: Push the testing repo file to the release package

## Old business:

## 2023-07-27:

* Sherif: Reach out to jose-d about PMIx - Done, no feedback yet -
* Greg: Reach out to OpenPBS and cloud charly
* Sherif: Update slurm23 to the latest version - Done -

## 2023-07-13:

* Sherif needs to update the wiki - Done
* Sherif to look into the MPI stack
* Chris will send Sherif a link with an intro

## 2023-06-29:

* Sherif to release slurm23 sources - Done
* Stack and Sherif working on the HPC list
* Sherif to email Jeremy the slurm23 source URL - Done

## 2023-06-15:

* Sherif to look into the OpenHPC Slurm spec file - Pending on Sherif
* We need to get lists of centres and HPC sites that are moving to Rocky to make a blog post and PR

## 2023-06-01:

* Get a list of packages from Jeremy to pick up from OpenHPC - Done
* Greg / Sherif to talk in Rocky / RESF about a generic SIG for common packages such as toolchains
* Plan the OpenHPC demo, Chris / Sherif - Done
* Finalise the Slurm package naming / configuration - Done

## 2023-05-18:

* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
* Get a list of packages that OpenHPC would like to move to distros "Jeremy will be the point of contact if we need those in a couple of weeks" - Done

## 2023-05-04

* Start building Slurm - Ongoing; slowed down a bit by the R9.2 and R8.8 releases. Packages are built, but some minor configuration needs to be fixed -
* Start building Apptainer - on hold -
* Start building Singularity - on hold -
* Start building Warewulf - on hold -
* Sherif: check about forums - done; we can have our own section if we want, which can be discussed over the chat -

## 2023-04-20

* Reach out to other communities “Greg” - ongoing -
* Reach out to different sites that use Rocky for HPC “Stack will ping a few of them, and others as well -Group effort-”
* Reach out to hardware vendors - nothing done yet -
* Statistics / public registry where sites / HPC centres can add themselves if they want - nothing done yet -
69
docs/events/meeting-notes/2023-08-24.md
Normal file
@ -0,0 +1,69 @@
|
|||||||
|
# SIG/HPC meeting 2023-08-24
|
||||||
|
|
||||||
|
## Attendees:
|
||||||
|
* Jeremy Siadal
|
||||||
|
* Sherif
|
||||||
|
* David DeBonis
|
||||||
|
* Neil Hanlon
|
||||||
|
|
||||||
|
|
||||||
|
## Discussions:
|
||||||
|
Sherif, give a recap of what's action missing / pending from last week
|
||||||
|
|
||||||
|
Sherif, Needs to look at the scripts from the_real_swa
|
||||||
|
|
||||||
|
Jeremy, asked whether the SIG will be upstream of OpenELA or not
|
||||||
|
|
||||||
|
Sherif, at the moment RESF has its own tooling to obtain sources; however, whether we will be downstream of OpenELA will be decided by a vote of the Rocky Linux board and the RESF board
|
||||||
|
|
||||||
|
|
||||||
|
## Action items:
|
||||||
|
* Sherif: To push the testing repo file to release package
|
||||||
|
* Sherif: testing / merging the_real_swa scripts
|
||||||
|
|
||||||
|
## Old business:
|
||||||
|
|
||||||
|
## 2023-08-10:
|
||||||
|
* Sherif: Looking into the openQA testing - Pending
|
||||||
|
|
||||||
|
## 2023-07-27:
|
||||||
|
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||||
|
* Greg: to reach out to openPBS and cloud charly
|
||||||
|
* Sherif: To update slurm23 to latest - Done -
|
||||||
|
|
||||||
|
## 2023-07-13:
|
||||||
|
* Sherif needs to update the wiki - Done
|
||||||
|
* Sherif to look into MPI stack
|
||||||
|
* Chris will send Sherif a link with intro
|
||||||
|
|
||||||
|
## 2023-06-29:
|
||||||
|
* Sherif release slurm23 sources - Done
|
||||||
|
* Stack and Sherif working on the HPC list
|
||||||
|
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||||
|
|
||||||
|
## 2023-06-15:
|
||||||
|
* Sherif to look into the openHPC slurm spec file - Pending on Sherif
|
||||||
|
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||||
|
|
||||||
|
## 2023-06-01:
|
||||||
|
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||||
|
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||||
|
* Plan the openHPC demo Chris / Sherif - Done
|
||||||
|
* Finalise the slurm package with naming / configuration - Done
|
||||||
|
|
||||||
|
## 2023-05-18:
|
||||||
|
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||||
|
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||||
|
|
||||||
|
## 2023-05-04
|
||||||
|
* Start building slurm - Ongoing, slowed down a bit by the R9.2 and R8.8 releases; however, packages are built, and some minor configuration needs to be fixed -
|
||||||
|
* Start building apptainer - on hold -
|
||||||
|
* Start building singularity - on hold -
|
||||||
|
* Start building warewulf - on hold -
|
||||||
|
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||||
|
|
||||||
|
## 2023-04-20
|
||||||
|
* Reach out to other communities “Greg” - on going -
|
||||||
|
* Reaching out to different sites that use Rocky for HPC “Stack will ping a few of them and others as well -Group effort-”
|
||||||
|
* Reaching out to hardware vendors - nothing done yet -
|
||||||
|
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
74
docs/events/meeting-notes/2023-09-07.md
Normal file
@ -0,0 +1,74 @@
|
|||||||
|
# SIG/HPC meeting 2023-09-07
|
||||||
|
|
||||||
|
## Attendees:
|
||||||
|
* Jeremy Siadal
|
||||||
|
* Sherif
|
||||||
|
* David DeBonis
|
||||||
|
* Stack
|
||||||
|
|
||||||
|
|
||||||
|
## Discussions:
|
||||||
|
|
||||||
|
Sherif, asking if anyone would like to volunteer to maintain some packages
|
||||||
|
|
||||||
|
Jeremy, Will finalize the list of packages and then we can discuss it
|
||||||
|
|
||||||
|
David, looking into specialized drivers such as Nvidia drivers; maybe they are more suitable for SIG/AI
|
||||||
|
|
||||||
|
Jeremy, we need to look into the intel GPU drivers as well to be a part of the SIG/HPC or SIG/AI
|
||||||
|
|
||||||
|
|
||||||
|
## Action items:
|
||||||
|
|
||||||
|
Sherif: Reaching out to AI SIG to check on hosting the nvidia drivers that CIQ would like to contribute
|
||||||
|
|
||||||
|
## Old business:
|
||||||
|
|
||||||
|
## 2023-08-24:
|
||||||
|
* Sherif: To push the testing repo file to release package
|
||||||
|
* Sherif: testing / merging the_real_swa scripts
|
||||||
|
|
||||||
|
## 2023-08-10:
|
||||||
|
* Sherif: Looking into the openQA testing - Pending
|
||||||
|
|
||||||
|
## 2023-07-27:
|
||||||
|
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||||
|
* Greg: to reach out to openPBS and cloud charly
|
||||||
|
* Sherif: To update slurm23 to latest - Done -
|
||||||
|
|
||||||
|
## 2023-07-13:
|
||||||
|
* Sherif needs to update the wiki - Done
|
||||||
|
* Sherif to look into MPI stack
|
||||||
|
* Chris will send Sherif a link with intro
|
||||||
|
|
||||||
|
## 2023-06-29:
|
||||||
|
* Sherif release slurm23 sources - Done
|
||||||
|
* Stack and Sherif working on the HPC list
|
||||||
|
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||||
|
|
||||||
|
## 2023-06-15:
|
||||||
|
* Sherif to look into the openHPC slurm spec file - Pending on Sherif
|
||||||
|
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||||
|
|
||||||
|
## 2023-06-01:
|
||||||
|
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||||
|
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||||
|
* Plan the openHPC demo Chris / Sherif - Done
|
||||||
|
* Finalise the slurm package with naming / configuration - Done
|
||||||
|
|
||||||
|
## 2023-05-18:
|
||||||
|
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||||
|
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||||
|
|
||||||
|
## 2023-05-04
|
||||||
|
* Start building slurm - Ongoing, slowed down a bit by the R9.2 and R8.8 releases; however, packages are built, and some minor configuration needs to be fixed -
|
||||||
|
* Start building apptainer - on hold -
|
||||||
|
* Start building singularity - on hold -
|
||||||
|
* Start building warewulf - on hold -
|
||||||
|
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||||
|
|
||||||
|
## 2023-04-20
|
||||||
|
* Reach out to other communities “Greg” - on going -
|
||||||
|
* Reaching out to different sites that use Rocky for HPC “Stack will ping a few of them and others as well -Group effort-”
|
||||||
|
* Reaching out to hardware vendors - nothing done yet -
|
||||||
|
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
76
docs/events/meeting-notes/2023-09-21.md
Normal file
@ -0,0 +1,76 @@
|
|||||||
|
# SIG/HPC meeting 2023-09-21
|
||||||
|
|
||||||
|
## Attendees:
|
||||||
|
* Jeremy Siadal
|
||||||
|
* Sherif
|
||||||
|
* Nick Eggleston
|
||||||
|
* Chris S.
|
||||||
|
* Scott Groel
|
||||||
|
|
||||||
|
## Discussions:
|
||||||
|
|
||||||
|
Jeremy, we need to spin off a special SIG for drivers
|
||||||
|
|
||||||
|
Chris, do we have a benchmark comparing the nvidia open source and closed source drivers? Also, we might need to build two versions, one for the HPC SIG and one for the drivers SIG
|
||||||
|
|
||||||
|
Scott, are there any plans to support xcat?
|
||||||
|
|
||||||
|
## Action items:
|
||||||
|
* Sherif: Get the SIG for drivers
|
||||||
|
* Sherif: Check the names of nvidia drivers "open , dkms and closed source"
|
||||||
|
* Chris: Benchmark nvidia open vs closed source
|
||||||
|
|
||||||
|
## Old business:
|
||||||
|
|
||||||
|
## 2023-09-07:
|
||||||
|
* Sherif: Reaching out to AI SIG to check on hosting the nvidia drivers that CIQ would like to contribute - Done and waiting to hear from them -
|
||||||
|
|
||||||
|
## 2023-08-24:
|
||||||
|
* Sherif: To push the testing repo file to release package
|
||||||
|
* Sherif: testing / merging the_real_swa scripts
|
||||||
|
|
||||||
|
## 2023-08-10:
|
||||||
|
* Sherif: Looking into the openQA testing - Pending
|
||||||
|
|
||||||
|
## 2023-07-27:
|
||||||
|
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||||
|
* Greg: to reach out to openPBS and cloud charly
|
||||||
|
* Sherif: To update slurm23 to latest - Done -
|
||||||
|
|
||||||
|
## 2023-07-13:
|
||||||
|
* Sherif needs to update the wiki - Done
|
||||||
|
* Sherif to look into MPI stack
|
||||||
|
* Chris will send Sherif a link with intro
|
||||||
|
|
||||||
|
## 2023-06-29:
|
||||||
|
* Sherif release slurm23 sources - Done
|
||||||
|
* Stack and Sherif working on the HPC list
|
||||||
|
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||||
|
|
||||||
|
## 2023-06-15:
|
||||||
|
* Sherif to look into the openHPC slurm spec file - Pending on Sherif
|
||||||
|
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||||
|
|
||||||
|
## 2023-06-01:
|
||||||
|
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||||
|
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||||
|
* Plan the openHPC demo Chris / Sherif - Done
|
||||||
|
* Finalise the slurm package with naming / configuration - Done
|
||||||
|
|
||||||
|
## 2023-05-18:
|
||||||
|
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||||
|
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||||
|
|
||||||
|
## 2023-05-04
|
||||||
|
* Start building slurm - Ongoing, slowed down a bit by the R9.2 and R8.8 releases; however, packages are built, and some minor configuration needs to be fixed -
|
||||||
|
* Start building apptainer - on hold -
|
||||||
|
* Start building singularity - on hold -
|
||||||
|
* Start building warewulf - on hold -
|
||||||
|
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||||
|
|
||||||
|
## 2023-04-20
|
||||||
|
* Reach out to other communities “Greg” - on going -
|
||||||
|
* Reaching out to different sites that use Rocky for HPC “Stack will ping a few of them and others as well -Group effort-”
|
||||||
|
* Reaching out to hardware vendors - nothing done yet -
|
||||||
|
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
75
docs/events/meeting-notes/2023-10-05.md
Normal file
@ -0,0 +1,75 @@
|
|||||||
|
# SIG/HPC meeting 2023-10-05
|
||||||
|
|
||||||
|
## Attendees:
|
||||||
|
* Sherif
|
||||||
|
* Stack
|
||||||
|
* Chris S.
|
||||||
|
|
||||||
|
## Discussions:
|
||||||
|
|
||||||
|
Chris, did some benchmark testing on a cloud provider, using scripts to install the nvidia drivers and compile the open source one. So far the closed source driver is performing better, but more testing is needed and we need to publish the results to our wiki
|
||||||
|
|
||||||
|
Open source out-of-tree kernel drivers should be in the Kernel SIG as long as they are generic, and any performance-enhanced ones in the HPC SIG
|
||||||
|
|
||||||
|
## Action items:
|
||||||
|
|
||||||
|
None for this meeting, however we should be working on old business action items
|
||||||
|
|
||||||
|
## Old business:
|
||||||
|
|
||||||
|
## 2023-09-21:
|
||||||
|
* Sherif: Get the SIG for drivers
|
||||||
|
* Sherif: Check the names of nvidia drivers "open , dkms and closed source"
|
||||||
|
* Chris: Benchmark nvidia open vs closed source
|
||||||
|
|
||||||
|
## 2023-09-07:
|
||||||
|
* Sherif: Reaching out to AI SIG to check on hosting the nvidia drivers that CIQ would like to contribute - Done and waiting to hear from them -
|
||||||
|
|
||||||
|
## 2023-08-24:
|
||||||
|
* Sherif: To push the testing repo file to release package
|
||||||
|
* Sherif: testing / merging the_real_swa scripts
|
||||||
|
|
||||||
|
## 2023-08-10:
|
||||||
|
* Sherif: Looking into the openQA testing - Pending
|
||||||
|
|
||||||
|
## 2023-07-27:
|
||||||
|
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||||
|
* Greg: to reach out to openPBS and cloud charly
|
||||||
|
* Sherif: To update slurm23 to latest - Done -
|
||||||
|
|
||||||
|
## 2023-07-13:
|
||||||
|
* Sherif needs to update the wiki - Done
|
||||||
|
* Sherif to look into MPI stack
|
||||||
|
* Chris will send Sherif a link with intro
|
||||||
|
|
||||||
|
## 2023-06-29:
|
||||||
|
* Sherif release slurm23 sources - Done
|
||||||
|
* Stack and Sherif working on the HPC list
|
||||||
|
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||||
|
|
||||||
|
## 2023-06-15:
|
||||||
|
* Sherif to look into the openHPC slurm spec file - Pending on Sherif
|
||||||
|
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||||
|
|
||||||
|
## 2023-06-01:
|
||||||
|
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||||
|
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||||
|
* Plan the openHPC demo Chris / Sherif - Done
|
||||||
|
* Finalise the slurm package with naming / configuration - Done
|
||||||
|
|
||||||
|
## 2023-05-18:
|
||||||
|
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||||
|
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||||
|
|
||||||
|
## 2023-05-04
|
||||||
|
* Start building slurm - Ongoing, slowed down a bit by the R9.2 and R8.8 releases; however, packages are built, and some minor configuration needs to be fixed -
|
||||||
|
* Start building apptainer - on hold -
|
||||||
|
* Start building singularity - on hold -
|
||||||
|
* Start building warewulf - on hold -
|
||||||
|
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||||
|
|
||||||
|
## 2023-04-20
|
||||||
|
* Reach out to other communities “Greg” - on going -
|
||||||
|
* Reaching out to different sites that use Rocky for HPC “Stack will ping a few of them and others as well -Group effort-”
|
||||||
|
* Reaching out to hardware vendors - nothing done yet -
|
||||||
|
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
83
docs/events/meeting-notes/2023-10-19.md
Normal file
@ -0,0 +1,83 @@
|
|||||||
|
# SIG/HPC meeting 2023-10-19
|
||||||
|
|
||||||
|
## Attendees:
|
||||||
|
* Sherif
|
||||||
|
* Stack
|
||||||
|
* Alan Marshall
|
||||||
|
* Jeremy Siadal
|
||||||
|
|
||||||
|
## Discussions:
|
||||||
|
|
||||||
|
Stack, asked about automating the process for building slurm packages; Sherif explained how the packaging process works and how we can improve it by using upstream monitoring tools
|
||||||
|
|
||||||
|
Jeremy, suggested starting work on an HPC kernel for Rocky; it will be mostly based on the standard Rocky kernel with a different configuration file
|
||||||
|
|
||||||
|
Stack, found a problem with slurmrestd, will look into it for next week
|
||||||
|
|
||||||
|
## Action items:
|
||||||
|
* Sherif to create kernel repo for kernel HPC, kernel-hpc-node
|
||||||
|
* Jeremy, to get the ball rolling with intel GPU driver
|
||||||
|
* Stack, Fix the slurm rest daemon and integrate it with openQA
|
||||||
|
* Sherif, staging repo for HPC
|
||||||
|
|
||||||
|
## Old business:
|
||||||
|
|
||||||
|
## 2023-10-05:
|
||||||
|
* None for this meeting, however we should be working on old business action items
|
||||||
|
|
||||||
|
## 2023-09-21:
|
||||||
|
* Sherif: Get the SIG for drivers
|
||||||
|
* Sherif: Check the names of nvidia drivers "open , dkms and closed source"
|
||||||
|
* Chris: Benchmark nvidia open vs closed source
|
||||||
|
|
||||||
|
## 2023-09-07:
|
||||||
|
* Sherif: Reaching out to AI SIG to check on hosting the nvidia drivers that CIQ would like to contribute - Done and waiting to hear from them -
|
||||||
|
|
||||||
|
## 2023-08-24:
|
||||||
|
* Sherif: To push the testing repo file to release package
|
||||||
|
* Sherif: testing / merging the_real_swa scripts
|
||||||
|
|
||||||
|
## 2023-08-10:
|
||||||
|
* Sherif: Looking into the openQA testing - Pending
|
||||||
|
|
||||||
|
## 2023-07-27:
|
||||||
|
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||||
|
* Greg: to reach out to openPBS and cloud charly
|
||||||
|
* Sherif: To update slurm23 to latest - Done -
|
||||||
|
|
||||||
|
## 2023-07-13:
|
||||||
|
* Sherif needs to update the wiki - Done
|
||||||
|
* Sherif to look into MPI stack
|
||||||
|
* Chris will send Sherif a link with intro
|
||||||
|
|
||||||
|
## 2023-06-29:
|
||||||
|
* Sherif release slurm23 sources - Done
|
||||||
|
* Stack and Sherif working on the HPC list
|
||||||
|
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||||
|
|
||||||
|
## 2023-06-15:
|
||||||
|
* Sherif to look into the openHPC slurm spec file - Pending on Sherif
|
||||||
|
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||||
|
|
||||||
|
## 2023-06-01:
|
||||||
|
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||||
|
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||||
|
* Plan the openHPC demo Chris / Sherif - Done
|
||||||
|
* Finalise the slurm package with naming / configuration - Done
|
||||||
|
|
||||||
|
## 2023-05-18:
|
||||||
|
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||||
|
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||||
|
|
||||||
|
## 2023-05-04
|
||||||
|
* Start building slurm - Ongoing, slowed down a bit by the R9.2 and R8.8 releases; however, packages are built, and some minor configuration needs to be fixed -
|
||||||
|
* Start building apptainer - on hold -
|
||||||
|
* Start building singularity - on hold -
|
||||||
|
* Start building warewulf - on hold -
|
||||||
|
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||||
|
|
||||||
|
## 2023-04-20
|
||||||
|
* Reach out to other communities “Greg” - on going -
|
||||||
|
* Reaching out to different sites that use Rocky for HPC “Stack will ping a few of them and others as well -Group effort-”
|
||||||
|
* Reaching out to hardware vendors - nothing done yet -
|
||||||
|
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
88
docs/events/meeting-notes/2023-11-02.md
Normal file
@ -0,0 +1,88 @@
|
|||||||
|
# SIG/HPC meeting 2023-11-02
|
||||||
|
|
||||||
|
## Attendees:
|
||||||
|
* Sherif
|
||||||
|
* Neil Hanlon
|
||||||
|
* Chris S
|
||||||
|
* Jeremy Siadal
|
||||||
|
* Stack
|
||||||
|
|
||||||
|
## Discussions:
|
||||||
|
Jeremy, gave an overview of the kernel-cnode patch status and said that he is working on some of the patches
|
||||||
|
|
||||||
|
Sherif, Asked about the intel GPU contacts and Jeremy will send the contacts over
|
||||||
|
|
||||||
|
Stack, still working on the slurm rest daemon
|
||||||
|
|
||||||
|
Chris S, talked about their benchmark for nvidia drivers
|
||||||
|
|
||||||
|
## Action items:
|
||||||
|
* Sherif to work a bit on the wiki
|
||||||
|
* Sherif to add Jeremy and Chris to the git user groups
|
||||||
|
|
||||||
|
## Old business:
|
||||||
|
|
||||||
|
## 2023-10-19:
|
||||||
|
* Sherif to create kernel repo for kernel HPC, kernel-hpc-node, called now kernel-cnode - Done -
|
||||||
|
* Jeremy, to get the ball rolling with intel GPU driver
|
||||||
|
* Stack, Fix the slurm rest daemon and integrate it with openQA
|
||||||
|
|
||||||
|
## 2023-10-05:
|
||||||
|
* None for this meeting, however we should be working on old business action items
|
||||||
|
|
||||||
|
## 2023-09-21:
|
||||||
|
* Sherif: Get the SIG for drivers
|
||||||
|
* Sherif: Check the names of nvidia drivers "open , dkms and closed source"
|
||||||
|
* Chris: Benchmark nvidia open vs closed source
|
||||||
|
|
||||||
|
## 2023-09-07:
|
||||||
|
* Sherif: Reaching out to AI SIG to check on hosting the nvidia drivers that CIQ would like to contribute - Done and waiting to hear from them -
|
||||||
|
|
||||||
|
## 2023-08-24:
|
||||||
|
* Sherif: To push the testing repo file to release package
|
||||||
|
* Sherif: testing / merging the_real_swa scripts
|
||||||
|
|
||||||
|
## 2023-08-10:
|
||||||
|
* Sherif: Looking into the openQA testing - Pending
|
||||||
|
|
||||||
|
## 2023-07-27:
|
||||||
|
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||||
|
* Greg: to reach out to openPBS and cloud charly
|
||||||
|
* Sherif: To update slurm23 to latest - Done -
|
||||||
|
|
||||||
|
## 2023-07-13:
|
||||||
|
* Sherif needs to update the wiki - Done
|
||||||
|
* Sherif to look into MPI stack
|
||||||
|
* Chris will send Sherif a link with intro
|
||||||
|
|
||||||
|
## 2023-06-29:
|
||||||
|
* Sherif release slurm23 sources - Done
|
||||||
|
* Stack and Sherif working on the HPC list
|
||||||
|
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||||
|
|
||||||
|
## 2023-06-15:
|
||||||
|
* Sherif to look into the openHPC slurm spec file - Pending on Sherif
|
||||||
|
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||||
|
|
||||||
|
## 2023-06-01:
|
||||||
|
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||||
|
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||||
|
* Plan the openHPC demo Chris / Sherif - Done
|
||||||
|
* Finalise the slurm package with naming / configuration - Done
|
||||||
|
|
||||||
|
## 2023-05-18:
|
||||||
|
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||||
|
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||||
|
|
||||||
|
## 2023-05-04
|
||||||
|
* Start building slurm - Ongoing, slowed down a bit by the R9.2 and R8.8 releases; however, packages are built, and some minor configuration needs to be fixed -
|
||||||
|
* Start building apptainer - on hold -
|
||||||
|
* Start building singularity - on hold -
|
||||||
|
* Start building warewulf - on hold -
|
||||||
|
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||||
|
|
||||||
|
## 2023-04-20
|
||||||
|
* Reach out to other communities “Greg” - on going -
|
||||||
|
* Reaching out to different sites that use Rocky for HPC “Stack will ping a few of them and others as well -Group effort-”
|
||||||
|
* Reaching out to hardware vendors - nothing done yet -
|
||||||
|
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
125
docs/events/meeting-notes/2023-11-15.md
Normal file
@ -0,0 +1,125 @@
|
|||||||
|
# SIG/HPC meeting 2023-11-15
|
||||||
|
|
||||||
|
## Attendees:
|
||||||
|
* Sherif Nagy
|
||||||
|
* Neil Hanlon
|
||||||
|
* Chris McGuire
|
||||||
|
* Jeremy Siadal - Intel
|
||||||
|
* Filip Hans Polbratt - NSC
|
||||||
|
* Dr. Chris Simmons - MGHPCC
|
||||||
|
|
||||||
|
## Discussions:
|
||||||
|
|
||||||
|
* Jeremy - Intel group quite interested in integrating with Rocky
|
||||||
|
* Likely internal team will provide drivers, but Jeremy will handle the spec file
|
||||||
|
|
||||||
|
* Kernel cnode - repo is created and should be ready
|
||||||
|
* Should be based off Rocky's base kernel, not upstream ML/LTS
|
||||||
|
* MOS - Multi Operating System
|
||||||
|
* LWK - Lightweight Kernel Project
|
||||||
|
* Will need to patch into the scheduler
|
||||||
|
* also will strip out everything that's not needed
|
||||||
|
* what could be put directly in the kernel instead of modules - could also eliminate initrd if needed
|
||||||
|
* Secureboot?
|
||||||
|
* This should be OK to do, but we need to make sure it's still compliant with shim
|
||||||
|
* we are now separating the SB certs by SIG, so we will need to request certs from Security
|
||||||
|
* MOS - some patches in the scheduler are not applying cleanly due to changes in scheduler code
|
||||||
|
* CIQ might be able to help investigate the 6 patches
|
||||||
|
* Intel GPU Driver - Neil didn't catch this
|
||||||
|
* Warewulf
|
||||||
|
* Spec files exist in the github for warewulf
|
||||||
|
* Sherif to investigate
|
||||||
|
* Testing
|
||||||
|
* Currently - Testing team handles this manually
|
||||||
|
* Jeremy would like to have our own test harness
|
||||||
|
* Suggest TMT for maximum cross-pollination - Testing team should be working on that
|
||||||
|
* Neil can work to provide infrastructure for testing
|
||||||
|
* Must be at least 1 hardware system (x86) with at least two GPUs (Intel)
|
||||||
|
* Need x86 AND arm for Nvidia - A30, etc - 4x cards
|
||||||
|
* need infiniband (and SR/IOV support, and cross talk)
|
||||||
|
* Links
|
||||||
|
* https://tmt.readthedocs.io/en/latest/spec.html
|
||||||
|
* https://docs.fedoraproject.org/en-US/ci/tmt/
|
||||||
|
* Cloud RDMA - GCP
|
||||||
|
* Send this to Kernel SIG :)
|
||||||
|
|
||||||
|
## Action items:
|
||||||
|
|
||||||
|
* Sherif to finish/complete work on the wiki
|
||||||
|
* Sherif to add Jeremy and Chris to gitusers and sig_hpc
|
||||||
|
* Neil added ohpcsim and jcsiadal to gitusers. Sherif will add to gitusers
|
||||||
|
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security
|
||||||
|
* Compare spec files for Warewulf vs OpenHPC - Sherif
|
||||||
|
* Investigate what resources are required for testing - Neil
|
||||||
|
|
||||||
|
## Old business:
|
||||||
|
|
||||||
|
## 2023-11-02
|
||||||
|
* Sherif to work a bit on the wiki - Not done
|
||||||
|
* Sherif to add Jeremy and Chris to the git user groups
|
||||||
|
|
||||||
|
## 2023-10-19:
|
||||||
|
* Sherif to create kernel repo for kernel HPC, kernel-hpc-node, called now kernel-cnode - Done -
|
||||||
|
* Jeremy, to get the ball rolling with intel GPU driver
|
||||||
|
* Stack, Fix the slurm rest daemon and integrate it with openQA
|
||||||
|
|
||||||
|
## 2023-10-05:
|
||||||
|
* None for this meeting, however we should be working on old business action items
|
||||||
|
|
||||||
|
## 2023-09-21:
|
||||||
|
* Sherif: Get the SIG for drivers
|
||||||
|
* Sherif: Check the names of nvidia drivers "open , dkms and closed source"
|
||||||
|
* Chris: Benchmark nvidia open vs closed source
|
||||||
|
|
||||||
|
## 2023-09-07:
|
||||||
|
* Sherif: Reaching out to AI SIG to check on hosting the nvidia drivers that CIQ would like to contribute - Done and waiting to hear from them -
|
||||||
|
|
||||||
|
## 2023-08-24:
|
||||||
|
* Sherif: To push the testing repo file to release package
|
||||||
|
* Sherif: testing / merging the_real_swa scripts
|
||||||
|
|
||||||
|
## 2023-08-10:
|
||||||
|
* Sherif: Looking into the openQA testing - Pending
|
||||||
|
|
||||||
|
## 2023-07-27:
|
||||||
|
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||||
|
* Greg: to reach out to openPBS and cloud charly
|
||||||
|
* Sherif: To update slurm23 to latest - Done -
|
||||||
|
|
||||||
|
## 2023-07-13:
|
||||||
|
* Sherif needs to update the wiki - Done
|
||||||
|
* Sherif to look into MPI stack
|
||||||
|
* Chris will send Sherif a link with intro
|
||||||
|
|
||||||
|
## 2023-06-29:
|
||||||
|
* Sherif release slurm23 sources - Done
|
||||||
|
* Stack and Sherif working on the HPC list
|
||||||
|
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||||
|
|
||||||
|
## 2023-06-15:
|
||||||
|
* Sherif to look into the openHPC slurm spec file - Pending on Sherif
|
||||||
|
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||||
|
|
||||||
|
## 2023-06-01:
|
||||||
|
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||||
|
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||||
|
* Plan the openHPC demo Chris / Sherif - Done
|
||||||
|
* Finalise the slurm package with naming / configuration - Done
|
||||||
|
|
||||||
|
## 2023-05-18:
|
||||||
|
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||||
|
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||||
|
|
||||||
|
## 2023-05-04
|
||||||
|
* Start building slurm - Ongoing, slowed down a bit by the R9.2 and R8.8 releases; however, packages are built, and some minor configuration needs to be fixed -
|
||||||
|
* Start building apptainer - on hold -
|
||||||
|
* Start building singularity - on hold -
|
||||||
|
* Start building warewulf - on hold -
|
||||||
|
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||||
|
|
||||||
|
## 2023-04-20
|
||||||
|
* Reach out to other communities “Greg” - on going -
|
||||||
|
* Reaching out to different sites that use Rocky for HPC “Stack will ping a few of them and others as well -Group effort-”
|
||||||
|
* Reaching out to hardware vendors - nothing done yet -
|
||||||
|
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
118
docs/events/meeting-notes/2023-11-30.md
Normal file
@ -0,0 +1,118 @@
|
|||||||
|
# SIG/HPC meeting 2023-11-30
|
||||||
|
|
||||||
|
## Attendees:
|
||||||
|
* Sherif Nagy
|
||||||
|
* Neil Hanlon
|
||||||
|
* Matt Bidwell
|
||||||
|
|
||||||
|
## Discussions:
|
||||||
|
|
||||||
|
* Testing infrastructure
|
||||||
|
* We can get a couple of graphics cards donated by ICHEC, but this needs to wait for them to be decommissioned
|
||||||
|
* Neil will follow up and start creating a plan
|
||||||
|
* Automation for slurm and other packages
|
||||||
|
* Create a script to create a ticket when an update is available for a package
|
||||||
|
* Query release-monitoring.org API for available updates
|
||||||
|
* No updates on cnode kernel yet
|
||||||
|
* Question on EoL/unsupported artifacts -- would we remove RPM/sources which we know have security vulnerabilities?
|
||||||
|
* schedmd does pull the bad versions' source and rpms, e.g.
|
||||||
|
* A: we don't really have any obligation to remove old artifacts which might have very vulnerable code, as we can't really control anything about the user's system aside from providing the latest, fixed artifacts
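The update-check idea from the automation item above could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the SIG's actual script: the endpoint URL and the `items` / `version` response fields are assumptions about the release-monitoring.org (Anitya) API and should be checked against its documentation, and `version_key`, `needs_update`, `latest_upstream` and `ticket_text` are hypothetical helper names. Filing the ticket itself is left out because the notes do not say which tracker would be used.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumed Anitya (release-monitoring.org) v2 endpoint; verify before relying on it.
ANITYA_URL = "https://release-monitoring.org/api/v2/projects/"


def version_key(version):
    """Turn a version such as '23.02.6' into (23, 2, 6) for numeric comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def needs_update(packaged, upstream):
    """Return True when the upstream version is strictly newer than the packaged one."""
    return version_key(upstream) > version_key(packaged)


def latest_upstream(name):
    """Fetch the latest known upstream version of a project (network call).

    The 'items' and 'version' keys are assumptions about the response layout.
    """
    query = urllib.parse.urlencode({"name": name})
    with urllib.request.urlopen(f"{ANITYA_URL}?{query}", timeout=30) as resp:
        data = json.load(resp)
    items = data.get("items", [])
    return items[0].get("version") if items else None


def ticket_text(name, packaged, upstream):
    """Build a ticket title; creating the ticket depends on the tracker in use."""
    return f"Update {name} from {packaged} to {upstream}"
```

A caller would compare the packaged version with `latest_upstream("slurm")` and only build a ticket when `needs_update` returns True. The numeric comparison is deliberately simple and does not handle pre-release tags such as release candidates.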
|
||||||
|
|
||||||
|
## Action items:
|
||||||
|
|
||||||
|
* Sherif to finish/complete work on the wiki
|
||||||
|
* Not done
|
||||||
|
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||||
|
* No updates
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Requested
|
||||||
|
* Compare spec files for Warewulf vs OpenHPC - Sherif
|
||||||
|
* Not done yet
|
||||||
|
* Investigate what resources are required for testing - Neil
|
||||||
|
* Not done yet
|
||||||
|
* Create POC script to create tickets when new slurm is available - Neil
|
||||||
|
|
||||||
|
## Old business:
|
||||||
|
|
||||||
|
## 2023-11-15
|
||||||
|
|
||||||
|
* Sherif to finish/complete work on the wiki
|
||||||
|
* Not done
|
||||||
|
* Sherif to add Jeremy and Chris to gitusers and sig_hpc - Done
|
||||||
|
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||||
|
* No updates
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Requested
|
||||||
|
* Compare spec files for Warewulf vs OpenHPC - Sherif
|
||||||
|
* Not done yet
|
||||||
|
* Investigate what resources are required for testing - Neil
|
||||||
|
* Not done yet
|
||||||
|
|
||||||
|
## 2023-11-02
|
||||||
|
* Sherif to work a bit on the wiki - Not done
|
||||||
|
* Sherif to add Jeremy and Chris to the git user groups
|
||||||
|
|
||||||
|
## 2023-10-19:
|
||||||
|
* Sherif to create kernel repo for kernel HPC, kernel-hpc-node, called now kernel-cnode - Done -
|
||||||
|
* Jeremy to get the ball rolling with the Intel GPU driver
|
||||||
|
* Stack to fix the slurm REST daemon and integrate it with openQA
|
||||||
|
|
||||||
|
## 2023-10-05:
|
||||||
|
* None for this meeting, however we should be working on old business action items
|
||||||
|
|
||||||
|
## 2023-09-21:
|
||||||
|
* Sherif: Get the SIG for drivers
|
||||||
|
* Sherif: Check the names of nvidia drivers "open , dkms and closed source"
|
||||||
|
* Chris: Benchmark nvidia open vs closed source
|
||||||
|
|
||||||
|
## 2023-09-07:
|
||||||
|
* Sherif: Reaching out to AI SIG to check on hosting the nvidia drivers that CIQ would like to contribute - Done and waiting to hear from them -
|
||||||
|
|
||||||
|
## 2023-08-24:
|
||||||
|
* Sherif: To push the testing repo file to release package
|
||||||
|
* Sherif: testing / merging the_real_swa scripts
|
||||||
|
|
||||||
|
## 2023-08-10:
|
||||||
|
* Sherif: Looking into the openQA testing - Pending
|
||||||
|
|
||||||
|
## 2023-07-27:
|
||||||
|
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||||
|
* Greg: to reach out to openPBS and cloud charly
|
||||||
|
* Sherif: To update slurm23 to latest - Done -
|
||||||
|
|
||||||
|
## 2023-07-13:
|
||||||
|
* Sherif needs to update the wiki - Done
|
||||||
|
* Sherif to look into MPI stack
|
||||||
|
* Chris will send Sherif a link with intro
|
||||||
|
|
||||||
|
## 2023-06-29:
|
||||||
|
* Sherif release slurm23 sources - Done
|
||||||
|
* Stack and Sherif working on the HPC list
|
||||||
|
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||||
|
|
||||||
|
## 2023-06-15:
|
||||||
|
* Sherif to look into the openHPC slurm spec file - Pending on Sherif
|
||||||
|
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||||
|
|
||||||
|
## 2023-06-01:
|
||||||
|
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||||
|
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||||
|
* Plan the openHPC demo Chris / Sherif - Done
|
||||||
|
* Finalise the slurm package with naming / configuration - Done
|
||||||
|
|
||||||
|
## 2023-05-18:
|
||||||
|
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||||
|
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||||
|
|
||||||
|
## 2023-05-04
|
||||||
|
* Start building slurm - On going, slowed down a bit by the R9.2 and R8.8 releases; however, packages are built and some minor configurations need to be fixed -
|
||||||
|
* Start building apptainer - on hold -
|
||||||
|
* Start building singularity - on hold -
|
||||||
|
* Start building warewulf - on hold -
|
||||||
|
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||||
|
|
||||||
|
## 2023-04-20
|
||||||
|
* Reach out to other communities “Greg” - on going -
|
||||||
|
* Reaching out to different sites that use Rocky for HPC “Stack will ping a few of them and others as well -Group effort-”
|
||||||
|
* Reaching out to hardware vendors - nothing done yet -
|
||||||
|
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
161
docs/events/meeting-notes/2023-12-14.md
Normal file
@ -0,0 +1,161 @@
|
|||||||
|
# SIG/HPC meeting 2023-12-14
|
||||||
|
|
||||||
|
## Attendees:
|
||||||
|
* Sherif Nagy
|
||||||
|
* Neil Hanlon
|
||||||
|
* Matt Bidwell
|
||||||
|
* Rich Adams
|
||||||
|
* Chris Simmons
|
||||||
|
* Jeremy Siadal
|
||||||
|
|
||||||
|
## Follow ups
|
||||||
|
|
||||||
|
* No movement on wiki yet, maybe over break
|
||||||
|
* Cnode Kernel - no movement
|
||||||
|
* Compare spec files for Warewulf vs OpenHPC - Done!
|
||||||
|
* Thank you Sherif
|
||||||
|
* Building warewulf4 for rocky 8 and rocky 9
|
||||||
|
* Can we keep this name w/ openhpc?
|
||||||
|
* Chris - next release will also rename warewulf to warewulf3 to distinguish
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Requested - No update
|
||||||
|
* Investigate what resources are required for testing - Neil
|
||||||
|
* Not done yet
|
||||||
|
* Create POC script to create tickets when new slurm is available - Neil
|
||||||
|
* No movement
|
||||||
|
* Sherif waiting to hear from Jeremy about Intel GPU drivers
|
||||||
|
* Should have heard from them. Jeremy will follow up with them to see what happened
|
||||||
|
|
||||||
|
## Discussions
|
||||||
|
|
||||||
|
* last meeting of 2023; skip Dec 28th. next meeting will be Jan 11 -- needs announcing
|
||||||
|
* Happy holidays!
|
||||||
|
* slurm naming - slurm22 / slurm23 / slurm24
|
||||||
|
* slurm24 is coming out soon
|
||||||
|
* plan to support whatever schedmd is supporting -- two most recent releases
|
||||||
|
* Testing resources for SIG/HPC
|
||||||
|
* NVidia V100s - OK?
|
||||||
|
* Cannot test MIG (multi-instance GPU) with that device
|
||||||
|
* Sherif can take some of these after they decomm their current HPC, but not sure on timeframe
|
||||||
|
* Need a place to host these, maybe RESF can do something
|
||||||
|
* Neil is tracking this, to have better update in January
|
||||||
|
* chris is working on testing for different schedulers
|
||||||
|
* Warewulf -- OpenHPC needs to make naming more consistent
|
||||||
|
* will remove warewulf4 from builds once it's in Rocky and other openhpc distros
|
||||||
|
* Rocky not worrying about v3, openhpc will continue providing that
|
||||||
|
* Slurm / pmix support
|
||||||
|
* on for rocky 9 branch
|
||||||
|
* there is a pmix5, but ... it's broken. Chris is looking at this over holiday break
|
||||||
|
* rocky only has pmix 3.2, so if we need new features we may need to build and release in the SIG
|
||||||
|
* newer versions (4) are backwards compatible, or, are supposed to be
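The Slurm support plan noted above (follow SchedMD and keep the two most recent releases) can be illustrated with a short sketch. It assumes a release series is identified by the first two version components (year and month); the function names and the sample version strings are illustrative, not taken from the SIG's tooling.

```python
def release_series(version):
    """'23.11.5' -> (23, 11): a series is named by its year and month."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supported_series(versions, keep=2):
    """Return the `keep` newest release series, newest first."""
    series = sorted({release_series(v) for v in versions}, reverse=True)
    return series[:keep]


# Illustrative input only: one entry per version string known to the builder.
known = ["22.05", "23.05", "23.11", "23.11.5"]
print(supported_series(known))
```

Point releases collapse into their series, so adding `23.11.5` next to `23.11` does not push an older series out of the supported set.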
|
||||||
|
|
||||||
|
## Action items
|
||||||
|
|
||||||
|
* Sherif to finish/complete work on the wiki
|
||||||
|
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Requested
|
||||||
|
* Create POC script to create tickets when new slurm is available - Neil
|
||||||
|
* Change warewulf -> warewulf3 in next openhpc release - Chris
|
||||||
|
* Announce meeting cancelations for December - Neil/Sherif
|
||||||
|
* Look into building pmix4 for rocky and building slurm23.11 w/ pmix support - Sherif
|
||||||
|
* Follow up with Intel Driver team - Jeremy
|
||||||
|
|
||||||
|
## Old business
|
||||||
|
|
||||||
|
### 2023-11-30
|
||||||
|
|
||||||
|
* Sherif to finish/complete work on the wiki
|
||||||
|
* Not done
|
||||||
|
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||||
|
* No updates
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Requested
|
||||||
|
* Compare spec files for Warewulf vs OpenHPC - Sherif
|
||||||
|
* Not done yet
|
||||||
|
* Investigate what resources are required for testing - Neil
|
||||||
|
* Not done yet
|
||||||
|
* Create POC script to create tickets when new slurm is available - Neil
|
||||||
|
|
||||||
|
### 2023-11-15
|
||||||
|
|
||||||
|
* Sherif to finish/complete work on the wiki
|
||||||
|
* Not done
|
||||||
|
* Sherif to add Jeremy and Chris to gitusers and sig_hpc - Done
|
||||||
|
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||||
|
* No updates
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Requested
|
||||||
|
* Compare spec files for Warewulf vs OpenHPC - Sherif
|
||||||
|
* Not done yet
|
||||||
|
* Investigate what resources are required for testing - Neil
|
||||||
|
* Not done yet
|
||||||
|
|
||||||
|
### 2023-11-02
|
||||||
|
* Sherif to work a bit on the wiki - Not done
|
||||||
|
* Sherif to add Jeremy and Chris to the git user groups
|
||||||
|
|
||||||
|
### 2023-10-19:
|
||||||
|
* Sherif to create kernel repo for kernel HPC, kernel-hpc-node, called now kernel-cnode - Done -
|
||||||
|
* Jeremy to get the ball rolling with the Intel GPU driver
|
||||||
|
* Stack to fix the slurm REST daemon and integrate it with openQA
|
||||||
|
|
||||||
|
### 2023-10-05:
|
||||||
|
* None for this meeting, however we should be working on old business action items
|
||||||
|
|
||||||
|
### 2023-09-21:
|
||||||
|
* Sherif: Get the SIG for drivers
|
||||||
|
* Sherif: Check the names of nvidia drivers "open , dkms and closed source"
|
||||||
|
* Chris: Benchmark nvidia open vs closed source
|
||||||
|
|
||||||
|
### 2023-09-07:
|
||||||
|
* Sherif: Reaching out to AI SIG to check on hosting the nvidia drivers that CIQ would like to contribute - Done and waiting to hear from them -
|
||||||
|
|
||||||
|
### 2023-08-24:
|
||||||
|
* Sherif: To push the testing repo file to release package
|
||||||
|
* Sherif: testing / merging the_real_swa scripts
|
||||||
|
|
||||||
|
### 2023-08-10:
|
||||||
|
* Sherif: Looking into the openQA testing - Pending
|
||||||
|
|
||||||
|
### 2023-07-27:
|
||||||
|
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||||
|
* Greg: to reach out to openPBS and cloud charly
|
||||||
|
* Sherif: To update slurm23 to latest - Done -
|
||||||
|
|
||||||
|
### 2023-07-13:
|
||||||
|
* Sherif needs to update the wiki - Done
|
||||||
|
* Sherif to look into MPI stack
|
||||||
|
* Chris will send Sherif a link with intro
|
||||||
|
|
||||||
|
### 2023-06-29:
|
||||||
|
* Sherif release slurm23 sources - Done
|
||||||
|
* Stack and Sherif working on the HPC list
|
||||||
|
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||||
|
|
||||||
|
### 2023-06-15:
|
||||||
|
* Sherif to look into the openHPC slurm spec file - Pending on Sherif
|
||||||
|
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||||
|
|
||||||
|
### 2023-06-01:
|
||||||
|
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||||
|
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||||
|
* Plan the openHPC demo Chris / Sherif - Done
|
||||||
|
* Finalise the slurm package with naming / configuration - Done
|
||||||
|
|
||||||
|
### 2023-05-18:
|
||||||
|
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||||
|
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||||
|
|
||||||
|
### 2023-05-04
|
||||||
|
* Start building slurm - On going, slowed down a bit by the R9.2 and R8.8 releases; however, packages are built and some minor configurations need to be fixed -
|
||||||
|
* Start building apptainer - on hold -
|
||||||
|
* Start building singularity - on hold -
|
||||||
|
* Start building warewulf - on hold -
|
||||||
|
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||||
|
|
||||||
|
### 2023-04-20
|
||||||
|
* Reach out to other communities “Greg” - on going -
|
||||||
|
* Reaching out to different sites that use Rocky for HPC “Stack will ping a few of them and others as well -Group effort-”
|
||||||
|
* Reaching out to hardware vendors - nothing done yet -
|
||||||
|
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
170
docs/events/meeting-notes/2024-01-11.md
Normal file
@ -0,0 +1,170 @@
|
|||||||
|
# SIG/HPC meeting 2024-01-11
|
||||||
|
|
||||||
|
## Attendees:
|
||||||
|
* Sherif Nagy
|
||||||
|
* Neil Hanlon
|
||||||
|
* Matt Bidwell
|
||||||
|
* Brian Phan
|
||||||
|
* Forrest Burt
|
||||||
|
|
||||||
|
## Follow ups
|
||||||
|
|
||||||
|
* Sherif to finish/complete work on the wiki
|
||||||
|
* still working
|
||||||
|
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||||
|
* still working
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Requested
|
||||||
|
* Create POC script to create tickets when new slurm is available - Neil
|
||||||
|
* Neil will work on it this month
|
||||||
|
* Change warewulf -> warewulf3 in next openhpc release - Chris
|
||||||
|
* no update
|
||||||
|
* Announce meeting cancelations for December - Neil/Sherif
|
||||||
|
* done
|
||||||
|
* Look into building pmix4 for rocky and building slurm23.11 w/ pmix support - Sherif
|
||||||
|
* Follow up with Intel Driver team - Jeremy
|
||||||
|
* no updates on intel drivers
|
||||||
|
|
||||||
|
## Discussions
|
||||||
|
|
||||||
|
* slurm23 will have two packages for the different versions
|
||||||
|
* slurm22 will probably be EoL by upstream
|
||||||
|
* will create slurm23.11 package to differentiate from slurm, as slurm23.05 is stable
|
||||||
|
* Nvidia drivers - check on status of open vs closed ones, what can we distribute?
|
||||||
|
* Intersection with SIG/AI
|
||||||
|
* Testing of HPC packages - work with Testing team
|
||||||
|
* smoke tests, ensure clusters work, etc
|
||||||
|
* Reach out to Sherif if interested in volunteering to work on this
|
||||||
|
* slurm / pmix5
|
||||||
|
* there is interest in building against latest PMIX, but the latest (version 5) is broken
|
||||||
|
* sounding like this is pretty widespread
|
||||||
|
* no update on this just yet
|
||||||
|
|
||||||
|
## Action items
|
||||||
|
|
||||||
|
* Update wiki
|
||||||
|
* Refine package list of what the SIG publishes, how to use them
|
||||||
|
* Some packages are up for grabs, recruit folks to contribute
|
||||||
|
* Maybe make tickets for these so people can claim them?
|
||||||
|
* Cnode Kernel - no movement
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Investigate what resources are required for testing - Neil
|
||||||
|
* Create POC script to create tickets when new slurm is available - Neil
|
||||||
|
* Sherif waiting to hear from Jeremy about Intel GPU drivers
|
||||||
|
|
||||||
|
## Old business
|
||||||
|
|
||||||
|
### 2023-12-14
|
||||||
|
|
||||||
|
* No movement on wiki yet, maybe over break
|
||||||
|
* Cnode Kernel - no movement
|
||||||
|
* Compare spec files for Warewulf vs OpenHPC - Done!
|
||||||
|
* Thank you Sherif
|
||||||
|
* Building warewulf4 for rocky 8 and rocky 9
|
||||||
|
* Can we keep this name w/ openhpc?
|
||||||
|
* Chris - next release will also rename warewulf to warewulf3 to distinguish
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Requested - No update
|
||||||
|
* Investigate what resources are required for testing - Neil
|
||||||
|
* Not done yet
|
||||||
|
* Create POC script to create tickets when new slurm is available - Neil
|
||||||
|
* No movement
|
||||||
|
* Sherif waiting to hear from Jeremy about Intel GPU drivers
|
||||||
|
* Should have heard from them. Jeremy will follow up with them to see what happened
|
||||||
|
|
||||||
|
### 2023-11-30
|
||||||
|
|
||||||
|
* Sherif to finish/complete work on the wiki
|
||||||
|
* Not done
|
||||||
|
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||||
|
* No updates
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Requested
|
||||||
|
* Compare spec files for Warewulf vs OpenHPC - Sherif
|
||||||
|
* Not done yet
|
||||||
|
* Investigate what resources are required for testing - Neil
|
||||||
|
* Not done yet
|
||||||
|
* Create POC script to create tickets when new slurm is available - Neil
|
||||||
|
|
||||||
|
### 2023-11-15
|
||||||
|
|
||||||
|
* Sherif to finish/complete work on the wiki
|
||||||
|
* Not done
|
||||||
|
* Sherif to add Jeremy and Chris to gitusers and sig_hpc - Done
|
||||||
|
* Decide what is being put into cnode kernel, what is being removed - Jeremy
|
||||||
|
* No updates
|
||||||
|
* Request SB Certs for HPC cnode kernel from Security - Requested
|
||||||
|
* Requested
|
||||||
|
* Compare spec files for Warewulf vs OpenHPC - Sherif
|
||||||
|
* Not done yet
|
||||||
|
* Investigate what resources are required for testing - Neil
|
||||||
|
* Not done yet
|
||||||
|
|
||||||
|
### 2023-11-02
|
||||||
|
* Sherif to work a bit on the wiki - Not done
|
||||||
|
* Sherif to add Jeremy and Chris to the git user groups
|
||||||
|
|
||||||
|
### 2023-10-19:
|
||||||
|
* Sherif to create kernel repo for kernel HPC, kernel-hpc-node, called now kernel-cnode - Done -
|
||||||
|
* Jeremy to get the ball rolling with the Intel GPU driver
|
||||||
|
* Stack to fix the slurm REST daemon and integrate it with openQA
|
||||||
|
|
||||||
|
### 2023-10-05:
|
||||||
|
* None for this meeting, however we should be working on old business action items
|
||||||
|
|
||||||
|
### 2023-09-21:
|
||||||
|
* Sherif: Get the SIG for drivers
|
||||||
|
* Sherif: Check the names of nvidia drivers "open , dkms and closed source"
|
||||||
|
* Chris: Benchmark nvidia open vs closed source
|
||||||
|
|
||||||
|
### 2023-09-07:
|
||||||
|
* Sherif: Reaching out to AI SIG to check on hosting the nvidia drivers that CIQ would like to contribute - Done and waiting to hear from them -
|
||||||
|
|
||||||
|
### 2023-08-24:
|
||||||
|
* Sherif: To push the testing repo file to release package
|
||||||
|
* Sherif: testing / merging the_real_swa scripts
|
||||||
|
|
||||||
|
### 2023-08-10:
|
||||||
|
* Sherif: Looking into the openQA testing - Pending
|
||||||
|
|
||||||
|
### 2023-07-27:
|
||||||
|
* Sherif: Reach out to jose-d about pmix - Done, no feedback yet -
|
||||||
|
* Greg: to reach out to openPBS and cloud charly
|
||||||
|
* Sherif: To update slurm23 to latest - Done -
|
||||||
|
|
||||||
|
### 2023-07-13:
|
||||||
|
* Sherif needs to update the wiki - Done
|
||||||
|
* Sherif to look into MPI stack
|
||||||
|
* Chris will send Sherif a link with intro
|
||||||
|
|
||||||
|
### 2023-06-29:
|
||||||
|
* Sherif release slurm23 sources - Done
|
||||||
|
* Stack and Sherif working on the HPC list
|
||||||
|
* Sherif email Jeremy, the slurm23 source URL - Done
|
||||||
|
|
||||||
|
### 2023-06-15:
|
||||||
|
* Sherif to look into the openHPC slurm spec file - Pending on Sherif
|
||||||
|
* We need to get lists of centres and HPC that are moving to Rocky to make a blog post and PR
|
||||||
|
|
||||||
|
### 2023-06-01:
|
||||||
|
* Get a list of packages from Jeremy to pick up from openHPC - Done
|
||||||
|
* Greg / Sherif talk in Rocky / RESF about generic SIG for common packages such as chaintools
|
||||||
|
* Plan the openHPC demo Chris / Sherif - Done
|
||||||
|
* Finalise the slurm package with naming / configuration - Done
|
||||||
|
|
||||||
|
### 2023-05-18:
|
||||||
|
* Get a demo / technical talk after 4 weeks "Sherif can arrange that with Chris" - Done
|
||||||
|
* Getting a list of packages that openHPC would like to move to distros "Jeremy will be point of contact if we need those in couple of weeks" - Done
|
||||||
|
|
||||||
|
### 2023-05-04
|
||||||
|
* Start building slurm - On going, slowed down a bit by the R9.2 and R8.8 releases; however, packages are built and some minor configurations need to be fixed -
|
||||||
|
* Start building apptainer - on hold -
|
||||||
|
* Start building singularity - on hold -
|
||||||
|
* Start building warewulf - on hold -
|
||||||
|
* Sherif: check about forums - done, we can have our own section if we want, can be discussed over the chat -
|
||||||
|
|
||||||
|
### 2023-04-20
|
||||||
|
* Reach out to other communities “Greg” - on going -
|
||||||
|
* Reaching out to different sites that use Rocky for HPC “Stack will ping a few of them and others as well -Group effort-”
|
||||||
|
* Reaching out to hardware vendors - nothing done yet -
|
||||||
|
* Statistic / public registry for sites / HPC to add themselves if they want - nothing done yet -
|
50
docs/events/meeting-notes/2024-01-25.md
Normal file
@ -0,0 +1,50 @@
|
|||||||
|
# SIG/HPC Meeting 2024-01-25
|
||||||
|
|
||||||
|
## Attendees
|
||||||
|
|
||||||
|
* Sherif Nagy
|
||||||
|
* Neil Hanlon
|
||||||
|
* Forrest Burt
|
||||||
|
* Chris Simmons
|
||||||
|
|
||||||
|
## Follow Ups
|
||||||
|
|
||||||
|
* Packages
|
||||||
|
* Slurm23.11 - In staging, needs testing
|
||||||
|
* This gives us slurm22, slurm23 (which is 23.05), and slurm23.11
|
||||||
|
* Built with UCX on all except s390x (as UCX is not built for s390x)
|
||||||
|
* Warewulf4 - published
|
||||||
|
* Thank you Brian Phan for testing this!
|
||||||
|
* Lustre - Sherif investigating
|
||||||
|
* PMIX / slurm23
|
||||||
|
* Bug reported upstream a few months back, fix available, seems to be working in OpenHPC
|
||||||
|
* [ ] Chris to track down slurm/pmix on Rocky 8 and see if it's working or not for next meeting
|
||||||
|
* cNode Kernel
|
||||||
|
* No updates yet
|
||||||
|
* SecureBoot Certs - Requested
|
||||||
|
* [ ] Notification for package updates upstream
|
||||||
|
* Wiki Updates - Neil and Sherif will work on this at FOSDEM/CentOS Connect
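The package names listed above (slurm22, slurm23, slurm23.11) suggest a simple naming rule, sketched here. The rule itself is an inference from those three names rather than a documented SIG policy: the first release series of a year takes the short name, and any later series in the same year is spelled out with its month. The function name `package_names` is hypothetical.

```python
def package_names(series_list):
    """Map (year, month) release series to package names.

    Assumed rule: the first series seen for a year takes the short name
    slurm<YY>; a later series in that year is spelled out as slurm<YY>.<MM>.
    """
    names = {}
    seen_years = set()
    for year, month in sorted(series_list):
        if year in seen_years:
            names[(year, month)] = f"slurm{year}.{month:02d}"
        else:
            names[(year, month)] = f"slurm{year}"
            seen_years.add(year)
    return names


# The three series mentioned in the notes, as (year, month) pairs.
print(package_names([(22, 5), (23, 5), (23, 11)]))
```

Keeping the short name on the earlier series means existing installations of `slurm23` are not silently moved to a newer series by a package update.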
|
||||||
|
|
||||||
|
## Discussions
|
||||||
|
|
||||||
|
* Next meeting (8 Feb)
|
||||||
|
* Neil and Sherif traveling back from conferences
|
||||||
|
* FOSDEM and CentOS Connect
|
||||||
|
* Forrest and Brian Phan giving presentations on Apptainer/Warewulf
|
||||||
|
* adrianreber from OpenHPC team will be at FOSDEM
|
||||||
|
* Neil wants to nag him about a Mirrormanager bug
|
||||||
|
* Package list - Update
|
||||||
|
* [ ] Neil to create tickets for documentation on packages we've added, update list of what is yet to come
|
||||||
|
* Testing
|
||||||
|
* Brainstorm test scenarios we want to create for slurm, warewulf
|
||||||
|
* Stack is awol due to 👶, so we have some time to decide what we want and bring a clear ask to Testing
|
||||||
|
|
||||||
|
### Open Floor
|
||||||
|
|
||||||
|
* N/A
|
||||||
|
|
||||||
|
### Action Items
|
||||||
|
|
||||||
|
* [ ] Chris to track down slurm/pmix on Rocky 8 and see if it's working or not for next meeting
|
||||||
|
* [ ] Neil to create tickets for documentation on packages we've added, update list of what is yet to come
|
||||||
|
* [ ] Notification for package updates upstream
|
35
docs/events/meeting-notes/2024-02-08.md
Normal file
@ -0,0 +1,35 @@
|
|||||||
|
# SIG/HPC Meeting 2024-02-08
|
||||||
|
|
||||||
|
## Attendees
|
||||||
|
|
||||||
|
* Sherif Nagy
|
||||||
|
* Neil Hanlon
|
||||||
|
* Chris Simmons
|
||||||
|
|
||||||
|
(Neil forgot to take attendance)
|
||||||
|
|
||||||
|
## Follow Ups
|
||||||
|
|
||||||
|
* Slurm 23.11.5 in production
|
||||||
|
* Adjust conflicts and provides for older packages
|
||||||
|
* Meeting with Intel on Monday re: GPU drivers; need insight on testing
|
||||||
|
* Monday @4PM Eastern (?) - Chris will invite Neil
|
||||||
|
* Secureboot support?
|
||||||
|
* Driver is fully open source
|
||||||
|
* no update from chris on PMIX
|
||||||
|
* no movement on Lustre filesystem yet
|
||||||
|
* Neil to put in [tickets](https://git.resf.org/sig_hpc/meta/issues) actually for [[Meeting/2024-01-25/Rocky/SIG/HPC|Last Meeting]]
|
||||||
|
* Brian Phan and Forrest Burt gave talks on Warewulf/Apptainer
|
||||||
|
* Sherif and Brian met up at FOSDEM and discussed testing for WW, and what we can/should test
|
||||||
|
|
||||||
|
## Discussions
|
||||||
|
|
||||||
|
* (Neil had to leave early)
|
||||||
|
|
||||||
|
### Open Floor
|
||||||
|
|
||||||
|
* N/A
|
||||||
|
|
||||||
|
### Action Items
|
||||||
|
|
||||||
|
* N/A
|
66
docs/events/meeting-notes/2024-02-22.md
Normal file
@ -0,0 +1,66 @@
|
|||||||
|
# SIG/HPC Meeting 2024-02-22
|
||||||
|
|
||||||
|
## Attendees
|
||||||
|
|
||||||
|
* Sherif Nagy
|
||||||
|
* Neil Hanlon
|
||||||
|
* Alan Marshall
|
||||||
|
* Brian Peters
|
||||||
|
* Chris Simmons
|
||||||
|
* Brian Phan
|
||||||
|
* Forrest Burt
|
||||||
|
|
||||||
|
## Follow Ups
|
||||||
|
|
||||||
|
* NVIDIA GPU driver Testing - Chris
|
||||||
|
* https://github.com/mghpcsim/gpu-testing/tree/master
|
||||||
|
* documented process for configuring instance, installing drivers (open source or proprietary), setting up container runtimes, nvidia container toolkit
|
||||||
|
* Benchmarks using forked toolkit from Lambda labs with Rocky customizations
|
||||||
|
* initial control benchmark (pytorch):
|
||||||
|
* closed drivers slightly (4s) faster
|
||||||
|
* Plan: run benchmarks on progressively newer instances and collect results
|
||||||
|
* Publish results on Wiki
|
||||||
|
* Intel driver - Met with them, went well
|
||||||
|
* Can build this driver into signed kernel modules, add to testing Chris is doing
|
||||||
|
* This will live in SIG/Kernel because it's a kernel module
|
||||||
|
* driver toolkit pieces probably will end up in HPC SIG
|
||||||
|
* Kernel Cnode (for MoS)
|
||||||
|
* Sherif synced with Jeremy
|
||||||
|
* Lots of progress has been made, almost all patches backported
|
||||||
|
* there are a couple of problematic patches -- they're based on SLES kernels, which differ just enough to be problematic
|
||||||
|
* Pablo will help once the problem is set
|
||||||
|
|
||||||
|
## Discussions
|
||||||
|
|
||||||
|
* Testing - Warewulf, others
|
||||||
|
* Sherif and Brian Phan synced on warewulf testing
|
||||||
|
* Not *just* installability, upgrade path, etc
|
||||||
|
* What can we use? Multiple things, probably
|
||||||
|
* OpenQA? TMT? Zuul? Whatever OpenHPC uses?
|
||||||
|
* Testing team would also love to get more people involved and participating in building tests
|
||||||
|
* Example tests:
|
||||||
|
* Provision cluster
|
||||||
|
* nodes communicate
|
||||||
|
* etc
|
||||||
|
* Want: have full end to end testing of all components
|
||||||
|
* What tests do we want?
|
||||||
|
* Functional
|
||||||
|
* Create cluster
|
||||||
|
* Create user
|
||||||
|
* Submit job as user
|
||||||
|
* Future:
|
||||||
|
* Slurm accounting/dbd, others
|
||||||
|
* Package tracking - PoI tracker
|
||||||
|
* Neil is looking how we can integrate this
|
||||||
|
* Wiki Updates - Neil and Sherif will work on this at FOSDEM/CentOS Connect
|
||||||
|
* This didn't really happen specifically, but discussions about ensuring Wikis are up to date did happen
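The functional tests listed above (cluster responds, create user, submit job as user) could be written down as an ordered plan, sketched here. This is only an outline under assumptions: the user name `hpctest`, the step list, and the helpers `run_plan` and `dry_run` are hypothetical, and the plan is executed through an injectable runner so it can be exercised without a cluster.

```python
import shlex

# Hypothetical smoke-test plan; the user name and the job are placeholders.
TEST_USER = "hpctest"

STEPS = [
    ("nodes respond to the scheduler", ["sinfo", "--noheader"]),
    ("create user", ["useradd", "--create-home", TEST_USER]),
    (
        "submit job as user",
        ["runuser", "-u", TEST_USER, "--", "sbatch", "--wait", "--wrap=hostname"],
    ),
]


def run_plan(steps, run):
    """Run each step through `run(argv) -> exit code`; stop at the first failure."""
    results = []
    for name, argv in steps:
        code = run(argv)
        results.append((name, shlex.join(argv), code))
        if code != 0:
            break
    return results


def dry_run(argv):
    """Stand-in runner that executes nothing and reports success."""
    return 0


for name, command, code in run_plan(STEPS, dry_run):
    print(f"{name}: {command} -> {code}")
```

On a real test node the runner could be replaced with `lambda argv: subprocess.run(argv).returncode`; stopping at the first failure keeps a failed user creation from being reported as a scheduler problem.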
|
||||||
|
|
||||||
|
### Open Floor
|
||||||
|
|
||||||
|
* N/A
|
||||||
|
|
||||||
|
### Action Items
|
||||||
|
|
||||||
|
* Sherif to build and release intel driver
|
||||||
|
* Sherif and Brian to work on defining tests that we want to run
|
||||||
|
* Neil to work on package update notifications
|
66
docs/events/meeting-notes/2024-03-07.md
Normal file
@ -0,0 +1,66 @@
|
|||||||
|
# SIG/HPC Meeting 2024-03-07
|
||||||
|
|
||||||
|
## Attendees
|
||||||
|
|
||||||
|
* Forrest Burt
|
||||||
|
* Brian Phan
|
||||||
|
* Sherif Nagy
|
||||||
|
* Enrico Billi
|
||||||
|
* Neil Hanlon
|
||||||
|
* Jeremy Siadal
|
||||||
|
* Chris Stackpole
|
||||||
|
|
||||||
|
## Old Business
|
||||||
|
|
||||||
|
* Intel Driver -
|
||||||
|
* Sherif is working on this, has a prototype, needs DKMS
|
||||||
|
* Used `make spec` script in the branch to create spec, and import from there
|
||||||
|
* We think that upstream should adopt a different format/packaging methodology
|
||||||
|
* Perhaps [packit](https://packit.dev) could be helpful?
|
||||||
|
* What branch/version to use?
|
||||||
|
* rhel-specific branches say not to use them; use the 'backports' branches instead
|
||||||
|
* sherif appears to be in the right place
|
||||||
|
* Next steps:
|
||||||
|
* Neil to bring dkms from epel into projects
|
||||||
|
* Sherif to upload to public location for review and testing
|
||||||
|
* Jeremy to work on testing with some latest hardware
|
||||||
|
* AI SIG
|
||||||
|
* where will userspace tools live? HPC? AI? Both?
|
||||||
|
* Neil: it should be reasonable for us to have the ability to easily release a package in multiple SIGs
|
||||||
|
* NVidia GPU driver Testing -
|
||||||
|
* Did not get time to review [Chris's work](https://github.com/mghpcsim/gpu-testing/tree/master) - will try to review this cycle
|
||||||
|
* Kernel Cnode / MoS
|
||||||
|
* re-actioning - Jeremy to work on once he has some time
|
||||||
|
|
||||||
|
## New Business
|
||||||
|
|
||||||
|
* Testing Warewulf - Brian
|
||||||
|
* Current plan: put the tests upstream into Warewulf repo, Testing team can pull from / engage with upstream
|
||||||
|
* What precisely are we going to test?
|
||||||
|
* Functional/E2E tests -- provision a small cluster, etc (see last week's [discussions](https://sig-hpc.rocky.page/events/meeting-notes/2024-02-22/#discussions))
|
||||||
|
* Future work can include e.g. slurm
|
||||||
|
* Chris to check on status of slurm
|
||||||
|
* Packages to bring in
|
||||||
|
* [List](https://sig-hpc.rocky.page/packages/) on the wiki; needs updating (along with the rest of the wiki)
|
||||||
|
* if anyone wants to bring something in, has questions, etc. Please ask/get in touch!
|
||||||
|
* Neil to update the wiki
|
||||||
|
|
||||||
|
## Open Floor
|
||||||
|
|
||||||
|
* Vulnerability in [lustre](http://lists.lustre.org/pipermail/lustre-announce-lustre.org/2024/000270.html) - related to user namespaces
|
||||||
|
* Sherif was working on lustre-server, but it's a beast
|
||||||
|
* DDN already builds RPMS, but... is it worth it to rebuild vs just use upstream?
|
||||||
|
* Sherif: thinks it makes sense to rebuild against our specific user/kernel space
|
||||||
|
* there are lustre-server for 8, but not 9, it appears.. why?
|
||||||
|
* documentation supports this but again.. why?
|
||||||
|
* Sherif to look into why lustre-server exists for 8 but not 9
|
||||||
|
* Next meeting in two weeks on Thursday, March 21
|
||||||
|
|
||||||
|
## Action Items
|
||||||
|
|
||||||
|
* [ ] Chris to check on status of slurm
|
||||||
|
* [ ] Neil to update the wiki
|
||||||
|
* [ ] Sherif to look into why lustre-server exists for 8 but not 9
|
||||||
|
* [ ] Neil to bring dkms from epel into projects
|
||||||
|
* [ ] Sherif to upload to public location for review and testing
|
||||||
|
* [ ] Jeremy to work on testing with some latest hardware
|
34
docs/events/meeting-notes/2024-03-21.md
Normal file
@ -0,0 +1,34 @@
# SIG/HPC Meeting 2024-03-21

## Attendees

* Neil Hanlon
* Sherif Nagy
* Brian Phan
* Forrest Burt

## Follow Ups

* Intel GPU driver imported and built in SIG/Kernel 'kernel-drivers' repo.
    * https://dl.rockylinux.org/stg/sig/9/kernel/x86_64/kernel-common/Packages/i/intel-i915-dkms-1.23.6.42.230425.56-1.x86_64.rpm
* Warewulf 4.5 released upstream
    * Sherif looking into bringing update to SIG
    * Running into issue on Rocky 9
* Testing - CIQ will be upstreaming a test suite
* Nvidia driver GPU benchmarking - re-action reviewing the work
    * Did not get time to review [Chris's work](https://github.com/mghpcsim/gpu-testing/tree/master) - will try to review this cycle
* Lustre server
    * re-actioning; Sherif has not looked into it yet
* Wiki Content - still need to populate this. Can people from the SIG help?
* Packages - have some 'easy' ones

## Open Floor

* n/a

## Action Items

* [ ] Neil to bring in dkms to kernel-drivers to SIG/Kernel
* [ ] See if Alan would be willing to work on this
* [ ] Neil to look into resourcing some people to work on this
* [ ] Neil to make tickets for all packages we are looking to bring in, rank priority and ease
@ -1,16 +1,12 @@
 # SIG/HPC Wiki

-## Links
+This SIG aims to provide various HPC packages to support building HPC clusters on Rocky Linux systems

 ## Responsibilities

+Developing and maintaining various HPC-related packages. This may include porting, optimizing, and contributing to upstream sources to support HPC initiatives
+
 ## Meetings / Communications

-## Members
+We meet on a biweekly basis on [Google Meet for now](https://meet.google.com/hsy-qnoe-dxx); you may check the [RESF community calendar here](https://calendar.google.com/calendar/u/0/embed?src=c_2e1oqh6t0i6sqhja5nu9lq8lgo@group.calendar.google.com), and see the [Contact US](contact.md) page to reach us

-## Project layout
-
-    mkdocs.yml    # The configuration file.
-    docs/
-        index.md  # The documentation homepage.
-        ...       # Other markdown pages, images and other files.
13
docs/installation.md
Normal file
@ -0,0 +1,13 @@
# Repo Installation

"""This page is still under construction"""

For Rocky 8 and 9, `dnf install rocky-release-hpc` will install the required repos

# Slurm installation:

For Rocky 9: `dnf install slurm22` or `dnf install slurm23`

For Rocky 8: you need to enable the PowerTools repo first, then `dnf install slurm22` or `dnf install slurm23`

Slurm is split into multiple packages, so run `dnf search slurm` to find the packages you need
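The installation steps above can be sketched as a small dry-run helper that only prints the commands for a given Rocky major version, so they can be reviewed before being run as root. This is a minimal sketch, not part of the SIG's tooling: the function name `slurm_install_plan` is hypothetical, and it assumes the PowerTools repo id on Rocky 8 is `powertools` and that `dnf config-manager` (from `dnf-plugins-core`) is available.

```shell
#!/bin/sh
# Dry-run sketch: prints the dnf commands for a given Rocky major version
# instead of running them. `slurm_install_plan` is a hypothetical helper name.
slurm_install_plan() {
    # $1: Rocky major version (8 or 9); $2: Slurm package (defaults to slurm23)
    echo "dnf install rocky-release-hpc"
    if [ "$1" = "8" ]; then
        # Assumption: the PowerTools repo id on Rocky 8 is `powertools`,
        # and `dnf config-manager` is provided by dnf-plugins-core.
        echo "dnf config-manager --set-enabled powertools"
    fi
    echo "dnf install ${2:-slurm23}"
}
```

For example, `slurm_install_plan 8 slurm22` prints three commands (repo package, PowerTools enablement, Slurm install), while `slurm_install_plan 9` prints only the two `dnf install` lines.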
31
docs/packages.md
Normal file
@ -0,0 +1,31 @@
# SIG/HPC Packages

These are some of the packages that we are considering maintaining and supporting within this SIG

* Lustre server and client
* Slurm
* Apptainer
* EasyBuild
* Spack
* openmpi built with Slurm support
* Lmod
* conda
* sstack
* fail2ban (in EPEL; not sure if it fits in this SIG)
* glusterfs-server (better suited under SIG/Storage)
* glusterfs-selinux (better suited under SIG/Storage)
* Cython
* genders
* pdsh
* gcc (latest releases, parallel install)
* autotools
* cmake
* hwloc (this really needs to support parallel versions)
* libtool
* valgrind (maybe)
* charliecloud
* Warewulf (if all config options are runtime instead of pre-compiled)
* magpie
* openpbs
* pmix
* NIS: ypserv, ypbind, yptools and a corresponding nss_nis (took the source RPMs from Fedora and recompiled them for R9)