Let us inform you about the following operational news of the MetaCentrum & CERIT-SC infrastructures:
- New GPU cluster for artificial intelligence and machine learning
- Integration of clusters and disk array of the Institute of Botany AS CR in Průhonice
- Moving the zenon cluster (hde.cerit-sc.cz) to OpenStack, upgrade to Debian10
1) Testing the new GPU cluster for artificial intelligence - adan.grid.cesnet.cz (1952 CPU) - with 192GB RAM, 2x 16-core Xeon and 2x nVidia Tesla T4 16GB
MetaCentrum was extended with a new GPU cluster adan.grid.cesnet.cz (location Biocev, owner CESNET), 61 nodes with the following specification (each):
- 32x Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
- RAM: 192 GB
- Disk: 4x 240GB SSD
- GPU: 2x nVidia Tesla T4 16GB with AI support
It is currently the most powerful cluster supporting artificial intelligence in the Czech Republic. It is available in TEST mode via the 'adan' queue (reserved for AI testers), the 'gpu' queue and short standard queues. If you are interested in becoming an AI tester (access to the 'adan' queue), contact us at meta (at) cesnet.cz.
Tip: If you encounter a GPU card compatibility issue, you can limit the selection of machines with a certain generation of cards using the gpu_cap=[cuda20,cuda35,cuda61,cuda70,cuda75] parameter.
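The tip above can be sketched as a small helper that assembles a job submission restricted to one card generation. This is only an illustration under stated assumptions: the placement of gpu_cap inside a PBS "select" chunk, the resource names ncpus and ngpus, and the script name job.sh are not taken from this announcement. The 'gpu' queue and the list of gpu_cap values are. The Tesla T4 has CUDA compute capability 7.5, so cuda75 is the value one would expect to match the adan nodes.

```python
# GPU capability labels listed in the tip above.
KNOWN_GPU_CAPS = ("cuda20", "cuda35", "cuda61", "cuda70", "cuda75")


def build_qsub_command(script, gpu_cap, queue="gpu", ncpus=1, ngpus=1):
    """Return a qsub argument list limited to nodes advertising gpu_cap.

    Assumption: gpu_cap is passed as a chunk-level resource in the
    PBS select statement, next to ncpus and ngpus.
    """
    if gpu_cap not in KNOWN_GPU_CAPS:
        raise ValueError(f"unknown gpu_cap: {gpu_cap!r}")
    select = f"select=1:ncpus={ncpus}:ngpus={ngpus}:gpu_cap={gpu_cap}"
    return ["qsub", "-q", queue, "-l", select, script]


if __name__ == "__main__":
    # job.sh is a hypothetical submission script name.
    print(" ".join(build_qsub_command("job.sh", "cuda75")))
```

Rejecting unknown values before submission catches typos early, instead of leaving a job queued with a request that no node can satisfy.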
2) Integration of clusters and disk array of the Institute of Botany AS CR in Průhonice
- MetaCentrum was extended with a new cluster carex.ibot.cas.cz (location Průhonice, owner Institute of Botany AS CR), 8 nodes with the following specification (each):
  - 8x AMD EPYC 7261 8-Core Processor
  - RAM: 512 GB
  - Disk: 2x 960GB NVMe
- Cluster draba.ibot.cas.cz (location Průhonice, owner Institute of Botany AS CR), 240 CPU cores with the following specification:
  - 80x Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
  - RAM: 1536 GiB
  - Disk: 2x 960GB NVMe
  - The machine is designed for jobs with high memory consumption (up to 1.5 TB).
In addition, the front end tilia.ibot.cas.cz (with the alias tilia.metacentrum.cz) and the /storage/pruhonice1-ibot/home disk array (dedicated to the ibot group) were put into operation.
Both clusters are available through the 'ibot' queue (reserved for cluster owners). After testing, they are likely to become accessible through short standard queues as well.
The usage rules are available on the cluster owner's page: https://sorbus.ibot.cas.cz/
3) Moving the zenon cluster (hde.cerit-sc.cz) to OpenStack, upgrade to Debian10
The cluster zenon.cerit-sc.cz (1888 CPUs, 60 nodes) is currently being moved to OpenStack and will be accessible via the wagap-pro PBS server in a few days. At the same time, the operating system is being upgraded to Debian10.
The cluster will be available in the same way as before (PBS server wagap-pro, common queues).
Compatibility issues with some applications on Debian10 are being resolved on an ongoing basis by recompiling the affected software modules. If you encounter a problem with your application, try adding the debian9-compat module at the beginning of your submission script. If you experience any problem with library or application compatibility, please report it to firstname.lastname@example.org.
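As an illustration of the advice above, the following sketch inserts a module-loading line into an existing submission script, right after the shebang and the leading #PBS directives, so that the directives stay at the top. It assumes the module is loaded with the "module add" command and that the script starts with an unbroken run of shebang and #PBS lines. Both are assumptions, and the sample script content is hypothetical.

```python
def with_compat_module(script_text, module="debian9-compat"):
    """Insert `module add <module>` ahead of the first command in a job script.

    Assumption: the script begins with a shebang and/or #PBS directive
    lines with no blank lines in between; the module line goes right
    after them.
    """
    lines = script_text.splitlines()
    insert_at = 0
    # Skip the shebang and leading #PBS directives.
    while insert_at < len(lines) and lines[insert_at].startswith(("#!", "#PBS")):
        insert_at += 1
    lines.insert(insert_at, f"module add {module}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical submission script used only for demonstration.
    original = "#!/bin/bash\n#PBS -l select=1:ncpus=1\n./my_app\n"
    print(with_compat_module(original), end="")
```

For a single script it is simpler to add the line by hand. A helper like this is useful mainly when many existing scripts need the same change.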
Lists of nodes running Debian9/Debian10/CentOS7 are available in the PBSMon application: