Operational news of the MetaCentrum & CERIT-SC infrastructures
Let us inform you about the following operational news of the MetaCentrum & CERIT-SC infrastructures:
- New cluster glados.cerit-sc.cz with GPU cards NVIDIA 1080Ti available (CERIT-SC)
- Running jobs on OS Debian9 (CERIT-SC)
- Change in property settings (arien-pro and wagap-pro)
- Automatic scratch cleaning on the frontends
- New HW for ELIXIR-CZ
1) New cluster glados.cerit-sc.cz with GPU cards available (CERIT-SC)
MetaCentrum has been extended with a new SMP cluster glados[1-17].cerit-sc.cz (location Brno, owner CERIT-SC) with 680 CPU cores in 17 nodes; each node has the following specification:
- CPU: 2x Intel Xeon Gold 6138 (2x 20 Core) 2.0 GHz
- RAM: 384 GB
- Disk: 2x 2TB SSD
- SPECfp2006 performance of each node: 1370 (34.25 per core)
- 2x NVIDIA 1080 Ti GPU cards (glados[10-17] only)
- SSD scratch only; request it in qsub (scratch_ssd)!
- Currently, only jobs of up to 24 hours are supported
- OS: Debian9
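The total core count and the per-core SPECfp2006 value in the list above follow directly from the node specification (2 CPUs with 20 cores each); a quick check of the arithmetic:

```latex
\begin{align*}
\text{cores per node}       &= 2 \times 20 = 40 \\
\text{cores in the cluster} &= 17 \times 40 = 680 \\
\text{SPECfp2006 per core}  &= 1370 / 40 = 34.25
\end{align*}
```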
The cluster can be accessed via conventional job submission through the PBS Pro batch system (@wagap-pro server) in the default queue. Initially, only short jobs are supported.
- To submit a GPU job in CERIT-SC (server @wagap-pro), use the parameter gpu=1:
$ qsub ... -l select=1:ncpus=1:gpu=1 ...
- In all cases, do not forget to request SSD scratch (scratch_ssd) and os=debian9 in your qsub:
$ qsub -l walltime=1:0:0 -l select=1:ncpus=1:mem=400mb:scratch_ssd=400mb:os=debian9 ...
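The requirements from the two qsub lines above can also be written as #PBS directives in a job script. This is a sketch that assumes they can be combined into a single select chunk (gpu=1 together with scratch_ssd and os=debian9) and reuses the walltime from the example; the script name gpu_job.sh is hypothetical. SCRATCHDIR is the variable in which MetaCentrum passes the job's scratch directory; the /tmp fallback only lets the script be dry-run outside PBS.

```shell
#!/bin/bash
# Hypothetical job script gpu_job.sh for glados (keep walltime <= 24 hours).
#PBS -l select=1:ncpus=1:gpu=1:mem=400mb:scratch_ssd=400mb:os=debian9
#PBS -l walltime=1:00:00

# SCRATCHDIR points to the job's scratch directory on MetaCentrum;
# fall back to /tmp so the script can also be tried outside PBS.
WORKDIR="${SCRATCHDIR:-/tmp}"
cd "$WORKDIR" || exit 1
echo "Job running on $(uname -n) in $WORKDIR"

# List the GPU(s) visible to the job, if the NVIDIA tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
fi
```

Submit it with `$ qsub gpu_job.sh` (e.g. from zuphux); options given on the qsub command line take precedence over the #PBS directives in the script.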
2) Running jobs on OS Debian9 (CERIT-SC)
CERIT-SC has increased the number of clusters running the new Debian9 OS (all new machines and some older ones). Next week, we are going to remove the current Debian8 setting from the default queue at @wagap-pro. After that, if you do not explicitly specify the required OS in qsub, the scheduling system may select machines with any of the operating systems available in the queue.
- To submit a job to a Debian9 machine, please use "os=debian9" in the job specification:
zuphux$ qsub -l select=1:ncpus=2:mem=1gb:scratch_local=1gb:os=debian9 …
- Similarly, for Debian8 use "os=debian8":
zuphux$ qsub -l select=1:ncpus=2:mem=1gb:scratch_local=1gb:os=debian8 …
- Please note that the OS of special machines available in special queues may differ; e.g., urga, ungu (uv@wagap-pro) and phi (phi@wagap-pro) run CentOS 7.
If you experience any problem with library or application compatibility, please report it to meta@cesnet.cz.
Tip: Loading the debian8-compat module may solve most of the compatibility issues.
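Once the Debian8 default is removed, a job without an explicit os=... may land on any OS in the queue, so it can help to log which OS a job actually got. The sketch below assumes /etc/os-release is present on the nodes (it is the standard os-release file on Debian 8/9 and CentOS 7); it prints the OS and loads debian8-compat only on Debian9 nodes where the Environment Modules `module` command is defined.

```shell
#!/bin/bash
# Print the OS the job actually landed on (IDs such as "debian"/"centos").
if [ -r /etc/os-release ]; then
    . /etc/os-release
fi
echo "Job running on ${ID:-unknown} ${VERSION_ID:-unknown}"

# On Debian9 nodes, try the compatibility module from the tip above.
# The "type module" check lets the script also run where modules are absent.
if [ "${ID:-}" = "debian" ] && [ "${VERSION_ID:-}" = "9" ] \
        && type module >/dev/null 2>&1; then
    module add debian8-compat
fi
```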
Lists of nodes running Debian9, Debian8 and CentOS 7 are available in the PBSMon application:
https://metavo.metacentrum.cz/pbsmon2/props?property=os%3Ddebian9
https://metavo.metacentrum.cz/pbsmon2/props?property=os%3Ddebian8
https://metavo.metacentrum.cz/pbsmon2/props?property=os%3Dcentos7
3) Change in property settings (arien-pro and wagap-pro)
In April, we are going to unify machine properties in both the @arien-pro and @wagap-pro environments.
Operating system
We are starting with consistent labeling of the machine operating system using the property os=<debian8, debian9, centos7>.
The original properties centos7, debian8 and debian9 (a residue of PBS Torque) are gradually being removed from the worker nodes. To select the operating system in qsub, follow the instructions in section 2 above.
4) Automatic scratch cleaning on the frontends
Due to frequent problems with full scratch on the frontends over the last few months, we have implemented automatic cleaning of data older than 60 days on the frontends as well. Do not leave important data in the scratch directory on the frontends; move it to your /home directory.
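To see which of your files in a frontend scratch directory would be affected, a find command along these lines can help. It is a sketch under two assumptions: the function name list_old_scratch_files is hypothetical, and the age test uses the modification time (-mtime +60), while the exact criterion of the automatic cleaning is not specified in this announcement.

```shell
#!/bin/bash
# Hypothetical helper: list files in DIR older than 60 days by mtime.
list_old_scratch_files() {
    local dir="${1:?usage: list_old_scratch_files DIR}"
    # -mtime +60: age in whole 24-hour periods (fractions dropped) > 60
    find "$dir" -type f -mtime +60 -print
}
```

For example, run `list_old_scratch_files /path/to/your/frontend/scratch` and move anything you still need to /home before it is removed.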
5) New HW for ELIXIR-CZ
MetaCentrum has also been extended with HD and SMP clusters in Prague and Brno (owner ELIXIR-CZ). The clusters are dedicated to members of the ELIXIR-CZ national node:
• elmo1.hw.elixir-czech.cz - 224 CPUs in total, SMP, 4 nodes with 56 CPUs, 768 GB RAM (Praha UOCHB)
• elmo2.hw.elixir-czech.cz - 96 CPUs in total, HD, 4 nodes with 24 CPUs, 384 GB RAM (Praha UOCHB)
• elmo3.hw.elixir-czech.cz - 336 CPUs in total, SMP, 6 nodes with 56 CPUs, 768 GB RAM (Brno)
• elmo4.hw.elixir-czech.cz - 96 CPUs in total, HD, 4 nodes with 24 CPUs, 384 GB RAM (Brno)
The clusters can be accessed via conventional job submission through the PBS Pro batch system (@arien-pro server) in the priority queue elixircz. Membership in the ELIXIR-CZ group is available to persons from the academic environment of the Czech Republic and/or their research partners from abroad whose research objectives are directly related to ELIXIR-CZ activities. More information about ELIXIR-CZ services can be found on the wiki: https://wiki.metacentrum.cz/wiki/Elixir
Other MetaCentrum users can access the new clusters via conventional job submission through the PBS Pro batch system (@arien-pro server) in the default queue (with a maximum walltime limit, i.e. short jobs only).
Queue description and setting: https://metavo.metacentrum.cz/pbsmon2/queue/elixircz
Qsub example:
$ qsub -q elixircz@arien-pro.ics.muni.cz -l select=1:ncpus=2:mem=2gb:scratch_local=1gb -l walltime=24:00:00 script.sh
Quickstart: https://wiki.metacentrum.cz/w/images/f/f8/Quickstart-pbspro-ELIXIR.pdf
The new clusters run the Debian9 OS. If you experience any problem with library or application compatibility, please report it to meta@cesnet.cz.
Tip: Loading the debian8-compat module may solve most of the compatibility issues.
Ivana Křenková, Fri Apr 06 15:35:00 CEST 2018

