Operational news of the MetaCentrum & CERIT-SC infrastructures

We would like to inform you about the following operational news of the MetaCentrum & CERIT-SC infrastructures:

  1. New GPU server grimbold with 2x nVidia Tesla P100 and glados1 extension with 1x nVidia TITAN V
  2. OS Debian9 upgrade progress
  3. New Amber modules available


1) New GPU server grimbold with 2x nVidia Tesla P100 and glados1 extension with 1x nVidia TITAN V

  • MetaCentrum was extended with a new GPU server, grimbold.ics.muni.cz (location Brno, owner CESNET), with 32 CPU cores and the following specification:
    • CPU: 2x 16-core Intel Xeon Gold 6130 (2.10GHz)
    • RAM: 196 GB
    • Disk: 2x 4TB 7k2 SATA III
    • GPU: 2x nVidia Tesla P100 12GB
    • OS: Debian9

The server can be accessed through conventional job submission via the PBS Pro batch system, in the gpu queue and the default short queues. Only short jobs are supported initially.

  • A new nVidia GV100 TITAN V GPU card was recently added to the glados1.cerit-sc server.
    Due to compatibility problems with some software, this card is available in a special gpu_titan queue on the wagap-pro PBS server.

All GPU servers are already running Debian9. If you run into compatibility issues with Debian9, try adding the debian8-compat module.

If you encounter a GPU card compatibility issue, you can restrict the selection to machines with a particular generation of cards using the gpu_cap=[cuda20,cuda35,cuda61,cuda70] parameter.

Currently, the following GPU queues are available:
  • gpu (arien-pro + wagap-pro, with job sharing between the two servers)
  • gpu_long (only arien-pro)
  • gpu_titan (arien-pro + wagap-pro)
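
As a rough illustration of how the queue names and the gpu_cap parameter above fit together, the following sketch only prints qsub commands; it does not submit anything. The helper name gpu_qsub_cmd, the script name job.sh, and the ncpus, ngpus and walltime values are assumptions chosen for the example, and the queue@server form may need the full server hostname at your site.

```shell
#!/bin/bash
# Sketch only: print (do not submit) a qsub command for a one-GPU job.
# job.sh, ncpus=1, ngpus=1 and the one-hour walltime are illustrative assumptions.

gpu_qsub_cmd() {
    local queue="$1"        # queue name, optionally in the form queue@server
    local gpu_cap="${2:-}"  # optional: cuda20, cuda35, cuda61 or cuda70
    local chunk="select=1:ncpus=1:ngpus=1"

    # Append the card-generation restriction only when one was given.
    if [ -n "$gpu_cap" ]; then
        chunk="${chunk}:gpu_cap=${gpu_cap}"
    fi

    echo "qsub -q ${queue} -l ${chunk} -l walltime=1:00:00 job.sh"
}

# A short job in the gpu queue, restricted to one card generation.
gpu_qsub_cmd gpu cuda61

# A job aimed at the special gpu_titan queue on the wagap-pro server.
gpu_qsub_cmd gpu_titan@wagap-pro
```

Once the printed command looks right, it can be copied to the command line and run on a frontend.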

  

2) OS Debian9 upgrade progress

The upgrade of machines from Debian8 to Debian9 will be completed in both scheduling systems very soon. The exception is the old, out-of-warranty machines at CERIT-SC that still run Debian8; these will probably be decommissioned in the autumn.

Compatibility issues between some applications and Debian9 are being resolved continuously by recompiling the software modules. If you encounter a problem with your application, try adding the debian8-compat module at the beginning of the submission script.
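
A minimal sketch of such a submission script follows. It writes an example job.sh and checks only its syntax, so it can be tried anywhere. The resource values and the application name my_app are hypothetical, and we assume modules are loaded with the usual "module add" command.

```shell
#!/bin/bash
# Sketch only: generate a PBS job script that loads debian8-compat first.
# Resource values and the application ./my_app are hypothetical placeholders.

cat > job.sh <<'EOF'
#!/bin/bash
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -l walltime=1:00:00

# Load the compatibility module before any application modules.
module add debian8-compat

# Hypothetical application; replace with your own modules and commands.
./my_app
EOF

# Check the generated script for syntax errors without executing it.
bash -n job.sh && echo "job.sh is ready; submit it with: qsub job.sh"
```

The point of the ordering is that the compatibility module is in place before the application and its own modules are set up.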

If you experience any problems with library or application compatibility, please report them to meta@cesnet.cz.

Machines with other OSs (centos7) will remain available through special queues: urga and ungu (uv@wagap-pro queue) and phi (phi@wagap-pro queue).

Lists of nodes running Debian9, Debian8 and Centos7 are available in the PBSMon application:

https://metavo.metacentrum.cz/pbsmon2/props?property=os%3Ddebian9
https://metavo.metacentrum.cz/pbsmon2/props?property=os%3Ddebian8
https://metavo.metacentrum.cz/pbsmon2/props?property=os%3Dcentos7

  

3) New Amber modules available

The new amber-14-gpu8 and amber-16-gpu modules are available. Despite their names, they provide all versions of the binaries, not only the GPU ones (parallel and GPU versions carry the standard suffixes .MPI, .cuda and .cuda.MPI), and they are compiled for os=debian9.


All GPU servers already run Debian9. However, if a GPU is not explicitly requested at job submission, the os=debian9 parameter is required as long as any Debian8 machines are still running.

We recommend using these new modules; they are better optimized for Debian9 and for GPU or MPI jobs than the older amber modules.
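
To illustrate when the os=debian9 parameter is needed, the sketch below prints (but does not submit) one qsub command for a GPU job and one for a job without a GPU. The helper name amber_qsub_cmd, the script name amber_job.sh, and the ncpus and walltime values are assumptions made for the example.

```shell
#!/bin/bash
# Sketch only: print (do not submit) qsub commands for jobs using the new Amber modules.
# amber_job.sh, the CPU counts and the walltime are illustrative assumptions.

amber_qsub_cmd() {
    local mode="$1"   # "gpu" for a GPU job, anything else for a CPU/MPI job

    if [ "$mode" = "gpu" ]; then
        # A GPU is requested explicitly, and all GPU servers run Debian9.
        echo "qsub -q gpu -l select=1:ncpus=1:ngpus=1 -l walltime=1:00:00 amber_job.sh"
    else
        # No GPU requested: restrict the job to Debian9 nodes explicitly.
        echo "qsub -l select=1:ncpus=4:os=debian9 -l walltime=1:00:00 amber_job.sh"
    fi
}

amber_qsub_cmd gpu
amber_qsub_cmd cpu
```

Inside amber_job.sh, one would presumably load the module (for example "module add amber-16-gpu") and then call the binary with the matching suffix, such as a .cuda binary for the GPU job or a .MPI binary for the parallel CPU job.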

 

 


Ivana Křenková, Fri Aug 10 15:35:00 CEST 2018