Planned service maintenance of the zigur and zapat clusters and the disk array /storage/jihlava1-cerit/
Due to hardware problems (currently being resolved with the original supplier), the zigur and zapat clusters will become available one month later than planned, i.e. in the second half of October.
Thank you for your understanding.
--
Dear users,
Starting August 18, the zigur and zapat clusters and the disk array /storage/jihlava1-cerit/ will be temporarily unavailable because of their relocation to Brno.
The clusters are covered by a maintenance contract, so the move will be carried out by the original supplier; the relocation is expected to take approximately one month (144 cluster nodes plus the disk array).
- The clusters will be unavailable for the whole period (approx. 1 month). The walltime limits in the queues will be lowered gradually so that no job is left running during the outage; any jobs still running when the machines are switched off will be killed.
- From August 14, 11 PM, the current data in /storage/jihlava1-cerit/ will temporarily be available read-only.
- The data will be moved to storage-brno4-cerit-hsm.metacentrum.cz (CERIT-SC's HSM); from August 18 they will be available for both reading and writing via the symlink /storage/jihlava1-cerit/home/$LOGIN.
- Once the data transfer is finished, the link /storage/jihlava1-cerit/home/$LOGIN will point to /auto/brno4-cerit-hsm/fineus/home/$LOGIN.
- Afterwards, the disk array will be available in Brno as /storage/brno7-cerit/ (fineus-home.cerit-sc.cz). PLEASE NOTE: the original data will not be copied back; they will remain accessible in CERIT-SC's HSM. Users are advised to move their data elsewhere (a sketch of such a transfer is shown right after this list). If you have a very large amount of data, please contact us at support@cerit-sc.cz to schedule an optimal transfer.
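For smaller amounts of data, the move can be scripted by the user. The following Python sketch is only an illustration: the destination path and directory names are assumptions, and tools such as rsync can of course be used instead.

  #!/usr/bin/env python3
  # Illustrative sketch: copy data out of the CERIT-SC HSM home directory
  # to another storage array. The destination path below is a hypothetical
  # example; replace it with the array you actually want to use.
  import os
  import shutil

  login = os.environ.get("LOGNAME", "your_login")

  # Where the former /storage/jihlava1-cerit/ data will live after the move.
  src = "/auto/brno4-cerit-hsm/fineus/home/" + login

  # Hypothetical destination on another storage array.
  dst = "/storage/brno2/home/" + login + "/migrated-from-jihlava1"

  # Copy the whole directory tree, preserving structure and timestamps.
  shutil.copytree(src, dst)
  print("Copied " + src + " -> " + dst)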
Impact on running jobs:
- jobs that work with data stored on (or that will save data to) a different disk array will not be affected
- jobs that perform their computations within the scratch space, check whether the copy-out of the resulting data succeeded (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Running_jobs_in_scheduler#Recommended_procedures ), and try to save the resulting data to /storage/brno1 (/storage/home) during the outage will not be affected either; you will find the resulting data in the scratch space of the relevant nodes (an illustrative sketch of such a check is shown after this list)
- jobs that work directly with data stored in /storage/brno1 (/storage/home), or jobs that do not check whether the copy-out of their data to this array succeeded, will most probably crash. If you have critical or long-running computations that may be affected by the outage, let us know and we will try to suspend them for the duration of the outage (however, success of the suspension cannot be guaranteed)
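For illustration only, a copy-out check of the kind mentioned above could look roughly like the Python sketch below (the skeleton recommended on the wiki is a shell script; the result file name, target path, and clean-up step here are assumptions):

  #!/usr/bin/env python3
  # Illustrative sketch of a copy-out check: move results out of the scratch
  # directory and, if the copy fails (e.g., because the target array is
  # offline), keep the data in scratch so it can be retrieved from the node
  # later. File names and the target path are hypothetical examples.
  import os
  import shutil
  import sys

  scratch = os.environ.get("SCRATCHDIR", "/scratch/example")  # provided by the batch system
  result = os.path.join(scratch, "results.tar.gz")            # hypothetical result file
  target_dir = "/storage/brno1/home/your_login/job-results"   # hypothetical target path

  try:
      os.makedirs(target_dir)
  except OSError:
      pass  # the directory may already exist

  try:
      shutil.copy2(result, target_dir)
  except OSError as err:
      # Copy-out failed: do not clean scratch, report where the data stayed.
      sys.stderr.write("Copy-out to %s failed (%s); results remain in %s\n"
                       % (target_dir, err, scratch))
      sys.exit(1)

  # Copy-out succeeded, scratch can be cleaned safely.
  shutil.rmtree(scratch)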
Thank you for your understanding,
Ivana Krenkova
MetaCentrum & CERIT-SC
Ivana Křenková, Thu Oct 01 15:50:00 CEST 2015