22.-27.4.2021 - HW upgrade of the /storage/plzen1/

Update April 26, 2021 - the data has been transferred to the new disk array. However, occasional stability problems with the new disk array have been reported. We are working intensively to resolve them. Please be patient.

Please check whether your data on the new storage is complete. If not, you can copy it from the old storage, which has been renamed to storage-plzen1a.metacentrum.cz.

Please keep in mind that the storages cannot be operated interactively in a shell (see https://wiki.metacentrum.cz/wiki/Working_with_data#ssh_protocol). You can list the contents of your home directory with the command

ssh user_name@storage-plzen1a.metacentrum.cz ls

You can then fetch the data with (the -r option is needed to copy a directory)

scp -r user_name@storage-plzen1a.metacentrum.cz:~/some_directory .
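To check completeness, one rough approach is to save a recursive listing from the old and the new storage and compare the two. The sketch below is only an illustration under several assumptions: the helper name missing_on_new, the file names old.txt and new.txt, and the use of ls -R over ssh are not taken from the MetaCentrum documentation, and the new storage is assumed to be reachable under the unchanged name storage-plzen1.metacentrum.cz. Because ls -R prints bare file names, the comparison only hints at what is missing and does not verify file contents.

```shell
#!/bin/sh
# Hypothetical helper: print entries present in the old listing
# but absent from the new one. Both arguments are plain-text files
# with one entry per line (for example, saved "ls -R" output).
missing_on_new() {
    old_list=$1
    new_list=$2
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || { rm -f "$old_sorted"; return 1; }
    # comm requires sorted input
    sort "$old_list" > "$old_sorted"
    sort "$new_list" > "$new_sorted"
    # -23 suppresses lines unique to the new list and lines common to both
    comm -23 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Possible usage (user_name and some_directory are placeholders):
#   ssh user_name@storage-plzen1a.metacentrum.cz ls -R some_directory > old.txt
#   ssh user_name@storage-plzen1.metacentrum.cz ls -R some_directory > new.txt
#   missing_on_new old.txt new.txt
```

An empty output suggests that every listed entry from the old storage also appears on the new one; any printed name is a candidate for copying with scp as described above.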

From Thursday 22 to Sunday 25 April, the old disk array storage-plzen1.metacentrum.cz (/storage/plzen1/), serving as the /home for Pilsen's clusters, will be upgraded to new hardware. Due to the huge amount of data, we estimate that the final synchronization will take several days, so please be patient and try to limit your work on this disk array.

  • During the synchronization, /storage/plzen1/ will be fully accessible (read-write), except during the final synchronization on the last day.
  • During the upgrade, new jobs will not start on the alfrid, konos, ida, kirke, minos, and nympha clusters. Running jobs that use /storage/plzen1/ will be terminated at the final data synchronization.
  • After copying is completed, the new disk array will be available on the same symlink as the old disk array; from the user's point of view, nothing changes:
  • After the upgrade, the data will be physically located in the following storage (the name remains the same as in the past):
  • The new storage has three times the capacity of the old storage (1.1 PB); among other things, this solves the problem of running out of space.
  • The new storage will serve as the /home for Pilsen's clusters.

Impact on running jobs:

  • Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
  • Jobs that compute within the scratch space, check whether copying out the resulting data succeeded (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Beginners_guide#Run_batch_jobs), and try to save the resulting data to /storage/plzen1/ during the outage will not be affected either; you will find the resulting data in the scratch of the relevant nodes.
  • Jobs that work directly with data saved in /storage/plzen1/ (not recommended) will be terminated.
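
The copy-out check mentioned above can be sketched roughly as follows. This is not the skeleton from the wiki, only a minimal illustration of the idea: results are removed from scratch only after the copy to the target directory has succeeded. The function name copy_out_or_keep and the paths in the usage comment are hypothetical; for real jobs, follow the skeleton in the Beginners guide.

```shell
#!/bin/sh
# Hypothetical helper: copy results from a scratch directory to a target
# directory and delete the scratch copy only if the copy succeeded.
# Returns 0 on success, 1 on failure (results then stay in scratch).
copy_out_or_keep() {
    scratch_dir=$1
    target_dir=$2
    # The target must exist (e.g. the storage must be mounted and reachable)
    if [ -d "$target_dir" ] && cp -r "$scratch_dir"/. "$target_dir"/; then
        rm -rf "$scratch_dir"
        return 0
    fi
    echo "Copying results to $target_dir failed; data kept in $scratch_dir" >&2
    return 1
}

# Possible usage inside a batch job (paths are placeholders):
#   copy_out_or_keep "$SCRATCHDIR/results" /storage/plzen1/home/user_name/results || exit 4
```

With a check like this, a job that finishes during the outage leaves its results in scratch instead of losing them.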

Backup policy reminder

Please note that large disk arrays are not completely backed up; only snapshots (stored on the same disk array) are taken. Therefore, the data is not protected in the event of a total failure of such a disk array (as happened with brno6 last month). If you have any data for archiving, keep the primary copy elsewhere, or entrust the data to CESNET DataCare (https://du.cesnet.cz/).

List of storages: https://wiki.metacentrum.cz/wiki/NFS4_Servery

With apologies for the inconvenience and with thanks for your understanding.




Ivana Křenková, 15. 4. 2021