Outages
You can read this as an RSS feed.
11.3.2026 - LLM Service Alert: kimi-k2.5 outage
update March 16: model kimi-k2.5 is back in operation
--
LLM model kimi-k2.5 is currently offline due to a critical hardware failure.
Because kimi-k2.5 is exceptionally large, we do not possess alternative backup hardware with the capacity to run it. We cannot migrate the workload, meaning there is no immediate fallback option.
We sincerely apologize for the disruption to your work.
MetaCentrum
Ivana Křenková, Wed Mar 11 03:00:00 CET 2026
26/02/2026 - G2 OpenStack Cloud outage - API + web
update: The outage was resolved at approx. 10:45 a.m., and the cloud is back up and running. Root cause: hardware failure of a Juniper switch.
--
Dear users,
today, 26.2.2026, the e-INFRA CZ G2 OpenStack cloud instance in Brno [1] is experiencing an unplanned outage.
The outage affects all API services and the web interface. The main G1 OpenStack cloud in Brno [2] is not affected.
Team MetaCentrum Cloud (cloud@metacentrum.cz)
[1] https://brno.openstack.cloud.e-infra.cz/
[2] https://cloud.metacentrum.cz/ https://cloud.muni.cz/
Ivana Křenková, Thu Feb 26 03:00:00 CET 2026
2.1.2026 - Unscheduled outage of the frontend zenith.cerit-sc.cz
Dear users,
Today an unplanned hardware outage occurred on the Zenith frontend. While the frontend is down, you can use the other frontends:
https://docs.metacentrum.cz/en/docs/computing/concepts#frontends-storages-homes
The storage array /storage/brno12‑cerit is also accessible from the other frontends.
We are currently working to resolve the issue as quickly as possible and will keep you informed of any further developments.
Thank you for your understanding,
The MetaCentrum team
Ivana Křenková, Fri Jan 02 15:00:00 CET 2026
29.10.2025 - Unplanned outage in the MENDELU server room
Update 5 PM: the clusters and the S3 cluster are back in operation
--
MetaCenter team
Ivana Křenková, Wed Oct 29 13:00:00 CET 2025
9.10.2025 - Planned Outage of Repeat Explorer Galaxy
Starting Thursday at 9:00 AM, the RepeatExplorer Galaxy service will be unavailable due to an upgrade of the operating system, Galaxy, and Ansible playbooks. The expected downtime is several hours.
We apologize for any inconvenience,
MetaCenter team
Ivana Křenková, Tue Oct 07 13:00:00 CEST 2025
30.8.2025, - Outage of /storage/brno2/
Dear users,
Thank you for your understanding,
MetaCenter team
Ivana Křenková, Sun Aug 31 13:00:00 CEST 2025
15.8.2025 - Planned power outage in the ICS MUNI server room
On Friday, August 15, there will be a planned power outage in part of the hall at Masaryk University's IT Center. The outage will affect the tyra, aman, and zenon clusters.
Thank you for your understanding.
MetaCentrum
Ivana Křenková, Wed Aug 13 03:00:00 CEST 2025
13.-18. 8. 2025 - Planned annual maintenance at the University of West Bohemia in Pilsen
This week, planned annual maintenance of the IT systems is taking place at the University of West Bohemia in Pilsen (ZČU). The maintenance may cause occasional outages of systems located at this site.
Our team is working to minimize the impact on users.
Thank you for your understanding.
Sincerely,
MetaCentrum team
Ivana Křenková, Tue Aug 12 13:00:00 CEST 2025
7.7.2025, from 10AM - Outage of /storage/brno2/
Update 14th July, 9:00 AM
The disk array is back online, but in a degraded mode. Currently, one of the disk arrays is running on a single controller, which means that performance will be lower than usual for some time. We appreciate your understanding and cooperation in this matter.
Ideally, we would like users to minimize changes (new data and deletions) on the disk array over the next few hours.
We recommend checking directories that were written to on Sunday morning. Although data loss is expected to be minimal, we cannot rule out the possibility entirely.
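If it helps, files written during a given window can be listed with GNU find's -newermt test. A minimal sketch, demonstrated on a temporary directory (in practice, point "$dir" at your directory on the affected storage):

```shell
# Sketch (assuming GNU findutils and coreutils): list files last modified
# within a time window. Demonstrated on a temporary directory; substitute
# your /storage path for "$dir" in practice.
dir=$(mktemp -d)
touch -d "2025-07-13 06:00" "$dir/written_sunday_morning.txt"
touch -d "2025-07-13 15:00" "$dir/written_later.txt"
# Files modified between midnight and noon on Sunday 13 July:
find "$dir" -type f -newermt "2025-07-13 00:00" ! -newermt "2025-07-13 12:00"
```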
Thank you for your understanding, and we apologize for any inconvenience caused.
--
Dear users,
Thank you for your understanding,
MetaCenter team
Ivana Křenková, Sun Jul 13 13:00:00 CEST 2025
26.6.2025 - Decommissioning of the ida and nympha clusters
The ida and nympha clusters have been shut down and taken out of operation.
We're pleased to announce that during the summer break, we're preparing to deploy new hardware to replace the ida and nympha clusters. This upgrade will provide higher performance and reliability for our services.
Thank you for your understanding and patience.
MetaCentrum team
Ivana Křenková, Fri Jun 27 15:00:00 CEST 2025
30.4.2025 - Intermittent failure of frontend network connection zenith.cerit-sc.cz
Dear users,
we've been dealing with a network outage on the zenith frontend since yesterday. Until the situation is resolved, please use other frontends:
https://docs.metacentrum.cz/en/docs/computing/concepts#frontends-storages-homes
Thanks for your understanding,
MetaCenter team
Ivana Křenková, Wed Apr 30 15:00:00 CEST 2025
9.4.2025 - Planned power outage in the UOCHB server room
On Wednesday, April 9, from 8:00 a.m. to 1:00 p.m., the power supply will be disconnected in the UOCHB server room. The outage will affect elwe1-20, elmo1-[1-4], elmo2-[1-4], eluo1-[1-6] and the storage-praha5 data storage. After the power outage is over, we will use the downtime to upgrade the disk storage.
Thank you for your understanding,
MetaCenter User Support
Ivana Křenková, Tue Apr 08 03:00:00 CEST 2025
7.7.2025, from 10AM - Scheduled outage of frontend zenith.cerit-sc.cz
Dear users,
On Monday, July 7th, the zenith frontend will experience a brief outage due to a hardware migration. During this time, you can access our services through alternative frontends:
https://docs.metacentrum.cz/en/docs/computing/concepts#frontends-storages-homes
We apologize for any inconvenience this may cause and appreciate your understanding.
MetaCenter team
Ivana Křenková, Mon Apr 07 15:00:00 CEST 2025
6.2.2025 - Login Issue with MetaCentrum Web Services - users from Charles university
Since yesterday, some users have reported login issues with our web services, specifically an error message indicating the need to set up multi-factor authentication (MFA):
"Authentication attempt for your account is denied, because your account is not yet configured to go through multifactor authentication. Contact the user support for assistance, make sure your account is enrolled and eligible for multifactor authentication and try again."
We are working on a solution with identity administrators at Charles University. According to their statement, you can resolve the issue by setting up MFA at the following link:
https://ldapuser.cuni.cz/idportal/mfa
Temporary workaround: You can still access MetaCentrum web services by selecting "e-INFRA CZ password" from the list of institutions and using your MetaCentrum login and password.
Thank you for your understanding.
Ivana Křenková, MetaCenter users support
Ivana Křenková, Thu Feb 06 03:00:00 CET 2025
27/01/2025 - G2 OpenStack Cloud outage
Update 12 PM: G2 OpenStack is back in full operation
--
Dear users,
today, 27.1.2025 (Monday), the new e-INFRA CZ G2 OpenStack cloud instance in Brno [1] is experiencing an unplanned outage.
The outage affects all API services; running virtual servers remain functional. The main G1 OpenStack cloud in Brno [2] is not affected.
Team MetaCentrum Cloud
[1] https://brno.openstack.cloud.e-infra.cz/
[2] https://cloud.metacentrum.cz/ https://cloud.muni.cz/
Ivana Křenková, Mon Jan 27 03:00:00 CET 2025
22/01/2025 - Planned outage of Galaxy
Dear users,
On Wednesday, January 22, 2025, there will be a scheduled update to the Galaxy Portal at https://usegalaxy.cz. During this time, there may be a temporary loss of availability.
We will do our best to minimize the downtime and apologize for any inconvenience this may cause. Thank you for your understanding.
Have a nice day,
Galaxy Team
Ivana Křenková, Tue Jan 21 03:00:00 CET 2025
16.1.2025 10-11PM - Planned outage of the network connection in Pilsen
On January 16th, between 10 PM and 11 PM, scheduled maintenance will be performed on the CESNET network infrastructure in Plzeň; the actual outage will last approximately 10 minutes.
During this time, the following services will be unavailable:
- Clusters and frontends in the Plzeň location (ida, kirke, konos, nympha, alfrid)
- Disk storage /storage/plzen1
Ivana Křenková, Wed Jan 08 14:00:00 CET 2025
11.12.2024 - /storage/brno12-cerit/ and frontend zuphux outage
Update 10:50 AM: the storage is back in operation
--
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable, we are working on fixing the problem. Also the zuphux frontend is unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Ivana Křenková, Wed Dec 11 09:00:00 CET 2024
18.-22.10.2024 - Unplanned outage of the network connection in Pilsen at the NTIS hall
Since this afternoon, the konos and kubus clusters, located in the NTIS hall, have been unavailable due to a network connection failure. A replacement switch will be installed within the next week.
Ivana Křenková, Fri Oct 18 14:00:00 CEST 2024
12.10.2024 - /storage/brno12-cerit/ outage
update 1PM:
the disk array is back in operation
--
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable, we are working on fixing the problem. Also the zuphux frontend is unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Ivana Křenková, Sat Oct 12 15:00:00 CEST 2024
18-19.8.2024 - /storage/brno12-cerit/ and frontend zuphux outage
update 26.8., 3 PM: the disk array is back in operation and the data should be readable. Please report any problems. Thank you for your understanding.
update 26.8., from 10:30 AM: during this morning the disk array will be briefly unavailable while we try to regain access to the unreadable data. We apologize for the inconvenience.
update 20.8.:
We regret to inform you that we have been experiencing significant hardware issues with the /storage/brno12-cerit/ directory since Sunday.
A small part of the data in /storage/brno12-cerit is now inaccessible due to a failure on one of the disk arrays; attempting to read it results in an Input/Output error. In terms of data blocks this is about 1.1%, but since large files over 4 MB are spread across multiple devices, it is likely that at least some of them are affected. The fault is being addressed by the manufacturer's support. So far, the data is not definitively lost, but we currently do not know when it will be made available, or whether all of it will be intact in the end. If you need some of it quickly, it may be more efficient to reload the data (if it was primary input) or recalculate what is needed.
Otherwise, /storage/brno12-cerit is currently running normally, and there is no particular reason to assume that other data is more at risk than usual; however, some operational limitations may remain while the broken piece of hardware is repaired. Given the size of the repository, it is not independently backed up, and it is certainly not intended for archival or otherwise irreplaceable data.
Please note that due to the priority of maximizing the offered capacity, it is not possible to perform a full backup of all data on storage of this size.
Ensuring full backups would require at least double the funding to purchase suitable HW. Archival needs are covered by the disk arrays of the CESNET Data Care department, and backup repositories are also being prepared within the EOSC project, so on our own disk arrays we back up only in the form of snapshots. Snapshots offer some protection when a user inadvertently deletes files; in general, data that existed some days before the accident can be restored. However, snapshots are stored on the same disk arrays as the data itself, so in the event of a hardware failure these backups may be lost :-(
https://docs.metacentrum.cz/data/metacentrum-backup/
We are very sorry; together with the HW vendor, we are doing our best to recover the lost data.
If you need results urgently, please resubmit the affected jobs; if needed, we can raise your priority so that the jobs start as soon as possible.
Thank you for your understanding.

update 19.8.: the disk array is only working in a limited mode, with short outages. If possible, limit work on this array. We are trying to stabilize the situation.
update 18.8. at 8PM: the storage is back in operation
--
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable, we are working on fixing the problem. Also the zuphux frontend is unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Ivana Křenková, Sun Aug 18 15:00:00 CEST 2024
27.6.2024 - Unplanned network failure in Brno
Dear user,
A while ago, there was a failure on the local network in Brno (a broken cable at Mendel University), which caused the unavailability of some computing clusters at this location (tyra, aman, zenon). We have reported the outage and are waiting for a replacement internet connection.
With apologies and thanks for your understanding
MetaCentrum team
Ivana Křenková, Thu Jun 27 10:16:00 CEST 2024
from January 2024 - Decommissioning of the archive /storage/du-cesnet/
In the archive repository /storage/du-cesnet/ (du4.cesnet.cz), a mechanical failure of the tape robot occurred in winter. Data is still being transferred to object storage, and access to the data on the tapes is very limited. After discussion with DU colleagues, we removed access to this storage from our machines (to speed up the transfer). If you need your data urgently, please contact CESNET data storage at du-support@cesnet.cz.
We apologize for the inconvenience.
Thank you for your understanding,
your MetaCentrum Team
Ivana Křenková, Fri May 24 15:00:00 CEST 2024
23.5.2024 - /storage/brno12-cerit/ and frontend zuphux outage
update: 23.5. at 9:30 a.m. back in operation
--
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable, we are working on fixing the problem. Also the zuphux frontend is unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Ivana Křenková, Thu May 23 15:00:00 CEST 2024
13.5.2024 - /storage/brno12-cerit/ and frontend zuphux outage
Update May 13, 11:30: storage is fully back in operation
---
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable, we are working on fixing the problem. Also the zuphux frontend is unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Ivana Křenková, Mon May 13 15:00:00 CEST 2024
19-24.4.2024 - Scheduled maintenance of network
Dear users,
on 19-21 April and 24 April, in the afternoon/evening/night hours, software upgrades will take place on the backbone routers of the network. Outages will occur at the times indicated and last 30-60 minutes (see the schedule below).
=======================================================================
*Friday 19.4.2024 17:00 - 21:00* - Prague-Sitel, Plzeň1,2
*Friday 19.4.2024 20:00 - 00:00* - Jihlava
*Saturday 20.4.2024 15:00 - 19:00* - Prague - ÚMG - UJV Řež
*Saturday 20.4.2024 19:00 - 00:00* - Olomouc1,2 - České Budějovice
*Sunday 21.4.2024 00:00 - 05:00* - Prague1 - Brno1
*Wednesday 24.4.2024 00:00 - 05:00* - Praha2 - Brno2
We apologize for any inconvenience,
MetaCentrum
Ivana Křenková, Fri Apr 19 10:16:00 CEST 2024
11.3.2024 up to 6PM - Scheduled maintenance of the MetaCentrum Cloud
Dear user of MetaCentrum Cloud [1],
Today, 11.3.2024 (Monday), during the morning and part of the afternoon (until approx. 18:00), the new e-INFRA CZ G2 OpenStack cloud instance in Brno [1] is and will remain unavailable due to an unplanned outage caused by planned cloud maintenance. The outage affects all API services; already running virtual servers remain functional. The main G1 OpenStack cloud in Brno [2] is not affected.
[1] https://brno.openstack.cloud.e-infra.cz/
[2] https://cloud.metacentrum.cz/ https://cloud.muni.cz/
We apologize for any inconvenience,
MetaCentrum Cloud team
Ivana Křenková, Mon Mar 11 10:16:00 CET 2024
7.3.2024 - /storage/brno12-cerit/ and frontend zuphux outage
Status update: as of 10 AM, the disk array is back with full functionality
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable, we are working on fixing the problem. Also the zuphux frontend is unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Ivana Křenková, Thu Mar 07 15:00:00 CET 2024
7.2.2024 - /storage/brno12-cerit/ and frontend zuphux outage
update 11:50 AM - the disk array is now fixed and available again
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable, we are working on fixing the problem. Also the zuphux frontend is unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Ivana Křenková, Wed Feb 07 15:00:00 CET 2024
3. 2. 2024 from 9 AM - Short outage of /storage/brno2/
Due to maintenance, there will be a short outage on the /storage/brno2/ disk array on Saturday 3. 2. from 9 am.
During the outage it won't be possible to log in to the skirit, perian and onyx frontends and the PBS server meta-pbs.metacentrum.cz won't submit new jobs to the Brno cluster.
OnDemand will also be affected (using the home directory of /storage/brno2/).
We apologize for any inconvenience.
Ivana Křenková, Fri Feb 02 13:46:00 CET 2024
11. 1. 2024, 15-15:45 - brno2 outage
Dear users,
currently the brno2 storage is down due to an as-yet-unspecified disk error. This also means the skirit frontend is not accessible.
We are investigating the cause. If possible, use other storages and frontends meanwhile.
Thank you for your understanding,
your MetaCentrum Team
Ivana Křenková, Thu Jan 11 15:00:00 CET 2024
24/08/2023 - Planned outage of Galaxy
Dear users,
The https://usegalaxy.cz service will be migrated to the more stable environment of VMWare cluster on Thursday Aug 24. Existing user data will be migrated as well.
The service will become unavailable from 10 am CEST (after that time we do not guarantee correct migration of newer data, though), and the outage is expected to end in early afternoon. However, the IP address and DNS records are going to be changed as well, their propagation will take some time. Therefore, the service is expected to be fully available again from Friday Aug 25.
With apologies and thanks for understanding
Galaxy MetaCenter Team
Ivana Křenková, Wed Aug 23 03:00:00 CEST 2023
1/08/2023 - Planned outage of elmo frontend
Dear users,
on the 1st of September, the elmo.elixir-czech.cz frontend will be down for maintenance.
To access computational resources, please use any other frontend, see https://docs.metacentrum.cz/basics/concepts/#frontends-storages-homes
With apologies and thanks for understanding
MetaCenter Team
Ivana Křenková, Tue Aug 01 03:00:00 CEST 2023
14.07.2023 4PM - Planned outage of data connection in Pruhonice
Dear user,
This afternoon (14 July) after 4PM there will be a short outage of data connection in Průhonice (ibot cluster). We have limited the submission of new jobs to this cluster, we will resume traffic as soon as the network connection is restored.
Running jobs that copy output back to the disk array will fail to do so, and data will remain in the scratch on the appropriate node where it was running. The data on the compute nodes can be accessed from any frontend using the following shortcut:
go_to_scratch JOB_NUMBER_INCLUDING_PBS_SERVER_NAME
For example:
tarkil.grid.cesnet.cz$ go_to_scratch 79868.meta-pbs.metacentrum.cz
With apologies and thanks for understanding
MetaCenter Team
Ivana Křenková, Fri Jul 14 10:16:00 CEST 2023
7-10.7.2023 - Unplanned disk array failure /storage/brno1-cerit/
Update: the storage is slow, we are working on a fix
------
Dear user,
Today afternoon (7 July) there was a HW failure of the /storage/brno1-cerit/ disk array. We are working on getting it back up and running in cooperation with the supplier.
Running jobs that copy output back to the array fail to do this, and the data remains in the scratch on the appropriate node where it was running. To access the data on the compute nodes, use the following shortcut:
go_to_scratch JOB_NUMBER_INCLUDING_PBS_SERVER_NAME
For example:
tarkil.grid.cesnet.cz$ go_to_scratch 79868.meta-pbs.metacentrum.cz
You can use other frontends (https://wiki.metacentrum.cz/wiki/Frontend) and disk arrays during the outage.
With apologies and thanks for understanding
MetaCenter Team
Ivana Křenková, Fri Jul 07 10:16:00 CEST 2023
20.6.2023 5-10PM - Scheduled maintenance of the MetaCentrum Cloud
Dear user of Cloud MetaCentrum [1],
A reconfiguration of the MetaCentrum OpenStack cloud block storage, intended to increase its capacity, is scheduled on Tuesday 20.6. between 5:00 PM and 10:00 PM CEST.
From experience we know that even a small configuration change may cause a short outage (10-30 minutes), given the approximately 3,000 volumes currently allocated. VM operations will not be affected, and the main OpenStack API and the Horizon UI will remain available; however, the Cinder block storage and its API will be temporarily unavailable, preventing volume creation.
We apologize for any inconvenience,
MetaCentrum Cloud team
[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz
Ivana Křenková, Tue Jun 20 10:16:00 CEST 2023
19. 6. 2023 - Hardware failure of the storage brno2
Dear users,
we are sorry to announce that due to hardware failure the storage brno2 is down.
Consequently it is not possible to log in to frontends skirit, perian and onyx.
Currently we cannot tell whether/when the storage will be up again.
We will update you in this matter as soon as possible.
If you have any questions concerning your data and running jobs, contact us at meta@cesnet.cz.
We are very sorry for the inconvenience,
your MetaCentrum team.
Ivana Křenková, Mon Jun 19 14:30:00 CEST 2023
12.-15. 5. 2023 - Planned outage of luna cluster, luna frontend and the storage-praha6-fzu disk array
Dear users,
On 12-15 May, there will be a planned shutdown of most servers in the server room at the FZÚ AV ČR due to the regular annual electrical inspection. The outage will include all nodes of the luna cluster, including the luna frontend and the storage-praha6-fzu disk array. The downtime will also be used to replace faulty RAM in some servers.
We apologize for the inconvenience,
Your MetaCentrum support team.
Ivana Křenková, Thu May 04 14:30:00 CEST 2023
18-24.3.2023 - Unplanned disk array failure /storage/brno2/
Update 03/27/2023: Another problem occurred; please be patient, it will be fixed within a few hours. (The disk array was returned to service the same afternoon.)
Update 03/24/2023: The /storage/brno2/ disk array is back in full operation. Data remains intact.
-----------
Dear user,
On Saturday afternoon (18 March) there was a HW failure of the /storage/brno2/ disk array. We are working on getting it back up and running in cooperation with the supplier. We are not yet able to say when the array will be operational. The supplier is proceeding carefully so that we do not lose the stored data.
It is not possible to log in to frontends where this array serves as /home (skirit, onyx) and the disk array cannot be accessed from elsewhere (from other frontends or nodes). OnDemand is also affected.
Running jobs that copy output back to the array fail to do this, and the data remains in the scratch on the appropriate node where it was running. To access the data on the compute nodes, use the following shortcut:
go_to_scratch JOB_NUMBER_INCLUDING_PBS_SERVER_NAME
For example:
tarkil.grid.cesnet.cz$ go_to_scratch 79868.meta-pbs.metacentrum.cz
You can use other frontends (https://wiki.metacentrum.cz/wiki/Frontend) and disk arrays during the outage.
With apologies and thanks for understanding
MetaCenter Team
Ivana Křenková, Sat Mar 18 10:16:00 CET 2023
20-21.10.2022 - Unplanned network failure in Brno
update: MetaCentrum OpenStack (CESNET_MCC), status 2022-10-21 9:00
OpenStack is functional, but a limited number of servers/hypervisors, running around 40 VMs, are without network. We are working on VM migrations where possible.
---
Dear user,
Today we are experiencing numerous short-term outages on the local network in Brno, which are causing short-term unavailability of the cerit-pbs scheduling system and some machines. The cause is being investigated by local network specialists.
With apologies and thanks for your understanding
MetaCentrum team
Ivana Křenková, Thu Oct 20 10:16:00 CEST 2022
1.9.2022 - Planned outage of lex, krux, zubat cluster and brno14-ceitec storage
Dear users,
on Thursday 1st of September there will be a power outage in the CEITEC server room. Consequently, the clusters krux, lex and zubat, as well as the brno14-ceitec storage, will be inaccessible. The downtime is planned to last between 5 a.m. and noon.
Jobs running on the affected clusters will be held by PBS to be run after the outage is over and no action on users' side is needed.
Jobs running elsewhere may be affected if they copy data to/from the brno14-ceitec storage while it is down. If your jobs fail at start for this reason, resubmit them after the outage is over. If your finishing jobs fail due to the inability to copy results to brno14-ceitec, please fetch the files manually from the scratch directory.
We apologize for the inconvenience,
your MetaCentrum support team.
Ivana Křenková, Tue Aug 23 14:30:00 CEST 2022
14.7.2022 - Planned outage of /storage/liberec3-tul, charon frontend and charon cluster
Dear users,
on Thursday 14th July there will be power outage due to maintenance in the facilities of Technical university of Liberec. Consequently /storage/liberec3-tul, charon.nti.tul.cz frontend and charon cluster will be powered down. The downtime is planned to last the whole day.
No action is needed on the users' side. Jobs whose walltime would collide with the start of downtime will be held by PBS to be run after the outage is over.
We apologize for the inconvenience,
your MetaCentrum support team.
Ivana Křenková, Mon Jul 11 14:30:00 CEST 2022
1.7.2022 - Unplanned outage of the old /storage/brno6/ (disks failure)
Dear users,
Due to an unplanned crash of the /storage/brno6/ disk array, which we were going to shut down in the next few days due to its age, we are forced to speed up this process. Most of your data from the /storage/brno6/ array can be found in the /storage/brno2/home/LOGIN/brno6/ directory.
The last full synchronization took place during the night from Wednesday to Thursday, and another partial synchronization took place during the downtime. Some of the data you uploaded to the array in the last few hours may not have been copied yet.
If we can get the old array back up and running, we will try to sync the newest data. The /storage/brno6/ disk array HW will then be decommissioned without replacement; for working with data in Brno, please use the /storage/brno2/ disk array, where the data has been transferred, or any other disk array available in MetaCentrum. The /storage/brno6/ symlink currently points to the old array and will be deleted together with the HW shutdown.
We apologize for any inconvenience,
MetaCentrum
Ivana Křenková, Fri Jul 01 10:16:00 CEST 2022
24.6.2022 2-4PM - Scheduled maintenance of the MetaCentrum Cloud
Dear user of Cloud MetaCentrum,
There is planned load and performance cloud infrastructure testing scheduled on Friday 2022-06-24 from 14:00 to 16:00 (CEST).
The planned testing scenarios should not affect or interrupt any cloud functionality, but they will generate extensive infrastructure load, visible to end users as additional OpenStack API and UI latency.
We apologize for any inconvenience,
MetaCentrum Cloud team
[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz
Ivana Křenková, Thu Jun 23 10:16:00 CEST 2022
2.6.2022 - HW upgrade of the following disk arrays: /storage/praha1/ = /storage/vestec1-elixir/
update 3. 6. 2022 3 PM
After upgrading the disk array, there were problems with the new file system. The problem has been fixed and the array is available again, you can start using it.
Disk array upgrade of the /storage/praha1/ = /storage/vestec1-elixir/
On Thursday, June 2, the disk arrays in Prague will be upgraded (capacity, redundancy, and speed increase), during which it will be necessary to stop the arrays for a short time.
If everything goes according to plan, short outages of the storage-vestec1 (= praha1) array can be expected. In the coming days, there should be a significant increase in available capacity.
We will try to minimize the impact on running jobs as much as possible.
At the same time, the quota for the size of stored data will be increased from 0.5 TB to 2 TB, and the quota for the number of files to 2 million.
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Ivana Křenková, Tue May 24 10:16:00 CEST 2022
23.5.2022 - Unplanned power failure in Brno
update 24. 5. 2022
All OpenStack services are now available after the unplanned power outage from 2022-05-22.
You may now start your VMs. If you experience any issues, please contact us at cloud@metacentrum.cz.
We apologize for any inconvenience.
--
Dear user,
During the night of 22nd to 23rd May, there was an unplanned power failure in data centre A510 (FI MU Brno). The backup power supply did not come on.
Most of the systems in the datacenter are running again; a problem persists in the MetaCentrum Cloud.
The outage also affects the zuphux.cerit-sc.cz frontend, some clusters and Rancher (Kubernetes), which run from the cloud.
We apologize for any inconvenience,
MetaCentrum team
Ivana Křenková, Mon May 23 10:16:00 CEST 2022
13.4.2022 12AM-8PM - Scheduled outage of the MetaCentrum Cloud
Dear user of Cloud MetaCentrum,
On Wednesday, April 13, 2022, from 12:00 AM to 8:00 PM, a power outage is planned for part of the A510 datacenter. The outage should be uneventful (thanks to the backup power supply) and should last 1-2 hours. We do not anticipate any issues, but during a full outage, selected user VMs in OpenStack may be unavailable.
We apologize for any inconvenience,
MetaCentrum Cloud team
[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz
Ivana Křenková, Tue Apr 12 10:16:00 CEST 2022
7.-8.4.2022 - Scheduled outage of the MetaCentrum Cloud
Update:
The MetaCentrum OpenStack cloud [1] is experiencing an unplanned series of network outages after yesterday's reconfiguration of HW network elements. The estimated time when outages may still occur is Friday, April 8, 2022 from 8:00 AM to 8:00 PM.
This is an extension of the announced outage scheduled for April 7, 2022.
Thank you for your understanding,
MetaCenter Cloud Team
--
Dear user of Cloud MetaCentrum,
Let us inform you that planned networking maintenance of the MetaCentrum OpenStack cloud [1] is scheduled on Thursday 2022-04-07 from 7:00 to 20:00 (CEST). We plan to improve network stability by upgrading the cloud network switches' firmware and reconfiguring them. We expect OpenStack cloud API and UI functionality to be unaffected. Selected cloud hypervisors (and the cloud user VMs located on them) may suffer short networking outages.
We apologize for any inconvenience,
MetaCentrum Cloud team
[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz
Ivana Křenková, Wed Apr 06 10:16:00 CEST 2022
28.3.2022 - Scheduled outage storage-praha5-elixir disk array
On Monday, March 28, the storage-praha5-elixir disk array will be upgraded (capacity, redundancy, and speed increase, OS upgrade, IP address change). The storage will be temporarily shut down during the upgrade, and occasional unavailability can be expected during the day. We do not recommend using the array at that time.
Sorry for the inconvenience,
MetaCentrum
Ivana Křenková, Tue Mar 22 10:16:00 CET 2022
4.3.2022 2-4 PM - Scheduled outage of the MetaCentrum Cloud
Dear user of Cloud MetaCentrum,
Let us inform you that a planned outage of the MetaCentrum OpenStack cloud [1] is scheduled on Friday, 2022-03-04, from 14:00 to 16:00 (CET). The planned cloud improvements are the migration of the core controller servers to another resource pool and production IPv6 address support.
We expect the OpenStack cloud API and UI downtime to be up to 15 minutes. Users' running virtual servers will not be affected.
We apologize for any inconvenience,
MetaCentrum Cloud team
[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz
Ivana Křenková, Wed Mar 02 10:16:00 CET 2022
26.1.2022 - HW upgrade of the following disk arrays: /storage/praha1/, /storage/vestec1-elixir/, and /storage/praha5-elixir/
Disk array upgrade of the /storage/praha1/, /storage/vestec1-elixir/, and /storage/praha5-elixir/
On Wednesday, January 26, the disk arrays will be upgraded in Prague (capacity increase), during which it will be necessary to stop the arrays for a short time.
If everything goes according to plan, short outages of the storage-vestec1 (= praha1) array in the morning and storage-praha5-elixir in the afternoon can be expected. In the coming days, there should be a significant increase in available capacity.
We will try to minimize the impact on running jobs as much as possible.
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Ivana Křenková, Tue Jan 25 10:16:00 CET 2022
21.1.2022 - Cluster krux, zubat, lex outage
Cluster krux, zubat, lex, frontend perian and brno9-ceitec outage
Last night, there was a cooling failure in the CEITEC server room, where the krux, zubat, and lex computing nodes are located. These clusters are temporarily down and will be returned to operation after the cooling fault has been rectified. With apologies for the inconvenience and with thanks for your understanding.
Yours,
Ivana Křenková, Fri Jan 21 10:16:00 CET 2022
12.1.2022 - Scheduled outage of the MetaCentrum Cloud
Dear user of Cloud MetaCentrum,
let us inform you about the planned upgrade of the 'Cloud MetaCentrum' (OpenStack) infrastructure, which is scheduled on 12.1.2022 (DD.MM.YYYY) from 9:00 to 16:00. This upgrade prepares the infrastructure for IPv6 protocol support.
We do not expect any issues, but any feedback about problems during the upgrade is welcome.
We apologize for any inconvenience,
MetaCentrum Cloud team
Ivana Křenková, Mon Jan 10 10:16:00 CET 2022
16.12.2021 - Cluster krux, zubat, lex, frontend perian and brno9-ceitec outage
Cluster krux, zubat, lex, frontend perian and brno9-ceitec outage
On Thursday, December 16, starting at 7:00 a.m., there will be a planned power outage in the CEITEC server room. Consequently, the clusters krux, zubat, and lex, as well as the perian frontend and the brno9-ceitec storage, will be down. The outage is planned to last until 12:00.
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Ivana Křenková, Mon Dec 13 10:16:00 CET 2021
1.-2.12.2021 - HW upgrade of the /storage/brno6/
HW upgrade of the /storage/brno6/
From Wednesday, December 1 (6 PM) to Thursday, December 2 (12 AM), the old disk array /storage/brno6/ will be migrated to new hardware. Please try to limit work on this disk array; running processes that keep files open directly in /storage/brno6 may crash after the switchover.
- During the synchronization, /storage/brno6/ will be fully accessible (read-write), except during the final synchronization on the last day.
- After copying is completed, the new disk array will be available under the same symlink as the old one; from the user's point of view, nothing changes:
/storage/brno6/
- After the upgrade, the data will be physically located in the following storage (the name remains the same as in the past):
storage-brno6.metacentrum.cz
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Ivana Křenková, Tue Nov 30 10:16:00 CET 2021
21.10.2021 - Scheduled outage of the MetaCentrum Cloud
Dear user of Cloud MetaCentrum,
Let us inform you about the planned outage of the API and dashboard components of the 'Cloud MetaCentrum' cloud (OpenStack). This scheduled outage is due to a reverse proxy upgrade. It affects API and dashboard access to OpenStack; virtual machines should not be affected. The outage is scheduled on 21.10.2021 (DD.MM.YYYY) from 8:30 AM to 4:00 PM CEST (UTC+2:00).
We apologize for any inconvenience,
MetaCentrum Cloud team
Ivana Křenková, Thu Oct 14 10:16:00 CEST 2021
5.10.2021 - Unexpected outage of /storage/budejovice1/ and cluster hildor
The disk array /storage/budejovice1/home/ and the cluster hildor are temporarily unavailable due to an unplanned power failure.
We are trying to locate and correct the fault in cooperation with the local administrators.
We apologize for any inconvenience caused.
Ivana Křenková, Tue Oct 05 10:16:00 CEST 2021
5.10.-7.10.2021 - Luna cluster, luna frontend and storage-praha6-fzu planned outage
Due to a hardware upgrade, there will be a planned outage from Tuesday, October 5, 7 a.m. until Thursday, October 7, 12:00. The luna cluster, the luna frontend, and storage-praha6-fzu will not be available during the outage.
We apologize for any inconvenience caused.
Ivana Křenková, Mon Oct 04 10:16:00 CEST 2021
27.8.2021 - Unexpected outage of /storage/budejovice1/
The disk array /storage/budejovice1/home/ is temporarily unavailable due to an unplanned network failure.
We are trying to locate and correct the fault in cooperation with the local administrators. The storage itself is fully functional; the data just cannot be accessed at the moment. We are unable to estimate the downtime at this time.
We apologize for any inconvenience caused.
Ivana Křenková, Thu Aug 26 10:16:00 CEST 2021
29.7.-1.8.2021 - HW upgrade of the /storage/brno2/
Updated July 30, 2021
The data has been transferred to the new hardware; in case of problems, do not hesitate to contact us.
Quotas have been set for the number and size of files, by default 3 TB and 2 million files.
HW upgrade of the /storage/brno2/
From Thursday, July 29 to Sunday, August 1, the old disk array /storage/brno2/ will be migrated to new hardware. Due to the huge amount of data, we estimate that the final synchronization will take several days, so please be patient. Please try to limit work on this disk array.
- During the synchronization, /storage/brno2/ will be fully accessible (read-write), except during the final synchronization on the last day.
- After copying is completed, the new disk array will be available under the same symlink as the old one; from the user's point of view, nothing changes:
/storage/brno2/
- After the upgrade, the data will be physically located in the following storage (the name remains the same as in the past):
storage-brno2.metacentrum.cz
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
- Data written to /storage/brno2/ during the synchronization may remain only on the original array (storage-brno6:~/../fsbrno2/home/$LOGNAME) and will need to be copied over individually.
Backup policy reminder
Please note that large disk arrays are not completely backed up; only snapshots (stored on the same array) are taken. The data is therefore not protected in the event of a total failure of such a disk array (as happened with brno6 last month). If you have data to archive, keep the primary copy elsewhere, or entrust it to CESNET DataCare: https://du.cesnet.cz/.
List of storages: https://wiki.metacentrum.cz/wiki/NFS4_Servery
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Ivana Křenková, Thu Jul 22 10:16:00 CEST 2021
22.-27.4.2021 - HW upgrade of the /storage/plzen1/
Update April 26, 2021 - the data has been transferred to the new disk array, but occasional stability problems with the new array have been reported. We are working intensively to solve them. Please be patient.
Please check whether your data on the new storage is complete. If not, you can copy it from the old storage, which has been renamed to storage-plzen1a.metacentrum.cz.
Please keep in mind that the storages cannot be operated interactively in a shell (see https://wiki.metacentrum.cz/wiki/Working_with_data#ssh_protocol). You can list the contents of your home directory with the command
ssh user_name@storage-plzen1a.metacentrum.cz ls
You can then fetch the data with
scp -r user_name@storage-plzen1a.metacentrum.cz:~/some_directory .
HW upgrade of the /storage/plzen1/
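If you need to verify that a migrated copy is complete, one option is to compare checksummed file lists of the two trees. Below is a minimal local sketch of that idea: the temporary directories stand in for the old and new storage, and the file names are illustrative only. On the real arrays you would run the find/md5sum step over ssh, in the same non-interactive way as the commands above.

```shell
#!/bin/sh
# Sketch: verify a migrated copy by comparing checksummed file lists.
# The two temporary directories stand in for the old and new storage.
OLD=$(mktemp -d)
NEW=$(mktemp -d)
echo "data" > "$OLD/a.txt"
cp "$OLD/a.txt" "$NEW/a.txt"          # simulate the completed migration

# Checksum every file relative to the tree root, sorted for comparison.
old_sums=$(cd "$OLD" && find . -type f -exec md5sum {} + | sort)
new_sums=$(cd "$NEW" && find . -type f -exec md5sum {} + | sort)

if [ "$old_sums" = "$new_sums" ]; then
    echo "copy is complete"
else
    echo "copy differs from the original" >&2
fi
```

Comparing sorted checksum lists catches both missing files and files whose contents changed during the transfer.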
From Thursday, April 22 to Sunday, April 25, the old disk array storage-plzen1.metacentrum.cz (/storage/plzen1/), serving as the /home for Pilsen's clusters, will be migrated to new hardware. Due to the huge amount of data, we estimate that the final synchronization will take several days, so please be patient. Please try to limit work on this disk array.
- During the synchronization, /storage/plzen1/ will be fully accessible (read-write), except during the final synchronization on the last day.
- During the upgrade, new jobs will not start on the alfrid, konos, ida, kirke, minos, and nympha clusters. Running jobs using /storage/plzen1/ will be terminated at the final data synchronization.
- After copying is completed, the new disk array will be available under the same symlink as the old one; from the user's point of view, nothing changes:
/storage/plzen1/
- After the upgrade, the data will be physically located in the following storage (the name remains the same as in the past):
storage-plzen1.metacentrum.cz
- The new storage has 3 times more capacity than the old storage (1.1 PB); among other things, this solves the problem of running out of space.
- The new storage will serve as the /home for Pilsen's clusters.
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
- Jobs that compute within the scratch space and check the success of copying out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Beginners_guide#Run_batch_jobs), and that try to save the resulting data into /storage/plzen1/ during the outage, will not be affected either -- you will find the resulting data in the scratch space of the relevant nodes.
- Jobs that work directly with data saved in /storage/plzen1/ (not recommended) will be terminated.
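The copy-out check mentioned above can be illustrated with a small shell sketch. This is illustrative only, not the official MetaCentrum skeleton: the temporary directories stand in for the scratch directory (set by the batch system in a real job) and for a /storage target.

```shell
#!/bin/sh
# Sketch of the copy-out check used by scratch-based jobs. Illustrative
# only: the temporary directories stand in for the PBS scratch directory
# and a /storage target path.
SCRATCHDIR=$(mktemp -d)     # in a real job, set by the batch system
DATADIR=$(mktemp -d)        # stands in for e.g. /storage/plzen1/home/$LOGNAME

# ... the computation would produce its result file in scratch ...
echo "result" > "$SCRATCHDIR/result.dat"

# Copy the result out; if the storage is unavailable, the copy fails and
# the data stays in scratch for manual retrieval after the outage ends.
if cp "$SCRATCHDIR/result.dat" "$DATADIR/"; then
    rm -rf "$SCRATCHDIR"    # clean scratch only after a successful copy
    echo "copy-out ok"
else
    echo "copy-out failed; data kept in $SCRATCHDIR" >&2
fi
```

The key design point is that scratch is cleaned only after the copy succeeds, which is why results of jobs written this way survive a storage outage in the node's scratch space.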
Backup policy reminder
Please note that large disk arrays are not completely backed up; only snapshots (stored on the same array) are taken. The data is therefore not protected in the event of a total failure of such a disk array (as happened with brno6 last month). If you have data to archive, keep the primary copy elsewhere, or entrust it to CESNET DataCare: https://du.cesnet.cz/.
List of storages: https://wiki.metacentrum.cz/wiki/NFS4_Servery
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Ivana Křenková, Thu Apr 15 10:16:00 CEST 2021
3. 2. - HW upgrade of the /storage/praha1/, /storage/praha6-fzu/, unavailability of adan, luna, and tarkil clusters
HW upgrade of the storage-praha1.metacentrum.cz
On Wednesday, February 3, the old storage array storage-praha1.metacentrum.cz (/storage/praha1/), serving as the /home for Prague's clusters, will be upgraded to new hardware.
- The data stored on this array may not be accessible due to the migration to another storage; the clusters luna, tarkil, and adan will be switched off. Please try to limit work on this disk array: data newly written during the outage may not be available on the new array. After the outage it will be possible to transfer the data; please check it.
- The data will be physically placed on storage-vestec1-elixir.metacentrum.cz, with the symlink /storage/praha1/.
- Furthermore, the storage /storage/praha6-fzu will not be available during the HW upgrade.
- The new storage will serve as the /home for Prague's clusters.
- The old disk array will be temporarily accessible as storage-praha1.metacentrum.cz.
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
- Jobs that compute within the scratch space and check the success of copying out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Beginners_guide#Run_batch_jobs), and that try to save the resulting data into /storage/praha1/ during the outage, will not be affected either -- you will find the resulting data in the scratch space of the relevant nodes.
- Jobs that work directly with data saved in /storage/praha1/ (not recommended) will be terminated.
Backup policy
Please note that large disk arrays are not completely backed up; only snapshots (stored on the same array) are taken. The data is therefore not protected in the event of a total failure of such a disk array (as happened with brno6 last month). If you have data to archive, keep the primary copy elsewhere, or entrust it to CESNET DataCare: https://du.cesnet.cz/.
List of storages: https://wiki.metacentrum.cz/wiki/NFS4_Servery
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Ivana Křenková, Fri Jan 29 10:16:00 CET 2021
5.-6.12.2020 - Planned electricity outages in Prague server room
Dear users,
let us inform you that on Saturday, December 5 and Sunday, December 6, a planned outage will occur in the Prague server room due to repairs of the electrical wiring.
The tarkil cluster will be shut down for the duration of the repair. We will try to keep the /storage/praha1/ disk array in operation from a backup power source.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Yours,
MetaCentrum
Ivana Křenková, Tue Nov 24 15:50:00 CET 2020
22.10.2020 - Unexpected network outage in Pilsen's and Ceske Budejovice server rooms
Dear users,
let us inform you that due to today's unexpected network outage in Pilsen and České Budějovice, some frontends, clusters, and disk arrays might be unavailable. We are working on the repair.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
Ivana Křenková, Thu Oct 22 15:50:00 CEST 2020
22-23.9.2020 - Unexpected power outage in the Prague ELIXIR-CZ server room
The cooling system is being serviced, so access should hopefully be possible again soon.
We apologize for any inconvenience caused.
Ivana Křenková, Wed Sep 23 10:16:00 CEST 2020
15-16. 9. 2020 - Planned short network outages in Prague server room
The upgrade will result in approximately 30-minute network outages on individual routers.
Tuesday, 15 September, from 22:00 to 01:00
- connection for cluster TARKIL - L2 connection to cluster ARUBA
- connection for cluster SKURUT FZU - global table - primary
- connection for cluster SKURUT FZU - L3 VPN LHCONE - backup
Wednesday, 16 September, from 20:00 to 23:00
- connection for cluster SKURUT - global table - backup
- connection for cluster SKURUT - L3 VPN LHCONE - primary
- connection for Elixir cluster at UOCHB
- connection for cluster at (luna, kalpa) FZU
- GEANT connection to LHCONE
We assume that the outage will occur about half an hour after the beginning of the time slot.
We apologize for the inconvenience.
Your MetaCentrum
Ivana Křenková, Fri Sep 11 10:16:00 CEST 2020
2-3. 8. 2020 - Unexpected outage of /storage/praha1/
The disk array /storage/praha1/home/ is temporarily unavailable due to an unplanned HW/SW failure.
The outage also affected the tarkil frontend, as well as the computing clusters whose home directories are on this disk array (adan, luna, kalpa, tarkil, ...).
We apologize for any inconvenience caused.
Ivana Křenková, Sun Aug 02 10:16:00 CEST 2020
16.7.2020 - Scheduled outage of the MetaCentrum Cloud
Dear user of Cloud MetaCentrum,
Let us inform you about the planned outage of the network overlay in the 'Cloud MetaCentrum' cloud (OpenStack). This scheduled outage is necessary due to an upgrade of the network overlay which cannot be performed without downtime. The outage is scheduled on 16.07.2020 (DD.MM.YYYY) from 8:00 AM to 12:00 PM CEST (UTC+2:00). During the outage, you will not be able to access your machines, nor will your machines be able to access the internet. Computations running on your machines should not be affected.
We apologize for any inconvenience,
MetaCentrum Cloud team
Ivana Křenková, Thu Jul 09 10:16:00 CEST 2020
27. 5. 2020 - Scheduled outage of the MetaCentrum Cloud
Dear user of MetaCentrum Cloud,
Due to an upgrade of MetaCentrum Cloud (OpenStack) from the Stein to the Train release, the OpenStack control plane will be unavailable on May 27th, 2020. The outage will start at 8:00 AM CEST and continue until 6:00 PM CEST the same day. During the upgrade, the OpenStack API (including the dashboard) will not be accessible. Virtual instances should remain accessible and working throughout the outage; however, it is not recommended to schedule critical processes during that time.
Ivana Křenková, Thu May 14 10:16:00 CEST 2020
16. - 17. 5. 2020 - planned outage of all worker nodes luna and storage at Prague Slovanka
We would like to inform you about a planned outage of all luna worker nodes over the weekend of May 16-17. The outage is due to a planned electricity shutdown at the Slovanka locality.
We are going to shut down all luna worker nodes on Saturday, May 16, at 6:00 in the morning. The luna worker nodes will be available again on Monday morning, May 18.
The disk arrays /storage/praha4-fzu/home and /storage/praha6-fzu/home/ will also be affected by the outage.
Thank you for your understanding.
Best regards
MetaCentrum
Ivana Křenková, Mon May 11 10:16:00 CEST 2020
23. 4. 2020 - Unexpected outage of /storage/budejovice1/
The disk array storage-budejovice1.metacentrum.cz (/storage/budejovice1/home/) is temporarily unavailable due to an unplanned HW/SW failure which occurred during the night.
The outage also affected the hildor frontend, as well as the computing clusters whose home directories are on this disk array.
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
- Jobs that compute within the scratch space and check the success of copying out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Working_with_data/Working_with_data_in_a_job), and that try to save the resulting data into /storage/budejovice1/ during the outage, will not be affected either -- you will find the resulting data in the scratch space of the relevant nodes.
- Jobs that work directly with data saved in /storage/budejovice1/ (not recommended) will be terminated.
We apologize for any inconvenience caused.
Ivana Křenková, Thu Apr 23 10:16:00 CEST 2020
19.2.2020 - Outage of disk arrays /storage/brno2 and /storage/brno6
Due to maintenance, there will be an outage of the disk arrays /storage/brno2 and /storage/brno6 on 19. 2. between 1 and 2 PM. During the outage it will not be possible to log in to the skirit and perian frontends, and the PBS server meta-pbs.metacentrum.cz will not submit new jobs to Brno clusters.
We apologize for any inconvenience.
Ivana Křenková, Wed Feb 19 13:46:00 CET 2020
11.2.2020 - Expected outage of cluster charon
Please note that on 11.2., from 10:00 to 14:00, there will be a planned outage of the computational node charon.nti.tul.cz.
We apologize for any inconvenience.
Ivana Křenková, Tue Feb 11 13:46:00 CET 2020
12.2.2020 - Outage of PBS-server, PBSmon application and partial outage of OpenStack
Update: After noon, the network problem was resolved.
Repeated short failures of a university network segment in Brno are causing failures of the cerit-pbs PBS server, stale data in the PBSmon application, and partial outages of OpenStack.
We're working to fix the issue.
We apologize for any inconvenience.
Ivana Křenková, Tue Feb 11 13:46:00 CET 2020
14-16.1.2020 - Expected outage of the clusters carex, draba, and the /storage/pruhonice1-ibot/ disk array
let us inform you about the scheduled outage of the clusters carex.ibot.cas.cz and draba.ibot.cas.cz and the /storage/pruhonice1-ibot/ disk array in Průhonice on January 14-16, due to a planned HW upgrade.
We apologize for any inconvenience.
Ivana Křenková, Tue Jan 07 13:46:00 CET 2020
16.12.2019 - Expected outage 'Cloud2 MetaCentrum' (OpenStack)
Dear MetaCentrum Cloud user,
let us inform you about the scheduled outage of MetaCentrum Cloud
(OpenStack) on December 16th (Monday) 2019 due to a major upgrade of the
OpenStack control plane (from Rocky version to Stein). The outage will
start at 7:00 AM (CET, UTC+1:00) and will continue until 6:00 PM of the
same day. During the time of upgrade, the OpenStack API (including
dashboard) will not be accessible. Virtual machines should be accessible
throughout the outage, however it is not recommended to run critical
processes during that time.
Thank you for your patience.
We apologize for any inconvenience.
Ivana Křenková, Tue Dec 03 13:46:00 CET 2019
30. 10. 2019 3-4 PM - Unexpected outage in the UOCHB server room
We apologize for any inconvenience caused.
Ivana Křenková, Wed Oct 30 10:16:00 CET 2019
21. - 22. 10. 2019 - Unexpected outage of /storage/brno2/
• The storage-brno2.metacentrum.cz (/storage/brno2/) disk array will be temporarily unavailable; the data will be moved to another array. The data will be available again from 6 PM on the /storage/brno6/ disk array, under the original symlink /storage/brno2/. We plan to keep at least one copy of the data accessible from the frontends at all times, but until the event ends the latest data may not yet be in the new location, so please do not use /storage/brno2/ until then.
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
- Jobs that compute within the scratch space and check the success of copying out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Working_with_data/Working_with_data_in_a_job), and that try to save the resulting data into /storage/brno2/ during the outage, will not be affected either -- you will find the resulting data in the scratch space of the relevant nodes.
- Jobs that work directly with data saved in /storage/brno2/ (not recommended) will be terminated.
- Running frontend sessions that access /storage/brno2/ will be terminated, just like jobs running inside /storage/brno2/.
- Additionally, there will be about 15 minutes of network outage due to moving a router to another location.
We apologize for any inconvenience caused.
Ivana Křenková, Fri Oct 18 10:16:00 CEST 2019
4.9.2019 7-12 AM - Expected outage 'Cloud2 MetaCentrum' (OpenStack)
Dear user of Cloud2 MetaCentrum,
Let us inform you about the planned outage of the network overlay in the 'Cloud2 MetaCentrum' cloud (OpenStack). This scheduled outage is necessary due to an upgrade of the network overlay which cannot be performed without downtime. The outage is scheduled on 4.9.2019 (DD.MM.YYYY) from 7:00 AM to 12:00 PM CEST (UTC+2:00).
During the outage, you will not be able to access your machines, nor will your machines be able to access the internet. Computations running on your machines should not be affected.
We apologize for any inconvenience.
Ivana Křenková, Thu Aug 29 13:46:00 CEST 2019
21.8.2019 7-10 AM - Expected outage 'Cloud2 MetaCentrum' (OpenStack)
Dear user of Cloud2 MetaCentrum,
Let us inform you about the planned outage of the network overlay in the 'Cloud2 MetaCentrum' cloud (OpenStack). This scheduled outage is necessary due to an upgrade of the network overlay which cannot be performed without downtime. The outage is scheduled on 21.08.2019 (DD.MM.YYYY) from 7:00 AM to 10:00 AM CEST (UTC+2:00).
During the outage, you will not be able to access your machines, nor will your machines be able to access the internet. Computations running on your machines should not be affected.
We apologize for any inconvenience.
Ivana Křenková, Tue Aug 13 13:46:00 CEST 2019
17.7.2019 5-7AM - Expected outage du2.cesnet.cz (/storage/jihlava2-archive/)
Let us inform you that due to a planned revision of the central diesel generator in Jihlava's server room, du2.cesnet.cz (/storage/jihlava2-archive/) and the Ceph object storage will be temporarily unavailable on Wednesday, July 17, between 5 and 7 AM.
We apologize for any inconvenience caused.
MetaCentrum
Ivana Křenková, Tue Jul 16 13:46:00 CEST 2019
20.6.2019 - Unexpected outage in Brno's server room (most of clusters and /storage, old MetaCloud)
Dear users,
let us inform you that due to today's unexpected network outage in Brno's server room, some of Brno's clusters and disk arrays are unavailable. We are working on the repair.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
Ivana Křenková, Thu Jun 20 15:50:00 CEST 2019
26.4.2019 - Unexpected outage in CERIT-SC's server room in Brno (most of clusters and /storage/brno3-cerit/)
Dear users,
let us inform you that due to today's unexpected cooling outage (early this morning) in Brno's server room, some CERIT-SC clusters and disk arrays are unavailable. We are working on the repair.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
Ivana Křenková, Fri Apr 26 15:50:00 CEST 2019
10.4.2019 - Unexpected failure of du2.cesnet.cz (/storage/jihlava2-archive/)
we are currently facing a power failure in Jihlava. Therefore, du2.cesnet.cz (/storage/jihlava2-archive/) is not available.
We apologize for any inconvenience caused.
Ivana Křenková, Wed Apr 10 10:16:00 CEST 2019
12.3.2019 - Unexpected outage in CERIT-SC's server room in Brno (cluster zefron, uv and /storage)
Dear users,
let us inform you that due to today's unexpected power or network outage (2 PM) in the Brno server room, some CERIT-SC clusters and disk arrays are unavailable. We are working on the repair.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
Ivana Křenková, Tue Mar 12 15:50:00 CET 2019
8.3.2019 10-11AM - Planned 10-minute network outage in Prague FZU
Dear users,
let us inform you that due to a planned central switch firmware upgrade in Prague's server room, the local clusters luna and kalpa and the disk array /storage/praha4-fzu/home will be unavailable for approx. 10 minutes on Friday, March 8, between 10 and 11 AM.
We apologize for any inconvenience caused.
MetaCentrum
Ivana Křenková, Thu Mar 07 15:50:00 CET 2019
20.2.2019 9AM-9PM - Planned power outage in Prague FZU (clusters luna, kalpa and /storage/praha4-fzu)
Dear users,
let us inform you that due to a planned network connectivity upgrade in Prague's server room, the local clusters luna and kalpa and the disk array /storage/praha4-fzu/home will be unavailable on Wednesday, February 20.
We apologize for any inconvenience caused.
MetaCentrum
Ivana Křenková, Fri Feb 15 15:50:00 CET 2019
28.1.2019 - Unexpected failure of /storage/praha1/ file system
we are currently facing a problem with the /storage/praha1/ file system. Unfortunately, some machines with /home on this storage (luna, tarkil) are not working properly. We apologize for any inconvenience caused.
Ivana Křenková, Mon Jan 28 09:16:00 CET 2019
9.-11.1. - Decommission of the /storage/brno7-cerit/, recovery of the /storage/brno6/
Decommission of the storage-brno7-cerit.metacentrum.cz
On Wednesday, January 9, the old storage array storage-brno7-cerit.metacentrum.cz (/storage/brno7-cerit/) will be shut down.
- From January 9 to 11, the data stored on this array will not be accessible due to migration to another storage.
- From Friday (January 11), the data will be physically placed on storage-brno1-cerit.metacentrum.cz and will be accessible via the symlink /storage/brno7-cerit/.
- The relocation also applies to the fishery project directory, which will again be accessible via the existing symlink from January 11.
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
- Jobs that compute within the scratch space and check the success of copying out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Working_with_data/Working_with_data_in_a_job), and that try to save the resulting data into /storage/brno7-cerit/ during the outage, will not be affected either -- you will find the resulting data in the scratch space of the relevant nodes.
- Jobs that work directly with data saved in /storage/brno7-cerit/ (not recommended) will be terminated.
Recovery of the storage-brno6.metacentrum.cz
The storage array storage-brno6.metacentrum.cz (/storage/brno6/) has been back in operation since Friday, January 4.
The failure of the disk array was very serious. Fortunately, much of the data was saved, but a small part (primarily data being manipulated at the time of the malfunction) may have been lost or damaged.
Please check your data stored in the /storage/brno6/ file system.
Backup policy
Please note that large disk arrays are not completely backed up; only snapshots (stored on the same array) are taken. The data is therefore not protected in the event of a total failure of such a disk array (as happened with brno6 last month). If you have data to archive, keep the primary copy elsewhere, or entrust it to CESNET DataCare: https://du.cesnet.cz/.
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Ivana Křenková, Sun Jan 06 10:16:00 CET 2019
12. - 13. 12. 2018 - Unexpected power outage in Prague FZU (clusters luna, kalpa and /storage/praha4-fzu)
Dear users,
let us inform you that due to today's unexpected power outage in Prague's server room, the local clusters luna and kalpa and the disk array /storage/praha4-fzu/home are unavailable. The vendor is working on the repair.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
Ivana Křenková, Wed Dec 12 15:50:00 CET 2018
10.12.2018 - Data transfer /storage/brno6/ --> /storage/brno1/
Due to repeated HW failures in /storage/brno6/, the data was moved to another storage, /storage/brno1/, with the symlink /storage/brno6/ unchanged.
The defective storage is being repaired by the vendor (replacement of the controller). Once repaired, the data will be returned to the original location.
Ivana Křenková, Mon Dec 10 10:16:00 CET 2018
26-27.11.2018 - Unexpected failure of /storage/brno6 file system
Ivana Křenková, Mon Nov 26 10:16:00 CET 2018
23. 11. 3 - 4 PM - Disk array /storage/brno11-elixir/ planned HW upgrade
Let us inform you that on Friday, November 23, /storage/brno11-elixir/ (storage-brno11-elixir.metacentrum.cz) will be unavailable for 10 minutes (between 3 and 4 PM) due to a HW upgrade.
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
- Jobs that compute within the scratch space and check the success of copying out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Working_with_data/Working_with_data_in_a_job), and that try to save the resulting data into /storage/brno11-elixir/ during the outage, will not be affected either -- you will find the resulting data in the scratch space of the relevant nodes.
- Jobs that work directly with data saved in /storage/brno11-elixir/ (not recommended) will be terminated.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
MetaCentrum
Ivana Křenková, Fri Nov 23 23:00:00 CET 2018
from 19.11.2018 - Unexpected failure of /storage/brno6 file system
we are currently facing a HW problem with the /storage/brno6/ file system. For the same reason, the MetaCloud web page (OpenNebula, https://cloud.metacentrum.cz/) is not working either. Update: back in operation since Nov 21.
The problem with access to /storage/brno6/home/ persists.
We apologize for any inconvenience caused.
Ivana Křenková, Mon Nov 19 10:16:00 CET 2018
26.-28.10.2018 - Data migration onto new data storage
Dear CESNET MetaCentrum and Storage facility user,
We would like to inform you that the hierarchical storage in Pilsen (du1.cesnet.cz, /storage/plzen2-archive in MetaCentrum) will be permanently decommissioned.
If you have no data in this storage facility, this mail is not relevant for you. All your data from plzen2-archive will be transferred by storage administrators to a new storage facility.
This e-mail is to inform you about the plan and the schedule.
Data in Pilsen will be made permanently inaccessible to users during the evening of 26th October. We will then start the final synchronisation of recent changes to the Ostrava storage facility, i.e., du4.cesnet.cz (/storage/du-cesnet in MetaCentrum; note the change in naming convention). The data will be inaccessible during the transfer period. We expect to make the data available in the new location in Ostrava again in the evening of 28th October, and it will remain permanently available in Ostrava from then on.
Kindly note the new Data Storage Terms of Service (ToS) and the changes they introduce. Policies for archival (long-term) data and temporary backups have been distinguished. You can find the full text of the ToS at https://du.cesnet.cz/en/provozni_pravidla/start, and a short description of the most important changes at https://du.cesnet.cz/en/navody/faq/start#handling_archives_and_backups. Both the archive and backup policies are available to MetaCentrum users.
Data from Pilsen is considered an archive and it is handled as such.
If you have any questions or need any kind of help, please contact our user support (by replying to this mail and/or on support@cesnet.cz).
Thank you for your cooperation.
With kind regards,
Your CESNET MetaCentrum and Data Storage team
Ivana Křenková, Wed Oct 24 10:16:00 CEST 2018
13.9.2018 - Unexpected failure of /storage/brno2 file system
we are currently facing a problem with the /storage/brno2/ file system. Unfortunately, some machines with /home on this storage are not working properly. In the meantime, please use machines in other localities or the CERIT-SC machines in Brno (PBS server wagap-pro, frontend zuphux.cerit-sc.cz).
We apologize for any inconvenience caused.
Ivana Křenková, Thu Sep 13 10:16:00 CEST 2018
21.-23.5.2018 - Expected restart of virtual machines in MetaCloud due to security update
due to planned maintenance and security updates on the physical machines, the virtual machines dukan1.ics.muni.cz - dukan26.ics.muni.cz and gorbag.ics.muni.cz will be restarted one after another in the first half of next week. Information about the affected machines will be available in OpenNebula (https://cloud.metacentrum.cz/) in the Info section of each virtual machine.
We apologize for any inconvenience caused.
Ivana Křenková, Thu May 17 10:16:00 CEST 2018
12.2.2018 till 11 AM - Unexpected failure of AFS file system
Update 2018-02-12 11 AM: AFS is working properly again
An AFS server crash occurred this weekend, also causing unexpected problems in other parts of the AFS subsystem. As a result, some AFS volumes (and therefore the SW modules) are unavailable, and it is not possible to log in to some computational nodes and frontends. We are working on the repair.
We apologize for any inconvenience caused.
Ivana Křenková, Mon Feb 12 10:16:00 CET 2018
5.2.2018 - Unplanned network connectivity outage in Brno
Due to a network connectivity failure at the Brno location, services hosted in Brno that require a network connection (MetaCloud, Brno machines, ...) are unavailable. We are working on a remedy.
With apologies for the inconvenience and with thanks for your understanding.
MetaCentrum
Ivana Křenková, Mon Feb 05 10:00:00 CET 2018
from Jan 8 - Response to security flaws in processors known as Meltdown and Spectre
Dear users,
MetaCentrum administrators are tracking the situation around the recently disclosed processor vulnerabilities (known as Meltdown and Spectre; for more information see https://spectreattack.com/).
We are evaluating the real impact of these vulnerabilities on the infrastructure. We have applied the available updates in the VMware and MetaCloud environments. For the computational nodes, we monitor available updates and evaluate their impact on the MetaCentrum environment (they are tested for performance regressions). The computing nodes are being updated gradually. If the situation requires it, we may force an immediate restart of computing resources and stop all running jobs. Especially for upcoming long jobs, please consider postponing their execution to a later time, particularly if your jobs cannot be restarted.
We apologize for any inconvenience caused.
MetaCentrum
Ivana Křenková, Tue Jan 09 15:50:00 CET 2018
from 31.12.2017 - Unexpected power outage in Prague FZU (clusters luna and kalpa)
Dear users,
let us inform you that due to today's unexpected power outage in Prague's server room, the local clusters luna and kalpa are unavailable.
The vendor is working on the repair; the length of the outage cannot be estimated.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
Ivana Křenková, Tue Jan 02 15:50:00 CET 2018
7.12.2017 - Disk array /storage/budejovice1/ planned HW upgrade
Let us inform you that on Thursday, December 7, /storage/budejovice1/ (storage-budejovice1.metacentrum.cz) will be moved to new hardware and will be unavailable for several hours during the final synchronization. The shared disk space at hildor*:/scratch.shared, mounted from this storage, will not be available either.
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
- Jobs that compute within the scratch space and check that the copy-out of the resulting data succeeded (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Working_with_data/Working_with_data_in_a_job), and that try to save the resulting data into /storage/budejovice1/, will not be affected either -- you will find the resulting data in the scratch space of the relevant nodes.
- Jobs that work directly with data saved in /storage/budejovice1/ (not recommended) will be terminated.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
MetaCentrum
Ivana Křenková, Tue Dec 05 23:00:00 CET 2017
28.11.2017 - PBS Pro bug in new version
Dear users,
Due to a bug in the new version of PBS Pro, the walltime of almost all running jobs was reset. PBS Pro could not correctly determine CPU usage, significantly overestimated the CPU time used, and jobs ended unexpectedly. We have reported the error to the PBS Pro developers and reverted the PBS Pro server to the previous version.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova, Tue Nov 28 13:50:00 CET 2017
6.10.2017 (7-10 AM) - Power outage in JU's server room
Dear users,
Let us inform you that due to a planned power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid and the disk array /storage/budejovice1/ will be temporarily unavailable on Friday, October 6 (7-10 AM). Unfortunately, all running jobs will be terminated. Please copy any data you will need for your calculations during these few days to another disk array.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova, Thu Oct 05 01:50:00 CEST 2017
25. 7. 2017 - MetaCloud: firmware update on dukan 19-25 machines
Dear users,
Given a pressing need to update firmware in cloud nodes dukan19 through dukan25 we will have to briefly power off virtual machines using those nodes. The intervention is scheduled for Tuesday 25 July. Each node, hence each collocated virtual machine, will be powered off for approximately 20 minutes. We will boot the virtual machines afterwards. There will be no data loss. Affected users have been notified by e-mail.
With apologies for the inconvenience and with thanks for your understanding,
MetaCloud team
Ivana Krenkova, Tue Jul 25 01:50:00 CEST 2017
5. 6. 2017 - MetaCloud: migration of virtual machines running on dukan1-10
Dear users,
On Monday 5th June we are going to migrate virtual machines away from nodes dukan1-10. Affected machines will be powered off temporarily. There will be no data loss. Machines with private network addresses (currently in range 10.4.0.*) require special treatment. Given the current configuration of our network their private IP addresses will have to change. Please, look up the new IP addresses of your virtual machines through the MetaCloud interface after that date. Affected users have already been notified by e-mail.
MetaCloud team
Ivana Krenkova, Mon May 29 01:50:00 CEST 2017
4.6.2017 (7:45-10 AM) - Power outage in JU's server room
Dear users,
Let us inform you that due to a planned power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid and the disk array /storage/budejovice1/ will be temporarily unavailable on Sunday, June 4 (7:45-10 AM). Unfortunately, all running jobs will be terminated. Please copy any data you will need for your calculations during these few days to another disk array.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova, Wed May 17 01:50:00 CEST 2017
11.5.2017 - OS upgrade on the Zuphux frontend (CentOS 7.3) + PBS Pro set as the default environment in CERIT-SC
On May 11th, the server zuphux will be restarted into a new OS version (CentOS 7.3).
At the same time, the scheduling system in the Torque environment (@wagap) will no longer accept new jobs. Existing jobs will continue to run on the remaining nodes. The remaining computational nodes in the Torque environment will be gradually converted to PBS Pro. Machines currently available in the PBS Pro environment are labeled "Pro" in the PBSMon application: https://metavo.metacentrum.cz/pbsmon2/nodes/physical .
The frontend zuphux.cerit-sc.cz will be set to the PBS Pro (@wagap-pro) environment by default.
With apologies for the inconvenience and with thanks for your understanding.
CERIT-SC users support
Ivana Křenková, Wed May 10 23:00:00 CEST 2017
7.4.2017 4 PM-0 AM - Zuphux frontend and @wagap, @wagap-pro outage
On Friday, April 7, from 15:45, the frontend zuphux will be temporarily unavailable due to unplanned emergency service of critical disk array controllers. The estimated length of the outage is 2 hours. Other frontends can be used during the outage:
https://wiki.metacentrum.cz/wiki/Frontend
Other services running from the affected disk array (the Torque server @wagap and the PBS Pro server @wagap-pro) will be migrated to another server on Thursday evening, with some very short outages on Thursday and Friday evenings.
With apologies for the inconvenience and with thanks for your understanding.
CERIT-SC support
Ivana Křenková, Thu Apr 06 23:00:00 CEST 2017
10.3.2017 - Outage of archival storage in Brno /storage/brno4-cerit-hsm/
Dear users,
after the upgrade of the HSM storage-brno4-cerit-hsm.metacentrum.cz (performed by the vendor on February 14-15), an unexpected error occurred and the HSM is only partially available. The vendor is working on the repair; the length of the outage cannot be estimated.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
Ivana Krenkova, Fri Mar 10 01:50:00 CET 2017
24.2.2017 from 4 AM - Unplanned outage in Pilsen
Today (around 4 AM), an accident occurred in the water cooling system in Pilsen, affecting all Pilsen computing nodes, frontends, and /storage/plzen1/. The machines are back in operation (nevertheless, some related service work is still ongoing...).
We apologize for any inconvenience caused.
Ivana Křenková,
MetaCentrum
Tom Rebok, Fri Feb 24 15:26:00 CET 2017
from 19.2.2017 - Outage of archival storage in Brno /storage/brno4-cerit-hsm/
Dear users,
after the upgrade of the HSM storage-brno4-cerit-hsm.metacentrum.cz (performed by the vendor on February 14-15), an unexpected error occurred and the HSM is now unavailable. The vendor is working on the repair; the length of the outage cannot be estimated.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
Ivana Krenkova, Mon Feb 20 01:50:00 CET 2017
14.-15.2.2017 - Planned system update of archival storage in Brno /storage/brno4-cerit-hsm/
Dear users,
Let us inform you that from February 14 (9 AM) to February 15 (6 PM), Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a security update of the system.
IMPORTANT: The HSM still hosts data from Jihlava /storage/jihlava1-cerit/
Influence on the running jobs:
- jobs that work with data saved on (or that will save data to) another disk array will not be affected
- jobs that compute within the scratch space and check that the copy-out of the resulting data succeeded (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Running_jobs_in_scheduler#Recommended_procedures), and that try to save the resulting data into /storage/jihlava1-cerit/ during the outage, will not be affected either (you will find the resulting data in the scratch space of the relevant nodes)
- jobs that work directly with data saved in /storage/jihlava1-cerit/, or jobs that do not check the success of the copy-out into this array, will most probably crash. If you have critical/long-term computations that may be affected by the outage, let us know -- we will try to suspend your computation during the outage (however, the success of the suspend process cannot be guaranteed)
Ivana Krenkova,
Ivana Krenkova, Tue Feb 07 01:50:00 CET 2017
23.1.2017 - Disk array /storage/praha1/ planned HW upgrade
Let us inform you that on Monday, January 23, Prague's /storage/praha1/ (storage-praha1.metacentrum.cz) will be moved to new hardware and will be unavailable for several hours during the final synchronization. The shared disk space at *:/scratch.shared, mounted from this storage, will not be available either.
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
- Jobs that compute within the scratch space and check that the copy-out of the resulting data succeeded (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Running_jobs_in_scheduler#Recommended_procedures), and that try to save the resulting data into /storage/praha1/, will not be affected either -- you will find the resulting data in the scratch space of the relevant nodes.
- Jobs that work directly with data saved in /storage/praha1/ (not recommended) will be terminated.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
MetaCentrum
Ivana Křenková, Mon Jan 09 23:00:00 CET 2017
11. 1. 2017 - Planned MetaCloud upgrade
Dear users,
the OpenNebula upgrade announced earlier will take place on 11 January. At that time, the front-end will be unavailable for some time, and virtual machines running in the dukan.ics.muni.cz cluster will be restarted as we update the nodes.
Please be aware that there may be issues especially with older virtual machines instantiated with the previous OpenNebula version (2015 and earlier). Please contact us (cloud@metacentrum.cz) in case of trouble.
MetaCloud team
Ivana Krenkova, Mon Jan 09 01:50:00 CET 2017
15.12.2016 (11 PM-2 AM) - Planned outage of the Torque server @wagap
Dear users,
Let us inform you that on Thursday (Dec 15, 11 PM - 2 AM), the Torque server wagap.cerit-sc.cz will be temporarily unavailable due to a SW upgrade. Submitting new jobs and manipulating jobs in the system will not be possible during the outage.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
Ivana Krenkova, Thu Dec 15 01:50:00 CET 2016
8.12.2016 - Power outage in JU's server room
Dear users,
Let us inform you that due to an unexpected power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid are temporarily unavailable. Unfortunately, all running jobs have been terminated.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
Ivana Krenkova, Thu Dec 08 01:50:00 CET 2016
from 1.11.2016 - tarkil frontend planned outage
Let us inform you that the tarkil.cesnet.cz frontend is unavailable due to a migration to another HW. All running processes on the frontend were terminated.
You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková,
MetaCentrum
Ivana Křenková, Tue Nov 01 23:00:00 CET 2016
27.10.2016 from 10 PM - /storage/brno3-cerit/ planned HW upgrade
Let us inform you that on Thursday, October 27 (10 PM), Brno's /storage/brno3-cerit/ (storage-brno3-cerit.metacentrum.cz) will be moved to new hardware.
Influence on the running jobs:
- Jobs that work with data saved on (or that will save data to) another disk array will not be affected.
- Jobs that compute within the scratch space and check that the copy-out of the resulting data succeeded (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Running_jobs_in_scheduler#Recommended_procedures), and that try to save the resulting data into /storage/brno3-cerit/, will not be affected either -- you will find the resulting data in the scratch space of the relevant nodes, or the computed data will be stored on the old hardware and synchronized with the new one later. Please ask us to prioritize copying of your data.
- Data of jobs that work directly with data saved in /storage/brno3-cerit/ may also be stored on the old hardware and synchronized with the new one up to tens of hours later. Please ask us to prioritize copying of your data.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
MetaCentrum & CERIT-SC
Ivana Křenková, Tue Oct 25 23:00:00 CEST 2016
30.8.2016 from 10 PM - Zuphux frontend planned outage
Let us inform you that on Tuesday (August 30, 10 PM - midnight), the zuphux frontend will be briefly unavailable due to a migration to other HW. All processes running on the frontend will be terminated during the outage.
You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková,
MetaCentrum
Ivana Křenková, Wed Aug 24 23:00:00 CEST 2016
25.-29.7.2016 - Planned service maintenance of clusters and disk array in Ceske Budejovice
Dear users,
Let us inform you that from July 25 to 29, the hildor, haldir and hagrid clusters and the disk array /storage/budejovice1/ will be temporarily unavailable due to a move to another server room. Please copy any data you will need for your calculations during these few days to another disk array.
With many thanks for understanding,
Ivana Krenkova
MetaCentrum
Ivana Křenková, Fri Jun 24 15:50:00 CEST 2016
27.4.2016 10 PM - Power outage in UK's server room
Dear users,
Let us inform you that due to a planned power outage in UK's Karolina server room, the local servers eru1, eru2, acharon and the AFS servers asterix, obelix, sal will be temporarily unavailable tomorrow (April 27), 10-11 PM.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova, Tue Apr 26 01:50:00 CEST 2016
21.4.2016 from 10:30 PM - Planned MetaCloud upgrade
Dear users,
CERIT-SC's resources in the OpenNebula MetaCloud (physical nodes hda*) will be under maintenance this Thursday, 21st April, from 10:30 PM. Your virtual machine(s) will only be paused (you won't lose your running state) and resumed one by one. An optimistic estimate is that no VM should be down for more than 30 minutes. The whole maintenance can take up to 2 hours.
Ivana Krenkova,
Ivana Krenkova, Tue Apr 19 01:50:00 CEST 2016
18.4.2016 7-15:00 - Planned power outage in Brno UKB
Dear users,
let us inform you that due to a planned power outage in Brno's server room at UKB, the local clusters lex, krux, zubat and the disk arrays brno9-ceitec + brno10-ceitec-hsm will be temporarily unavailable.
We apologize for any inconvenience caused.
Ivana Krenkova, Mon Apr 11 03:50:00 CEST 2016
18.4.2016 7-15:00 - Unplanned air conditioning outage in Brno CERIT-SC
Dear users,
let us inform you that due to an unexpected air conditioning outage in Brno's CERIT-SC server room this morning, part of the local clusters zigur, zapat and zebra has been switched off to prevent overheating. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs on the affected nodes have been terminated.
We apologize for any inconvenience caused.
Ivana Krenkova, Mon Apr 11 03:50:00 CEST 2016
7.4.2016 - Power outage in JU's server room
Dear users,
Let us inform you that due to an unexpected power outage, the clusters hermes/hildor/haldir are temporarily unavailable.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
Ivana Krenkova, Thu Apr 07 01:50:00 CEST 2016
1.3.2016 - PBS server (sendmail) problem today
Dear users,
Let us inform you that during last night, the sendmail service of the PBS server sent out e-mails with outdated error reports about terminated jobs.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
Ivana Krenkova, Tue Mar 01 01:50:00 CET 2016
2.3.-3.3.2016 - Planned system update of archival storage in Brno /storage/brno4-cerit-hsm/
Dear users,
Let us inform you that from Wednesday, March 2 (9 AM) to Thursday, March 3 (6 PM), Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a security update of the system.
*****************************************
IMPORTANT:
The HSM hosts data from Jihlava /storage/jihlava1-cerit/
*****************************************
Influence on the running jobs:
- jobs that work with data saved on (or that will save data to) another disk array will not be affected
- jobs that compute within the scratch space and check that the copy-out of the resulting data succeeded (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Running_jobs_in_scheduler#Recommended_procedures), and that try to save the resulting data into /storage/jihlava1-cerit/ during the outage, will not be affected either (you will find the resulting data in the scratch space of the relevant nodes)
- jobs that work directly with data saved in /storage/jihlava1-cerit/, or jobs that do not check the success of the copy-out into this array, will most probably crash. If you have critical/long-term computations that may be affected by the outage, let us know -- we will try to suspend your computation during the outage (however, the success of the suspend process cannot be guaranteed)
Ivana Krenkova,
Ivana Krenkova, Tue Feb 23 01:50:00 CET 2016
23.2.2016 10-11AM - Planned service maintenance of /storage/brno6/
Dear users,
Let us inform you that on Tuesday, February 23, Brno's /storage/brno6/ will be unavailable due to a battery replacement by the supplier.
Influence on the running jobs:
- jobs that work with data saved on (or that will save data to) another disk array will not be affected
- jobs that compute within the scratch space and check that the copy-out of the resulting data succeeded (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Running_jobs_in_scheduler#Recommended_procedures), and that try to save the resulting data into /storage/brno6/ during the outage, will not be affected either (you will find the resulting data in the scratch space of the relevant nodes)
- jobs that work directly with data saved in /storage/brno6/, or jobs that do not check the success of the copy-out into this array, will most probably crash. If you have critical/long-term computations that may be affected by the outage, let us know -- we will try to suspend your computation during the outage (however, the success of the suspend process cannot be guaranteed)
Moreover, the user interface (Sunstone) as well as the programming interface (API) for MetaCloud will be unavailable for several hours. Existing virtual machines will not be affected! It will be, however, impossible to create new ones or manage existing ones during the outage.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková
MetaCentrum & CERIT-SC
Ivana Krenkova, Tue Feb 16 01:50:00 CET 2016
12.2.2016 8AM - Hadoop cluster planned outage
Dear users,
Let us inform you that on Friday (February 12, 8:00 a.m.), the Hadoop cluster will be briefly unavailable due to a SW upgrade:
- CDH 5.4.7 --> CDH 5.5.1 (with support of Hadoop 2.6.0 and Spark 1.5.0) and new Java 8
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Ivana Krenkova, Thu Feb 11 03:50:00 CET 2016
4.2.2016 11AM - Hadoop cluster planned outage
Dear users,
Let us inform you that on Thursday (February 4, 11:00 a.m.), the Hadoop cluster will be briefly unavailable due to a certificate change, machine reboots, and preparation of the new experimental cluster based on containers.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Ivana Krenkova, Wed Feb 03 03:50:00 CET 2016
25.7.2016 10:00 AM - Hadoop cluster planned outage
Dear users,
Let us inform you that on Monday (July 25, 10:00 a.m.), the Hadoop cluster will be unavailable due to an upgrade from CDH 5.5.1 to 5.8.0 (with Hadoop 2.6.0 and Spark 1.6.0) and a Java environment upgrade.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Ivana Krenkova, Wed Feb 03 03:50:00 CET 2016
11.2.2016 - Planned MetaCloud upgrade
Dear users,
A long-planned upgrade of the OpenNebula cloud manager will take place on 11 February. The user interface (Sunstone) as well as the programming interface (API) for MetaCloud will be unavailable for several hours. Existing virtual machines will not be affected! It will be, however, impossible to create new ones or manage existing ones during the outage. Please accept our apologies for the inconvenience this may cause you.
Ivana Krenkova,
Ivana Krenkova, Thu Jan 28 01:50:00 CET 2016
23.-24. 1.2016 - Planned network upgrade in FZU AVCR in Prague
Dear users,
let us inform you that due to a planned upgrade of the network connection at the Institute of Physics of the Czech Academy of Sciences in Prague, the local clusters kalpa and luna + the disk array /storage/praha4-fzu/ will be temporarily unavailable over the weekend of 23-24 January.
We apologize for any inconvenience caused.
Ivana Krenkova, Thu Jan 21 08:00:00 CET 2016
3.12.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat)
Dear users,
let us inform you that due to today's unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be returned to normal operation; however, the running jobs have unfortunately been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Krenkova
MetaCentrum & CERIT-SC.
Ivana Krenkova, Thu Dec 03 03:50:00 CET 2015
21.10.2015 16:30 - Unexpected power outage in Brno UKB (cluster perian)
Dear users,
let us inform you that due to an unexpected power outage in Brno's server room at UKB, the local cluster Perian was temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.
We apologize for any inconvenience caused.
Ivana Krenkova, Wed Oct 21 03:50:00 CEST 2015
14.10.2015 5-11 PM - Kerberos service outage
Dear users,
Let us inform you that yesterday evening (5-11 PM), due to corruption of the database on the KDC server that operates Kerberos, some database records were temporarily unavailable. Unfortunately, this caused problems with operations requiring Kerberos (typically saving data from running jobs to /storage, etc.).
Ivana Krenkova,
Ivana Krenkova, Thu Oct 15 01:50:00 CEST 2015
9.10.2015 - MetaCloud outage
Dear users,
Let us inform you that the MetaCloud front-end is unavailable due to a HW fault in its storage array. Virtual machines created beforehand are still operational, but new ones cannot be instantiated and you also cannot manage existing machines through the cloud management interface (OpenNebula). Thank you for your patience.
Ivana Krenkova,
Ivana Krenkova, Fri Oct 09 01:50:00 CEST 2015
8.-9.10.2015 - Planned system update of /storage/plzen1/ and GALAXY portal outage
Dear users,
Let us inform you that from October 8 to 9, Pilsen's /storage/plzen1/ will be unavailable due to a move to new hardware.
*****************************************
IMPORTANT
The GALAXY portal, hosted on this storage, will be unavailable during the outage.
*****************************************
Influence on the running jobs:
- jobs that work with data saved on (or that will save data to) another disk array will not be affected
- jobs that compute within the scratch space and check that the copy-out of the resulting data succeeded (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Running_jobs_in_scheduler#Recommended_procedures), and that try to save the resulting data into /storage/plzen1/ during the outage, will not be affected either (you will find the resulting data in the scratch space of the relevant nodes)
- jobs that work directly with data saved in /storage/plzen1/, or jobs that do not check the success of the copy-out into this array, will most probably crash. If you have critical/long-term computations that may be affected by the outage, let us know -- we will try to suspend your computation during the outage (however, the success of the suspend process cannot be guaranteed)
Ivana Krenkova,
Ivana Krenkova, Wed Oct 07 01:50:00 CEST 2015
18.8.-18.10.2015 - Planned service maintenance of the zigur and zapat clusters and disk array /storage/jihlava1-cerit/
Due to HW problems (being resolved with the original supplier), the zigur and zapat clusters will be available one month later, in the second half of October.
With many thanks for understanding.
--
Dear users,
From August 18, due to the move to Brno, the zigur and zapat clusters and the disk array /storage/jihlava1-cerit/ will be temporarily unavailable.
The clusters are covered by a maintenance contract, therefore the move will be performed by the original supplier; the move is expected to take approximately one month (144 cluster nodes plus the disk array).
- The clusters will be unavailable for the whole period (approx. 1 month). The walltime limit in the queues will decrease gradually to prevent any job from running during the outage. Remaining running jobs will be killed when the machines are switched off.
- Current data in /storage/jihlava1-cerit/ will be temporarily available read-only from August 14, 11 PM.
- The data will be moved to storage-brno4-cerit-hsm.metacentrum.cz (CERIT-SC's HSM); it will be available for reading and writing from August 18 via the symlink /storage/jihlava1-cerit/home/$LOGIN.
- The link /storage/jihlava1-cerit/home/$LOGIN will point to /auto/brno4-cerit-hsm/fineus/home/$LOGIN (after the data transfer has finished).
- Afterwards, the disk array will be available in Brno as /storage/brno7-cerit/ (fineus-home.cerit-sc.cz). PLEASE NOTE: the original data will not be copied back; it will remain accessible in CERIT-SC's HSM. Users are advised to move the data elsewhere. In case of large data volumes, please contact us at support@cerit-sc.cz to schedule an optimal transfer.
Influence on the running jobs:
- the jobs that work with data saved on (or that will save data to) another disk array will not be affected
- the jobs that perform their computations within the scratch space, that check the success of copying out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Running_jobs_in_scheduler#Recommended_procedures ), and that will try to save the resulting data into /storage/jihlava1-cerit/ during the outage will not be affected either (you will find the resulting data in the scratch space of the relevant nodes)
- the jobs that work directly with the data saved in /storage/jihlava1-cerit/, or the jobs that do not check whether copying the data out into this array succeeded, will most probably crash. If you have critical/long-term computations that may be affected by the outage, let us know -- we will try to suspend your computation during the outage (however, the success of the suspend process cannot be guaranteed)
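The copy-out check described above amounts to testing the exit status of the copy command and leaving the results in the scratch directory when the copy fails, so they can be retrieved from the node later. A minimal sketch, with temporary directories standing in for the scratch space and the target array:

```shell
# Stand-ins for $SCRATCHDIR and the target storage array; real job
# scripts would use the scheduler-provided scratch path instead.
SCRATCH=$(mktemp -d)
TARGET=$(mktemp -d)
echo "computed results" > "$SCRATCH/output.dat"

# Copy the results out of scratch; on failure, keep them in scratch
# so they survive the outage and can be collected afterwards.
if cp "$SCRATCH/output.dat" "$TARGET/"; then
    rm -rf "$SCRATCH"   # clean scratch only after a verified copy
    echo "copy-out succeeded"
else
    echo "copy-out FAILED -- results left in $SCRATCH" >&2
fi
```

The key point is that the scratch directory is removed only on the success branch; a job written this way merely leaves its results on the node when the target array is down, instead of losing them.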
With many thanks for understanding,
Ivana Krenkova
MetaCentrum & CERIT-SC
Ivana Křenková, Thu Oct 01 15:50:00 CEST 2015
21.9.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)
Dear users,
let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused -- we are unable to influence these circumstances...
Ivana Krenkova
MetaCentrum & CERIT-SC.
Ivana Krenkova, Mon Sep 21 03:50:00 CEST 2015
22.9.-23.9.2015 - Planned system update of the archival storage in Brno
Dear users,
Let us inform you that from Tuesday, September 22 (10 AM) to Wednesday, September 23, Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a system update.
*****************************************
IMPORTANT
The HSM hosts data from Jihlava's /storage/jihlava1-cerit/ and the older /storage/brno1/. We strongly recommend transferring all data used in your jobs to another storage (for example /storage/brno6). In case you need any data from these archival storages during the outage, please inform us in advance via e-mail at meta@cesnet.cz.
*****************************************
Influence on the running jobs:
- the jobs that work with data saved on (or that will save data to) another disk array will not be affected
- the jobs that perform their computations within the scratch space, that check the success of copying out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Running_jobs_in_scheduler#Recommended_procedures ), and that will try to save the resulting data into /storage/jihlava1-cerit/ or /storage/brno1/ (/storage/home) during the outage will not be affected either (you will find the resulting data in the scratch space of the relevant nodes)
- the jobs that work directly with the data saved in /storage/jihlava1-cerit/ or /storage/brno1/ (/storage/home), or the jobs that do not check whether copying the data out into these arrays succeeded, will most probably crash. If you have critical/long-term computations that may be affected by the outage, let us know -- we will try to suspend your computation during the outage (however, the success of the suspend process cannot be guaranteed)
Ivana Krenkova, Wed Sep 16 01:50:00 CEST 2015
18.9.2015 - ? - Outage of the archival storage in Brno
Dear users,
Let us inform you that from September 18, Brno's /storage/brno4-cerit-hsm/ is unavailable due to a SW failure of the HSM system. Major software patches (bug fixes) will be applied by the system vendor.
IMPORTANT: The HSM hosts data from Jihlava's /storage/jihlava1-cerit/ and the older /storage/brno1/ (/storage/home).
Ivana Krenkova, Wed Sep 16 01:30:00 CEST 2015
29.8.2015 - Power outage in Prague (frontend and cluster tarkil + /storage/praha1)
Dear users,
let us inform you that due to an unexpected power outage in Prague's server room, the frontend and the local clusters Tarkil and Mudrc, as well as /storage/praha1, are temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Ivana Krenkova, Sat Aug 29 03:50:00 CEST 2015
24.-31.8.2015 - Planned service maintenance of the doom cluster and disk array /storage/ostrava1/
Dear users,
Let us inform you that due to a power outage in Jihlava's server room today, the local cluster Doom, as well as /storage/ostrava1/, is temporarily unavailable. The computing nodes will be gradually returned to normal operation later today.
From August 24 to 31, the doom cluster and the disk array /storage/ostrava1/ will be temporarily unavailable due to their relocation to Brno. Please copy any data you will need for your calculations during these few days to another disk storage.
With many thanks for understanding,
Ivana Krenkova
MetaCentrum
Ivana Křenková, Tue Aug 11 15:50:00 CEST 2015
22.6.2015 10-11 AM - Skirit frontend planned outage
Let us inform you that on Monday, June 22, at 10 AM, the skirit frontend will be shortly unavailable due to an upgrade. All processes running on the frontend will be terminated during the outage.
You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková, Fri Jun 19 23:00:00 CEST 2015
16.6.2015 10 AM - 12 PM - Planned outage in Prague (frontend and cluster tarkil + /storage/praha1)
Dear users,
let us inform you that due to a planned outage of the network connection, the tarkil frontend, the tarkil cluster, and the disk array /storage/praha1/ will be temporarily unavailable. Jobs running on the affected cluster or using /storage/praha1/ will be temporarily suspended. Shortly before (and, of course, during) the outage it will not be possible to start a new job on the affected cluster.
Please terminate all interactive jobs running on the tarkil frontend by Tuesday morning. All processes running on the frontend will be terminated during the outage.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Ivana Krenkova, Fri Jun 12 03:50:00 CEST 2015
25.6.2015 10AM - Hadoop cluster planned outage
Dear users,
Let us inform you that on Thursday (June 25, 10:00 a.m.) the Hadoop cluster will be shortly unavailable due to HW maintenance - replacement of the CMOS battery in the hador-c1.ics.muni.cz server.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Ivana Krenkova, Fri Jun 12 03:50:00 CEST 2015
18.5.2015 10-12 PM - Skirit frontend planned outage
Let us inform you that on Monday, May 18, the skirit frontend will be shortly unavailable due to an upgrade. All processes running on the frontend will be terminated during the outage.
You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková,
MetaCentrum
Ivana Křenková, Thu May 14 23:00:00 CEST 2015
31.3.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)
Dear users,
let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.
We apologize for any inconvenience caused.
Ivana Krenkova, Tue Mar 31 03:50:00 CEST 2015
24.-27.3.2015 - Scheduled downtime of the 'metacloud-dukan' cluster
Dear Users!
This is to inform you that there will be a scheduled downtime of the 'metacloud-dukan' cluster, part of the physical resources in MetaCloud. This will be the last in a series of outages required to extend, improve, and physically move our cloud infrastructure. The downtime will begin on 24 March and end on 27 March. All virtual machines running on nodes dukan{1..10}.ics.muni.cz will be stopped. During the outage, the hypervisor will change from XEN to KVM, finally unifying the hypervisors used on all resources across MetaCloud.
How to tell if the outage affects your virtual machines
Use the OpenNebula dashboard to display a list of all your virtual machines (Virtual Resources → Virtual Machines). The 'Host' column shows the physical node name for each VM. The outage will affect all virtual machines on nodes dukan{1..10}.ics.muni.cz. You may also filter the contents of the VMs table using the Search box on the top of the page.
What will happen with my virtual machines during the outage
All affected VMs must be stopped. It will be a great help to us if you can stop your own machines before the end of business on Monday, 23 March. Otherwise, we will stop your VMs and move them to storage; after the downtime you will be able to start your machines again. Since the hypervisor will change from XEN to KVM, some machines may fail to start properly. Therefore, do not hesitate to contact us in case any of your VMs acts strangely. Unfortunately, compatibility with KVM cannot be checked beforehand; it can only be verified experimentally. Standard MetaCentrum images, however, are already tuned for KVM and are expected to cope without glitches.
Thank you for your understanding. Be assured that this is the last planned downtime for the foreseeable future.
Best regards, MetaCloud
Ivana Křenková, Tue Mar 10 15:50:00 CET 2015
3.3.2015 10:00-12:00 - Unexpected power outage in Prague (cluster luna)
Dear users,
let us inform you that due to today's unexpected power outage in Prague's server room, the local cluster Luna is temporarily unavailable. The computing nodes will be returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum
Ivana Křenková, Tue Mar 03 15:50:00 CET 2015
13.1.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)
Dear users,
let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.
We apologize for any inconvenience caused.
Ivana Krenkova, Tue Jan 13 03:50:00 CET 2015
10.1.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat)
Dear users,
let us inform you that due to today's unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Krenkova
MetaCentrum & CERIT-SC.
Ivana Krenkova, Sat Jan 10 03:50:00 CET 2015
9.12.2014 - Possible problem with memory writes on the zebra cluster
After moving the nodes of the zewura SMP cluster (renamed to zebra1-12) to the new computer room, some of the nodes appear to exhibit very rare memory write failures under a very intensive memory stress test. The problem is not reproducible; it occurred only a few times during several days of testing. We consider it almost impossible to occur in normal operation. The problem was reported to the supplier's technical support for further detailed diagnostics.
The nodes are being returned to normal operation. Although no problems are expected, we kindly ask users to report any suspicious behaviour.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum & CERIT-SC.
Ivana Krenkova, Tue Dec 09 03:50:00 CET 2014
3.-4.12.2014 - Planned system update of the archival storages in Pilsen and Brno
Let us inform you that from Wednesday, December 3 (8:30 AM) to Thursday, December 4 (8 PM), Pilsen's /storage/plzen2-archieve/ and Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a system update. In case you need any data from these archival storages during the outage, please inform us in advance via e-mail at meta@cesnet.cz.
The other two archival storages (/storage/jihlava2-archive and /storage/brno5-archive) will not be affected.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova
Ivana Křenková, Tue Nov 25 10:00:00 CET 2014
28.11.2014 9 AM - 1 PM - Planned power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)
Dear users,
let us inform you that due to a planned power outage in Jihlava's server room, the local clusters with the property 'jihlava' will be temporarily unavailable on Friday, 28.11.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum & CERIT-SC.
Ivana Krenkova, Fri Nov 21 03:50:00 CET 2014
31.10.2014 - Data transfer finished -- brno3-cerit now in normal operation
This morning, the transfer of the brno3-cerit data (temporarily stored in Jihlava) was finished -- the brno3-cerit storage is now back in normal operation mode.
Attention: under specific circumstances (particularly when your jobs were finishing during the synchronization), some data may not have been synchronized -- if so, you will find your data in the Jihlava location, currently available via /auto/jihlava1-cerit/brno3/export/home/$USER (please transfer the missing data on your own -- we will delete them after a few weeks).
With best regards
Tom Rebok.
Tom Rebok, Fri Oct 31 16:33:00 CET 2014
29.-30.10.2014 - Returning data back to Jihlava -- short outage of brno3-cerit disk array
Since we have managed to repair the array /storage/brno3-cerit, the data (temporarily hosted in Jihlava) will be returned to Brno
*** on Wednesday, 29th of October ***
Since it is not possible to perform this transfer transparently, the /storage/brno3 array will have to operate in a not fully consistent state for about 1-2 days.
To minimize the impacts of this transfer on you and your computations, it will be managed as follows:
- current brno3-cerit data (hosted in Jihlava) will be available in /auto/jihlava1-cerit/brno3/export/home/$LOGIN from Wednesday morning
- the data from Jihlava will also be available (already from Wednesday morning) in /storage/brno3-cerit/home/$LOGIN (however, these data may be a few days old -- the synchronisation will be performed on Monday evening). Note: in the case of the zewura and zegox clusters, as well as the zuphux frontend, these data are also located in /home/$LOGIN...
- during Wednesday/Thursday, the rest of the data from Jihlava (/auto/jihlava1-cerit/brno3/export/home/$LOGIN) will become available on the Brno array (/storage/brno3-cerit/home/$LOGIN) -- then the transfer will be complete.
Note: if you change particular data in /storage/brno3/home/$LOGIN during Wednesday/Thursday, those data may be overwritten by the data synchronised/copied from Jihlava.
The running jobs should not be affected by this transfer.
We are sorry for the inconvenience.
With best regards and thanks for understanding,
Tomas Rebok,
MetaCentrum NGI.
Tom Rebok, Thu Oct 23 01:40:00 CEST 2014
4.10.2014 - Unexpected power outage in Ostrava (GPU cluster doom)
Dear users,
let us inform you that due to an unexpected power outage in Ostrava's server room, the local cluster Doom, as well as /storage/ostrava1, was temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum
Tom Rebok, Sat Oct 04 11:05:00 CEST 2014
1. 10. 2014 9:00 - 16:00 - Planned system update of /storage/brno4-cerit-hsm/
The hierarchical storage in Brno, /storage/brno4-cerit-hsm/, will be inaccessible on October 1, 2014, from 9 AM till 4 PM (expected). Major software patches (bug fixes) will be applied by the system vendor.
Ivana Křenková, Wed Oct 01 13:11:00 CEST 2014
29.9.2014 - Unexpected outage of /storage/brno2, some frontends, and nodes
Because of several SW problems that have recently occurred, the disk array /storage/brno2/, some frontends, and some nodes were not working properly today. The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused.
Ivana Křenková,
MetaCentrum
Ivana Křenková, Mon Sep 29 23:00:00 CEST 2014
26.9.2014 - Unavailability of /storage/brno3-cerit
Dear users,
let us inform you that due to an unexpected short power outage in the CERIT-SC server room last night (25.9., approx. 9 PM), the filesystem of the disk array /storage/brno3-cerit/ is not working properly. We are working on data recovery at the moment. The user data (208 TB) are being copied temporarily to Jihlava (/auto/jihlava1-cerit/brno3/export); this is expected to take about 1 or 2 weeks (due to the huge volume of data). In case you need your data urgently, please contact us at meta@cesnet.cz and we will copy them with a higher priority.
Jihlava's disk array will temporarily serve (during the recovery of the Brno disk array) as /home for the zewura and zegox clusters and the zuphux frontend. All accessible data will also be available via the symlink /storage/brno3-cerit. All data will be returned from Jihlava to Brno after the Brno disk array has been recovered.
With apologies for the inconvenience and with thanks for your understanding,
MetaCentrum & CERIT-SC
Ivana Křenková, Fri Sep 26 15:00:00 CEST 2014
26.9.2014 - Unexpected outage of /storage/brno3-cerit
Dear users,
let us inform you that due to an unexpected short power outage, the disk array /storage/brno3-cerit/ is temporarily unavailable today. We are working on data recovery at the moment. In case you need your data very urgently, please contact us at meta@cesnet.cz and we will ensure your data are copied to another disk storage.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková, MetaCentrum
Ivana Křenková, Fri Sep 26 04:00:00 CEST 2014
19.8.2014 - Unexpected power outage in Ostrava (GPU cluster doom)
Dear users,
let us inform you that due to an unexpected power outage in Ostrava's server room, the local cluster Doom, as well as /storage/ostrava1, was temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum
Tom Rebok, Tue Aug 19 11:05:00 CEST 2014
15. 8. 2014 14:45 - 22:00 - Unexpected power outage in Brno server rooms, some services may still not work (e.g., license server, portal)
Dear users,
today, another unexpected power outage occurred, this time in the Brno server rooms. Because of this, the Brno part of the MetaCentrum infrastructure was paralyzed, including several central services hosted there (e.g., scheduler, license server, disk storages, ...). The jobs running during the outage were unfortunately stopped.
Most of the nodes and services should be available now. However, a few power circuits could not be revived, and a deeper inspection of the power supplies must be performed in order to detect the failing ones -- thus, several services (e.g., the license server and parts of the portal) still do not work.
We are really sorry for the trouble caused -- unfortunately, we are pulling the shorter end of the rope in the fight of "higher power" vs. man. :-(
Tom Rebok
MetaCentrum
Tom Rebok, Sat Aug 16 07:44:00 CEST 2014
15.8.2014 - Unexpected power outage in Ostrava (GPU cluster doom)
Dear users,
let us inform you that due to an unexpected power outage in Ostrava's server room, the local cluster Doom, as well as /storage/ostrava1, was temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum
Tom Rebok, Fri Aug 15 11:05:00 CEST 2014
19.8.2014 11:00-13:00 - Skirit frontend planned outage
Let us inform you that on Tuesday (August 19, 11:00 a.m.) the skirit frontend will be shortly unavailable due to a SW upgrade. All processes running on the frontend will be terminated during the outage.
You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková,
MetaCentrum
Ivana Křenková, Thu Aug 14 23:00:00 CEST 2014
7.8.2014 3:50 - 9:00 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)
Dear users,
let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Tom Rebok
MetaCentrum & CERIT-SC.
Tom Rebok, Thu Aug 07 11:05:00 CEST 2014
25.7.2014 14:00 - 14:30 - Connectivity problems in Pilsen
Today, around 2 p.m., unexpected connectivity problems were observed in the server rooms of the University of West Bohemia, which affected our Pilsen nodes as well. The major problems were noticed between 2 p.m. and 2:30 p.m.; however, some consequent minor problems could be noticed even after that time.
The connectivity should already be restored. (Nevertheless, some related service works are still in progress...)
We apologize for any inconvenience caused.
Tomáš Rebok,
MetaCentrum & CERIT-SC.
Tom Rebok, Fri Jul 25 15:26:00 CEST 2014
28.4.2014 - Unexpected power outage in Jihlava
Let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were partially and temporarily unavailable. The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum & CERIT-SC
Ivana Křenková, Mon Apr 28 14:00:00 CEST 2014
16.4.2014 16:00 - Unexpected outage of /storage/brno2 and frontend skirit
Because of several SW problems that have recently occurred, the disk array /storage/brno2/ and the frontend skirit are not working properly again today.
We apologize for any inconvenience caused.
Ivana Křenková, MetaCentrum
Ivana Křenková, Wed Apr 16 04:00:00 CEST 2014
10.4.2014 - Unexpected outage of /storage/brno2, some frontends, and nodes
Because of several SW problems that have recently occurred, the disk array /storage/brno2/, some frontends, and some nodes were not working properly today. The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused.
Ivana Křenková,
MetaCentrum
Ivana Křenková, Thu Apr 10 23:00:00 CEST 2014
23.3.2014 23:00 - Zuphux frontend planned outage
Let us inform you that on Saturday (March 23, 11:00 p.m.) the zuphux frontend will be shortly unavailable due to a SW upgrade (Debian 6 -> Debian 7). All processes running on the frontend will be terminated during the outage.
You can use any of the other frontends during the outage:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková,
MetaCentrum & CERIT-SC
Ivana Křenková, Wed Mar 19 23:00:00 CET 2014
25.-26. 2. 2014 - Service maintenance of the disk array /storage/brno1 (/storage/home)
Because of several HW/SW problems that have recently occurred with the disk array /storage/brno1 (/storage/home), its complex service maintenance and a SW upgrade have to be performed urgently.
Unfortunately, this maintenance cannot be performed on a live system; thus, the disk array has to be ***PUT OUT OF OPERATION*** (and made inaccessible)
on Tuesday, 25 February 2014, during the morning hours
(The expected shutdown duration is 1-2 days.)
Influence on the running jobs:
- the jobs that work with data saved on (or that will save data to) another disk array will not be affected
- the jobs that perform their computations within the scratch space, that check the success of copying out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Running_jobs_in_scheduler#Recommended_procedures ), and that will try to save the resulting data into /storage/brno1 (/storage/home) during the outage will not be affected either (you will find the resulting data in the scratch space of the relevant nodes)
- the jobs that work directly with the data saved in /storage/brno1 (/storage/home), or the jobs that do not check whether copying the data out into this array succeeded, will most probably crash. If you have critical/long-term computations that may be affected by the outage, let us know -- we will try to suspend your computation during the outage (however, the success of the suspend process cannot be guaranteed)
We are really sorry for the problems that may occur. Unfortunately, the current condition of the /storage/brno1 (/storage/home) disk array cannot be left untouched any longer -- that would result in bigger problems in the future.
With many thanks for understanding
Tomáš Rebok.
Tom Rebok, Thu Feb 20 22:05:00 CET 2014
6. 1. 2014 - Unexpected power outage in Jihlava
Let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum & CERIT-SC
Ivana Křenková, Mon Jan 06 14:14:00 CET 2014
5. 11. 2013 - Unexpected power outage in Jihlava (Zigur and Zapat clusters)
Let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková, Tue Nov 05 15:17:00 CET 2013
1. 10. 2013 - Outage in Brno, October 1, 2013
All computing nodes located in the computing room of ICS MU (with the property "brno", except the machines zewura[1-8]) will be down on Tuesday, October 1, due to work on the electric network extension for the expected new cluster of the CERIT-SC center.
Long job queues (more than 4 days) have been disabled on those clusters. All the other queues will be disabled later. Running jobs will be killed when the machines are switched off. Please finish all jobs by the end of September.
At the same time, the frontend skirit.ics.muni.cz will not be available during the outage.
We are sorry for temporary unavailability of the resources.
Ivana Křenková, Thu Sep 26 16:17:00 CEST 2013
9. 9. 2013 9:00 - 17:00 - Planned system update of /storage/plzen2-archieve/
On Monday between 9:00 a.m. and 5:00 p.m., Pilsen's /storage/plzen2-archieve/ will be unavailable due to a system update.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková, Tue Sep 03 13:11:00 CEST 2013
7.8.2013 11:45 PM - Short power outage in Jihlava
The following machines were affected: zapat23 zapat98 zapat99 zapat100 zapat101 zapat111 zigur1 zigur3 zigur28 zigur30 zigur31
Martin Kuba, Thu Aug 08 11:41:00 CEST 2013
29. 7. 2013 - Power outage in Jihlava's server room
Let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat and the disk array /storage/jihlava1-cerit are temporarily unavailable. Unfortunately, all running jobs have been terminated.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková, Mon Jul 29 10:00:00 CEST 2013
10. 8. 2013 7:00 - 10:00 - Planned system update of /storage/plzen2-archieve/
Let us inform you that today between 2:00 and 5:00 p.m., Pilsen's /storage/plzen2-archieve/ may be shortly unavailable due to a system update.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková, Tue Jul 09 10:00:00 CEST 2013
18. 6. 2013 10.00 - Skirit frontend outage
Let us inform you that on Tuesday (June 18, 10:00 a.m.) the skirit frontend will be shortly unavailable due to a HW upgrade. At the same time, the system will be upgraded (Debian 5 -> Debian 6).
You can use any of the other frontends during the outage:
- hermes.metacentrum.cz Debian 5.0
- tarkil.cesnet.cz Debian 6.0
- nympha.zcu.cz Debian 6.0
- minos.zcu.cz Debian 6.0
- perian.ncbr.muni.cz Debian 6.0
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková, Sun Jun 16 10:00:00 CEST 2013
17. 5. 2013 - Air-conditioning outage in the Pilsen server room
Let us inform you that due to an unexpected air-conditioning failure in the Pilsen server room and the resulting overheating of the local clusters, the machines Gram, Minos, Nympha, Konos, and Ajax, and the disk array /storage/plzen1 are unavailable from this evening.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková, Fri May 17 10:10:00 CEST 2013
16. 5. 2013 - Brno's disk array outage (/storage/brno1)
Today, as a result of a service intervention by the supplier, an unplanned outage of the older Brno disk array occurred. /storage/brno1, /afs, and the SW modules are temporarily unavailable. We apologize for the inconvenience.
Petr Hanousek, Thu May 16 12:00:00 CEST 2013
5. 3. 2013 - New trouble ticketing system
On 5 March 2013, from 9:00 till approx. 12:00, our trouble ticketing system (RT - rt3.cesnet.cz) will be unavailable due to a necessary upgrade. During the outage, neither the web nor the mail interface will be accessible. E-mails sent during the outage (i.e., to the address meta@cesnet.cz) will be delivered after it ends. We apologize for the half-day delayed response to requests.
Petr Hanousek, Tue Mar 05 17:08:00 CET 2013
22. - 25. 10. 2012 - Scheduled downtime in Pilsen
All computing nodes located in the computing room of ZČU (ajax, konos, minos[20-35], nympha) will be down during October 22-25 due to the move to the new server room. Jobs are currently being held in the queues. Running jobs will be killed when the machines are switched off.
We are sorry for temporary unavailability of the resources.
Ivana Křenková, Mon Oct 22 16:25:00 CEST 2012
10.-11.10.2012 - Reconstruction of electrical wiring in Pilsen - follow-up works
During the takeover of the work on switching Pilsen's UL011 room over to the energocentrum, a serious defect was revealed - a failure of some of the support systems (measurement and control). The repair unfortunately requires another switch-off (killing the running jobs). The works will take place on the night of Wednesday to Thursday, October 10-11, 2012 (21:00 - 5:00). Sorry for the inconvenience.
Petr Hanousek, Tue Oct 02 16:21:00 CEST 2012
14.9.2012 - Volume /storage/brno1 full
Volume /storage/brno1 is filled to 100 percent. Moreover, the file system is probably damaged as well, so the volume is currently not suitable for working with data. Please use the volumes /storage/brno2 (11 TB available) and /storage/plzen1 (27 TB available) for your work. Unfortunately, I cannot yet estimate the time needed for the repair.
In this context, I would also like to ask you to delete all unnecessary files stored on the mentioned volumes.
Petr Hanousek, Fri Sep 14 16:20:00 CEST 2012
19. - 20.9.2012 - Reconstruction of electrical wiring in Pilsen vol 2
On the night of September 19 to 20, 2012, the wiring in the Pilsen server room will be reconstructed. The machines will be switched off on Wednesday the 19th in the afternoon; restart is anticipated on Thursday the 20th in the morning. From Thursday morning, the "long" queue should finally be available again on the affected machines.
Besides the mentioned clusters, the disk volume /storage/plzen1 will also be unavailable.
We apologize for the temporary inconvenience.
Petr Hanousek, Thu Sep 13 16:09:00 CEST 2012
29.8.2012 - Delayed reconstruction of electrical wiring in Pilsen
The outage announced for tomorrow is canceled because of problems on the supplier's side. We will announce the newly planned shutdown through this channel. The "long" queue on the affected machines will remain closed for now.
Petr Hanousek, Wed Aug 29 16:05:00 CEST 2012
29.8. - 30.8.2012 - Reconstruction of electrical wiring in Pilsen
On the night of August 29 to 30, 2012, the wiring in the Pilsen server room will be reconstructed. The machines will be switched off on Wednesday the 29th in the afternoon; restart is anticipated on Thursday the 30th in the morning. The "long" queue has already stopped accepting jobs on these machines; any jobs still running at the time of the power-down will be killed.
Besides the mentioned clusters, the disk volume /storage/plzen1 will also be unavailable.
We apologize for the temporary inconvenience.
Petr Hanousek, Wed Aug 22 11:27:00 CEST 2012