Outages

You can read this as an RSS feed.

11.12.2024 - /storage/brno12-cerit/ and frontend zuphux outage

Update 10:50 AM

the storage is back in operation

--

Dear users,

The disk array /storage/brno12-cerit/ is currently unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.

If possible, use other storage and frontends for now.

 

Thank you for your understanding,

your MetaCentrum Team

Ivana Křenková, 11. 12. 2024

18.-22.10.2024 - Unplanned outage of the network connection in Pilsen at the NTIS hall

Dear users,
Since this afternoon, due to a network connection failure, the clusters konos and kubus, located in the NTIS hall, have been unavailable. A new switch will be provided within the next week.
In the meantime, please use machines at other locations if possible.
 
Thanks for your understanding,
MetaCentrum Team
 
Ivana Křenková, 18. 10. 2024

12.10.2024 - /storage/brno12-cerit/ outage

update 1 PM:

the disk array is back in operation

--

Dear users,

The disk array /storage/brno12-cerit/ is currently unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.

If possible, use other storage and frontends for now.

 

Thank you for your understanding,

your MetaCentrum Team

Ivana Křenková, 12. 10. 2024

18-19.8.2024 - /storage/brno12-cerit/ and frontend zuphux outage

update 26.8., 3 PM: the disk array is back in operation and the data should be readable. Please report any problems. Thank you for your understanding.

 

update 26.8., 10:30 AM: the disk array will be briefly unavailable during the morning while we try to regain access to the unreadable data. We apologize for the inconvenience.

 

update 20.8.:

We regret to inform you that we have been experiencing significant hardware issues with the /storage/brno12-cerit/ directory since Sunday.

A small part of the data in /storage/brno12-cerit is currently inaccessible due to a failure of one of the disk arrays; attempts to read it return an Input/Output error. In terms of data blocks this is about 1.1%, but since large files over 4 MB are spread across multiple devices, it is likely that at least some of them are affected. The fault is being addressed by the manufacturer's support. So far, the data is not definitively lost, but we do not currently know when it will be made available again, or whether all of it will be intact in the end. If you need some of the data quickly, it may be more efficient to re-download it (if it was primary input) or to recompute what is needed.
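
As a rough illustration of why striping across devices raises the risk for large files, the back-of-the-envelope sketch below can be run anywhere with awk. It rests on a simplifying assumption of ours, not a measurement: each of a file's k stripes independently hits a bad block with probability 1.1%.

```shell
# Sketch only: assumes each of a file's k stripes is affected independently
# with probability p = 0.011 (the ~1.1% of bad blocks mentioned above).
awk 'BEGIN {
  p = 0.011
  for (k = 1; k <= 64; k *= 8)
    printf "%d stripes -> %.3f chance affected\n", k, 1 - (1 - p)^k
}'
```

With 64 stripes, the chance that at least one piece is unreadable already exceeds 50%, which is why large files are disproportionately affected.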

Otherwise, /storage/brno12-cerit is currently running normally, and there is no particular reason to assume that other data is at greater risk than usual (note, however, that given the size of the repository it is not independently backed up, and it is certainly not intended for archival or otherwise irreplaceable data). Some operational limitations may remain while the broken piece of hardware is repaired.

Please note that, because priority is given to maximizing the offered capacity, it is not possible to perform a full backup of all data on storage of this size.
Ensuring full backups would require at least double the funding to purchase suitable hardware. Archival needs are covered by the disk arrays of the CESNET Data Care department, and branch repositories are also being prepared within the EOSC project; on our own disk arrays we back up only in the form of snapshots. Snapshots offer some protection if a user inadvertently deletes their files: in general, data that existed some days before an incident can be restored. However, snapshots are stored on the same disk arrays as the data itself, so in the event of a hardware failure these backups may be lost as well.
https://docs.metacentrum.cz/data/metacentrum-backup/

We are very sorry; together with the HW vendor, we are doing our best to recover the lost data.

If you need results urgently, please resubmit your jobs. If needed, we can raise their priority so that they start as soon as possible.

Thank you for your understanding.

"Everything fails, all the time." (Amazon)

--

update 19.8.: the disk array is working only in limited mode, with short outages. If possible, limit your work on this array. We are trying to stabilize the situation.

update 18.8. at 8PM: the storage is back in operation 

--

Dear users,

The disk array /storage/brno12-cerit/ is currently unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.

If possible, use other storage and frontends for now.

 

Thank you for your understanding,

your MetaCentrum Team

Ivana Křenková, 18. 8. 2024

27.6.2024 - Unplanned network failure in Brno

Dear user,

A while ago, there was a network failure on the local network in Brno (a broken cable at Mendel University), which caused the unavailability of some computing clusters at this location (tyra, aman, zenon). We have reported the outage and are waiting for a replacement internet connection.

 
With apologies and thanks for your understanding
MetaCentrum team


Ivana Křenková, 27. 6. 2024

from January 2024 - Decommissioning of the archive /storage/du-cesnet/


In winter, a mechanical failure of the tape robot occurred in the archive repository /storage/du-cesnet/ (du4.cesnet.cz). Data is still being transferred to the object storage, and access to the data on the tapes is very limited. After discussion with our DU colleagues, we have removed access to this storage from our machines (to speed up the transfer). If you need your data urgently, please contact the CESNET data storage team at du-support@cesnet.cz.

We apologize for the inconvenience.

Thank you for your understanding,

your MetaCentrum Team

Ivana Křenková, 24. 5. 2024

23.5.2024 - /storage/brno12-cerit/ and frontend zuphux outage

update: 23.5. at 9:30 a.m. back in operation

--

Dear users,

The disk array /storage/brno12-cerit/ is currently unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.

If possible, use other storage and frontends for now.

 

Thank you for your understanding,

your MetaCentrum Team

Ivana Křenková, 23. 5. 2024

13.5.2024 - /storage/brno12-cerit/ and frontend zuphux outage

Update May 13, 11:30: storage is fully back in operation

---

Dear users,

The disk array /storage/brno12-cerit/ is currently unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.

If possible, use other storage and frontends for now.

 

Thank you for your understanding,

your MetaCentrum Team

Ivana Křenková, 13. 5. 2024

19-24.4.2024 - Scheduled maintenance of network

Dear users,

on 19-21 April and 24 April, in the afternoon/evening/night hours, software upgrades will take place on the backbone routers of the network. The outages will occur at the times indicated below and last 30-60 minutes (see the schedule).

=======================================================================  

*Friday 19.4.2024 17:00 - 21:00* - Prague-Sitel, Plzeň1,2

*Friday 19.4.2024 20:00 - 00:00* - Jihlava

*Saturday 20.4.2024 15:00 - 19:00* - Prague - ÚMG - UJV Řež

*Saturday 20.4.2024 19:00 - 00:00* - Olomouc1,2 - České Budějovice

*Sunday 21.4.2024 00:00 - 05:00* - Prague1 - Brno1

*Wednesday 24.4.2024 00:00 - 05:00* - Praha2 - Brno2
 
We apologize for any inconvenience,

MetaCentrum

 


Ivana Křenková, 19. 4. 2024

11.3.2024 up to 6PM - Scheduled maintenance of the MetaCentrum Cloud

Dear user of MetaCentrum Cloud [1],

Today, 11.3.2024 (Monday), in the morning and part of the afternoon (until approx. 18:00), the new instance of the e-INFRA CZ G2 OpenStack cloud in Brno [1] is unavailable due to an unplanned outage caused by planned cloud maintenance. The outage affects all API services; already running virtual servers remain functional. The main G1 OpenStack cloud in Brno [2] is not affected.

 

[1] https://brno.openstack.cloud.e-infra.cz/

[2] https://cloud.metacentrum.cz/ https://cloud.muni.cz/

 

We apologize for any inconvenience,

MetaCentrum Cloud team

 


Ivana Křenková, 11. 3. 2024

7.3.2024 - /storage/brno12-cerit/ and frontend zuphux outage

Status update: as of 10 AM, the disk array is back with full functionality.

 

Dear users,

The disk array /storage/brno12-cerit/ is currently unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.

If possible, use other storage and frontends for now.

 

Thank you for your understanding,

your MetaCentrum Team

Ivana Křenková, 7. 3. 2024

7.2.2024 - /storage/brno12-cerit/ and frontend zuphux outage

update 11:50 AM - the disk array is now fixed and available again

 

Dear users,

The disk array /storage/brno12-cerit/ is currently unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.

If possible, use other storage and frontends for now.

 

Thank you for your understanding,

your MetaCentrum Team

Ivana Křenková, 7. 2. 2024

3. 2. 2024 from 9 AM - Short outage of /storage/brno2/

Due to maintenance, there will be a short outage of the /storage/brno2/ disk array on Saturday 3. 2. from 9 am.

During the outage it won't be possible to log in to the skirit, perian and onyx frontends and the PBS server meta-pbs.metacentrum.cz won't submit new jobs to the Brno cluster.


OnDemand will also be affected (using the home directory of /storage/brno2/).


We apologize for any inconvenience.

MetaCentrum

 
Ivana Křenková, 2. 2. 2024

11. 1. 2024, 15-15:45 - brno2 outage

Dear users,

currently the brno2 storage is down due to an as-yet-unidentified disk error. This also means the skirit frontend is not accessible.

We are investigating the cause. If possible, use other storages and frontends meanwhile.

Thank you for your understanding,

your MetaCentrum Team

Ivana Křenková, 11. 1. 2024

24/08/2023 - Planned outage of Galaxy

Dear users,

The https://usegalaxy.cz service will be migrated to the more stable environment of a VMware cluster on Thursday, Aug 24. Existing user data will be migrated as well.

The service will become unavailable from 10 AM CEST (we cannot guarantee correct migration of data created after that time), and the outage is expected to end in the early afternoon. However, the IP address and DNS records will also change, and their propagation will take some time. Therefore, the service is expected to be fully available again from Friday, Aug 25.

With apologies and thanks for understanding
Galaxy MetaCenter Team

Ivana Křenková, 23. 8. 2023

1/09/2023 - Planned outage of elmo frontend

Dear users,

on the 1st of September, the elmo.elixir-czech.cz frontend will be down.

To access computational resources, please use any other frontend, see https://docs.metacentrum.cz/basics/concepts/#frontends-storages-homes

With apologies and thanks for understanding
MetaCenter Team

Ivana Křenková, 1. 8. 2023

14.07.2023 4PM - Planned outage of data connection in Pruhonice

Dear user,

This afternoon (14 July), after 4 PM, there will be a short outage of the data connection in Průhonice (ibot cluster). We have limited the submission of new jobs to this cluster; we will resume traffic as soon as the network connection is restored.

Running jobs that copy output back to the disk array will fail to do so, and the data will remain in the scratch directory on the node where the job ran. The data on the compute nodes can be accessed from any frontend using the following shortcut:

     go_to_scratch JOB_NUMBER_INCLUDING_PBS_SERVER_NAME
     # for example:
     tarkil.grid.cesnet.cz$ go_to_scratch 79868.meta-pbs.metacentrum.cz

With apologies and thanks for understanding
MetaCenter Team

Ivana Křenková, 14. 7. 2023

7-10.7.2023 - Unplanned disk array failure /storage/brno1-cerit/

Update: the storage is slow, we are working on a fix

------

Dear user,

This afternoon (7 July), there was a HW failure of the /storage/brno1-cerit/ disk array. We are working on getting it back up and running in cooperation with the supplier.

Running jobs that copy output back to the array will fail to do so, and the data remains in the scratch directory on the node where the job ran. To access the data on the compute nodes, use the following shortcut:

     go_to_scratch JOB_NUMBER_INCLUDING_PBS_SERVER_NAME
     # for example:
     tarkil.grid.cesnet.cz$ go_to_scratch 79868.meta-pbs.metacentrum.cz

You can use other frontends (https://wiki.metacentrum.cz/wiki/Frontend) and disk arrays during the outage.
 
With apologies and thanks for understanding
MetaCenter Team

Ivana Křenková, 7. 7. 2023

20.6.2023 5-10PM - Scheduled maintenance of the MetaCentrum Cloud

Dear user of Cloud MetaCentrum [1],

There will be a reconfiguration of the Metacenter OpenStack cloud block storage in order to increase its capacity scheduled on Tuesday 20.6. between 5:00 PM and 10:00 PM CET.
From experience we know that even a small configuration change may cause a short outage (10-30 minutes), given the approximately 3,000 volumes currently allocated. VM operations will not be affected, and the main OpenStack API and the Horizon UI will remain available; the Cinder block storage and its API will be temporarily unavailable, preventing volume creation.

We apologize for any inconvenience,

MetaCentrum Cloud team

[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz


Ivana Křenková, 20. 6. 2023

19. 6. 2023 - Hardware failure of the storage brno2

Dear users,

we are sorry to announce that, due to a hardware failure, the storage brno2 is down.

Consequently it is not possible to log in to frontends skirit, perian and onyx.

Currently we cannot tell whether/when the storage will be up again.

We will update you in this matter as soon as possible.

If you have any questions concerning your data and running jobs, contact us at meta@cesnet.cz.

We are very sorry for the inconvenience,

your MetaCentrum team.

 
Ivana Křenková, 19. 6. 2023

12.-15. 5. 2023 - Planned outage of luna cluster, luna frontend and the storage-praha6-fzu disk array

Dear users,

On 12-15 May, there will be a planned shutdown of most servers in the server room at the FZÚ AV ČR due to the regular annual inspection of the electrical installation. The outage will include all nodes of the luna cluster, including the luna frontend and the storage-praha6-fzu disk array. The outage will also be used to replace faulty RAM in some servers.

We apologize for the inconvenience,

Your MetaCentrum support team.
 
Ivana Křenková, 4. 5. 2023

18-24.3.2023 - Unplanned disk array failure /storage/brno2/

Update 03/27/2023: Another problem has appeared; it should be fixed within a few hours. Please be patient. (The disk array was returned to service the same afternoon.)

 

Update 03/24/2023: The /storage/brno2/ disk array is back in full operation. Data remains intact.

-----------

Dear user,

On Saturday afternoon (18 March) there was a HW failure of the /storage/brno2/ disk array. We are working on getting it back up and running in cooperation with the supplier. We are not yet able to say when the array will be operational. The supplier is proceeding carefully so that we do not lose the stored data.

It is not possible to log in to frontends where this array serves as /home (skirit, onyx) and the disk array cannot be accessed from elsewhere (from other frontends or nodes). OnDemand is also affected.

Running jobs that copy output back to the array will fail to do so, and the data remains in the scratch directory on the node where the job ran. To access the data on the compute nodes, use the following shortcut:

     go_to_scratch JOB_NUMBER_INCLUDING_PBS_SERVER_NAME
     # for example:
     tarkil.grid.cesnet.cz$ go_to_scratch 79868.meta-pbs.metacentrum.cz

You can use other frontends (https://wiki.metacentrum.cz/wiki/Frontend) and disk arrays during the outage.
 
With apologies and thanks for understanding
MetaCenter Team

Ivana Křenková, 18. 3. 2023

20-21.10.2022 - Unplanned network failure in Brno

update

Metacentrum OpenStack (CESNET_MCC), Status 2022-10-21 9:00

OpenStack is functional, but a limited number of servers/hypervisors, running around 40 VMs, are without a network. We are working on VM migrations where possible.

---

Dear user,

Today we are experiencing numerous short-term outages on the local network in Brno, which are causing short-term unavailability of the cerit-pbs scheduling system and some machines. The cause is being investigated by local network specialists.

With apologies and thanks for your understanding
MetaCentrum team


Ivana Křenková, 20. 10. 2022

1.9.2022 - Planned outage of lex, krux, zubat cluster and brno14-ceitec storage

Dear users,

on Thursday, 1st of September, there will be a power outage in the CEITEC server room. Consequently, the clusters krux, lex and zubat, as well as the brno14-ceitec storage, will be inaccessible. The downtime is planned to last from 5 a.m. until noon.

Jobs running on the affected clusters will be held by PBS to be run after the outage is over and no action on users' side is needed.

Jobs running elsewhere may be affected if they copy data to/from the brno14-ceitec storage while it is down. If your jobs fail at start for this reason, resubmit them after the outage is over. If your finishing jobs fail because they cannot copy results to brno14-ceitec, please fetch the files manually from the scratch directory.



We apologize for the inconvenience,

your MetaCentrum support team.
 
Ivana Křenková, 23. 8. 2022

14.7.2022 - Planned outage of /storage/liberec3-tul, charon frontend and charon cluster

Dear users,

on Thursday 14th July there will be a power outage due to maintenance in the facilities of the Technical University of Liberec. Consequently, the /storage/liberec3-tul storage, the charon.nti.tul.cz frontend and the charon cluster will be powered down. The downtime is planned to last the whole day.

No action is needed on the users' side. Jobs whose walltime would collide with the start of downtime will be held by PBS to be run after the outage is over.

We apologize for the inconvenience,

your MetaCentrum support team.
 
Ivana Křenková, 11. 7. 2022

1.7.2022 - Unplanned outage of the old /storage/brno6/ (disks failure)

Dear users,

Due to an unplanned crash of the /storage/brno6/ disk array, which we were going to shut down in the next few days due to its age, we are forced to speed up this process. Most of your data from the /storage/brno6/ array can be found in the /storage/brno2/home/LOGIN/brno6/ directory.

The last full synchronization took place during the night from Wednesday to Thursday, and another partial synchronization took place during the downtime. Some of the data you uploaded to the array in the last few hours may not have been copied yet.

If we can get the old array back up and running, we will try to sync the newest data. The /storage/brno6/ disk array hardware will then be decommissioned without replacement; for working with data in Brno, please use the /storage/brno2/ disk array, to which the data has been transferred, or any other disk array available in MetaCentrum. The /storage/brno6/ symlink currently points to the old, failed array and will be deleted when the hardware is shut down.

 
We apologize for any inconvenience,

MetaCentrum

 


Ivana Křenková, 1. 7. 2022

24.6.2022 2-4PM - Scheduled maintenance of the MetaCentrum Cloud

Dear user of Cloud MetaCentrum,

There is planned load and performance cloud infrastructure testing scheduled on Friday 2022-06-24 from 14:00 to 16:00 (CEST).

Planned testing scenarios should not affect or interrupt any cloud functionality, but they will generate an extensive infrastructure load, visible to end users as additional OpenStack API and UI latency.

We apologize for any inconvenience,

MetaCentrum Cloud team

[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz


Ivana Křenková, 23. 6. 2022

2.6.2022 - HW upgrade of the following disk arrays: /storage/praha1/ = /storage/vestec1-elixir/

update 3. 6. 2022 3 PM

After upgrading the disk array, there were problems with the new file system. The problem has been fixed and the array is available again; you can start using it.

 

--

On Thursday, June 2, the disk arrays in Prague will be upgraded (capacity, redundancy, and speed increase); the arrays will need to be stopped for a short time during the upgrade.

If everything goes according to plan, short outages of the storage-vestec1 (= praha1) array can be expected. In the coming days, there should be a significant increase in available capacity.

We will try to minimize the impact on running jobs as much as possible.

At the same time, the quota for stored data will be increased from 0.5 TB to 2 TB, and the quota for the number of files to 2 million.

 

With apologies for the inconvenience and with thanks for your understanding.

Yours,

MetaCentrum

 


Ivana Křenková, 24. 5. 2022

23.5.2022 - Unplanned power failure in Brno

update 24. 5. 2022

All OpenStack services are now available after the unplanned power outage from 2022-05-22.

You may now start your VMs. If you experience any issues, please contact us at cloud@metacentrum.cz.

We apologize for any inconvenience.

 

 --

Dear user,

During the night of 22nd to 23rd May, there was an unplanned power failure in data centre A510 (FI MU Brno). The backup power supply did not come on.

Most of the systems in the datacenter are running again; problems persist in the MetaCentrum Cloud.

The outage also affects the zuphux.cerit-sc.cz frontend, some clusters and Rancher (Kubernetes), which run from the cloud.

 

We apologize for any inconvenience,

MetaCentrum team

 


Ivana Křenková, 23. 5. 2022

13.4.2022 12 AM - 8 PM - Scheduled outage of the MetaCentrum Cloud

Dear user of Cloud MetaCentrum,

On Wednesday, April 13, 2022, from 12:00 AM to 8:00 PM, a power outage is planned for part of the A510 datacenter. Thanks to the backup power supply, the outage should be uneventful and last 1-2 hours. We do not anticipate any issues, but during a full outage selected user VMs in OpenStack may be unavailable.

 

We apologize for any inconvenience,

MetaCentrum Cloud team

[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz


Ivana Křenková, 12. 4. 2022

7.-8.4.2022 - Scheduled outage of the MetaCentrum Cloud

Update:

The MetaCentrum OpenStack cloud [1] is experiencing an unplanned series of network outages after yesterday's reconfiguration of HW network elements. The estimated time when outages may still occur is Friday, April 8, 2022 from 8:00 AM to 8:00 PM.

This is an extension of the announced outage scheduled for April 7, 2022.

Thank you for your understanding,
MetaCenter Cloud Team

--

Dear user of Cloud MetaCentrum,

Let us inform you that planned networking maintenance of the Metacentrum OpenStack cloud [1] is scheduled on Thursday 2022-04-07 from 7:00 to 20:00 (CEST). We plan to improve network stability by upgrading the firmware of the cloud network switches and reconfiguring them. We expect OpenStack cloud API and UI functionality to be unaffected. Selected cloud hypervisors (and the cloud user VMs located on them) may suffer short networking outages.

 

We apologize for any inconvenience,

MetaCentrum Cloud team

[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz


Ivana Křenková, 6. 4. 2022

28.3.2022 - Scheduled outage storage-praha5-elixir disk array

On Monday, March 28, the storage-praha5-elixir disk array will be upgraded (capacity, redundancy, and speed increase; OS upgrade; IP address change). The storage will be temporarily shut down during the upgrade, and occasional unavailability can be expected during the day. We do not recommend using the array at that time.

Sorry for the inconvenience,
MetaCentrum
 


Ivana Křenková, 22. 3. 2022

4.3.2022 2-4 PM - Scheduled outage of the MetaCentrum Cloud

Dear user of Cloud MetaCentrum,

Let us inform you that a planned outage of the Metacentrum OpenStack cloud [1] is scheduled on Friday 2022-03-04 from 14:00 to 16:00 (CET). The planned cloud improvements are the migration of the core controller servers to another resource pool and production IPv6 address support.

We expect OpenStack cloud API and UI downtime will be up to 15 minutes. Users' running virtual servers will not be affected.

We apologize for any inconvenience,

MetaCentrum Cloud team

[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz


Ivana Křenková, 2. 3. 2022

26.1.2022 - HW upgrade of the following disk arrays: /storage/praha1/, /storage/vestec1-elixir/, and /storage/praha5-elixir/

On Wednesday, January 26, the disk arrays will be upgraded in Prague (capacity increase), during which it will be necessary to stop the arrays for a short time.

If everything goes according to plan, short outages of the storage-vestec1 (= praha1) array in the morning and storage-praha5-elixir in the afternoon can be expected. In the coming days, there should be a significant increase in available capacity.

We will try to minimize the impact on running jobs as much as possible.


With apologies for the inconvenience and with thanks for your understanding.

Yours,

MetaCentrum

 


Ivana Křenková, 25. 1. 2022

21.1.2022 - Cluster krux, zubat, lex, frontend perian and brno9-ceitec outage

Last night, there was a cooling failure in the CEITEC server room, where the krux, zubat and lex computing nodes are located. These clusters are temporarily down. They will be returned back to operation after the cooling fault has been rectified.

With apologies for the inconvenience and with thanks for your understanding. 

Yours,

MetaCentrum

 


Ivana Křenková, 21. 1. 2022

12.1.2022 - Scheduled outage of the MetaCentrum Cloud

Dear user of Cloud MetaCentrum,

let us inform you about the planned upgrade of the 'Cloud MetaCentrum' (OpenStack) cloud infrastructure, which is scheduled on 12.1.2022 (DD.MM.YYYY) from 9:00 to 16:00. This upgrade prepares the infrastructure for IPv6 support.

We don't expect any issues, but any feedback about problems during the upgrade is welcome.

We apologize for any inconvenience,

MetaCentrum Cloud team


Ivana Křenková, 10. 1. 2022

16.12.2021 - Cluster krux, zubat, lex, frontend perian and brno9-ceitec outage

On Thursday 16th, starting at 7:00 a.m., there will be a planned power outage in the CEITEC server room. Consequently, the clusters krux, zubat and lex, as well as the perian frontend and brno9-ceitec storage, will be down. The outage is planned to last until noon.

With apologies for the inconvenience and with thanks for your understanding.

Yours,

MetaCentrum

 


Ivana Křenková, 13. 12. 2021

1.-2.12.2021 - HW upgrade of the /storage/brno6/

 


From Wednesday, December 1 (6 PM) to Thursday, December 2 (12 AM), the old disk array /storage/brno6/ will be migrated to new hardware. Try to limit work on this disk array. Running processes that hold open files directly in /storage/brno6 may crash after the switchover.

  • During the synchronization, /storage/brno6/ will be fully accessible (RW), except during the final synchronization on the last day.
  • After copying is completed, the new disk array will be available via the same symlink as the old one; from the user's point of view, nothing changes:
/storage/brno6/
  • After the upgrade, the data will be physically located in the following storage (the name remains the same as in the past):
    storage-brno6.metacentrum.cz
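
For illustration only, the symlink mechanics described above can be sketched with throwaway local paths (the /tmp names below are made up; this is not the production layout):

```shell
# Illustrative only: retargeting a symlink changes the destination while the
# path users see stays the same. /tmp paths are made up for the demo.
mkdir -p /tmp/old_array /tmp/new_array
ln -sfn /tmp/old_array /tmp/storage-demo
readlink /tmp/storage-demo            # points at the old hardware
ln -sfn /tmp/new_array /tmp/storage-demo
readlink /tmp/storage-demo            # same path, now the new hardware
```

Because jobs and scripts reference /storage/brno6/ rather than the hardware name, retargeting the link is invisible to users.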
    

Influence on the running jobs:

  • Jobs that work with data stored on (or save data to) another disk array will not be affected.


With apologies for the inconvenience and with thanks for your understanding.

Yours,

MetaCentrum

 


Ivana Křenková, 30. 11. 2021

21.10.2021 - Scheduled outage of the MetaCentrum Cloud

Dear user of Cloud MetaCentrum,

Let us inform you about the planned outage of the API and dashboard components of the 'Cloud MetaCentrum' (OpenStack). This scheduled outage is due to a reverse proxy upgrade. It affects API and dashboard access to OpenStack; virtual machines should not be affected. The outage is scheduled on 21.10.2021 (DD.MM.YYYY) from 8:30 AM to 4:00 PM CEST (UTC+2:00).

We apologize for any inconvenience,

MetaCentrum Cloud team


Ivana Křenková, 14. 10. 2021

5.10.2021 - Unexpected outage of /storage/budejovice1/ and cluster hidlor

Dear users,
 

The disk array /storage/budejovice1/home/ and the cluster hidlor are temporarily unavailable due to an unplanned power failure.

We are trying to locate and correct the fault in cooperation with local administrators.

We apologize for any inconvenience caused.

 
MetaCentrum
 

 


Ivana Křenková, 5. 10. 2021

5.10.-7.10.2021 - Luna cluster, luna frontend and storage-praha6-fzu planned outage

Dear Metacentrum users,

due to a hardware upgrade, there will be a planned outage from Tuesday 5 October, 7 a.m., until Thursday 7 October, noon. The luna cluster, the luna frontend and storage-praha6-fzu will not be available during the outage.

We apologize for any inconvenience caused.

 
MetaCentrum
 

 


Ivana Křenková, 4. 10. 2021

27.8.2021 - Unexpected outage of /storage/budejovice1/

Dear users,
 

The disk array /storage/budejovice1/home/ is temporarily unavailable due to an unplanned network failure.

We are trying to locate and correct the fault in cooperation with local administrators. The storage itself is fully functional; the data just cannot be reached. We are unable to estimate the downtime at this time.

 

We apologize for any inconvenience caused.

 
MetaCentrum
 

 


Ivana Křenková, 26. 8. 2021

29.7.-1.8.2021 - HW upgrade of the /storage/brno2/

Updated July 30, 2021

Data has been transferred to the new HW; in case of problems, do not hesitate to contact us.

Quotas have been set for the number and size of files, by default 3 TB and 2 million files.
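
To see how close you are to these limits, something like the following can be run from a frontend (the path is illustrative; substitute your own home directory on brno2):

```shell
# Illustrative check of usage against the default quotas (3 TB, 2M files);
# the directory path is an example, substitute your own home on brno2.
cd /storage/brno2/home/$LOGNAME
du -sh .                  # total size of stored data
find . -type f | wc -l    # number of files
```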


 

--

From Thursday July 29 to Sunday August 1, the old disk array /storage/brno2/ will be migrated to new hardware. Due to the huge amount of data, we estimate that the final synchronization will take several days, so please be patient. Try to limit work on this disk array.

  • During the synchronization, /storage/brno2/ will be fully accessible (RW), except during the final synchronization on the last day.
  • After copying is completed, the new disk array will be available via the same symlink as the old one; from the user's point of view, nothing changes:
/storage/brno2/
  • After the upgrade, the data will be physically located in the following storage (the name remains the same as in the past):
    storage-brno2.metacentrum.cz
    

Influence on the running jobs:

  • Jobs that work with data stored on (or save data to) another disk array will not be affected.
  • Data written to /storage/brno2/ during the synchronization may remain untransferred on the original array, storage-brno6:~/../fsbrno2/home/$LOGNAME, and you will need to copy it over individually.

Backup policy reminder

Please note that large disk arrays are not fully backed up; only snapshots (stored on the same array) are kept. The data is therefore not protected in the event of a total failure of such a disk array (as in the case of brno6 last month). If you have data to archive, keep the primary copy elsewhere, or entrust the data to CESNET Data Care: https://du.cesnet.cz/.

List of storages: https://wiki.metacentrum.cz/wiki/NFS4_Servery

With apologies for the inconvenience and with thanks for your understanding.

Yours,

MetaCentrum

Ivana Křenková, 22. 7. 2021

22.-27.4.2021 - HW upgrade of the /storage/plzen1/

Update April 26, 2021 - the data has been transferred to the new disk array, but occasional stability problems with the new array have been reported. We are working intensively to solve them. Please be patient.

Please check whether your data on the new storage is complete. If not, you can copy it from the old storage, which has been renamed to storage-plzen1a.metacentrum.cz.

Please keep in mind that the storage servers cannot be operated interactively in a shell (see https://wiki.metacentrum.cz/wiki/Working_with_data#ssh_protocol). You can list the contents of your home directory with the command

ssh user_name@storage-plzen1a.metacentrum.cz ls

You can then fetch the data with

scp -r user_name@storage-plzen1a.metacentrum.cz:~/some_directory .

HW upgrade of the /storage/plzen1/

From Thursday 22 to Sunday 25 April, the old disk array storage-plzen1.metacentrum.cz (/storage/plzen1/), serving as /home for Pilsen's clusters, will be upgraded to new hardware. Due to the huge amount of data, we estimate that the final synchronization will take several days, so please be patient and try to limit work on this disk array.

  • During the synchronization, /storage/plzen1/ will be fully accessible (RW), except during the final synchronization on the last day.
  • During the upgrade, new jobs will not start on the alfrid, konos, ida, kirke, minos, and nympha clusters. Running jobs using /storage/plzen1/ will be terminated with the final data synchronization.
  • After copying is completed, the new disk array will be available under the same symlink as the old one; from the user's point of view, nothing changes:
/storage/plzen1/
  • After the upgrade, the data will be physically located in the following storage (the name remains the same as in the past):
    storage-plzen1.metacentrum.cz
    
  • The new storage has three times the capacity of the old one (1.1 PB), which, among other things, solves the problem of running out of space.
  • The new storage will serve as the /home for Pilsen's clusters.

Influence on the running jobs:

  • The jobs that work with the data saved on (or will save data to) another disk array will not be influenced.
  • The jobs that perform their computations within the scratch space, which check the success of copying-out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Beginners_guide#Run_batch_jobs), and which will try to save the resulting data into /storage/plzen1/ during the outage, will not be influenced as well -- you'll find the resulting data in the scratch of the relevant nodes.
  • Jobs that work directly with the data saved in /storage/plzen1/ (not recommended) will be terminated.

Backup policy reminder

Please note that large disk arrays are not backed up completely; only snapshots (stored on the same array) are taken. Therefore, the data is not protected in the event of a total failure of such a disk array (as in the case of brno6 last month). If you have any data to archive, keep the primary copy elsewhere, or entrust the data to the CESNET DataCare service: https://du.cesnet.cz/.

List of storages: https://wiki.metacentrum.cz/wiki/NFS4_Servery

With apologies for the inconvenience and with thanks for your understanding.

Yours,

MetaCentrum

Ivana Křenková, 15. 4. 2021

3. 2. - HW upgrade of the /storage/praha1/, /storage/praha6-fzu/, unavailability of adan, luna, and tarkil clusters

HW upgrade of the storage-praha1.metacentrum.cz

On Wednesday, February 3, the old storage array storage-praha1.metacentrum.cz (/storage/praha1/), serving as /home for Prague's clusters, will be upgraded to new hardware.

  • The data stored on the array may be inaccessible due to the migration to another storage; the clusters luna, tarkil, and adan will be switched off. Try to limit work on this disk array; data newly written during the outage may not be available on the new array. After the outage, it will be possible to transfer the data - please check yours.
  • The data will be physically placed on the storage storage-vestec1-elixir.metacentrum.cz, with the symlink /storage/praha1/
  • Further, the storage /storage/praha6-fzu will not be available during the HW upgrade.
  • The new storage will serve as the /home for Prague's clusters.
  • The old disk array will be temporarily accessible as storage-praha1.metacentrum.cz.

Influence on the running jobs:

  • The jobs that work with the data saved on (or will save data to) another disk array will not be influenced.
  • The jobs that perform their computations within the scratch space, which check the success of copying-out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Beginners_guide#Run_batch_jobs), and which will try to save the resulting data into /storage/praha1 during the outage, will not be influenced as well -- you'll find the resulting data in the scratch of the relevant nodes.
  • Jobs that work directly with the data saved in /storage/praha1/ (not recommended) will be terminated.

 

Backup policy

Please note that large disk arrays are not backed up completely; only snapshots (stored on the same array) are taken. Therefore, the data is not protected in the event of a total failure of such a disk array (as in the case of brno6 last month). If you have any data to archive, keep the primary copy elsewhere, or entrust the data to the CESNET DataCare service: https://du.cesnet.cz/.

List of storages: https://wiki.metacentrum.cz/wiki/NFS4_Servery

With apologies for the inconvenience and with thanks for your understanding.

Yours,

MetaCentrum

Ivana Křenková, 29. 1. 2021

5.-6.12.2020 - Planned electricity outages in Prague server room

Dear users,

let us inform you that on Saturday, December 5 and Sunday, December 6, there will be a planned outage in the Prague server room due to the repair of electrical wiring.

The tarkil cluster will be shut down for the duration of the repair. We will try to keep the /storage/praha1/ disk array in operation from a backup power source.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Yours,
MetaCentrum

Ivana Křenková, 24. 11. 2020

22.10.2020 - Unexpected network outage in Pilsen's and Ceske Budejovice server rooms

Dear users,

let us inform you that due to today's unexpected network outage in Pilsen and Ceske Budejovice, some frontends, clusters, and disk arrays might be unavailable. We are working on the repair.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum

Ivana Křenková, 22. 10. 2020

22-23.9.2020 - Unexpected power outage in the Prague ELIXIR-CZ server room

update 24. 09. 2020, 14:01: the cooling system outage still persists
 
Dear users,
 
Last night there was a cooling failure in the Prague ELIXIR server room; the clusters elmo1 and elmo2 and also the storage had to be switched off.

The cooling system is being serviced, so access should hopefully be possible again soon.

 

We apologize for any inconvenience caused.

 
MetaCentrum

Ivana Křenková, 23. 9. 2020

15.-16. 9. 2020 - Planned short network outages in Prague server room

On 15 and 16 September in the evening, the software on routers in Prague Dejvice will be upgraded.
The upgrade will result in approximately 30-minute network outages on the individual routers.

Tuesday, 15 September, from 22:00 to 01:00

- connection for cluster TARKIL - L2 connection to cluster ARUBA
- connection for cluster SKURUT FZU - global table - primary
- connection for cluster SKURUT FZU - L3 VPN LHCONE - backup

Wednesday, 16 September, from 20:00 to 23:00

- connection for cluster SKURUT - global table - backup
- connection for cluster SKURUT - L3 VPN LHCONE - primary
- connection for Elixir cluster at UOCHB
- connection for cluster at (luna, kalpa) FZU
- GEANT connection to LHCONE


We assume that the outage will occur about half an hour after the beginning of the time slot.

We apologize for the inconvenience.
Your MetaCentrum

Ivana Křenková, 11. 9. 2020

2-3. 8. 2020 - Unexpected outage of /storage/praha1/

Dear users,
 

The disk array /storage/praha1/home/ is temporarily unavailable due to an unplanned HW/SW failure.

The outage also affected the frontend tarkil, as well as the computing clusters with home directories on this disk array (adan, luna, kalpa, tarkil, ...).

We apologize for any inconvenience caused.

 
MetaCentrum

Ivana Křenková, 2. 8. 2020

16.7.2020 - Scheduled outage of the MetaCentrum Cloud

Dear user of Cloud MetaCentrum,

Let us inform you about the planned outage of the network overlay in
cloud 'Cloud MetaCentrum' (OpenStack). This scheduled outage is
necessary due to an upgrade of the network overlay which cannot be
performed without downtime. The outage is scheduled on 16.07.2020
(DD.MM.YYYY) in the time of 8:00 am - 12:00 pm CEST  (UTC+2:00).  During
the outage, you will not be able to access your machines, nor will your
machines be able to access the internet. The computation on your
machines should not be affected.

We apologize for any inconvenience,

MetaCentrum Cloud team


Ivana Křenková, 9. 7. 2020

27. 5. 2020 - Scheduled outage of the MetaCentrum Cloud

Dear user of MetaCentrum Cloud.

Due to an upgrade of MetaCentrum Cloud (OpenStack) from the Stein to the Train release, the OpenStack control plane will be unavailable on May 27th, 2020. The outage will start at 8:00 AM CET and will continue until 6:00 PM CET of the same day. During the upgrade, the OpenStack API (including the dashboard) will not be accessible. Virtual instances should remain accessible and working throughout the outage; however, it is not recommended to plan critical processes during that time.


Ivana Křenková, 14. 5. 2020

16. - 17. 5. 2020 - Planned outage of all luna worker nodes and storage at Prague Slovanka

We would like to inform you about a planned outage of all luna worker nodes during the weekend of May 16-17. The outage is due to a planned electricity shutdown at the Slovanka locality.

We are going to shut down all luna worker nodes on Saturday, May 16, at 6:00 in the morning. The luna worker nodes will be available again on Monday, May 18, in the morning.
The disk arrays /storage/praha4-fzu/home and /storage/praha6-fzu/home/ will be down as well.

Thank you for your understanding.

Best regards

MetaCentrum

Ivana Křenková, 11. 5. 2020

23. 4. 2020 - Unexpected outage of /storage/budejovice1/

Dear users,
 

The disk array storage-budejovice1.metacentrum.cz (/storage/budejovice1/home/) is temporarily unavailable due to an unplanned HW/SW failure which occurred during the night.

The outage also affected the frontend hildor, as well as the computing clusters with home directories on this disk array.

Influence on the running jobs:

  • The jobs that work with the data saved on (or will save data to) another disk array will not be influenced.
  • The jobs that perform their computations within the scratch space, which check the success of copying-out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Working_with_data/Working_with_data_in_a_job), and which will try to save the resulting data into /storage/budejovice1/ during the outage, will not be influenced as well -- you'll find the resulting data in the scratch of the relevant nodes.
  • Jobs that work directly with the data saved in /storage/budejovice1/ (not recommended) will be terminated.

 

 

We apologize for any inconvenience caused.

 
MetaCentrum

Ivana Křenková, 23. 4. 2020

19.2.2020 - Outage of disk arrays /storage/brno2 and /storage/brno6

Due to maintenance, the disk arrays /storage/brno2 and /storage/brno6 will be unavailable on 19. 2. between 1 and 2 PM. During the outage it will not be possible to log in to the skirit and perian frontends, and the PBS server meta-pbs.metacentrum.cz will not submit new jobs to Brno clusters.



We apologize for any inconvenience.

MetaCentrum

Ivana Křenková, 19. 2. 2020

11.2.2020 - Expected outage of cluster charon

Please note that on 11. 2., between 10 AM and 2 PM, there will be a planned outage of the computational node charon.nti.tul.cz.

We apologize for any inconvenience.

MetaCentrum

Ivana Křenková, 11. 2. 2020

12.2.2020 - Outage of PBS-server, PBSmon application and partial outage of OpenStack

Update: After noon, the network problem was resolved.

 

Repeated short failures of a university network segment in Brno are causing failures of the cerit-pbs PBS server, stale data in the PBSmon application, and partial outages of OpenStack.

We're working to fix the issue.



We apologize for any inconvenience.

MetaCentrum

Ivana Křenková, 11. 2. 2020

14.-16.1.2020 - Expected outage of the clusters carex, draba, and the /storage/pruhonice1-ibot/ disk array

Let us inform you about the scheduled outage of the clusters carex.ibot.cas.cz and draba.ibot.cas.cz and the /storage/pruhonice1-ibot/ disk array in Průhonice on January 14-16, due to a planned HW upgrade.


We apologize for any inconvenience.

MetaCentrum

Ivana Křenková, 7. 1. 2020

16.12.2019 - Expected outage of 'Cloud2 MetaCentrum' (OpenStack)

Dear MetaCentrum Cloud user,

let us inform you about the scheduled outage of MetaCentrum Cloud
(OpenStack) on December 16th (Monday) 2019 due to a major upgrade of the
OpenStack control plane (from Rocky version to Stein). The outage will
start at 7:00 AM (CET, UTC+1:00) and will continue until 6:00 PM of the
same day. During the time of upgrade, the OpenStack API (including
dashboard) will not be accessible. Virtual machines should be accessible
throughout the outage; however, it is not recommended to run critical
processes during that time.

Thank you for your patience.


We apologize for any inconvenience.

MetaCentrum & Cloud Team

Ivana Křenková, 3. 12. 2019

30. 10. 2019 3-4 PM - Unexpected outage in the UOCHB server room

Dear users,
On Wednesday, October 30, there was a complete power outage in the UOCHB hall (between 3 PM and 4 PM), affecting the clusters elmo1 and elmo2 and the disk array storage-praha5-elixir.metacentrum.cz (/storage/praha5-elixir/). The electricity supply was restored after less than an hour. Building management is working to determine the cause of the outage.

 

We apologize for any inconvenience caused.

 
MetaCentrum

Ivana Křenková, 30. 10. 2019

21. - 22. 10. 2019 - Unexpected outage of /storage/brno2/

Dear users,
 
On Monday, October 21 at 10:00 AM, due to an unplanned shutdown of one of the server rooms in Brno, located at FI MU, the clusters and disk array in this hall will be temporarily shut down.

• The storage-brno2.metacentrum.cz (/storage/brno2/) disk array will be temporarily unavailable; the data will be moved to another array. From 6 PM the data will be available again on the /storage/brno6/ disk array, under the original symlink /storage/brno2/. We plan to keep the frontends accessible at all times from at least one of the copies, but in the meantime the latest data will not yet be in the new location, so please do not use /storage/brno2/ until the event ends.
 

Influence on the running jobs:

  • The jobs that work with the data saved on (or will save data to) another disk array will not be influenced.
  • The jobs that perform their computations within the scratch space, which check the success of copying-out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Working_with_data/Working_with_data_in_a_job), and which will try to save the resulting data into /storage/brno2/ during the outage, will not be influenced as well -- you'll find the resulting data in the scratch of the relevant nodes.
  • Jobs that work directly with the data saved in /storage/brno2/ (not recommended) will be terminated.

 

  • Running frontend sessions that access /storage/brno2/ will be terminated, just like jobs running inside /storage/brno2/.
  • Next, there will be an approximately 15-minute network outage due to moving a router to another location.
 

We apologize for any inconvenience caused.

 
MetaCentrum

Ivana Křenková, 18. 10. 2019

4.9.2019 7 AM-12 noon - Expected outage of 'Cloud2 MetaCentrum' (OpenStack)

Dear user of Cloud2 MetaCentrum,

Let us inform you about the planned outage of the network overlay in the cloud 'Cloud2 MetaCentrum' (OpenStack). This scheduled outage is necessary due to an upgrade of the network overlay which cannot be performed without downtime. The outage is scheduled on 4. 9. 2019 from 7:00 AM to 12:00 noon CEST (UTC+2:00).

During the outage, you will not be able to access your machines, nor will your machines be able to access the internet. The computation on your machines should not be affected.



We apologize for any inconvenience.

MetaCentrum & Cloud Team

Ivana Křenková, 29. 8. 2019

21.8.2019 7-10 AM - Expected outage of 'Cloud2 MetaCentrum' (OpenStack)

Dear user of Cloud2 MetaCentrum,

Let us inform you about the planned outage of the network overlay in cloud 'Cloud2 MetaCentrum' (OpenStack). This scheduled outage is necessary due to an upgrade of the network overlay which cannot be performed without downtime. The outage is scheduled on 21.08.2019 (DD.MM.YYYY) in the time of 7:00 am - 10:00 am CEST (UTC+2:00).

During the outage, you will not be able to access your machines, nor will your machines be able to access the internet. The computation on your machines should not be affected.



We apologize for any inconvenience.

MetaCentrum & Cloud Team

Ivana Křenková, 13. 8. 2019

17.7.2019 5-7 AM - Expected outage of du2.cesnet.cz (/storage/jihlava2-archive/)

Let us inform you that due to a planned revision of the central diesel generator in Jihlava's server room, du2.cesnet.cz (/storage/jihlava2-archive/) and the Ceph object storage will be temporarily unavailable on Wednesday, 17 July, between 5 and 7 AM.

We apologize for any inconvenience caused.
MetaCentrum

Ivana Křenková, 16. 7. 2019

20.6.2019 - Unexpected outage in Brno's server room (most clusters and /storage, old MetaCloud)

Dear users,

let us inform you that due to today's unexpected network outage in Brno's server room, some of Brno's clusters and disk arrays are unavailable. We are working on the repair.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum

Ivana Křenková, 20. 6. 2019

26.4.2019 - Unexpected outage in CERIT-SC's server room in Brno (most clusters and /storage/brno3-cerit/)

Dear users,

let us inform you that due to today's unexpected cooling outage (early this morning) in Brno's server room, some CERIT-SC clusters and disk arrays are unavailable. We are working on the repair.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum

Ivana Křenková, 26. 4. 2019

10.4.2019 - Unexpected failure of du2.cesnet.cz (/storage/jihlava2-archive/)

Dear users,
we are currently facing a power failure in Jihlava. Therefore, du2.cesnet.cz (/storage/jihlava2-archive/) is not available.

 

We apologize for any inconvenience caused.

 
MetaCentrum

Ivana Křenková, 10. 4. 2019

12.3.2019 - Unexpected outage in CERIT-SC's server room in Brno (cluster zefron, uv and /storage)

Dear users,

let us inform you that due to today's unexpected power or network outage (2 PM) in Brno's server room, some CERIT-SC clusters and disk arrays are unavailable. We are working on the repair.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum

Ivana Křenková, 12. 3. 2019

8.3.2019 10-11 AM - Planned 10-minute network outage in Prague FZU

Dear users,

let us inform you that due to a planned central switch firmware upgrade in Prague's server room, the local clusters luna and kalpa and the disk array /storage/praha4-fzu/home will be unavailable for approximately 10 minutes on Friday, 8 March, between 10 and 11 AM.

We apologize for any inconvenience caused.
MetaCentrum

Ivana Křenková, 7. 3. 2019

20.2.2019 9 AM-9 PM - Planned outage in Prague FZU (clusters luna and kalpa)

Dear users,

let us inform you that due to a planned network connectivity upgrade in Prague's server room, the local clusters luna and kalpa and the disk array /storage/praha4-fzu/home will be unavailable on Wednesday, 20 February.

We apologize for any inconvenience caused.
MetaCentrum

Ivana Křenková, 15. 2. 2019

28.1.2019 - Unexpected failure of /storage/praha1/ file system

Dear VO MetaCloud users,
 
we are currently facing a problem with the /storage/praha1/ file system. Unfortunately, some machines with /home on this storage (luna, tarkil) are not working properly. We apologize for any inconvenience caused.
Ivana Křenková
MetaCentrum

Ivana Křenková, 28. 1. 2019

9.-11.1. - Decommission of the /storage/brno7-cerit/, recovery of the /storage/brno6/

Decommission of the storage-brno7-cerit.metacentrum.cz

On Wednesday, January 9, the old storage array storage-brno7-cerit.metacentrum.cz (/storage/brno7-cerit/) will be shut down.

  • From January 9 to 11, the data stored on this array will not be accessible due to the migration to another storage.
  • From Friday (January 11), the data will be physically placed on storage-brno1-cerit.metacentrum.cz and will be accessible via the symlink /storage/brno7-cerit/.
  • The relocation also applies to the fishery project directory, which will be accessible from January 11 by the existing symlink.

 

Influence on the running jobs:

  • The jobs that work with the data saved on (or will save data to) another disk array will not be influenced.
  • The jobs that perform their computations within the scratch space, which check the success of copying-out the resulting data (e.g., using the script skeleton available at https://wiki.metacentrum.cz/wiki/Working_with_data/Working_with_data_in_a_job), and which will try to save the resulting data into /storage/brno7-cerit/ during the outage, will not be influenced as well -- you'll find the resulting data in the scratch of the relevant nodes.
  • Jobs that work directly with the data saved in /storage/brno7-cerit/ (not recommended) will be terminated.

 

 

Recovery of the storage-brno6.metacentrum.cz

The storage array storage-brno6.metacentrum.cz (/storage/brno6/) has been back in operation since Friday, January 4.

The failure of the disk array was very serious. Fortunately, much of the data was saved, but a small part (primarily data being manipulated at the time of the malfunction) may be lost or damaged.

Please check your data stored in the /storage/brno6/ file system.

 

Backup policy

Please note that large disk arrays are not backed up completely; only snapshots (stored on the same array) are taken. Therefore, the data is not protected in the event of a total failure of such a disk array (as in the case of brno6 last month). If you have any data to archive, keep the primary copy elsewhere, or entrust the data to the CESNET DataCare service: https://du.cesnet.cz/.

 

With apologies for the inconvenience and with thanks for your understanding.

Yours,

MetaCentrum

Ivana Křenková, 6. 1. 2019

12. - 13. 12. 2018 - Unexpected power outage in Prague FZU (clusters luna and kalpa)

Dear users,

let us inform you that due to today's unexpected power outage in Prague's server room, the local clusters luna and kalpa and the disk array /storage/praha4-fzu/home are unavailable. The vendor is working on the repair.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum

Ivana Křenková, 12. 12. 2018

10.12.2018 - Data transfer /storage/brno6/ --> /storage/brno1/

Due to a repeated HW failure of /storage/brno6/, the data was moved to another storage, /storage/brno1/, with the symlink /storage/brno6/ unchanged.

The defective storage is being repaired by the vendor (replacement of the controller). Once repaired, the data will be returned to the original location.

 

We apologize for any inconvenience caused.
 
Yours,
MetaCentrum

Ivana Křenková, 10. 12. 2018

26-27.11.2018 - Unexpected failure of /storage/brno6 file system

Unfortunately, we are again facing a HW problem with the /storage/brno6/home file system. We are evaluating the severity of the situation and working on the repair, trying to minimize the consequences. User data is currently unavailable.
 
Update, November 26, 3 PM: the MetaCloud web page (OpenNebula, https://cloud.metacentrum.cz/) has been suspended due to data recovery.
We apologize for any inconvenience caused.
 
Yours,
MetaCentrum

Ivana Křenková, 26. 11. 2018

23. 11. 3 - 4 PM - Disk array /storage/brno11-elixir/ planned HW upgrade

Let us inform you that on Friday, November 23, /storage/brno11-elixir/ (storage-brno11-elixir.metacentrum.cz) will be unavailable for 10 minutes (between 3 and 4 PM) due to a HW upgrade.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

Ivana Křenková, 23. 11. 2018

from 19.11.2018 - Unexpected failure of /storage/brno6 file system

Dear VO MetaCloud users,
 
we are currently facing a HW problem with the /storage/brno6/ file system.
For this reason, the MetaCloud web page (OpenNebula, https://cloud.metacentrum.cz/) is not working either. Update: back in operation since Nov 21.
The problem with access to /storage/brno6/home/ persists.

We apologize for any inconvenience caused.

Ivana Křenková
MetaCentrum
 

Ivana Křenková, 19. 11. 2018

26.-28.10.2018 - Data migration onto new data storage

Dear CESNET MetaCentrum and Storage facility user,

We would like to inform you that the hierarchical storage in Pilsen (du1.cesnet.cz, /storage/plzen2-archive in MetaCentrum) will be permanently decommissioned.

If you have no data in this storage facility, this mail is not relevant for you. All your data from plzen2-archive will be transferred by storage administrators to a new storage facility.

This e-mail is to inform you about the plan and the schedule.

Data in Pilsen will be made permanently inaccessible for the users during the evening of 26th October. We'll start final synchronisation of recent changes to Ostrava storage facility, i.e., du4.cesnet.cz, /storage/du-cesnet in MetaCentrum (note the change in naming convention). The data will be inaccessible during the transfer period. We expect to make the data available in the new location in Ostrava again in the evening of 28th October. The data will be permanently available in Ostrava since then.

Kindly note new Data Storage Terms of Service (ToS) and the changes they introduce. Policies for archival (long-term) data and temporary backups have been distinguished. You can find full text of the ToS on https://du.cesnet.cz/en/provozni_pravidla/start, and we also have a short description of most important changes on https://du.cesnet.cz/en/navody/faq/start#handling_archives_and_backups. Both archive as well as backup policies are available to MetaCentrum users.

Data from Pilsen is considered an archive and it is handled as such.

If you have any questions or need any kind of help, please contact our user support (by replying to this mail and/or on support@cesnet.cz).

Thank you for your cooperation.

With kind regards,

Your CESNET MetaCentrum and Data Storage team


Ivana Křenková, 24. 10. 2018

13.9.2018 - Unexpected failure of /storage/brno2 file system

Dear VO MetaCloud users,
 
we are currently facing a problem with the /storage/brno2/ file system. Unfortunately, some machines with /home on this storage are not working properly.
In the meantime, please use machines in other localities or the CERIT-SC machines in Brno (PBS server wagap-pro, frontend zuphux.cerit-sc.cz).

We apologize for any inconvenience caused.

Ivana Křenková
MetaCentrum & CERIT-SC

Ivana Křenková, 13. 9. 2018

21.-23.5.2018 - Expected restart of virtual machines in MetaCloud due to security update

Dear VO MetaCloud users,
 
due to planned maintenance and security updates on physical machines, the virtual machines dukan1.ics.muni.cz - dukan26.ics.muni.cz and gorbag.ics.muni.cz
will be continuously restarted in the first half of next week. Information about the affected machines will be available in OpenNebula (https://cloud.metacentrum.cz/) in the Info section of each virtual machine.

We apologize for any inconvenience caused.

Ivana Křenková
MetaCentrum & CERIT-SC

Ivana Křenková, 17. 5. 2018

12.2.2018 till 11 AM - Unexpected failure of AFS file system

Update 2018-02-12, 11 AM: AFS is working properly again

An AFS server crash occurred this weekend, also causing unexpected problems in other parts of the AFS subsystem. As a result of these failures, some volumes on AFS are unavailable (so SW modules are unavailable as well), and it is not possible to log in to some computational nodes and frontends. We are working on the repair.

We apologize for any inconvenience caused.

Ivana Křenková
MetaCentrum & CERIT-SC

Ivana Křenková, 12. 2. 2018

5.2.2018 - Unplanned network connectivity outage in Brno

Due to a network connectivity failure at the Brno location, the services requiring a network connection that are hosted in Brno (MetaCloud, Brno machines, ...) are unavailable. We are working on the remedy.
With apologies for the inconvenience and with thanks for your understanding.

MetaCentrum


Ivana Křenková, 5. 2. 2018

from Jan 8 - Response to the processor security flaws known as Meltdown and Spectre

Dear users,

MetaCentrum administrators are tracking the situation around the recently disclosed processor vulnerabilities (known as Meltdown and Spectre; for more information see https://spectreattack.com/).
We are evaluating the real impact of these vulnerabilities on the infrastructure. We have applied the available updates in the VMWare and MetaCloud environments. For the computational nodes, we monitor the available updates and evaluate their impact on the MetaCentrum environment (they are being tested for performance regressions). The computing nodes are being updated gradually. If the situation requires it, we may force an immediate restart of computing resources and stop all active jobs. Especially for upcoming long jobs, please consider postponing their execution to a later time, particularly if your jobs cannot be restarted.

We apologize for any inconvenience caused.

MetaCentrum


Ivana Křenková, 9. 1. 2018

from 31.12.2017 - Unexpected power outage in Prague FZU (cluster luna, kalpa)

Dear users,

let us inform you that due to today's unexpected power outage in Prague's server room, the local clusters luna and kalpa are unavailable.
The vendor is working on the repair; the length of the outage cannot be estimated.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum


Ivana Křenková, 2. 1. 2018

7.12.2017 - Disk array /storage/budejovice1/ planned HW upgrade

Let us inform you that on Thursday, December 7, /storage/budejovice1/ (storage-budejovice1.metacentrum.cz) will be moved to new hardware and will be unavailable for several hours during the final synchronization. The shared disk space at hildor*:/scratch.shared, mounted from this storage, will not be available either.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

Ivana Křenková, 5. 12. 2017

28.11.2017 - PBS Pro bug in new version

Dear users,

Due to a bug in the new version of PBS Pro, the walltime of almost all running jobs was reset. PBS Pro could not recognize the CPU usage, significantly overestimated the CPU time, and jobs ended unexpectedly. We have reported the error to the PBS Pro developers and returned the PBS Pro server to the previous version.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, 28. 11. 2017

6.10.2017 (7-10 AM) - Power outage in JU's server room

Dear users,

Let us inform you that due to a planned power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid and the disk array /storage/budejovice1/ will be temporarily unavailable on Friday October 6 (7-10 AM). Unfortunately, all running jobs will be terminated. Please copy the data you will need for your calculations during these few days to another disk array.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, 5. 10. 2017

25. 7. 2017 - MetaCloud: firmware actualization on dukan 19-25 machines

Dear users,

Given a pressing need to update the firmware of cloud nodes dukan19 through dukan25, we will have to briefly power off the virtual machines running on those nodes. The intervention is scheduled for Tuesday 25 July. Each node, and hence each collocated virtual machine, will be powered off for approximately 20 minutes. We will boot the virtual machines afterwards. There will be no data loss. Affected users have been notified by e-mail.

With apologies for the inconvenience and with thanks for your understanding,

MetaCloud team


Ivana Krenkova, 25. 7. 2017

5. 6. 2017 - MetaCloud: migration of virtual machines running on dukan1-10

Dear users,

On Monday 5th June we are going to migrate virtual machines away from nodes dukan1-10. Affected machines will be temporarily powered off. There will be no data loss. Machines with private network addresses (currently in the range 10.4.0.*) require special treatment: given the current configuration of our network, their private IP addresses will have to change. Please look up the new IP addresses of your virtual machines through the MetaCloud interface after that date. Affected users have already been notified by e-mail.

With apologies for the inconvenience and with thanks for your understanding,
MetaCloud team

 

 

 

 


Ivana Krenkova, 29. 5. 2017

4.6.2017 (7:45-10 AM) - Power outage in JU's server room

Dear users,

Let us inform you that due to a planned power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid and the disk array /storage/budejovice1/ will be temporarily unavailable on Sunday June 4 (7:45-10 AM). Unfortunately, all running jobs will be terminated. Please copy the data you will need for your calculations during these few days to another disk array.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, 17. 5. 2017

11.5.2017 - OS upgrade on the Zuphux frontend (Centos 7.3) + PBS Pro setting as the default environment in CERIT-SC

On May 11th, the zuphux server will be restarted with a new OS version (CentOS 7.3).

At the same time, the scheduling system in the Torque environment (@wagap) will no longer accept new jobs. Existing jobs will finish on the remaining nodes. The remaining computational nodes in the Torque environment will be gradually converted to PBS Pro. Machines currently available in the PBS Pro environment are labeled "Pro" in the PBSMon application: https://metavo.metacentrum.cz/pbsmon2/nodes/physical .

The frontend zuphux.cerit-sc.cz will default to the PBS Pro (@wagap-pro) environment.

With apologies for the inconvenience and with thanks for your understanding.

CERIT-SC users support


Ivana Křenková, 10. 5. 2017

7.4.2017 4 PM-0 AM - Zuphux frontend and @wagap, @wagap-pro outage

On Friday April 7, from 15:45, the frontend zuphux will be temporarily unavailable due to an unplanned emergency service of critical disk array controllers. The estimated length of the outage is 2 hours. Other frontends can be used during the outage:
https://wiki.metacentrum.cz/wiki/Frontend

Other services running from the affected disk array (Torque server @wagap and PBS Pro server @wagap-pro) will be migrated to another server on Thursday evening, with some very short outages on Thursday and Friday evenings.

With apologies for the inconvenience and with thanks for your understanding.

CERIT-SC support


Ivana Křenková, 6. 4. 2017

10.3.2017 - Outage of archival storage in Brno /storage/brno4-cerit-hsm/

Dear users,

after the upgrade of the HSM storage-brno4-cerit-hsm.metacentrum.cz (the upgrade was carried out by the vendor on February 14-15), an unexpected error occurred and the HSM is only partially available. The vendor is working on the repair; the length of the outage cannot be estimated.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, 10. 3. 2017

24.2.2017 from 4 AM - Unplanned outage in Pilsen

Today (around 4 AM) there was a failure of the water cooling system in Pilsen, which affected all Pilsen computing nodes, frontends, and /storage/plzen1/. The machines are back in operation (nevertheless, some related service work is still ongoing).

We apologize for any inconvenience caused.

Ivana Křenková,
MetaCentrum

 


Tom Rebok, 24. 2. 2017

from 19.2.2017 - Outage of archival storage in Brno /storage/brno4-cerit-hsm/

Dear users,

after the upgrade of the HSM storage-brno4-cerit-hsm.metacentrum.cz (the upgrade was carried out by the vendor on February 14-15), an unexpected error occurred and the HSM is now unavailable. The vendor is working on the repair; the length of the outage cannot be estimated.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, 20. 2. 2017

14.-15.2.2017 - Planned system update of archival storage in Brno /storage/brno4-cerit-hsm/

Dear users,

Let us inform you that from Wednesday February 14 (9 AM) to February 15 (6 PM), Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a security update of the system.

IMPORTANT: The HSM still hosts data from Jihlava /storage/jihlava1-cerit/


Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, 7. 2. 2017

23.1.2017 - Disk array /storage/praha1/ planned HW upgrade

Let us inform you that on Monday January 23, Prague's /storage/praha1/ (storage-praha1.metacentrum.cz) will be moved to new hardware and will be unavailable for several hours during the final synchronization. The shared disk space at *:/scratch.shared, mounted from this storage, will also be unavailable.

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Křenková, 9. 1. 2017

11. 1. 2017 - Planned MetaCloud upgrade

Dear users,

the OpenNebula upgrade announced earlier will take place on 11 January. At that time, the front-end will be unavailable for some time, and virtual machines running in the dukan.ics.muni.cz cluster will be restarted as we update the nodes.

Please be aware that there may be issues especially with older virtual machines instantiated with the previous OpenNebula version (2015 and earlier). Please contact us (cloud@metacentrum.cz) in case of trouble.

With apologies for the inconvenience and with thanks for your understanding,
MetaCloud team

 

 

 

 


Ivana Krenkova, 9. 1. 2017

15.12.2016 (11 PM-2 AM) - Planned outage of Torque server @wagap

Dear users,

Let us inform you that on Thursday (Dec 15, 11 PM - 2 AM) the Torque server wagap.cerit-sc.cz will be temporarily unavailable due to a SW upgrade. Submitting new jobs and manipulating jobs in the system will not be possible during the outage.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, 15. 12. 2016

8.12.2016 - Power outage in JU's server room

Dear users,

Let us inform you that due to an unexpected power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid are temporarily unavailable. Unfortunately, all running jobs have been terminated.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, 8. 12. 2016

from 1.11.2016 - tarkil frontend planned outage

Let us inform you that the tarkil.cesnet.cz frontend is unavailable due to a migration to another HW. All running processes on the frontend were terminated.

You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,

MetaCentrum


Ivana Křenková, 1. 11. 2016

27.10.2016 from 10 PM - /storage/brno3-cerit/ planned HW upgrade

Let us inform you that on Thursday October 27 (10 PM), Brno's /storage/brno3-cerit/ (storage-brno3-cerit.metacentrum.cz) will be moved to new hardware.

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC


Ivana Křenková, 25. 10. 2016

30.8.2016 from 10 PM - Zuphux frontend planned outage

Let us inform you that on Tuesday (August 30, 10 PM - midnight) the zuphux frontend will be briefly unavailable due to a migration to other hardware. All running processes on the frontend will be terminated during the outage.

You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,

MetaCentrum


Ivana Křenková, 24. 8. 2016

25.-29.7.2016 - Planned service maintenance of clusters and disk array in Ceske Budejovice

Dear users,

Let us inform you that from July 25 to 29, the hildor, haldir, and hagrid clusters and the disk array /storage/budejovice1/ will be temporarily unavailable due to a move to another server room. Please copy the data you will need for your calculations during these few days to another disk array.

With many thanks for understanding,

Ivana Krenkova
MetaCentrum


 

 


Ivana Křenková, 24. 6. 2016

27.4.2016 10 PM - Power outage in UK's server room

Dear users,

Let us inform you that due to a planned power outage in UK's Karolina server room, the local servers eru1, eru2, acharon and the AFS servers asterix, obelix, sal will be temporarily unavailable tomorrow (April 27), 10-11 PM.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, 26. 4. 2016

21.4.2016 from 10:30 PM - Planned MetaCloud upgrade

Dear users,

CERIT-SC's resources in the OpenNebula MetaCloud (phys. nodes hda*) will be under maintenance this Thursday, 21st April, from 10:30 PM. Your virtual machine(s) will only be paused (you won't lose your running state) and resumed one by one. An optimistic estimate is that each VM shouldn't be down for more than 30 minutes. The whole maintenance can take up to 2 hours.

 

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 

 

 


Ivana Krenkova, 19. 4. 2016

18.4.2016 7-15:00 - Planned power outage in Brno UKB

Dear users,

let us inform you that due to a planned power outage in Brno's server room at UKB, the local clusters lex, krux, zubat and the disk arrays brno9-ceitec + brno10-ceitec-hsm will be temporarily unavailable.

We apologize for any inconvenience caused.

 
Ivana Krenkova
MetaCentrum

 


Ivana Krenkova, 11. 4. 2016

18.4.2016 7-15:00 - Unplanned air conditioning outage in Brno CERIT-SC

Dear users,

let us inform you that due to an unexpected air conditioning outage in Brno's CERIT-SC server room this morning, part of the local clusters zigur, zapat, and zebra has been switched off as a precaution against overheating. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs on the affected nodes have been terminated.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum & CERIT-SC

 


Ivana Krenkova, 11. 4. 2016

7.4.2016 - Power outage in JU's server room

Dear users,

Let us inform you that due to an unexpected power outage, the clusters hermes/hildor/haldir are temporarily unavailable.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, 7. 4. 2016

1.3.2016 - PBS server (sendmail) problem today

Dear users,

Let us inform you that during last night the sendmail of the PBS server sent outdated error reports about terminated jobs via e-mail.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, 1. 3. 2016

2.3.-3.3.2016 - Planned system update of archival storage in Brno /storage/brno4-cerit-hsm/

Dear users,

Let us inform you that from Wednesday March 2 (9 AM) to March 3 (6 PM), Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a security update of the system.

*****************************************
IMPORTANT:
The HSM hosts data from Jihlava /storage/jihlava1-cerit/
*****************************************

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, 23. 2. 2016

23.2.2016 10-11AM - Planned service maintenance of /storage/brno6/

Dear users,

Let us inform you that on Tuesday, February 23 the Brno's /storage/brno6/ will be unavailable due to battery replacement by the supplier.

Influence on the running jobs:

Moreover, the user interface (Sunstone) as well as the programming interface (API) for MetaCloud will be unavailable for several hours. Existing virtual machines will not be affected! It will be, however, impossible to create new ones or manage existing ones during the outage.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková
MetaCentrum & CERIT-SC

 

 

 


Ivana Krenkova, 16. 2. 2016

12.2.2016 8AM - Hadoop cluster planned outage

Dear users,

Let us inform you that on Friday (February 12, 8:00 AM) the Hadoop cluster will be briefly unavailable due to a SW upgrade.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 


Ivana Krenkova, 11. 2. 2016

4.2.2016 11AM - Hadoop cluster planned outage

Dear users,

Let us inform you that on Thursday (February 4, 11:00 AM) the Hadoop cluster will be briefly unavailable due to a certificate change, machine reboots, and preparation of the new experimental cluster based on containers.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 


Ivana Krenkova, 3. 2. 2016

25.7.2016 10:00 AM - Hadoop cluster planned outage

Dear users,

Let us inform you that on Monday (July 25, 10:00 AM) the Hadoop cluster will be unavailable due to an upgrade from CDH 5.5.1 to 5.8.0 (with Hadoop 2.6.0 and Spark 1.6.0) and a Java environment upgrade.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 


Ivana Krenkova, 3. 2. 2016

11.2.2016 - Planned MetaCloud upgrade

Dear users,

A long-planned upgrade of the OpenNebula cloud manager will take place on 11 February. The user interface (Sunstone) as well as the programming interface (API) of MetaCloud will be unavailable for several hours. Existing virtual machines will not be affected! It will, however, be impossible to create new ones or manage existing ones during the outage.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 

 

 


Ivana Krenkova, 28. 1. 2016

23.-24. 1.2016 - Planned network upgrade in FZU AVCR in Prague

Dear users,

let us inform you that due to a planned upgrade of the network connection at the Institute of Physics of the Czech Academy of Sciences in Prague, the local clusters kalpa and luna + disk array /storage/praha4-fzu/ will be temporarily unavailable over the weekend, 23-24 January.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 

 


Ivana Krenkova, 21. 1. 2016

3.12.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat)

Dear users,

let us inform you that due to today's unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Krenkova
MetaCentrum & CERIT-SC.


Ivana Krenkova, 3. 12. 2015

21.10.2015 16:30 - Unexpected power outage in Brno UKB (cluster perian)

Dear users,

let us inform you that due to an unexpected power outage in Brno's server room at UKB, the local cluster Perian was temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum & CERIT-SC

 


Ivana Krenkova, 21. 10. 2015

14.10.2015 5-11 PM - Kerberos service outage

Dear users,

Let us inform you that yesterday evening (17-23 hrs.), due to a corruption of the database of the KDC server that operates Kerberos, some database records were temporarily unavailable. Unfortunately, this caused problems with operations requiring Kerberos (typically saving data from running jobs to a /storage, etc.).

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 

 

 


Ivana Krenkova, 15. 10. 2015

9.10.2015 - MetaCloud outage

Dear users,

Let us inform you that the MetaCloud front-end is unavailable due to a HW fault in its storage array. Virtual machines created beforehand are still operational, but new ones cannot be instantiated and you also cannot manage existing machines through the cloud management interface (OpenNebula). Thank you for your patience.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 

 

 


Ivana Krenkova, 9. 10. 2015

8.-9.10.2015 - Planned migration of /storage/plzen1/ to new hardware and GALAXY portal outage

Dear users,

Let us inform you that from October 8 to 9, Pilsen's /storage/plzen1/ will be unavailable due to a move to new hardware.

*****************************************
IMPORTANT
The GALAXY portal, hosted on this storage, will be unavailable during the outage.
*****************************************

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 

 

 


Ivana Krenkova, 7. 10. 2015

18.8.-18.10.2015 - Planned service maintenance of the zigur and zapat clusters and disk array /storage/jihlava1-cerit/

Due to HW problems (being resolved with the original supplier), the zigur and zapat clusters will become available one month later, in the second half of October.

With many thanks for understanding.

--

Dear users,

From August 18, due to the move to Brno, the zigur and zapat clusters and the disk array /storage/jihlava1-cerit/ will be temporarily unavailable.

The clusters are covered by a maintenance contract, therefore the move will be carried out by the original supplier; it is expected to take approximately one month (144 cluster nodes plus the disk array).

Influence on the running jobs:


With many thanks for understanding,

Ivana Krenkova
MetaCentrum & CERIT-SC
 

 


Ivana Křenková, 1. 10. 2015

21.9.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)

Dear users,

let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Krenkova
MetaCentrum & CERIT-SC.


Ivana Krenkova, 21. 9. 2015

22.-23.9.2015 - Planned system update of archival storage in Brno

Dear users,

Let us inform you that from Tuesday September 22 (10 AM) to Wednesday September 23, Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a system update.

*****************************************
IMPORTANT
The HSM hosts data from Jihlava /storage/jihlava1-cerit/ and the older /storage/brno1/. We strongly recommend that you transfer all data used in your jobs to another storage (for example /storage/brno6). In case you need any data from these archival storages during the outage, please inform us in advance via e-mail at meta@cesnet.cz.
*****************************************

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, 16. 9. 2015

18.9.2015 - ? - Outage of archival storage in Brno

Dear users,

Let us inform you that from September 18, Brno's /storage/brno4-cerit-hsm/ is unavailable due to a SW failure of the HSM system. Major software patches (bug fixes) will be applied by the system vendor.

IMPORTANT: The HSM hosts data from Jihlava /storage/jihlava1-cerit/ and older /storage/brno1/ (/storage/home)

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, 16. 9. 2015

29.8.2015 - Power outage in Prague (frontend and cluster tarkil + /storage/praha1)

Dear users,

let us inform you that due to an unexpected power outage in Prague's server room, the frontend and the local clusters Tarkil and Mudrc, as well as /storage/praha1, are temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.

We apologize for any inconvenience caused.

Ivana Krenkova

MetaCentrum

 

 


Ivana Krenkova, 29. 8. 2015

24.-31.8.2015 - Planned service maintenance of the doom cluster and disk array /storage/ostrava1/

Dear users,

Let us inform you that due to a power outage in Ostrava's server room today, the local cluster Doom, as well as /storage/ostrava1/, are temporarily unavailable. The computing nodes will be gradually returned to normal operation later today.

From August 24 to 31, due to the move to Brno, the doom cluster and disk array /storage/ostrava1/ will be temporarily unavailable. Please copy the data you will need for your calculations during these few days to another disk storage.

 

With many thanks for understanding,

Ivana Krenkova
MetaCentrum


 

 


Ivana Křenková, 11. 8. 2015

22.6.2015 10-11 AM - Skirit frontend planned outage

Let us inform you that on Monday, June 22 (10 AM), the skirit frontend will be briefly unavailable due to an upgrade. All running processes on the frontend will be terminated during the outage.

You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,
MetaCentrum

Ivana Křenková, 19. 6. 2015

16.6.2015 10-12 AM - Planned power outage in Prague (frontend and cluster tarkil + /storage/praha1)

Dear users,

let us inform you that due to a planned outage of the network connection, the frontend tarkil, cluster tarkil, and disk array /storage/praha1/ will be temporarily unavailable. Jobs running on the affected cluster or using /storage/praha1/ will be temporarily suspended. Shortly before (and of course also during) the outage it will not be possible to start a new job on the affected cluster.

Please terminate all interactive jobs running from the tarkil frontend by Tuesday morning. All running processes on the frontend will be terminated during the outage.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 

 


Ivana Krenkova, 12. 6. 2015

25.6.2015 10AM - Hadoop cluster planned outage

Dear users,

Let us inform you that on Thursday (June 25, 10:00 AM) the Hadoop cluster will be briefly unavailable due to HW maintenance - replacement of the CMOS battery in the hador-c1.ics.muni.cz server.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 

 


Ivana Krenkova, 12. 6. 2015

18.5.2015 10-12 PM - Skirit frontend planned outage

Let us inform you that on Monday, May 18, the skirit frontend will be briefly unavailable due to an upgrade. All running processes on the frontend will be terminated during the outage.

You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,

MetaCentrum


Ivana Křenková, 14. 5. 2015

31.3.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)

Dear users,

let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum & CERIT-SC

 


Ivana Krenkova, 31. 3. 2015

24.-27.3.2015 - Scheduled downtime of the 'metacloud-dukan' cluster

Dear Users!

This is to inform you that there will be a scheduled downtime of the 'metacloud-dukan' cluster, part of the physical resources in MetaCloud. This will be the last in a series of outages that were required to extend, improve, and physically move our cloud infrastructure. The downtime will begin on 24 March and end on 27 March. All virtual machines running on nodes dukan{1..10}.ics.muni.cz will be stopped. During the outage, the hypervisor will change from XEN to KVM, finally unifying the hypervisors used on all resources across MetaCloud.

How to tell if the outage affects your virtual machines

Use the OpenNebula dashboard to display a list of all your virtual machines (Virtual Resources → Virtual Machines). The 'Host' column shows the physical node name for each VM. The outage will affect all virtual machines on nodes dukan{1..10}.ics.muni.cz. You may also filter the contents of the VMs table using the Search box on the top of the page.
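For a larger number of machines, the same check can be scripted. Below is a minimal sketch of the filtering logic; the VM/host pairs are hypothetical placeholders, not real inventory data, and would in practice come from the OpenNebula dashboard or CLI:

```python
# Hypothetical (VM name, host) pairs as they might appear in the
# OpenNebula "Virtual Machines" table; placeholders, not real data.
vms = [
    ("analysis-vm", "dukan3.ics.muni.cz"),
    ("web-server", "hda7.ics.muni.cz"),
    ("batch-worker", "dukan10.ics.muni.cz"),
]

# Physical nodes affected by the outage: dukan1 .. dukan10.
affected_hosts = {f"dukan{i}.ics.muni.cz" for i in range(1, 11)}

# Keep only the VMs whose 'Host' column matches an affected node.
affected_vms = [name for name, host in vms if host in affected_hosts]
print(affected_vms)  # -> ['analysis-vm', 'batch-worker']
```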

What will happen with my virtual machines during the outage

All affected VMs must be stopped. It will be a great help to us if you can stop your own machines before end of business on Monday, 23 March. Otherwise, we will stop your VMs and move them to storage; after the downtime you will be able to start your machines again. Since the hypervisor will change from XEN to KVM, some machines may fail to start properly. Therefore, do not hesitate to contact us in case any of your VMs acts strangely. Unfortunately, it is not possible to check for compatibility with KVM beforehand; it can only be done experimentally. Standard MetaCentrum images, however, are already tuned for KVM and are expected to cope without glitches.
Thank you for your understanding. Be assured that this is the last planned downtime for the foreseeable future.

Best regards, MetaCloud

 


Ivana Křenková, 10. 3. 2015

3.3.2015 10-12 AM - Unexpected power outage in Prague (cluster luna)

Dear users,

let us inform you that due to today's unexpected power outage in Prague's server room, the local cluster luna is temporarily unavailable. The computing nodes will be returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...


Ivana Křenková
MetaCentrum


Ivana Křenková, 3. 3. 2015

13.1.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)

Dear users,

let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum & CERIT-SC

 


Ivana Krenkova, 13. 1. 2015

10.1.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat)

Dear users,

let us inform you that due to today's unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Krenkova
MetaCentrum & CERIT-SC.


Ivana Krenkova, 10. 1. 2015

- Possible problem with memory writes on the zebra cluster

After moving the nodes of the zewura SMP cluster (renamed to zebra1-12) to the new computer room, some of the nodes appear to exhibit very rare memory write failures under a very intensive memory stress test. The problem is not reproducible; it occurred only a few times during several days of testing. We consider it almost impossible to occur in normal operation. The problem has been reported to the supplier's technical support for further detailed diagnostics.

The nodes are being returned to normal operation. Although no problems are expected, we kindly ask users to report any suspicious behaviour.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum & CERIT-SC.


Ivana Krenkova, 9. 12. 2014

3.-4.12.2014 - Planned system update of archival storages in Pilsen and Brno

Let us inform you that from Wednesday December 3 (8:30 AM) to Thursday December 4 (8 PM), Pilsen's /storage/plzen2-archieve/ and Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a system update. In case you need any data from these archival storages during the outage, please inform us in advance via e-mail at meta@cesnet.cz.

The other two archival storages (/storage/jihlava2-archive and /storage/brno5-archive) will not be affected.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova


Ivana Křenková, 25. 11. 2014

28.11.2014 9 AM - 1 PM - Planned power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)

Dear users,

let us inform you that due to a planned power outage in Jihlava's server room, the local clusters with the property 'jihlava' will be temporarily unavailable on Friday 28.11.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Krenkova
MetaCentrum & CERIT-SC.

 

 


Ivana Krenkova, 21. 11. 2014

31.10.2014 - Data transfer finished -- brno3-cerit now in normal operation

This morning, the transfer of the brno3-cerit data (temporarily stored in Jihlava) was finished -- the brno3-cerit storage is now in normal operation mode.

Attention: under specific circumstances (particularly when your jobs were finishing during the synchronization), some data may not have been synchronized -- if so, you'll find your data in the Jihlava location, currently available via /auto/jihlava1-cerit/brno3/export/home/$USER (please transfer the missing data on your own -- we'll delete it after a few weeks).
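When many files need to be brought back, copying only what is missing (and never overwriting the current Brno data) can be scripted. The following is a minimal sketch, assuming ordinary POSIX paths; the function name and directory arguments are illustrative, not part of any MetaCentrum tooling:

```python
import shutil
from pathlib import Path

def copy_missing(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy files that exist under src_root but are missing under dst_root.

    Files already present at the destination are never overwritten, so
    data changed in Brno after the synchronization is left untouched.
    """
    copied = []
    for src in src_root.rglob("*"):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        if dst.exists():
            continue  # already synchronized -- keep the Brno copy
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # preserve timestamps and permissions
        copied.append(dst)
    return copied
```

For example, `copy_missing(Path("/auto/jihlava1-cerit/brno3/export/home/" + user), Path("/storage/brno3/home/" + user))` would fill in only the files absent from the Brno home directory.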

With best regards
Tom Rebok.


Tom Rebok, 31. 10. 2014

29.-30.10.2014 - Returning data back to Jihlava -- short outage of brno3-cerit disk array

Since we managed to repair the array /storage/brno3-cerit, the data (temporarily hosted in Jihlava) will be returned to Brno

*** on Wednesday, 29th of October ***

Since it is not possible to perform this transfer transparently, it is necessary to operate the /storage/brno3 array in a not fully consistent state for about 1-2 days.

To minimize the impacts of this transfer on you and your computations, it will be managed as follows:

Note: if you change any data in /storage/brno3/home/$LOGIN during Wednesday/Thursday, it may be overwritten by the data synchronised/copied from Jihlava.

The running jobs should not be affected by this transfer.

We are sorry for the inconvenience.

With best regards and thanks for understanding,
Tomas Rebok,
MetaCentrum NGI.


Tom Rebok, 23. 10. 2014

4.10.2014 - Unexpected power outage in Ostrava (GPU cluster doom)

Dear users,

let us inform you that, due to an unexpected power outage in Ostrava's server room, the local cluster Doom, as well as /storage/ostrava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...


Ivana Křenková
MetaCentrum


Tom Rebok, 4. 10. 2014

1. 10. 2014 9:00 - 16:00 - Planned system update on /storage/brno4-cerit-hsm/

The hierarchical storage in Brno, /storage/brno4-cerit-hsm/, will be inaccessible on October 1, 2014, from 9 AM until 4 PM (expected). Major software patches (bug fixes) will be applied by the system vendor.

With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková
MetaCentrum & CERIT-SC

Ivana Křenková, 1. 10. 2014

29.9.2014 - Unexpected outage of /storage/brno2, some frontends, and nodes

Because of several SW problems that have recently occurred, the disk array /storage/brno2/, some frontends, and some nodes were not working properly today. The computing nodes will gradually be returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused.

Ivana Křenková,

MetaCentrum


Ivana Křenková, 29. 9. 2014

26.9.2014 - Unavailability of /storage/brno3-cerit

Dear users,

let us inform you that, due to an unexpected short power outage in the CERIT-SC server room last night (25.9., approx. 9 PM), the filesystem of the disk array /storage/brno3-cerit/ is not working properly. We are working on data recovery at the moment. The user data (208 TB) are being copied (temporarily) to Jihlava (/auto/jihlava1-cerit/brno3/export); the expected duration is about 1 or 2 weeks (due to the huge volume of data). In case you need your data urgently, please contact us at meta@cesnet.cz and we will copy it with a higher priority.

The Jihlava disk array will temporarily serve (during the recovery of the Brno disk array) as /home for the zewura and zegox clusters and the zuphux frontend. All accessible data will also be available via the symlink /storage/brno3-cerit. All the data will be returned from Jihlava to Brno after the Brno disk array is recovered.

With apologies for the inconvenience and with thanks for your understanding,

MetaCentrum & CERIT-SC


Ivana Křenková, 26. 9. 2014

26.9.2014 - Unexpected outage of /storage/brno3-cerit

Dear users,

let us inform you that, due to an unexpected short power outage, the disk array /storage/brno3-cerit/ is temporarily unavailable today. We are working on data recovery at the moment. In case you need your data very urgently, please contact us at meta@cesnet.cz and we will ensure your data is copied to another disk storage.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková, MetaCentrum


Ivana Křenková, 26. 9. 2014

19.8.2014 - Unexpected power outage in Ostrava (GPU cluster doom)

Dear users,

let us inform you that, due to an unexpected power outage in Ostrava's server room, the local cluster Doom, as well as /storage/ostrava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...


Ivana Křenková
MetaCentrum


Tom Rebok, 19. 8. 2014

15. 8. 2014 14:45 - 22:00 - Unexpected power outage in Brno server rooms, some services may still not work (e.g., license server, portal)

Dear users,

today, another unexpected power outage occurred, this time in the Brno server rooms. Because of this, the Brno part of the MetaCentrum infrastructure was paralyzed, including several central services hosted there (e.g., the scheduler, license server, disk storages, ...). The jobs running during the outage were unfortunately stopped.

Most of the nodes and services should be available now. However, a few power circuits couldn't be revived, and a deeper inspection of the power supplies has to be performed to detect the failing ones -- thus, several services (e.g., the license server and parts of the portal) still do not work.

We're really sorry for the trouble caused -- unfortunately, we're pulling the shorter end of the rope in the fight of "higher powers" vs. man. :-(

Tom Rebok
MetaCentrum


Tom Rebok, 16. 8. 2014

15.8.2014 - Unexpected power outage in Ostrava (GPU cluster doom)

Dear users,

let us inform you that, due to an unexpected power outage in Ostrava's server room, the local cluster Doom, as well as /storage/ostrava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...


Ivana Křenková
MetaCentrum


Tom Rebok, 15. 8. 2014

19.8.2014 11:00-13:00 - Skirit frontend planned outage

Let us inform you that on Tuesday (August 19, 11:00 a.m.) the skirit frontend will be briefly unavailable due to a SW upgrade. All processes running on the frontend will be terminated during the outage.

You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,

MetaCentrum


Ivana Křenková, 14. 8. 2014

7.8.2014 3:50 - 9:00 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)

Dear users,

let us inform you that, due to an unexpected power outage in the Jihlava server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Tom Rebok
MetaCentrum & CERIT-SC.


Tom Rebok, 7. 8. 2014

25.7.2014 14:00 - 14:30 - Connectivity problems in Pilsen

Today, around 2 p.m., unexpected connectivity problems were observed at the server rooms of the University of West Bohemia, which affected our Pilsen nodes as well. The major problems occurred between 2 p.m. and 2:30 p.m.; however, some consequent minor problems could be noticed even after that time.

The connectivity should already be restored. (Nevertheless, some related service work is still taking place...)

We apologize for any inconvenience caused.

Tomáš Rebok,
MetaCentrum & CERIT-SC.


Tom Rebok, 25. 7. 2014

28.4.2014 - Unexpected power outage in Jihlava

Let us inform you that, due to an unexpected power outage in the Jihlava server room, the local clusters Zigur and Zapat were partially and temporarily unavailable. The computing nodes will gradually be returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Křenková
MetaCentrum & CERIT-SC

 


Ivana Křenková, 28. 4. 2014

16.4.2014 16:00 - Unexpected outage of /storage/brno2 and frontend skirit

Because of several SW problems that have recently occurred, the disk array /storage/brno2/ and the frontend skirit are again not working properly today.

We apologize for any inconvenience caused.

Ivana Křenková, MetaCentrum


Ivana Křenková, 16. 4. 2014

10.4.2014 - Unexpected outage of /storage/brno2, some frontends, and nodes

Because of several SW problems that have recently occurred, the disk array /storage/brno2/, some frontends, and some nodes were not working properly today. The computing nodes will gradually be returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused.

Ivana Křenková,

MetaCentrum


Ivana Křenková, 10. 4. 2014

23.3.2014 23:00 - Zuphux frontend planned outage

Let us inform you that on Saturday (March 23, 11:00 p.m.) the zuphux frontend will be briefly unavailable due to a SW upgrade (Debian 6 -> Debian 7). All processes running on the frontend will be terminated during the outage.

You can use any of the other frontends during the outage:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,

MetaCentrum & CERIT-SC


Ivana Křenková, 19. 3. 2014

25.-26. 2. 2014 - Service maintenance of the disk array /storage/brno1 (/storage/home)

Because of several HW/SW problems that have recently occurred with the disk array /storage/brno1 (/storage/home), its complex service maintenance and SW upgrade has to be performed urgently.

Unfortunately, this maintenance cannot be performed on the live system; thus, the disk array has to be ***PUT OUT OF OPERATION*** (and made inaccessible)

on Tuesday, 25 February 2014, during the morning hours
(The assumed shutdown duration is 1-2 days.)

Influence on the running jobs:

We're really sorry for the problems that may occur. Unfortunately, the current condition of the /storage/brno1 (/storage/home) disk array cannot be left untouched any longer -- that would result in bigger problems in the future.

With many thanks for understanding
Tomáš Rebok.


Tom Rebok, 20. 2. 2014

6. 1. 2014 - Unexpected power outage in Jihlava

Let us inform you that, due to an unexpected power outage in the Jihlava server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will gradually be returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Křenková
MetaCentrum & CERIT-SC


Ivana Křenková, 6. 1. 2014

5. 11. 2013 - Unexpected power outage in Jihlava (Zigur and Zapat clusters)

Let us inform you that, due to an unexpected power outage in the Jihlava server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will gradually be returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

 

Ivana Křenková
MetaCentrum & CERIT-SC

Ivana Křenková, 5. 11. 2013

1. 10. 2013 - Outage in Brno, October 1, 2013

All computing nodes located in the computing room of ICS MU (with the property "brno", except the machines zewura[1-8]) will be down on Tuesday, October 1st, due to works on the electric network extension for the expected new cluster of the CERIT-SC center.

The long job queues (more than 4 days) have already been disabled on those clusters. All the other queues will be disabled later. Running jobs will be killed when the machines are switched off, so please finish all jobs by the end of September.

At the same time, the frontend skirit.ics.muni.cz will not be available during the outage.

We are sorry for temporary unavailability of the resources.


Ivana Křenková, 26. 9. 2013

9. 9. 2013 9:00 - 17:00 - Planned system update on /storage/plzen2-archieve/

On Monday, between 9:00 a.m. and 5:00 p.m., Pilsen's /storage/plzen2-archieve/ will be unavailable due to a system update.
With apologies for the inconvenience and with thanks for your understanding.


Ivana Křenková, 3. 9. 2013

7.8.2013 11:45PM - Short power outage at Jihlava

The following machines were affected: zapat23 zapat98 zapat99 zapat100 zapat101 zapat111 zigur1 zigur3 zigur28 zigur30 zigur31


Martin Kuba, 8. 8. 2013

29. 7. 2013 - Power outage in Jihlava's server room

Let us inform you that, due to an unexpected power outage in the Jihlava server room, the local clusters Zigur and Zapat and the disk array /storage/jihlava1-cerit are temporarily unavailable. Unfortunately, all running jobs have been terminated.

With apologies for the inconvenience and with thanks for your understanding.

 


Ivana Křenková, 29. 7. 2013

10. 8. 2013 7:00 - 10:00 - Planned system update on /storage/plzen2-archieve/

Let us inform you that today, between 2:00 and 5:00 p.m., Pilsen's /storage/plzen2-archieve/ may be briefly unavailable due to a system update.
With apologies for the inconvenience and with thanks for your understanding.


Ivana Křenková, 9. 7. 2013

18. 6. 2013 10:00 - Skirit frontend outage

Let us inform you that on Tuesday (June 18, 10:00 a.m.) the skirit frontend will be briefly unavailable due to a HW upgrade. At the same time, the system will be upgraded (Debian 5 -> Debian 6).

You can use any of the other frontends during the outage:

With apologies for the inconvenience and with thanks for your understanding.


Ivana Křenková, 16. 6. 2013


17. 5. 2013 - Air conditioning outage in Pilsen server room

Let us inform you that, due to an unexpected air-conditioning failure in the Pilsen server room and the resulting overheating of the local clusters, the machines Gram, Minos, Nympha, Konos, and Ajax, as well as the disk array /storage/plzen1, have been unavailable since this evening.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Křenková, 17. 5. 2013

16. 5. 2013 - Brno's disk array outage (/storage/brno1)

Today, as a result of a service intervention by the supplier, there was an unplanned outage of the older Brno disk array. /storage/brno1, /afs, and the SW modules are temporarily unavailable. We apologize for the inconvenience.


Petr Hanousek, 16. 5. 2013

5. 3. 2013 - New trouble ticketing system

On 5 March 2013, from 9:00 until approx. 12:00, our trouble ticketing system (RT - rt3.cesnet.cz) will be unavailable due to a necessary upgrade. During the outage, neither the web nor the mail interface will be accessible. E-mails sent during the outage (i.e., to the address meta@cesnet.cz) will be delivered after it ends. We apologize for the half-day delay in responding to requests.


Petr Hanousek, 5. 3. 2013

22. - 25. 10. 2012 - Scheduled downtime in Pilsen

All computing nodes located in the computing room of ZČU (ajax, konos, minos[20-35], nympha) will be down during October 22-25 due to the move to a new server room. Jobs are currently being held in queues. Running jobs will be killed when the machines are switched off.

We are sorry for temporary unavailability of the resources.


Ivana Křenková, 22. 10. 2012

10.-11.10.2012 - Reconstruction of electrical wiring in Pilsen - follow-up works

The handover of the work on switching Pilsen's UL011 server room to the energy centre revealed a serious defect: a failure of some of the support systems (measurement and control). The repair unfortunately requires another switch-off (killing running jobs). The works will take place on the night from Wednesday to Thursday, October 10-11, 2012 (21:00 - 5:00). Sorry for the inconvenience.


Petr Hanousek, 2. 10. 2012

14.9.2012 - Volume /storage/brno1 full

Volume /storage/brno1 is filled to 100 percent. Moreover, the file system is probably also damaged, so the volume is currently not suitable for working with data. Please use the volumes /storage/brno2 (11 TB available) and /storage/plzen1 (27 TB available) for your work. Unfortunately, we cannot yet estimate the time needed for the repair.

In this context, we would like to ask you to delete all unnecessary files stored in the mentioned volumes.


Petr Hanousek, 14. 9. 2012

19. - 20.9.2012 - Reconstruction of electrical wiring in Pilsen vol 2

On the night of September 19 to 20, 2012, the wiring in the Pilsen server room will be reconstructed. The machines will be switched off on Wednesday the 19th in the afternoon; restarting them is anticipated on Thursday the 20th in the morning. From Thursday morning, the "long" queue should finally be available again on the affected machines.

Besides the mentioned clusters, the disk volume /storage/plzen1 will also be unavailable.

We apologize for the temporary inconvenience.


Petr Hanousek, 13. 9. 2012

29.8.2012 - Delayed reconstruction of electrical wiring in Pilsen

The outage announced for tomorrow is canceled because of problems on the supplier's side. We will inform you about the newly planned shutdown through this channel. The 'long' queue on the affected machines will remain closed for now.


Petr Hanousek, 29. 8. 2012

29.8. - 30.8.2012 - Reconstruction of electrical wiring in Pilsen

On the night of August 29 to 30, 2012, the wiring in the Pilsen server room will be reconstructed. The machines will be switched off on Wednesday the 29th in the afternoon; restarting them is anticipated on Thursday the 30th in the morning. The "long" queue has already been suspended from taking jobs on these machines; any jobs still running will be killed at power-down.

Besides the mentioned clusters, the disk volume /storage/plzen1 will also be unavailable.

We apologize for the temporary inconvenience.


Petr Hanousek, 22. 8. 2012