Outages

You can read this as an RSS feed.

21.-23.5.2018 - Expected restart of virtual machines in MetaCloud due to security update

Dear VO MetaCloud users,
 
due to planned maintenance and security updates on the physical machines, the virtual machines on dukan1.ics.muni.cz - dukan26.ics.muni.cz and gorbag.ics.muni.cz will be restarted gradually during the first half of next week. Information about the affected machines will be available in OpenNebula (https://cloud.metacentrum.cz/) in the Info section of each virtual machine.

We apologize for any inconvenience caused.

Ivana Křenková
MetaCentrum & CERIT-SC

Ivana Křenková, Thu May 17 10:16:00 CEST 2018

12.2.2018 till 11 AM - Unexpected failure of AFS file system

Update 2018-02-12 11 AM: AFS is working properly again.

An AFS server crashed this weekend, which also caused unexpected problems in the vicious part of the AFS subsystem. As a result, some volumes are not available on AFS (so SW modules are unavailable as well), and it is not possible to log in to some computational nodes and frontends. We are working on the repair.

We apologize for any inconvenience caused.

Ivana Křenková
MetaCentrum & CERIT-SC

Ivana Křenková, Mon Feb 12 10:16:00 CET 2018

5.2.2018 - Unplanned network connectivity outage in Brno

Due to a network connectivity failure at the Brno location, services hosted in Brno that require a network connection (MetaCloud, Brno machines, ...) are unavailable. We are working on a remedy.
With apologies for the inconvenience and with thanks for your understanding.

MetaCentrum


Ivana Křenková, Mon Feb 05 10:00:00 CET 2018

from Jan 8 - Response to security flaws in processors known as Meltdown and Spectre

Dear users,

MetaCentrum administrators are tracking the situation around the recently disclosed processor bugs (known as Meltdown and Spectre; for more information see https://spectreattack.com/).
We are evaluating the real impact of these vulnerabilities on the infrastructure. We have applied the available updates in the VMware and MetaCloud environments. For some of the computational nodes, we are monitoring the available updates and evaluating their impact on the MetaCentrum environment (they are being tested for performance penalties). The computing nodes are being updated gradually. If the situation requires it, we may have to restart computing resources immediately and stop all active jobs. Please consider postponing upcoming long jobs, especially if they cannot be restarted.

We apologize for any inconvenience caused.

MetaCentrum


Ivana Křenková, Tue Jan 09 15:50:00 CET 2018

from 31.12.2017 - Unexpected power outage in Prague FZU (cluster luna, kalpa)

Dear users,

let us inform you that due to today's unexpected power outage in Prague's server room, the local clusters luna and kalpa are unavailable.
The vendor is working on the repair; the length of the outage cannot be estimated.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum


Ivana Křenková, Tue Jan 02 15:50:00 CET 2018

7.12.2017 - Disk array /storage/budejovice1/ planned HW upgrade

Let us inform you that on Thursday December 7, /storage/budejovice1/ (storage-budejovice1.metacentrum.cz) will be moved to new hardware and will be unavailable for several hours during the final synchronization. The shared disk space at hildor*:/scratch.shared, mounted from this storage, will be unavailable as well.

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Křenková, Tue Dec 05 23:00:00 CET 2017

28.11.2017 - PBS Pro bug in new version

Dear users,

Due to a bug in the new version of PBS Pro, the walltime of almost all running jobs was reset. PBS Pro could not correctly determine CPU usage and significantly overestimated the CPU time used, so jobs ended unexpectedly. We reported the error to the PBS Pro developers and reverted the PBS Pro server to the previous version.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, Tue Nov 28 13:50:00 CET 2017

6.10.2017 (7-10 AM) - Power outage in JU's server room

Dear users,

Let us inform you that, due to a planned power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid and disk array /storage/budejovice1/ will be temporarily unavailable on Friday October 6 (7-10 AM). Unfortunately, all running jobs will be terminated. Please copy the data you will need for your calculations during the outage to another disk array.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, Thu Oct 05 01:50:00 CEST 2017

25. 7. 2017 - MetaCloud: firmware update on dukan 19-25 machines

Dear users,

Given a pressing need to update firmware in cloud nodes dukan19 through dukan25 we will have to briefly power off virtual machines using those nodes. The intervention is scheduled for Tuesday 25 July. Each node, hence each collocated virtual machine, will be powered off for approximately 20 minutes. We will boot the virtual machines afterwards. There will be no data loss. Affected users have been notified by e-mail.

With apologies for the inconvenience and with thanks for your understanding,

MetaCloud team


Ivana Krenkova, Tue Jul 25 01:50:00 CEST 2017

5. 6. 2017 - MetaCloud: migration of virtual machines running on dukan1-10

Dear users,

On Monday 5th June we are going to migrate virtual machines away from nodes dukan1-10. Affected machines will be powered off temporarily. There will be no data loss. Machines with private network addresses (currently in range 10.4.0.*) require special treatment. Given the current configuration of our network their private IP addresses will have to change. Please, look up the new IP addresses of your virtual machines through the MetaCloud interface after that date. Affected users have already been notified by e-mail.
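To check whether a particular machine falls into the private range mentioned above, a minimal Python sketch could look like this. It assumes that "10.4.0.*" corresponds to the network 10.4.0.0/24; the function name and the sample addresses are made up for illustration and do not come from MetaCloud.

```python
import ipaddress

# Assumption: "10.4.0.*" is read as the /24 network 10.4.0.0/24.
OLD_PRIVATE_RANGE = ipaddress.ip_network("10.4.0.0/24")


def needs_new_address(ip):
    """Return True if the address lies in the private range that will change."""
    return ipaddress.ip_address(ip) in OLD_PRIVATE_RANGE


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    for ip in ["10.4.0.17", "10.4.1.5"]:
        status = "will change" if needs_new_address(ip) else "not in the affected range"
        print(ip, status)
```

The new address itself still has to be looked up in the MetaCloud interface; the sketch only tells you which machines to look at.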

With apologies for the inconvenience and with thanks for your understanding,
MetaCloud team

 

 

 

 


Ivana Krenkova, Mon May 29 01:50:00 CEST 2017

4.6.2017 (7:45-10 AM) - Power outage in JU's server room

Dear users,

Let us inform you that, due to a planned power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid and disk array /storage/budejovice1/ will be temporarily unavailable on Sunday June 4 (7:45-10 AM). Unfortunately, all running jobs will be terminated. Please copy the data you will need for your calculations during the outage to another disk array.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, Wed May 17 01:50:00 CEST 2017

11.5.2017 - OS upgrade on the Zuphux frontend (CentOS 7.3) + PBS Pro set as the default environment in CERIT-SC

On May 11th, the server zuphux will be restarted into a new OS version (CentOS 7.3).

At the same time, the scheduling system in the Torque environment (@wagap) will stop accepting new jobs. Existing jobs will finish computing on the remaining nodes. The remaining computational nodes in the Torque environment will be gradually converted to PBS Pro. Machines currently available in the PBS Pro environment are labeled "Pro" in the PBSMon application https://metavo.metacentrum.cz/pbsmon2/nodes/physical .

The frontend zuphux.cerit-sc.cz will use the PBS Pro (@wagap-pro) environment by default.

With apologies for the inconvenience and with thanks for your understanding.

CERIT-SC users support


Ivana Křenková, Wed May 10 23:00:00 CEST 2017

7.4.2017 4 PM-0 AM - Zuphux frontend and @wagap, @wagap-pro outage

On Friday April 7, from 15:45, the frontend zuphux will be temporarily unavailable due to unplanned emergency servicing of critical disk array controllers. The estimated duration of the outage is 2 hours. Other frontends can be used during the outage:
https://wiki.metacentrum.cz/wiki/Frontend

Other services running from the affected disk array (Torque server @wagap and PBS Pro server @wagap-pro) will be migrated to another server on Thursday evening, with some very short outages on Thursday and Friday evenings.

With apologies for the inconvenience and with thanks for your understanding.

CERIT-SC support


Ivana Křenková, Thu Apr 06 23:00:00 CEST 2017

10.3.2017 - Outage on archival storage in Brno /storage/brno4-cerit-hsm/

Dear users,

after the upgrade of the HSM storage-brno4-cerit-hsm.metacentrum.cz (carried out by the vendor on February 14-15), an unexpected error occurred and the HSM is only partially available. The vendor is working on the repair; the length of the outage cannot be estimated.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, Fri Mar 10 01:50:00 CET 2017

24.2.2017 from 4 AM - Unplanned outage in Pilsen

Today (around 4 AM), an accident occurred in the water cooling system in Pilsen, which affected all Pilsen computing nodes, frontends, and /storage/plzen1/. The machines are back in operation, although some related service work is still in progress.

We apologize for any inconvenience caused.

Ivana Křenková,
MetaCentrum

 


Tom Rebok, Fri Feb 24 15:26:00 CET 2017

from 19.2.2017 - Outage on archival storage in Brno /storage/brno4-cerit-hsm/

Dear users,

after the upgrade of the HSM storage-brno4-cerit-hsm.metacentrum.cz (carried out by the vendor on February 14-15), an unexpected error occurred and the HSM is currently unavailable. The vendor is working on the repair; the length of the outage cannot be estimated.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, Mon Feb 20 01:50:00 CET 2017

14.-15.2.2017 - Planned system update on archival storage in Brno /storage/brno4-cerit-hsm/

Dear users,

Let us inform you that from Tuesday February 14 (9 AM) to February 15 (6 PM), Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a security update of the system.

IMPORTANT: The HSM still hosts data from Jihlava /storage/jihlava1-cerit/


Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, Tue Feb 07 01:50:00 CET 2017

23.1.2017 - Disk array /storage/praha1/ planned HW upgrade

Let us inform you that on Monday January 23, Prague's /storage/praha1/ (storage-praha1.metacentrum.cz) will be moved to new hardware and will be unavailable for several hours during the final synchronization. The shared disk space at *:/scratch.shared, mounted from this storage, will be unavailable as well.

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Křenková, Mon Jan 09 23:00:00 CET 2017

11. 1. 2017 - Planned MetaCloud upgrade

Dear users,

the OpenNebula upgrade announced earlier will take place on 11 January. At that time, the front-end will be unavailable for some time, and virtual machines running in the dukan.ics.muni.cz cluster will be restarted as we update the nodes.

Please be aware that there may be issues especially with older virtual machines instantiated with the previous OpenNebula version (2015 and earlier). Please contact us (cloud@metacentrum.cz) in case of trouble.

With apologies for the inconvenience and with thanks for your understanding,
MetaCloud team

 

 

 

 


Ivana Krenkova, Mon Jan 09 01:50:00 CET 2017

15.12.2016 (11PM-02AM) - Planned outage of Torque server @wagap

Dear users,

Let us inform you that on Thursday (Dec 15, 11PM - 2AM) the Torque server wagap.cerit-sc.cz will be temporarily unavailable due to a SW upgrade. Submitting new jobs and manipulating jobs in the system will not be possible during the outage.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, Thu Dec 15 01:50:00 CET 2016

8.12.2016 - Power outage in JU's server room

Dear users,

Let us inform you that, due to an unexpected power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid are temporarily unavailable. Unfortunately, all running jobs have been terminated.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, Thu Dec 08 01:50:00 CET 2016

from 1.11.2016 - tarkil frontend planned outage

Let us inform you that the tarkil.cesnet.cz frontend is unavailable due to a migration to another HW. All running processes on the frontend were terminated.

You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,

MetaCentrum


Ivana Křenková, Tue Nov 01 23:00:00 CET 2016

27.10.2016 from 10 PM - /storage/brno3-cerit/ planned HW upgrade

Let us inform you that on Thursday October 27 (10 AM), Brno's /storage/brno3-cerit/ (storage-brno3-cerit.metacentrum.cz) will be moved to new hardware.

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC


Ivana Křenková, Tue Oct 25 23:00:00 CEST 2016

30.8.2016 from 10 PM - Zuphux frontend planned outage

Let us inform you that on Tuesday (August 30, 10 PM - 0 AM) the zuphux frontend will be briefly unavailable due to a migration to another HW. All running processes on the frontend will be terminated during the outage.

You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,

MetaCentrum


Ivana Křenková, Wed Aug 24 23:00:00 CEST 2016

25.7.2016 10:00 AM - Hadoop cluster planned outage

Dear users,

Let us inform you that on Monday (July 25, 10:00 a.m.) the Hadoop cluster will be unavailable due to an upgrade from CDH 5.5.1 to 5.8.0 (with Hadoop 2.6.0 and Spark 1.6.0) and an upgrade of the Java environment.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 


Ivana Krenkova, Wed Feb 03 03:50:00 CET 2016

25.-29.7.2016 - Planned service maintenance of clusters and disk array in Ceske Budejovice

Dear users,

Let us inform you that from July 25 to 29, the hildor, haldir, and hagrid clusters and disk array /storage/budejovice1/ will be temporarily unavailable due to a move to another server room. Please copy the data you will need for your calculations during these few days to another disk array.

With many thanks for understanding,

Ivana Krenkova
MetaCentrum


 

 


Ivana Křenková, Fri Jun 24 15:50:00 CEST 2016

18.4.2016 7-15:00 - Unplanned air conditioning outage in Brno CERIT-SC

Dear users,

let us inform you that due to an unexpected air conditioning outage in Brno's CERIT-SC server room this morning, some of the local clusters zigur, zapat, and zebra have been switched off to prevent overheating. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs on the affected nodes have been terminated.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum & CERIT-SC

 


Ivana Krenkova, Mon Apr 11 03:50:00 CEST 2016

27.4.2016 10 PM - Power outage in UK's server room

Dear users,

Let us inform you that, due to a planned power outage in UK's Karolina server room, the local servers eru1, eru2, acharon, and the AFS servers asterix, obelix, sal will be temporarily unavailable tomorrow (April 27), 10-11 PM.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, Tue Apr 26 01:50:00 CEST 2016

21.4.2016 from 10:30 PM - Planned MetaCloud upgrade

Dear users,

CERIT-SC's resources in the OpenNebula MetaCloud (phys. nodes hda*) will be under maintenance this Thursday, 21st April, from 10:30 PM. Your virtual machine(s) will only be paused (you won't lose your running state) and resumed one by one. An optimistic estimate is that each VM shouldn't be down for more than 30 minutes. The whole maintenance can take up to 2 hours.

 

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 

 

 


Ivana Krenkova, Tue Apr 19 01:50:00 CEST 2016

18.4.2016 7-15:00 - Planned power outage in Brno UKB

Dear users,

let us inform you that due to a planned power outage in Brno's server room in UKB, the local clusters lex, krux, zubat and disk arrays brno9-ceitec + brno10-ceitec-hsm will be temporarily unavailable.

We apologize for any inconvenience caused.

 
Ivana Krenkova
MetaCentrum

 


Ivana Krenkova, Mon Apr 11 03:50:00 CEST 2016

7.4.2016 - Power outage in JU's server room

Dear users,

Let us inform you that, due to an unexpected power outage, the clusters hermes/hildor/haldir are temporarily unavailable.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum

 

 


Ivana Krenkova, Thu Apr 07 01:50:00 CEST 2016

1.3.2016 - PBS server (sendmail) problem today

Dear users,

Let us inform you that during the night the sendmail service of the PBS server sent out outdated error reports about terminated jobs by e-mail.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, Tue Mar 01 01:50:00 CET 2016

2.3.-3.3.2016 - Planned system update on archival storage in Brno /storage/brno4-cerit-hsm/

Dear users,

Let us inform you that from Wednesday March 2 (9 AM) to March 3 (6 PM), Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a security update of the system.

*****************************************
IMPORTANT:
The HSM hosts data from Jihlava /storage/jihlava1-cerit/
*****************************************

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, Tue Feb 23 01:50:00 CET 2016

23.2.2016 10-11AM - Planned service maintenance of /storage/brno6/

Dear users,

Let us inform you that on Tuesday, February 23, Brno's /storage/brno6/ will be unavailable due to a battery replacement by the supplier.

Influence on the running jobs:

Moreover, the user interface (Sunstone) as well as the programming interface (API) for MetaCloud will be unavailable for several hours. Existing virtual machines will not be affected! It will be, however, impossible to create new ones or manage existing ones during the outage.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková
MetaCentrum & CERIT-SC

 

 

 


Ivana Krenkova, Tue Feb 16 01:50:00 CET 2016

12.2.2016 8AM - Hadoop cluster planned outage

Dear users,

Let us inform you that on Friday (February 12, 8:00 a.m.) the Hadoop cluster will be briefly unavailable due to a SW upgrade:

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 


Ivana Krenkova, Thu Feb 11 03:50:00 CET 2016

4.2.2016 11AM - Hadoop cluster planned outage

Dear users,

Let us inform you that on Thursday (February 4, 11:00 a.m.) the Hadoop cluster will be briefly unavailable due to a certificate change, a reboot of the machines, and preparation of a new experimental cluster based on containers.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 


Ivana Krenkova, Wed Feb 03 03:50:00 CET 2016

11.2.2016 - Planned MetaCloud upgrade

Dear users,

A long-planned upgrade of the OpenNebula cloud manager will take place on 11 February. The user interface (Sunstone) as well as the programming interface (API) for MetaCloud will be unavailable for several hours. Existing virtual machines will not be affected! It will be, however, impossible to create new ones or manage existing ones during the outage. Please accept our apologies for the inconvenience this may cause you.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 

 

 


Ivana Krenkova, Thu Jan 28 01:50:00 CET 2016

23.-24. 1.2016 - Planned network upgrade in FZU AVCR in Prague

Dear users,

let us inform you that due to a planned upgrade of the network connection at the Institute of Physics of the Czech Academy of Sciences in Prague, the local clusters kalpa and luna + disk array /storage/praha4-fzu/ will be temporarily unavailable over the weekend, 23-24 January.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 

 


Ivana Krenkova, Thu Jan 21 08:00:00 CET 2016

21.10.2015 16:30 - Unexpected power outage in Brno UKB (cluster perian)

Dear users,

let us inform you that due to an unexpected power outage in Brno's server room in UKB, the local cluster Perian was temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum & CERIT-SC

 


Ivana Krenkova, Wed Oct 21 03:50:00 CEST 2015

14.10.2015 5-11 PM - Kerberos service outage

Dear users,

Let us inform you that yesterday evening (5-11 PM), due to corruption of the database of the KDC server that runs Kerberos, some database records were temporarily unavailable. Unfortunately, this caused problems with operations requiring Kerberos (typically saving data from running jobs to /storage, etc.).

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 

 

 


Ivana Krenkova, Thu Oct 15 01:50:00 CEST 2015

9.10.2015 - MetaCloud outage

Dear users,

Let us inform you that the MetaCloud front-end is unavailable due to a HW fault in its storage array. Virtual machines created beforehand are still operational, but new ones cannot be instantiated and you also cannot manage existing machines through the cloud management interface (OpenNebula). Thank you for your patience.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 

 

 


Ivana Krenkova, Fri Oct 09 01:50:00 CEST 2015

8.-9.10.2015 - Planned system update on /storage/plzen1/ and GALAXY portal outage

Dear users,

Let us inform you that from October 8 to 9, Pilsen's /storage/plzen1/ will be unavailable due to a move to new hardware.

*****************************************
IMPORTANT
The GALAXY portal, hosted on this storage, will be unavailable during the outage.
*****************************************

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 

 

 


Ivana Krenkova, Wed Oct 07 01:50:00 CEST 2015

18.8.-18.10.2015 - Planned service maintenance of zigur and zapat clusters and disk array /storage/jihlava1-cerit/

Due to HW problems (being resolved with the original supplier), the zigur and zapat clusters will be available one month later, in the second half of October.

With many thanks for understanding.

--

Dear users,

From August 18, due to their move to Brno, the zigur and zapat clusters and disk array /storage/jihlava1-cerit/ will be temporarily unavailable.

The clusters are covered by a maintenance contract, so the move will be carried out by the original supplier. The move is expected to take approximately one month (144 cluster nodes plus the disk array).

Influence on the running jobs:


With many thanks for understanding,

Ivana Krenkova
MetaCentrum & CERIT-SC
 

 


Ivana Křenková, Thu Oct 01 15:50:00 CEST 2015

22.9.-23. 9.2015 - Planned system update on archival storage in Brno

Dear users,

Let us inform you that from Tuesday September 22 (10 AM) to Wednesday September 23, Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a system update.

*****************************************
IMPORTANT
The HSM hosts data from Jihlava /storage/jihlava1-cerit/ and the older /storage/brno1/. We strongly recommend that you transfer all data used in your jobs to another storage (for example /storage/brno6). If you need any data from these archival storages during the outage, please inform us in advance by e-mail at meta@cesnet.cz.
*****************************************

Influence on the running jobs:

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, Wed Sep 16 01:50:00 CEST 2015

18. 9.2015 -? - Outage on archival storage in Brno

Dear users,

Let us inform you that since September 18, Brno's /storage/brno4-cerit-hsm/ has been unavailable due to a SW failure of the HSM system. Major software patches (bug fixes) will be applied by the system vendor.

IMPORTANT: The HSM hosts data from Jihlava /storage/jihlava1-cerit/ and older /storage/brno1/ (/storage/home)

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova,
MetaCentrum & CERIT-SC

 

 


Ivana Krenkova, Wed Sep 16 01:30:00 CEST 2015

29.8.2015 - Power outage in Prague (frontend and cluster tarkil + /storage/praha1)

Dear users,

let us inform you that due to an unexpected power outage in Prague's server room, the frontend and the local clusters Tarkil and Mudrc, as well as /storage/praha1, are temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.

We apologize for any inconvenience caused.

Ivana Krenkova

MetaCentrum

 

 


Ivana Krenkova, Sat Aug 29 03:50:00 CEST 2015

24.-31.8.2015 - Planned service maintenance of doom cluster and disk array /storage/ostrava1/

Dear users,

Let us inform you that, due to a power outage in Jihlava's server room today, the local cluster Doom as well as /storage/ostrava1/ are temporarily unavailable. The computing nodes will be gradually returned to normal operation later today.

From August 24 to 31, due to their move to Brno, the doom cluster and disk array /storage/ostrava1/ will be temporarily unavailable. Please copy the data you will need for your calculations during these few days to another disk array.

 

With many thanks for understanding,

Ivana Krenkova
MetaCentrum


 

 


Ivana Křenková, Tue Aug 11 15:50:00 CEST 2015

25.6.2015 10AM - Hadoop cluster planned outage

Dear users,

Let us inform you that on Tuesday (June 25, 10:00 a.m.) the Hadoop cluster will be briefly unavailable due to HW maintenance: replacement of the CMOS battery on the hador-c1.ics.muni.cz server.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 

 


Ivana Krenkova, Fri Jun 12 03:50:00 CEST 2015

22.6.2015 10-11 PM - Skirit frontend planned outage

Let us inform you that on Monday, June 22 10AM, the skirit frontend will be briefly unavailable due to an upgrade. All running processes on the frontend will be terminated during the outage.

You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,
MetaCentrum

Ivana Křenková, Fri Jun 19 23:00:00 CEST 2015

16.6.2015 10 - 12 AM - Planned power outage in Prague (frontend and cluster tarkil + /storage/praha1)

Dear users,

let us inform you that due to a planned outage of the network connection, frontend tarkil, cluster tarkil and disk array /storage/praha1/ will be temporarily unavailable. Jobs running on the affected cluster or using /storage/praha1/ will be temporarily suspended. Shortly before (and of course also during) the outage, it will not be possible to start new jobs on the affected cluster.

Please terminate all interactive jobs running from the tarkil frontend by Tuesday morning. All running processes on the frontend will be terminated during the outage.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum

 

 


Ivana Krenkova, Fri Jun 12 03:50:00 CEST 2015

18.5.2015 10-12 PM - Skirit frontend planned outage

Let us inform you that on Monday, May 18, the skirit frontend will be briefly unavailable due to an upgrade. All running processes on the frontend will be terminated during the outage.

You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,

MetaCentrum


Ivana Křenková, Thu May 14 23:00:00 CEST 2015

31.3.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)

Dear users,

let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum & CERIT-SC

 


Ivana Krenkova, Tue Mar 31 03:50:00 CEST 2015

24.-27.3.2015 - Scheduled downtime of the 'metacloud-dukan' cluster

Dear Users!

This is to inform you that there will be a scheduled downtime of the 'metacloud-dukan' cluster, part of the physical resources in MetaCloud. This will be the last in a series of outages that were required to extend, improve and physically move our cloud infrastructure. The downtime will begin on 24 March and end on 27 March. All virtual machines running on nodes dukan{1..10}.ics.muni.cz will be stopped. During the outage, the hypervisor will change from XEN to KVM, finally unifying hypervisors used on all resources across MetaCloud.

How to tell if the outage affects your virtual machines

Use the OpenNebula dashboard to display a list of all your virtual machines (Virtual Resources → Virtual Machines). The 'Host' column shows the physical node name for each VM. The outage will affect all virtual machines on nodes dukan{1..10}.ics.muni.cz. You may also filter the contents of the VMs table using the Search box on the top of the page.
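As a rough illustration of the check described above, the following Python sketch expands the node range dukan{1..10}.ics.muni.cz and tests whether a value from the 'Host' column falls within it. The function names and the sample host values are assumptions made for this example; they are not part of OpenNebula or MetaCloud.

```python
def affected_nodes(first=1, last=10, domain="ics.muni.cz"):
    """Expand dukan{first..last}.<domain> into a set of host names."""
    return {f"dukan{i}.{domain}" for i in range(first, last + 1)}


def is_affected(host, nodes=None):
    """Return True if the VM's physical host is one of the affected nodes."""
    if nodes is None:
        nodes = affected_nodes()
    return host.strip().lower() in nodes


if __name__ == "__main__":
    # Hypothetical values copied by hand from the 'Host' column.
    for host in ["dukan3.ics.muni.cz", "dukan12.ics.muni.cz"]:
        print(host, "affected" if is_affected(host) else "not affected")
```

Matching on the full host name avoids false positives that a plain substring search would give (for example, "dukan1" also matches "dukan12").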

What will happen with my virtual machines during the outage

All affected VMs must be stopped. It will be a great help to us if you can stop your own machines before end of business on Monday, 23 March. Otherwise, we will stop your VMs and move them to storage as the downtime begins. After the downtime, you will be able to start your machines again. Since the hypervisor will change from XEN to KVM, some machines may fail to start properly. Therefore, do not hesitate to contact us if any of your VMs behaves strangely. Unfortunately, it is not possible to check for compatibility with KVM beforehand; it can only be determined experimentally. Standard MetaCentrum images, however, are already tuned for KVM and are expected to cope without glitches.
Thank you for your understanding. Be assured that this is the last planned downtime for the foreseeable future.

Best regards, MetaCloud

 


Ivana Křenková, Tue Mar 10 15:50:00 CET 2015

3.3.2015 10:00-12:00 - Unexpected power outage in Prague (cluster luna)

Dear users,

let us inform you that due to today's unexpected power outage in Prague's server room, the local cluster luna is temporarily unavailable. The computing nodes will be returned to normal operation; unfortunately, however, the running jobs were stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...


Ivana Křenková
MetaCentrum


Ivana Křenková, Tue Mar 03 15:50:00 CET 2015

13.1.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)

Dear users,

let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum & CERIT-SC

 


Ivana Krenkova, Tue Jan 13 03:50:00 CET 2015

10.1.2015 - Unexpected power outage in Jihlava (clusters zigur and zapat)

Dear users,

let us inform you that due to today's unexpected power outage in Jihlava's server room the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Krenkova
MetaCentrum & CERIT-SC.


Ivana Krenkova, Sat Jan 10 03:50:00 CET 2015

- Possible problem of memory writes on zebra cluster

After the nodes of the zewura SMP cluster (renamed to zebra1-12) were moved to the new computer room, some of the nodes appear to exhibit very rare memory write failures under a very intensive memory stress test. The problem is not reproducible; it occurred only a few times during several days of testing. We consider it almost impossible for it to occur in normal operation. The problem was reported to the supplier's technical support for further detailed diagnostics.

The nodes are being returned to normal operation. Although no problems are expected, we kindly ask users to report any suspicious behaviour.

We apologize for any inconvenience caused.

Ivana Krenkova
MetaCentrum & CERIT-SC.


Ivana Krenkova, Tue Dec 09 03:50:00 CET 2014

3.12.2014 - Unexpected power outage in Jihlava (clusters zigur and zapat)

Dear users,

let us inform you that due to today's unexpected power outage in Jihlava's server room the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Krenkova
MetaCentrum & CERIT-SC.


Ivana Krenkova, Thu Dec 03 03:50:00 CET 2015

3.-4. 12. 2014 - Planned system update on archival storage in Pilsen and Brno

Let us inform you that from Wednesday, December 3 (8:30 AM) to Thursday, December 4 (8 PM), Pilsen's /storage/plzen2-archieve/ and Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a system update. If you need any data from these archival storage systems during the outage, please inform us in advance by e-mail at meta@cesnet.cz.

The other two archival storage systems (/storage/jihlava2-archive and /storage/brno5-archive) will not be affected.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Krenkova


Ivana Křenková, Tue Nov 25 10:00:00 CET 2014

28.11.2014 9:00 - 13:00 - Planned power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)

Dear users,

let us inform you that due to a planned power outage in Jihlava's server room, the local clusters with the property 'jihlava' will be temporarily unavailable on Friday 28.11.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Krenkova
MetaCentrum & CERIT-SC.

 

 


Ivana Krenkova, Fri Nov 21 03:50:00 CET 2014

31.10.2014 - Data transfer finished -- brno3-cerit now in normal operation

This morning, the transfer of brno3-cerit data (temporarily stored in Jihlava) was finished -- the brno3-cerit storage is now in normal operation mode.

Attention: Under specific circumstances (particularly, if your jobs were finishing during the synchronization), some data may not have been synchronized -- if so, you'll find your data in the Jihlava location, currently available via /auto/jihlava1-cerit/brno3/export/home/$USER (please transfer the missing data on your own -- we'll delete them after a few weeks).
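To see which files were left behind in the Jihlava copy, the two directory trees can be compared before copying anything. The following is a minimal sketch in Python, not an official MetaCentrum tool: the source path comes from the note above, while the target path (/storage/brno3-cerit/home/$USER) and the function name are assumptions. It checks only whether a file exists on the target side, not whether its content is newer, and it only lists files; it does not copy them.

```python
import os
from pathlib import Path


def missing_files(source_root, target_root):
    """Return relative paths of files that exist under source_root
    but have no counterpart under target_root."""
    source_root, target_root = Path(source_root), Path(target_root)
    missing = []
    for path in sorted(source_root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(source_root)
            if not (target_root / rel).exists():
                missing.append(rel)
    return missing


if __name__ == "__main__":
    user = os.environ.get("USER", "")
    # Source path as given in the announcement.
    source = Path("/auto/jihlava1-cerit/brno3/export/home") / user
    # Assumed location of your home directory on the restored array.
    target = Path("/storage/brno3-cerit/home") / user
    for rel in missing_files(source, target):
        print(rel)
```

Reviewing the printed list first makes it easier to avoid overwriting files that you have already changed on the Brno side.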

With best regards
Tom Rebok.


Tom Rebok, Fri Oct 31 16:33:00 CET 2014

29.-30.10.2014 - Returning data from Jihlava back to Brno -- short outage of brno3-cerit disk array

Since we managed to repair the array /storage/brno3-cerit, the data (temporarily hosted in Jihlava) will be returned to Brno

*** on Wednesday, 29th of October ***

Since it is not possible to perform this transfer transparently, it is necessary to operate the /storage/brno3 array in a not fully consistent state for about 1-2 days.

To minimize the impacts of this transfer on you and your computations, it will be managed as follows:

Note: If you change particular data during Wednesday/Thursday in /storage/brno3/home/$LOGIN, the data can be overwritten by data synchronised/copied from Jihlava.

The running jobs should not be influenced by this transfer.

We are sorry for the inconvenience.

With best regards and thanks for understanding,
Tomas Rebok,
MetaCentrum NGI.


Tom Rebok, Thu Oct 23 01:40:00 CEST 2014

4.10.2014 - Unexpected power outage in Ostrava (GPU cluster doom)

Dear users,

let us inform you that due to an unexpected power outage in Ostrava's server room the local cluster Doom, as well as /storage/ostrava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...


Ivana Křenková
MetaCentrum


Tom Rebok, Sat Oct 04 11:05:00 CEST 2014

1. 10. 2014 9:00 - 16:00 - Planned system update on /storage/brno4-cerit-hsm/

The hierarchical storage in Brno, /storage/brno4-cerit-hsm/, will be inaccessible on October 1, 2014, from 9:00 till 16:00 (expected). Major software patches (bug fixes) will be applied by the system vendor.

With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková
MetaCentrum & CERIT-SC

Ivana Křenková, Wed Oct 01 13:11:00 CEST 2014

29.9.2014 - Unexpected outage of /storage/brno2, some frontends, and nodes

Because of several SW problems that have recently occurred, the disk array /storage/brno2/, some frontends and nodes were not working properly today. The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused.

Ivana Křenková,

MetaCentrum


Ivana Křenková, Mon Sep 29 23:00:00 CEST 2014

26.9.2014 - Unavailability of /storage/brno3-cerit

Dear users,

let us inform you that, due to an unexpected short power outage in the CERIT-SC server room last night (25.9., approx. 9 PM), the filesystem of the disk array /storage/brno3-cerit/ is not working properly. We are working on data recovery at the moment. The user data (208 TB) are being copied temporarily to Jihlava (/auto/jihlava1-cerit/brno3/export); this is expected to take about 1 or 2 weeks (due to the huge volume of data). If you need your data urgently, please contact us at meta@cesnet.cz and we will copy it with a higher priority.

During the recovery of the Brno disk array, Jihlava's disk array will temporarily serve as /home for the zewura and zegox clusters and the zuphux frontend. All accessible data will also be available via the symlink /storage/brno3-cerit. All the data will be returned from Jihlava to Brno once the Brno disk array has been recovered.

With apologies for the inconvenience and with thanks for your understanding,

MetaCentrum & CERIT-SC


Ivana Křenková, Fri Sep 26 15:00:00 CEST 2014

26.9.2014 - Unexpected outage of /storage/brno3-cerit

Dear users,

let us inform you that, due to an unexpected short power outage, the disk array /storage/brno3-cerit/ is temporarily unavailable today. We are working on data recovery at the moment. If you need your data very urgently, please contact us at meta@cesnet.cz and we will arrange for your data to be copied to another disk storage.

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková, MetaCentrum


Ivana Křenková, Fri Sep 26 04:00:00 CEST 2014

21.9.2014 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)

Dear users,

let us inform you that due to an unexpected power outage in Jihlava's server room the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Krenkova
MetaCentrum & CERIT-SC.


Ivana Krenkova, Mon Sep 21 03:50:00 CEST 2015

19.8.2014 - Unexpected power outage in Ostrava (GPU cluster doom)

Dear users,

let us inform you that due to an unexpected power outage in Ostrava's server room the local cluster Doom, as well as /storage/ostrava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...


Ivana Křenková
MetaCentrum


Tom Rebok, Tue Aug 19 11:05:00 CEST 2014

15. 8. 2014 14:45 - 22:00 - Unexpected power outage in Brno server rooms, some services may still not work (e.g., license server, portal)

Dear users,

today, another unexpected power outage has occurred, this time in the Brno server rooms. Because of this, the Brno part of the MetaCentrum infrastructure has been paralyzed, including several central services hosted there (e.g., scheduler, license server, disk storages, ...). The jobs running during the outage were unfortunately stopped.

Most of the nodes and services should be available now. However, a few power circuits couldn't be revived, and a deeper inspection of the power supplies has to be performed in order to detect the failing ones -- thus, several services (e.g., the license server and parts of the portal) still do not work.

We're really sorry for the troubles caused -- unfortunately, we're getting the short end of the stick in the fight "higher power" vs. man. :-(

Tom Rebok
MetaCentrum


Tom Rebok, Sat Aug 16 07:44:00 CEST 2014

19.8.2014 11:00-13:00 - Skirit frontend planned outage

Let us inform you that on Tuesday (August 19, 11:00 a.m.) the skirit frontend will be briefly unavailable due to a SW upgrade. All running processes on the frontend will be terminated during the outage.

You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,

MetaCentrum


Ivana Křenková, Thu Aug 14 23:00:00 CEST 2014

15.8.2014 - Unexpected power outage in Ostrava (GPU cluster doom)

Dear users,

let us inform you that due to an unexpected power outage in Ostrava's server room the local cluster Doom, as well as /storage/ostrava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...


Ivana Křenková
MetaCentrum


Tom Rebok, Fri Aug 15 11:05:00 CEST 2014

7.8.2014 3:50 - 9:00 - Unexpected power outage in Jihlava (clusters zigur and zapat + /storage/jihlava1)

Dear users,

let us inform you that due to an unexpected power outage in Jihlava's server room the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Tom Rebok
MetaCentrum & CERIT-SC.


Tom Rebok, Thu Aug 07 11:05:00 CEST 2014

25.7.2014 14:00 - 14:30 - Connectivity problems in Pilsen

Today, around 2 p.m., there were some unexpected connectivity problems in the server rooms of the University of West Bohemia, which affected our Pilsen nodes as well. The major problems were noticed between 2 p.m. and 2:30 p.m.; however, some subsequent minor problems may have been noticed even after that time.

Connectivity should already be restored. (Nevertheless, some related maintenance work is still under way...)

We apologize for any inconvenience caused.

Tomáš Rebok,
MetaCentrum & CERIT-SC.


Tom Rebok, Fri Jul 25 15:26:00 CEST 2014

-

An AFS server crashed last night, which also caused unexpected problems in the client part of the AFS subsystem. As a result of these failures, some AFS volumes are unavailable (some SW modules are not available), and it is not possible to log in to some computational nodes and frontends affected by the above-mentioned error. We are working on the repair.

We apologize for any inconvenience caused.

Ivana Křenková
MetaCentrum & CERIT-SC

 

 


Ivana Křenková, Thu Jan 01 01:00:00 CET 1970

-

Due to massive network attacks during the night, some authenticated services were not accessible today -- personal data management, the RT interface, the authenticated part of the web and wiki, etc. Some Brno nodes of the CERIT-SC centre were also affected, as were, briefly, the skirit frontend and the scheduling systems.

All services have now been restored. If you encounter a problem, please report it.

We apologize for any inconvenience caused.

Ivana Křenková
MetaCentrum & CERIT-SC

 

 


Ivana Křenková, Thu Jan 01 01:00:00 CET 1970

28.4.2014 - Unexpected power outage in Jihlava

Let us inform you that due to an unexpected power outage in Jihlava's server room the local clusters Zigur and Zapat were partially and temporarily unavailable. The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Křenková
MetaCentrum & CERIT-SC

 


Ivana Křenková, Mon Apr 28 14:00:00 CEST 2014

16.4.2014 16:00 - Unexpected outage of /storage/brno2 and frontend skirit

Because of several SW problems that have recently occurred, the disk array /storage/brno2/ and the frontend skirit are again not working properly today.

We apologize for any inconvenience caused.

Ivana Křenková, MetaCentrum


Ivana Křenková, Wed Apr 16 04:00:00 CEST 2014

10.4.2014 - Unexpected outage of /storage/brno2, some frontends, and nodes

Because of several SW problems that have recently occurred, the disk array /storage/brno2/, some frontends and nodes were not working properly today. The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused.

Ivana Křenková,

MetaCentrum


Ivana Křenková, Thu Apr 10 23:00:00 CEST 2014

23.3.2014 11:00 PM - Zuphux frontend planned outage

Let us inform you that on Saturday (March 23, 11:00 p.m.) the zuphux frontend will be briefly unavailable due to a SW upgrade (Debian 6 -> Debian 7). All running processes on the frontend will be terminated during the outage.

You can use any of the other frontends during the outage:
https://wiki.metacentrum.cz/wiki/Frontend

With apologies for the inconvenience and with thanks for your understanding.

Ivana Křenková,

MetaCentrum & CERIT-SC


Ivana Křenková, Wed Mar 19 23:00:00 CET 2014

25.-26. 2. 2014 - Service maintenance of the disk array /storage/brno1 (/storage/home)

Because of several HW/SW problems that have recently occurred with the disk array /storage/brno1 (/storage/home), comprehensive service maintenance and a SW upgrade have to be performed urgently.

Unfortunately, this maintenance cannot be performed on the live system; thus, the disk array has to be ***PUT OUT OF OPERATION*** (and made inaccessible)

on Tuesday, 25. February 2014 during morning hours
(The assumed shutdown duration is 1-2 days.)

Influence on the running jobs:

We're really sorry for the problems that may occur. Unfortunately, the /storage/brno1 (/storage/home) disk array cannot be left in its current condition any longer -- this would result in bigger problems in the future.

With many thanks for understanding
Tomáš Rebok.


Tom Rebok, Thu Feb 20 22:05:00 CET 2014

6. 1. 2014 - Unexpected power outage in Jihlava

Let us inform you that due to an unexpected power outage in Jihlava's server room the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Křenková
MetaCentrum & CERIT-SC


Ivana Křenková, Mon Jan 06 14:14:00 CET 2014

7. 12. 2013 - Outage in Brno

Let us inform you that due to a reconstruction of the Brno server room at FI MU, the local clusters with the property 'brno' may be temporarily unavailable on Saturday 7.12. We apologize for any inconvenience caused -- we're unable to influence these circumstances...

Ivana Křenková, MetaCentrum


Ivana Křenková, Tue Nov 05 15:17:00 CET 2013

5. 11. 2013 - Unexpected power outage in Jihlava (Zigur and Zapat clusters)

Let us inform you that due to an unexpected power outage in Jihlava's server room the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

 

Ivana Křenková
MetaCentrum & CERIT-SC

Ivana Křenková, Tue Nov 05 15:17:00 CET 2013

1. 10. 2013 - Outage in Brno, October 1, 2013

All computing nodes located in the computing room of ICS MU (with property "brno", except machines zewura [1-8]) will be down on Tuesday, October 1st, due to work on extending the electrical network for the expected new cluster of the CERIT-SC center.

Long job queues (more than 4 days) have been disabled on those clusters. All the other queues will be disabled later. Please finish all jobs by the end of September. Running jobs will be killed when the machines are switched off.

At the same time, the frontend skirit.ics.muni.cz will not be available during the outage.

We are sorry for the temporary unavailability of the resources.


Ivana Křenková, Thu Sep 26 16:17:00 CEST 2013

9. 9. 2013 9:00 - 17:00 - Planned system update on /storage/plzen2-archieve/

On Monday between 9:00 and 17:00, Pilsen's /storage/plzen2-archieve/ will be unavailable due to a system update.
With apologies for the inconvenience and with thanks for your understanding.


Ivana Křenková, Tue Sep 03 13:11:00 CEST 2013

13.-18. 8. 2013 - Planned preventive maintenance in Plzeň

The regular annual preventive maintenance of IT systems at the University of West Bohemia in Plzen is planned for this week (Tue-Fri). Some outages of CESNET services located in Plzen may occur (temporary unavailability of AFS and the Matlab license server, network connectivity problems).

With apologies for the inconvenience.


Ivana Křenková, Tue Aug 13 09:51:00 CEST 2013

13. 8. 2013 0:00 - 8:00 - Planned CERIT-SC's HA server outage

Let us inform you that on Tuesday, August 13th (0:00 - 8:00 a.m.), the servers zuphux.cerit-sc.cz (frontend) and wagap.cerit-sc.cz (Torque server) will be temporarily unavailable.

With apologies for the inconvenience and with thanks for your understanding.

 


Ivana Křenková, Mon Aug 12 12:14:00 CEST 2013

9.8.2013 - Unexpected power outage in Jihlava (zigur and zapat clusters)

Due to unfavorable weather conditions of the last days (and elimination of their consequences) there's been an unexpected power outage in Jihlava, which hit the CERIT-SC's cluster room, and which affected the zigur and zapat clusters.

The computing nodes will be gradually returned to normal operation; however, the running jobs were unfortunately stopped.

We apologize for any inconvenience caused -- we're unable to influence these circumstances...

T. Rebok,
MetaCentrum & CERIT-SC.


Tom Rebok, Fri Aug 09 11:32:00 CEST 2013

7.8.2013 11:45PM - Short power outage at Jihlava

The following machines were affected: zapat23 zapat98 zapat99 zapat100 zapat101 zapat111 zigur1 zigur3 zigur28 zigur30 zigur31


Martin Kuba, Thu Aug 08 11:41:00 CEST 2013

29. 7. 2013 - Power outage in Jihlava's server room

Let us inform you that due to an unexpected power outage in Jihlava's server room the local clusters Zigur and Zapat and the disk array /storage/jihlava1-cerit are temporarily unavailable. Unfortunately, all running jobs have been terminated.

With apologies for the inconvenience and with thanks for your understanding.

 


Ivana Křenková, Mon Jul 29 10:00:00 CEST 2013

10. 8. 2013 7:00 - 10.00 - Planned outage in JCU's server room

Let us inform you that on Saturday, August 10th (7:00 - 10:00 a.m.), all clusters and the disk array located in the JCU's server room (haldir, hildor, hermes, and /storage/budejovice1/) will be briefly unavailable due to maintenance of the electrical substation and a forced interruption of the power supply.

Accepting jobs in the longest queues will be suspended on these machines soon. The walltime limit in the priority queues "jcu" and "jcu2" will be decreased gradually to prevent any job from running during the outage. In the meantime, please use the queues "long" or "preemptible" running on the other clusters for long jobs. Any remaining running jobs will be killed when the machines are switched off.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Křenková, Thu Jul 18 10:00:00 CEST 2013

10. 8. 2013 7:00 - 10.00 - Planned system update on /storage/plzen2-archieve/

Let us inform you that today between 14:00 and 17:00, Pilsen's /storage/plzen2-archieve/ may be briefly unavailable due to a system update.
With apologies for the inconvenience and with thanks for your understanding.


Ivana Křenková, Tue Jul 09 10:00:00 CEST 2013

18. 6. 2013 10.00 - Skirit frontend outage

Let us inform you that on Tuesday (June 18, 10:00 a.m.) the skirit frontend will be briefly unavailable due to a HW upgrade. At the same time the system will be upgraded (Debian 5 -> Debian 6).

You can use any of the other frontends during the outage:

With apologies for the inconvenience and with thanks for your understanding.


Ivana Křenková, Sun Jun 16 10:00:00 CEST 2013

16. 5. 2013 - Air conditioning outage in Plzen server room

Let us inform you that due to an unexpected air conditioning failure in Pilsen's server room and the resulting overheating of the local clusters, the machines Gram, Minos, Nympha, Konos, Ajax, and the disk array /storage/plzen1 have been unavailable since this evening.

With apologies for the inconvenience and with thanks for your understanding.


Ivana Křenková, Fri May 17 10:10:00 CEST 2013

16. 5. 2013 - Brno's disk array outage (/storage/brno1)

Today, a service intervention by the supplier caused an unplanned outage of the older Brno disk array. /storage/brno1, /afs and SW modules are temporarily unavailable. We apologize for the inconvenience.


Petr Hanousek, Thu May 16 12:00:00 CEST 2013

Monday 6/05/2013 10:00 - Power switching in Plzen server room

On Monday 6/05/2013 from 10:00 am, power switching will start in the Plzen server room. It will be necessary to switch off the Gram and Minos clusters and the cloud server Banakil. In the worst case, the Nympha cluster and the disk arrays may also have to be switched off. The outage should not affect the Konos cluster. We will use this occasion to reinstall the Minos cluster, so it will be made accessible later. Sorry for the inconvenience.

Petr Hanousek, Fri May 03 14:10:00 CEST 2013

12. 4. 2013 - Perian cluster/frontend outage + system upgrade

Let us inform you that due to an unexpected event in the Brno server
room the perian frontend as well as the cluster nodes have been
unavailable since Friday.

We plan to use this outage to upgrade the system of the affected
nodes (Debian 5 -> Debian 6) -- once upgraded, the nodes will be
immediately returned to operation (starting with the frontend).
All the perian nodes should be available during the next week...

With apologies for the inconvenience and with thanks for your understanding.

 


Tomáš Rebok, Fri Apr 12 17:14:00 CEST 2013

11. 4. 2013 - Power outage in Prague server room and cluster Tarkil reinstallation

Today we suffered an unexpected power outage in Prague's server room, which resulted in a shutdown of the cluster Tarkil and the frontend tarkil.cesnet.cz. We apologize for the interruption of the running jobs.

After the power supply was restored, we used the incident to carry out the planned reinstallation of the cluster and the frontend. The planned work, including the move of certain services and a possible migration of user data to the new disk array, will take approximately a week. We will publish a news note once it is finished. In the meantime, please use the other clusters and frontends; see our wiki for details.

During the reinstallation phase you will not be able to access your data stored on the local disks of the affected machines in the regular way. However, if you need that data urgently, please contact our User support at the e-mail address meta@cesnet.cz.


Petr Hanousek, Thu Apr 11 17:14:00 CEST 2013

5. 3. 2013 - New trouble ticketing system

On 5th March 2013 from 9:00 till approx. 12:00, our trouble ticketing system (RT - rt3.cesnet.cz) will be unavailable due to a necessary upgrade. During the outage, neither the web nor the mail interface will be accessible. E-mails sent during the outage (i.e. to the address meta@cesnet.cz) will be delivered after it ends. We apologize for the half-day delay in responding to requests.


Petr Hanousek, Tue Mar 05 17:08:00 CET 2013

22. - 25. 10. 2012 - Scheduled downtime in Pilsen

All computing nodes located in the computing room of ZČU (ajax, konos, minos[20-35], nympha) will be down for the period October 22-25 due to moving to the new server room. Currently jobs are held in queues. Running jobs will be killed on switching the machines off.

We are sorry for the temporary unavailability of the resources.


Ivana Křenková, Mon Oct 22 16:25:00 CEST 2012

10.-11.10.2012 - Reconstruction of electrical wiring in Pilsen - follow-up work

The handover of the work on switching Pilsen's UL011 to the energocentrum revealed a serious defect: a failure of some support systems (measurement and control). Unfortunately, the repair requires another shutdown (running jobs will be killed). The work will take place on the night from Wednesday, October 10, 2012, to Thursday (21:00 - 5:00). Sorry for the inconvenience.


Petr Hanousek, Tue Oct 02 16:21:00 CEST 2012

14.9.2012 - Volume /storage/brno1 full

The volume /storage/brno1 is filled to 100 percent. Moreover, the file system is probably also damaged, so the volume is currently not suitable for working with data. Please use the volumes /storage/brno2 (11TB available) and /storage/plzen1 (27TB available) for your work. Unfortunately, I cannot estimate the time needed for the repair so far.

In this context I would like to ask you to delete all unnecessary files stored in the mentioned volumes.


Petr Hanousek, Fri Sep 14 16:20:00 CEST 2012

19. - 20.9.2012 - Reconstruction of electrical wiring in Pilsen vol 2

On the night of September 19 to 20, 2012, the wiring in a server room in Pilsen will be reconstructed. The machines will be switched off on Wednesday 19th in the afternoon; they are expected to be started again on Thursday 20th in the morning. From Thursday morning, the "long" queue should finally be available on the affected machines.

Besides the mentioned clusters, the disk volume /storage/plzen1 will also be unavailable.

We apologize for the temporary inconvenience.


Petr Hanousek, Thu Sep 13 16:09:00 CEST 2012

29.8.2012 - Delayed reconstruction of electrical wiring in Pilsen

The outage announced for tomorrow is cancelled because of problems on the supplier's side. We will inform you about the newly planned outage through this channel. The 'long' queue on the affected machines will remain closed for now.


Petr Hanousek, Wed Aug 29 16:05:00 CEST 2012

29.8. - 30.8.2012 - Reconstruction of electrical wiring in Pilsen

On the night of August 29 to 30, 2012, the wiring in a server room in Pilsen will be reconstructed. The machines will be switched off on Wednesday 29th in the afternoon; they are expected to be started again on Thursday 30th in the morning. The "long" queue has already stopped accepting jobs on these machines; any jobs still running will be killed at the time of the power down.

Besides the mentioned clusters, the disk volume /storage/plzen1 will also be unavailable.

We apologize for the temporary inconvenience.


Petr Hanousek, Wed Aug 22 11:27:00 CEST 2012