You can read this page as an RSS feed.
Update 10:50 AM:
the storage is back in operation
--
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Update 1 PM:
the disk array is back in operation
--
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Update 26.8., 3 PM: the disk array is back in operation and the data should be readable. Please report any problems. Thank you for your understanding.
Update 26.8., 10:30 AM: during the morning the disk array will be briefly unavailable while we attempt to regain access to unreadable data. We apologize for the inconvenience.
Update 20.8.:
We regret to inform you that we have been experiencing significant hardware issues with the /storage/brno12-cerit/ directory since Sunday.
A small part of the data in /storage/brno12-cerit is now inaccessible due to a failure on one of the disk arrays; attempts to read it result in an Input/Output error. In terms of data blocks this is about 1.1%, but since large files over 4 MB are spread across multiple devices, it is likely that at least some of them are affected. The fault is being addressed by the manufacturer's support. So far the data is not definitively lost, but we currently do not know when it will be made available again, or whether all of it will be intact in the end. If you need some of it quickly, it may be more efficient to reload the data (if it was primary input) or to recalculate what is needed.
Otherwise, /storage/brno12-cerit is currently running normally, and there is no particular reason to assume that other data is more at risk than usual, although there may still be some operational limitations while the broken piece of hardware is repaired. However, given the size of the repository, it is not independently backed up, and it is certainly not intended for archival or otherwise irreplaceable data.
Please note that, because the priority is to maximize the offered capacity, it is not possible to perform a full backup of all data on storage of this size.
To ensure full backups we would need at least double the funding to purchase suitable HW. Since archival purposes are covered by the disk arrays of the CESNET Data Care department, and branch repositories are also being prepared within the EOSC project, we back up our disk arrays only in the form of snapshots. These offer some protection in case a user inadvertently deletes some of their files; in general, data that existed some days before the accident can be restored. However, snapshots are stored on the same disk arrays as the data itself, so in the event of a hardware failure these backups may be lost :-(
https://docs.metacentrum.cz/data/metacentrum-backup/
We are very sorry; together with the HW vendor, we are doing our best to get the lost data back.
If you need results very urgently, please submit the jobs to the system once again. We can raise your priority (so that jobs start as soon as possible) if needed.
Thank you for your understanding.
--
Update 19.8.: the disk array is only working in limited mode, with short outages. If possible, limit work on this array. We are trying to stabilize the situation.
Update 18.8., 8 PM: the storage is back in operation
--
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Dear user,
A while ago there was a failure on the local network in Brno (a broken cable at Mendel University), which made some computing clusters at this location (tyra, aman, zenon) unavailable. We have reported the outage and are waiting for a replacement connection.
With apologies and thanks for your understanding
MetaCentrum team
In the archive repository /storage/du-cesnet/ (du4.cesnet.cz), a mechanical failure of the tape robot occurred during the winter. Data is still being transferred to the object storage, and access to the data on the tapes is very limited. After discussion with DU colleagues, we have removed access to this storage from our machines (to speed up the transfer). If you need your data urgently, please contact the CESNET data storage team at du-support@cesnet.cz.
We apologize for the inconvenience.
Thank you for your understanding,
your MetaCentrum Team
Update 23.5., 9:30 AM: back in operation
--
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Update May 13, 11:30: storage is fully back in operation
---
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Dear users,
on 19-21 April and 24 April, in the afternoon, evening, and night hours, software upgrades will take place on the backbone routers of the network. Outages will occur at the times indicated and last 30-60 minutes (see the attached schedule).
=======================================================================
*Friday 19.4.2024 17:00 - 21:00* - Prague-Sitel, Plzeň1,2
*Friday 19.4.2024 20:00 - 00:00* - Jihlava
*Saturday 20.4.2024 15:00 - 19:00* - Prague - ÚMG - UJV Řež
*Saturday 20.4.2024 19:00 - 00:00* - Olomouc1,2 - České Budějovice
*Sunday 21.4.2024 00:00 - 05:00* - Prague1 - Brno1
*Wednesday 24.4.2024 00:00 - 05:00* - Prague2 - Brno2
We apologize for any inconvenience,
MetaCentrum
Dear user of MetaCentrum Cloud [1],
Today, 11.3.2024 (Monday), in the morning and part of the afternoon (until approx. 18:00), the new instance of the e-INFRA CZ G2 OpenStack cloud in Brno [1] is and will remain unavailable due to an unplanned outage caused by planned cloud maintenance. The outage affects all API services; already running virtual servers remain functional. The main G1 OpenStack cloud in Brno [2] is not affected.
[1] https://brno.openstack.cloud.e-infra.cz/
[2] https://cloud.metacentrum.cz/ https://cloud.muni.cz/
We apologize for any inconvenience,
MetaCentrum Cloud team
Status update: as of 10 AM, the disk array is back with full functionality
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Update 11:50 AM: the disk array is now fixed and available again
Dear users,
currently the disk array /storage/brno12-cerit/ is unavailable; we are working on fixing the problem. The zuphux frontend is also unavailable.
If possible, use other storage and frontends for now.
Thank you for your understanding,
your MetaCentrum Team
Due to maintenance there will be a short outage on the /storage/brno2/ disk array on Saturday 19. 2. from 9 am.
During the outage it won't be possible to log in to the skirit, perian and onyx frontends and the PBS server meta-pbs.metacentrum.cz won't submit new jobs to the Brno cluster.
OnDemand will also be affected (using the home directory of /storage/brno2/).
We apologize for any inconvenience.
Dear users,
currently the brno2 storage is down due to an as-yet unspecified disk error. This means the skirit frontend is also inaccessible.
We are investigating the cause. If possible, use other storage and frontends in the meantime.
Thank you for your understanding,
your MetaCentrum Team
Dear users,
The https://usegalaxy.cz service will be migrated to the more stable environment of a VMware cluster on Thursday, Aug 24. Existing user data will be migrated as well.
The service will become unavailable from 10 AM CEST (we cannot guarantee correct migration of data created after that time), and the outage is expected to end in the early afternoon. However, the IP address and DNS records will change as well, and their propagation will take some time. Therefore, the service is expected to be fully available again from Friday, Aug 25.
With apologies and thanks for understanding
Galaxy MetaCenter Team
Dear users,
on the 1st of September the elmo.elixir-czech.cz frontend will be down.
To access computational resources, please use any other frontend, see https://docs.metacentrum.cz/basics/concepts/#frontends-storages-homes
With apologies and thanks for understanding
MetaCenter Team
Dear user,
This afternoon (14 July), after 4 PM, there will be a short outage of the data connection in Průhonice (ibot cluster). We have limited the submission of new jobs to this cluster and will resume traffic as soon as the network connection is restored.
Running jobs that copy output back to the disk array will fail to do so, and the data will remain in the scratch directory on the node where the job was running. The data on the compute nodes can be accessed from any frontend using the following shortcut:
go_to_scratch JOB_NUMBER_INCLUDING_PBS_SERVER_NAME
For example:
tarkil.grid.cesnet.cz$ go_to_scratch 79868.meta-pbs.metacentrum.cz
With apologies and thanks for understanding
MetaCenter Team
Update: the storage is slow, we are working on a fix
------
Dear user,
This afternoon (7 July) there was a HW failure of the /storage/brno1-cerit/ disk array. We are working on getting it back up and running in cooperation with the supplier.
Running jobs that copy output back to the array fail to do so, and the data remains in the scratch directory on the node where the job was running. To access the data on the compute nodes, use the following shortcut:
go_to_scratch JOB_NUMBER_INCLUDING_PBS_SERVER_NAME
For example:
tarkil.grid.cesnet.cz$ go_to_scratch 79868.meta-pbs.metacentrum.cz
You can use other frontends (https://wiki.metacentrum.cz/wiki/Frontend) and disk arrays during the outage.
With apologies and thanks for understanding
MetaCenter Team
Dear user of Cloud MetaCentrum [1],
A reconfiguration of the MetaCentrum OpenStack cloud block storage, intended to increase its capacity, is scheduled on Tuesday 20.6. between 5:00 PM and 10:00 PM CET.
From experience we know that even a small configuration change may cause a short outage (10-30 minutes) affecting the approximately 3K volumes that are currently allocated. VM operations will not be affected, and the main OpenStack API and the Horizon UI will remain available; the Cinder block storage service and its API will be temporarily unavailable, preventing volume creation.
We apologize for any inconvenience,
MetaCentrum Cloud team
[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz
Dear users,
we are sorry to announce that due to a hardware failure the brno2 storage is down.
Consequently, it is not possible to log in to the frontends skirit, perian and onyx.
Currently we cannot tell whether or when the storage will be up again.
We will update you in this matter as soon as possible.
If you have any questions concerning your data and running jobs, contact us at meta@cesnet.cz.
We are very sorry for the inconvenience,
your MetaCentrum team.
Dear users,
On 12-15 May, there will be a planned shutdown of most servers in the server room at the FZÚ AV ČR due to the regular annual inspection of the electrical installation. The outage will include all nodes of the luna cluster, including the luna frontend and the storage-praha6-fzu disk array. The outage will also be used to replace faulty RAM in some servers.
We apologize for the inconvenience,
Your MetaCentrum support team.
Update 03/27/2023: Another problem has appeared; it will be fixed in a few hours. Please be patient. The disk array was returned to service the same afternoon.
Update 03/24/2023: The /storage/brno2/ disk array is back in full operation. Data remains intact.
-----------
Dear user,
On Saturday afternoon (18 March) there was a HW failure of the /storage/brno2/ disk array. We are working on getting it back up and running in cooperation with the supplier. We are not yet able to say when the array will be operational. The supplier is proceeding carefully so that we do not lose the stored data.
It is not possible to log in to the frontends where this array serves as /home (skirit, onyx), and the disk array cannot be accessed from elsewhere (from other frontends or nodes). OnDemand is also affected.
Running jobs that copy output back to the array fail to do so, and the data remains in the scratch directory on the node where the job was running. To access the data on the compute nodes, use the following shortcut:
go_to_scratch JOB_NUMBER_INCLUDING_PBS_SERVER_NAME
For example:
tarkil.grid.cesnet.cz$ go_to_scratch 79868.meta-pbs.metacentrum.cz
You can use other frontends (https://wiki.metacentrum.cz/wiki/Frontend) and disk arrays during the outage.
With apologies and thanks for understanding
MetaCenter Team
Update: MetaCentrum OpenStack (CESNET_MCC), status 2022-10-21 9:00
OpenStack is functional, but a limited number of servers/hypervisors, running around 40 VMs, are without network. We are working on VM migrations where possible.
---
Dear user,
Today we are experiencing numerous short outages on the local network in Brno, which are causing short-term unavailability of the cerit-pbs scheduling system and of some machines. The cause is being investigated by local network specialists.
With apologies and thanks for your understanding
MetaCentrum team
Dear users,
on Thursday, 1st of September, there will be a power outage in the CEITEC server room. Consequently, the clusters krux, lex and zubat, as well as the brno14-ceitec storage, will be inaccessible. The downtime is planned to last from 5 a.m. to 12 noon.
Jobs running on the affected clusters will be held by PBS to be run after the outage is over and no action on users' side is needed.
Jobs running elsewhere may be affected if they copy data to/from the brno14-ceitec storage while it is down. If your jobs fail at start for this reason, resubmit them after the outage is over. If your finishing jobs fail due to the inability to copy results to brno14-ceitec, please fetch the files manually from the scratch directory.
We apologize for the inconvenience,
your MetaCentrum support team.
Dear users,
on Thursday, 14th July, there will be a power outage due to maintenance in the facilities of the Technical University of Liberec. Consequently, the /storage/liberec3-tul storage, the charon.nti.tul.cz frontend and the charon cluster will be powered down. The downtime is planned to last the whole day.
No action is needed on the users' side. Jobs whose walltime would collide with the start of downtime will be held by PBS to be run after the outage is over.
We apologize for the inconvenience,
your MetaCentrum support team.
Dear users,
Due to an unplanned crash of the /storage/brno6/ disk array, which we were going to shut down in the next few days anyway due to its age, we are forced to speed up this process. Most of your data from the /storage/brno6/ array can be found in the /storage/brno2/home/LOGIN/brno6/ directory.
The last full synchronization took place during the night from Wednesday to Thursday, and another partial synchronization took place during the downtime. Some of the data you uploaded to the array in the last few hours may not have been copied yet.
If we can get the old array back up and running, we will try to sync the newest data. The /storage/brno6/ disk array HW will then be decommissioned without replacement; for working with data in Brno, please use the /storage/brno2/ disk array, where the data has been transferred, or any other disk array available in MetaCentrum. The /storage/brno6/ symlink currently points to the old, failed array and will be deleted together with the HW shutdown.
We apologize for any inconvenience,
MetaCentrum
Dear user of Cloud MetaCentrum,
There is planned load and performance cloud infrastructure testing scheduled on Friday 2022-06-24 from 14:00 to 16:00 (CEST).
The planned testing scenarios should not affect or interrupt any cloud functionality, but will result in extensive infrastructure load, visible to end users as additional OpenStack API and UI latencies.
We apologize for any inconvenience,
MetaCentrum Cloud team
[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz
Update 3. 6. 2022, 3 PM:
After upgrading the disk array, there were problems with the new file system. The problem has been fixed and the array is available again; you can start using it.
On Thursday, June 2, the disk arrays in Prague will be upgraded (capacity, redundancy, and speed increase), during which it will be necessary to stop the arrays for a short time.
If everything goes according to plan, short outages of the storage-vestec1 (= praha1) array can be expected. In the coming days, there should be a significant increase in available capacity.
We will try to minimize the impact on running jobs as much as possible.
At the same time, the quota for the size of stored data will be increased from 0.5 TB to 2 TB, and the quota for the number of files to 2 million.
With apologies for the inconvenience and with thanks for your understanding.
Yours,
update 24. 5. 2022
All OpenStack services are now available after the unplanned power outage from 2022-05-22.
You may now start your VMs. If you experience any issues, please contact us at cloud@metacentrum.cz.
We apologize for any inconvenience.
--
Dear user,
During the night of 22nd to 23rd May, there was an unplanned power failure in data centre A510 (FI MU Brno). The backup power supply did not come on.
Most of the systems in the datacenter are running again; problems persist in MetaCentrum Cloud.
The outage also affects the zuphux.cerit-sc.cz frontend, some clusters and Rancher (Kubernetes), which run from the cloud.
We apologize for any inconvenience,
MetaCentrum team
Dear user of Cloud MetaCentrum,
On Wednesday, April 13, 2022, from 12:00 AM to 8:00 PM, a power outage is planned for part of the A510 datacenter. The outage should be uneventful (thanks to the backup power supply) and should last 1-2 hours. We do not anticipate any issues, but during a full outage selected user VMs in OpenStack may be unavailable.
We apologize for any inconvenience,
MetaCentrum Cloud team
[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz
Update:
The MetaCentrum OpenStack cloud [1] is experiencing an unplanned series of network outages after yesterday's reconfiguration of HW network elements. The estimated time when outages may still occur is Friday, April 8, 2022 from 8:00 AM to 8:00 PM.
This is an extension of the announced outage scheduled for April 7, 2022.
Thank you for your understanding,
MetaCenter Cloud Team
--
Dear user of Cloud MetaCentrum,
Let us inform you that planned networking maintenance of the MetaCentrum OpenStack cloud [1] is scheduled on Thursday 2022-04-07 from 7:00 to 20:00 (CEST). We plan to improve network stability by upgrading the cloud network switch firmware and reconfiguring the switches. We expect OpenStack cloud API and UI functionality to be unaffected. Selected cloud hypervisors (and the cloud user VMs located on them) may suffer short networking outages.
We apologize for any inconvenience,
MetaCentrum Cloud team
[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz
On Monday, March 28, the storage-praha5-elixir disk array will be upgraded (capacity, redundancy and speed increase, OS upgrade, IP address change). The storage will be temporarily shut down during the upgrade, and occasional unavailability can be expected during the day. We do not recommend using the array at that time.
Sorry for the inconvenience,
MetaCentrum
Dear user of Cloud MetaCentrum,
Let us inform you that a planned outage of the MetaCentrum OpenStack cloud [1] is scheduled on Friday 2022-03-04 from 14:00 to 16:00 (CET). The planned cloud improvements are the migration of core controller servers to another resource pool and production IPv6 address support.
We expect the OpenStack cloud API and UI downtime to be up to 15 minutes. Users' running virtual servers will not be affected.
We apologize for any inconvenience,
MetaCentrum Cloud team
[1] cloud.metacentrum.cz, cloud.muni.cz, cloud.cerit-sc.cz
On Wednesday, January 26, the disk arrays in Prague will be upgraded (capacity increase), during which it will be necessary to stop the arrays for a short time.
If everything goes according to plan, short outages of the storage-vestec1 (= praha1) array in the morning and storage-praha5-elixir in the afternoon can be expected. In the coming days, there should be a significant increase in available capacity.
We will try to minimize the impact on running jobs as much as possible.
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Dear user of Cloud MetaCentrum,
let us inform you about the planned upgrade of the 'Cloud MetaCentrum' (OpenStack) cloud infrastructure, which is scheduled on 12.1.2022 (DD.MM.YYYY) from 9:00 to 16:00. This upgrade prepares the infrastructure for adding IPv6 protocol support.
We don't expect any issues, but any feedback about problems during the upgrade is welcome.
We apologize for any inconvenience,
MetaCentrum Cloud team
On Thursday the 16th, starting at 7:00 a.m., there will be a planned power outage in the CEITEC server room. Consequently, the clusters krux, zubat and lex, as well as the perian frontend and the brno9-ceitec storage, will be down. The outage is planned to last until 12 noon.
With apologies for the inconvenience and with thanks for your understanding.
Yours,
From Wednesday, December 1 (6 PM) to Thursday, December 2 (12 AM), the old disk array /storage/brno6/ will be migrated to new hardware. Try to limit work on this disk array. Long-running processes that use files directly in /storage/brno6 may crash after the switchover.
/storage/brno6/
storage-brno6.metacentrum.cz
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Dear user of Cloud MetaCentrum,
Let us inform you about the planned outage of the API and dashboard components of the 'Cloud MetaCentrum' (OpenStack) cloud. This scheduled outage is due to a reverse proxy upgrade. It affects API and dashboard access to OpenStack; virtual machines should not be affected. The outage is scheduled on 21.10.2021 (DD.MM.YYYY) from 8:30 AM to 4:00 PM CEST (UTC+2:00).
We apologize for any inconvenience,
MetaCentrum Cloud team
The disk array /storage/budejovice1/home/ and the hildor cluster are temporarily unavailable due to an unplanned power failure.
We are trying to locate and correct the fault in cooperation with the local administrators.
We apologize for any inconvenience caused.
The disk array /storage/budejovice1/home/ is temporarily unavailable due to an unplanned network failure.
We are trying to locate and correct the fault in cooperation with the local administrators. The storage itself is fully functional; the data simply cannot be reached over the network. We are unable to estimate the downtime at this time.
We apologize for any inconvenience caused.
The data has been transferred to the new HW; in case of problems, do not hesitate to contact us.
Quotas have been set for the number and size of files, by default 3 TB and 2 million files.
From Thursday, July 29 to Sunday, August 1, the old disk array /storage/brno2/ will be migrated to new hardware. Due to the huge amount of data, we estimate that the final synchronization will take several days, so please be patient. Try to limit work on this disk array.
/storage/brno2/
storage-brno2.metacentrum.cz
Please note that large disk arrays are not completely backed up; only snapshots (stored on the same array) are taken. Therefore, the data is not protected in the event of a total failure of such a disk array (as in the case of brno6 last month). If you have any data for archiving, keep the primary copy elsewhere, or entrust the data to CESNET DataCare https://du.cesnet.cz/.
List of storages: https://wiki.metacentrum.cz/wiki/NFS4_Servery
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Update April 26, 2021: the data has been transferred to the new disk array, but occasional problems with the stability of the new array have been reported. We are working intensively to solve the stability problem. Please be patient.
Please check whether your data on the new storage is complete. If not, you can copy it from the old storage, which has been renamed to storage-plzen1a.metacentrum.cz.
Please keep in mind that the storage servers cannot be operated interactively in a shell (see https://wiki.metacentrum.cz/wiki/Working_with_data#ssh_protocol). You can list the contents of your home directory with the command
ssh user_name@storage-plzen1a.metacentrum.cz ls
You can then fetch the data with
scp -r user_name@storage-plzen1a.metacentrum.cz:~/some_directory .

From Thursday 22 to Sunday 25 April, the old disk array storage-plzen1.metacentrum.cz (/storage/plzen1/), serving as the /home for Pilsen's clusters, will be migrated to new hardware. Due to the huge amount of data, we estimate that the final synchronization will take several days, so please be patient. Try to limit work on this disk array.
/storage/plzen1/
storage-plzen1.metacentrum.cz
Please note that large disk arrays are not completely backed up; only snapshots (stored on the same array) are taken. Therefore, the data is not protected in the event of a total failure of such a disk array (as in the case of brno6 last month). If you have any data for archiving, keep the primary copy elsewhere, or entrust the data to CESNET DataCare https://du.cesnet.cz/.
List of storages: https://wiki.metacentrum.cz/wiki/NFS4_Servery
With apologies for the inconvenience and with thanks for your understanding.
Yours,
On Wednesday, February 3, the old storage array storage-praha1.metacentrum.cz (/storage/praha1/), serving as the /home for Prague's clusters, will be upgraded to new hardware.
Please note that large disk arrays are not completely backed up; only snapshots (stored on the same array) are taken. Therefore, the data is not protected in the event of a total failure of such a disk array (as in the case of brno6 last month). If you have any data for archiving, keep the primary copy elsewhere, or entrust the data to CESNET DataCare https://du.cesnet.cz/.
List of storages: https://wiki.metacentrum.cz/wiki/NFS4_Servery
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Dear users,
let us inform you that on Saturday, Dec 5 and Sunday, Dec 6, a planned outage will take place in the Prague server room due to the repair of electrical wiring.
The tarkil cluster will be shut down for the duration of the repair. We will try to keep the /storage/praha1/ disk array in operation from a backup power source.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Yours,
MetaCentrum
Dear users,
let us inform you that due to today's unexpected network outage in Pilsen and Ceske Budejovice, some frontends, clusters and disk arrays might be unavailable. We are working on the repair.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
The disk array /storage/praha1/home/ is temporarily unavailable due to an unplanned HW/SW failure.
The outage also affects the tarkil frontend, as well as the computing clusters with home directories on this disk array (adan, luna, kalpa, tarkil, ...).
We apologize for any inconvenience caused.
Dear user of Cloud MetaCentrum,
Let us inform you about the planned outage of the network overlay in
cloud 'Cloud MetaCentrum' (OpenStack). This scheduled outage is
necessary due to an upgrade of the network overlay which cannot be
performed without downtime. The outage is scheduled on 16.07.2020
(DD.MM.YYYY) in the time of 8:00 am - 12:00 pm CEST (UTC+2:00). During
the outage, you will not be able to access your machines, nor will your
machines be able to access the internet. The computations running on your
machines should not be affected.
We apologize for any inconvenience,
MetaCentrum Cloud team
Dear user of MetaCentrum Cloud,
Due to an upgrade of MetaCentrum Cloud (OpenStack) from the Stein to the Train release, the OpenStack control plane will be unavailable on May 27th, 2020. The outage will start at 8:00 AM CET and continue until 6:00 PM CET of the same day. During the upgrade, the OpenStack API (including the dashboard) will not be accessible. Virtual instances should remain accessible and working throughout the outage; however, it is not recommended to plan critical processes during that time.
We would like to inform you about a planned outage of all luna worker nodes over the weekend of May 16-17. The outage is due to a planned power outage at the Slovanka site.
We are going to shut down all luna worker nodes on Saturday, May 16, at 6:00 AM. The luna worker nodes will be available again on Monday morning, May 18.
The disk arrays /storage/praha4-fzu/home and /storage/praha6-fzu/home/ will also be down.
Thank you for your understanding.
Best regards
MetaCentrum
The disk array storage-budejovice1.metacentrum.cz (/storage/budejovice1/home/) is temporarily unavailable due to an unplanned HW/SW failure which occurred during the night.
The outage also affects the hildor frontend, as well as the computing clusters with home directories on this disk array.
We apologize for any inconvenience caused.
Due to maintenance, there will be an outage of the /storage/brno2 and /storage/brno6 disk arrays on 19. 2. between 1 and 2 PM. During the outage it will not be possible to log in to the skirit and perian frontends, and the PBS server meta-pbs.metacentrum.cz won't submit new jobs to Brno clusters.
We apologize for any inconvenience.
Please note that on 11.2., from 10 AM to 2 PM, there will be a planned outage of the computational node charon.nti.tul.cz.
We apologize for any inconvenience.
Update: the network problem was resolved after noon.
Repeated short failures of the university network segment in Brno are causing failures of the cerit-pbs PBS server, stale data in the PBSmon application, and partial outages of OpenStack.
We're working to fix the issue.
We apologize for any inconvenience.
Let us inform you about the scheduled outage of the clusters carex.ibot.cas.cz and draba.ibot.cas.cz and the /storage/pruhonice1-ibot/ disk array in Průhonice on January 14-16, due to a planned HW upgrade.
We apologize for any inconvenience.
Dear MetaCentrum Cloud user,
let us inform you about the scheduled outage of MetaCentrum Cloud
(OpenStack) on December 16th (Monday) 2019 due to a major upgrade of the
OpenStack control plane (from Rocky version to Stein). The outage will
start at 7:00 AM (CET, UTC+1:00) and will continue until 6:00 PM of the
same day. During the time of upgrade, the OpenStack API (including
dashboard) will not be accessible. Virtual machines should be accessible
throughout the outage; however, it is not recommended to run critical
processes during that time.
Thank you for your patience.
We apologize for any inconvenience.
Dear user of Cloud2 MetaCentrum,
Let us inform you about the planned outage of the network overlay in the 'Cloud2 MetaCentrum' cloud (OpenStack). This scheduled outage is necessary due to an upgrade of the network overlay which cannot be performed without downtime. The outage is scheduled on 4.9.2019 (DD.MM.YYYY) from 7:00 AM to 12:00 noon CEST (UTC+2:00).
During the outage you will not be able to access your machines, nor will your machines be able to access the internet. The computations running on your machines should not be affected.
We apologize for any inconvenience.
Dear user of Cloud2 MetaCentrum,
Let us inform you about the planned outage of the network overlay in the 'Cloud2 MetaCentrum' cloud (OpenStack). This scheduled outage is necessary due to an upgrade of the network overlay, which cannot be performed without downtime. The outage is scheduled for 21.08.2019 (DD.MM.YYYY), 7:00 AM - 10:00 AM CEST (UTC+2:00).
During the outage you will not be able to access your machines, nor will your machines be able to access the internet. The computations running on your machines should not be affected.
We apologize for any inconvenience.
Let us inform you that due to a planned revision of the central diesel generator in Jihlava's server room, du2.cesnet.cz (/storage/jihlava2-archive/) and the Ceph object storage will be temporarily unavailable on Wednesday, 17 July between 5 and 7 AM.
We apologize for any inconvenience caused.
MetaCentrum
Dear users,
let us inform you that due to today's unexpected network outage in Brno's server room, some Brno clusters and disk arrays are unavailable. We are working on the repair.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
Dear users,
let us inform you that due to today's unexpected heating outage (early morning) in Brno's server room, some CERIT-SC clusters and disk arrays are unavailable. We are working on the repair.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
We apologize for any inconvenience caused.
Dear users,
let us inform you that due to today's unexpected power or network outage (2 PM) in Prague's server room, some CERIT-SC clusters and disk arrays are unavailable. We are working on the repair.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
Dear users,
let us inform you that due to a planned central switch firmware upgrade in Prague's server room, the local clusters luna and kalpa and the disk array /storage/praha4-fzu/home will be unavailable for approximately 10 minutes on Wednesday, 20 February between 10 and 11 AM.
We apologize for any inconvenience caused.
MetaCentrum
Dear users,
let us inform you that due to a planned network connectivity upgrade in Prague's server room, the local clusters luna and kalpa and the disk array /storage/praha4-fzu/home will be unavailable on Wednesday, 20 February.
We apologize for any inconvenience caused.
MetaCentrum
we are currently facing a problem with the /storage/praha1/ file system. Unfortunately, some machines with /home on this storage (luna, tarkil) are not working properly. We apologize for any inconvenience caused.
On Wednesday, January 9, the old storage array storage-brno7-cerit.metacentrum.cz (/storage/brno7-cerit/) will be shut down.
Influence on the running jobs:
The storage array storage-brno6.metacentrum.cz (/storage/brno6/) has been back in operation since Friday, January 4.
The failure of the disk array was very serious. Fortunately, much of the data was saved, but a small part of the data (primarily data being manipulated at the time of the malfunction) could be lost or damaged.
Please check your data stored in the /storage/brno6/ file system.
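One simple way to check your data is a read test: a file hit by the array failure typically fails with an I/O error when read end-to-end. The sketch below is generic and uses a local stand-in directory; on the cluster you would point `dir` at your own data (e.g. under /storage/brno6/home/$USER):

```shell
# Sketch: walk a directory tree and report any file that cannot be read
# end-to-end (a damaged block shows up as a failed read).
# "dir" is a local stand-in created just for this example.
dir=$(mktemp -d)
echo ok > "$dir/good.dat"

# List every file whose full read fails; empty output means all files read fine.
unreadable=$(find "$dir" -type f -exec sh -c 'cat "$1" >/dev/null 2>&1 || echo "$1"' _ {} \;)
echo "unreadable files: ${unreadable:-none}"
```

This reads every byte of every file, so on a large dataset it can take a while; running it only on the directories you actively use is usually enough.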
Please note that large disk arrays are not fully backed up; only snapshots (stored on the same array) are taken. The data is therefore not protected in the event of a total failure of such a disk array (as in the case of brno6 last month). If you have any data to archive, keep the primary copy elsewhere, or entrust the data to CESNET DataCare, https://du.cesnet.cz/.
With apologies for the inconvenience and with thanks for your understanding.
Yours,
Dear users,
let us inform you that due to today's unexpected power outage in Prague's server room, the local clusters luna and kalpa and the disk array /storage/praha4-fzu/home are unavailable. The vendor is working on the repair.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
Due to a repeated HW failure in /storage/brno6/, the data was moved to another storage, /storage/brno1/, with the symlink /storage/brno6/ left unchanged.
The defective storage is being repaired by the vendor (replacement of the controller). Once repaired, the data will be returned to its original location.
Let us inform you that on Friday 23 the /storage/brno11-elixir/ array (storage-brno11-elixir.metacentrum.cz) will be unavailable for 10 minutes (between 3 and 4 PM) due to a HW upgrade.
Influence on the running jobs:
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
MetaCentrum
we are currently facing a HW problem with the /storage/brno6/ file system. The MetaCloud web page (OpenNebula, https://cloud.metacentrum.cz/) is not working for this reason as well. Update: back in operation since Nov 21.
The problem with access to /storage/brno6/home/ persists.
We apologize for any inconvenience caused.
Dear CESNET MetaCentrum and Storage facility user,
We would like to inform you that the hierarchical storage in Pilsen (du1.cesnet.cz, /storage/plzen2-archive in MetaCentrum) will be permanently decommissioned.
If you have no data in this storage facility, this mail is not relevant for you. All your data from plzen2-archive will be transferred by storage administrators to a new storage facility.
This e-mail is to inform you about the plan and the schedule.
Data in Pilsen will be made permanently inaccessible to users during the evening of 26th October, when we'll start the final synchronisation of recent changes to the Ostrava storage facility, i.e., du4.cesnet.cz, /storage/du-cesnet in MetaCentrum (note the change in naming convention). The data will be inaccessible during the transfer period. We expect to make the data available in the new location in Ostrava again in the evening of 28th October, and it will be permanently available in Ostrava from then on.
Kindly note the new Data Storage Terms of Service (ToS) and the changes they introduce. Policies for archival (long-term) data and temporary backups are now distinguished. You can find the full text of the ToS at https://du.cesnet.cz/en/provozni_pravidla/start, and we also have a short description of the most important changes at https://du.cesnet.cz/en/navody/faq/start#handling_archives_and_backups. Both the archive and the backup policies are available to MetaCentrum users.
Data from Pilsen is considered archival and is handled as such.
If you have any questions or need any kind of help, please contact our user support (by replying to this mail and/or on support@cesnet.cz).
Thank you for your cooperation.
With kind regards,
Your CESNET MetaCentrum and Data Storage team
we are currently facing a problem with the /storage/brno2/ file system. Unfortunately, some machines with /home on this storage are not working properly. In the meantime, please use machines in other localities or the CERIT-SC machines in Brno (PBS server wagap-pro, frontend zuphux.cerit-sc.cz). We apologize for any inconvenience caused.
Update 2018-02-12, 11 AM: AFS is working properly again
An AFS server crash occurred this weekend, causing unexpected problems in various parts of the AFS subsystem. As a result of these failures, some AFS volumes (and therefore also SW modules) are unavailable, and it is not possible to log in to some computational nodes and frontends. We're working on the repair.
We apologize for any inconvenience caused.
Due to a failure of network connectivity in the Brno location, services requiring a network connection hosted in Brno (MetaCloud, Brno machines, ...) are unavailable. We are working on a remedy.
With apologies for the inconvenience and with thanks for your understanding.
MetaCentrum
Dear users,
MetaCentrum administrators are tracking the situation around the recently disclosed processor vulnerabilities (known as Meltdown and Spectre; for more information see https://spectreattack.com/).
We are evaluating the real impact of these vulnerabilities on the infrastructure. We have applied the available updates in the VMware and MetaCloud environments. For the computational nodes, we are monitoring available updates and evaluating their impact on the MetaCentrum environment (they are being tested for performance degradation). The computing nodes are being updated gradually. If the situation requires it, we may have to force an immediate restart of computing resources and stop all active jobs. Especially for upcoming long jobs, please consider postponing their execution to a later time, particularly if your jobs cannot be restarted.
We apologize for any inconvenience caused.
MetaCentrum
Dear users,
let us inform you that due to today's unexpected power outage in Prague's server room, the local clusters luna and kalpa are unavailable.
The vendor is working on the repair; the length of the outage cannot be estimated.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
MetaCentrum
Let us inform you that on Thursday, December 7, /storage/budejovice1/ (storage-budejovice1.metacentrum.cz) will be moved to new hardware and will be unavailable for several hours during the final synchronization. The shared disk space at hildor*:/scratch.shared, mounted from this storage, will not be available either.
Influence on the running jobs:
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
MetaCentrum
Dear users,
Due to a bug in the new version of PBS Pro, the walltime of almost all running jobs was reset. PBS Pro could not recognize the CPU usage correctly, significantly overestimated the CPU time used, and jobs ended unexpectedly. We reported the error to the PBS Pro developers and reverted the PBS Pro server to the previous version.
With apologies for the inconvenience and with thanks for your understanding.
Dear users,
Let us inform you that due to a planned power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid and the disk array /storage/budejovice1/ will be temporarily unavailable on Friday, October 6 (7-10 AM). Unfortunately, all running jobs will be terminated. Please copy the data you will need for your calculations during these few days to another disk array.
With apologies for the inconvenience and with thanks for your understanding.
Dear users,
Given a pressing need to update firmware in cloud nodes dukan19 through dukan25 we will have to briefly power off virtual machines using those nodes. The intervention is scheduled for Tuesday 25 July. Each node, hence each collocated virtual machine, will be powered off for approximately 20 minutes. We will boot the virtual machines afterwards. There will be no data loss. Affected users have been notified by e-mail.
With apologies for the inconvenience and with thanks for your understanding,
MetaCloud team
Dear users,
On Monday 5th June we are going to migrate virtual machines away from nodes dukan1-10. Affected machines will be powered off temporarily. There will be no data loss. Machines with private network addresses (currently in range 10.4.0.*) require special treatment. Given the current configuration of our network their private IP addresses will have to change. Please, look up the new IP addresses of your virtual machines through the MetaCloud interface after that date. Affected users have already been notified by e-mail.
Dear users,
Let us inform you that due to a planned power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid and the disk array /storage/budejovice1/ will be temporarily unavailable on Sunday, June 4 (7:45-10 AM). Unfortunately, all running jobs will be terminated. Please copy the data you will need for your calculations during these few days to another disk array.
With apologies for the inconvenience and with thanks for your understanding.
On May 11th, the zuphux server will be restarted with a new OS version (CentOS 7.3).
At the same time, the scheduling system in the Torque environment (@wagap) will no longer accept new jobs. Existing jobs will finish on the remaining nodes. The remaining computational nodes in the Torque environment will be gradually converted to PBS Pro. Machines currently available in the PBS Pro environment are labeled "Pro" in the PBSMon application, https://metavo.metacentrum.cz/pbsmon2/nodes/physical .
The frontend zuphux.cerit-sc.cz will be set to the PBS Pro (@wagap-pro) environment by default.
With apologies for the inconvenience and with thanks for your understanding.
CERIT-SC users support
On Friday, April 4, from 15:45, the frontend zuphux will be temporarily unavailable due to unplanned emergency servicing of critical disk array controllers. The estimated length of the outage is 2 hours. Other frontends can be used during the outage:
https://wiki.metacentrum.cz/wiki/Frontend
Other services running from the affected disk array (the Torque server @wagap and the PBS Pro server @wagap-pro) will be migrated to another server on Thursday evening, with some very short outages on Thursday and Friday evenings.
With apologies for the inconvenience and with thanks for your understanding.
CERIT-SC support
Dear users,
after the upgrade of the HSM storage-brno4-cerit-hsm.metacentrum.cz (the upgrade was carried out by the vendor on February 14-15), an unexpected error occurred and the HSM is only partially available. The vendor is working on the repair; the length of the outage cannot be estimated.
With apologies for the inconvenience and with thanks for your understanding.
Today (around 4 AM) there was a failure of the water cooling system in Pilsen, which affected all Pilsen computing nodes, frontends, and /storage/plzen1/. The machines are back in operation (nevertheless, some related service work is still ongoing).
We apologize for any inconvenience caused.
Ivana Křenková,
MetaCentrum
Dear users,
after the upgrade of the HSM storage-brno4-cerit-hsm.metacentrum.cz (the upgrade was carried out by the vendor on February 14-15), an unexpected error occurred and the HSM is unavailable now. The vendor is working on the repair; the length of the outage cannot be estimated.
With apologies for the inconvenience and with thanks for your understanding.
Dear users,
Let us inform you that from Wednesday, February 14 (9 AM) to February 15 (6 PM), Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a security update of the system.
IMPORTANT: The HSM still hosts data from Jihlava /storage/jihlava1-cerit/
Influence on the running jobs:
Let us inform you that on Monday, January 23, Prague's /storage/praha1/ (storage-praha1.metacentrum.cz) will be moved to new hardware and will be unavailable for several hours during the final synchronization. The shared disk space at *:/scratch.shared, mounted from this storage, will not be available either.
Influence on the running jobs:
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
MetaCentrum
Dear users,
the OpenNebula upgrade announced earlier will take place on 11 January. At that time, the front-end will be unavailable for some time, and virtual machines running in the dukan.ics.muni.cz cluster will be restarted as we update the nodes.
Please be aware that there may be issues especially with older virtual machines instantiated with the previous OpenNebula version (2015 and earlier). Please contact us (cloud@metacentrum.cz) in case of trouble.
Dear users,
Let us inform you that on Thursday (Dec 15, 11 PM - 2 AM) the Torque server wagap.cerit-sc.cz will be temporarily unavailable due to a SW upgrade. Submitting new jobs and manipulating jobs in the system will not be possible during the outage.
With apologies for the inconvenience and with thanks for your understanding.
Dear users,
Let us inform you that due to an unexpected power outage in Ceske Budejovice, the clusters hildor/haldir/hagrid are temporarily unavailable. Unfortunately, all running jobs have been terminated.
With apologies for the inconvenience and with thanks for your understanding.
Let us inform you that the tarkil.cesnet.cz frontend is unavailable due to a migration to another HW. All running processes on the frontend were terminated.
You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková,
MetaCentrum
Let us inform you that on Thursday, October 27 (10 AM), Brno's /storage/brno3-cerit/ (storage-brno3-cerit.metacentrum.cz) will be moved to new hardware.
Influence on the running jobs:
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova,
MetaCentrum & CERIT-SC
Let us inform you that on Tuesday (August 30, 10 PM - midnight) the zuphux frontend will be briefly unavailable due to a migration to other HW. All running processes on the frontend will be terminated during the outage.
You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková,
MetaCentrum
Dear users,
Let us inform you that from July 25 to 29, the hildor, haldir, and hagrid clusters and the disk array /storage/budejovice1/ will be temporarily unavailable due to a move to another server room. Please copy the data you will need for your calculations during these few days to another disk array.
With many thanks for understanding,
Ivana Krenkova
MetaCentrum
Dear users,
Let us inform you that due to a planned power outage in Charles University's Karolina server room, the local servers eru1, eru2, acharon and the AFS servers asterix, obelix, sal will be temporarily unavailable tomorrow (April 27), 10-11 PM.
With apologies for the inconvenience and with thanks for your understanding.
Dear users,
CERIT-SC's resources in the OpenNebula MetaCloud (physical nodes hda*) will be under maintenance this Thursday, 21st April, from 10:30 PM. Your virtual machine(s) will only be paused (you won't lose your running state) and resumed one by one. An optimistic estimate is that each VM shouldn't be down for more than 30 minutes. The whole maintenance can take up to 2 hours.
Dear users,
let us inform you that due to a planned power outage in Brno's server room at UKB, the local clusters lex, krux, zubat and the disk arrays brno9-ceitec + brno10-ceitec-hsm will be temporarily unavailable.
We apologize for any inconvenience caused.
Dear users,
let us inform you that due to an unexpected air conditioning outage in Brno's CERIT-SC server room this morning, part of the local clusters zigur, zapat, and zebra was switched off to prevent overheating. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs on the affected nodes have been terminated.
We apologize for any inconvenience caused.
Dear users,
Let us inform you that due to an unexpected power outage, the clusters hermes/hildor/haldir are temporarily unavailable.
With apologies for the inconvenience and with thanks for your understanding.
Dear users,
Let us inform you that during last night, the sendmail of the PBS server sent out outdated error reports about terminated jobs via e-mail.
With apologies for the inconvenience and with thanks for your understanding.
Dear users,
Let us inform you that from Wednesday, March 2 (9 AM) to March 3 (6 PM), Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a security update of the system.
*****************************************
IMPORTANT:
The HSM hosts data from Jihlava /storage/jihlava1-cerit/
*****************************************
Influence on the running jobs:
Dear users,
Let us inform you that on Tuesday, February 23, Brno's /storage/brno6/ will be unavailable due to a battery replacement by the supplier.
Influence on the running jobs:
Moreover, the user interface (Sunstone) as well as the programming interface (API) for MetaCloud will be unavailable for several hours. Existing virtual machines will not be affected! It will be, however, impossible to create new ones or manage existing ones during the outage.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková
MetaCentrum & CERIT-SC
Dear users,
Let us inform you that on Friday (February 12, 8:00 AM) the Hadoop cluster will be briefly unavailable due to a SW upgrade:
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Dear users,
Let us inform you that on Thursday (February 4, 11:00 AM) the Hadoop cluster will be briefly unavailable due to a certificate change, machine reboots, and preparation of the new experimental cluster based on containers.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Dear users,
Let us inform you that on Monday (July 25, 10:00 AM) the Hadoop cluster will be unavailable due to an upgrade from CDH 5.5.1 to 5.8.0 (with Hadoop 2.6.0 and Spark 1.6.0) and a Java environment upgrade.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Dear users,
A long-planned upgrade of the OpenNebula cloud manager will take place on 11 February. The user interface (Sunstone) as well as the programming interface (API) for MetaCloud will be unavailable for several hours. Existing virtual machines will not be affected! It will be, however, impossible to create new ones or manage existing ones during the outage. Please accept our apologies for the inconvenience this may cause you.
Dear users,
let us inform you that due to a planned upgrade of the network connection at the Institute of Physics of the Czech Academy of Sciences in Prague, the local clusters kalpa and luna + the disk array /storage/praha4-fzu/ will be temporarily unavailable over the weekend, 23-24 January.
We apologize for any inconvenience caused.
Dear users,
let us inform you that due to today's unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be returned to normal operation; unfortunately, however, the running jobs have been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Krenkova
MetaCentrum & CERIT-SC.
Dear users,
let us inform you that due to an unexpected power outage in Brno's server room at UKB, the local cluster Perian was temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.
We apologize for any inconvenience caused.
Dear users,
Let us inform you that yesterday evening (17:00-23:00), due to a violation of the integrity of the database of the KDC server that operates Kerberos, some database records were temporarily unavailable. Unfortunately, this caused problems with operations requiring Kerberos (typically saving data from running jobs to a /storage array, etc.).
Dear users,
Let us inform you that the MetaCloud front-end is unavailable due to a HW fault in its storage array. Virtual machines created beforehand are still operational, but new ones cannot be instantiated and you also cannot manage existing machines through the cloud management interface (OpenNebula). Thank you for your patience.
Dear users,
Let us inform you that from October 8 to 9, Pilsen's /storage/plzen1/ will be unavailable due to a move to new hardware.
*****************************************
IMPORTANT
The GALAXY portal, hosted on this storage, will be unavailable during the outage.
*****************************************
Influence on the running jobs:
Due to HW problems (being resolved with the original supplier), the zigur and zapat clusters will be available one month later than planned, in the second half of October.
With many thanks for understanding.
--
Dear users,
From August 18, due to the move to Brno, the zigur and zapat clusters and the disk array /storage/jihlava1-cerit/ will be temporarily unavailable.
The clusters are covered by a maintenance contract, so the move will be carried out by the original supplier; the approximate duration of the move is one month (144 cluster nodes plus the disk array).
Influence on the running jobs:
With many thanks for understanding,
Ivana Krenkova
MetaCentrum & CERIT-SC
Dear users,
let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; unfortunately, however, the running jobs were stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Krenkova
MetaCentrum & CERIT-SC.
Dear users,
Let us inform you that from Tuesday, September 22 (10 AM) to Wednesday, September 23, Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a system update.
*****************************************
IMPORTANT
The HSM hosts data from Jihlava /storage/jihlava1-cerit/ and the older /storage/brno1/. We strongly recommend transferring all data used in your jobs to another storage (for example /storage/brno6). In case you need any data from these archival storages during the outage, please inform us in advance via e-mail at meta@cesnet.cz.
*****************************************
Influence on the running jobs:
Dear users,
Let us inform you that from September 18, Brno's /storage/brno4-cerit-hsm/ is not available due to a SW failure of the HSM system. Major software patches (bug fixes) will be applied by the system vendor.
IMPORTANT: The HSM hosts data from Jihlava /storage/jihlava1-cerit/ and older /storage/brno1/ (/storage/home)
Dear users,
let us inform you that due to an unexpected power outage in Prague's server room, the frontend and the local clusters Tarkil and Mudrc, as well as /storage/praha1, are temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Dear users,
Let us inform you that due to a power outage in Jihlava's server room today, the local cluster Doom, as well as /storage/ostrava1/, are temporarily unavailable. The computing nodes will be gradually returned to normal operation later today.
From August 24 to 31, due to the move to Brno, the doom cluster and the disk array /storage/ostrava1/ will be temporarily unavailable. Please copy the data you will need for your calculations during these few days to another disk storage.
With many thanks for understanding,
Ivana Krenkova
MetaCentrum
Let us inform you that on Monday, June 22 at 10 AM, the skirit frontend will be briefly unavailable due to an upgrade. All running processes on the frontend will be terminated during the outage.
You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Dear users,
let us inform you that due to a planned outage of the network connection, the tarkil frontend, the tarkil cluster, and the disk array /storage/praha1/ will be temporarily unavailable. Jobs running on the affected cluster or using /storage/praha1/ will be temporarily suspended. Shortly before (and of course also during) the outage it will not be possible to start new jobs on the affected cluster.
Please terminate all interactive jobs running from the tarkil frontend by Tuesday morning. All running processes on the frontend will be terminated during the outage.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Dear users,
Let us inform you that on Tuesday (June 25, 10:00 AM) the Hadoop cluster will be briefly unavailable due to HW maintenance - replacement of the CMOS battery in the hador-c1.ics.muni.cz server.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum
Let us inform you that on Monday, May 18, the skirit frontend will be briefly unavailable due to an upgrade. All running processes on the frontend will be terminated during the outage.
You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková,
MetaCentrum
Dear users,
let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.
We apologize for any inconvenience caused.
Dear Users!
This is to inform you that there will be a scheduled downtime of the 'metacloud-dukan' cluster, part of the physical resources in MetaCloud. This will be the last in a series of outages that were required to extend, improve, and physically move our cloud infrastructure. The downtime will begin on 24 March and end on 27 March. All virtual machines running on nodes dukan{1..10}.ics.muni.cz will be stopped. During the outage, the hypervisor will change from XEN to KVM, finally unifying the hypervisors used on all resources across MetaCloud.
How to tell if the outage affects your virtual machines
Use the OpenNebula dashboard to display a list of all your virtual machines (Virtual Resources → Virtual Machines). The 'Host' column shows the physical node name for each VM. The outage will affect all virtual machines on nodes dukan{1..10}.ics.muni.cz. You may also filter the contents of the VMs table using the Search box on the top of the page.
What will happen with my virtual machines during the outage
All affected VMs must be stopped. It will be a great help to us if you can stop your own machines before the end of business on Monday, 23 March. Otherwise, we will stop your VMs and move them to storage; after the downtime you will be able to start your machines again. Since the hypervisor will change from XEN to KVM, some machines may fail to start properly. Therefore, do not hesitate to contact us in case any of your VMs acts strangely. Unfortunately, it is not possible to check for compatibility with KVM beforehand; this can only be done experimentally. Standard MetaCentrum images, however, are already tuned for KVM and are expected to cope without glitches.
Thank you for your understanding. Be assured that this is the last planned downtime for the foreseeable future.
Best regards, MetaCloud
Dear users,
let us inform you that due to today's unexpected power outage in Prague's server room, the local cluster luna is temporarily unavailable. The computing nodes will be returned to normal operation; unfortunately, however, the running jobs have been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum
Dear users,
let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes will be gradually returned to normal operation. Unfortunately, all running jobs have been terminated.
We apologize for any inconvenience caused.
Dear users,
let us inform you that due to today's unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be returned to normal operation; unfortunately, however, the running jobs have been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Krenkova
MetaCentrum & CERIT-SC.
After moving the nodes of the zewura SMP cluster (renamed to zebra1-12) to the new computer room, some of the nodes appear to exhibit very rare memory write failures under a very intensive memory stress test. The problem is not reproducible; it occurred only a few times during several days of testing. We consider it almost impossible for it to occur in normal operation. The problem was reported to the supplier's technical support for further detailed diagnostics.
The nodes are being returned to normal operation. Although problems are not expected, we kindly ask users to report any suspicious behaviour.
We apologize for any inconvenience caused.
Ivana Krenkova
MetaCentrum & CERIT-SC.
Let us inform you that from Wednesday, December 3 (8:30 AM) to Thursday, December 4 (8 PM), Pilsen's /storage/plzen2-archive/ and Brno's /storage/brno4-cerit-hsm/ will be unavailable due to a system update. In case you need any data from these archival storages during the outage, please inform us in advance via e-mail at meta@cesnet.cz.
The other two archival storages (/storage/jihlava2-archive and /storage/brno5-archive) will not be affected.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Krenkova
Dear users,
let us inform you that due to a planned power outage in Jihlava's server room, the local clusters with the property 'jihlava' will be temporarily unavailable on Friday, 28 November.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Krenkova
MetaCentrum & CERIT-SC.
This morning, the transfer of the brno3-cerit data (temporarily stored in Jihlava) was finished -- the brno3-cerit storage is now in normal operation mode.
Attention: Under specific circumstances (particularly when your jobs were finishing during the synchronization), some data may not have been synchronized -- if so, you'll find your data in the Jihlava location, currently available via /auto/jihlava1-cerit/brno3/export/home/$USER (please transfer the missing data on your own -- we'll delete them after a few weeks).
With best regards
Tom Rebok.
Since we have managed to repair the /storage/brno3-cerit array, the data (temporarily hosted in Jihlava) will be returned to Brno
*** on Wednesday, 29th of October ***
Since it is not possible to perform this transfer transparently, the /storage/brno3 array will have to operate in a not fully consistent state for about 1-2 days.
To minimize the impacts of this transfer on you and your computations, it will be managed as follows:
Note: If you change particular data in /storage/brno3/home/$LOGIN during Wednesday/Thursday, it may be overwritten by the data synchronized/copied from Jihlava.
The running jobs should not be affected by this transfer.
We are sorry for the inconvenience.
With best regards and thanks for understanding,
Tomas Rebok,
MetaCentrum NGI.
Dear users,
let us inform you that due to an unexpected power outage in Ostrava's server room, the local cluster Doom, as well as /storage/ostrava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs have unfortunately been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum
The hierarchical storage in Brno, /storage/brno4-cerit-hsm/, will be inaccessible on October 1, 2014, from 9 AM till 4 PM (expected). Major software patches (bug fixes) will be applied by the system vendor.
Because of several SW problems that have occurred recently, the disk array /storage/brno2/, some frontends, and some nodes were not working properly today. The computing nodes will be gradually returned to normal operation; however, the running jobs have unfortunately been stopped.
We apologize for any inconvenience caused.
Ivana Křenková,
MetaCentrum
Dear users,
let us inform you that due to an unexpected short power outage in the CERIT-SC server room last night (25.9., approx. 9 PM), the filesystem of the disk array /storage/brno3-cerit/ is not working properly. We are working on data recovery at the moment. The user data (208 TB) are being temporarily copied to Jihlava (/auto/jihlava1-cerit/brno3/export); this is expected to take about 1-2 weeks (due to the huge volume of data). In case you need your data urgently, please contact us at meta@cesnet.cz and we will copy it with a higher priority.
The Jihlava disk array will temporarily serve (during the recovery of the Brno disk array) as /home for the zewura and zegox clusters and the zuphux frontend. All accessible data will also be available via the symlink /storage/brno3-cerit. All the data will be returned from Jihlava to Brno after the Brno disk array is recovered.
With apologies for the inconvenience and with thanks for your understanding,
MetaCentrum & CERIT-SC
Dear users,
let us inform you that due to an unexpected short power outage, the disk array /storage/brno3-cerit/ is temporarily unavailable today. We are working on data recovery at the moment. In case you need your data very urgently, please contact us at meta@cesnet.cz and we will copy your data to another disk storage.
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková, MetaCentrum
Dear users,
let us inform you that due to an unexpected power outage in Ostrava's server room, the local cluster Doom, as well as /storage/ostrava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs have unfortunately been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum
Dear users,
today, another unexpected power outage occurred, this time in the Brno server rooms. Because of this, the Brno part of the MetaCentrum infrastructure was paralyzed, including several central services hosted there (e.g., the scheduler, license server, disk storages, ...). The jobs running during the outage have unfortunately been stopped.
Most of the nodes and services should be available now. However, a few power circuits couldn't be revived, and a deeper inspection of the power supplies has to be performed in order to detect the failing ones -- thus, several services (e.g., the license server and parts of the portal) still do not work.
We're really sorry for the troubles caused -- unfortunately, we're pulling the shorter end of the rope in the fight "higher power" vs. man. :-(
Tom Rebok
MetaCentrum
Dear users,
let us inform you that due to an unexpected power outage in Ostrava's server room, the local cluster Doom, as well as /storage/ostrava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs have unfortunately been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum
Let us inform you that on Tuesday (August 19, 11:00 p.m.) the skirit frontend will be briefly unavailable due to a SW upgrade. All processes running on the frontend will be terminated during the outage.
You can use any of the other frontends:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková,
MetaCentrum
Dear users,
let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat, as well as /storage/jihlava1, were temporarily unavailable. The computing nodes have already been returned to normal operation; however, the running jobs have unfortunately been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Tom Rebok
MetaCentrum & CERIT-SC.
Today, around 2 p.m., some unexpected connectivity problems were observed at the server rooms of the University of West Bohemia, which affected our Pilsen nodes as well. The major problems occurred between 2 p.m. and 2:30 p.m.; however, some consequent minor problems may have been noticeable even after that time.
The connectivity should already be restored. (Nevertheless, some related service work is still ongoing...)
We apologize for any inconvenience caused.
Tomáš Rebok,
MetaCentrum & CERIT-SC.
Let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily partially unavailable. The computing nodes will be gradually returned to normal operation; however, the running jobs have unfortunately been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum & CERIT-SC
Because of several SW problems that have occurred recently, the disk array /storage/brno2/ and the skirit frontend are again not working properly today.
We apologize for any inconvenience caused.
Ivana Křenková, MetaCentrum
Because of several SW problems that have occurred recently, the disk array /storage/brno2/, some frontends, and some nodes were not working properly today. The computing nodes will be gradually returned to normal operation; however, the running jobs have unfortunately been stopped.
We apologize for any inconvenience caused.
Ivana Křenková,
MetaCentrum
Let us inform you that on Saturday (March 23, 11:00 p.m.) the zuphux frontend will be briefly unavailable due to a SW upgrade (Debian 6 -> Debian 7). All processes running on the frontend will be terminated during the outage.
You can use any of the other frontends during the outage:
https://wiki.metacentrum.cz/wiki/Frontend
With apologies for the inconvenience and with thanks for your understanding.
Ivana Křenková,
MetaCentrum & CERIT-SC
Because of several HW/SW problems that have occurred recently with the disk array /storage/brno1 (/storage/home), a complex service maintenance and SW upgrade has to be performed urgently.
Unfortunately, this maintenance cannot be performed on the live system; thus, the disk array has to be ***PUT OUT OF OPERATION*** (and made inaccessible)
on Tuesday, 25 February 2014, during morning hours
(The assumed shutdown duration is 1-2 days.)
Influence on the running jobs:
We're really sorry for the problems that may occur. Unfortunately, the current condition of the /storage/brno1 (/storage/home) disk array cannot be left untouched any more -- that would result in bigger problems in the future.
With many thanks for understanding
Tomáš Rebok.
Let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be gradually returned to normal operation; however, the running jobs have unfortunately been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
Ivana Křenková
MetaCentrum & CERIT-SC
Let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat were temporarily unavailable. The computing nodes will be gradually returned to normal operation; however, the running jobs have unfortunately been stopped.
We apologize for any inconvenience caused -- we're unable to influence these circumstances...
All computing nodes located in the computer room of ICS MU (with the property "brno", except the machines zewura[1-8]) will be down on Tuesday, October 1st, due to works on an electric-network extension for the expected new cluster of the CERIT-SC centre.
Long job queues (more than 4 days) have been disabled on those clusters. All the other queues will be disabled later. Please finish all your jobs by the end of September; running jobs will be killed when the machines are switched off.
At the same time, the frontend skirit.ics.muni.cz will not be available during the outage.
We are sorry for temporary unavailability of the resources.
On Monday, between 9:00 a.m. and 5:00 p.m., Pilsen's /storage/plzen2-archive/ will be unavailable due to a system update.
With apologies for the inconvenience and with thanks for your understanding.
The following machines were affected: zapat23 zapat98 zapat99 zapat100 zapat101 zapat111 zigur1 zigur3 zigur28 zigur30 zigur31
Let us inform you that due to an unexpected power outage in Jihlava's server room, the local clusters Zigur and Zapat and the disk array /storage/jihlava1-cerit are temporarily unavailable. Unfortunately, all running jobs have been terminated.
With apologies for the inconvenience and with thanks for your understanding.
Let us inform you that today, between 2:00 and 5:00 p.m., Pilsen's /storage/plzen2-archive/ may be briefly unavailable due to a system update.
With apologies for the inconvenience and with thanks for your understanding.
Let us inform you that on Tuesday (June 18, 10:00 a.m.) the skirit frontend will be briefly unavailable due to a HW upgrade. At the same time, the system will be upgraded (Debian 5 -> Debian 6).
You can use any of the other frontends during the outage:
With apologies for the inconvenience and with thanks for your understanding.
Let us inform you that due to an unexpected air-conditioning failure in the Pilsen server room and the resulting overheating of the local clusters, the machines Gram, Minos, Nympha, Konos, Ajax, and the disk array /storage/plzen1 are unavailable from this evening.
With apologies for the inconvenience and with thanks for your understanding.
Today, as a result of a service intervention by the supplier, there was an unplanned outage of the older Brno disk array. /storage/brno1, /afs, and the SW modules are temporarily unavailable. We apologize for the inconvenience.
On 5th March 2013, from 9:00 till approximately 12:00, our trouble-ticketing system (RT - rt3.cesnet.cz) will be unavailable due to a necessary upgrade. Neither the web nor the mail interface will be accessible during the outage. E-mails sent during the outage (i.e. to the address meta@cesnet.cz) will be delivered after it ends. We apologize for the delayed responses to requests.
All computing nodes located in the computer room of ZČU (ajax, konos, minos[20-35], nympha) will be down during the period October 22-25 due to a move to the new server room. Jobs are currently being held in queues. Running jobs will be killed when the machines are switched off.
We are sorry for temporary unavailability of the resources.
The takeover of work on switching Pilsen's UL011 server room to the energy centre revealed a serious defect - a failure of some support systems (measurement and control). The repair unfortunately requires another switch-off (and the killing of running jobs). The works will take place on the night of Wednesday to Thursday, October 10, 2012 (21:00 - 5:00). Sorry for the inconvenience.
The volume /storage/brno1 is filled to 100 percent. Moreover, its file system is probably also damaged, so the volume is currently not suitable for working with data. Please use the volumes /storage/brno2 (11 TB available) and /storage/plzen1 (27 TB available) for your work. Unfortunately, we cannot yet estimate the time needed for the repair.
In this context, we would also like to ask you to delete all unnecessary files stored in the mentioned volumes.
On the night of September 19-20, 2012, the wiring in a server room in Pilsen will be reconstructed. The machines will be switched off on Wednesday the 19th in the afternoon; relaunch is anticipated on Thursday the 20th in the morning. From Thursday morning, the "long" queue should finally be available again on the affected machines.
Besides the mentioned clusters, the disk volume /storage/plzen1 will also be unavailable.
We apologize for the temporary inconvenience.
The outage reported for tomorrow is cancelled because of problems on the supplier's side. We will inform you about the newly planned outage through this channel. The 'long' queue on the affected machines will remain closed for now.
On the night of August 29-30, 2012, the wiring in a server room in Pilsen will be reconstructed. The machines will be switched off on Wednesday the 29th in the afternoon; relaunch is anticipated on Thursday the 30th in the morning. The "long" queue has already been suspended from accepting jobs on these machines; any jobs still running will be killed at power-down time.
Besides the mentioned clusters, the disk volume /storage/plzen1 will also be unavailable.
We apologize for the temporary inconvenience.