New Job Scheduler in CERIT-SC

CERIT-SC, together with MetaCentrum, has long been evaluating the practical drawbacks of the default job scheduler of the Torque batch system. The result of the related research and development is a new job scheduler that supports job planning and, according to the simulations performed, addresses the most critical of these drawbacks.

The new job scheduler will be deployed on the CERIT-SC infrastructure next week. Currently running jobs will not be affected.

The key features of the replacement scheduler are:

  • The scheduler performs more efficient and safer backfilling: gaps in the schedule, caused by reserving many nodes for large distributed jobs, can be filled by shorter jobs, which reduces their waiting time and increases node utilization.
  • Based on the maintained schedule, the new scheduler is able to estimate a job's start time as well as the nodes it will run on. Users can check when and where their jobs will start, which jobs will start before them, etc.
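
The two features above can be illustrated with a minimal Python sketch. It is not the actual CERIT-SC scheduler code, only a toy model under simple assumptions: time is counted in minutes from "now", every job needs whole nodes for a fixed walltime, and each job is placed into the earliest slot that does not collide with reservations already in the plan. The names Node and plan_job are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Reserved half-open intervals [start, end), in minutes from "now".
    busy: list[tuple[int, int]] = field(default_factory=list)

    def is_free(self, start: int, end: int) -> bool:
        # Free if the requested interval overlaps no existing reservation.
        return all(end <= s or start >= e for s, e in self.busy)


def plan_job(nodes: list[Node], n_nodes: int, walltime: int, now: int = 0):
    """Reserve the earliest slot where n_nodes are free for walltime minutes.

    Returns (estimated_start, [node names]), or None if the job asks for
    more nodes than exist. Existing reservations are never moved, so a
    backfilled job cannot delay a job that is already planned.
    """
    if n_nodes > len(nodes):
        return None
    # A job can start either now or when some existing reservation ends.
    ends = {e for node in nodes for _, e in node.busy if e > now}
    for start in sorted({now} | ends):
        free = [nd for nd in nodes if nd.is_free(start, start + walltime)]
        if len(free) >= n_nodes:
            chosen = free[:n_nodes]
            for nd in chosen:
                nd.busy.append((start, start + walltime))
            return start, [nd.name for nd in chosen]
    return None


# Toy cluster with four nodes (hypothetical names).
cluster = [Node(f"n{i}") for i in range(1, 5)]

plan_job(cluster, n_nodes=2, walltime=60)       # (0, ['n1', 'n2'])
plan_job(cluster, n_nodes=1, walltime=30)       # (0, ['n3'])
# A large job must wait until all four nodes are free at minute 60.
big = plan_job(cluster, n_nodes=4, walltime=120)
# A short job fits into the gap on n4 before the large job starts.
short = plan_job(cluster, n_nodes=1, walltime=30)
# A 45-minute job does not fit into any gap and is planned after the large job.
longer = plan_job(cluster, n_nodes=1, walltime=45)
print(big, short, longer)
```

In this toy plan the short job starts immediately on an otherwise idle node, while the 45-minute job is placed after the large job because starting it earlier would overlap the large job's reservation. The value returned by plan_job plays the role of the estimated start time and node list mentioned above.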

The essential interaction with the batch system (e.g., the qsub command) remains unchanged. The qstat command and the graphical interface will start displaying the estimated job start time.

An overview of the current job schedule will be available at http://metavo.metacentrum.cz/schedule-overview/ and, as usual, in PBSmon.

Minor differences are described at
https://wiki.metacentrum.cz/wiki/Manual_for_the_TORQUE_Resource_Manager_with_a_Plan-Based_Scheduler
In particular, do not submit jobs to specific queues; by design, the scheduler does not work with queues (the exception being priority queues dedicated to user groups according to explicit agreements).

Because deploying a new job scheduler is a fairly major change to the infrastructure, users are kindly requested to report any abnormal behaviour immediately to support@cerit-sc.cz. The support team will devote increased effort to providing assistance during the transition period.


Ivana Křenková, Thu Jul 17 12:40:00 CEST 2014