News

Slurm version 23.02.5 is now available

We are pleased to announce the availability of Slurm version 23.02.5.

The 23.02.5 release includes a number of stability fixes and some fixes for notable regressions.

The SLURM_NTASKS environment variable, which in 23.02.0 was not set when using --ntasks-per-node, has been changed back to its 22.05 behavior of being set. The method by which it is set, however, is different and should be more accurate in more situations.
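As an illustrative sketch of the restored behavior (the batch script below is hypothetical and assumes a two-node allocation):

```shell
#!/bin/bash
# Hypothetical batch script. Under 23.02.5, SLURM_NTASKS is again
# set when only --ntasks-per-node is given, as it was in 22.05;
# here it would reflect nodes x tasks-per-node.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

echo "SLURM_NTASKS=${SLURM_NTASKS}"
srun hostname
```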

The mpi/pmi2 plugin now respects the SrunPortRange option, which matches the behavior of the mpi/pmix plugin as of 23.02.0.
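For sites that restrict communication ports through a firewall, a minimal slurm.conf sketch (the range shown is an arbitrary example, not a recommendation):

```
# slurm.conf (fragment)
# As of 23.02.5 the mpi/pmi2 plugin also confines its listening
# ports to this range, matching mpi/pmix behavior since 23.02.0.
SrunPortRange=60001-63000
```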

The --uid and --gid options for salloc and srun have been removed. These options did not work correctly since the CVE-2022-29500 fix in combination with some changes made in 23.02.0.

Downloads are available here.

Slurm version 23.02.4 is now available

We are pleased to announce the availability of Slurm version 23.02.4.

The 23.02.4 release includes a number of fixes to Slurm stability and various bug fixes. Some notable fixes include fixing the main scheduler loop not starting on the backup controller after a failover event, a segfault when attempting to use AccountingStorageExternalHost, and an issue where steps could continue running indefinitely if the slurmctld takes too long to respond.

Downloads are available here.

Slurm version 23.02.3 is now available

We are pleased to announce the availability of Slurm version 23.02.3.

The 23.02.3 release includes a number of fixes to Slurm stability, including potential slurmctld crashes when the backup slurmctld takes over. This also fixes some issues when using older versions of the command line tools with a 23.02 controller.

Downloads are available here.

Slurm versions 23.02.2 and 22.05.9 are now available

We are pleased to announce the availability of Slurm versions 23.02.2 and 22.05.9.

The 23.02.2 release includes a number of fixes to Slurm stability, including a fix for a regression in 23.02 that caused openmpi mpirun to fail to launch tasks. It also includes two functional changes: cron job tasks are no longer updated if the whole crontab file is left untouched after opening it with scrontab -e, and dynamic nodes are now sorted and included in topology after scontrol reconfigure or a slurmctld restart.

The 22.05.9 release includes a fix for a regression in 22.05.7 that prevented slurmctld from connecting to an srun running outside a compute node, and a fix to the upgrade process to 22.05 from 21.08 or 20.11 where pending jobs that had requested --mem-per-cpu could be killed due to incorrect memory limit enforcement.

Downloads are available here.

Slurm version 23.02.1 is now available

We are pleased to announce the availability of Slurm version 23.02.1.

The 23.02.1 release includes a number of fixes to Slurm stability.

Downloads are available here.

Slurm version 23.02 is now available

We are pleased to announce the availability of Slurm version 23.02.

To highlight some new features in 23.02:

  • Added a new (optional) RPC rate limiting system in slurmctld.
  • Added usage gathering for gpu/nvml (Nvidia) and gpu/rsmi (AMD) plugins.
  • Added a new jobcomp/kafka plugin.
  • Overhauled the "remote resources" (licenses) functionality managed through sacctmgr / slurmdbd, and introduced a new "lastconsumed" field that is intended to be frequently updated with the current usage as reported by "lmstat" or similar tools. This allows for better cooperative license usage, especially for systems with external workstations and other usage that is not under Slurm's control.
  • Added a new "scrun" command which can serve as an OCI runtime proxy.
  • Support for --json/--yaml output from most Slurm commands, extended to support additional filtering options.
  • Extended "configless" operation to allow for propagation of files referenced in "Include" directives.
  • Allow Slurm to create directories for stdout/stderr files. These can include substitution directives such as %j, %a, and %x.
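As a hypothetical sketch of the new directory-creation behavior for an array job (the job name and paths are examples):

```shell
#!/bin/bash
# Hypothetical batch script. With 23.02, Slurm can create the
# missing directories in the output path. %x = job name,
# %j = job id, %a = array task id.
#SBATCH --job-name=demo
#SBATCH --array=0-3
#SBATCH --output=logs/%x/%j_%a.out
#SBATCH --error=logs/%x/%j_%a.err

srun hostname
```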
The main Slurm documentation site has been updated to the new release as well.

Downloads are available here.

Slurm release candidate version 23.02rc1 available for testing

We are pleased to announce the availability of Slurm release candidate version 23.02rc1.

To highlight some new features coming in 23.02:

  • Added a new (optional) RPC rate limiting system in slurmctld.
  • Added usage gathering for gpu/nvml (Nvidia) and gpu/rsmi (AMD) plugins.
  • Added a new jobcomp/kafka plugin.
  • Overhauled the "remote resources" (licenses) functionality managed through sacctmgr / slurmdbd, and introduced a new "lastconsumed" field that is intended to be frequently updated with the current usage as reported by "lmstat" or similar tools. This allows for better cooperative license usage, especially for systems with external workstations and other usage that is not under Slurm's control.
  • Added a new "scrun" command which can serve as an OCI runtime proxy.
  • Support for --json/--yaml output from most Slurm commands, extended to support additional filtering options.
  • Extended "configless" operation to allow for propagation of files referenced in "Include" directives.
  • Allow Slurm to create directories for stdout/stderr files. These can include substitution directives such as %j, %a, and %x.
This is the first release candidate of the upcoming 23.02 release series, and represents the end of development for this release, and a finalization of the RPC and state file formats.

If any issues are identified with this release candidate, please report them through the Slurm Bugzilla against the 23.02.x version and we will address them before the first production 23.02.0 release is made.

Please note that the release candidates are not intended for production use.

A preview of the updated documentation can be found here.
Downloads are available here.

Slurm version 22.05.8 is now available

We are pleased to announce the availability of Slurm version 22.05.8.

This includes a number of minor to moderate bug fixes, including a regression when updating unsorted NodeName and NodeAddr lists, and an issue when upgrading from 21.08 with job_env or job_script set in AccountingStoreFlags.

Downloads are available here.

Slurm version 22.05.7 is now available

We are pleased to announce the availability of Slurm version 22.05.7.

This includes a number of minor to moderate bug fixes, including fixing an issue when upgrading to MariaDB >= 10.2.1 from an older release.

Downloads are available here.

Slurm version 22.05.6 is now available

We are pleased to announce the availability of Slurm version 22.05.6.

This includes a fix to core selection for steps which could result in random task launch failures, alongside a number of other moderate severity issues.

Downloads are available here.

Slurm version 22.05.5 is now available

We are pleased to announce the availability of Slurm version 22.05.5.

This fixes a number of moderate severity issues, alongside one unfortunate problem with the upgrade process for running jobs with the slurmstepd when using RPM-based installations. Please see Jason Booth's email to the slurm-users mailing list for further details, and ways to mitigate this problem.

Downloads are available here.

Slurm version 22.05.4 is now available

We are pleased to announce the availability of Slurm version 22.05.4.

This includes fixes to two potential crashes in the backfill scheduler, alongside a number of other moderate severity issues.

Downloads are available here.

Slurm version 22.05.3 is now available

We are pleased to announce the availability of Slurm version 22.05.3.

This release includes a number of low to moderate severity fixes made since the last maintenance release was made in June.

Downloads are available here.

Slurm version 22.05.2 is now available

We are pleased to announce the availability of Slurm version 22.05.2.

This includes one significant fix to prevent a potential slurmctld crash if an array job is submitted with a "--gres" request.

Downloads are available here.

Slurm version 22.05.1 is now available

We are pleased to announce the availability of Slurm version 22.05.1.

This includes one significant fix for a regression introduced in 22.05.0 that can lead to over-subscription of licenses. For sites running 22.05.0, the new "bf_licenses" option to SchedulerParameters will resolve this issue; otherwise, upgrading to this new maintenance release is strongly encouraged.

Downloads are available here.

Slurm version 22.05 is now available

We are pleased to announce the availability of Slurm version 22.05.

To highlight some new features in 22.05:

  • Support for dynamic node addition and removal
  • Support for native Linux cgroup v2 operation
  • Newly added plugins to support HPE Slingshot 11 networks (switch/hpe_slingshot), and Intel Xe GPUs (gpu/oneapi)
  • Added new acct_gather_interconnect/sysfs plugin to collect statistics from arbitrary network interfaces.
  • Expanded and synced set of environment variables available in the Prolog/Epilog/PrologSlurmctld/EpilogSlurmctld scripts.
  • New "--prefer" option to job submissions to allow for a "soft constraint" request to influence node selection.
  • Optional support for license planning in the backfill scheduler with "bf_licenses" option in SchedulerParameters.
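A minimal sketch of enabling license planning in backfill (license names and counts are illustrative):

```
# slurm.conf (fragment) -- illustrative values only
# Track license availability in the backfill scheduler's planning,
# instead of only checking license counts at job start.
SchedulerParameters=bf_licenses
Licenses=matlab:10
```

Jobs would then request licenses as before, e.g. `sbatch -L matlab:1 job.sh`, with the backfill scheduler now reserving licenses for planned jobs.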

The main Slurm documentation site has been updated now as well.

Downloads are available here.

Slurm release candidate version 22.05rc1 available for testing

We are pleased to announce the availability of Slurm release candidate version 22.05rc1.

To highlight some new features coming in 22.05:

  • Support for dynamic node addition and removal
  • Support for native cgroup/v2 operation
  • Newly added plugins to support HPE Slingshot 11 networks (switch/hpe_slingshot), and Intel Xe GPUs (gpu/oneapi)
  • Added new acct_gather_interconnect/sysfs plugin to collect statistics from arbitrary network interfaces
  • Expanded and synced the set of environment variables available in the Prolog/Epilog/PrologSlurmctld/EpilogSlurmctld scripts
  • Added "--prefer" option to job submissions to allow for a "soft constraint" request to influence node selection
This is the first release candidate of the upcoming 22.05 release series, and represents the end of development for this release, and a finalization of the RPC and state file formats.

If any issues are identified with this release candidate, please report them through the Slurm Bugzilla against the 22.05.x version and we will address them before the first production 22.05.0 release is made.

Please note that the release candidates are not intended for production use.

A preview of the updated documentation can be found here.
Downloads are available here.

Slurm versions 21.08.8 and 20.11.9 are now available (CVE-2022-29500, 29501, 29502)

Slurm versions 21.08.8 and 20.11.9 are now available to address a critical security issue with Slurm's authentication handling.

SchedMD customers were informed on April 20th and provided a patch on request; this process is documented in our security policy.

CVE-2022-29500:
An architectural flaw with how credentials are handled can be exploited to allow an unprivileged user to impersonate the SlurmUser account. Access to the SlurmUser account can be used to execute arbitrary processes as root.
This issue impacts all Slurm releases since at least Slurm 1.0.0.
Systems remain vulnerable until all slurmdbd, slurmctld, and slurmd processes have been restarted in the cluster.
Once all daemons have been upgraded sites are encouraged to add "block_null_hash" to CommunicationParameters. That new option provides additional protection against a potential exploit.
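A sketch of that post-upgrade hardening step (fragment only; keep any existing CommunicationParameters values):

```
# slurm.conf (fragment)
# Enable only after every slurmdbd, slurmctld, and slurmd in the
# cluster has been restarted on a fixed release.
CommunicationParameters=block_null_hash
```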

CVE-2022-29501:
An issue was discovered with a network RPC handler in the slurmd daemon used for PMI2 and PMIx support. This vulnerability could allow an unprivileged user to send data to an arbitrary unix socket on the host as the root user.

CVE-2022-29502:
An issue was found with the I/O key validation logic in the srun client command that could permit an attacker to attach to the user's terminal, and intercept process I/O. (Slurm 21.08 only.)

Due to the severity of the CVE-2022-29500 issue, SchedMD has removed all prior Slurm releases from our download site.

SchedMD only issues security fixes for the supported releases (currently 21.08 and 20.11). Due to the complexity of these fixes, we do not recommend attempting to backport the fixes to older releases, and strongly encourage sites to upgrade to fixed versions immediately.

Downloads are available here.

Slurm version 21.08.7 is now available

We are pleased to announce the availability of Slurm version 21.08.7.

This includes a number of minor to moderate severity fixes that have accumulated since the last maintenance release was made two months ago.

Downloads are available here.

Slurm version 21.08.6 is now available

We are pleased to announce the availability of Slurm version 21.08.6.

This includes a number of fixes since the last maintenance release was made in December, including an important fix to a regression seen when using the 'mpirun' command within a job script.

Downloads are available here.

Slurm version 21.08.5 is now available

We are pleased to announce the availability of Slurm version 21.08.5.

This includes a number of moderate severity fixes since the last maintenance release a month ago.

And, as it appears to be en vogue to discuss log4j issues, I'll take a moment to state that Slurm is unaffected by the recent log4j disclosures. Slurm is written in C, does not use log4j, and Slurm's logging subsystems are not vulnerable to the class of issues that have led to those exploits.

Downloads are available here.

Slurm version 21.08.4 is now available (CVE-2021-43337)

Slurm version 21.08.4 is now available, and includes a series of recent bug fixes, as well as a moderate security fix.

Note that this security issue is only present in the 21.08 release series. Slurm 20.11 and older releases are unaffected.

SchedMD customers were informed of this issue on November 2nd and provided a fix on request; this process is documented in our security policy.

CVE-2021-43337:
For sites using the new AccountingStoreFlags=job_script and/or job_env options, an issue was reported with the access control rules in SlurmDBD that will permit users to request job scripts and environment files that they should not have access to. (Scripts/environments are meant to only be accessible by user accounts with administrator privileges, by account coordinators for jobs submitted under their account, and by the user themselves.)

Downloads are available here.

Slurm version 21.08.3 is now available

We are pleased to announce the availability of Slurm version 21.08.3.

This includes a number of fixes since the last release a month ago, including one critical fix to prevent a communication issue between slurmctld and slurmdbd for sites that have started using the new AccountingStoreFlags=job_script functionality.

Downloads are available here.

Slurm version 21.08.2 is now available

We are pleased to announce the availability of Slurm version 21.08.2.

There is one significant change included in this maintenance release: the removal of support for the long-misunderstood TaskAffinity=yes option in cgroup.conf. Please consider using "TaskPlugin=task/cgroup,task/affinity" in slurm.conf as a replacement.

Unfortunately, a number of issues were identified where the processor affinity settings from this now-unsupported approach would be calculated incorrectly, leading to potential performance issues.

SchedMD had previously been planning to remove this support in the next 22.05 release, but a number of issues reported after the cgroup code refactoring have led us to remove it now, rather than try to correct issues with what has not been a recommended configuration for some time.
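A sketch of the migration, assuming an otherwise standard configuration:

```
# cgroup.conf: remove the now-unsupported option
#TaskAffinity=yes

# slurm.conf: request the equivalent behavior instead
TaskPlugin=task/cgroup,task/affinity
```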

Downloads are available here.

Slurm version 21.08.1 is now available

We are pleased to announce the availability of Slurm version 21.08.1.

For sites using scrontab, there is a critical fix included to ensure that the cron jobs continue to repeat indefinitely into the future.

Downloads are available here.

Slurm version 21.08 is now available

After 9 months of development and testing SchedMD is pleased to announce the availability of Slurm version 21.08!

Slurm 21.08 includes a number of new features including:

  • A new "AccountingStoreFlags=job_script" option to store the job scripts directly in SlurmDBD.
  • Added "sacct -o SubmitLine" format option to get the submit line of a job/step.
  • Changes to the node state management so that nodes are marked as PLANNED instead of IDLE if the scheduler is still accumulating resources while waiting to launch a job on them.
  • RS256 token support in auth/jwt.
  • Overhaul of the cgroup subsystems to simplify operation, mitigate a number of inherent race conditions, and prepare for future cgroup v2 support.
  • Further improvements to cloud node power state management.
  • A new child process of the Slurm controller called "slurmscriptd" responsible for executing PrologSlurmctld and EpilogSlurmctld scripts, which significantly reduces performance issues associated with enabling those options.
  • A new burst_buffer/lua plugin allowing for site-specific asynchronous job data management.
  • Fixes to the job_container/tmpfs plugin to allow the slurmd process to be restarted while the job is running without issue.
  • Added json/yaml output to sacct, squeue, and sinfo commands.
  • Added a new node_features/helpers plugin to provide a generic way to change settings on a compute node across a reboot.
  • Added support for automatically detecting and broadcasting shared libraries for an executable launched with "srun --bcast".
  • Added initial OCI container execution support with a new --container option to sbatch and srun.
  • Improved job step launch throughput.
  • Improved "configless" support by allowing multiple control servers to be specified through the slurmd --conf-server option, and by sending additional configuration files at startup, including cli_filter.lua.
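A sketch of the expanded configless startup (hostnames are hypothetical; see the slurmd man page for the exact list syntax):

```
# slurmd invocation (fragment; hostnames are examples)
# Multiple control servers may now be listed for failover.
slurmd --conf-server ctld-primary.example.com:6817,ctld-backup.example.com
```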
Please see the RELEASE_NOTES distributed alongside the source for further details.

Thank you to all customers, partners, and community members who contributed to this release.

As with past releases, the documentation available at https://slurm.schedmd.com has been updated to the 21.08 release. Past versions are available in the archive. This release also marks the end of support for the 20.02 release. The 20.11 release will remain supported up until the 22.05 release next May, but will not see as frequent updates, and bug-fixes will be targeted for the 21.08 maintenance releases going forward.

Downloads are available here.

Slurm version 21.08.0rc2 is now available

We are pleased to announce the availability of Slurm release candidate version 21.08.0rc2.

This is the second release candidate version of the upcoming 21.08 release series, and corrects a number of issues identified with rc1.

If any issues are identified with this release candidate, please report them through SchedMD's Bugzilla against the 21.08.x version and we will address them before the first production 21.08.0 release is made.

Please note that the release candidates are not intended for production use. Barring any late-discovered issues, the state file formats should not change between now and 21.08.0 and are considered frozen at this time for the 21.08 release.

A preview of the updated documentation can be found here.

Downloads are available here.

Slurm version 21.08.0rc1 is now available

We are pleased to announce the availability of Slurm release candidate version 21.08.0rc1.

This is the first release candidate version of the upcoming 21.08 release series, and represents the end of development for the release cycle, and a finalization of the RPC and state file formats.

If any issues are identified with this release candidate, please report them through SchedMD's Bugzilla against the 21.08.x version and we will address them before the first production 21.08.0 release is made.

Please note that the release candidates are not intended for production use. Barring any late-discovered issues, the state file formats should not change between now and 21.08.0 and are considered frozen at this time for the 21.08 release.

A preview of the updated documentation can be found here.

Downloads are available here.

Slurm version 20.11.8 is now available

We are pleased to announce the availability of Slurm version 20.11.8.

This includes a number of minor-to-moderate severity fixes.

Downloads are available here.

Slurm versions 20.11.7 and 20.02.7 are now available (CVE-2021-31215)

Slurm versions 20.11.7 and 20.02.7 are now available, and include a series of recent bug fixes, as well as a critical security fix.

SchedMD customers were informed of this issue on April 28th and provided a fix on request; this process is documented in our security policy.

CVE-2021-31215:
An issue was identified with environment handling within Slurm that can allow any user to run arbitrary commands as SlurmUser if the installation uses a PrologSlurmctld and/or EpilogSlurmctld script.

Downloads are available here.

Slurm version 20.11.6 is now available

We are pleased to announce the availability of Slurm version 20.11.6.

This includes a number of minor-to-moderate severity fixes, as well as improvements to the recently added job_container/tmpfs plugin.

Downloads are available here.

Slurm version 20.11.5 is now available

We are pleased to announce the availability of Slurm version 20.11.5.

This includes a number of moderate severity bug fixes, alongside a new job_container/tmpfs plugin developed by NERSC that can be used to create per-job filesystem namespaces.

Initial documentation for this plugin is available here.

Downloads are available here.

Slurm version 20.11.4 is now available

We are pleased to announce the availability of Slurm version 20.11.4.

This includes a workaround for a broken glibc version that erroneously prints a long-double value of 0 as "nan", which can corrupt Slurm's association state files.

Downloads are available here.

Slurm version 20.11.3 is now available; reverts to older step launch semantics

We are pleased to announce the availability of Slurm version 20.11.3.

This does include a major functional change to how job step launch is handled compared to the previous 20.11 releases. This affects srun as well as MPI stacks — such as Open MPI — which may use srun internally as part of the process launch.

One of the changes made in the Slurm 20.11 release was to the semantics for job steps launched through the 'srun' command. This also inadvertently impacts many MPI releases that use srun underneath their own mpiexec/mpirun command.

For 20.11.{0,1,2} releases, the default behavior for srun was changed such that each step was allocated exactly what was requested by the options given to srun, and did not have access to all resources assigned to the job on the node by default. This change was equivalent to Slurm setting the '--exclusive' option by default on all job steps. Job steps desiring all resources on the node needed to explicitly request them through the new '--whole' option.

In the 20.11.3 release, we have reverted to the 20.02 and older behavior of assigning all resources on a node to the job step by default.

This reversion is a major behavioral change which we would not generally do on a maintenance release, but is being done in the interest of restoring compatibility with the large number of existing Open MPI (and other MPI flavors) and job scripts that exist in production, and to remove what has proven to be a significant hurdle in moving to the new release.

Please note that one change to step launch remains — by default, in 20.11 steps are no longer permitted to overlap on the resources they have been assigned. If that behavior is desired, all steps must explicitly opt-in through the newly added '--overlap' option.
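As a hypothetical sketch of the resulting semantics (program names are placeholders):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=8

# With 20.11.3+, each step again has access to all of the job's
# resources on the node by default; --whole is no longer required.
srun ./compute_step &

# Steps still may not overlap on their assigned resources unless
# they opt in explicitly:
srun --overlap ./monitor_step &
wait
```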

Downloads are available here.

Slurm version 20.11.2 is now available

We are pleased to announce the availability of Slurm version 20.11.2.

This resolves a critical regression from the recent 20.11.1 release which prevented both PMI and PMIx interfaces from functioning correctly.

Downloads are available here.

Slurm version 20.11.1 is now available

We are pleased to announce the availability of Slurm version 20.11.1.

This includes a number of fixes made in the month since 20.11 was initially released, including critical fixes to nss_slurm and the Perl API when used with the newer configless mode of operation.

Downloads are available here.

Slurm version 20.11 is now available

After 9 months of development and testing SchedMD is pleased to announce the availability of Slurm version 20.11.0!

Slurm 20.11 includes a number of new features including:

  • Overhaul of the job step management and launch code, alongside improved GPU task placement support.
  • A new "Interactive Step" mode of operation for salloc.
  • A new "scrontab" command that can be used to submit and manage periodically repeating jobs.
  • IPv6 support.
  • Changes to the reservation logic, with new options allowing users to delete reservations, allowing admins to skip the next occurrence of a repeated reservation, and allowing for a job to be submitted and eligible to run within multiple reservations.
  • Dynamic Future Nodes - automatically associate a dynamically provisioned (or "cloud") node against a NodeName definition with matching hardware.
  • An experimental new RPC queuing mode for slurmctld to reduce thread contention on heavily loaded clusters.
  • SlurmDBD integration with the Slurm REST API.
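As a sketch of the new scrontab workflow (the script path is hypothetical), entries use familiar crontab timing syntax, with #SCRON lines supplying job options for the entry that follows:

```
# Edit the per-user Slurm crontab with: scrontab -e
#SCRON --time=00:10:00
#SCRON --partition=batch
# Run nightly at 02:30
30 2 * * * /home/user/scripts/nightly-cleanup.sh
```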
Please see the RELEASE_NOTES distributed alongside the source for further details.

Thank you to all customers, partners, and community members who contributed to this release.

As with past releases, the documentation available at https://slurm.schedmd.com has been updated to the 20.11 release. Past versions are available in the archive. This release also marks the end of support for the 19.05 release. The 20.02 release will remain supported up until the 21.08 release next August, but will not see as frequent updates, and bug-fixes will be targeted for the 20.11 maintenance releases going forward.

Downloads are available here.

Slurm versions 20.02.6 and 19.05.8 are now available (CVE-2020-27745 and CVE-2020-27746)

Slurm versions 20.11.0rc2, 20.02.6 and 19.05.8 are now available, and include a series of recent bug fixes, as well as a fix for two security issues.

Note: the 19.05 release series is nearing the end of its support lifecycle as we prepare to release 20.11 later this month. The 19.05.8 download link is under the 'Older Versions' page.

SchedMD customers were informed on October 29th and provided patches on request; this process is documented in our security policy.

CVE-2020-27745:
A review of Slurm's RPC handling code uncovered a potential buffer overflow with one utility function. The only affected use is in Slurm's PMIx MPI plugin, and a job would only be vulnerable if --mpi=pmix was requested, or the site has set MpiDefault=pmix in slurm.conf.

CVE-2020-27746:
Slurm's use of the 'xauth' command to manage X11 magic cookies can lead to an inadvertent disclosure of a user's cookie when setting up X11 forwarding on a node. An attacker monitoring /proc on the node could race the setup and steal the magic cookie, which may let them connect to that user's X11 session. A job would only be impacted if --x11 was requested at submission time. This was reported by Jonas Stare (NSC).

Downloads are available here.

Slurm version 20.11.0rc1 is now available

We are pleased to announce the availability of Slurm release candidate version 20.11.0rc1.

Slurm 20.11 includes a number of new features including:

  • Overhaul of the job step management and launch code, alongside improved GPU task placement support.
  • A new "Interactive Step" mode of operation for salloc.
  • A new "scrontab" command that can be used to submit and manage periodically repeating jobs.
  • IPv6 support.
  • Changes to the reservation logic, with new options allowing users to delete reservations, allowing admins to skip the next occurrence of a repeated reservation, and allowing for a job to be submitted and eligible to run within multiple reservations.
  • Dynamic Future Nodes - automatically associate a dynamically provisioned (or "cloud") node against a NodeName definition with matching hardware.
  • An experimental new RPC queuing mode for slurmctld to reduce thread contention on heavily loaded clusters.
Please see the RELEASE_NOTES distributed alongside the source for further details.

This is the first release candidate version of the upcoming 20.11 release series, and represents the end of development for the release cycle, and a finalization of the RPC and state file formats.

If any issues are identified with this new release candidate, please report them through SchedMD's Bugzilla against the 20.11.x version and we will address them before the first production 20.11.0 release is made.

Please note that the release candidates are not intended for production use. Barring any late-discovered issues, the state file formats should not change between now and 20.11.0 and are considered frozen at this time for the 20.11 release.

A preview of the updated documentation can be found here.

Downloads are available here.

Slurm version 20.02.5 is now available

We are pleased to announce the availability of Slurm version 20.02.5.

This includes an extended set of fixes of varying severity since the last maintenance release a month ago.

Downloads are available here.

Slurm version 20.02.4 is now available

We are pleased to announce the availability of Slurm version 20.02.4.

This includes an extended set of fixes of varying severity since the last maintenance release was made more than two months ago.

Downloads are available here.

Slurm versions 20.02.3 and 19.05.7 are now available (CVE-2020-12693)

Slurm versions 20.02.3 and 19.05.7 are now available, and include a series of recent bug fixes, as well as a fix for a security issue with the optional message aggregation feature.

SchedMD customers were informed on May 7th and provided a patch on request; this process is documented in our security policy.

CVE-2020-12693:
A review of what was intended to be a minor cleanup patch uncovered an underlying race condition for systems with Message Aggregation enabled. This race condition could allow a user to launch a process as an arbitrary user.
This is only an issue for systems with Message Aggregation enabled, which we expect to be a small number of Slurm installations in practice.
Message Aggregation is off in Slurm by default, and is only enabled by setting MsgAggregationParams=WindowMsgs=<number>, where <number> is greater than 1.
(Using Message Aggregation on your systems is not a recommended configuration at this time, and we may retire this subsystem in a future Slurm release in favor of other RPC aggregation techniques; note, however, that care must be taken before disabling it to avoid communication issues.)

Downloads are available here.

Slurm version 20.02.2 is now available

We are pleased to announce the availability of Slurm version 20.02.2.

This includes a series of moderate and minor fixes since the last maintenance releases for both branches.

Downloads are available here.

Slurm versions 20.02.1 and 19.05.6 are now available

We are pleased to announce the availability of Slurm versions 20.02.1 and 19.05.6.

This includes a series of minor fixes since the last maintenance releases for both branches.

Please note that the 19.05.6 release is expected to be the last maintenance release of that branch (barring any critical security issues) as our support team has shifted their attention to the 20.02 release. Also note that support for the 18.08 release ended in February; SchedMD customers are encouraged to upgrade to a supported major release (20.02 or 19.05) at their earliest convenience.

Downloads are available here.

Slurm version 20.02.0 is now available

We are pleased to announce the availability of Slurm version 20.02.0.

Highlights of the 20.02 release include:

  • A "configless" method of deploying Slurm within the cluster, in which the slurmd and user commands can use DNS SRV records to locate the slurmctld host and automatically download the relevant configuration files.
  • A new "auth/jwt" authentication mechanism using JWT, which can help integrate untrusted external systems into the cluster.
  • A new "slurmrestd" command/daemon which translates a new Slurm REST API into the underlying libslurm calls.
  • Packaging fixes for RHEL8 distributions.
  • Significant performance improvements to the backfill scheduler, as well as to string construction and processing.
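As a sketch of the "configless" lookup, a DNS zone can carry a SRV record such as the following; the hostname, TTL, and priority/weight here are examples (6817 is the default slurmctld port):

```
; Illustrative BIND zone fragment for "configless" Slurm.
; slurmd and the user commands query for _slurmctld._tcp to find
; the slurmctld host from which to fetch the configuration files.
_slurmctld._tcp 3600 IN SRV 10 10 6817 ctld.cluster.example.com.
```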

Thank you to all customers, partners, and community members who contributed to this release.

As with past releases, the documentation has been updated to the 20.02 release. Past versions are available in the archive. This release also marks the end of support for the 18.08 release. The 19.05 release will remain supported up until the 20.11 release in November, but will not see as frequent updates, and bug-fixes will be targeted for the 20.02 maintenance releases going forward.

Downloads are available here.

Slurm version 20.02.0rc1 is now available

We are pleased to announce the availability of Slurm release preview version 20.02.0rc1.

This is the first preview of the upcoming 20.02 release series, and marks the finalization of the RPC and state file formats.

This rc1 also includes the first version of the Slurm REST API, as implemented in the new slurmrestd command / daemon. The slurmrestd command acts as a REST proxy to the libslurm internal API, and can be used alongside the new auth/jwt authentication mechanism to integrate Slurm into external systems.
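As an illustration (not taken verbatim from the Slurm documentation), a REST call through slurmrestd using a JWT might be assembled as below; the host, port, API version path, user name, and token are placeholders, while the X-SLURM-USER-* header names follow the slurmrestd documentation:

```python
import urllib.request

# Hypothetical host, port, API version, user, and token; adjust to your site.
token = "REPLACE-WITH-JWT"  # e.g. issued via the new auth/jwt mechanism
req = urllib.request.Request(
    "http://restd.cluster.example.com:6820/slurm/v0.0.35/jobs",
    headers={
        "X-SLURM-USER-NAME": "alice",  # user the request acts as
        "X-SLURM-USER-TOKEN": token,   # JWT authenticating that user
    },
)
# On a live system the request would then be sent:
# with urllib.request.urlopen(req) as resp:
#     jobs = resp.read()
```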

A high-level overview of some of the new features and other changes in the 20.02 release was presented at SLUG'19, and is archived here.

The Release Notes also include a summary of the major changes.

If any issues are identified with this new release candidate, please report them through Bugzilla against the 20.02.x version and we will address them before the first production 20.02.0 release is made.

A preview of the updated documentation can be found here. Once 20.02 is released, the main documentation page will be switched over to this newer content.

Downloads are available here.

Slurm version 20.02.0pre1 is now available

We are pleased to announce the availability of Slurm release preview version 20.02.0pre1.

This is the first preview of the upcoming 20.02 release series, and represents the end of development for the release cycle. The first release candidate — 20.02.0rc1 — is expected out next week, and will mark the finalization of the RPC and state file formats.

A high-level overview of some of the new features and other changes in the 20.02 release was presented at SLUG'19, and is archived here.

The Release Notes also include a summary of the major changes.

If any issues are identified with this new release candidate, please report them through Bugzilla against the 20.02.x version and we will address them before the first production 20.02.0 release is made.

A preview of the updated documentation can be found here. Once 20.02 is released, the main documentation page will be switched over to this newer content.

Downloads are available here.

Slurm versions 19.05.5 and 18.08.9 are now available (CVE-2019-19727 and CVE-2019-19728)

Slurm versions 19.05.5 and 18.08.9 are now available, and include a series of recent bug fixes, as well as a fix for two moderate security vulnerabilities discussed below.

SchedMD customers were informed on December 11th and provided a patch on request; this process is documented in our security policy.

CVE-2019-19727:
Johannes Segitz from SUSE reported that slurmdbd.conf may be installed with insecure permissions by certain Slurm packaging systems.

Slurm itself — as shipped by SchedMD — does not manage slurmdbd.conf directly, but the slurmdbd.conf.example sets a poor example by installing itself with 0644 permissions instead of 0600 in both the slurm.spec and slurm.spec-legacy packaging scripts.

Sites are encouraged to verify that the slurmdbd.conf file - which will usually contain your MySQL user and password - is secure on their clusters. Note that this configuration file is only needed by the primary (and optional backup) slurmdbd servers, and does not need to be accessible throughout the cluster.
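The permission check itself is easy to script. A minimal sketch follows, run here against a stand-in file since the real path is typically /etc/slurm/slurmdbd.conf and varies by packaging:

```python
import os
import stat
import tempfile

# Stand-in for /etc/slurm/slurmdbd.conf on a real cluster.
path = os.path.join(tempfile.mkdtemp(), "slurmdbd.conf")
with open(path, "w") as f:
    f.write("StoragePass=example-secret\n")

os.chmod(path, 0o600)  # owner read/write only, as recommended
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))
```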

CVE-2019-19728:
Harald Barth from the KTH Royal Institute of Technology reported that "srun --uid" may not always drop into the correct user account, and instead will print a warning message but launch the tasks as root.

Note that "srun --uid" is only available to the root user, and that this issue only manifests through a race condition between successive lookup calls within the srun client command. SchedMD does not recommend use of the "srun --uid" option (e.g., it does not load the target user's environment but will export the root user's) and may remove this option in a future release.

Downloads are available here.

Slurm version 19.05.4 is now available, SC19

Slurm version 19.05.4 is now available, and includes a series of fixes since 19.05.3 was released last month.

For those of you who will be at SC19 in Denver: we hope to see you at the Slurm booth (#1571), and at the Slurm "Birds of a Feather" session on Thursday, November 21st, from 12:15 - 1:15pm, in rooms 401/402/403/404. As always, there will be a number of presentations at the Slurm booth - please check the display in the booth for the schedule.

Downloads are available here.

Slurm version 19.05.3 is now available

Slurm version 19.05.3 is now available, and includes a series of minor bug fixes since 19.05.2 was released nearly two months ago.

Downloads are available here.

Slurm version 19.05.2 is now available

Slurm version 19.05.2 is now available, and includes a series of minor bug fixes since 19.05.1 was released over a month ago.

Downloads are available here.

Slurm versions 19.05.1 and 18.08.8 are now available (CVE-2019-12838)

Slurm versions 19.05.1 and 18.08.8 are now available, and include a series of recent bug fixes, as well as a fix for a security vulnerability (CVE-2019-12838) related to the 'sacctmgr archive load' functionality.

While fixes are only available for the currently supported 19.05 and 18.08 releases, similar vulnerabilities affect past versions as well, and sites are encouraged to upgrade to a supported version.

SchedMD customers were informed on June 26th and provided a patch on request; this process is documented in our security policy.

Downloads are available here.

Slurm version 19.05.0 is now available

We are pleased to announce the availability of Slurm version 19.05.0.

Highlights of the 19.05 release include:

  • The new select/cons_tres plugin, which introduces new GPU-specific job submission options, and extends Slurm's backfill scheduling logic to cover resources beyond just cpus and memory.
  • A new NSS library - nss_slurm - has been developed, which can provide directory info for the job step's user to local processes.
  • Heterogeneous Job support on Cray Aries systems.
  • A new "Association" priority factor, and corresponding PriorityWeightAssoc setting, providing for an alternative approach to establishing relative priority values between groups.
  • Two new plugin APIs intended for sites to customize their Slurm installations: cli_filter and site_factor.
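As a sketch of the new GPU-oriented submission options introduced with select/cons_tres (option names per the 19.05 release; the script body is a placeholder):

```
#!/bin/bash
#SBATCH --gpus-per-node=2     # new cons_tres GPU option
#SBATCH --mem-per-gpu=16G     # memory sized per allocated GPU
#SBATCH --ntasks=4
srun ./my_gpu_app             # placeholder application
```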

Thank you to all customers, partners, and community members who contributed to getting this release done.

As with past releases, the documentation available at https://slurm.schedmd.com has been updated to the 19.05 release. Past versions are available in the archive. This release also marks the end of support for the 17.11 release. The 18.08 release will remain supported up until the 20.02 release in February, but will stop receiving as frequent updates, and bug-fixes will be targeted for the 19.05 maintenance releases going forward.

Downloads are available here.

Slurm release candidate version 19.05.0rc1 available for testing

We are pleased to announce the availability of Slurm release candidate version 19.05.0rc1.

This is the first release candidate version of the upcoming 19.05 release series, and represents the end of development for the release cycle, and a finalization of the RPC and state file formats.

If any issues are identified with this new release candidate, please report them through bugs.schedmd.com against the 19.05.x version and we will address them before the first production 19.05.0 release is made.

Please note that the release candidates are not intended for production use. Barring any late-discovered issues, the state file formats should not change between now and 19.05.0 and are considered frozen at this time for the 19.05 release.

A preview of the updated documentation can be found here.

Downloads are available here.

Slurm version 18.08.7 is now available

We are pleased to announce the availability of Slurm version 18.08.7.

This includes over 20 fixes since 18.08.6 was released last month, including one for a regression that caused issues with "sacct -J" not returning results correctly.

Downloads are available here.

Slurm version 18.08.6 is now available, as well as 19.05.0pre2, and Slurm on GCP Update

We are pleased to announce the availability of Slurm version 18.08.6, as well as the second 19.05 release preview version 19.05.0pre2.

The 18.08.6 release includes over 50 fixes since the last maintenance release was made five weeks ago.

The second preview of the 19.05 release - 19.05.0pre2 - is meant to highlight additional functionality coming with the new select/cons_tres plugin, alongside other recent development work. Please consult the RELEASE_NOTES file for a detailed list of changes made to date.

Please note that preview releases are meant for testing and development only, and should not be used in production, are not supported, and that you cannot migrate to a newer release from these without potential loss of data and your job queues.

I'd also like to call attention to some of our recent work in partnership with Google. There's a blog post today highlighting some of this recent work both on Slurm and with the slurm-gcp integration scripts: HPC made easy: Announcing new features for Slurm on GCP

Downloads are available here.

Slurm versions 17.11.13 and 18.08.5 are now available

Slurm versions 17.11.13 and 18.08.5 are now available, and include a series of recent bug fixes, as well as a fix for a security vulnerability (CVE-2019-6438) on 32-bit systems. We believe that 64-bit builds - the overwhelming majority of installations - of Slurm are not affected by this issue.

Downloads are available here.

While fixes are only available for the supported 17.11 and 18.08 releases, similar vulnerabilities affect 32-bit builds on past versions as well. The only resolution is to upgrade Slurm to a fixed release.

SchedMD customers were informed on January 16th and provided a patch on request; this process is documented in our security policy.

Slurm version 18.08.4 is now available

We are pleased to announce the availability of Slurm version 18.08.4.

This includes over 70 fixes since 18.08.3 was released in October.

Downloads are available here.

Slurm versions 18.08.3 and 17.11.12 are now available

We are pleased to announce the availability of Slurm versions 18.08.3 and 17.11.12.

These versions include a fix for a regression introduced in 18.08.2 and 17.11.11 that could lead to a loss of accounting records if the slurmdbd was offline. All sites with 18.08.2 or 17.11.11 slurmctld processes are encouraged to upgrade them ASAP.

Downloads are available here.

Slurm versions 18.08.2 and 17.11.11 are now available, as well as 19.05.0pre1

We are pleased to announce the availability of Slurm versions 18.08.2 and 17.11.11, as well as the first 19.05 release preview version 19.05.0pre1.

These versions include a fix for a regression introduced in 18.08.1 and 17.11.10 that prevented the --get-user-env option from working correctly, alongside a few other minor changes.

The first preview of the 19.05 release - 19.05.0pre1 - is meant to highlight additional functionality coming with the new select/cons_tres plugin. Further details on this are in the presentation from SLUG'18 which will be online (along with the rest of the SLUG'18 presentations) in the next week. Please note that preview releases are meant for testing and development only, and should not be used in production, are not supported, and that you cannot migrate to a newer release from these without potential loss of data and your job queues.

Downloads are available here.

Slurm versions 18.08.1 and 17.11.10 are now available

We are pleased to announce the availability of Slurm versions 18.08.1 and 17.11.10.

This includes an extensive set of fixes made since 18.08.0 was released at the end of August, and for 17.11.10 since 17.11.9 was released at the start of August.

Please note that the 17.11.10 release is expected to be the last maintenance release of that series (barring any critical security issues) as our support team has shifted their attention to the 18.08 release. Also note that support for 17.02 ended in August; SchedMD customers are encouraged to upgrade to a supported major release (18.08 or 17.11) at their earliest convenience.

Downloads are available here.

Slurm version 18.08.0 is now available

We are pleased to announce the availability of Slurm release version 18.08.0.

Downloads are available here.

Thank you to all customers, partners, and community members who contributed to getting this release done.

As with past releases, the documentation has been updated to the 18.08 release. Past versions are available in the archive. This release also marks the end of support for the 17.02 release. The 17.11 release will remain supported up until the 19.05 release next spring, but will stop receiving as frequent updates, and bug-fixes will be targeted for the 18.08 maintenance releases going forward.

Slurm 18.08 includes contributions by (alphabetically by last name): Danny Auble, Dan Barke, Dominik Bartkiewicz, Jason Booth, Bill Brophy, Thomas Cadeau, Brian Christiansen, Jeff Frey, Broderick Gardner, Marshall Garey, Isaac Hartung, Michael Hinton, Doug Jacobsen, Morris Jette, Boris Karasev, Ben Matthews, Felip Moll, Jessica Nettelblad, Artem Polyakov, Alejandro Sanchez, Marcin Stolarek, Tim Wickberg, Yair Yarom.

Slurm release candidate version 18.08.0rc1 is now available

We are pleased to announce the availability of Slurm release candidate version 18.08.0rc1.

This is the first release candidate version of the upcoming 18.08 release series, and represents the end of development for the release cycle, and a finalization of the RPC and state file formats.

If any issues are identified with this new release candidate, please report them through https://bugs.schedmd.com against the 18.08.x version and we will address them before the first production 18.08.0 release is made.

Please note that the release candidates are not intended for production use. Barring any late-discovered issues, the state file formats should not change between now and 18.08.0 and are considered frozen at this time for the 18.08 release.

Downloads are available here.

Slurm version 17.11.9 is now available

We are pleased to announce the availability of Slurm version 17.11.9.

This includes 10 fixes made since 17.11.8 was released last month, including a fix to prevent hung srun processes that can manifest during massively parallel jobs.

Downloads are available here.

Slurm pre-release version 18.08.0pre2 is now available

We are pleased to announce the availability of Slurm pre-release version 18.08.0pre2.

This is the second pre-release version of the upcoming 18.08 release series, and represents a working snapshot of recent developments. Interested parties are encouraged to test this out ahead of the RPC, ABI, and state file format freezes that will occur when the first release candidate is made available later next week.

If any issues are identified with this new pre-release, please report them through https://bugs.schedmd.com against the 18.08.x version and we will try to address them before the first release candidate is made available.

Please note that the pre-release versions are not intended for production use, and that the state file formats are still subject to change. As such, you should not expect to safely transition between the pre-release versions and the eventual release. This is intended for local testing, especially of unusual system configurations, ahead of the RPC freeze and the first production 18.08.0 release.

Downloads are available here.

Slurm version 17.11.8 is now available

We are pleased to announce the availability of Slurm version 17.11.8.

This includes over 30 fixes made since 17.11.7 was released at the end of May, including a change to the slurmd.service file used with systemd that prevents systemd from destroying the cgroup hierarchies slurmd/slurmstepd have created whenever "systemctl daemon-reload" is called (e.g., by yum/rpm).
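A sketch of the relevant unit-file setting, assuming the fix relies on systemd's Delegate option (check the slurmd.service file shipped with 17.11.8 for the authoritative change):

```
# slurmd.service fragment (sketch): with Delegate=yes, systemd treats the
# unit's cgroup subtree as externally managed and leaves it intact across
# "systemctl daemon-reload".
[Service]
Delegate=yes
```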

Downloads are available here.

Slurm versions 17.02.11 and 17.11.7 are now available

Slurm versions 17.02.11 and 17.11.7 are now available, and include a series of recent bug fixes, as well as a fix for a recently discovered security vulnerability (CVE-2018-10995).

Downloads are available here.

While fixes are only available for the supported 17.02 and 17.11 releases, we believe similar vulnerabilities do affect past versions as well. The only resolution is to upgrade Slurm to a fixed release.

SchedMD customers were informed on May 16th and provided a patch on request. This is in keeping with our responsible disclosure process.

Slurm version 17.11.6 is now available

We are pleased to announce the availability of Slurm version 17.11.6.

This includes over 50 fixes made since 17.11.5 was released eight weeks ago, including a race condition within the slurmstepd that can lead to hung extern steps.

Downloads are available here.

Slurm versions 17.02.10 and 17.11.5 are now available

Slurm versions 17.02.10 and 17.11.5 are now available, and include a series of recent bug fixes, as well as a fix for a recently discovered security vulnerability (CVE-2018-7033).

Downloads are available here.

Several issues were discovered with incomplete sanitization of user-provided text strings, which could potentially lead to SQL injection attacks against SlurmDBD itself. Such exploits could lead to a loss of accounting data, or escalation of user privileges on the cluster.

We believe that variations on these vulnerabilities exist in all past SlurmDBD implementations back to Slurm 1.3 when the SlurmDBD was introduced, continuing through the current supported stable releases (17.02 and 17.11).

SchedMD customers were informed on March 1st and provided a patch on request. This is in keeping with our responsible disclosure process.

The only safe mitigation, aside from installing these updated versions, is to disable slurmdbd on your system.

One additional note: some sites have reported issues when upgrading to the Slurm 17.11 release series while using MySQL version 5.1 (which was the default in RHEL 6) and older. SchedMD customers are encouraged to contact support before upgrading such systems, and/or to upgrade their MySQL installation ahead of a SlurmDBD upgrade to 17.11.

Slurm version 17.11.4 is now available

We are pleased to announce the availability of Slurm version 17.11.4.

This includes roughly 38 fixes made since 17.11.3 was released earlier this month.

Downloads are available here.

Slurm version 18.08.0-pre1 available

Slurm version 18.08.0-pre1 is the first pre-release of version 18.08, to be released in August 2018. Slurm downloads are available from http://www.schedmd.com/#repos

Slurm version 17.11.3 is now available

We are pleased to announce the availability of Slurm version 17.11.3.

This includes over 44 fixes made since 17.11.2 was released last month, including one issue that can result in stray processes when a job is canceled during a long-running prolog script.

Downloads are available here.

Slurm version 17.11.2 is now available

We are pleased to announce the availability of Slurm version 17.11.2.

Notably, we found an issue with the way auto_increment was working in MySQL, where it would lose the offset of the id field, making it possible for an older dynamic TRES (i.e. gres/gpu) to be overwritten by the new billing TRES. This issue only affects those running 17.11.[0|1]. Please review the RELEASE_NOTES file for more information.
Slurm can be downloaded from here.

Slurm version 17.11.1 is now available

We are pleased to announce the immediate availability of Slurm version 17.11.1.

This includes roughly 40 fixes made since 17.11.0 was released three weeks ago, including one critical fix for any systems running on big-endian platforms.

Slurm can be downloaded from here.

Slurm version 17.11.0 is now available

After 9 months of development and testing we are pleased to announce the availability of Slurm version 17.11.0!

As usual this can be downloaded from here.

Thanks again for all the help and support to get this out the door. It was fun to see most of you at SC17!

Slurm version 17.11.0rc3 is now available

We are pleased to announce the availability of Slurm version 17.11.0-0rc3 (release candidate 3).

The release candidate series reflects the end of feature development for each release and the finalization of the RPC layer, and will - except for bug fixes developed during the RC time frame - be functionally identical to the 17.11.0 release when made available.

Please install and test this to help find any issues during the rc stage before the 17.11.0 release at the end of November.

Downloads are available here.

Slurm version 16.05.11, 17.02.9, and 17.11.0rc2 are now available

Slurm versions 16.05.11, 17.02.9 and 17.11.0rc2 are now available, and include a series of recent bug fixes as well as a fix for a recently discovered security vulnerability (CVE-2017-15566).

Downloads are available here.

Ryan Day (LLNL) reported an issue in SPANK environment variable handling that could allow any normal user to execute code as root during the Prolog or Epilog. All systems using a Prolog or Epilog script are vulnerable, regardless of whether SPANK plugins are in use.

This issue affects all Slurm versions from 15.08.0 (August 2015) to present. This issue was reported to SchedMD on October 16th. SchedMD customers were informed on October 17th and provided a patch on request. This is in keeping with our responsible disclosure process.

The only mitigation, aside from installing a patched version, is to disable both Prolog and Epilog settings on your system and restart all slurmd processes.

Release notes follow below. Please note that support for the 16.05 release series ends in November as support for the upcoming 17.11 release starts, and as such 16.05.11 will be the final maintenance update for that branch.

Note that 17.11.0rc2 is the second release candidate for the 17.11 series, and is not considered a stable release suited for production use. We do encourage sites to test this out, and report issues ahead of the 17.11.0 release in November.

Slurm version 17.11.0-0rc1 is now available

We are pleased to announce the availability of Slurm version 17.11.0-0rc1 (release candidate 1). This release marks the end of development on the next major version of Slurm, 17.11; only bug fixes will be added to this branch going forward.

We anticipate another rc release before a .0 is tagged in November. Please install and test this to help find any issues during the rc stage.

Slurm can be downloaded from here.

Slurm version 17.02.8 available

We are pleased to announce the release of Slurm version 17.02.8, which contains 42 bug fixes developed over the past two months.

Slurm can be downloaded from here.

Slurm versions 17.02.7 and 17.11.0-pre2 are now available

Slurm version 17.02.7 contains about 35 bug fixes developed over the past six weeks.

Slurm version 17.11.0-pre2 is the second pre-release of version 17.11, to be released in November 2017.

Slurm downloads are available from http://www.schedmd.com/#repos

Slurm version 17.02.6 is now available

Slurm version 17.02.6 is now available. It contains several bug fixes, including one which can result in communications between the slurmctld and slurmdbd daemons stopping.

Slurm downloads are available from http://www.schedmd.com/#repos

Slurm versions 17.02.5 and 17.11.0-pre1 are now available

Slurm version 17.02.5 contains 18 bug fixes developed over the past month.

Slurm version 17.11.0-pre1 is the first pre-release of version 17.11, to be released in November 2017. This version contains support for scheduling a workload across a set (federation) of clusters, which is described in some detail here.

Slurm downloads are available from here.

Slurm version 17.02.4 is now available

We are pleased to announce the release of Slurm version 17.02.4, which contains about 40 fixes developed over the past month.

Slurm can be downloaded from here.

Slurm version 17.02.3 is now available

We are pleased to announce the release of Slurm version 17.02.3, which contains 40 bug fixes developed over the past month.

Slurm can be downloaded from here.

Slurm version 17.02.2 available

We are pleased to announce the release of Slurm version 17.02.2, which contains 49 bug fixes developed over the past month.

Slurm can be downloaded from here.

Slurm versions 17.02.1 and 16.05.10 are now available.

We are pleased to announce the release of versions 17.02.1 and 16.05.10.

Version 17.02.1 contains 19 bug fixes discovered over the past week, including one for a deadlock in the slurmctld daemon. Version 16.05.10 contains 30 relatively minor bug fixes discovered over the past 5 weeks. Future changes to version 16.05 will be limited to more significant bugs, with our focus shifting to version 17.02.

Both versions can be downloaded from here.

Slurm version 17.02.0 is now available

After 9 months of development we are pleased to announce the availability of Slurm version 17.02.0.

For a description of what has changed please consult the RELEASE_NOTES file available in the source.

Slurm downloads are available here.

Slurm versions 16.05.9 and 17.02.0-0rc1 are now available

We are pleased to announce the availability of Slurm versions 16.05.9 and 17.02.0-0rc1 (release candidate 1).

16.05.9 contains around 25 rather minor bug fixes. Please upgrade at your leisure.

The rc release contains all of the features intended for release 17.02. Development has ended for this release, and we are continuing with our testing phase, which will most likely result in another rc before we tag 17.02.0 near the middle of February. A description of what this release contains is in the RELEASE_NOTES file available in the source. Your help in hardening this version is greatly appreciated; you are invited to download this version and assist in testing. As with all rc releases, you should be able to install it without worrying about protocol/state changes going forward within this version.

Slurm downloads are available from here.

Slurm versions 15.08.13, 16.05.8, and 17.02.0-pre4 are now available

Slurm versions 15.08.13, 16.05.8 and 17.02.0-pre4 are now available, and include a series of recent bug fixes as well as a fix for a recently discovered security vulnerability (CVE-2016-10030).

During a code review on a recent commit, a vulnerability was discovered in how the slurmd daemon informs users of a Prolog failure on a compute node. That vulnerability could allow a user to assume control of an arbitrary file on the system. Any exploitation of this is dependent on the user being able to cause or anticipate the failure (non-zero return code) of a Prolog script that their job would run on.

This issue affects all Slurm versions from 0.6.0 (September 2005) to present. This issue was discovered on December 16th. SchedMD customers were informed on December 21st and provided a version of the fix on request.

Workarounds to prevent exploitation of this are to either disable your Prolog script, or modify it such that it always returns 0 ("success") and adjust it to set the node down using scontrol instead of relying on the slurmd to handle that automatically. If you do not have a Prolog set you are unaffected by this issue.

Downloads are available here.

Slurm version 16.05.7 is now available

We are pleased to announce the immediate availability of Slurm 16.05.7. It contains about 40 relatively minor bug fixes.

Slurm downloads are available from here.

Slurm versions 16.05.6 and 17.02.0-pre3 are now available

Slurm version 16.05.6 is now available and includes around 40 bug fixes developed over the past month.

We have also made the third pre-release of version 17.02, which is under development and scheduled for release in February 2017.

Slurm downloads are available from here.

We are excited to see you all next month at SC16, please feel free to come by our booth #412.

The Slurm BoF will be Thursday, November 17th, 12:15pm - 1:15pm in room 355-E.

More information about that can be found here.

Slurm versions 16.05.5 and 17.02.0-pre2 are now available

Slurm version 16.05.5 is now available and includes about 50 bug fixes developed over the past six weeks. We have also made the second pre-release of version 17.02, which is under development and scheduled for release in February 2017. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 16.05.4 is now available

Slurm version 16.05.4 is now available and includes about 30 bug fixes developed over the past few weeks.

Slurm downloads are available from here.

Slurm 16.05.3 and 17.02.0-pre1 are now available

Slurm version 16.05.3 is now available and includes about 30 bug fixes developed over the past few weeks. We have also released the first pre-release of version 17.02, which is under development and scheduled for release in February 2017. A description of the changes in each version is appended.

Slurm downloads are available from here.

Slurm version 16.05.2 is now available

Slurm version 16.05.2 is now available and includes 16 bug fixes developed over the past week, including two which can cause the slurmctld daemon to crash. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 16.05.1 is now available

Slurm version 16.05.1 is now available and includes 40 bug fixes developed over the past month. Slurm downloads are available from http://www.schedmd.com/#repos.

CFP: Slurm User Group Meeting

You are invited to submit an abstract of a tutorial, technical presentation or site report to be given at the Slurm User Group Meeting 2016. This event is sponsored and organized by SchedMD and the Greek Research and Technology Network (GRNET) and will be held in Athens, Greece on 26-27 September 2016. This international event is open to those who want to:

  • Learn more about Slurm, a highly scalable Resource Manager and Job Scheduler
  • Share their knowledge and experience with other users and administrators
  • Get detailed information about the latest features and developments
  • Share requirements and discuss future developments

Everyone who wants to present their own usage, developments, site report, or tutorial about Slurm is invited to send an abstract to slugc@schedmd.com.

Slurm 16.05.0 and 15.08.12 are now available

We are pleased to announce the release of 16.05.0! It contains many new features and performance enhancements. Please read the RELEASE_NOTES file to get an idea of the new items that have been added. The online Slurm documentation has been updated to reflect this release.

We have also released one of the last tags of 15.08 in the form of 15.08.12.

Both versions can be downloaded from the normal spot.

Slurm version 16.05.0-rc2 available

Slurm version 16.05.0-rc2 (Release Candidate 2) is now available and includes about 11 bug fixes developed over the past week. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 15.08.11 and 16.05.0-rc1 now available

We are pleased to announce the availability of Slurm versions 15.08.11 and 16.05.0-rc1 (release candidate 1).

15.08.11 contains around 25 rather minor bug fixes. Please upgrade at your leisure.

The rc release contains all of the features intended for release 16.05. Development has ended for this release and we are continuing with our testing phase which will most likely result in another rc before we tag 16.05.0 near the end of the month. A description of what this release contains is in the RELEASE_NOTES file available in the source. Your help in hardening this version is greatly appreciated. You are invited to download this version and assist in testing.

Slurm downloads are available from here.

Slurm version 15.08.10 now available

Slurm version 15.08.10 is now available and includes 10 bug fixes developed over the past week including a race condition that could cause the slurmctld daemon to crash. Details about the changes are listed in the distribution's NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 15.08.9 and 16.05.0-pre2 now available

Slurm version 15.08.9 is now available and includes about 40 bug fixes developed over the past six weeks. Details about the changes are listed in the distribution's NEWS file. Slurm version 16.05.0-pre2 is also available and includes new development for the next major release in May. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 15.08.8 and 16.05.0-pre1 now available

Slurm version 15.08.8 is now available and includes about 30 bug fixes developed over the past four weeks. Details about the changes are listed in the distribution's NEWS file. Slurm version 16.05.0-pre1 is also available and includes new development for the next major release in May. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 15.08.7 is now available

We are pleased to announce the availability of Slurm version 15.08.7. It contains 46 relatively minor bug fixes you may find interesting. Slurm downloads are available from here.

Slurm version 15.08.6 is now available

We are pleased to announce the availability of Slurm version 15.08.6. This release is primarily in response to the regression in 15.08.5 with respect to finding the lua library. It also contains a few other minor bug fixes you may find interesting. Slurm downloads are available from here.

We hope everyone has a great holiday and thanks for a great year!

Slurm version 15.08.5 is now available

We are pleased to announce the availability of Slurm version 15.08.5 which includes about 30 bug fixes developed over the past few weeks as listed below. Slurm downloads are available from here.

Slurm version 15.08.4 is now available

Slurm version 15.08.4 is now available and includes about 25 bug fixes developed over the past couple of weeks.

One notable fix, found in commits 8e66e2677 and d72f132d42, addresses a slurmctld bug in which a pending job array could be canceled by a user other than the owner or the administrator. This issue appears to exist in the 15.08.* as well as the 14.11.* branches.

It is recommended you update at your earliest convenience. If upgrading isn't an option, generating a patch from those two commits is recommended.

Details about the changes are listed in the distribution's NEWS file. Slurm downloads are available from here.

See you all at SC15 next week, Slurm booth #1851!

Slurm version 15.08.3 now available

Slurm version 15.08.3 includes about 25 bug fixes developed over the past couple of weeks. Details about the changes are listed in the distribution's NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 15.08.2 now available

Slurm version 15.08.2 includes about 40 bug fixes developed over the past four weeks. Details about the changes are listed in the distribution's NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm version 15.08.1 is now available

We are pleased to announce the availability of Slurm version 15.08.1 with about 40 bug fixes to 15.08.0. A list of changes is available in the NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 15.08.0 and 14.11.9 have been released!

We are pleased to announce the release of 15.08.0! It contains many new features and performance enhancements. Please read the RELEASE_NOTES file to get an idea of the new items that have been added. The on-line Slurm documentation has been updated to reflect this release.

We have also released one of the last tags of 14.11 in the form of 14.11.9.

Both versions can be downloaded from the normal spot here.

Slurm version 15.08.0-rc1 is now available

We are pleased to announce the availability of Slurm version 15.08.0-rc1 (release candidate 1). This version contains all of the features intended for release 15.08 (with the exception of some minor burst buffer work) and we are moving into a testing phase. You are invited to download this version and assist in testing.

Slurm downloads are available from here.

If you would like to find out more about these new features and others, please join us at the Slurm User Group meeting.

Slurm versions 14.11.8 and 15.08.0-pre6 are now available

Slurm version 14.11.8 includes about 30 relatively minor bug fixes developed over the past seven weeks while version 15.08.0-pre6 contains new development scheduled for release next month. Details about the changes are listed in the distribution's NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.11.7 and 15.08.0-pre5 now available

Slurm version 14.11.7 is now available with quite a few bug fixes.

A development tag for 15.08 (pre5) has also been made. It represents the current state of Slurm development for the release planned in August 2015 and is intended for development and test purposes only. One notable enhancement included is the idea of Trackable Resources (TRES) for accounting for cpu, memory, energy, GRES, licenses, etc.

Both are available for download here.

Slurm version 14.11.6 is now available

Slurm version 14.11.6 is now available with quite a few bug fixes. See the distribution's NEWS file for details. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.11.5 and 15.08.0-pre3 are now available

Version 14.11.5 contains quite a few bug fixes generated over the past five weeks, including two high-impact bugs. There is a fix for the slurmdbd daemon aborting if a node is set to a DOWN state and its "reason" field is NULL. The other important bug fix prevents someone from being able to kill a job array belonging to another user.

Version 15.08.0-pre3 represents the current state of Slurm development for the release planned in August 2015 and is intended for development and test purposes only. Notable enhancements include power capping support for Cray systems and the ability for a compute node to be allocated to multiple jobs, but restricted to one user at a time.

Both versions can be downloaded from here.

Slurm versions 14.11.4 and 15.08.0-pre2 are now available

Version 14.11.4 contains quite a few bug fixes generated over the past five weeks. Several of these are related to job arrays, including one that can cause the slurmctld daemon to abort.

Version 15.08.0-pre2 represents the current state of Slurm development for the release planned in August 2015 and is intended for development and test purposes only. It includes some development work for burst buffers, power management, and inter-cluster job dependencies.

Both versions can be downloaded from here.

Slurm version 14.11.3 is now available

Slurm version 14.11.3 is now available. Version 14.11.3 includes quite a few bug fixes, most of which are relatively minor. A few more serious issues were also fixed that previously could cause various daemons to segfault in corner-case scenarios.

Anyone running 14.11 is encouraged to upgrade to 14.11.3. Everyone else is encouraged to do the same :).

The new tarball can be downloaded here.

Slurm version 14.11.2 and 15.08.0-pre1 are now available

Slurm versions 14.11.2 and 15.08.0-pre1 are now available. Version 14.11.2 includes quite a few relatively minor bug fixes.

Version 15.08.0 is under active development and its release is planned in August 2015. While this is the first pre-release there is already quite a bit of new functionality.

Both versions can be downloaded from here.

Slurm version 14.11.1 is now available

Slurm version 14.11.1 is now available. This includes a fix for a race condition that can deadlock the slurmctld daemon when job_submit plugins are used, plus a few minor changes as identified in the distribution's NEWS file. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.11.0 is now available

Slurm version 14.11.0 is now available. This is a major Slurm release with many new features. See the RELEASE_NOTES and NEWS files in the distribution for detailed descriptions of the changes, a few of which are noted below.

Upgrading from Slurm versions 2.6 or 14.03 should proceed without loss of jobs or other state. Just be sure to upgrade the slurmdbd first. (Upgrades from pre-releases of version 14.11 may result in job loss.)

Slurm downloads are available from here.

Thanks to all those who helped make this release!

Highlights of changes in Slurm version 14.11.0 include:
-- Added job array data structure and removed 64k array size restriction.
-- Added support for reserving CPUs and/or memory on a compute node for system use.
-- Added support for allocation of generic resources by model type for heterogeneous systems (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of any type).
-- Added support for non-consumable generic resources that are limited, but can be shared between jobs.
-- Added support for automatic job requeue policy based on exit value.
-- Refactor job_submit/lua interface. LUA FUNCTIONS NEED TO CHANGE! The lua script no longer needs to explicitly load meta-tables, but information is available directly using names slurm.reservations, slurm.jobs, slurm.log_info, etc. Also, the job_submit.lua script is reloaded when updated without restarting the slurmctld daemon.
-- Eliminate native Cray specific port management. Native Cray systems must now use the MpiParams configuration parameter to specify ports to be used for communications. When upgrading Native Cray systems from version 14.03, all running jobs should be killed and the switch_cray_state file (in SaveStateLocation of the nodes where the slurmctld daemon runs) must be explicitly deleted.
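
The typed-GRES support above is configured per node in gres.conf; a minimal sketch, assuming a hypothetical node with two GPU models (the type names and device paths are illustrative, not from the release notes):

```
# gres.conf sketch (hypothetical node): two GPUs of different models
Name=gpu Type=kepler File=/dev/nvidia0
Name=gpu Type=tesla  File=/dev/nvidia1
```

A job could then request a specific model with e.g. `sbatch --gres=gpu:kepler:1`, or a GPU of any type with `--gres=gpu:1`.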

Slurm versions 14.03.10 and 14.11.0-rc3 are now available

Slurm version 14.03.10 includes quite a few relatively minor bug fixes, and will most likely be the last 14.03 release. Thanks to all those who helped make this a very stable release.

We hope to officially tag 14.11.0 before SC14. Version 14.11.0-rc3 includes a few bug fixes discovered in recent testing but is looking very stable. Thanks to everyone participating in the testing! If you can, please test this release so we can attempt to fix as many issues as we can before we tag 14.11.0.

Just a heads up: version 15.08 is already under development, and we will most likely tag a pre1 of it later this month.

Slurm downloads are available from here.

Slurm versions 14.03.9 and 14.11.0-rc2 are now available

Version 14.03.9 includes quite a few relatively minor bug fixes. Version 14.11.0-rc2 includes a few bug fixes discovered in recent testing. Thanks to everyone participating in the testing! Version 14.11.0 is no longer under active development, but is undergoing testing for a planned release in early November. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.03.8 and 14.11.0-pre5 are now available

Slurm versions 14.03.8 and 14.11.0-pre5 are now available. Version 14.03.8 includes quite a few relatively minor bug fixes.

Version 14.11.0 is under active development and its release is planned in November 2014. Many of its features and performance enhancements will be discussed next week at SLUG 2014 in Lugano, Switzerland.

Note to all developers, code freeze for new features in 14.11 will be at the end of this month (September).

Slurm downloads are available here.

Slurm versions 14.03.7 and 14.11.0-pre4 are now available

Slurm versions 14.03.7 and 14.11.0-pre4 are now available. Version 14.03.7 includes quite a few relatively minor bug fixes. Version 14.11.0-pre4 includes a new job array data structure and APIs for managing job arrays. These changes provide vastly improved scalability with respect to job arrays. Version 14.11.0 is under active development and its release is planned in November 2014. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.03.6 and 14.11.0-pre3 are now available

Slurm versions 14.03.6 and 14.11.0-pre3 are now available. Version 14.03.6 includes a couple of bug fixes, including a bug related to generic resources that can result in the slurmctld daemon aborting. Version 14.11.0-pre3 includes some performance and scalability enhancements plus some new job prioritization options. Slurm downloads are available from http://www.schedmd.com/#repos.

Slurm versions 14.03.5 and 14.11.0-pre2 are now available

Slurm versions 14.03.5 and 14.11.0-pre2 are now available. Version 14.03.5 includes about 40 relatively minor bug fixes and enhancements as described below.
Version 14.11.0-pre2 is the second pre-release of the next major release of Slurm scheduled for November 2014. This is very much a work in progress and not intended for production use.

Slurm downloads are available from http://www.schedmd.com/#repos.

Highlights of changes in Slurm version 14.03.5 include:

  • If srun runs in an exclusive allocation, doesn't use the entire allocation, and CR_PACK_NODES is set, lay out tasks appropriately.
  • Correct Shared field in job state information seen by scontrol, sview, etc.
  • Print Slurm error string in scontrol update job and reset the Slurm errno before each call to the API.
  • Fix task/cgroup to handle -mblock:fcyclic correctly.
  • Fix for core-based advanced reservations where the distribution of cores across nodes is not even.
  • Fix issue where association maxnodes wouldn't be evaluated correctly if a QOS had a GrpNodes set.
  • GRES fix with multiple files defined per line in gres.conf.
  • When a job is requeued make sure accounting marks it as such.
  • Print the state of requeued job as REQUEUED.
  • If a job's partition was taken away from it, don't allow a requeue.
  • Make sure we lock on the conf when sending slurmd's conf to the slurmstepd.
  • Fix issue with sacctmgr 'load' not being able to gracefully handle a badly formatted file.
  • sched/backfill: Correct job start time estimate with advanced reservations.
  • Added error message for debugging when proctrack/cgroup is unable to destroy the step freezer path.
  • Added extra indexes into the database for better performance when deleting users.
  • Fix issue with wckeys when tracking wckeys but not enforcing them, where you could get multiple '*' wckeys.
  • Fix bug which could report to squeue the wrong partition for a running job that is submitted to multiple partitions.
  • Report correct CPU count allocated to job when allocated whole node even if not using all CPUs.
  • If job's constraints cannot be satisfied put it in pending state with reason BadConstraints and don't remove it.
  • sched/backfill - If job started with infinite time limit, set its end_time one year in the future.
  • Clear record of a job's gres when requeued.
  • Clear QOS GrpUsedCPUs when resetting raw usage if QOS is not using any cpus.
  • Remove log message left over from debugging.
  • When using CR_PACK_NODES, make --ntasks-per-node work correctly.
  • Report correct partition associated with a step if the job is submitted to multiple partitions.
  • Fix to allow removing of preemption from a QOS.
  • If the proctrack plugin fails to destroy the job container, print an error message and avoid looping forever; give up after 120 seconds.
  • Make srun obey the POSIX convention and increase the exit code by 128 when the process is terminated by a signal.
  • Sanity check for acct_gather_energy/rapl.
  • If the sbatch command specifies the option --signal=B:signum, send the signal to the batch script only.
  • If we cancel a task and have no other exit code, send the signal and exit code.
  • Added note about InnoDB storage engine being used with MySQL.
  • Set the job exit code when the job is signaled and set the log level to debug2() when processing an already completed job.
  • Reset diagnostics time stamp when "sdiag --reset" is called.
  • Make squeue and scontrol report a job's "shared" value based upon partition options rather than reporting "unknown" if the job submission does not use the --exclusive or --shared option.
  • task/cgroup - Fix cpuset binding for batch script.
  • sched/backfill - Fix anomaly that could result in jobs being scheduled out of order.
  • Expand pseudo-terminal size data structure field sizes from 8 to 16 bits.
  • Distinguish between two identical error messages.
  • If using accounting_storage/mysql directly without a DBD fix issue with start of requeued jobs.
  • If a job fails because of batch node failure and the job is requeued and an epilog complete message comes from that node do not process the batch step information since the job has already been requeued because the epilog script running isn't guaranteed in this situation.
  • Change message to note that a NO_VAL return code could have come from node failure as well as an interactive user.
  • Modify test4.5 to only look at one partition instead of all of them.
  • Fix sh5util -u to accept username different from the user that runs the command.
  • Corrections to man pages:salloc.1 sbatch.1 srun.1 nonstop.conf.5 slurm.conf.5.
  • Restore srun --pty resize ability.
  • Have sacctmgr dump cluster handle situations where users or such have special characters in their names like ':'.
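
The POSIX exit-code convention mentioned above (128 plus the signal number) can be demonstrated with any process, no Slurm required; a small shell sketch using SIGKILL (signal 9):

```shell
# A child shell kills itself with SIGKILL (signal 9); under the POSIX
# convention that srun now follows, the reported exit code is 128 + 9.
sh -c 'kill -KILL $$'
echo $?   # prints 137
```

The same arithmetic applies to any fatal signal, e.g. SIGTERM (15) yields exit code 143.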


Highlights of changes in Slurm version 14.11.0-pre2 (pre-release) include:
  • Added AllowSpecResourcesUsage configuration parameter in slurm.conf. This allows jobs to use specialized resources on nodes allocated to them if the job designates --core-spec=0.
  • Add new SchedulerParameters option of build_queue_timeout to throttle how much time can be consumed building the job queue for scheduling.
  • Added HealthCheckNodeState option of "cycle" to cycle through the compute nodes over the course of HealthCheckInterval rather than running all at the same time.
  • Add job "reboot" option for Linux clusters. This invokes the configured RebootProgram to reboot nodes allocated to a job before it begins execution.
  • Added squeue -O/--Format option that makes all job and step fields available for printing.
  • Improve database slurmctld entry speed dramatically.
  • Add "CPUs" count to output of "scontrol show step".
  • Add support for lua5.2.
  • scancel -b signals only the batch step, not any other step nor any children of the shell script.
  • MySQL - enforce NO_ENGINE_SUBSTITUTION.
  • Added CpuFreqDef configuration parameter in slurm.conf to specify the default CPU frequency and governor to be set at job end.
  • Added support for job email triggers: TIME_LIMIT, TIME_LIMIT_90 (reached 90% of time limit), TIME_LIMIT_80 (reached 80% of time limit), and TIME_LIMIT_50 (reached 50% of time limit). Applies to salloc, sbatch and srun commands.
  • In slurm.conf add the parameter SrunPortRange=min-max. If this is configured then srun will use its dynamic ports only from the configured range.
  • Make debug_flags 64 bit to handle more flags.
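
Two of the new slurm.conf parameters above might be combined along these lines (a hedged sketch; the port range and governor choice are illustrative assumptions, not recommendations):

```
# slurm.conf fragment (illustrative values)
SrunPortRange=60001-63000   # srun restricts its dynamic ports to this range
CpuFreqDef=OnDemand         # default CPU frequency governor set at job end
```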

Slurm versions 14.03.4 and 14.11.0-pre1 are now available

Slurm versions 14.03.4 and 14.11.0-pre1 are now available. Version 14.03.4 includes about 40 relatively minor bug fixes and enhancements as described below. Of particular note, there are several enhancements to control layout of tasks across resources and significant performance improvements for backfill scheduling.
Version 14.11.0-pre1 is the first pre-release of the next major release of Slurm scheduled for November 2014. This is very much a work in progress and not intended for production use.

Slurm downloads are available from http://www.schedmd.com/#repos.

Highlights of changes in Slurm version 14.03.4 include:

  • Fix issue where not enforcing QOS but a partition either allows or denies them.
  • CRAY - Make switch/cray default when running on a Cray natively.
  • CRAY - Make job_container/cncu default when running on a Cray natively.
  • Disable job time limit change if its preemption is in progress.
  • Correct logic to properly enforce job preemption GraceTime.
  • Fix sinfo -R to print each down/drained node once, rather than once per partition.
  • If a job has non-responding node, retry job step create rather than returning with DOWN node error.
  • Support SLURM_CONF path which does not have "slurm.conf" as the file name.
  • Fix issue where batch cpuset wasn't looked at correctly in jobacct_gather/cgroup.
  • Correct squeue's job node and CPU counts for requeued jobs.
  • Correct SelectTypeParameters=CR_LLN with job selection of specific nodes.
  • Only if ALL of their partitions are hidden will a job be hidden by default.
  • Run EpilogSlurmctld when a job is killed during slurmctld reconfiguration.
  • Close a window in srun where receiving a signal while waiting for an allocation and printing output could produce a deadlock.
  • Add SelectTypeParameters option of CR_PACK_NODES to pack a job's tasks tightly on its allocated nodes rather than distributing them evenly across the allocated nodes.
  • cpus-per-task support: Try to pack all CPUs of each task onto one socket. Previous logic could spread a task's CPUs across multiple sockets.
  • Add new distribution method fcyclic so when a task is using multiple cpus it can bind cyclically across sockets.
  • task/affinity - When using --hint=nomultithread only bind to the first thread in a core.
  • Make cgroup task layout (block | cyclic) method mirror that of task/affinity.
  • If TaskProlog sets SLURM_PROLOG_CPU_MASK reset affinity for that task based on the mask given.
  • Keep supporting 'srun -N x --pty bash' for historical reasons.
  • If EnforcePartLimits=Yes and the QOS the job is using can override limits, allow it.
  • Fix issues when a partition allows or denies accounts or QOSes and either is not set.
  • If a job requests a partition that doesn't allow the requested QOS or account, pend the job unless EnforcePartLimits=Yes. Previously the job was always killed at submit.
  • Fix format output of scontrol command when printing node state.
  • Improve the clean up of cgroup hierarchy when using the jobacct_gather/cgroup plugin.
  • Added SchedulerParameters value of Ignore_NUMA.
  • Fix issues with code when using automake 1.14.1.
  • select/cons_res plugin: Fix memory leak related to job preemption.
  • After reconfig rebuild the job node counters only for jobs that have not finished yet, otherwise if requeued the job may enter an invalid COMPLETING state.
  • Do not purge the script and environment files for completed jobs on slurmctld reconfiguration or restart (they might be later requeued).
  • scontrol now accepts the option job=xxx or jobid=xxx for the requeue, requeuehold and release operations.
  • task/cgroup - fix to bind batch job in the proper CPUs.
  • Added strigger option of -N, --noheader to not print the header when displaying a list of triggers.
  • Modify strigger to accept arguments to the program to execute when an event trigger occurs.
  • Attempt to create duplicate event trigger now generates ESLURM_TRIGGER_DUP ("Duplicate event trigger").
  • Treat special characters like %A, %s etc. literally in the file names when specified escaped e.g. sbatch -o /home/zebra\\%s will not expand %s as the stepid of the running job.
  • CRAYALPS - Add better support for CLE 5.2 when running Slurm over ALPS.
  • Test time when job_state file was written to detect multiple primary slurmctld daemons (e.g. both backup and primary are functioning as primary and there is a split brain problem).
  • Fix scontrol to accept update jobid=# numtasks=#
  • If the backup slurmctld assumes primary status, then do NOT purge any job state files (batch script and environment files) and do not re-use them. This may indicate that multiple primary slurmctld daemons are active (e.g. both backup and primary are functioning as primary and there is a split brain problem).
  • Set correct error code when requeuing a completing/pending job.
  • When checking for if dependency of type afterany, afterok and afternotok don't clear the dependency if the job is completing.
  • Cleanup the JOB_COMPLETING flag and eventually requeue the job when the last epilog completes, either slurmd epilog or slurmctld epilog, whichever comes last.
  • When attempting to requeue a job distinguish the case in which the job is JOB_COMPLETING or already pending.
  • When reconfiguring the controller don't restart the slurmctld epilog if it is already running.
  • Email messages for job array events now print the job ID using the format "#_# (#)" rather than just the internal job ID.
  • Set the number of free licenses to be 0 if the global license count decreases and total is less than in use.
  • Add DebugFlag of BackfillMap. Previously a DebugFlag value of Backfill logged information about what it was doing plus a map of expected resource use in the future. Now that very verbose resource use map is only logged with a DebugFlag value of BackfillMap.
  • Fix slurmstepd core dump.
  • Modify the description of -E and -S option of sacct command as point in time 'before' or 'after' the database records are returned.
  • Correct support for partition with Shared=YES configuration.
  • If job requests --exclusive then do not use nodes which have any cores in an advanced reservation. Also prevents case where nodes can be shared by other jobs.
  • For "scontrol --details show job" report the correct CPU_IDs when there are multiple threads per core (we are translating a core bitmap to CPU IDs).


Highlights of changes in Slurm version 14.11.0-pre1 include:
  • Modify sdiag to report Slurm RPC traffic by user, type, count and time consumed.
  • Add support for allocation of GRES by model type for heterogeneous systems (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of any type).
  • Modify squeue --start option to print the nodes expected to be used for pending job (in addition to expected start time, etc.).
  • Add support for non-consumable generic resources for resources that are limited, but can be shared between jobs.
  • Introduce automatic job requeue policy based on exit value. See RequeueExit and RequeueExitHold descriptions in slurm.conf man page.
  • Modify slurmd to cache launched job IDs for more responsive job suspend and gang scheduling.
  • Add srun --cpu-freq options to set the CPU governor (OnDemand, Performance, PowerSave or UserSpace).
  • Add support for a job step's CPU governor and/or frequency to be reset on suspend/resume (or gang scheduling). The default for an idle CPU will now be "ondemand" rather than "userspace" with the lowest frequency (to recover from hard slurmd failures and support gang scheduling).
  • Replace round-robin front-end node selection with least-loaded algorithm.
  • Add new node configuration parameters CoreSpecCount, CPUSpecList and MemSpecLimit which support the reservation of resources for system use with Linux cgroup.
  • Cray/ALPS system - Enable backup controller to run outside of the Cray to accept new job submissions and most other operations on the pending jobs.
  • sview - Better job_array support.
  • Provide more precise error message when job allocation can not be satisfied (e.g. memory, disk, cpu count, etc. rather than just "node configuration not available").
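
The automatic requeue policy introduced above is driven by slurm.conf; a minimal sketch, assuming illustrative exit-code values:

```
# slurm.conf fragment (exit codes are illustrative)
RequeueExit=1,2          # jobs exiting with these codes are requeued
RequeueExitHold=100-200  # these codes requeue the job in a held state
```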

Slurm 14.03.3 is now available

We are pleased to announce that Slurm 14.03.3 is now available at http://www.schedmd.com/#repos.
Changes in Slurm 14.03.3

  • Fix perlapi to compile correctly with perl 5.18.
  • Correction to default batch output file name. Version 14.03.2 used "slurm__4294967294.out" due to an error in the job array logic.
  • In slurm.spec file, replace "Requires cray-MySQL-devel-enterprise" with "Requires mysql-devel".
  • Switch/nrt - On switch resource allocation failure, free partial allocation.
  • Switch/nrt - Properly track usage of CAU and RDMA resources with multiple tasks per compute node.
  • Fix issue where user is requesting --acctg-freq=0 and no memory limits.
  • BGQ - Temporary fix for issue where job could be left on job_list after it finished.
  • BGQ - Fix issue where limits were checked on midplane counts instead of cnode counts.
  • BGQ - Move code to only start job on a block after limits are checked.
  • Handle node ranges better when dealing with accounting max node limits.

Slurm version 14.03.2 is now available

We are pleased to announce that Slurm 14.03.2 is available here.

Please upgrade at your earliest convenience. Refer to the NEWS file for list of changes.

Slurm version 14.03.1 is now available

Slurm version 14.03.1 is now available. This release includes four weeks of bug fixes since the release of version 14.03.0. Upgrading from Slurm versions 2.5 or 2.6 should proceed without loss of jobs or other state. Just be sure to upgrade the slurmdbd first. (Upgrades from pre-releases of version 14.03 may result in job loss.)

Slurm downloads are available from http://www.schedmd.com/#repos.

Highlights of changes in Slurm version 14.03.1 include:

  • Add support for job std_in, std_out and std_err fields in Perl API.
  • Add "Scheduling Configuration Guide" web page.
  • BGQ - fix check for jobinfo when it is NULL.
  • Do not check cleaning on "pending" steps.
  • task/cgroup plugin - Fix for building on older hwloc (v1.0.2).
  • In the PMI implementation, by default don't check for duplicate keys. Set SLURM_PMI_KVS_DUP_KEYS if you want the code to check for duplicate keys.
  • Add job submission time to squeue.
  • Permit user root to propagate resource limits higher than the hard limit slurmd has on that compute node (i.e. raise both current and maximum limits).
  • Fix issue with license used count when doing an scontrol reconfig.
  • Fix the PMI iterator to not report duplicated keys.
  • Fix issue with sinfo when -o is used without the %P option.
  • Rather than immediately invoking an execution of the scheduling logic on every event type that can enable the execution of a new job, queue its execution. This permits faster execution of some operations, such as modifying large counts of jobs, by executing the scheduling logic less frequently, but still in a timely fashion.
  • If an environment variable is longer than MAX_ENV_STRLEN, don't set it in the job environment, otherwise the exec() fails.
  • Optimize scontrol hold/release logic for job arrays.
  • Modify srun to report an exit code of zero rather than nine when some tasks exit with a return code of zero and the others are killed with SIGKILL.
  • Fix a typo in scontrol man page.
  • Avoid slurmctld crash getting job info if detail_ptr is NULL.
  • Fix sacctmgr add user where both defaultaccount and accounts are specified.
  • Added SchedulerParameters option of max_sched_time to limit how long the main scheduling loop can execute for.
  • Added SchedulerParameters option of sched_interval to control how frequently the main scheduling loop will execute.
  • Move start time of main scheduling loop timeout until after locks are acquired.
  • Add squeue job format option of "%y" to print a job's nice value.
  • Update scontrol update jobID logic to operate on entire job arrays.
  • Fix PrologFlags=Alloc to run the prolog on each of the nodes in the allocation instead of just the first.
  • Fix race condition if a step is starting while the slurmd is being restarted.
  • Make sure a job's prolog has run before starting a step.
  • BGQ - Fix invalid memory read when using DefaultConnType in the bluegene.conf.
  • Make sure we send node state to the DBD on clean start of controller.
  • Fix some sinfo and squeue sorting anomalies due to differences in data types.
  • Only send message back to slurmctld when PrologFlags=Alloc is used on a Cray/ALPS system, otherwise use the slurmd to wait on the prolog to gate the start of the step.
  • Remove need to check PrologFlags=Alloc in slurmd since we can tell whether the prolog has run yet or not.
  • Fix squeue to use a correct macro to check job state.
  • BGQ - Fix incorrect logic issues if MaxBlockInError=0 in the bluegene.conf.
  • priority/basic - Ensure job priorities continue to decrease when jobs are submitted with the --nice option.
  • Make the PrologFlag=Alloc work on batch scripts.
  • Make PrologFlag=NoHold (automatically sets PrologFlag=Alloc) not hold in salloc/srun, instead wait in the slurmd when a step hits a node and the prolog is still running.
  • Added --cpu-freq=highm1 (high minus one) option.
  • Expand StdIn/Out/Err string length output by "scontrol show job" from 128 to 1024 bytes.
  • squeue %F format will now print the job ID for non-array jobs.
  • Use quicksort for all priority based job sorting, which improves performance significantly with large job counts.
  • If a job has already been released from a held state ignore successive release requests.
  • Fix srun/salloc/sbatch man pages for the --no-kill option.
  • Add squeue -L/--licenses option to filter jobs by license names.
  • Handle abort job on node on front end systems without core dumping.
  • Fix dependency support for job arrays.
  • When updating jobs verify the update request is not identical to the current settings.
  • When sorting jobs and priorities are equal sort by job_id.
  • Do not overwrite existing reason for node being down or drained.
  • Requeue batch job if Munge is down and credential can not be created.
  • Make _slurm_init_msg_engine() tolerate bug in bind() returning a busy ephemeral port.
  • Don't block scheduling of entire job array if it could run in multiple partitions.
  • Introduce a new debug flag Protocol to print protocol requests received together with the remote IP address and port.
  • CRAY - Set up the network even when only using 1 node.
  • CRAY - Greatly reduce the number of error messages produced from the task plugin and provide more information in the message.
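
The quicksort-based priority ordering and the job_id tie-break noted above can be sketched as follows. This is a hedged illustration, not Slurm source; the (job_id, priority) tuples are hypothetical.

```python
# Hedged sketch of priority-based job ordering with the job_id tie-break:
# highest priority first, equal priorities fall back to ascending job_id.
def sort_jobs(jobs):
    """jobs: list of (job_id, priority) tuples."""
    return sorted(jobs, key=lambda job: (-job[1], job[0]))

print(sort_jobs([(103, 50), (101, 50), (102, 90)]))
# -> [(102, 90), (101, 50), (103, 50)]
```

Python's sorted() uses a stable comparison sort rather than quicksort, but the ordering rule shown is the same.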
    Slurm version 14.03.0 is now available

    Slurm version 14.03.0 is now available. This is a major Slurm release with many new features. See the RELEASE_NOTES and NEWS files in the distribution for detailed descriptions of the changes, a few of which are noted below. Upgrading from Slurm versions 2.5 or 2.6 should proceed without loss of jobs or other state. Just be sure to upgrade the slurmdbd first. (Upgrades from pre-releases of version 14.03 may result in job loss.) Slurm downloads are available from http://www.schedmd.com/#repos. Highlights of changes in Slurm version 14.03.0 include:

  • Added support for native Slurm operation on Cray systems (without ALPS).
  • Added partition configuration parameters AllowAccounts, AllowQOS, DenyAccounts and DenyQOS to provide greater control over use.
  • Added the ability to perform load-based scheduling, allocating resources to jobs on the nodes with the largest number of idle CPUs.
  • Added support for reserving cores on a compute node for system services (core specialization).
  • Add mechanism for job_submit plugin to generate error message for srun, salloc or sbatch to stderr.
  • Added new structures and support for both server and cluster global resources (e.g. license mechanism).
  • Support for the Postgres database has long been out of date and problematic, so it has been removed entirely. If you would like to use it, the code still exists in version 2.6 and earlier, but it will not be included in this or future versions.
  • Significant performance improvements, especially with respect to job array support.
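
The load-based scheduling highlight above favors the nodes with the most idle CPUs. A minimal sketch of that selection rule, with hypothetical node names and idle-CPU counts:

```python
# Minimal sketch of load-based node selection: prefer the nodes with the
# largest number of idle CPUs. Names and counts are hypothetical.
def pick_least_loaded(idle_cpus, node_count):
    """idle_cpus: dict of node name -> idle CPU count."""
    ranked = sorted(idle_cpus, key=idle_cpus.get, reverse=True)
    return ranked[:node_count]

idle = {"tux1": 2, "tux2": 8, "tux3": 5}
print(pick_least_loaded(idle, 2))  # -> ['tux2', 'tux3']
```
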
    Slurm versions 2.6.7 and 14.03.0-rc1 are now available

    We are pleased to announce the availability of Slurm version 2.6.7, plus version 14.03.0-rc1 (release candidate 1). We plan to release version 14.03.0 by the end of the month. See the "RELEASE_NOTES" file in the distribution for a description of the major changes in version 14.03.

    This will most likely be the last 2.6 release. 14.03 code has been frozen for development and will only get bug fixes from here on out. Thanks to all those that have contributed to the effort!

    The Slurm distributions are available from: here

    Slurm versions 2.6.6 and 14.03.0-pre6 are now available

    Slurm version 2.6.6 with a multitude of bug fixes is now available. We are also making available version 14.03.0-pre6 with more development work for the next major release. See the NEWS file in the distribution for detailed descriptions of the changes. Downloads are available here.

    Slurm versions 2.6.5 and 14.03.0-pre5 are now available

    Slurm version 2.6.5 with a multitude of bug fixes is now available. We are also making available version 14.03.0-pre5 with more development work for the next major release. A summary of changes is found in the NEWS file of the tarball. Downloads are available from here.

    Slurm versions 2.6.4 and 13.12.0-pre4 are now available

    Slurm version 2.6.4 with a multitude of bug fixes plus some new development to better support Torque/PBS commands and options is now available. We are also making available version 13.12.0-pre4 with more development work for the next major release. See the NEWS file in the distribution for detailed descriptions of the changes. Downloads are available from http://www.schedmd.com/#repos.

    Slurm versions 2.6.3 and 13.12.0-pre3 are now available

    Slurm version 2.6.3 with a multitude of bug fixes plus some new development to better support Torque/PBS commands and options is now available. We are also making available version 13.12.0-pre3 with more development work for the next major release. See the NEWS file in the distribution for detailed descriptions of the changes. Downloads are available from http://www.schedmd.com/#repos.

    Slurm versions 2.6.2 and 13.12.0-pre2 are now available

    We are pleased to announce the availability of Slurm version 2.6.2 (with various bug fixes) and 13.12.0-pre2 (with the second installment of development for the next major release). Downloads are available from http://www.schedmd.com/#repos. Highlights of changes in Slurm version 2.6.2 include:

    • Fix issue with reconfig and GrpCPURunMins
    • Fix of wrong node/job state problem after reconfig
    • Allow users who are coordinators to update their own limits in the accounts they are coordinators over.
    • BackupController - Make sure we have a connection to the DBD first thing to avoid it thinking we don't have a cluster name.
    • Correct value of min_nodes returned by loading job information to consider the job's task count and maximum CPUs per node.
    • If running jobacct_gather/none fix issue on unpacking step completion.
    • Reservation with CoreCnt: Avoid possible invalid memory reference.
    • sjstat - Add man page when generating rpms.
    • Make sure GrpCPURunMins is added when creating a user, account or QOS with sacctmgr.
    • Fix for invalid memory reference due to multiple free calls caused by job arrays submitted to multiple partitions.
    • Enforce --ntasks-per-socket=1 job option when allocating by socket.
    • Validate permissions of key directories at slurmctld startup. Report anything that is world writable.
    • Improve GRES support for CPU topology. Previous logic would pick CPUs then reject jobs that can not match GRES to the allocated CPUs. New logic first filters out CPUs that can not use the GRES, next picks CPUs for the job, and finally picks the GRES that best match those CPUs.
    • Switch/nrt - Prevent invalid memory reference when allocating single adapter per node of specific adapter type
    • CRAY - Make Slurm work with CLE 5.1.1
    • Fix segfault if submitting to multiple partitions and holding the job.
    • Use MAXPATHLEN instead of the hardcoded value 1024 for maximum file path lengths.
    • If OverTimeLimit is defined, do not declare failed those jobs that ended in the OverTimeLimit interval.
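
The reworked GRES/CPU matching order described above (filter out unusable CPUs first, then pick CPUs) can be sketched as follows. The data structures are illustrative only, not Slurm's internals:

```python
# Illustrative sketch of the new GRES-aware CPU selection order: first
# filter out CPUs that cannot reach the requested GRES, then pick CPUs
# for the job from what remains.
def select_cpus(cpus, gres_reachable, cpus_needed):
    usable = [c for c in cpus if c in gres_reachable]
    if len(usable) < cpus_needed:
        return None  # job cannot be satisfied on this node
    return usable[:cpus_needed]

print(select_cpus([0, 1, 2, 3], {2, 3}, 2))  # -> [2, 3]
```

The old order (pick CPUs first, then try to match GRES) could reject a job that the filtered order accepts.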

    Slurm versions 2.6.1 and 13.12.0-pre1 are now available

    We are pleased to announce the availability of Slurm version 2.6.1 (with various bug fixes) and 13.12.0-pre1 (with the first installment of development for the next major release). Downloads are available from http://www.schedmd.com/#repos.

    Slurm version 2.6.0 available

    We are pleased to announce the availability of Slurm version 2.6. Changes from version 2.5 are extensive and highlights are listed below. Please see the RELEASE_NOTES file in the Slurm distribution for more details. Note the Slurm documentation at schedmd.com has been updated to version 2.6. Highlights of changes in Slurm version 2.6 include:

    • Added support for job arrays, which increases performance and ease of use for sets of similar jobs. This may necessitate changes in prolog and/or epilog scripts due to a change in the job ID format, which is now of the form "<job_id>_<array_index>" for job arrays.
    • Added support for job profiling to periodically capture each task's CPU use, memory use, power consumption, Lustre use and Infiniband network use.
    • Added support for generic external sensor plugins which can be used to capture temperature and power consumption data.
    • Added mpi/pmi2 plugin with much more scalable performance for MPI implementations using PMI communications interface.
    • Added prolog and epilog support for advanced reservations.
    • Much faster throughput for job step execution with --exclusive option. The srun process is notified when resources become available rather than relying on periodic polling.
    • Advanced reservations with hostname and core counts now support asymmetric reservations (e.g. a different core count for each node).
    • Added slurmctld/dynalloc plugin for MapReduce+ support. New versions of OpenMPI and MapReduce are required to enable this functionality.
    • Make sched/backfill the default scheduling plugin rather than sched/builtin (FIFO).
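
The job-array ID format change noted above can affect prolog/epilog tooling that parses job IDs. A hypothetical helper handling both the plain and array forms:

```python
# Hypothetical helper (not part of Slurm) for prolog/epilog tooling:
# split a job-array ID of the form "<job_id>_<array_index>" while still
# accepting a plain numeric job ID.
def parse_job_id(raw):
    if "_" in raw:
        base, index = raw.split("_", 1)
        return int(base), int(index)
    return int(raw), None

print(parse_job_id("1234_7"))  # -> (1234, 7)
print(parse_job_id("1234"))    # -> (1234, None)
```
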

    Slurm version 2.6.0-rc2 is now available

    We are pleased to announce the availability of Slurm version 2.6.0-rc2 (release candidate 2).

    We plan to release version 2.6.0 very soon. See the "RELEASE_NOTES" file in the distribution for a description of the major changes in version 2.6.

    A great way to find out about Slurm development is to attend the Slurm User Group Meeting, September 18 - 19 in Oakland, California, USA.

    The Slurm distributions are available from here.

    Slurm versions 2.5.7 and 2.6.0-rc1 are now available

    We are pleased to announce the availability of Slurm version 2.5.7 plus version 2.6.0-rc1 (release candidate 1).

    We plan to release version 2.6.0 after more testing. See the "RELEASE_NOTES" file in the distribution for a description of the major changes in version 2.6.

    A great way to find out about Slurm development is to attend the Slurm User Group Meeting, September 18 - 19 in Oakland, California, USA.

    The Slurm distributions are available from here.

    Slurm version 2.5.6

    We have just found a regression in 2.5.5 when using the MySQL database for accounting along with GRES. Therefore we have tagged a 2.5.6 release with the fix.

    You can download it from here.

    This bug only exists in 2.5.5 and 2.6.0-pre3 systems. Those running 2.6 pre-releases are advised to patch their code base or just do a pull from GitHub.

    A simple patch is found here.

    2.5.6 also contains a patch dealing with requeuing jobs that use GRES as well.

    Slurm versions 2.5.5 and 2.6.0-pre3 are now available

    Feel free to update from here. There have been quite a few changes for Cray and BGQ systems, so anyone running them should take a serious look. As always, it is a good idea to run with the latest on any system.

    * Changes in Slurm 2.5.5
    ========================
    -- Fix for sacctmgr add qos to handle the 'flags' option.
    -- Export SLURM_ environment variables from sbatch, even if "--export"
    option does not explicitly list them.
    -- If node is in more than one partition, correct counting of allocated CPUs.
    -- If step requests more CPUs than possible in specified node count of job
    allocation then return ESLURM_TOO_MANY_REQUESTED_CPUS rather than
    ESLURM_NODES_BUSY and retrying.
    -- CRAY - Fix SLURM_TASKS_PER_NODE to be set correctly.
    -- Accounting - more checks for strings with a possible `'` in them.
    -- sreport - Fix by adding planned down time to utilization reports.
    -- Do not report an error when sstat identifies job steps terminated during
    its execution, but log it using a debug-type message.
    -- Select/cons_res - Permit node removed from job by going down to be returned
    to service and re-used by another job.
    -- Select/cons_res - Tighter packing of job allocations on sockets.
    -- SlurmDBD - fix to allow user root along with the slurm user to register a
    cluster.
    -- Select/cons_res - Fix for support of consecutive node option.
    -- Select/cray - Modify build to enable direct use of libslurm library.
    -- Bug fixes related to job step allocation logic.
    -- Cray - Disable enforcement of MaxTasksPerNode, which is not applicable
    with launch/aprun.
    -- Accounting - When rolling up data from past usage, ignore "idle" time
    from a reservation when it has the "Ignore_Jobs" flag set. Since jobs
    could run on the reservation's nodes outside of the reservation, that
    time could otherwise be counted twice.
    -- Accounting - Minor fix to avoid reuse of variable erroneously.
    -- Reject job at submit time if the node count is invalid. Previously such a
    job submitted to a DOWN partition would be queued.
    -- Purge vestigial job scripts when the slurmd cold starts or slurmstepd
    terminates abnormally.
    -- Add support for FreeBSD.
    -- Add sanity check for NULL cluster names trying to register.
    -- BGQ - Push action 'D' info to scontrol for admins.
    -- Reset a job's reason from PartitionDown when the partition is returned
    to the up state.
    -- BGQ - Handle issue where blocks would have a pending job on them and
    while it was free cnodes would go into software error and kill the job.
    -- BGQ - Fix issue where if for some reason we are freeing a block with
    a pending job on it we don't kill the job.
    -- BGQ - Fix race condition where a job could have been removed from a
    block without it still existing there. This is extremely rare.
    -- BGQ - Fix for when a step completes in Slurm before the runjob_mux notifies
    the slurmctld there were software errors on some nodes.
    -- BGQ - Fix issue on state recovery if block states are not around and,
    when reading in state from DB2, we find a block that can't be created.
    You can now do a clean start to remove the bad block.
    -- Modify slurmdbd to retransmit to slurmctld daemon if it is not responding.
    -- BLUEGENE - Fix issue where, when doing backfill, preemptable jobs were
    never examined to determine the eligibility of backfillable jobs.
    -- Cray/BlueGene - Disable srun --pty option unless LaunchType=launch/slurm.
    -- CRAY - Fix sanity check for systems with more than 32 cores per node.
    -- CRAY - Remove other objects from MySQL query that are available from
    the XML.
    -- BLUEGENE - Set the geometry of a job when a block is picked and the job
    isn't a sub-block job.
    -- Cray - avoid check of macro versions of CLE for version 5.0.
    -- CRAY - Fix memory issue with reading in the cray.conf file.
    -- CRAY - If hostlist is given with srun make sure the node count is the same
    as the hosts given.
    -- CRAY - If task count specified, but no tasks-per-node, then set the tasks
    per node in the BASIL reservation request.
    -- CRAY - fix issue with --mem option not giving correct amount of memory
    per cpu.
    -- CRAY - Fix if srun --mem is given outside an allocation to set the
    APRUN_DEFAULT_MEMORY env var for aprun. This scenario will not display
    the option when used with --launch-cmd.
    -- Change sview to use GMutex instead of GStaticMutex
    -- CRAY - set APRUN_DEFAULT_MEMORY instead of CRAY_AUTO_APRUN_OPTIONS
    -- sview - fix issue where if a partition was completely in one state the
    cpu count would not be reflected correctly.
    -- BGQ - fix for handling half rack system in STATIC or OVERLAP mode to
    implicitly create full system block.
    -- CRAY - Dynamically create BASIL XML buffer to resize as needed.
    -- Fix checking if QOS limit MaxCPUMinsPJ is set along with DenyOnLimit to
    deny the job instead of holding it.
    -- Make sure on systems that use a different launcher than launch/slurm not
    to attempt to signal tasks on the frontend node.
    -- Cray - when a step is requested count other steps running on nodes in the
    allocation as taking up the entire node instead of just part of the node
    allocated. And always enforce exclusive on a step request.
    -- Cray - display correct nodelist, node/cpu count on steps.

    * Changes in Slurm 2.6.0pre3
    ============================
    -- Add milliseconds to default log message header (both RFC 5424 and ISO 8601
    time formats). Disable milliseconds logging using the configure
    parameter "--disable-log-time-msec". Default time format changes to
    ISO 8601 (without time zone information). Specify "--enable-rfc5424time"
    to restore the time zone information.
    -- Add username (%u) to the filename pattern in the batch script.
    -- Added options for front end nodes of AllowGroups, AllowUsers, DenyGroups,
    and DenyUsers.
    -- Fix sched/backfill logic to initiate jobs whose maximum time limit
    exceeds the partition limit but whose minimum time limit permits them
    to start.
    -- gres/gpu - Fix for gres.conf file with multiple files on a single line
    using a slurm expression (e.g. "File=/dev/nvidia[0-1]").
    -- Replaced ipmi.conf with generic acct_gather.conf file for all
    acct_gather plugins. For those doing development, follow the model set
    forth in the acct_gather_energy_ipmi plugin.
    -- Added more options to update a step's information
    -- Add DebugFlags=ThreadID which will print the thread id of the calling
    thread.
    -- CRAY - Allocate whole node (CPUs) in reservation despite what the
    user requests. We have found any srun/aprun afterwards will work on a
    subset of resources.

    Slurm User Group Meeting Call for Abstracts

    You are invited to submit an abstract of a presentation or tutorial to be given at the Slurm User Group Meeting 2013. This event is sponsored and organized by SchedMD and will be held in Oakland, California, USA on September 18 and 19, 2013.

    This international event is open to everyone who wants to:

    • Learn more about Slurm, a highly scalable Resource Manager and Job Scheduler
    • Share their knowledge and experience with other users and administrators
    • Get detailed information about the latest features and developments
    • Share requirements and discuss future developments

    Everyone who wants to present their own usage, developments, site report, or tutorial about Slurm is invited to send an abstract to sugc@schedmd.com.

    IMPORTANT DATES:
    May 24, 2013: Abstracts due
    June 21, 2013: Notification of acceptance
    September 18-19, 2013: Slurm User Group Meeting 2013

    Program Committee:
    Yiannis Georgiou (Bull)
    Matthieu Hautreux (CEA)
    Morris Jette (SchedMD)
    Donald Lipari (LLNL, Lawrence Livermore National Laboratory)
    Colin McMurtrie (CSCS, Swiss National Supercomputing Centre)
    Stephen Trofinoff (CSCS, Swiss National Supercomputing Centre)

    Slurm versions 2.5.4 and 2.6.0-pre2 are now available

    Slurm version 2.5.4 is now available with the bug fixes listed below. The latest versions of Slurm are available from www.schedmd.com/#repos.

    - Fix bug in PrologSlurmctld use that would block job steps until node responds.
    - CRAY - If a partition has MinNodes=0 and a batch job doesn't request nodes, set the allocation to 1 node instead of 0, which prevented the allocation from happening.
    - Better debug when the database is down and using the --cluster option in the user commands.
    - When asking for job states with sacct, default to 'now' instead of midnight of the current day.
    - Fix for handling a test-only job or immediate job that fails while being built.
    - Comment out all of the logic in the job_submit/defaults plugin. The logic is only an example and not meant for actual use.
    - Eliminate configuration file 4096 character line limitation.
    - More robust logic for tree message forwarding.
    - BGQ - When cnodes fail in a timeout fashion correctly look up parent midplane.
    - Correct sinfo "%c" (node's CPU count) output value for Bluegene systems.
    - Backfill - Responsive improvements for systems with large numbers of jobs (more than 5000) and using the SchedulerParameters option bf_max_job_user.
    - slurmstepd: ensure that IO redirection openings from/to files correctly handle interruption
    - BGQ - Able to handle when midplanes go into Hardware::SoftwareFailure
    - GRES - Correct tracking of specific resources used after slurmctld restart. Counts would previously go negative as jobs terminate and decrement from a base value of zero.
    - Fix for priority/multifactor2 plugin to not assert when configured with --enable-debug.
    - Select/cons_res - If the job request specified --ntasks-per-socket and the allocation is using cores, then pack the tasks onto the sockets up to the specified value.
    - BGQ - If a cnode goes into an 'error' state and the block containing the cnode does not have a job running on it do not resume the block.
    - BGQ - Handle blocks that don't free themselves in a reasonable time better.
    - BGQ - Fix for signaling steps when allocation ends before step.
    - Fix for backfill scheduling logic with job preemption; starts more jobs.
    - xcgroup - remove bugs with EINTR management in write calls
    - jobacct_gather - fix total values so they are not always equal to the max values.
    - Fix for handling node registration messages from older versions without energy data.
    - BGQ - Allow user to request full dimensional mesh.
    - sdiag command - Correction to jobs started value reported.
    - Prevent slurmctld assert when invalid change to reservation with running jobs is made.
    - BGQ - If signal is NODE_FAIL allow forward even if job is completing and timeout in the runjob_mux trying to send in this situation.
    - BGQ - More robust checking for correct node, task, and ntasks-per-node options in srun, and push that logic to salloc and sbatch.
    - GRES topology bug in core selection logic fixed.
    - Fix to handle init.d script for querying status and not return 1 on success.

    Slurm version 2.6.0-pre2 contains the enhancements listed below.

    - Do not purge inactive interactive jobs that lack a port to ping (added for MR+ operation).
    - Advanced reservations with hostname and core counts now support asymmetric reservations (e.g. a different core count for each node).
    - Added slurmctld/dynalloc plugin for MapReduce+ support.
    - Added "DynAllocPort" configuration parameter.
    - Added partition parameter of SelectTypeParameters to override system-wide value.
    - Added cr_type to partition_info data structure.
    - Added allocated memory to node information available (within the existing select_nodeinfo field of the node_info_t data structure). Added Allocated Memory to node information displayed by sview and scontrol commands.
    - Make sched/backfill the default scheduling plugin rather than sched/builtin (FIFO).
    - Added support for a job having different priorities in different partitions.
    - Added new SchedulerParameters configuration parameter of "bf_continue" which permits the backfill scheduler to continue considering jobs for backfill scheduling after yielding locks even if new jobs have been submitted. This can result in lower priority jobs being backfill scheduled instead of newly arrived higher priority jobs, but will permit more queued jobs to be considered for backfill scheduling.
    - Added support to purge reservation records from accounting.
    - Cray - Add support for Basil 1.3
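
The bf_continue behavior described above is enabled through slurm.conf. An illustrative fragment (the parameter names come from the notes above; treat the combination as an example, not a recommendation):

```
# Illustrative slurm.conf fragment; bf_continue lets the backfill
# scheduler keep evaluating queued jobs after yielding its locks.
SchedulerType=sched/backfill
SchedulerParameters=bf_continue
```
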

    Slurm version 2.5.3 is now available

    Slurm version 2.5.3 is now available with the bug fixes listed below. The latest versions of Slurm are available from www.schedmd.com/#repos.

    - Gres/gpu plugin - If no GPUs requested, set CUDA_VISIBLE_DEVICES=NoDevFiles. This bug was introduced in 2.5.2 for the case where a GPU count was configured, but without device files.
    - task/affinity plugin - Fix bug in CPU masks for some processors.
    - Modify sacct command to get format from SACCT_FORMAT environment variable.
    - BGQ - Changed order of library inclusions and fixed incorrect declaration to compile correctly on newer compilers.
    - Fix for not building sview if glib exists on a system but not the gtk libs.
    - BGQ - Fix for handling a job cleanup on a small block if the job has long since left the system.
    - Fix race condition in job dependency logic which can result in invalid memory reference.

    Slurm versions 2.5.2 and 2.6.0-pre1 available

    Slurm version 2.5.2 is now available with various bug fixes. We have also made available a pre-release of version 2.6 (still under development). Notable features in v2.6 include support for job arrays and accounting for a job's energy consumption using IPMI. The job array documentation is available at www.schedmd.com/slurmdocs/job_array.html. The latest versions of Slurm are available from www.schedmd.com/#repos.

    Slurm version 2.5.0 released

    We are pleased to announce the availability of Slurm version 2.5.0. This is a major upgrade from version 2.4 with changes to the Slurm commands and API. Pending and running jobs should be preserved through the upgrade. You should plan to upgrade your slurmdbd (Slurm DataBase Daemon) before upgrading other Slurm daemons or programs. You can get the latest Slurm tar-ball from the repository here.

    We have also released version 2.4.5 with various minor bug fixes. This will likely be the final release of version 2.4.

    Highlights of version 2.5 include:

    • Major performance improvements for high-throughput computing.
    • Added srun option "--cpu-freq" to enable user control over the job's CPU frequency and thus its power consumption.
    • Account for power consumption by job.
    • Added "boards" count to node information and "boards_per_node" to job request and job information. Optimize resource allocation to minimize number of boards used by a job.
    • Added support for IBM Parallel Environment (PE) including the launching of jobs using either the srun or poe command.
    • Add support for advanced reservation of specific cores rather than whole nodes.
    • Added priority/multifactor2 plugin supporting ticket based shares.
    • Added gres/mic plugin supporting Intel Many Integrated Core (MIC) processors.
    • Added launch plugin to support srun interface to launch tasks using different methods like IBM's poe and Cray's aprun.
    • Web pages have a different appearance.

    Slurm version 2.5.0-rc2 now available

    Slurm version 2.5.0-rc2 is now available. Slurm version 2.5.0-rc1 had a bad slurm.spec file resulting in plugins not being packaged. The slurm.spec file and several bugs have been fixed in version 2.5.0-rc2, which is available from here. We currently expect to tag versions 2.5.0 and 2.4.5 about December 4, a bit later than expected, but preferable for improved stability.

    Slurm version 2.5.0-rc1 now available

    We are pleased to announce that Slurm version 2.5.0-rc1 (release candidate 1) is now available for download from here.

    This version should be considered stable, and we encourage all early adopters to upgrade and test so we can flush out any major issues before the scheduled release of version 2.5.0 at the end of November.

    Thanks for everyone's help with this release. It has a host of new features; see the RELEASE_NOTES file for more information.

    Slurm version 2.4.4 is now available

    We are pleased to announce that Slurm version 2.4.4 has been tagged and is now available for download from here. It contains a variety of bug fixes, almost all of them for IBM BlueGene/Q systems.

    SLURM versions 2.4.3 and 2.5.0-pre3 now available

    We are pleased to announce the availability of SLURM version 2.4.3 with a sizable number of bug fixes, primarily for IBM Bluegene systems.

    Both are available now for download here.

    We have also made available version 2.5.0-pre3, a pre-release of the version 2.5 code, which is still under development. Of particular note, this version of SLURM supports the IBM Parallel Environment (PE) including POE and IBM's NRT switch interface. We are nearing the end of development for version 2.5 and will soon move into a testing phase before release, planned for November. If you are developing new code please code against the master git repo (2.5) as it is constantly updated so as to avoid as many conflicts as possible.

    As always if you find any bugs let us know through http://bugs.schedmd.com or the slurm-dev list.

    SLURM version 2.4.2 now available

    SLURM version 2.4.2 is now available for download from here.

    This includes many bug fixes, most of which are IBM BlueGene system related.

    As always if you find any bugs let us know through http://bugs.schedmd.com or the slurm-dev list.

    SLURM version 2.4.1 is now available

    It has come to our attention that a bug in 2.4.0 results in job loss when upgrading from 2.3.* to 2.4.0.

    2.4.1 fixes this problem. This is the only patch in 2.4.1 relative to 2.4.0.

    2.4.1 will preserve job state from 2.4.0 as well as state from 2.1+.

    Sorry for the inconvenience, thanks to Carles Fenoy for bringing the issue to our attention.

    You may download it here. To avoid future job loss we have removed 2.4.0 from the download page. If you need it for historic purposes, please feel free to download the tag from GitHub.

    SLURM versions 2.4.0 and 2.5.0-pre1 are now available

    We are pleased to announce the formal 2.4.0 release, along with the first development release of 2.5.

    Both are available now for download here.

    If you are developing new code please code against the master git repo as it is constantly updated so as to avoid as many conflicts as possible.

    Note to BGQ early adopters: Recently there have been a few changes that require the runjob_mux to run as your SLURM user. The plugin_flags must also be updated, to avoid a possible runjob_mux crash if you are starting a job and decide to turn off the slurmctld at the same time. Please read the updated BlueGene web page and look for "System Administration for BlueGene/Q only" for full instructions.

    Thanks for all your help and support. Among other things, 2.4 brings substantial performance enhancements and many other improvements, many of which can be found in the RELEASE_NOTES file in the code.

    As always if you find any bugs let us know through http://bugs.schedmd.com or the slurm-dev list.

    Slurm version 2.5.0 released

    We are pleased to announce the availability of Slurm version 2.5.0. This is a major upgrade from version 2.4, with changes to the Slurm commands and API. Pending and running jobs should be preserved through the upgrade. You should plan to upgrade your slurmdbd (Slurm DataBase Daemon) before upgrading other Slurm daemons or programs. You can get the latest Slurm tar-ball from the repository here.

    We have also released version 2.4.5 with various minor bug fixes. This will likely be the final release of version 2.4.

    Highlights of version 2.5 include:

    • Major performance improvements for high-throughput computing.
    • Added srun option "--cpu-freq" to enable user control over the job's CPU frequency and thus its power consumption.
    • Account for power consumption by job.
    • Added "boards" count to node information and "boards_per_node" to job request and job information. Optimize resource allocation to minimize number of boards used by a job.
    • Added support for IBM Parallel Environment (PE) including the launching of jobs using either the srun or poe command.
    • Added support for advanced reservations of specific cores rather than whole nodes.
    • Added priority/multifactor2 plugin supporting ticket based shares.
    • Added gres/mic plugin supporting Intel Many Integrated Core (MIC) processors.
    • Added launch plugin to support srun interface to launch tasks using different methods like IBM's poe and Cray's aprun.
    • Web pages have a different appearance.
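
    As a brief illustration of the new "--cpu-freq" option above (a sketch only; the frequency value, node and task counts, and program name are hypothetical examples, not taken from this announcement), a user could request a specific CPU frequency for a job step like this:

        # Request that the tasks of this step run at 2.4 GHz;
        # the value is specified in kilohertz:
        srun --cpu-freq=2400000 -N2 -n4 ./my_app

    Lowering the requested frequency trades some performance for reduced power consumption, which can then be accounted for per job with the new energy accounting support.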

    SLURM 2.4.0-rc1 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.4.0-rc1!

    A summary of the changes in version 2.4.0-rc1 from version 2.4.0-pre4 can be found in the file "NEWS" in the distributed files. As 2.4 has graduated from "pre" to "rc", only bug fixes will be contained in future 2.4 releases.

    Our current plan is to release another rc in a couple of weeks and then a genuine 2.4 tag in June.

    This code should be considered stable and production ready; please test it and let us know if you find any issues.

    The code is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@schedmd.com

    Enjoy!

    SLURM 2.3.5 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.5!

    A summary of the changes in version 2.3.5 from version 2.3.4 can be found in the file "NEWS" in the distributed files. Only bug fixes are, and will continue to be, included in the 2.3 releases.

    The code is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@schedmd.com

    Enjoy!

    SLURM 2.3.4 and 2.4.0-pre4 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.4!

    A summary of the changes in version 2.3.4 from version 2.3.3 can be found in the file "RELEASE_NOTES" in the distributed files. Only bug fixes are, and will continue to be, included in the 2.3 releases.

    Also tagged is a new development release, 2.4.0-pre4. Future development will be added to this release, along with any bug fixes found in the 2.3 branch. If you are developing new code or want to run the most bleeding-edge SLURM, please use this version. While the code may be fairly stable, this version is beta and should be treated as such. In most cases this version isn't recommended for production systems.

    2.4 NOTE:
    Because internal data structures may change from one -pre release to another, preserving state is not always possible, so jobs may be lost.
    2.4 NOTE:
    If running on a BGQ system, this version is most likely the one you want. 2.3 provides only a small subset of the functionality that 2.4 already offers.

    The code for both versions is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@schedmd.com

    Enjoy!

    SLURM 2.4.0-pre3 tagged and released

    For those of you working on SLURM version 2.4 development, SchedMD is pleased to announce the immediate availability of the development release SLURM 2.4.0-pre3!

    IBM BlueGene/Q systems are fully supported in this release including documentation.

    Major changes in version 2.4.0-pre3 from version 2.4.0-pre2 can be found in the file "NEWS" in the distributed files.

    Future development will be added to this release, along with any bug fixes found in the 2.3 branch. If you are developing new code or want to run the most bleeding-edge SLURM, please use this version. While the code may be fairly stable, this version is beta and should be treated as such. In most cases this version isn't recommended for production systems.

    2.4 NOTE:
    Because internal data structures may change from one -pre release to another, preserving state is not always possible, so jobs may be lost.
    2.4 NOTE:
    If running on a BGQ system, this is the version you want. 2.3 provides only a small subset of the functionality, while 2.4 now delivers the complete package.

    The code is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@lists.llnl.gov

    Enjoy!

    SLURM 2.3.3 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.3!

    A summary of the changes in version 2.3.3 from version 2.3.2 can be found in the file "NEWS" in the distributed files. Only bug fixes are, and will continue to be, included in the 2.3 releases.

    The code is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@lists.llnl.gov

    Enjoy!

    SLURM 2.3.2 and 2.4.0-pre2 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.2!

    A summary of the changes in version 2.3.2 from version 2.3.1 can be found in the file "RELEASE_NOTES" in the distributed files. Only bug fixes are, and will continue to be, included in the 2.3 releases.

    Also tagged is a new development release, 2.4.0-pre2. Future development will be added to this release, along with any bug fixes found in the 2.3 branch. If you are developing new code or want to run the most bleeding-edge SLURM, please use this version. While the code may be fairly stable, this version is beta and should be treated as such. In most cases this version isn't recommended for production systems.

    2.4 NOTE:
    Because internal data structures may change from one -pre release to another, preserving state is not always possible, so jobs may be lost.
    2.4 NOTE:
    If running on a BGQ system, this version is most likely the one you want. 2.3 provides only a small subset of the functionality that 2.4 already offers.

    The code for both versions is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@lists.llnl.gov

    Enjoy!

    SLURM 2.3.1 and 2.4.0-pre1 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.1!

    A summary of the changes in version 2.3.1 from version 2.3.0 can be found in the file "RELEASE_NOTES" in the distributed files. Only bug fixes are, and will continue to be, included in the 2.3 releases.

    Also tagged is a new development release, 2.4.0-pre1. Future development will be added to this release, along with any bug fixes found in the 2.3 branch. If you are developing new code or want to run the most bleeding-edge SLURM, please use this version. While the code may be fairly stable, this version is beta and should be treated as such. In most cases this version isn't recommended for production systems.

    2.4 NOTE:
    Because internal data structures may change from one -pre release to another, preserving state is not always possible, so jobs may be lost.
    2.4 NOTE:
    If running on a BGQ system, this version is most likely the one you want. 2.3 provides only a small subset of the functionality that 2.4 already offers.

    The code for both versions is available here: http://www.schedmd.com/#repos

    Please report any bug to http://bugs.schedmd.com or the slurm-dev list at slurm-dev@lists.llnl.gov

    Enjoy!

    SLURM 2.3.0 tagged and released

    SchedMD is pleased to announce the immediate availability of SLURM 2.3.0!

    A summary of the major changes in version 2.3 from version 2.2 can be found in the file "RELEASE_NOTES" with the distributed files.

    This version should be considered stable and ready for production use.

    The code is available here: http://www.schedmd.com/#repos

    Enjoy!

    SchedMD Porting SLURM to BlueGene/Q for LLNL

    SchedMD LLC announced today a contract signing with Lawrence Livermore National Laboratory (LLNL) to provide development services for the SLURM workload scheduler. Technical activities center around making SLURM operational on Sequoia, a 20 petaFLOP IBM BlueGene/Q computer slated for delivery to LLNL in late 2011, with deployment scheduled in 2012. Sequoia will have 1.6 million cores, 1.6 petabytes of memory, 96 racks and 98,304 compute nodes, making it one of the most powerful computers in the world.

    Moe Jette, Chief Technology Officer of SchedMD, reports "We eagerly anticipate working with LLNL and extending SLURM's capabilities to the latest generation of hardware from IBM. SLURM was designed for very high scalability from its inception, and we anticipate no difficulties in managing the workload on Sequoia's 1.6 million cores."

    SchedMD Announces SLURM Support Contract with Swiss National Supercomputer Centre

    Livermore, CA - SchedMD LLC announced today a contract signing with Swiss National Supercomputer Centre (CSCS) to provide support services for the SLURM workload scheduler. CSCS recently installed the SLURM workload scheduler across their supercomputers including the 22,032 core Cray XT5 system.

    Colin McMurtie, Head of Systems at CSCS said "The ease with which we have made this transition is testament to the robustness and high quality of the product but also to the no-fuss installation and configuration procedure and the high quality documentation. We have no qualms about recommending SLURM to any facility, large or small, who wish to make the break from the various commercial options available today."

    Morris "Moe" Jette, Chief Technology Officer of SchedMD, said "We look forward to working with one of the premier computer centers in Europe. The workload scheduler is a critical component of any supercomputer center and SchedMD will provide CSCS with the support they need to use SLURM."

    SLURM Developers Depart Lawrence Livermore National Laboratory

    The primary SLURM developers, Morris Jette and Danny Auble, have decided to depart Lawrence Livermore National Laboratory (LLNL) in order to concentrate their energies on SchedMD LLC, a company they formed in 2010 to provide SLURM development and support.

    SLURM development was begun at Lawrence Livermore National Laboratory (LLNL) in 2002. It has since become one of the most popular job schedulers in high-performance computing, currently installed on the largest computer in the world at the National University of Defence Technology (NUDT) in China, Europe's largest system at Commissariat a l'Energie Atomique (CEA) in France and many others. This popularity was achieved with limited commercial support and no marketing. Our expectation is that by leaving LLNL, more resources can be made available for SLURM development and that its rate of development will increase. We also anticipate that commercial support will make it more attractive to many consumers and lead to more widespread acceptance.

    Our intent is that SLURM remain open-source and freely available to the public. The short-term impact of this transition is a short delay in the release of SLURM version 2.3 to the Fall of 2011. Version 2.3 includes the ability to increase job size, task binding to resources using Linux cgroups, plus support for IBM BlueGene/Q, Cray XT and Cray XE systems.

    The longer-term effect will be largely driven by market forces.

    SchedMD to port SLURM to Cray Computers

    SchedMD LLC announced today that an agreement had been reached with Oak Ridge National Laboratory to port SLURM to Cray systems.

    Oak Ridge National Laboratory (ORNL) operates the most powerful computer in the United States, a Cray XT5 with a peak speed of 2.33 petaflops (over two thousand trillion calculations per second). ORNL also has a contract with Cray for a 20 petaflop computer to begin shipment in 2011.

    Morris "Moe" Jette, Chief Technology Officer of SchedMD, says "SLURM supports all of the high-performance computing architectures today except for Cray systems. This work will open the door for Cray customers to a state of the art, open source job scheduler with tremendous cost savings compared to proprietary schedulers."

    Slurm versions 2.5.2 and 2.6.0-pre1 available

    Slurm version 2.5.2 is now available with the bug fixes described below. We have also made available a pre-release of version 2.6, which is still under development. Notable features in v2.6 include support for job arrays and accounting for a job's energy consumption using IPMI. The job array documentation is available at www.schedmd.com/slurmdocs/job_array.html. The latest versions of Slurm are available from: www.schedmd.com/#repos

    * Changes in SLURM 2.5.2
    ========================
     -- Fix advanced reservation recovery logic when upgrading from version 2.4.
     -- BLUEGENE - fix for QOS/Association node limits.
     -- Add missing "safe" flag from print of AccountStorageEnforce option.
     -- Fix logic to optimize GRES topology with respect to allocated CPUs.
     -- Add job_submit/all_partitions plugin to set a job's default partition to ALL available partitions in the cluster.
     -- Modify switch/nrt logic to permit build without libnrt.so library.
     -- Handle srun task launch failure without duplicate error messages or abort.
     -- Fix bug in QoS limits enforcement when slurmctld restarts and user not yet added to the QOS list.
     -- Fix issue where sjstat and sjobexitmod were installed in 2 different RPMs.
     -- Fix for job request of multiple partitions in which some partitions lack nodes with required features.
     -- Permit a job to use a QOS it does not have access to if an administrator manually set the job's QOS (previously the job would be rejected).
     -- Make more variables available to job_submit/lua plugin: slurm.MEM_PER_CPU, slurm.NO_VAL, etc.
     -- Fix topology/tree logic when nodes defined in slurm.conf get re-ordered.
     -- In select/cons_res, correct logic to allocate whole sockets to jobs. Work by Magnus Jonsson, Umea University.
     -- In select/cons_res, correct logic when job removed from only some nodes.
     -- Avoid apparent kernel bug in 2.6.32 which apparently is solved in at least 3.5.0. This avoids a stack overflow when running jobs on more than 120k nodes.
     -- BLUEGENE - If we made a block that isn't runnable because of an overlapping block, destroy it correctly.
     -- Switch/nrt - Dynamically load libnrt.so from within the plugin as needed. This eliminates the need for libnrt.so on the head node.
     -- BLUEGENE - Fix in reservation logic that could cause abort.

    * Changes in SLURM 2.6.0-pre1
    =============================
     -- Add "state" field to job step information reported by scontrol.
     -- Notify srun to retry step creation upon completion of other job steps rather than polling. This results in much faster throughput for job step execution with the --exclusive option.
     -- Added "ResvEpilog" and "ResvProlog" configuration parameters to execute a program at the beginning and end of each reservation.
     -- Added "slurm_load_job_user" function. This is a variation of "slurm_load_jobs", but accepts a user ID argument, potentially resulting in substantial performance improvement for "squeue --user=ID".
     -- Added "slurm_load_node_single" function. This is a variation of "slurm_load_nodes", but accepts a node name argument, potentially resulting in substantial performance improvement for "sinfo --nodes=NAME".
     -- Added "HealthCheckNodeState" configuration parameter to identify node states on which HealthCheckProgram should be executed.
     -- Remove sacct --dump and --formatted-dump options, which were deprecated in 2.5.
     -- Added support for job arrays (phase 1 of effort). See "man sbatch" option -a/--array for details.
     -- Add new AccountStorageEnforce options of 'nojobs' and 'nosteps', which will allow the use of accounting features like associations, qos and limits, but not keep track of jobs or steps in accounting.
     -- Cray - Add new cray.conf parameter of "AlpsEngine" to specify the communication protocol to be used for ALPS/BASIL.
     -- select/cons_res plugin: Correction to CPU allocation count logic for cores without hyperthreading.
     -- Added new SelectTypeParameter value of "CR_ALLOCATE_FULL_SOCKET".
     -- Added PriorityFlags value of "TICKET_BASED" and merged priority/multifactor2 plugin into priority/multifactor plugin.
     -- Add "KeepAliveTime" configuration parameter controlling how long sockets used for srun/slurmstepd communications are kept alive after disconnect.
     -- Added SLURM_SUBMIT_HOST to salloc, sbatch and srun job environment.
     -- Added SLURM_ARRAY_TASK_ID to environment of job array tasks.
     -- Added squeue --array/-r option to optimize output for job arrays.
     -- Added "SlurmctldPlugstack" configuration parameter for generic stack of slurmctld daemon plugins.
     -- Removed contribs/arrayrun tool. Use the native support for job arrays instead.
     -- Modify default installation locations for RPMs to match "make install": _prefix /usr/local, _slurm_sysconfdir %{_prefix}/etc/slurm, _mandir %{_prefix}/share/man, _infodir %{_prefix}/share/info.
     -- Add acct_gather_energy/ipmi plugin, which works off freeipmi for energy gathering.
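
    As a brief illustration of the new job array support and the SLURM_ARRAY_TASK_ID environment variable (a sketch only; the script contents and input file names are hypothetical examples, not taken from this release):

        #!/bin/bash
        #SBATCH --array=0-9    # submit 10 array tasks (see "man sbatch", -a/--array)

        # Each array task receives its own index in SLURM_ARRAY_TASK_ID:
        ./process_chunk input.${SLURM_ARRAY_TASK_ID}

    Submitting such a script with sbatch creates one job record per array task, and the new squeue --array/-r option displays one line per task.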