SLURM COMMANDS¶

Here is information about SLURM-related commands, whether from the vendor or created by JHPCE staff members. Links are provided to online copies of the manual pages for commands. If we've written a page with advice about using the command, use the (LOCAL TIPS) link.

Warning

Do not frequently¹ run slurmpic, squeue, sacct or other Slurm client commands using loops in shell scripts or other programs.

These commands all send remote procedure calls to slurmctld, the main SLURM control and scheduling daemon. They may also perform look-ups in the accounting database. That process and the database need to be highly responsive to the input/output caused by running jobs.

Ensure that programs limit calls to slurmctld to the minimum necessary for the information you are trying to gather. Add arguments that restrict each query to the partitions, users, or job data fields you actually need.
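As a rough sketch of what a limited query can look like, the function below asks squeue only for your own jobs in one partition and prints just three fields. The function name my_jobs_brief and the default partition name shared are illustrative assumptions, not JHPCE tools.

```shell
# Hypothetical helper (not a JHPCE tool): one narrow squeue call.
# It restricts the query to your own jobs in a single partition and prints
# only the job ID (%i), state (%T), and elapsed time (%M), with no header.
my_jobs_brief() {
    local partition="${1:-shared}"        # assumed default partition name
    local user="${USER:-$(id -un)}"       # fall back to id if USER is unset
    squeue --user="$user" --partition="$partition" --noheader \
           --format="%i %T %M"
}

# Run it once when you need the answer, not inside a tight loop:
#   my_jobs_brief shared
```

Narrowing by user and partition keeps the request small; the same idea applies to sacct, where a start time and a short field list limit the database look-up.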

Locally Written Tools¶

Many of these are bash scripts. You can inspect their contents and, if desired, make copies for yourself and modify them to suit your needs. There will also soon be bash routines defined in the default environment via either module files or the /etc/profile.d/ directory. Most scripts have a -h help option that reveals more usage information.

The command which programname is how you find out where it is located, if it is a program. If it is a bash routine, the same command will display the code of the routine.

Information about cluster and jobs¶

slurmpic: An essential program for getting cluster status info. Use -h option to see key usage details. By default, with no arguments, it provides info for the shared partition.
slurmuser: Provides a per-user list of CPU/RAM in use for running jobs and requested for pending jobs (if any). By default this is for all jobs in all partitions.
qoverview: Quick view of the number of running and pending jobs on the whole cluster. Also counts pending jobs grouped by some of the primary pending reasons.
showjob: Displays job information when given a jobid. Only works for pending or running jobs. Currently simply a shortcut for scontrol show job jobid --details but hopefully in the future will produce more readable output.
showqos: Displays list of our QOS definitions in a readable format.
showreason: Show the Reason line from showjob for nodes that are in DRAIN or DOWN etc.
slurm-hist-all-cores: Histogram of core consumption for whole cluster
slurm-hist-all-mem: Histogram of RAM consumption for whole cluster
slurm-hist-shared-cores: Histogram of core consumption for only the shared partition
slurm-hist-shared-mem: Histogram of RAM consumption for only the shared partition
slurmuser: Displays per-user summary usage of RAM & CPU across the cluster. Can display by partition or for a specific user.
smem: Displays memory used by your currently running jobs. If given a jobid number, it will display info about the memory usage of that job. (no man page yet)
memory reporting script - puts per-user output daily into directories under /jhpce/shared/jhpce/jhpce-log/
useron: List nodes where a user has running jobs.
jobson: Displays the jobs running on a node when given a three-digit node number.

Special purpose¶

timeleft: Produces a SLURM "time_spec" in the format DAYS-HH:MM:SS indicating the time left before an upcoming announced outage. Meaningless if there is not an imminent outage.
jobtimeleft: Given a jobid, it uses scontrol to update the job's time limit using timeleft. Useful for jobs that went into pending status instead of running because the user did not specify a job time limit.
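To make the DAYS-HH:MM:SS format concrete, here is a minimal sketch that converts a number of seconds into that form and passes a time_spec to scontrol. This is not the source of timeleft or jobtimeleft; the function names seconds_to_timespec and set_job_timelimit are hypothetical, and how the real tools learn the outage time is not shown.

```shell
# Hypothetical helper: format a count of seconds as DAYS-HH:MM:SS.
seconds_to_timespec() {
    local total="$1"
    local days=$(( total / 86400 ))
    local hours=$(( (total % 86400) / 3600 ))
    local mins=$(( (total % 3600) / 60 ))
    local secs=$(( total % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$mins" "$secs"
}

# Hypothetical helper: set a job's time limit to the given time_spec.
# Usage: set_job_timelimit JOBID DAYS-HH:MM:SS
set_job_timelimit() {
    scontrol update JobId="$1" TimeLimit="$2"
}

# Example: 93784 seconds is 1 day, 2 hours, 3 minutes, 4 seconds.
#   seconds_to_timespec 93784    # prints 1-02:03:04
```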

Contributed SLURM Programs We've Installed¶

reportseff: (LOCAL TIPS) Very handy tool! Displays the efficiency of CPU and RAM usage for jobs and job array elements. Accepts many options to control its output.
seff: Display efficiency of CPU and RAM usage of a single completed job. (no man page yet)
slurm-mail: Tool used to add details to mail sent to you. Not something you can modify. Listed for completeness.

Provided with Slurm¶

All of the manual pages are here, including those for the configuration files found in /etc/slurm/

Submitting Jobs¶

salloc: request an interactive job allocation (doesn't start any processes anywhere)
sbatch: submit a batch script to Slurm to create an allocation and run processes
srun: launch one or more tasks of an application using allocated resources
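A minimal sketch of how these commands usually fit together is shown here. The script name, the resource values, and the partition name are placeholders chosen for illustration; adjust them to your own work.

```shell
# Write a small example batch script (file name and resource values are
# placeholders, not recommended settings).
cat > example-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=shared
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --time=0-01:00:00
#SBATCH --mail-type=END,FAIL

# srun launches the task inside the allocation that sbatch created.
srun hostname
EOF

# Submit the script; sbatch prints the ID of the new job:
#   sbatch example-job.sh
# Or start an interactive shell on a compute node instead:
#   srun --pty bash
```

Giving the job an explicit --time value lets the scheduler plan around it, and the --mail-type line asks Slurm to send email when the job ends or fails, which avoids polling the queue.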

Information about cluster and jobs¶

Some SLURM commands, such as sacct and squeue, can display a wide variety of information. It can be complex to specify what you want to see and to format it so it is readable. We've tried to document some common choices in the LOCAL TIPS documents. A tip: you can set certain environment variables to specify output formats instead of providing the arguments on the command line. Because changing the value of these variables can produce vastly different output from commands like sacct and squeue, it can be useful to set them in aliases or shell scripts that format output the way you need. Example variables are: SLURM_TIME_FORMAT, SACCT_FORMAT, SQUEUE_FORMAT, SQUEUE_FORMAT2, SQUEUE_SORT.
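As an illustration of the environment-variable approach, the sketch below sets default formats for a session and defines one wrapper that overrides the squeue format for a single call. The particular field choices and the function name squeue_long are assumptions made for this example, not local defaults.

```shell
# Report timestamps in Slurm's more compact "relative" style.
export SLURM_TIME_FORMAT=relative

# Default sacct columns: job ID, job name (30 characters wide), state,
# elapsed time, and peak resident memory.
export SACCT_FORMAT="JobID,JobName%30,State,Elapsed,MaxRSS"

# Default squeue columns: job ID, partition, name, state, time used,
# and reason or node list.
export SQUEUE_FORMAT="%.10i %.9P %.30j %.8T %.10M %R"

# Hypothetical wrapper: a wider layout (adds user, time limit, node count)
# for one invocation only. A function is used here rather than an alias
# because aliases are not expanded in non-interactive scripts.
squeue_long() {
    SQUEUE_FORMAT="%.10i %.9P %.30j %.8u %.8T %.10M %.10l %.6D %R" squeue "$@"
}

# Usage: squeue_long --user="$USER"
```

Putting the export lines in a shell startup file makes them the defaults for every session, while wrappers like the one above keep alternative layouts one command away.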

sacct: (LOCAL TIPS): display accounting data for jobs in the Slurm database
sattach: attach to a running job step
scontrol: (LOCAL TIPS): display (or modify when permitted) the status of Slurm entities (jobs, nodes, partitions, reservations)
sinfo: display node and partition information
sprio: (LOCAL TIPS): display the factors that comprise a job's scheduling priority
squeue: display the jobs in the scheduling queues, one job per line
sshare: display the shares and usage for each charge account and user
sstat: display process statistics of a running job/step
sview: X11 graphical tool for displaying jobs, partitions, reservations

Controlling Jobs¶

scancel: cancel a job or job step, or send a signal (for example, to pause it) to a running job or job step
scontrol: (LOCAL TIPS): display (and modify when permitted) the status of Slurm entities (jobs, nodes, partitions, reservations)

For Systems Administrators¶

sacctmgr: (LOCAL TIPS): display and modify Slurm account information
scontrol: (LOCAL TIPS): display and modify Slurm jobs and partitions
sdiag: display scheduling statistics and timing parameters
slurmctld: central management daemon
slurmd: client-side daemon
sreport: generate canned reports from job accounting data and machine utilization statistics

¹ "Frequently" means more than once every five minutes. Do you REALLY need to know something sooner than that? If you want to know when a job starts, fails, or finishes, use email notification settings. You can add them to pending and running jobs using scontrol. (See the sbatch manual page for possible mail types.)
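A minimal sketch of adding email notification to a job that is already pending or running follows. It assumes your Slurm version accepts the MailType and MailUser fields in scontrol update; the function name notify_on_finish and the address are hypothetical.

```shell
# Hypothetical helper: ask Slurm to email you when an existing job ends or
# fails, instead of polling the queue to find out.
# Usage: notify_on_finish JOBID you@example.org
notify_on_finish() {
    scontrol update JobId="$1" MailType=END,FAIL MailUser="$2"
}
```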