Table of Contents
- Running interactive jobs on cluster nodes
- Submitting batch jobs from the login node
- Two steps to submitting an R BATCH job to the cluster
- Specifying your job’s memory needs
- Using a default SGE specification file .sge_request
- Notes about specifying resource requirements
- A special note about h_vmem and h_stack
- Requesting multiple SGE slots for a multi-core job
- Checking your job’s memory usage
- Checking the status of your job
- Job status via email
- How many jobs can I submit?
- The shared queue – shared.q
- Questions and comments
Setting up keypair and 2-factor authentication
Passwords are not very secure. Don’t believe us? Read this scary article from Wired magazine. We prefer that you use key-pair authentication for logins from your normal machine, or 2-factor authentication when logging in from any other machine. Currently these are optional, but they will soon be required, so we recommend that you set up one or both of them immediately.
Configure key-pair authentication: key-pair authentication
Configure 2-factor authentication: 2-factor authentication
Using the cluster
The cluster can be accessed via the machine jhpce01.jhsph.edu. To obtain an account on this machine, please refer to the instructions at http://jhpce.jhu.edu/knowledge-base/new-user-orientation/
Everything related to job submission, scheduling, and execution on the cluster is under the control of Sun Grid Engine (SGE). The Grid Engine project was sponsored by Sun Microsystems as an open source community effort to facilitate the adoption of distributed computing solutions.
By default, one job uses one slot, but see Requesting multiple SGE slots for a multi-core job. You may submit jobs which request more than your slot limit and they will all be queued to run as your other jobs finish. When the cluster nodes are all at maximum capacity, jobs waiting to run will be subject to a functional share priority algorithm.
IMPORTANT
The login machines (jhpce01/jhpce02) are only used for logging in. Do NOT run jobs on jhpce01! This machine is not for doing any sort of computation. Rather, it is ONLY for text editing and for submitting jobs to the cluster. Any compute-intensive jobs found running on the login nodes will be KILLED WITHOUT NOTICE. You will lose any data and/or computations associated with the running job.
Running interactive jobs on cluster nodes
Users are not allowed to ssh or rsh from the login host directly to a compute node. Instead, users must use the qrsh or qlogin command to start an interactive job on a cluster node. These commands launch a remote shell on a cluster node. The easiest thing to do after you log into the login host is to type
qrsh
*** Be sure to see the section Specifying your job’s memory needs
You will be logged into a random cluster node and get an interactive shell prompt, just as if you logged into jhpce01. Now you can run whatever program you want. For example, you can run R. However, you must remember to logout (‘exit’ or ‘CTRL D’). Otherwise, you will be taking up a slot in the queue which will not be available to others.
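For example, a minimal interactive session might look like the following sketch (the 4G memory values are only illustrative; see Specifying your job’s memory needs):

qrsh -l mem_free=4G,h_vmem=4G   # request an interactive session with 4GB of memory
R                               # start R on the compute node
# ... do your interactive work, then quit R with q() ...
exit                            # log out of the compute node, freeing the slot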
While you are logged into a cluster node via qrsh, if you run
qstat
you’ll see something like the following:
job-ID  prior    name        user   state  submit/start at      queue                 slots
--------------------------------------------------------------------------------------------
 15194  1.53962  Pf_3D7      maryj  r      08/15/2016 16:04:16  shared.q@compute-076      1
 15299  2.00790  BootA10600  maryj  r      08/17/2016 15:26:06  shared.q@compute-080      1
 15290  2.35449  QRLOGIN     maryj  r      08/17/2016 15:20:00  shared.q@compute-111      1
The job labeled QRLOGIN is the interactive session (for more info, see Checking the status of your job).
You may also specify memory requirements or special queues on your qrsh command just as you do on the qsub command (see below). For interactive work we strongly encourage users to work on the cluster via qrsh (rather than use jhpce01).
NOTE: DO NOT run background or ‘nohup’ jobs on the cluster. Sun Grid Engine (SGE) must know about your job/session so that it can manage and account for cluster resources. By default, SGE assumes one slot (corresponding to one CPU core) for each qrsh session.
If you still have running programs and no session appears for you in qstat, then you have done something inappropriate. If jobs are found running on cluster nodes with no associated SGE entry, they will be killed without notice.
NOTE: If you encounter an error while running a program interactively on a cluster node and your program crashes, it still might be in the cluster’s process queue. If you don’t quit out of your program normally, make sure to check the cluster queue (via qstat, see below) and see if your (interactive) job is still there. If it is, get the job-ID and kill the job using qdel.
Submitting batch jobs from the login node
As we have indicated, the cluster uses Sun Grid Engine (SGE) to control the scheduling of jobs. SGE comes with a number of command-line tools to help you manage your jobs on the cluster. Most of the time you can get away with knowing just a few commands. The most immediately relevant ones are:
- qsub: submit a batch job to the cluster
- qstat: query the status of your jobs on the cluster (or look at all jobs running on the cluster)
- qdel: delete a job from the cluster/queue.
IMPORTANT: Every job that you submit to the cluster from jhpce01 must be wrapped in a shell script. That is, you cannot just start a program from the command line (e.g. nice +19 R) as you could on your own machine or some other server. Not to worry, though; wrapping your program in a shell script is not as difficult as it might sound. For example, below are instructions for running an R batch job on the cluster.
Two steps to submitting an R BATCH job to the cluster
Using the Sun Grid Engine to submit an R BATCH job to the cluster is very simple.
- First (on jhpce01), assuming you have an R program in a file named mycommands.R, you need to create a new file that will invoke and run your R program when it is submitted to the cluster. Let’s call this new file batch.sh. You should put this batch.sh file in the same directory as your mycommands.R file. To run an R BATCH job on the cluster using the mycommands.R file, your batch.sh file need only contain this one line:
R CMD BATCH mycommands.R
The file might have other lines in it to specify SGE job options or commands to run before or after the “R CMD BATCH …” line (a fuller example appears at the end of this section). The technical name for this file is a “shell script”. Knowing this might help you communicate with the system administrator.
- Once you’ve written your short batch.sh file, you can submit it to the cluster via the command
qsub -cwd batch.sh
*** Be sure to see the section Specifying your job’s memory needs
The -cwd option tells SGE to execute the batch.sh script on the cluster from the current working directory (otherwise, it will run from your home directory, which is probably not what you want).
That’s all you have to do! There are a few things to note:
- You do not have to put an & at the end of the line (don’t worry if you don’t know what the & might be used for). qsub automatically sends your job to the cluster and returns to your command line prompt so that you can do other things.
- After submitting your job with qsub, you can use the qstat command to see the status of your job(s).
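Putting these pieces together, a slightly fuller batch.sh might look like the following sketch (the memory values are only placeholders; the lines beginning with #$ are SGE options read by qsub, as described in the Job status via email section below):

#!/bin/bash
#$ -cwd
#$ -l mem_free=2G,h_vmem=2G
R CMD BATCH mycommands.R

With the options embedded in the script this way, the job can be submitted with just qsub batch.sh.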
Specifying your job’s memory needs
When submitting your job(s), if you do not specify any memory requirements, SGE will apply a default memory setting of 2GB (see Using a default SGE specification file .sge_request). If you need more than 2GB of memory for your program, you will need to add a memory resource requirement to your qsub (or qrsh) command. Use a command like:
qsub -cwd -l mem_free=MEM_NEEDED,h_vmem=MEM_MAX batch.sh
or
qrsh -l mem_free=MEM_NEEDED,h_vmem=MEM_MAX
where MEM_NEEDED is the amount of memory (in megabytes M, or gigabytes G) that your job will require and MEM_MAX is the upper bound on the amount of memory your job is allowed to use. Typically h_vmem is set to be the same as mem_free. For example, if your job will require 6GB of memory, you could type
qsub -cwd -l mem_free=6G,h_vmem=6G batch.sh
In the above case, your job would go to a node with at least 6GB of memory available at the time the job starts and the job would automatically be stopped if it exceeded 6GB of memory usage at any time.
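Memory can also be requested in megabytes. For instance, a small job expected to need roughly 800MB might (as an illustration) be submitted as:

qsub -cwd -l mem_free=800M,h_vmem=800M batch.sh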
See the discussion in Notes about specifying resource requirements below.
Using a default SGE specification file .sge_request
One easy way to always have a default limit on your jobs and sessions is to put a line like this
-l mem_free=5G,h_vmem=5G
in a file named .sge_request in your home directory. Then, if you do not specify those parameters on your command line, the defaults from the file .sge_request will be used. Obviously, if you need higher limits on a given job you need to override those defaults with appropriate command line options. Other options that you might use can also go in this defaults file.
All new users get the following .sge_request defaults file:
#
# Set defaults for mem_free and h_vmem
-l mem_free=2G,h_vmem=2G
#
# Set the standard value for stack size limit
# (needed for some programs to run properly when h_vmem is set)
-l h_stack=256M
#
# Set a default maximum file size that an SGE job can create
-l h_fsize=10G
Note that the default file size limit is set to 10G.
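For example, with the 2G defaults above in place, a particular job that needs 20GB (an illustrative value) would override the defaults on the command line:

qsub -cwd -l mem_free=20G,h_vmem=20G batch.sh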
Notes about specifying resource requirements
The -l before the mem_free etc. specifications is a ‘minus sign’ followed by the lower-case letter ‘L’.
There are no spaces in the comma delimited list of resources and limits.
Here are some notes explaining the use of mem_free and h_vmem, as well as h_fsize:
————————————————
mem_free
This should be set to what you think your job will need (or a little more). This will reserve memory for your job on the node that it is run on.
h_vmem
This is the “high water mark” for memory for your job. This should be set equal to your mem_free request. You should combine mem_free and h_vmem like so:
$ qsub -cwd -l mem_free=12G,h_vmem=12G batch.sh
… or similarly on a qrsh command.
$ qrsh -l mem_free=12G,h_vmem=12G
h_fsize
By default, all users have a 10GB file size limit set for their jobs. You can increase this by specifying an “h_fsize” option for your job. Something like:
qsub -cwd -l mem_free=12G,h_vmem=12G,h_fsize=20G batch.sh
The qsub script filename would always be the last thing specified on the qsub command line (unless there are also options that need to be supplied to the script file).
If you usually invoke a program (such as R) when you type your qrsh command, it might look like this:
qrsh -l mem_free=5G,h_vmem=5G R
To see a summary of available nodes and their memory capacity and current load, use the command qpic -s.
After submitting your job with qsub, use the qstat command to see which queue (node) your job actually went to (see Checking the status of your job).
A special note about h_vmem and h_stack
If you use h_vmem, as we request (see Specifying your job’s memory needs), to limit the amount of memory that can be used by your qrsh or qsub job, then you might also want to specify the h_stack value explicitly; otherwise you may encounter a problem when using interactive MATLAB or other programs and packages that use Tcl/Tk libraries. The issue seems to affect mainly those programs and packages which use Tcl/Tk at some level, but we have seen it in a few other cases as well. So, your qrsh or qsub command should include something like this:
-l mem_free=10G,h_vmem=10G,h_stack=256M
The mem_free and h_vmem values may vary according to your needs but the h_stack value should always be 256M (as far as we know from our experience thus far).
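For example, a complete qrsh command with the stack size specified might look like this (the memory values are illustrative):

qrsh -l mem_free=10G,h_vmem=10G,h_stack=256M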
Requesting multiple SGE slots for a multi-core job
To run a multi-core job on a single node, you must request a parallel environment with multiple slots (where one slot corresponds to one CPU core). In addition, if the cluster is busy, it is critical to use the reserve option for your job. The reserve option allows your job to reserve a spot on a node even if there are not enough free cores on the node to start your job at that moment; it lets your job get a foot in the door on a node and prevents the node from being constantly loaded by single-core jobs. Lastly, when requesting RAM, you will need to divide the total RAM needed for your job by the number of cores you are requesting, as RAM is a per-core resource in SGE. The following options will need to be set for your SGE job when using multiple cores:
- Use the -pe local K option to request K slots on a single compute node.
- Use the -R y option to turn on slot reservation.
- Set the mem_free and h_vmem RAM limits equal to your total RAM requirements divided by the number of cores you are requesting.
For example, suppose your job needs 8 slots, you want to use slot reservation because the cluster is busy, and you expect your job will need 48GB of memory in total (6GB per core). You would submit the job with the following command:
qsub -pe local 8 -R y -l mem_free=6G,h_vmem=6G myScript.sh
Note: The “local” parallel environment is a name particular to our cluster … it implies that your request is for that many slots on a single cluster node.
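As a sketch of an alternative, these options can also be placed at the top of the job script itself using #$ lines (see the note in the Job status via email section below); the program name here is only a placeholder:

#!/bin/bash
#$ -cwd
#$ -pe local 8
#$ -R y
#$ -l mem_free=6G,h_vmem=6G
./myMultiCoreProgram --threads 8

The job would then be submitted with just qsub myScript.sh.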
Checking your job’s memory usage
While your qsub job is running, you can see its memory usage using the command
qstat -j NNNNN | grep vmem
where NNNNN is your specific cluster job number … look at the “vmem” and “maxvmem” entries.
To make it easier to monitor memory usage for your currently running jobs, we have created the command
qmem
If you have no jobs running on the cluster qmem will print nothing, but if you do, the results will look something like:
[jhpce01]$ qmem
10506 maryj node=33 vmem=289.1M, maxvmem=294.3M howMany10.sh
14257 maryj node=8  vmem=231.5M, maxvmem=238.0M s.all.sh
16695 maryj node=25 vmem= 1.8G,  maxvmem= 1.8G  mergedoc1.3.sh
17464 maryj node=15 vmem=272.9M, maxvmem=284.0M simulateVariance.sh
17555 maryj node=12 vmem=N/A,    maxvmem=N/A    QRLOGIN
17584 maryj node=6  vmem=315.1M, maxvmem=334.3M calculateVaried-emp.genSampScheme.sh
To see your job’s memory usage upon job completion, use email notification, which works for aborted jobs as well. See the job status via email discussion for instructions on how to use email notification.
Note: qrsh sessions will not report memory usage using the above method. You will simply see “N/A” in the entries for vmem and maxvmem, as shown in the example above.
Checking the status of your job
After submitting your job you can use qstat to look at the status of your job. By default, under our version of SGE, qstat with no arguments shows cluster jobs for all users. To restrict the output to show only your jobs, use the -u USERID argument. For example:
qstat -u maryj
would only display active/pending jobs for user maryj.
Under the state column you can see the status of your job. Some of the codes are:
- r: the job is running
- t: the job is being transferred to a cluster node. Some jobs may remain in the “t” state for the duration of their run.
- qw: the job is queued (and not running yet)
- Eqw: an error occurred with the job. You will likely need to kill the job and verify that the job parameters are correct before resubmitting. If you are certain the parameters are correct, please email bitsupport, and someone can clear the error state.
- Rr: the job had to be restarted, likely because it had been running on a node that crashed; the job is now running on another compute node
- Rq: the job had to be restarted, likely because it had been running on a node that crashed; the job is now queued, waiting to run
You can look at the manual page for qstat (type man qstat at the prompt) to get more information on the state codes.
Another important thing to note is the job-ID for your job. You need to know this if you ever want to make changes to your job. For example, to delete your job from the cluster, you can run
qdel 15299
where 15299 is the job-ID I got from running qstat.
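If you need to remove all of your jobs at once, qdel also accepts a user list (see man qdel); for example:

qdel -u maryj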
Job status via email
If you wish to be notified via email when your job’s status changes, include options like the following when submitting your jobs:
qsub -m e -M your_email@jhu.edu your_job.sh
which means: send email to the given address(es) when the job ends.
If you want to automatically have such options (or others) always added to your job(s), simply put them in a file named .sge_request in your home directory. You can also have working-directory-specific .sge_request files (see the man page for sge_request – man sge_request).
Lines like this in your .sge_request file:
-M your_email@jhu.edu -m e
will cause an email to be sent, when your job ends, for every cluster job that you start (including, for what it’s worth, a qrsh ‘job’).
You could use -m n on individual qsub job command lines to suppress email notification for certain jobs.
Or better yet, … you might only put the -M your_email@jhsph.edu in the .sge_request file and simply use the -m e option on jobs for which you want email notification.
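For example (a sketch of that approach), your .sge_request file could contain only the address:

-M your_email@jhu.edu

and an individual job that should send mail when it ends would then be submitted with:

qsub -cwd -m e batch.sh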
Note: You may also invoke the options shown above (and others) by including special lines at the top of your job shell scripts. Lines beginning with #$ are interpreted as qsub options for that job. For example, if the first few lines of your script look like the following:
#!/bin/bash
#$ -M joe_x@gmail.com
#$ -m e
The lines beginning with #$ would cause SGE to send email to ‘joe_x@gmail.com’ when the job ends.
Similarly,
#$ -m be
would cause an email to be sent when the job begins (‘b’) and ends (‘e’). See the manual page for qsub (type man qsub at a shell prompt) to get more information.
How many jobs can I submit?
We do not limit the number of jobs that you submit. However, the more jobs you submit, the more effort SGE expends trying to figure out which jobs should go where. This can lead to problems for other users who are trying to submit their jobs. So, as a practical matter, please do not keep more than 10,000 jobs in the input queue.
Only a limited number of your submitted jobs will run in a slot. The rest will have the queue wait state ‘qw’ and will start as your other jobs finish. In our SGE configuration, a slot generally corresponds to a single cpu-core.
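If you want a quick count of your queued (qw) jobs, one approach (a sketch using the standard qstat -s p option, which selects pending jobs, plus common Unix tools) is:

qstat -u $USER -s p | tail -n +3 | wc -l

The tail -n +3 simply skips the two header lines that qstat prints.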
For multithreaded jobs see Requesting multiple SGE slots for a multi-core job above.
The maximum number of slots per user may change depending on the availability of cluster resources or special needs and requests.
Currently, the maximum number of slots in the shared queue is 200.
There are dedicated queues for stakeholders which may have custom configurations.
The shared queue – shared.q
On the cluster, the shared queue is the default queue. Currently, the shared queue has no time limit. We limit the number of jobs on the shared queue so that no one user can monopolize the entire queue for days and weeks at a time. On the shared queue, for the time being, we have set the maximum number of slots per user to 200. We also limit the total amount of memory that a user can use for jobs on the shared queue. That limit is currently 1TB.
The number of jobs/slots that a user can run may be limited by the availability of CPU slots as well as by an automated “functional sharing” policy that takes into account demand and usage by all other users. If your jobs are not getting shared.q slots within a reasonable amount of time, please contact us.
Please note that, by default, the shared queue will be used for your job, unless you specify a specific queue. You do not need to specify the shared queue in your qsub or qrsh request. In fact, specifically requesting the shared queue (i.e. “qrsh -l shared”) will cause your job to fail.
Questions and/or comments
Please send any questions or comments about this document to bitsupport@lists.jhu.edu.