HOW-TO

Table of Contents

  1. Running interactive jobs on cluster nodes
  2. Submitting batch jobs from the login node
  3. Two steps to submitting an R BATCH job to the cluster
  4. Specifying your job’s memory needs
  5. Using a default SGE specification file .sge_request
  6. Notes about specifying resource requirements
  7. A special note about h_vmem and h_stack
  8. Requesting multiple SGE slots for a multi-core job
  9. Checking your job’s memory usage
  10. Checking the status of your job
  11. Job status via email
  12. How many jobs can I submit?
  13. The shared queue – shared.q
  14. Questions and comments

Setting up keypair and 2-factor authentication

Passwords are not very secure. Don’t believe us? Read this scary article from Wired magazine. We prefer that you use key-pair authentication for logins from your normal machine, or 2-factor authentication when logging in from any other machine. Currently these are optional, but they will soon be required, so we recommend that you set up one or both of these immediately.

Configure key-pair authentication: key-pair authentication
Configure 2-factor authentication:  2-factor authentication

Using the cluster

The cluster can be accessed via the machine jhpce01.jhsph.edu. To obtain an account on this machine, please refer to the instructions at http://jhpce.jhu.edu/knowledge-base/new-user-orientation/

Everything related to job submission, scheduling, and execution on the cluster is under the control of Sun Grid Engine (SGE). The Grid Engine project was sponsored by Sun Microsystems as an open source community effort to facilitate the adoption of distributed computing solutions.

By default, one job uses one slot, but see Requesting multiple SGE slots for a multi-core job. You may submit jobs which request more than your slot limit and they will all be queued to run as your other jobs finish. When the cluster nodes are all at maximum capacity, jobs waiting to run will be subject to a functional share priority algorithm.

IMPORTANT

The login machine (jhpce01/jhpce02) is used only for logging in. Do NOT run jobs on jhpce01! This machine is not for doing any sort of computation. Rather, it is ONLY for text editing and for submitting jobs to the cluster. Any compute-intensive jobs found running on the login machine will be KILLED WITHOUT NOTICE. You will lose any data and/or computations associated with the running job.

Running interactive jobs on cluster nodes

Users are not allowed to ssh or rsh from the login host directly to a compute node. Instead, users must use the qrsh or rlogin command to start an interactive job on a cluster node. These commands launch a remote shell on a cluster node. The easiest thing to do after you log into the login host is to type

 qrsh

*** Be sure to see the section Specifying your job’s memory needs

You will be logged into a random cluster node and get an interactive shell prompt, just as if you logged into jhpce01. Now you can run whatever program you want. For example, you can run R. However, you must remember to log out (‘exit’ or ‘CTRL D’). Otherwise, you will be taking up a slot in the queue which will not be available to others.

While you are logged into a cluster node via qrsh, if you run

qstat

you’ll see something like the following:

job-ID  prior   name       user         state submit/start at     queue                     slots
-------------------------------------------------------------------------------------------------------
  15194 1.53962 Pf_3D7     maryj        r     08/15/2016 16:04:16 shared.q@compute-076      1
  15299 2.00790 BootA10600 maryj        r     08/17/2016 15:26:06 shared.q@compute-080      1
  15290 2.35449 QRLOGIN    maryj        r     08/17/2016 15:20:00 shared.q@compute-111      1

The job labeled QRLOGIN is the interactive session (for more info see Checking the status of your job).

You may also specify memory requirements or special queues on your qrsh command just as you do on the qsub command (see below). For interactive work we strongly encourage users to work on the cluster via qrsh (rather than use jhpce01).

NOTE: DO NOT run background or ‘nohup’ jobs on the cluster. Sun Grid Engine (SGE) must know about your job/session so that it can manage and account for cluster resources. By default, SGE assumes one slot (corresponding to one CPU core) for each qrsh session.

If you still have running programs and no session appears for you in qstat, then you have done something inappropriate. If jobs are found running on cluster nodes with no associated SGE entry, they will be killed without notice.

NOTE: If you encounter an error while running a program interactively on a cluster node and your program crashes, it still might be in the cluster’s process queue. If you don’t quit out of your program normally, make sure to check the cluster queue (via qstat, see below) and see if your (interactive) job is still there. If it is, get the job-ID and kill the job using qdel.
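
As a rough sketch of the check described in this note, the following helper picks out the job-ID of any interactive (QRLOGIN) session from qstat-style output. The function name find_qrlogin_jobs is made up for illustration, and the sample input is copied from the listing above so that the snippet runs anywhere, not only on the cluster.

```shell
#!/bin/bash
# Print the job-IDs of interactive (QRLOGIN) sessions found in qstat-style
# output read from stdin. The first two lines (header and rule) are skipped.
# find_qrlogin_jobs is a hypothetical helper name, not an SGE command.
find_qrlogin_jobs() {
    awk 'NR > 2 && $3 == "QRLOGIN" { print $1 }'
}

# Sample input copied from the qstat listing shown earlier.
sample='job-ID  prior   name       user         state submit/start at     queue                     slots
-------------------------------------------------------------------------------------------------------
  15194 1.53962 Pf_3D7     maryj        r     08/15/2016 16:04:16 shared.q@compute-076      1
  15299 2.00790 BootA10600 maryj        r     08/17/2016 15:26:06 shared.q@compute-080      1
  15290 2.35449 QRLOGIN    maryj        r     08/17/2016 15:20:00 shared.q@compute-111      1'

printf '%s\n' "$sample" | find_qrlogin_jobs
```

On the cluster you would pipe in live output instead of the sample, for example qstat -u maryj | find_qrlogin_jobs, and then pass a stale job-ID to qdel once you have confirmed that the session is no longer in use.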

Submitting batch jobs

As we have indicated, the cluster uses Sun’s Grid Engine (SGE) to control scheduling of jobs. SGE comes with a number of command line tools to help you manage your jobs on the cluster. Most of the time you can get away with just knowing a few commands. The most immediately relevant ones are:

  1. qsub:   submit a batch job to the cluster
  2. qstat:   query the status of your jobs on the cluster (or look at all jobs running on the cluster)
  3. qdel:   delete a job from the cluster/queue.

IMPORTANT: Every job that you submit to the cluster from jhpce01 must be wrapped in a shell script. That is, you cannot just start a program from the command line (e.g. nice +19 R) like you could on your own machine or some other server. Not to worry, though; wrapping your program with a shell script is not as difficult as it might sound! For example, below are instructions for how to run an R batch job on the cluster.
 

Two steps to submitting an R BATCH job to the cluster

Using the Sun Grid Engine to submit an R BATCH job to the cluster is very simple.

  1. First (on jhpce01), assuming you have an R program in a file named mycommands.R, you need to create a new file that will invoke and run your R program when it is submitted to the cluster. Let’s call this new file batch.sh. You should put this batch.sh file in the same directory as your mycommands.R file. To run an R BATCH job on the cluster using the mycommands.R file, your batch.sh file need only have this one line in it:
    R CMD BATCH mycommands.R

    The file might have other lines in it to specify SGE job options or commands to run before or after the “R CMD BATCH …” line. The technical name for this file is “shell script”. Knowing this might help you communicate with the system administrator.

  2. Once you’ve written your short batch.sh file, you can submit it to the cluster via the command
    qsub -cwd batch.sh

    *** Be sure to see the section Specifying your job’s memory needs

    The -cwd option tells SGE to execute the batch.sh script on the cluster from the current working directory (otherwise, it will run from your home directory, which is probably not what you want).

That’s all you have to do! There are a few things to note:

  • You do not have to put an & at the end of the line (don’t worry if you don’t know what the & might be used for). qsub automatically sends your job to the cluster and returns to your command line prompt so that you can do other things.
  • After submitting your job with qsub, you can use the qstat command to see the status of your job(s).
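
The two steps above can be sketched as a short shell session. This is only an illustration: it assumes that mycommands.R already exists in the current directory, and it prints the qsub command rather than running it, so that it can be tried safely away from the cluster.

```shell
#!/bin/bash
# Step 1: create a minimal batch.sh next to mycommands.R
# (file names as used in the steps above).
cat > batch.sh <<'EOF'
#!/bin/bash
# Run the R program in batch mode; R writes its output to mycommands.Rout
R CMD BATCH mycommands.R
EOF

# Show what was written so it can be checked before submission.
cat batch.sh

# Step 2: on the login node, submit the script from the current directory.
echo "Submit with: qsub -cwd batch.sh"
```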

Specifying your job’s memory needs

When submitting your job(s), if you do not specify any memory requirements, SGE will apply a default memory setting of 2GB (see Using a default SGE specification file .sge_request). If you need more memory than 2GB for your program, you will need to add a memory resource requirement to your qsub (or qrsh) command. Use a command like:

qsub -cwd -l mem_free=MEM_NEEDED,h_vmem=MEM_MAX  batch.sh

or

qrsh -l mem_free=MEM_NEEDED,h_vmem=MEM_MAX

where MEM_NEEDED is the amount of memory (in megabytes M, or gigabytes G) that your job will require and MEM_MAX is the upper bound on the amount of memory your job is allowed to use. Typically h_vmem is set to be the same as mem_free. For example, if your job will require 6GB of memory, you could type

qsub -cwd -l mem_free=6G,h_vmem=6G  batch.sh

In the above case, your job would go to a node with at least 6GB of memory available at the time the job starts and the job would automatically be stopped if it exceeded 6GB of memory usage at any time.

For more detail, see Notes about specifying resource requirements.

Using a default SGE specification file .sge_request

One easy way to always have a default limit on your jobs and sessions is to put a line like this

   -l mem_free=5G,h_vmem=6G

in a file named .sge_request in your home directory. Then, if you do not specify those parameters on your command line, the defaults from the file .sge_request will be used. Obviously, if you need higher limits on a given job you need to override those defaults with appropriate command line options. Other options that you might use can also go in this defaults file.

All new users get the following .sge_request defaults file:

#
# Set defaults for mem_free and h_vmem
-l mem_free=2G,h_vmem=2G
#
# Set the standard value for stack size limit
# (needed for some programs to run properly when h_vmem is set)
-l h_stack=256M
#
# Set a default maximum file size that an SGE job can create
-l h_fsize=10G

Note the default file size limit set to 10G.
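
To experiment with a defaults file without touching your real one, something like the following may help. It is a sketch only: the scratch directory and the helper name active_defaults are made up for illustration, and the values are the ones used in the examples above.

```shell
#!/bin/bash
# Write an example defaults file into a scratch directory (hypothetical
# location) so that a real ~/.sge_request is not overwritten by accident.
scratch_dir=$(mktemp -d)
cat > "$scratch_dir/.sge_request" <<'EOF'
# Default memory reservation and hard limit
-l mem_free=5G,h_vmem=6G
# Stack size limit
-l h_stack=256M
# Maximum size of any file the job creates
-l h_fsize=10G
EOF

# List the options that are active, i.e. every line that is not a comment.
# active_defaults is a hypothetical helper name, not an SGE command.
active_defaults() {
    grep -v '^#' "$1"
}

active_defaults "$scratch_dir/.sge_request"
```

Running active_defaults on the .sge_request file in your home directory shows which limits your jobs will get when you give no -l options on the command line.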

Notes about specifying resource requirements

The -l before the mem_free etc. specifications is a ‘minus’ followed by the ‘lower-case letter L’.

There are no spaces in the comma delimited list of resources and limits.

Here are some notes explaining the use of mem_free and h_vmem, as well as h_fsize:

————————————————
mem_free 
This should be set to what you think your job will need (or a little more). This will reserve memory for your job on the node that it runs on.

h_vmem 
This is the “high water mark” for memory for your job. This should be set equal to your mem_free request. You should combine mem_free and h_vmem like so:

   $ qsub -cwd -l mem_free=12G,h_vmem=12G batch.sh 

… or similarly on a qrsh command.

$ qrsh -l mem_free=12G,h_vmem=12G

h_fsize 
By default, all users have a 10GB file size limit set for their jobs. You can increase this by specifying an “h_fsize” option for your job. Something like:

qsub -cwd -l mem_free=12G,h_vmem=12G,h_fsize=20G batch.sh

The qsub script filename would always be the last thing specified on the qsub command line (unless there are also options that need to be supplied to the script file).
If you usually invoke a program (such as R) when you type your qrsh command, it might look like this:

   qrsh -l mem_free=5G,h_vmem=5G  R

To see a summary of available nodes and their memory capacity and current load, use the command qpic -s.

After submitting your job with qsub, use the  qstat command to see which queue (node) your job actually went to (see Checking the status of your job).

A special note about h_vmem and h_stack

If you use h_vmem, as we request (see Specifying your job’s memory needs), to limit the amount of memory that can be used by your qrsh or qsub, then you might also want to specify the h_stack value explicitly, or you may encounter a problem when using interactive MATLAB or other programs and packages that use Tcl/Tk libraries. It seems to affect mainly those programs and packages which use Tcl/Tk at some level, but we have seen the issue in a few other cases as well. So, your qrsh or qsub command should include something like this:

   -l mem_free=10G,h_vmem=10G,h_stack=256M

The mem_free and h_vmem values may vary according to your needs but the h_stack value should always be 256M (as far as we know from our experience thus far).
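
If you find yourself retyping this resource list, a small shell function can assemble it. The sketch below uses a made-up helper name, sge_resources, and only prints the resulting commands rather than running them.

```shell
#!/bin/bash
# Build the -l resource list for a given memory amount (e.g. 10G).
# h_stack is fixed at 256M, the value recommended above.
# sge_resources is a hypothetical helper name, not an SGE command.
sge_resources() {
    local mem="$1"
    printf 'mem_free=%s,h_vmem=%s,h_stack=256M\n' "$mem" "$mem"
}

# Print (rather than run) the resulting commands.
echo "qrsh -l $(sge_resources 10G)"
echo "qsub -cwd -l $(sge_resources 10G) batch.sh"
```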

Requesting multiple SGE slots for a multi-core job

To run a multi-core job on a single node you must request a parallel environment with multiple slots (where one slot corresponds to one CPU-core). In addition, if the cluster is busy, it is critical to tell SGE to reserve slots for your job, otherwise, a sufficient number of slots may never become available for your job. The problem is that your multi-slot job gets “starved” because individual slots are filled by single-slot jobs as quickly as they are freed up by terminating jobs.

  1. Use the -pe local K option to request K slots on a single cluster node.
  2. Use the -R y option to turn on slot reservation.
  3. Use the mem_free=nG option to reserve memory for your job. Important: mem_free is applied per slot, so the value n is the total memory N (in gigabytes) that your job needs divided by the number of slots K specified for -pe. In other words, n=N/K.
  4. Use the h_vmem=nG option to set the hard memory limit for your job. Important: h_vmem is also applied per slot, so set it to the same per-slot value, n=N/K.

For example, suppose your job needs 6 slots and you want to use slot reservation because the cluster is busy. Moreover, suppose you expect the job to need 36G of memory in total, that is, 36G/6 = 6G per slot. You would submit the job with the following command:

     qsub -pe local 6 -R y -l mem_free=6G,h_vmem=6G  myScript.sh

Note: The “local” parallel environment is a construct name particular to our cluster … and implies that your request is for that many slots on a single cluster node.
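
The per-slot arithmetic (n=N/K) can be sketched in the shell as follows. The helper name per_slot_mem is made up for illustration, and rounding up to a whole gigabyte is an assumption made here so that the slots together cover at least the total; the command is printed for review rather than submitted.

```shell
#!/bin/bash
# Per-slot memory in whole gigabytes: total N divided by slots K, rounded up
# so that the K slots together cover at least N.
# per_slot_mem is a hypothetical helper name, not an SGE command.
per_slot_mem() {
    local total_gb="$1" slots="$2"
    echo $(( (total_gb + slots - 1) / slots ))
}

total_gb=36   # total memory the job needs, in GB
slots=6       # number of slots requested with -pe local
n=$(per_slot_mem "$total_gb" "$slots")

# Print the command for review instead of submitting it.
echo "qsub -pe local $slots -R y -l mem_free=${n}G,h_vmem=${n}G myScript.sh"
```

With the values above this reproduces the command from the example; a job needing 10G over 4 slots would be rounded up to 3G per slot.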

Checking your job’s memory usage

While your qsub job is running, you can see its memory usage using the command

    qstat -j NNNNN | grep vmem

where NNNNN is your specific cluster job number. Look at the “vmem” and “maxvmem” entries.

To make it easier to monitor memory usage for your currently running jobs, we have created the command

   qmem

If you have no jobs running on the cluster qmem will print nothing, but if you do, the results will look something like:

[jhpce01]$ qmem
10506 maryj node=33       vmem=289.1M, maxvmem=294.3M      howMany10.sh
14257 maryj node=8        vmem=231.5M, maxvmem=238.0M      s.all.sh
16695 maryj node=25       vmem=  1.8G, maxvmem=  1.8G      mergedoc1.3.sh
17464 maryj node=15       vmem=272.9M, maxvmem=284.0M      simulateVariance.sh
17555 maryj node=12       vmem=N/A, maxvmem=N/A            QRLOGIN
17584 maryj node=6        vmem=315.1M, maxvmem=334.3M      calculateVaried-emp.genSampScheme.sh

To see your job’s memory usage upon job completion, use email notification, which works for aborted jobs as well. See Job status via email for instructions on how to use email notification.

Note: qrsh sessions will not report memory usage using the above method. You will simply see “N/A” in the entries for vmem and maxvmem, as shown in the above example.
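
If you only want the peak figure for each job, the qmem output can be trimmed down with sed. This is a sketch that relies on the layout shown in the sample output above; peak_memory is a made-up helper name, and the sample lines are copied from that output so the snippet runs anywhere.

```shell
#!/bin/bash
# Reduce qmem-style output (read from stdin) to "job-ID maxvmem" pairs.
# The pattern relies on the layout shown in the sample output above.
# peak_memory is a hypothetical helper name, not a cluster command.
peak_memory() {
    sed -n 's/^\([0-9][0-9]*\) .*maxvmem= *\([^ ]*\).*/\1 \2/p'
}

# Three lines copied from the sample qmem output.
sample='10506 maryj node=33       vmem=289.1M, maxvmem=294.3M      howMany10.sh
16695 maryj node=25       vmem=  1.8G, maxvmem=  1.8G      mergedoc1.3.sh
17555 maryj node=12       vmem=N/A, maxvmem=N/A            QRLOGIN'

printf '%s\n' "$sample" | peak_memory
```

On the cluster the equivalent would be qmem | peak_memory. A value that stays close to your h_vmem setting suggests the job is at risk of being stopped for exceeding its limit.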

Checking the status of your job

After submitting your job you can use qstat to look at the status of your job. By default, under our version of SGE, qstat with no arguments shows cluster jobs for all users. To restrict the output to show only your jobs, use the -u USERID argument. For example:

    qstat -u maryj

would only display active/pending jobs for user maryj.

Under the state column you can see the status of your job. Some of the codes are

  • r: the job is running
  • t: the job is being transferred to a cluster node
  • qw: the job is queued (and not running yet)
  • Eqw: an error occurred with the job

You can look at the manual page for qstat (type man qstat at the prompt) to get more information on the state codes.
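
When you have many jobs, a per-state count is often easier to read than the full listing. The sketch below tallies the state column of qstat-style output; jobs_by_state is a made-up helper name, and the sample reuses the listing shown earlier plus one hypothetical queued (qw) row added for illustration.

```shell
#!/bin/bash
# Count jobs per state code from qstat-style output read from stdin.
# Skips the two header lines; the state is the fifth column.
# jobs_by_state is a hypothetical helper name, not an SGE command.
jobs_by_state() {
    awk 'NR > 2 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Rows 1 to 3 come from the earlier listing; the last (qw) row is hypothetical.
sample='job-ID  prior   name       user         state submit/start at     queue                     slots
-------------------------------------------------------------------------------------------------------
  15194 1.53962 Pf_3D7     maryj        r     08/15/2016 16:04:16 shared.q@compute-076      1
  15299 2.00790 BootA10600 maryj        r     08/17/2016 15:26:06 shared.q@compute-080      1
  15290 2.35449 QRLOGIN    maryj        r     08/17/2016 15:20:00 shared.q@compute-111      1
  15301 0.00000 BootA10601 maryj        qw    08/17/2016 15:30:00                           1'

printf '%s\n' "$sample" | jobs_by_state
```

On the cluster you would run qstat -u maryj | jobs_by_state with your own user ID. Any count next to Eqw is a sign that some jobs need attention.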

Another important thing to note is the job-ID for your job. You need to know this if you ever want to make changes to your job. For example, to delete your job from the cluster, you can run

    qdel 15299

where 15299 is the job-ID obtained from running qstat.

Job status via email

If you wish to be notified via email when your job’s status changes, include options like the following when submitting your jobs:

    qsub  -m e  -M your_email@jhu.edu   your_job.sh

which means: send email to the given address(es) when the job ends.

If you want to automatically have such options (or others) always added to your job(s), simply put them in a file named .sge_request in your home directory. You can also have working-directory-specific .sge_request files (see the man page for sge_request – man sge_request).

Lines like this in your .sge_request file:

-M your_email@jhu.edu
-m e

will cause an email to be sent, when your job ends, for every cluster job that you start (including, for what it’s worth, a qrsh ‘job’).

You could use -m n on individual qsub command lines to suppress email notification for certain jobs.

Or, better yet, you might put only the -M your_email@jhu.edu option in the .sge_request file and simply use the -m e option on jobs for which you want email notification.

Note: You may also invoke the options shown above (and others) by including special lines at the top of your job shell scripts. Lines beginning with #$ are interpreted as qsub options for that job. For example, if the first few lines of your script look like the following:

#!/bin/bash
#$ -M joe_x@gmail.com
#$ -m e

The lines beginning with #$ would cause SGE to send email to ‘joe_x@gmail.com’ when the job ends.

#$ -m be

would cause an email to be sent when the job begins (‘b’) and ends (‘e’). See the manual page for qsub (type man qsub at a shell prompt) to get more information.
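
Putting the pieces together, a job script can carry all of its qsub options as #$ lines. The sketch below writes such a script and lists the embedded options for checking. The file names, memory values and email address are placeholders taken from the examples above, and nothing is submitted.

```shell
#!/bin/bash
# Write a job script whose SGE options are embedded as "#$" lines.
# your_job.sh, mycommands.R, 6G and the email address are placeholders.
cat > your_job.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -l mem_free=6G,h_vmem=6G
#$ -M your_email@jhu.edu
#$ -m be
R CMD BATCH mycommands.R
EOF

# List the embedded options to check them before submitting.
grep '^#\$' your_job.sh
```

With the options embedded this way, the submission itself shortens to qsub your_job.sh, and the script documents the resources it was run with.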

How many jobs can I submit?

We do not limit the number of jobs that you submit. However, the more jobs you submit, the more effort SGE expends trying to figure out which jobs should go where. This can lead to problems for other users who are trying to submit their jobs. So, as a practical matter, please do not keep more than 10,000 jobs in the input queue.

Only a limited number of your submitted jobs will run in a slot. The rest will have the queue wait state ‘qw’ and will start as your other jobs finish. In our SGE configuration, a slot generally corresponds to a single cpu-core.

For multithreaded jobs see Requesting multiple SGE slots for a multi-core job above.
The maximum number of slots per user may change depending on the availability of cluster resources or special needs and requests.

Currently, the maximum number of slots in the shared queue is 200.
There are dedicated queues for stakeholders which may have custom configurations.


The shared queue – shared.q

On the cluster, the shared queue is the default queue. Currently, the shared queue has no time limit. We limit the number of jobs on the shared queue so that no one user can monopolize the entire queue for days and weeks at a time. On the shared queue, for the time being, we have set the maximum number of slots per user to 200. We also limit the total amount of memory that a user can use for jobs on the shared queue. That limit is currently 1TB.

The number of job/slots that can be run by a user may be limited by the availability of cpu slots as well as an automated “functional sharing” policy that takes into account demand and usage by all other users. If you encounter a situation where there are no available shared.q queue slots within a reasonable amount of time, please contact us.

Please note that, by default, the shared queue will be used for your job, unless you specify a specific queue.  You do not need to specify the shared queue in your qsub or qrsh request.  In fact, specifically requesting the shared queue (i.e. “qrsh -l shared”) will cause your job to fail.

Questions and/or comments

Please send any questions or comments about this document to bitsupport@lists.jhu.edu
