FAQs

Grid Engine

How do I copy a large directory structure from one place to another.

As an example, to copy a directory tree from /home/bst/bob/src to /dcs01/bob/dst, first, create a cluster script, let’s call it “copy-job“, that contains the line:

rsync -avzh /home/bst/bob/src/ /dcs01/bob/dst/

Next, submit a batch job to the cluster

qsub -cwd -m e -M bob@jhu.edu copy-job

This will submit the “copy-job” script to the cluster, which will run the job on one of the compute nodes, and send an email when it finishes.

My app is complaining that it can’t find a shared library, e.g. libgfortran.so.1 could you please install it?

We would guess that 9 times out of 10, the allegedly missing library is there. The problem is that your application is looking for the version of the library that is compatible with the old system software. It will not help to point your application to the new libraries. They are more than likely to be incompatible with the new system and we won’t help you debug any problems  if you try to do this. The correct solution is to reinstall your software. If the problem persists after the reinstallation, then please contact us and we will install standard libraries that are actually missing.

My app claims it’s out of disk space, but I see there is plenty of space, what gives?

By default, every user should have a .sge_request file in their home directory.  This file contains a line like this:

-l h_fsize=10G

This limits the size of all created files to 10GB.  If you plan on creating larger files you should increase this limit, either in the .sge_request file before you start your qrsh session, or in the batch script you submit via qsub. From the command line,  you would start a qrsh session as follows:

qrsh  -l h_fsize=300G

ssh gave a scary warning: REMOTE HOST IDENTIFICATION HAS CHANGED!

Go into the ~/.ssh directory of your laptop/desktop and edit the known_hosts file.
Search for the line that starts with the host that you ssh’d to. Delete that line (it is probably a long line that wraps). Then try again.

Why aren’t SGE commands, or R, or matlab, or… available to my cron job?

cron jobs are not launched from a login shell, but the module commands and the JHPCE default environment is initialized automatically only when you log in. Consequently, in a cron job, you have to do the initialization yourself. Do this by wrapping your cron job in a bash script that initializes the module command and then loads the default sge modules. You bash shell script should start with the following lines:

#!/bin/bash

# Source the global bashrc
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
module load JHPCE_DEFAULT_ENV

This should allow your cron jobs to run within SGE.

Why does running “firefox” crash from a compute node.

It appears that firefox is allocating RAM based on the stack size setting for your qrsh session. To run firefox, you can either:

  • Requesting 40GB of RAM (qrsh -l mem_free=40G,h_vmem=40G)
  • Reducing stack size from the default 128M to 8M (qrsh -l h_stack=8M)
  • Using qrsh as normal, then reducing stack size by running “ulimit” (qrsh ; ulimit -s 8192 -S 8192)

1 2

General Unix/Linux

How do I copy a large directory structure from one place to another.

As an example, to copy a directory tree from /home/bst/bob/src to /dcs01/bob/dst, first, create a cluster script, let’s call it “copy-job“, that contains the line:

rsync -avzh /home/bst/bob/src/ /dcs01/bob/dst/

Next, submit a batch job to the cluster

qsub -cwd -m e -M bob@jhu.edu copy-job

This will submit the “copy-job” script to the cluster, which will run the job on one of the compute nodes, and send an email when it finishes.

My app is complaining that it can’t find a shared library, e.g. libgfortran.so.1 could you please install it?

We would guess that 9 times out of 10, the allegedly missing library is there. The problem is that your application is looking for the version of the library that is compatible with the old system software. It will not help to point your application to the new libraries. They are more than likely to be incompatible with the new system and we won’t help you debug any problems  if you try to do this. The correct solution is to reinstall your software. If the problem persists after the reinstallation, then please contact us and we will install standard libraries that are actually missing.

My app claims it’s out of disk space, but I see there is plenty of space, what gives?

By default, every user should have a .sge_request file in their home directory.  This file contains a line like this:

-l h_fsize=10G

This limits the size of all created files to 10GB.  If you plan on creating larger files you should increase this limit, either in the .sge_request file before you start your qrsh session, or in the batch script you submit via qsub. From the command line,  you would start a qrsh session as follows:

qrsh  -l h_fsize=300G

ssh gave a scary warning: REMOTE HOST IDENTIFICATION HAS CHANGED!

Go into the ~/.ssh directory of your laptop/desktop and edit the known_hosts file.
Search for the line that starts with the host that you ssh’d to. Delete that line (it is probably a long line that wraps). Then try again.

Why aren’t SGE commands, or R, or matlab, or… available to my cron job?

cron jobs are not launched from a login shell, but the module commands and the JHPCE default environment is initialized automatically only when you log in. Consequently, in a cron job, you have to do the initialization yourself. Do this by wrapping your cron job in a bash script that initializes the module command and then loads the default sge modules. You bash shell script should start with the following lines:

#!/bin/bash

# Source the global bashrc
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
module load JHPCE_DEFAULT_ENV

This should allow your cron jobs to run within SGE.

Why does running “firefox” crash from a compute node.

It appears that firefox is allocating RAM based on the stack size setting for your qrsh session. To run firefox, you can either:

  • Requesting 40GB of RAM (qrsh -l mem_free=40G,h_vmem=40G)
  • Reducing stack size from the default 128M to 8M (qrsh -l h_stack=8M)
  • Using qrsh as normal, then reducing stack size by running “ulimit” (qrsh ; ulimit -s 8192 -S 8192)

1 2

Environment Modules

How do I copy a large directory structure from one place to another.

As an example, to copy a directory tree from /home/bst/bob/src to /dcs01/bob/dst, first, create a cluster script, let’s call it “copy-job“, that contains the line:

rsync -avzh /home/bst/bob/src/ /dcs01/bob/dst/

Next, submit a batch job to the cluster

qsub -cwd -m e -M bob@jhu.edu copy-job

This will submit the “copy-job” script to the cluster, which will run the job on one of the compute nodes, and send an email when it finishes.

My app is complaining that it can’t find a shared library, e.g. libgfortran.so.1 could you please install it?

We would guess that 9 times out of 10, the allegedly missing library is there. The problem is that your application is looking for the version of the library that is compatible with the old system software. It will not help to point your application to the new libraries. They are more than likely to be incompatible with the new system and we won’t help you debug any problems  if you try to do this. The correct solution is to reinstall your software. If the problem persists after the reinstallation, then please contact us and we will install standard libraries that are actually missing.

My app claims it’s out of disk space, but I see there is plenty of space, what gives?

By default, every user should have a .sge_request file in their home directory.  This file contains a line like this:

-l h_fsize=10G

This limits the size of all created files to 10GB.  If you plan on creating larger files you should increase this limit, either in the .sge_request file before you start your qrsh session, or in the batch script you submit via qsub. From the command line,  you would start a qrsh session as follows:

qrsh  -l h_fsize=300G

ssh gave a scary warning: REMOTE HOST IDENTIFICATION HAS CHANGED!

Go into the ~/.ssh directory of your laptop/desktop and edit the known_hosts file.
Search for the line that starts with the host that you ssh’d to. Delete that line (it is probably a long line that wraps). Then try again.

Why aren’t SGE commands, or R, or matlab, or… available to my cron job?

cron jobs are not launched from a login shell, but the module commands and the JHPCE default environment is initialized automatically only when you log in. Consequently, in a cron job, you have to do the initialization yourself. Do this by wrapping your cron job in a bash script that initializes the module command and then loads the default sge modules. You bash shell script should start with the following lines:

#!/bin/bash

# Source the global bashrc
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
module load JHPCE_DEFAULT_ENV

This should allow your cron jobs to run within SGE.

Why does running “firefox” crash from a compute node.

It appears that firefox is allocating RAM based on the stack size setting for your qrsh session. To run firefox, you can either:

  • Requesting 40GB of RAM (qrsh -l mem_free=40G,h_vmem=40G)
  • Reducing stack size from the default 128M to 8M (qrsh -l h_stack=8M)
  • Using qrsh as normal, then reducing stack size by running “ulimit” (qrsh ; ulimit -s 8192 -S 8192)

1 2