Using GPUs on JHPCE

Overview

We have a number of GPU nodes on the JHPCE cluster that are available for general use. This page provides information about our collection of GPU resources, shows how to access them, and gives examples of running programs and submitting jobs that utilize a GPU.

GPU resources are limited - please consider your fellow users

Please use a GPU node only for performing GPU calculations. If you can do preliminary or post-analysis work on a regular node, please structure your jobs to do so rather than putting everything into a single job. You can create dependencies between jobs so that one job launches only after another has completed successfully.
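For example, here is a minimal sketch of chaining a CPU-only preparation step to a GPU job with a SLURM dependency (the script names and the non-GPU partition used here are placeholders; substitute your own):

 # Submit the CPU-only preprocessing job to a regular (non-GPU) partition.
 # The --parsable flag makes sbatch print just the job ID.
 jid_prep=$(sbatch --parsable --partition=shared prep.sh)

 # Submit the GPU job so that it starts only if the preprocessing job
 # finishes successfully.
 sbatch --partition=gpu --gpus=1 --dependency=afterok:${jid_prep} train-gpu.sh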

Please use the GPU cards with the lowest memory that meet your job's needs. The latest model cards perform calculations more quickly, but they also tend to have the most memory. If you run a job with low memory requirements on the latest cards, you will be blocking other users with larger memory needs.

GPU-capable partitions

There are several partitions with GPUs in them, documented on our partitions page. By default, all users should use only the "gpu" partition. The other partitions are PI-specific partitions for groups that have purchased GPUs for the JHPCE cluster. The partitions beginning with "bst" are for members of the Biostatistics department.

Please note the different time limits set on the various GPU partitions.
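For instance, you can check the time limit on a GPU partition with sinfo and request a shorter limit for your own job with --time (a sketch; adjust the partition name and times as needed):

 # Show the maximum run time allowed on the gpu partition
 sinfo -p gpu -o "%P %l"

 # Request an 8-hour interactive GPU session
 srun --pty --partition gpu --gpus=1 --time=08:00:00 --mem=20G bash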

Accessing an interactive session on a GPU node

You can see what GPUs are currently available by running the slurmpic -g command. The GPUS column shows how many GPUs are in use and how many are available on each node.
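If you prefer standard SLURM tools, a rough equivalent is to look at the GRES information with sinfo and scontrol (a sketch; the exact output format depends on the SLURM version):

 # GPUs configured on each node of the gpu partition (GRES column)
 sinfo -N -p gpu -o "%N %G"

 # Detailed view of a single node, including configured and allocated GRES
 scontrol show node compute-117 | grep -i -E "gres|tres"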

From the JHPCE SLURM cluster login node, you can access a GPU node interactively by using the “--partition gpu” and “--gpus” options to the srun command. You can also supply traditional options to srun, and you may find that you need to request additional system RAM for your program to run. Here is an example of how to request a single GPU on the “gpu” partition. You can run “nvidia-smi” to see the GPU that you’ve been assigned.

[login31 /users/mmill116]$ srun --pty --x11 --partition gpu --gpus=1 --mem=20G bash 
[compute-117 /users/mmill116]$ nvidia-smi 
 Tue Nov 28 08:39:23 2023       
 +---------------------------------------------------------------------------------------+
 | NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
 |-----------------------------------------+----------------------+----------------------+
 | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
 |                                         |                      |               MIG M. |
 |=========================================+======================+======================|
 |   0  Tesla V100-PCIE-32GB           On  | 00000000:89:00.0 Off |                    0 |
 | N/A   41C    P0              28W / 250W |      0MiB / 32768MiB |      0%      Default |
 |                                         |                      |                  N/A |
 +-----------------------------------------+----------------------+----------------------+
 +---------------------------------------------------------------------------------------+
 | Processes:                                                                            |
 |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
 |        ID   ID                                                             Usage      |
 |=======================================================================================|
 |  No running processes found                                                           |
 +---------------------------------------------------------------------------------------+

JHPCE GPU nodes and cards

As of November 2024, we have the following GPUs available on the gpu partition:

compute-117: Our first GPU node
  2 Intel(R) Xeon(R) Silver 4116 CPUs and 384GB of RAM
  2 Nvidia V100 GPUs with 32GB RAM
  1 Nvidia Titan V GPU with 11GB RAM

compute-123: The first Biostat GPU node
  2 Intel(R) Xeon(R) Silver 4210R CPUs and 768GB of RAM
  4 Nvidia V100S GPUs with 32GB RAM

compute-126: One of the Lieber (Collado) GPU nodes
  2 Intel(R) Xeon(R) Gold 5317 CPUs and 512GB of RAM
  4 Nvidia A100 GPUs with 80GB RAM

compute-128: One of the Feinberg Lab (CEGS) GPU nodes
  2 Intel(R) Xeon(R) Gold 5317 CPUs and 512GB of RAM
  4 Nvidia A100 GPUs with 80GB RAM

compute-170: The second Biostat GPU node
  1 Intel(R) Xeon(R) Silver 4510R CPU and 1TB of RAM
  2 Nvidia H100 GPUs with 96GB RAM

compute-171 - compute-173: Group of three Biostat GPU nodes, sharing GPUs with the general gpu queue
  1 AMD EPYC 7443P CPU and 1TB of RAM in each node
  12 Nvidia L40S GPUs with 46GB RAM (4 GPUs in each node)

NVIDIA assigns a "Compute Capability" value to each GPU model, which helps determine what the GPU is capable of doing. The values for our cards are listed in the table below. Descriptions of each generation, along with specific CUDA routines to use with each, can be found in NVIDIA's documentation for compute capability 7.x, 8.x, and 9.x.

GPU Type                        GRES Option   Compute Capability
Nvidia Titan V with 11GB RAM    titanv        7.0
Nvidia V100 with 32GB RAM       tesv100       7.0
Nvidia V100S with 32GB RAM      tesv100s      7.0
Nvidia A100 with 80GB RAM       tesa100       8.0
Nvidia H100 with 96GB RAM       tesh100       9.0
Nvidia L40S with 46GB RAM       l40s          8.9
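Once you have a session on a GPU node, you can confirm the model and (with reasonably recent NVIDIA drivers) its compute capability directly from nvidia-smi; a quick check along these lines:

 # Print the model and compute capability of the GPU(s) assigned to you
 nvidia-smi --query-gpu=name,compute_cap --format=csv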

Requesting specific GPU card types

In general it is better to allow the cluster to assign both nodes and GPU cards to you rather than requesting a specific node or a particular model, as certain models may not be available at the time you are trying to run your program.

However, you may need the features of a particular model or set of models, such as a larger GPU memory size.

If you want to request a particular model of GPU, you can use either the GRES name with the "--gres=gpu:" argument or a particular Node Feature name with the "--constraint" argument.

If you want to request a set of acceptable models of GPU, use the Node Feature names. Examples of both are shown below.

Using the GRES name

You can request a particular model of GPU using the "GRES" (Generic RESources) argument to srun and sbatch.

The following GRES names are available for the various models of GPUs:

GPU Type                        GRES Option
Nvidia Titan V with 11GB RAM    titanv
Nvidia V100 with 32GB RAM       tesv100
Nvidia V100S with 32GB RAM      tesv100s
Nvidia A100 with 80GB RAM       tesa100
Nvidia H100 with 96GB RAM       tesh100
Nvidia L40S with 46GB RAM       l40s

You would pass the value from the “GRES Option” column above to srun or sbatch.

In the example below, we request an Nvidia Titan V GPU.

[login31 /users/mmill116]$ srun --pty --x11 --partition gpu --gres=gpu:titanv:1  bash
[compute-117 /users/mmill116]$ nvidia-smi
 Tue Nov 28 10:50:25 2023       
 +---------------------------------------------------------------------------------------+
 | NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
 |-----------------------------------------+----------------------+----------------------+
 | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
 |                                         |                      |               MIG M. |
 |=========================================+======================+======================|
 |   0  NVIDIA TITAN V                 On  | 00000000:B1:00.0 Off |                  N/A |
 | 28%   35C    P8              26W / 250W |      0MiB / 12288MiB |      0%      Default |
 |                                         |                      |                  N/A |
 +-----------------------------------------+----------------------+----------------------+
                                                                                          
 +---------------------------------------------------------------------------------------+
 | Processes:                                                                            |
 |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
 |        ID   ID                                                             Usage      |
 |=======================================================================================|
 |  No running processes found                                                           |
 +---------------------------------------------------------------------------------------+ 

Using node feature names

Node feature names are strings that the system administrators have assigned to each node. You can read more about using features.
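To see which feature strings are currently advertised by the GPU nodes, you can ask sinfo (a sketch; the values shown in the features column are what you would pass to --constraint):

 # List the feature strings assigned to each node in the gpu partition
 sinfo -N -p gpu -o "%N %f"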

You can specify a single acceptable GPU model, or a list of them, using the "--constraint" argument to sbatch and srun.

This is especially helpful when there are several GPU models that would be acceptable for your job. For example, you could run the following to request a node with either an l40s, tesv100, or titanv.


[mmill116@jhpce01 ~]$ srun --pty -p gpu --gpus=1 --constraint="l40s|tesv100|titanv" bash
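The same constraint works in a batch script as well; a minimal sketch (the program name is a placeholder):

 #!/bin/sh
 #SBATCH --partition=gpu
 #SBATCH --gpus=1
 #SBATCH --constraint="l40s|tesv100|titanv"

 nvidia-smi
 python my-gpu-program.py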

Helpful SLURM job arguments

One commonly used option when using GPUs in a SLURM environment is the --gpu-bind=closest option. This ensures that the core/CPU assigned to your job is the one closest to the bus controlling the GPU, so if you are trying to get the most performance out of your GPU code, this option may be helpful. There is a really good description of GPU/core binding here.
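For example, a multi-task run that binds each task to the CPUs nearest its assigned GPU might look like this (a sketch; the program name and task/GPU counts are placeholders):

 # Two tasks, one GPU per task, each task bound to the CPUs closest to its GPU
 srun --partition=gpu --ntasks=2 --gpus-per-task=1 --gpu-bind=closest ./my_gpu_app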

Examples

At this point you can start running your GPU-specific code. You can either install your own GPU-enabled programs or use the version of python that is installed on the GPU nodes.
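Before starting a long run, it can be worth confirming that your framework actually sees the GPU. With the system python and TensorFlow, a quick check from a shell on the GPU node might look like this (a sketch):

 # Ask TensorFlow which GPU devices it can see
 python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"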

Running an MNIST tensorflow program

Below is an example of running an MNIST tensorflow program. The tensorflow and keras python modules have been installed on the GPU nodes, so you can use the default system version of python. This example comes from https://www.tensorflow.org/tutorials/quickstart/beginner

MNIST interactive job

[login31 /users/mmill116]$ srun --pty --x11 --partition gpu --gpus=1 --mem=10G bash
[compute-117 /users/mmill116]$ python
Python 3.9.16 (main, Dec  8 2022, 00:00:00) 
[GCC 11.3.1 20221121 (Red Hat 11.3.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-11-28 09:05:16.067766: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-28 09:05:16.850666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
>>> mnist = tf.keras.datasets.mnist
>>> (x_train, y_train), (x_test, y_test) = mnist.load_data()
>>> x_train, x_test = x_train / 255.0, x_test / 255.0
>>> model = tf.keras.models.Sequential([
...   tf.keras.layers.Flatten(input_shape=(28, 28)),
...   tf.keras.layers.Dense(128, activation='relu'),
...   tf.keras.layers.Dropout(0.2),
...   tf.keras.layers.Dense(10)
... ])
2023-11-28 09:05:31.942590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 31141 MB memory:  -> device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0
>>> predictions = model(x_train[:1]).numpy()
>>> predictions
array([[-0.745112  ,  0.49414337, -0.10749201, -0.23818162,  0.2159372 ,
        -0.38107562,  0.8540315 , -0.21077928,  0.04448523,  0.37432173]],
      dtype=float32)
>>> tf.nn.softmax(predictions).numpy()
array([[0.0417023 , 0.14399974, 0.07889961, 0.06923362, 0.10902808,
        0.06001488, 0.206376  , 0.07115702, 0.09184969, 0.12773912]],
      dtype=float32)
>>> loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
>>> loss_fn(y_train[:1], predictions).numpy()
2.8131628
>>> model.compile(optimizer='adam',
...               loss=loss_fn,
...               metrics=['accuracy'])
>>> model.fit(x_train, y_train, epochs=5)
Epoch 1/5
2023-11-28 09:06:00.300076: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fee20069290 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-11-28 09:06:00.300099: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-PCIE-32GB, Compute Capability 7.0
2023-11-28 09:06:00.305647: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:255] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-11-28 09:06:00.371150: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8904
2023-11-28 09:06:00.442691: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-11-28 09:06:00.535627: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
1875/1875 [==============================] - 5s 2ms/step - loss: 0.2931 - accuracy: 0.9150
Epoch 2/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1395 - accuracy: 0.9597
Epoch 3/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1070 - accuracy: 0.9678
Epoch 4/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0877 - accuracy: 0.9729
Epoch 5/5
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0754 - accuracy: 0.9764
<keras.src.callbacks.History object at 0x7feef9f09550>
>>> model.evaluate(x_test,  y_test, verbose=2)
313/313 - 1s - loss: 0.0714 - accuracy: 0.9756 - 600ms/epoch - 2ms/step
[0.07143650203943253, 0.975600004196167]
>>> 

MNIST batch job

You can also submit a batch job to the cluster that uses GPUs. In this example of submitting a batch job to use a GPU, we create two files: one containing the python steps that we used above, and a second containing the shell script that will be submitted to SLURM. The python program looks like:

[login31 /users/mmill116/gpu-test]$ ls -al
 total 496
 drwxr-xr-x    2 mmill116 mmi   4 Nov 28 11:49 .
 drwxr-x---+ 214 mmill116 mmi 412 Nov 28 11:49 ..
 -rw-r--r--    1 mmill116 mmi 789 Nov 28 11:47 nvidia-test.py
 -rwxr-xr-x    1 mmill116 mmi 343 Nov 28 11:49 test-slurm-gpu.sh
[login31 /users/mmill116/gpu-test]$ cat nvidia-test.py
 import tensorflow as tf
 print("TensorFlow version:", tf.__version__)
 mnist = tf.keras.datasets.mnist
 (x_train, y_train), (x_test, y_test) = mnist.load_data()
 x_train, x_test = x_train / 255.0, x_test / 255.0
 model = tf.keras.models.Sequential([
   tf.keras.layers.Flatten(input_shape=(28, 28)),
   tf.keras.layers.Dense(128, activation='relu'),
   tf.keras.layers.Dropout(0.2),
   tf.keras.layers.Dense(10)
 ])
 predictions = model(x_train[:1]).numpy()
 predictions
 tf.nn.softmax(predictions).numpy()
 loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
 loss_fn(y_train[:1], predictions).numpy()
 model.compile(optimizer='adam',
               loss=loss_fn,
               metrics=['accuracy'])
 model.fit(x_train, y_train, epochs=5)
 model.evaluate(x_test,  y_test, verbose=2)

The script that will be submitted to SLURM looks like the following. Note that you need to include the LD_LIBRARY_PATH environment variable if you will be using the system version of python:

[login31 /users/mmill116/gpu-test]$ cat test-slurm-gpu.sh
#!/bin/sh
#SBATCH --partition=gpu
#SBATCH --gres=gpu:titanv:1

echo $CUDA_VISIBLE_DEVICES
cd $HOME/gpu-test
nvidia-smi
export LD_LIBRARY_PATH=/jhpce/shared/jhpce/core/JHPCE_tools/3.0/lib:/usr/local/lib/python3.9/site-packages/nvidia/cudnn/lib:/jhpce/shared/jhpce/core/conda/miniconda3-23.3.1/envs/cudatoolkit-11.8.0/lib

python nvidia-test.py

You can now use “sbatch” to submit the job, and examine the results.


[login31 /users/mmill116/gpu-test]$ sbatch test-slurm-gpu.sh
 Submitted batch job 915238
 [login31 /users/mmill116/gpu-test]$ squeue --me
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             915238       gpu test-slu mmill116  R       0:06      1 compute-117

[login31 /users/mmill116/gpu-test]$ ls
 nvidia-test.py  slurm-915238.out  test-slurm-gpu.sh
 [login31 /users/mmill116/gpu-test]$ squeue --me
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             915238       gpu test-slu mmill116  R       0:21      1 compute-117 
[login31 /users/mmill116/gpu-test]$ squeue --me
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[login31 /users/mmill116/gpu-test]$ ls
 nvidia-test.py  slurm-915238.out  test-slurm-gpu.sh
 [login31 /users/mmill116/gpu-test]$ cat slurm-915238.out
 0
 Tue Nov 28 11:55:08 2023       
 +---------------------------------------------------------------------------------------+
 | NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
 |-----------------------------------------+----------------------+----------------------+
 | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
 |                                         |                      |               MIG M. |
 |=========================================+======================+======================|
 |   0  NVIDIA TITAN V                 On  | 00000000:B1:00.0 Off |                  N/A |
 | 28%   35C    P8              26W / 250W |      0MiB / 12288MiB |      0%      Default |
 |                                         |                      |                  N/A |
 +-----------------------------------------+----------------------+----------------------+
 +---------------------------------------------------------------------------------------+
 | Processes:                                                                            |
 |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
 |        ID   ID                                                             Usage      |
 |=======================================================================================|
 |  No running processes found                                                           |
 +---------------------------------------------------------------------------------------+
 2023-11-28 11:55:08.645381: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
 2023-11-28 11:55:09.449932: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 2023-11-28 11:55:10.803612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10696 MB memory:  -> device: 0, name: NVIDIA TITAN V, pci bus id: 0000:b1:00.0, compute capability: 7.0
 TensorFlow version: 2.13.0
 Epoch 1/5
 2023-11-28 11:55:11.914159: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ff35c066ba0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
 2023-11-28 11:55:11.914504: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA TITAN V, Compute Capability 7.0
 2023-11-28 11:55:11.920545: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:255] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
 2023-11-28 11:55:11.985098: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8904
 2023-11-28 11:55:12.057397: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
 2023-11-28 11:55:12.148968: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
 1875/1875 [==============================] - 5s 2ms/step - loss: 0.2948 - accuracy: 0.9142
 Epoch 2/5
 1875/1875 [==============================] - 4s 2ms/step - loss: 0.1433 - accuracy: 0.9574
 Epoch 3/5
 1875/1875 [==============================] - 4s 2ms/step - loss: 0.1078 - accuracy: 0.9675
 Epoch 4/5
 1875/1875 [==============================] - 4s 2ms/step - loss: 0.0884 - accuracy: 0.9725
 Epoch 5/5
 1875/1875 [==============================] - 4s 2ms/step - loss: 0.0751 - accuracy: 0.9769
 313/313 - 1s - loss: 0.0713 - accuracy: 0.9780 - 600ms/epoch - 2ms/step
 [login31 /users/mmill116/gpu-test]$