GPUs on the JHPCE cluster

We have a number of GPU nodes on the JHPCE cluster that are available for general use. As of July 2023, we have the following GPUs available on the gpu queue:

compute-117 – the main gpu.q node:
3 NVIDIA GeForce GTX 1080 Ti with 11GB RAM
2 Nvidia V100 GPUs with 32GB RAM
1 Nvidia Titan V with 12GB RAM

compute-123 – The Biostat GPU node, sharing GPUs with the general gpu queue:
2 Nvidia V100 GPUs with 32GB RAM

compute-126 – One of the Lieber (Collado) GPU nodes, sharing GPUs with the general gpu queue:
4 Nvidia A100 GPUs with 80GB RAM

The steps below describe how to access a GPU node, followed by two examples of running a program that uses a GPU.

  1. First, log in to the JHPCE cluster.
  2. From the login node, run “qpic -q gpu” to check whether any GPUs are available on the gpu queue.
$ qpic -q gpu
		gpu	shared	| Jobs   - Cores-   Load    |  Used - Tot RAM - mem_free
compute-117  :	1/6		|  1	-   48  -   1.26   |   18G -  376G   -   357G 
compute-123  :	2/2	0/24	|  2	-   40  -   1.33   |   50G -  754G   -   276G 
compute-126  :	2/4		|  2	-   48  -   8.87   |   32G -  503G   -   300G 
Totals:		5/10	0/0 |     5 /  136    4%       |  100G /1633 G    6% 933G
		gpu	shared	
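
The availability check in step 2 can also be scripted. The sketch below is a minimal example, assuming each node line of the qpic output has the form shown above (a hostname starting with "compute-", a colon, then a used/total GPU count). The function name free_gpus_per_node and the regular expression are illustrative and not part of qpic, and reading the first number as "GPUs in use" is inferred from the sample output rather than from qpic documentation.

```python
import re

# Sample text copied from the qpic output shown above; in practice this would
# be the captured stdout of "qpic -q gpu".
SAMPLE = """\
\t\tgpu\tshared\t| Jobs   - Cores-   Load    |  Used - Tot RAM - mem_free
compute-117  :\t1/6\t\t|  1\t-   48  -   1.26   |   18G -  376G   -   357G
compute-123  :\t2/2\t0/24\t|  2\t-   40  -   1.33   |   50G -  754G   -   276G
compute-126  :\t2/4\t\t|  2\t-   48  -   8.87   |   32G -  503G   -   300G
Totals:\t\t5/10\t0/0 |     5 /  136    4%       |  100G /1633 G    6% 933G
"""

# Assumption: node lines look like "<compute-host> : <used>/<total> ...".
# The header and "Totals:" lines do not match and are skipped.
NODE_LINE = re.compile(r"^(compute-\S+)\s*:\s*(\d+)/(\d+)")


def free_gpus_per_node(qpic_text):
    """Return {hostname: number of free GPUs} for every node line in the text."""
    free = {}
    for line in qpic_text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            host = match.group(1)
            used, total = int(match.group(2)), int(match.group(3))
            free[host] = total - used
    return free


print(free_gpus_per_node(SAMPLE))
# {'compute-117': 5, 'compute-123': 0, 'compute-126': 2}
```

If the qpic output format differs on your system, the regular expression will need adjusting.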

3. Connect to a GPU node by running “qrsh -l gpu”. Be sure to request sufficient RAM for your job, as in the session below, which adds “-l mem_free=100G,h_vmem=100G”. Typically 100GB will be required.
4. Identify which GPUs are available by running “nvidia-smi”. In the example below, GPUs 0 and 1 are in use, so GPU 2 is available:

[jhpce01 /users/mmill116]$ qrsh -l gpu -l mem_free=100G,h_vmem=100G
Last login: Mon Apr 25 16:30:39 2022 from jhpce01.cm.cluster
[compute-117 /users/mmill116]$ nvidia-smi 
Mon May 23 11:24:13 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:1B:00.0 Off |                    0 |
| N/A   67C    P0   237W / 250W |   5853MiB / 32510MiB |     72%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:88:00.0 Off |                    0 |
| N/A   64C    P0   255W / 250W |   5853MiB / 32510MiB |     72%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA TITAN V      Off  | 00000000:B2:00.0 Off |                  N/A |
| 18%   36C    P0    35W / 250W |      0MiB / 12066MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    154362      C   ...da3/envs/V100/bin/python3     5849MiB |
|    1   N/A  N/A    154363      C   ...da3/envs/V100/bin/python3     5849MiB |
+-----------------------------------------------------------------------------+

5. To select a GPU, set the CUDA_VISIBLE_DEVICES environment variable to an available GPU. In the above example, GPUs 0 and 1 are in use, so you would run the following to use GPU #2.

[compute-117 /users/mmill116]$ export CUDA_VISIBLE_DEVICES=2
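
Steps 4 and 5 can be combined in a short Python helper. This is a minimal sketch, not a cluster-provided tool: it relies on the standard nvidia-smi options --query-gpu and --format=csv,noheader,nounits, while the function names idle_gpus and select_gpu and the 100 MiB "idle" threshold are hypothetical choices made for illustration.

```python
import os
import subprocess

# nvidia-smi prints one CSV line per GPU: index, memory used (MiB), utilization (%).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def idle_gpus(csv_text, max_used_mib=100):
    """Return indices of GPUs whose used memory is at or below max_used_mib."""
    idle = []
    for line in csv_text.strip().splitlines():
        index, used_mib, _utilization = [field.strip() for field in line.split(",")]
        if int(used_mib) <= max_used_mib:
            idle.append(int(index))
    return idle


def select_gpu():
    """Set CUDA_VISIBLE_DEVICES to the first idle GPU; return its index or None."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    candidates = idle_gpus(result.stdout)
    if not candidates:
        return None
    os.environ["CUDA_VISIBLE_DEVICES"] = str(candidates[0])
    return candidates[0]


# Values matching the nvidia-smi table above: GPUs 0 and 1 busy, GPU 2 idle.
print(idle_gpus("0, 5853, 72\n1, 5853, 72\n2, 0, 2\n"))  # [2]
```

On a GPU node, select_gpu() would need to be called before TensorFlow or any other CUDA library is imported, because CUDA_VISIBLE_DEVICES is read when the library initializes. Another user can still claim the GPU between the check and the start of your job, so treat the result as a hint rather than a reservation.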

6. At this point you can start running your GPU-specific code. You can either install your own GPU-enabled programs or use one of the TensorFlow conda environments installed on the cluster. If you install your own programs, you will likely need to run “module load cudnn” to add the Deep Neural Network libraries to your LD_LIBRARY_PATH. Below are two examples of using GPUs on the JHPCE cluster.
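
Before running the examples with your own installation, it can help to confirm that the cuDNN libraries are actually on LD_LIBRARY_PATH. The following is a minimal sketch that uses only the Python standard library; the function name find_cudnn is hypothetical, and the file pattern libcudnn.so* is an assumption based on the library name that appears in the TensorFlow log output.

```python
import glob
import os


def find_cudnn(ld_library_path=None):
    """Return paths of libcudnn shared libraries found on LD_LIBRARY_PATH."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        if directory:  # skip empty entries such as a leading or trailing ":"
            pattern = os.path.join(directory, "libcudnn.so*")
            hits.extend(sorted(glob.glob(pattern)))
    return hits


matches = find_cudnn()
if matches:
    print("Found cuDNN libraries:", matches)
else:
    print("libcudnn not found on LD_LIBRARY_PATH; try 'module load cudnn' first")
```

Run it once before and once after “module load cudnn” to see the difference.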

First is an example of running the MNIST TensorFlow example using the conda environment installed on the JHPCE cluster, which is set up with TensorFlow, Keras, and the required CUDA libraries.

[jhpce01 /users/mmill116]$ qrsh -l gpu -l mem_free=100G,h_vmem=100G
Last login: Mon May 23 11:43:53 2022 from jhpce01.cm.cluster
[compute-117 /users/mmill116]$ module load conda
[compute-117 /users/mmill116]$ source activate /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/tensorflow-gpu-2.2
(tensorflow-gpu-2.2) [compute-117 /users/mmill116]$ which python
/jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/tensorflow-gpu-2.2/bin/python
(tensorflow-gpu-2.2) [compute-117 /users/mmill116]$ export CUDA_VISIBLE_DEVICES=2
(tensorflow-gpu-2.2) [compute-117 /users/mmill116]$ python
Python 3.8.12 (default, Oct 12 2021, 13:49:34) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> mnist = tf.keras.datasets.mnist
>>> (x_train, y_train),(x_test, y_test) = mnist.load_data()
>>> x_train, x_test = x_train / 255.0, x_test / 255.0
>>> model = tf.keras.models.Sequential([
...   tf.keras.layers.Flatten(input_shape=(28, 28)),
...   tf.keras.layers.Dense(512, activation=tf.nn.relu),
...   tf.keras.layers.Dropout(0.2),
...   tf.keras.layers.Dense(10, activation=tf.nn.softmax)
... ])
2022-05-23 11:57:15.100130: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-05-23 11:57:16.302334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:b2:00.0 name: NVIDIA TITAN V computeCapability: 7.0
coreClock: 1.455GHz coreCount: 80 deviceMemorySize: 11.78GiB deviceMemoryBandwidth: 607.97GiB/s
2022-05-23 11:57:16.405242: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-05-23 11:57:17.446421: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-05-23 11:57:17.995540: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-05-23 11:57:18.546447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-05-23 11:57:19.260189: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-05-23 11:57:19.656225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-05-23 11:57:20.837449: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-05-23 11:57:20.852655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-05-23 11:57:20.853406: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2022-05-23 11:57:20.886903: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2100000000 Hz
2022-05-23 11:57:20.887830: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55aaea148100 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-05-23 11:57:20.887871: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-05-23 11:57:21.073669: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55aaea158db0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-05-23 11:57:21.073701: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA TITAN V, Compute Capability 7.0
2022-05-23 11:57:21.074813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:b2:00.0 name: NVIDIA TITAN V computeCapability: 7.0
coreClock: 1.455GHz coreCount: 80 deviceMemorySize: 11.78GiB deviceMemoryBandwidth: 607.97GiB/s
2022-05-23 11:57:21.074860: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-05-23 11:57:21.074874: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-05-23 11:57:21.074887: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-05-23 11:57:21.074899: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-05-23 11:57:21.074911: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-05-23 11:57:21.074923: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-05-23 11:57:21.074936: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-05-23 11:57:21.084690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-05-23 11:57:21.084761: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-05-23 11:57:21.086334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-23 11:57:21.086348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2022-05-23 11:57:21.086360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2022-05-23 11:57:21.088197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11054 MB memory) -> physical GPU (device: 0, name: NVIDIA TITAN V, pci bus id: 0000:b2:00.0, compute capability: 7.0)
>>> model.compile(optimizer='adam',
...               loss='sparse_categorical_crossentropy',
...               metrics=['accuracy'])
>>> model.fit(x_train, y_train, epochs=5)
Epoch 1/5
2022-05-23 11:58:15.454941: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2217 - accuracy: 0.9338
Epoch 2/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0975 - accuracy: 0.9700
Epoch 3/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0684 - accuracy: 0.9784
Epoch 4/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0535 - accuracy: 0.9831
Epoch 5/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0450 - accuracy: 0.9857
<tensorflow.python.keras.callbacks.History object at 0x7f17c1ea0f10>
>>> model.evaluate(x_test, y_test)
313/313 [==============================] - 0s 1ms/step - loss: 0.0707 - accuracy: 0.9795
[0.07074514031410217, 0.9794999957084656]

Next is an example of running the MNIST TensorFlow example using your own installation of TensorFlow and Keras.

[jhpce01 /users/mmill116]$ qrsh -l gpu -l mem_free=100G,h_vmem=100G
[compute-123 /users/mmill116]$ module load python/3.9.10
[compute-123 /users/mmill116]$ module load cudnn
[compute-123 /users/mmill116]$ pip3 install --user tensorflow-gpu
WARNING: Ignoring invalid distribution -ip (/jhpce/shared/jhpce/core/python/3.9.10/lib/python3.9/site-packages)
WARNING: Ignoring invalid distribution -ip (/jhpce/shared/jhpce/core/python/3.9.10/lib/python3.9/site-packages)
Collecting tensorflow-gpu
  Downloading tensorflow_gpu-2.9.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.7 MB)

. . .

WARNING: Ignoring invalid distribution -ip (/jhpce/shared/jhpce/core/python/3.9.10/lib/python3.9/site-packages)

[notice] A new release of pip available: 22.1.2 -> 22.2.2
[notice] To update, run: pip install --upgrade pip
[compute-123 /users/mmill116]$ nvidia-smi
Mon Aug 29 20:33:28 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100S-PCI...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   45C    P0    39W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100S-PCI...  Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   45C    P0    37W / 250W |  28849MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100S-PCI...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   39C    P0    37W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100S-PCI...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   37C    P0    37W / 250W |      0MiB / 32510MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    1   N/A  N/A    102054      C   python3                         28845MiB |
+-----------------------------------------------------------------------------+

[compute-123 /users/mmill116]$ export CUDA_VISIBLE_DEVICES=2
[compute-123 /users/mmill116]$ python3
Python 3.9.10 (main, Feb 22 2022, 16:34:24)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2022-08-29 20:36:16.969307: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
>>> print(tf.__version__)
2.9.1
>>> mnist = tf.keras.datasets.mnist
>>> (x_train, y_train),(x_test, y_test) = mnist.load_data()
>>> x_train, x_test = x_train / 255.0, x_test / 255.0
>>> model = tf.keras.models.Sequential([
...   tf.keras.layers.Flatten(input_shape=(28, 28)),
...   tf.keras.layers.Dense(512, activation=tf.nn.relu),
...   tf.keras.layers.Dropout(0.2),
...   tf.keras.layers.Dense(10, activation=tf.nn.softmax)
... ])
2022-08-29 20:38:40.578122: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-29 20:38:41.230889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30987 MB memory:  -> device: 0, name: Tesla V100S-PCIE-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0
>>> model.compile(optimizer='adam',
...               loss='sparse_categorical_crossentropy',
...               metrics=['accuracy'])
>>> model.fit(x_train, y_train, epochs=5)
Epoch 1/5
1875/1875 [==============================] - 6s 2ms/step - loss: 0.2177 - accuracy: 0.9344
Epoch 2/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0973 - accuracy: 0.9703
Epoch 3/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0684 - accuracy: 0.9785
Epoch 4/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0532 - accuracy: 0.9827
Epoch 5/5
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0429 - accuracy: 0.9858
<keras.callbacks.History object at 0x7f1f15e3e3d0>
>>> model.evaluate(x_test, y_test)
313/313 [==============================] - 1s 2ms/step - loss: 0.0711 - accuracy: 0.9799
[0.07107411324977875, 0.9799000024795532]
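
Both sessions above confirm GPU use by reading the TensorFlow log lines. A more direct check is sketched below, assuming TensorFlow 2.1 or later, where tf.config.list_physical_devices is available; the function name tensorflow_gpus is hypothetical. The import is guarded so the snippet also runs, and reports the problem, in an environment without TensorFlow.

```python
def tensorflow_gpus():
    """Return the GPU device names TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = tensorflow_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif not gpus:
    print("TensorFlow sees no GPU; check CUDA_VISIBLE_DEVICES and 'module load cudnn'")
else:
    print("TensorFlow sees:", gpus)
```

With CUDA_VISIBLE_DEVICES set to a single GPU, as in the sessions above, TensorFlow should list exactly one device, and it is numbered 0 regardless of the physical index you selected.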