General Storage Knowledge Sharing

Over the years, we have accumulated a number of useful tips about using the various storage systems on the JHPCE cluster. Many of these reference other links on the JHPCE web site, but they are presented below in a single list.

  1. Anything I/O-intensive should be done on the compute nodes rather than on the jhpce01 login node.  Anything more than a quick “ls” (such as copying large files, recursively changing permissions, creating or extracting tar or zip archives, or running a “find” on a large filesystem) should be done within an “srun” session on a compute node (see the example after this list).
  2. Data transfers of files larger than about 1GB should be done through jhpce-transfer01.jhsph.edu rather than through the jhpce01.jhsph.edu login node (see the example after this list).  The jhpce01 login node is intended for logging into the JHPCE cluster and has only a 1Gbit network connection.  Large file transfers can exhaust the network bandwidth on jhpce01 and cause cluster connections to appear slow.  For details on using the transfer node, please see https://jhpce.jhu.edu/knowledge-base/file-transfer/
  3. Try to avoid having directories with more than 100 files in them.  This is especially an issue on Lustre storage systems, such as DCL01 and DCL02, where an “ls” of a directory requires an individual network connection to the storage system for every file in the directory.  This can be painfully slow when a directory contains hundreds or thousands of files.  If you do have directories with thousands of files in them, those files should be merged into a single tar or zip archive, or broken out into a hierarchical tree structure (see the example after this list).
  4. Try to avoid storing programs and scripts on DCL01 and DCL02.  We have found that editors such as emacs and vi can be noticeably slow when working with text files on the Lustre-based DCL01 and DCL02.
  5. Bear in mind that most storage on the cluster is not backed up.  We do back up home directories, as well as a few other select directories on the DCS and DCL systems for groups that have requested backups.  Additional backup storage capacity is available for a small fee (about $20/TB/year).
  6. Make use of your 1 TB of “fastscratch” storage for I/O-intensive jobs.  The two common cases where fastscratch may help speed up your jobs are a) jobs that write many small intermediary files, and b) jobs that repeatedly read from the same large data file.  For details on using the fastscratch space, please see https://jhpce.jhu.edu/knowledge-base/fastscratch-space-on-jhpce/ (a usage sketch also appears after this list).  Bear in mind that files older than 30 days are systematically removed from fastscratch, so any result files that you wish to keep should be copied back to long-term storage, such as your home directory or DCS/DCL directories.
  7. On a similar note, please remember that DCS and DCL stand for “Dirt Cheap Storage” and “Dirt Cheap Lustre”; they were designed with cost-effectiveness, rather than performance, as the primary driving factor.
  8. Sharing data can be done in several ways on the cluster:
    1. Traditional Unix file permissions and groups.  This is the simplest and preferred method of sharing data with other users on the cluster (see the example after this list).  Please email bitsupport@lists.jhu.edu if you would like to have a Unix group set up.
    2. Access Control Lists (ACLs).  ACLs are useful in scenarios where more fine-grained sharing of data is needed (see the example after this list).  Please see https://jhpce.jhu.edu/knowledge-base/granting-permissions-using-acls/ for more details.
  9. Sharing files with external collaborators can be done via Globus.  Please see https://jhpce.jhu.edu/knowledge-base/using-globus-to-transfer-files/ for more information on using Globus.
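
Below are a few illustrative command sketches for some of the tips above; usernames, paths, group names, and resource values in them are placeholders rather than required values.

For tip 1, a minimal sketch of moving I/O-heavy work onto a compute node:

    # Start an interactive session on a compute node (memory request is an example)
    srun --pty --mem=4G bash

    # Then do the heavy I/O there, e.g. extracting an archive
    tar xzf mydata.tar.gz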
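
For tip 2, run large transfers from your local machine against the transfer node rather than the login node (the username and destination path are placeholders):

    # rsync can resume interrupted transfers and show progress
    rsync -avP bigfile.tar.gz myuser@jhpce-transfer01.jhsph.edu:/path/to/destination/

    # scp also works for one-off copies
    scp bigfile.tar.gz myuser@jhpce-transfer01.jhsph.edu:/path/to/destination/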
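
For tip 3, a directory holding thousands of small files can be bundled into a single archive; run this on a compute node per tip 1 (the directory name is a placeholder):

    # Bundle the directory into one compressed tar file
    tar czf many_small_files.tar.gz many_small_files/

    # Verify the archive contents before removing the original files
    tar tzf many_small_files.tar.gz | head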
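
For tip 6, the general pattern is to stage data onto fastscratch, run the job there, and copy results back; the fastscratch path shown is an assumption, so check the fastscratch article above for the actual location assigned to you:

    # Stage the input onto fastscratch (all paths are placeholders)
    cp /path/to/long_term_storage/input.dat /fastscratch/myscratch/$USER/

    # Run the job against the fastscratch copy
    cd /fastscratch/myscratch/$USER
    ./my_analysis input.dat

    # Copy any results you want to keep back to long-term storage
    cp results.out /path/to/long_term_storage/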
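
For the Unix group option in tip 8, once a group has been set up, a directory tree can be opened to it with standard commands; the group name and path are placeholders, and recursive permission changes should be run on a compute node per tip 1:

    # Give the group ownership of the tree and allow group read/traverse access
    chgrp -R myprojectgrp /path/to/shared_dir
    chmod -R g+rX /path/to/shared_dir

    # Optionally set the setgid bit so new files inherit the group
    chmod g+s /path/to/shared_dir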
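
For the ACL option in tip 8, a minimal sketch that grants one additional user read access without changing group ownership; the username and path are placeholders, and the ACL article above covers further details such as default ACLs for newly created files:

    # Grant the user read access (and traverse access on directories)
    setfacl -R -m u:collaborator:rX /path/to/shared_dir

    # Review the ACLs now in place
    getfacl /path/to/shared_dir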