Authoring Note
This document was copied over from the old web site and is being slowly updated.
File Transfer - Overview¶
A number of options exist for transferring files between JHPCE and your local host. Which solution you choose depends on your use case.
Copying files around inside the cluster, between JHPCE file systems, is a different activity. We have a document about using rsync to copy files. This tool can be used for both internal copies and for moving files into or out of the cluster.
Use the transfer node and partition¶
For transferring files to and from the cluster, you should use jhpce-transfer01.jhsph.edu rather than a login node.
This is significantly faster, because the transfer node has a 40G Ethernet connection to the outside world while the login nodes have 10G connections. It also keeps load off the login nodes: EVERYONE depends on them, and you should not run ANYTHING on them that ties them up.
Interactive use of the transfer node¶
You can ssh directly to jhpce-transfer01.jhsph.edu and do your work.
We also have a SLURM partition named transfer, which runs on the same node and which you can use for interactive or batch sessions.
jhpce01% srun --pty --x11 -p transfer bash
Batch use of the transfer node/partition¶
Here is a sample SLURM batch job that you can use as a model for an internal transfer on a compute node using good rsync arguments. You can adapt it to run on the transfer partition to perform transfers into or out of the cluster. You can also change it to use some other transfer program. Such batch jobs usually require care when providing authentication information.
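As a rough illustration of what such an adaptation could look like, the sketch below writes a small batch script that runs rsync on the transfer partition. It is not the sample job referred to above: the script name, job name, time limit, log file name, rsync options and both paths are assumptions chosen for illustration, so adjust them to your own case.

```shell
# Write a hypothetical batch script; every value below is a placeholder.
cat > transfer-job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=rsync-transfer
#SBATCH --partition=transfer
#SBATCH --time=04:00:00
#SBATCH --output=rsync-transfer-%j.log

# SRC and DEST are illustrative. A remote side would look like
# USERID@remote.host:/path/ and needs non-interactive authentication.
SRC="/path/to/source/"
DEST="/path/to/destination/"

# -a recurses and preserves permissions and timestamps; -v lists files.
rsync -av "$SRC" "$DEST"
EOF

# Submit the script with:
#   sbatch transfer-job.sh
```

Because a batch job cannot answer a password prompt, a transfer to or from a remote host only works this way if authentication has been arranged beforehand.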
Common Tools¶
Many of these transfer protocols have both command-line and graphical user interface programs available.
- scp or sftp — file transfer via command line
- rsync - a very useful uni-directional mirroring utility program
- GUI for sftp — file transfer by drag and drop from your desktop
- GlobusOnline — fast file transfers between GlobusOnline endpoints
- Aspera — very fast file transfer to and from Aspera servers
- OneDrive access with rclone - use “rclone” to access your OneDrive directory, as well as other network drives (AWS buckets, Google Drive…)
- Unison — keep directories synced (can be configured to be bi-directional)
- Mount remote filesystems — directories at JHPCE mounted on your local host. IS THIS MATERIAL STILL ACCURATE IN 2024? Is this example SSHFS doc worth re-using?
- ftp (kind of…)
scp and sftp¶
The scp and sftp command-line tools are the most common tools used for transferring data to and from the cluster. The basic tradeoff is between speed (scp is faster) and flexibility (sftp is more flexible). The scp and sftp commands are available from the Terminal on a macOS or Linux based laptop/desktop, or from a CMD or PowerShell prompt on recent Windows systems.
Although both SCP and SFTP utilize the same SSH encryption during file transfer with the same general level of overhead, SCP is usually faster than SFTP at transferring files, especially on high latency networks. SFTP should be used when you may need an interactive session on the cluster to navigate to a directory before transferring the files, whereas SCP should be used when you know the exact path of the file you want to transfer.
scp¶
The scp command can be thought of as a network cp command. The command to transfer a file called data.txt from your local system to your home directory on the cluster would be:
scp LOCAL_PATH/data.txt USERID@jhpce-transfer01.jhsph.edu:REMOTE_PATH/REMOTE_TARGET_FILENAME
where LOCAL_PATH defaults to your current local directory and REMOTE_PATH defaults to your home directory on the cluster. If the target filename is omitted, the local filename is used.
If you want to copy a file from the cluster to your local laptop/desktop, you would reverse the arguments. For example, to copy data2.txt from the cluster to a local file:
scp USERID@jhpce-transfer01.jhsph.edu:REMOTE_PATH/data2.txt LOCAL_PATH/LOCAL_TARGET_FILENAME
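To make the argument order and the defaults concrete, here is a small sketch of both directions plus a recursive directory copy. The results directory is hypothetical, and USERID stands for your JHPCE login. The commands are printed with echo rather than executed, so the block can be pasted safely; remove the echo to run them for real.

```shell
# Placeholder: replace USERID with your JHPCE login.
REMOTE="USERID@jhpce-transfer01.jhsph.edu"

# Upload: nothing follows the colon, so the file lands in your
# remote home directory under the same name.
echo scp data.txt "${REMOTE}:"

# Download into the current local directory, keeping the file name.
echo scp "${REMOTE}:data2.txt" .

# -r copies a whole directory tree ("results" is a hypothetical name).
echo scp -r results "${REMOTE}:"
```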
sftp¶
The sftp command is another means of transferring data to and from the cluster. To use sftp, you would run the command:
sftp USERID@jhpce-transfer01.jhsph.edu
Once you’ve connected, you’ll be shown an sftp> prompt. From here you can use the ls command to get a directory listing and the cd command to change directories. In addition to ls and cd, you can use the get command to transfer a file from the cluster to your local system, or the put command to transfer a file from your local system to the cluster. Once you are done with sftp, type exit to end the session.
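A session using the commands described above might look like the following sketch. It stores the commands in a file that could be replayed with sftp's batch option (-b). The directory and file names are hypothetical, and batch mode assumes non-interactive authentication is already in place, which may not be the case for your account.

```shell
# Write the sftp commands to a batch file; all names are hypothetical.
cat > sftp-commands.txt <<'EOF'
cd project/results
ls
get summary.txt
put data.txt
exit
EOF

# Replay the commands without the interactive prompt:
#   sftp -b sftp-commands.txt USERID@jhpce-transfer01.jhsph.edu
```

The same five commands can instead be typed one at a time at the sftp> prompt.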
GUIs¶
If you prefer a drag-and-drop interface to shell commands, then an application that presents a window for drag and drop is what you want. Depending on which OS you are using, we can recommend the following applications:
macOS users might consider FileZilla. It is an outstanding application that not only provides a GUI browser for FTP and SFTP but also allows you to browse WebDAV, Amazon S3, and OpenStack Swift file systems. It is free to download and install.
Our recommended application for Windows users accessing the JHPCE cluster is MobaXterm. You can also use WinSCP (https://en.wikipedia.org/wiki/WinSCP) or PuTTY if you are already familiar with them.
Rclone¶
Rclone can be used to access network file resources, such as OneDrive, Google Drive, and AWS. See here for instructions on using it to connect to Hopkins OneDrive storage.
Aspera¶
Aspera is a commercial product from IBM that allows file transfers that are reportedly 20 times faster than ftp. Documentation is available for macOS, Linux and Windows.
Aspera is required to download data from the NCBI Aspera server or download/upload data from/to JHU CIDR on the Bayview campus.
The Aspera license does not allow us to install the client for our users. You must install it yourself. You may either download the Linux client from the Aspera Download site or use the client that we already downloaded. (It may be out of date.) If you prefer the latter, simply copy the installation script from here:
/jhpce/shared/jhpce/core/JHPCE_tools/3.0/packages/ibm-aspera-connect_4.2.10.749_linux_x86_64.sh
into your home directory, and then run the script:
bash ibm-aspera-connect_4.2.10.749_linux_x86_64.sh
This will install the ascp command under your home directory at ~/.aspera/connect/bin. You can either add ~/.aspera/connect/bin to your PATH, or use the full path to the ascp command to run it.
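One way to add the directory to your PATH for the current shell session is sketched below. The ASPERA_BIN variable name is just an illustrative choice; to make the change permanent you would put the same lines in your shell startup file.

```shell
# Directory where the installer places ascp (from the text above).
ASPERA_BIN="$HOME/.aspera/connect/bin"

# Prepend it to PATH only if it is not already there.
case ":$PATH:" in
  *":$ASPERA_BIN:"*) ;;                    # already present, nothing to do
  *) export PATH="$ASPERA_BIN:$PATH" ;;
esac

# Report where ascp was found, or note that it is missing.
command -v ascp || echo "ascp not found; has the installer been run?"
```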
You may also need to take other steps, such as installing an extension to your web browser. Instructions on how to do that for Linux, as an example, are available from IBM here.
Unison¶
Warning
As of 20240318 we don't have unison installed in the cluster. We will work on adding it. (You can install your own copy of it.)
Using Unison, you can keep data synchronized between directories, either on a single computer or between the cluster and your local system. Both CLI and GUI versions are available. Unison needs to be installed on both computers if used across a network.
Unison is a synchronization tool. It can be told to update files in both SOURCE and DESTINATION locations according to some rules.
The Unison home page is here, with a wiki that provides access to documentation and some binaries.
There is an extensive tutorial at ostechnix.
There is a wiki page about using it from ArchLinux.
This document written by a previous JHPCE user (Jacob Fiksel) might still be useful.
Globus¶
We have a Globus endpoint. Please see this document.
Mounting virtual file systems¶
Obsolete
This might be obsolete information as of 20240220 - OSXFUSE is now named macFUSE and is hosted at a different location than what is described below.
A common use case occurs when a user has a pipeline that is periodically emitting tab-delimited files and the user wants to plot these files with a favorite plotting or analysis application that runs on their local host. In this case it is common to mount the remote file system on the local host via NFS or SMB.
Unfortunately, given the size and heterogeneity of our user base (which spans the entire medical campus), this is not practical. Instead, we recommend that users create a virtual file system on their OS X machine with the MacFusion application. MacFusion is free and allows you to create a mount point on your local host that looks like just another directory in your local file system, so any applications and scripts on your local host can access the data in that mount point. From the user's perspective, it acts just like an SMB or NFS mount point. Data is transferred back and forth via an encrypted link.
MacFusion requires the installation of an OS X kernel extension and some associated tools. OSXFuse provides the needed extension. OSXFuse implements a so-called “Filesystem in USErspace” (FUSE). This technology is described here. FUSE kernel modules exist for most flavors of Unix and Linux. The procedure for installing OSXFuse and MacFusion is described below.
- Download OSXFUSE from the SourceForge repository.
- Install OSXFUSE.
- Launch the OSXFUSE installer and perform a custom install. Be sure to select “MacFuse Compatibility Layer” in the Custom Install screen.
- After installing the kernel extension it may, or may not, be necessary to reboot your Mac.
- Download and install the Macfusion app from: http://macfusionapp.org.
- Start up MacFusion, and create an entry for jhpce-transfer01.jhsph.edu. Enter your login and password.
- Select a mount point, e.g. ~/jhpce/myhome/
- Once the drive is mounted, you can cd to the directory in the shell or view it in a window on your desktop. To do this you need to “Reveal” the drive by pressing Command-R. Once the directory is revealed, you can drag and drop files into the directory in the usual way you drag and drop files into any directory on your Mac.
ftp¶
We don’t have the ftp client installed on the cluster. It is an older, less secure, unencrypted channel for transferring files. However if you are downloading files from an older site that does not support SFTP or one of the other more modern mechanisms, you have a couple of options for ftp.
If you want to be able to interactively browse through the ftp site you can use the lynx text based browser command:
lynx ftp://USER@ftp.site.gov
Once connected, you can then use the arrow keys to move around the site, and to select a file to download or a directory to descend into.
If you know the exact path to the file you want, you can use the “wget” command:
wget ftp://USER:PASSWORD@ftp.site.gov/path/to/file
All of these should be done from the transfer queue to make use of our high speed ScienceDMZ network connection.