File Transfer

A number of options exist for transferring files to and from JHPCE and your local host. Which solution you choose depends on your use case.

  1. scp or sftp: file transfer via the command line
  2. GUI for sftp: file transfer by drag and drop from your desktop
  3. Aspera: very fast file transfer to and from Aspera servers
  4. GlobusOnline: fast file transfers between GlobusOnline endpoints
  5. Mount remote filesystems: directories at JHPCE mounted on your local host
  6. ftp (limited support)

For transferring files, connect to “jhpce-transfer01.jhsph.edu” rather than “jhpce01.jhsph.edu”. The “jhpce01.jhsph.edu” node is meant only for logging into the cluster, not for transferring large files. It has only a 1 Gb/s network connection to the JHU network, whereas “jhpce-transfer01.jhsph.edu” has a 40 Gb/s connection. An individual file transfer will not reach the full 40 Gb/s, but it will still be significantly faster than the 1 Gb/s connection on “jhpce01.jhsph.edu”.
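
To put the two link speeds in perspective, the sketch below computes the best-case, line-rate time to move a file over each connection. The 100 GB size is a hypothetical example, and the arithmetic ignores encryption overhead, disk speed and network contention, so the results are lower bounds on transfer time, not predictions.

```shell
#!/usr/bin/env bash
# Best-case transfer time in whole seconds at full line rate (rounded down).
# Ignores encryption, disk I/O and contention, so real transfers take longer.
ideal_seconds() {
  local size_gb=$1 link_gbps=$2
  # gigabytes -> gigabits (x8), then divide by the link speed in Gb/s
  echo $(( size_gb * 8 / link_gbps ))
}

size_gb=100  # hypothetical dataset size in GB
echo "jhpce01.jhsph.edu (1 Gb/s):           $(ideal_seconds "$size_gb" 1) s"
echo "jhpce-transfer01.jhsph.edu (40 Gb/s): $(ideal_seconds "$size_gb" 40) s"
```

For 100 GB this gives 800 s (about 13 minutes) on the 1 Gb/s link versus 20 s on the 40 Gb/s link, a ceiling that a single real transfer will not reach.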

scp or sftp

The scp and sftp commands transfer files using the SCP and SFTP protocols respectively. These are the two most commonly used methods for transferring files between JHPCE and local hosts. Which of the two should you use? It depends. The basic tradeoff is between speed (scp is faster) and flexibility (sftp is more flexible). The differences are described below (taken from Wikipedia).

Compared to the earlier SCP protocol, which allows only file transfers, the SFTP protocol allows for a range of operations on remote files – it is more like a remote file system protocol. An SFTP client’s extra capabilities compared to an SCP client include resuming interrupted transfers, directory listings, and remote file removal. [1] For these reasons it is relatively simple to implement a GUI SFTP client compared with a GUI SCP client.
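
Because SFTP is a session protocol, one connection can list a directory and then resume an interrupted download. The sketch below is a hypothetical helper, not a JHPCE-provided tool: it writes a short command script and pipes it into the OpenSSH sftp client, using `reget` (OpenSSH's resume-download command, equivalent to `get -a`). JHPCE_USER and the paths are placeholders, paths are assumed to contain no spaces, and setting DRY_RUN=1 prints the sftp command instead of connecting.

```shell
#!/usr/bin/env bash
JHPCE_HOST="jhpce-transfer01.jhsph.edu"
JHPCE_USER="${JHPCE_USER:-USER}"   # placeholder: set to your JHPCE login

# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Emit sftp commands: list the remote directory, then resume the download.
# "reget" continues from the end of an existing partial local file.
sftp_resume_script() {
  printf 'ls -l %s\n' "$(dirname "$1")"
  printf 'reget %s %s\n' "$1" "$2"
}

# Feed the script to sftp on stdin; password prompts still use the terminal.
resume_download() {
  sftp_resume_script "$1" "$2" | run sftp "${JHPCE_USER}@${JHPCE_HOST}"
}
```

For example, `JHPCE_USER=jdoe resume_download results/run1.tsv run1.tsv` (with a hypothetical login "jdoe") lists `results/` and then picks up `run1.tsv` where a previous download stopped. Re-running the same command after another interruption continues again rather than starting over.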

Although both SCP and SFTP utilize the same SSH encryption during file transfer with the same general level of overhead, SCP is usually much faster than SFTP at transferring files, especially on high latency networks. This happens because SCP implements a more efficient transfer algorithm, one which does not require waiting for packet confirmations. This leads to faster speed but comes at the expense of not being able to interrupt a transfer, so unlike SFTP, SCP transfer cannot be canceled without terminating the session.
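
For one-off copies, scp is the simpler and faster tool. The helpers below are a hypothetical sketch that wraps scp around the transfer node; JHPCE_USER and the example paths are placeholders, remote paths are taken relative to your JHPCE home directory, and DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
JHPCE_HOST="jhpce-transfer01.jhsph.edu"
JHPCE_USER="${JHPCE_USER:-USER}"   # placeholder: set to your JHPCE login

# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Upload a local file or directory. -r recurses into directories,
# -p preserves modification/access times and permission bits.
push() {
  run scp -rp "$1" "${JHPCE_USER}@${JHPCE_HOST}:$2"
}

# Download a remote file or directory into a local directory.
pull() {
  run scp -rp "${JHPCE_USER}@${JHPCE_HOST}:$1" "$2"
}
```

`push data.csv projects/` uploads to `~/projects/` on JHPCE, and `pull projects/results .` copies that directory back. Because scp cannot resume, an interrupted copy has to be restarted from the beginning.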

Graphical User Interfaces for drag and drop file transfer

If you prefer drag and drop rather than writing shell commands, then an application that presents a window for drag and drop is what you want. Depending on which OS you are using, we can recommend the following applications:

  • Apple OSX: Cyberduck is an outstanding application that provides a GUI browser not only for FTP and SFTP but also for WebDAV, Amazon S3, and OpenStack Swift file systems. It is free to download and install.
  • Microsoft Windows: MobaXterm is our recommended application for accessing the JHPCE cluster or transferring files to and from the cluster. Instructions for setting up MobaXterm for file transfer can be found here. You can also use WinSCP if you are already familiar with it.
  • Linux

Aspera

Aspera is a commercial product whose file transfers are reportedly 20 times faster than ftp. If you download data from the NCBI Aspera server, or download or upload data from or to JHU CIDR on the Bayview campus, you will use Aspera. The Aspera license does not allow us to install the client for our users, so you must install it yourself. You can either download the Linux client from the Aspera site or use the client that we have already downloaded. If you prefer the latter, copy the installation script from here:

/jhpce/shared/jhpce/core/JHPCE_tools/1.0/packages/aspera-connect-3.6.2.117442-linux-64.sh

into your home directory, and then run the script:

bash aspera-connect-3.6.2.117442-linux-64.sh

This will install the “ascp” command under your home directory at ~/.aspera/connect/bin. You can either add “~/.aspera/connect/bin” to your PATH or use the full path to the “ascp” command to run it.
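
The copy, install and PATH steps can be scripted as below. The installer path is the one given on this page; `install_aspera` and `add_aspera_to_path` are hypothetical helper names, and the PATH change lasts only for the current shell.

```shell
#!/usr/bin/env bash
# Installer path as given on this page.
ASPERA_INSTALLER=/jhpce/shared/jhpce/core/JHPCE_tools/1.0/packages/aspera-connect-3.6.2.117442-linux-64.sh
ASPERA_BIN="$HOME/.aspera/connect/bin"

# Copy the installer into your home directory and run it from there.
install_aspera() {
  if [ ! -f "$ASPERA_INSTALLER" ]; then
    echo "installer not found: $ASPERA_INSTALLER" >&2
    return 1
  fi
  cp "$ASPERA_INSTALLER" "$HOME/" &&
    (cd "$HOME" && bash "$(basename "$ASPERA_INSTALLER")")
}

# Prepend the ascp directory to PATH for this shell, skipping it if it is
# already present so repeated calls do not keep growing PATH.
add_aspera_to_path() {
  case ":$PATH:" in
    *":$ASPERA_BIN:"*) ;;
    *) PATH="$ASPERA_BIN:$PATH"; export PATH ;;
  esac
}
```

Running `install_aspera && add_aspera_to_path && command -v ascp` should print the installed path. To make the change permanent, add the line `export PATH="$HOME/.aspera/connect/bin:$PATH"` to your ~/.bashrc.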

Globus

See:  http://jhpce.jhu.edu/knowledge-base/using-globus-to-transfer-files/

Mounting virtual file systems

A common use case occurs when a user has a pipeline that periodically emits tab-delimited files, and the user wants to plot these files with a favorite plotting or analysis application that runs on their local host. In this case it is common to mount the remote file system on the local host via NFS or SMB.

Unfortunately, given the size and heterogeneity of our user base (which spans the entire medical campus), this is not practical. Instead, we recommend that users create a virtual file system on their OSX machine with the MacFusion application. MacFusion is free and allows you to create a mount point on your local host that looks like just another directory in your local file system, so any applications and scripts on your local host can access the data in that mount point. From the user's perspective, it acts just like an SMB or NFS mount point. Data is transferred back and forth via an encrypted link.

MacFusion requires the installation of an OSX kernel extension and some associated tools. OSXFuse provides the needed extension. OSXFuse implements a so-called “FileSystem in USErspace” (FUSE). This technology is described here. FUSE kernel modules exist for most flavors of Unix and Linux. The procedure for installing OSXFuse and MacFusion is described below.

  1. Download OSXFUSE from the SourceForge repository:
    http://sourceforge.net/projects/osxfuse/
  2. Install OSXFUSE.
    Launch the OSXFUSE installer and perform a custom install. Be sure to select “MacFuse Compatibility Layer” in the Custom Install screen. After installing the kernel extension it may, or may not, be necessary to reboot your Mac.
  3. Download and install the MacFusion app from:
    http://macfusionapp.org
  4. Start MacFusion and create an entry for enigma2.jhsph.edu:
    - Enter your login and password.
    - Select a mount point, e.g. ~/jhpce/myhome/
  5. Once the drive is mounted, you can cd to the directory in the shell or view it in a window on your desktop. To view it in a window, “Reveal” the drive by pressing ⌘-R. Once the directory is revealed, you can drag and drop files into it just as you would with any other directory on your Mac.
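
MacFusion mounts remote directories through FUSE file systems such as sshfs, which mounts a remote directory over SSH. On Linux, or on a Mac with the FUSE layer and the `sshfs` command installed, a similar mount can be made from the shell. The sketch below assumes `sshfs` is installed; the host and mount point follow step 4, JHPCE_USER is a placeholder, `mount_jhpce` and `unmount_jhpce` are hypothetical names, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
MOUNT_HOST="enigma2.jhsph.edu"     # host named in step 4
JHPCE_USER="${JHPCE_USER:-USER}"   # placeholder: set to your JHPCE login
MOUNT_POINT="$HOME/jhpce/myhome"   # mount point suggested in step 4

# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Mount your remote home directory ("host:" with no path) on MOUNT_POINT.
# -o reconnect asks sshfs to re-establish a dropped SSH connection.
mount_jhpce() {
  run mkdir -p "$MOUNT_POINT"
  run sshfs -o reconnect "${JHPCE_USER}@${MOUNT_HOST}:" "$MOUNT_POINT"
}

# Unmount: macOS uses umount; Linux FUSE uses fusermount -u
# (fusermount3 on some systems that only ship FUSE 3).
unmount_jhpce() {
  if [ "$(uname)" = "Darwin" ]; then
    run umount "$MOUNT_POINT"
  else
    run fusermount -u "$MOUNT_POINT"
  fi
}
```

After `mount_jhpce`, local tools can read the pipeline's files directly under `~/jhpce/myhome`; run `unmount_jhpce` when you are done.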

ftp

We don’t have the ftp client installed on the cluster. It is an older, less secure, unencrypted channel for transferring files. However, if you are downloading files from an older site that does not support SFTP or one of the other more modern mechanisms, you have a couple of options for ftp.

If you want to browse the ftp site interactively, you can use the “lynx” text-based browser:

lynx ftp://USER@ftp.site.gov

Once connected, you can then use the arrow keys to move around the site, and to select a file to download or a directory to descend into.

If you know the exact path to the file you want, you can use the “wget” command:

wget ftp://USER:PASSWORD@ftp.site.gov/path/to/file
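
Putting the password in the URL exposes it in your shell history and in the process list. The variant below, a hypothetical `ftp_fetch` helper built on wget's `--user`, `--ask-password` and `-c` (continue a partial download) options, prompts for the password instead. USER and the URL are the placeholders from the example above, and DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Download one file over ftp, prompting for the password so it never
# appears on the command line; -c resumes a partially downloaded file.
ftp_fetch() {
  local user=$1 url=$2
  run wget -c --user="$user" --ask-password "$url"
}
```

Usage: `ftp_fetch USER ftp://ftp.site.gov/path/to/file`.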

Both of these commands should be run from a session on the “rnet” queue to make use of our high-speed ScienceDMZ network connection.