Availability of RStudio on the JHPCE cluster

We recently installed the RStudio graphical R Integrated Development Environment (IDE) on the JHPCE cluster. Some people find RStudio helpful for organizing R projects and for writing and debugging R programs. For more detailed instructions on using RStudio, please see https://jhpce.jhu.edu/question/how-do-i-get-the-rstudio-program-to-work-on-the-cluster/
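As a rough sketch of what this looks like in practice (the module name and memory values below are assumptions; the page linked above has the authoritative steps), you would typically start an interactive session with X11 forwarding and launch RStudio from there:

# connect with X11 forwarding (e.g. ssh -X), then request an interactive session
qrsh -l mem_free=5G,h_vmem=5G
# load the RStudio environment module (module name assumed) and launch the GUI
module load rstudio
rstudio &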


JHPCE Cluster Downtime – Oct 17 – 19

We have just been informed that there is going to be power work performed at the Bayview/MARCC facility from October 17 through October 19. We will need to power down the JHPCE cluster for this work, so the JHPCE cluster will be unavailable from 6:00 AM on Monday, October 17 until 5:00 PM on Wednesday, October 19. Please schedule your jobs accordingly. Sorry for this inconvenience!


Status of the datacenter move

We moved the hardware to Bayview over the weekend. Mark and Jiong are currently bringing up, updating, and testing the various systems. We are relieved to report that all the file systems survived the move. Accordingly, there will be no need to recover data from the disaster-recovery backups that we created over the summer.

Over the next two days, we will be updating and testing the systems. We are taking advantage of the downtime to perform various long-delayed upgrades of the system software and some networking hardware.

Below are a few pictures from over the weekend. To get a sense of what this move entailed, imagine moving a car from one garage to another. Easy, right? Not really, because you have to take the car apart in the first garage, carefully transport the pieces to the other garage, and then reassemble the car.

Figure 1. The datacenter module at Bayview is on the left. The cooling unit is on the right.

Figure 2. The moving crew, ready to go at 6:30 in the morning.

Figure 3. Servers and storage JBODs staged in preparation for racking (entering through the door at the far end). Equipment was transported in 7 shipments.

Figure 4. Racking a server.

Figure 5. Panorama of the racks in our aisle. At this point most equipment has been racked. Mark (seated) is testing equipment.

Figure 6. Rat’s nest of cables in the back of the main cabinet with the core switch and head nodes.


Cluster is down for the move to the Bayview datacenter

Logins have been disabled in preparation for our move to the University’s colocation facility at Bayview.  Stay tuned for updates…


Upcoming JHPCE Cluster Move and Downtime – August 18th – 24th

We will be moving the JHPCE cluster equipment to the University’s new colocation site on the Bayview Campus. We are on track to outgrow the existing server room, which has limited power and cooling capabilities.

The change of venue will have no impact on the way JHPCE resources are used, nor will it change the names of any of the login servers. However, the move is very complex, and it will require an extended (6-day) service interruption to shut down, disassemble, move, reassemble, and reboot the system. At present, the plan is to shut down the cluster at 9:00 AM on Thursday, August 18. At that time all logins will be disabled and all running jobs will be killed. We expect to be up and running again by 9:00 AM on Wednesday, August 24. Please let us know if you have any questions or concerns.


Upcoming changes to home directories.

tl;dr – At some point over the next month, your home directory will be changing from /home/*/USERID to /users/USERID.

Recently we purchased a new storage array for the JHPCE cluster, with the intention of consolidating a number of our older, smaller arrays onto it. This new storage array is much larger, will perform faster, and will be less expensive to use than our existing smaller arrays. One of the older storage arrays that we are migrating is the current Amber1 array, which contains the home directories for all of the JHPCE cluster users.

Throughout May and June we will be migrating users’ home directories to this new storage system in several phases. For each phase, we will notify the affected users before their directories are moved, and again once the move has been completed. During the migration of your home directory, you should refrain from being logged into the cluster or running jobs on the cluster. We will strive to be flexible in scheduling the migrations so as to affect as few running jobs as possible.

When your home directory is moved, the path name will change from /home/*/USERNAME to /users/USERNAME. We are eliminating the various partitions under /home (/home/bst, /home/mmi, /home/other…) and consolidating all user directories under /users. Please note that any scripts or programs you have that include a hardcoded path to /home/… will need to be changed.
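If you want to get a head start, a quick way to find hardcoded paths is to search the directories where you keep your scripts; the directory name below is only an example:

# list files under a scripts directory (example location) that still reference the old home paths
grep -rln '/home/' ~/my_scripts
# after the move, point those references at /users/USERNAME, or better yet, use $HOME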

As part of this migration to the new storage array, we will be increasing everyone’s home directory quota from 25GB to 100GB. This should ease some of the pressure felt by those who occasionally need more space than is available in the current home system, but are unable to justify purchasing an allocation on one of our larger arrays.

We also expect the monthly cost of home directory space to decrease when we migrate to the new array. The old Amber1 array is a 7-year-old system for which we pay a considerable amount to Sun/Oracle for a yearly support contract.

Please let us know if you have any questions.


JHPCE Cluster Maintenance – Thursday, August 27th from 6:00 PM – 10:00 PM

On Thursday, August 27th from 6:00 PM – 10:00 PM, we will be performing maintenance on the Amber1 storage array to upgrade its operating system. We have had a number of issues with the Amber1 array in the recent past, and the root cause appears to be a memory leak in the current version of the OS, so the vendor is recommending the upgrade.

This storage array is used for home directories on the JHPCE cluster, so logins to the cluster will be disabled during this time. We also strongly recommend that you not run jobs on the cluster during this maintenance window. In our experience with similar work in the past, jobs have generally paused while the array was unavailable and then resumed once the array was back, but we still recommend avoiding running jobs during the window. We apologize for any inconvenience.


JHPCE Cluster Change To Set User Memory Limit – Tuesday, August 4th, from 6:00 PM – 7:00 PM

Dear JHPCE Community,

With the ever-increasing usage of the JHPCE cluster, we have had, during very busy times, a number of cases where individuals have been unable to access cluster resources. In investigating the issue, we found that the limiting factor was the amount of memory being used by individual users on the cluster.

To address this issue, the JHPCE management team will be modifying the cluster configuration to set a 1TB memory limit per user for jobs running on the shared queue. This limit should not be noticed by the vast majority of users on the cluster, and will only affect those who run a large number of jobs that use a large amount of memory. Please note that jobs running on dedicated queues will not be subject to this limit; only jobs on the shared queue will be affected.

This 1TB memory limit will be managed in a similar manner to the 200-core limit that is currently in place. As with the 200-core limit, we can temporarily increase the 1TB memory limit for individuals who will be submitting jobs that need more than 1TB of memory, but only at times when the cluster is less heavily loaded.

The 1TB limit will be based on the “mem_free” setting for running jobs submitted with qsub and for interactive sessions started with qrsh. For every running job and every qrsh session, the “mem_free” value will be deducted from the 1TB limit. As an example, if you have 20 jobs running on the cluster, where each job has “mem_free” set to 10GB, then a total of 200GB will be counted against your 1TB memory limit, leaving 800GB available for additional jobs. As your jobs complete, your memory limit will be restored.
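As a concrete sketch (the script name and values below are only illustrations), a batch job submitted as follows would have 10GB counted against your limit for each running copy:

# request 10GB per job on the shared queue; each running copy deducts 10GB from the 1TB limit
qsub -l mem_free=10G,h_vmem=11G my_analysis.sh
# 20 of these running at once -> 200GB counted against the 1TB limit, leaving 800GB available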

This change will be made on Tuesday, August 4th, from 6:00 PM – 7:00 PM. We do not anticipate any downtime on the cluster, or any impact to jobs currently running on the cluster. The way you submit jobs will not change, although you will need to be more cognizant of memory usage.

As a general reminder, please bear in mind the following guidelines for memory when submitting jobs to the cluster:

– When submitting jobs, please strive to set your “mem_free” to be as close as possible to the actual anticipated memory usage of your program.  Using a “mem_free” value that is overly large will a) potentially cause your job to be delayed in running as it awaits a node with sufficient memory, b) limit the number of jobs you can run concurrently by deducting more memory than needed from the 1TB limit, and c) prevent others from accessing memory resources on the cluster that your job is unnecessarily holding.
– There is, sadly, no hard and fast rule for estimating how much memory a program will need. A good place to start is the size of the data files that will be loaded into the program. You can also use the “qacct -j <jobnumber>” command to review the memory usage of similar past jobs when estimating the requirements of future ones (see the example after this list).
– Please set your “h_vmem” to be equal to, or at most 1GB more than, your “mem_free” setting. Setting a higher “h_vmem” can cause oversubscription of memory on the compute nodes, which can cause other users’ jobs to be unceremoniously killed by the Linux oom-killer (Out-Of-Memory Killer).
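For example, to size a new job from a finished one, the steps might look like this (the job number, values, and script name are placeholders):

# check how much memory a completed job actually used
qacct -j 1234567 | grep maxvmem
# if it peaked around 7.5GB, request a bit more and keep h_vmem within 1GB of mem_free
qsub -l mem_free=8G,h_vmem=9G my_analysis.sh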

Please feel free to email bitsupport if you have any questions.


Work to be done on the JHPCE login servers on Thursday, March 19th from 8:00 AM – 9:00 AM

Dear JHPCE Community,

Please be advised that work will be performed on the jhpce01 and jhpce02 login servers  on March 19th from 8:00 AM – 9:00 AM. During this time there may be brief periods when the JHPCE servers will not be available for login. This work will only affect new login attempts. All currently running sessions and jobs will be unaffected and will continue to run. Please let us know if you have any questions.

Mark


Two-factor authentication is coming

Dear Users,

Up to now, we have depended on unix userid/passwords to restrict access to the JHPCE cluster. Unfortunately, it is well known that passwords are hopelessly obsolete and we cannot depend on a string of characters for protection. Don’t believe me? Read this hair-raising account.

The need to protect research data and university intellectual property from unauthorized access and actors with malicious intent has never been greater than it is today. NIH data-use agreements and HIPAA regulations require us to implement “best practice” security protocols whenever possible.

To address these issues, we will soon require two-factor authentication to access the JHPCE cluster. Two-factor authentication is a security strategy that requires two independent credentials to access a system: (1) something you know, i.e. a password, and (2) something you have, e.g. a specific smart phone or a specific laptop. Configuring two-factor authentication on the JHPCE cluster should take no more than 10 minutes. You run an app on the login server, and you install and configure a free app on your smart phone (Apple or Android). You use information provided by the server app to configure your smart phone app. Thereafter, every time you access the JHPCE login server with a password, you will be prompted for an additional 6-digit “token” generated by your smart phone. Instructions and more details on two-factor authentication are on the JHPCE web site.
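The server-side half of the setup is roughly the following sketch (the knowledge-base page linked at the end of this post has the authoritative walkthrough):

# on a JHPCE login node: generate your secret, QR code, and emergency scratch codes
google-authenticator
# scan the QR code with the authenticator app on your phone and answer the prompts
# (time-based tokens are the usual choice); keep the scratch codes somewhere safe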

Those of you who already use key-pair authentication to log in from specific machines will not be prompted for either a password or the 6-digit token, so this will not impose any additional burden on you. Also, if you do not own a smart phone, there is a Google Chrome extension that will generate the 6-digit tokens. Your smart phone requires neither a Wi-Fi nor a cell phone connection to generate the 6-digit token. Note that the google-authenticator code was developed by Google software developers, but there is no interaction with any Google servers at any time.
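If you would like to set up key-pair authentication from a trusted machine, the usual steps look something like this (the login hostname is written generically; use the server you normally connect to):

# on your own machine: create a key pair (use a passphrase), then copy the public key to the cluster
ssh-keygen -t rsa -b 4096
ssh-copy-id USERID@<login-host>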

Google two-factor authentication has been in place on the login nodes since last spring as an optional capability. Over the past two months we have required all new users to use two-factor authentication. The authentication system is now well tested, and we recommend that you configure two-factor authentication as soon as possible. On April 1, you will be locked out of your account if you have not configured two-factor authentication. Prior to that date we will schedule periodic in-person help sessions for anyone having difficulty configuring two-factor authentication.

For instructions on setting up two-factor authentication on the JHPCE cluster, please see https://jhpce.jhu.edu/knowledge-base/authentication/2-factor-authentication/.

Stay safe!

Fernando Pineda
Director, Joint High Performance Computing Exchange
Associate Professor, Dept. of Molecular Microbiology & Immunology
