HPC Bursting
In some cases, the HPC team can provide bursting capabilities to researchers or classes to augment the available resources. Bursting is ideal when you need a large amount of resources for a short period of time. Bursting is made possible by running a scalable SLURM cluster in Google Cloud Platform (GCP), which is separate from the on-premises HPC clusters.
Bursting is not available to all users and requires advance approval. To get access to these capabilities, please contact hpc@nyu.edu to check your eligibility. Let us know the amount of storage, total CPUs, memory, and GPUs you need, the number of days you require access, and the estimated total CPU/GPU hours you will use. For reference, please review the GCP cost calculator and send a copy of your cost calculation to hpc@nyu.edu as well.
To request access to the HPC Bursting capabilities, please complete this form.
Running a Bursting Job
This is not public and is available only on request to eligible classes or researchers. First, log in to Greene:
ssh <NetID>@greene.hpc.nyu.edu
Then ssh to the burst login node on GCP. Anyone can log in, but you can only submit jobs if you have approval:
ssh burst
Start an interactive job:
srun --account=hpc --partition=interactive --pty /bin/bash
If you get the error "Invalid account or account/partition combination specified", it means your account has not been approved for cloud bursting.
Once your files are copied to the bursting instance, you can submit a batch job from the interactive session.
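For example, a minimal batch script might look like the following sketch; the partition name, resource values, and application command are placeholders, and you should use the account and partition you have been approved for:
#!/bin/bash
#SBATCH --account=hpc
#SBATCH --partition=<partition_name>
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --time=01:00:00
#SBATCH --job-name=burst_test
# Replace the line below with the commands for your own workload
python my_script.py
Save the script to a file (e.g. myjob.sbatch) and submit it with sbatch myjob.sbatch.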
Access to Slurm Partitions
In the example above, the partition "interactive" is used. You can list the current partitions by running the command
sinfo
However, approval is required to submit jobs to the partitions. Partitions are defined by the resources available to a job, such as the number of CPUs, amount of memory, and number of GPUs. Please email hpc@nyu.edu to request access to a specific partition or to have a new partition created (e.g., 10 CPUs and 64 GB of memory) for a better cost/performance balance for your job.
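For example, to check the CPUs, memory, GPUs, and time limit of a particular partition before submitting to it (the partition name below is a placeholder):
sinfo -p <partition_name> -o "%P %c %m %G %l"
This prints the partition name, CPUs per node, memory per node, generic resources (GPUs), and the partition's time limit.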
Storage
Torch's /home and /scratch are mounted (available) on the login node of the bursting setup. The compute nodes, however, have independent /home and /scratch file systems. These /home and /scratch mounts are persistent, are available from any compute node, and are independent of /home and /scratch on Torch.
When you run a bursting job, the compute nodes cannot see Torch's file mounts. Because the file systems are independent, you need to copy any data you need from Torch's /home or /scratch to the GCP-mounted /home or /scratch.
To copy data, you must first start an interactive job. Once started, you can copy your data using scp from the HPC Data Transfer Nodes (greene-dtn). Below is the basic command to copy files from Torch to your GCP home directory while you are in an interactive bursting job:
scp <NetID>@greene-dtn.hpc.nyu.edu:/path/to/files /home/<NetID>/
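To copy an entire directory (for example, a dataset in your Torch scratch space; the path below is only an illustration), add the -r flag:
scp -r <NetID>@greene-dtn.hpc.nyu.edu:/scratch/<NetID>/my_dataset /home/<NetID>/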
Current Limits
20,000 CPUs are available at any given time, shared among all active bursting users.