A high-performance computing (HPC) cluster is an essential resource for NGS dry lab operations. Traditionally, an SGE or Slurm cluster is set up on on-premises servers, which requires significant capital investment and a long-term commitment of IT resources. With the growth of cloud technology, setting up the cluster infrastructure has become not only easier and cheaper, but also more flexible.
Under the pay-for-service model, computing resources in the cloud only incur cost while they are in use. Customers pay nothing more once the resources are released. This model is a very good fit for the computational jobs we need to run for NGS sequence analysis, or for any other computation-intensive task. When a new job arrives, a virtual machine (VM) can be booted in the cloud to do the number crunching. Once the job is done and the results are archived, the VM is released and the customer stops paying for it.
Another advantage of running computational jobs in the cloud is access to the cheaper instances (running VMs) that the cloud platform sets aside for customers to bid on. For example, AWS offers spot instances, whose running cost fluctuates with customer demand. Generally, they can save 50%-70% compared to regular instances. The main drawback of this kind of instance is that the cloud platform can reclaim it at any time when demand becomes high. However, if the cluster you build in the cloud handles reclamation well, you can take full advantage of these savings.
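To make the trade-off concrete, here is a minimal sketch of the arithmetic. The hourly rate, job length, discount and rerun overhead are all hypothetical numbers chosen for illustration, not real AWS prices, and `job_cost` is an illustrative helper rather than part of any cloud API.

```python
def job_cost(hours: float, hourly_rate: float,
             discount: float = 0.0, rerun_fraction: float = 0.0) -> float:
    """Cost of a job that is billed only while its VM is running.

    discount:       fractional saving versus the regular price
                    (0.6 means the instance is 60% cheaper).
    rerun_fraction: extra compute time spent redoing work that was
                    lost when an instance was reclaimed.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    if rerun_fraction < 0.0:
        raise ValueError("rerun_fraction must be non-negative")
    billed_hours = hours * (1.0 + rerun_fraction)
    return billed_hours * hourly_rate * (1.0 - discount)


# Hypothetical figures, for illustration only.
HOURLY_RATE = 1.00   # regular price per VM hour
JOB_HOURS = 10.0     # compute time the job needs

on_demand = job_cost(JOB_HOURS, HOURLY_RATE)
# Assume a 60% discount and 20% of the work repeated after reclamations.
spot = job_cost(JOB_HOURS, HOURLY_RATE, discount=0.6, rerun_fraction=0.2)

print(f"regular instance: {on_demand:.2f}")
print(f"spot instance:    {spot:.2f}")
print(f"effective saving: {1 - spot / on_demand:.0%}")
```

Under these assumed numbers the effective saving is roughly 52% rather than the nominal 60%, which is why limiting the work lost to reclamation matters.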
The shared-resource nature of cloud platforms also brings challenges when implementing a cluster for computational jobs. The most important one is security. On-premises IT infrastructure already has its firewall settings, routing rules and network protection taken care of, whereas computing resources in the cloud must be configured carefully to avoid security pitfalls. The cluster users, however, are mostly bioinformaticians, who may not be security savvy. They therefore need more help from IT to use cloud resources, which contradicts the original motivation most businesses have for moving to the cloud.
To benefit from the advantages of cloud platforms while avoiding the disadvantages, we have developed the LIMS-ext-cloud platform for NGS dry labs. Computing VMs are launched automatically when jobs are scheduled and released when the jobs are done. If a VM is reclaimed by the cloud platform, a replacement VM is started and picks up the remaining job tasks. The whole system is set up in its own VPC (Virtual Private Cloud), for which the security groups, network routing rules and firewall settings are pre-defined and codified following the Infrastructure as Code approach. This ensures that the system can be brought up not only consistently, but also quickly.
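The reclaim-and-resume behaviour can be illustrated with a small simulation. This is only a sketch of the general idea, not the LIMS-ext-cloud implementation: `run_on_vm`, `run_job`, `VMReclaimed` and the task names are hypothetical, no cloud API is called, and for simplicity a reclamation only happens between tasks.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Tuple


class VMReclaimed(Exception):
    """Raised when the simulated cloud platform takes the VM back."""


def run_on_vm(pending: Deque[str], done: List[str],
              reclaim_after: Optional[int]) -> None:
    """Run pending tasks on one simulated VM.

    If `reclaim_after` is set, the VM is reclaimed once it has
    completed that many tasks and work still remains.
    """
    completed_here = 0
    while pending:
        if reclaim_after is not None and completed_here >= reclaim_after:
            raise VMReclaimed()
        task = pending[0]
        # Real work (alignment, variant calling, ...) would happen here.
        done.append(task)
        # Dequeue only after the task has finished, so an interrupted
        # task would still be waiting for the next VM.
        pending.popleft()
        completed_here += 1


def run_job(tasks: Iterable[str],
            reclaim_plan: Iterable[int]) -> Tuple[List[str], int]:
    """Run all tasks, starting a replacement VM after each reclamation.

    `reclaim_plan` lists, per VM, how many tasks it completes before
    being reclaimed; VMs beyond the plan are never reclaimed.
    """
    pending: Deque[str] = deque(tasks)
    done: List[str] = []
    plan = iter(reclaim_plan)
    vms_used = 0
    while pending:
        vms_used += 1  # launch a VM (the first one or a replacement)
        try:
            run_on_vm(pending, done, next(plan, None))
        except VMReclaimed:
            continue  # loop again so a replacement VM resumes the queue
    return done, vms_used


finished, vms = run_job(["align", "sort", "call", "annotate"], [2, 1])
print(finished, vms)  # ['align', 'sort', 'call', 'annotate'] 3
```

The key design point in this sketch is that the task queue lives outside any single VM, so the replacement VM only has to ask what is still pending.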
The LIMS-ext-cloud platform is a self-contained system that includes all of the components NGS dry labs need to run their workflows. For example, it includes its own Docker Registry for managing the Docker images used for computing jobs. This setup avoids the data transfer costs the cloud platform would charge if the Docker images were pulled from an external source.
The diagram above shows the architecture of the LIMS-ext-cloud platform. Future blog articles will cover the system in more detail and explain how it is used.