Available Technology

Processing Protected Data on High Performance Computing Clusters

Lawrence Livermore National Laboratory (LLNL), operated by Lawrence Livermore National Security, LLC (LLNS) under Contract No. DE-AC52-07NA27344 (Contract 44) with the U.S. Department of Energy (DOE), is offering the opportunity to collaborate on and commercialize LLNL’s new method and system for securely processing protected data on high performance computing clusters.

 

Background: The markets for computing with big data sets are growing rapidly. Data analysts, biomedical researchers, and a wide variety of other scientists are seeking to run large-scale simulations, on traditional high-performance computing centers as well as in the cloud, using real-world data sets that have been specially curated and packaged. As big data, large-scale hardware, and specialized software increasingly converge, there is a growing need to secure data in use, both to meet regulatory and privacy demands and to preserve an organization’s competitive advantages.

Using traditional high-performance computing (HPC) clusters requires additional security for sensitive data. A typical HPC system executes large and complex compute tasks, such as sophisticated simulation and data analysis, using hundreds to thousands of individual computers (“compute nodes”) that work together. HPC clusters typically operate in “batch” mode: a user submits a request for computation time to the “batch system,” which then runs through a variety of steps to execute the job. A key feature of this mode of execution is that the user need not be connected when their job is launched or executed on the cluster. A second key feature is that the job (usually) executes with all the permissions and access afforded to the user when they are connected. Finally, many users’ jobs can execute simultaneously, using separate sets of compute nodes in the cluster.
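The batch model described above can be sketched in a few lines of Python. This is a toy illustration only, not any real scheduler: the names `Job`, `ToyBatchSystem`, `submit`, and `schedule` are hypothetical, and the first-come, first-served policy is an assumption chosen for brevity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    user: str          # identity the job runs as (it inherits this user's access)
    nodes_needed: int  # number of compute nodes requested
    command: str       # what to run (illustrative only)


@dataclass
class ToyBatchSystem:
    free_nodes: set
    queue: deque = field(default_factory=deque)
    running: dict = field(default_factory=dict)  # job id -> (job, assigned nodes)
    _next_id: int = 0

    def submit(self, job):
        """Queue a job. The user may disconnect as soon as this returns."""
        job_id = self._next_id
        self._next_id += 1
        self.queue.append((job_id, job))
        return job_id

    def schedule(self):
        """Start queued jobs in order while enough free nodes remain."""
        while self.queue and self.queue[0][1].nodes_needed <= len(self.free_nodes):
            job_id, job = self.queue.popleft()
            # Each running job gets its own, non-overlapping set of nodes.
            nodes = {self.free_nodes.pop() for _ in range(job.nodes_needed)}
            self.running[job_id] = (job, nodes)


cluster = ToyBatchSystem(free_nodes={f"node{i}" for i in range(8)})
cluster.submit(Job("alice", 4, "simulate"))
cluster.submit(Job("bob", 3, "analyze"))
cluster.schedule()
for job_id, (job, nodes) in cluster.running.items():
    print(job_id, job.user, len(nodes))
```

Note that nothing in this model distinguishes a job handling protected data from any other job run by the same user, which is why the text calls for additional security.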

Many data application domains require stringent access control, protection, logging, and auditing for storage and use of sensitive data. The most stringent controls require encryption of data at rest (stored on disk or tape), and in transit (while being transferred over a network). Additional controls may be required wherever data is decrypted or encrypted: wiping of memory, emptying of caches, and secure management of encryption keys.
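One of the controls listed above, wiping key material from memory after use, might look roughly like the following sketch. The helper name `ephemeral_key` is hypothetical. Python is used for illustration only: the language runtime may hold other copies of the bytes (for example, the immutable object returned by `secrets.token_bytes`), so a production system would normally do this in lower-level code.

```python
import secrets
from contextlib import contextmanager


@contextmanager
def ephemeral_key(num_bytes=32):
    """Yield a random key in a mutable buffer and zero the buffer on exit."""
    key = bytearray(secrets.token_bytes(num_bytes))
    try:
        yield key
    finally:
        # Overwrite the buffer in place so the key does not linger in it,
        # even if the body of the with-block raised an exception.
        for i in range(len(key)):
            key[i] = 0


with ephemeral_key() as key:
    # ... decrypt or encrypt with `key` here (cipher omitted in this sketch) ...
    key_length = len(key)
```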

 

The traditional way of applying encryption tools to protect data results in two protection states for a piece of protected data with respect to a specific user: either the data is encrypted and not usable by the user, or it is decrypted and completely usable by the user. This all-or-nothing model causes problems that are particularly severe in a typical HPC cluster, which operates as a shared resource and in batch mode, providing storage and access to many users simultaneously. Existing approaches to using encryption in HPC settings require significant changes to the HPC operational and execution environment, and only partially address these issues.

LLNL has developed a new method for securely processing protected data on HPC systems with minimal impact on the existing HPC operations and execution environment. It requires no alterations to traditional HPC operations and can be managed locally. It is fully compatible with traditional (unencrypted) processing, so other jobs, encrypted or not, can run on the cluster at the same time. The method has been prototyped and continues to be developed at LLNL.

Benefits 
  • The requesting user identity, as claimed in a user certificate, is explicitly verified to ensure that the requesting process is executing on behalf of the verified user.
  • The trusted components are explicitly identified, including how they are authenticated, what trusted information they have access to, and the specific version executing.
  • The user software never has access to the actual decryption keys and does not need modification. The user software can perform arbitrary local processing on the unencrypted data, but it cannot read or write output outside the LLNL method.
  • All accesses to read or write protected data are logged and auditable. The log also provides authenticated provenance on all produced output. Provenance and chain-of-custody tracking are available for derived data objects on HPC clusters.
  • Data owners are explicitly identified, explicitly set enforceable policy, control individual access, and can revoke or deny access at any time in the future.
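
Two of the ideas in the list above, owner-controlled access that can be revoked and a log that records every access attempt, can be illustrated with a generic sketch. This is not LLNL's design. The class names `AuditLog` and `AccessBroker` are hypothetical, and the hash chain is a common, simple technique swapped in for illustration: it detects edits to earlier entries only if the latest hash is also kept somewhere trusted, and a real system would additionally sign entries.

```python
import hashlib
import json


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(record):
        payload = json.dumps(record, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def append(self, user, action, data_id, allowed):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        record = {"user": user, "action": action, "data_id": data_id,
                  "allowed": allowed, "prev": prev_hash}
        record["hash"] = self._digest(record)  # hash covers all fields above
        self.entries.append(record)

    def verify(self):
        """Return True if no entry has been altered, reordered, or removed mid-chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


class AccessBroker:
    """Checks the owner's policy and logs every request, allowed or denied."""

    def __init__(self, log):
        self.log = log
        self.grants = {}  # data_id -> set of users the owner has authorized

    def grant(self, data_id, user):
        self.grants.setdefault(data_id, set()).add(user)

    def revoke(self, data_id, user):
        self.grants.get(data_id, set()).discard(user)

    def request(self, user, action, data_id):
        allowed = user in self.grants.get(data_id, set())
        self.log.append(user, action, data_id, allowed)
        return allowed
```

In this sketch a data owner would call `grant("dataset-1", "alice")`, after which `request("alice", "read", "dataset-1")` returns `True`; once the owner calls `revoke`, the same request returns `False`, and both attempts remain in the log.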