Big science means big data, and big data often means big headaches. Whether they are examining molecular bindings on the proteins of the coronavirus or combing through the history of the universe, supercomputers produce massive amounts of data that can prove challenging to move and post-process, especially when the work spans institutions. Over the last few years, Argonne National Laboratory has developed the “Petrel” service to enable those kinds of super-sized data transfers.
Petrel, backed by 3 petabytes of high-speed storage, is the result of a partnership between Argonne and data-sharing firm Globus. Historically, Argonne’s limitation hasn’t been its storage systems, but rather the complex credentialing required to access them, which flummoxed collaborating institutions. For Petrel, Globus provided application programming interfaces (APIs) that allow collaborators outside of Argonne to bypass those requirements and establish direct, high-speed connections to Argonne’s storage systems.
“There was growing demand for a way to distribute data more widely so as to easily engage multiple institutions,” said Ian Foster, who directs Argonne’s Data Science and Learning division, in an interview with Argonne’s Nils Heinonen. “The remedy was to allocate a more or less broadly accessible storage system and, crucially, to make it manageable via Globus protocols, which are mechanisms for controlling the flow of data, as well as who can see it.”
Petrel is already in use for data stemming from Argonne’s Advanced Photon Source (APS), a massive X-ray machine (pictured in the header image) that uses a linear accelerator and a synchrotron to produce ultra-high-resolution scans of materials.
“Petrel has been critical in integrating research conducted at APS beamlines with the HPC resources of the ALCF [Argonne Leadership Computing Facility], and in disseminating scientific results to the APS user community in a timely fashion,” said Nicholas Schwarz, who leads the X-Ray Science Division Scientific Software Engineering and Data Management Group at the APS. “Whether being employed to develop new materials, probe the structure of protein molecules, or explore the structure of matter, data collected at the APS convey an extraordinary amount of information — so much, in fact, that not only do its collation and analysis require powerful computational systems, but its delivery for those purposes can prove extremely challenging.”
Argonne has also leveraged Petrel for its COVID-19 research, largely for the aforementioned molecular docking datasets.
“We’ve got hundreds of terabytes of data that have been created by different members of this collaboration, and we’re using Petrel to organize all of those data,” Foster said. “In tandem with a relational database we’ve constructed, Petrel has become the place where we collect, share and access all of our computed results, so it’s become a vital component of this important work.”
In its mission to enable yet broader access to data from its facilities, the ALCF is also welcoming Eagle, another Globus-produced data service. A 50-petabyte community filesystem, Eagle is being rolled out as part of a fully supported production environment.