FAIR Ecosystems for Science at Scale
By: Sean R. Wilkinson, Patrick Widener
Potential Business Impact:
Helps scientists share computer tools to do science faster.
High Performance Computing (HPC) centers provide resources to users who require greater scale to "get science done". They deploy infrastructure with singular hardware architectures, cutting-edge software environments, and stricter security measures compared with users' own resources. As a result, users often create and configure digital artifacts in ways that are specialized for the unique infrastructure at a given HPC center. Each user of that center faces similar challenges as they develop specialized solutions to take full advantage of the center's resources, potentially resulting in significant duplication of effort. Much of this duplicated effort could be avoided, however, if users of these centers found it easier to discover others' solutions and artifacts as well as share their own. The FAIR principles address this problem by presenting guidelines focused on metadata practices to be implemented by vaguely defined "communities"; in practice, these tend to gather by domain (e.g. bioinformatics, geosciences, agriculture). Domain-based communities can unfortunately end up functioning as silos that tend both to inhibit sharing of solutions and best practices and to encourage fragile, unsustainable improvised solutions in the absence of best-practice guidance. We propose that these communities pursuing "science at scale" be nurtured both individually and collectively by HPC centers so that users can take advantage of shared challenges across disciplines and potentially across HPC centers. We describe an architecture based on the EOSC-Life FAIR Workflows Collaboratory, specialized for use with and inside HPC centers such as the Oak Ridge Leadership Computing Facility (OLCF), and we speculate on user incentives to encourage adoption. We note that a focus on FAIR workflow components, rather than FAIR workflows, is more likely to benefit the users of HPC centers.
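To make the metadata-centric sharing idea concrete: a minimal, hypothetical sketch of how a single workflow component (rather than a whole workflow) might be described with RO-Crate-style JSON-LD metadata, the packaging convention used in the EOSC-Life Collaboratory ecosystem. All identifiers, names, and field values below are illustrative assumptions, not part of any proposed OLCF schema.

```python
import json

# Hypothetical RO-Crate-style metadata for one reusable workflow component.
# Describing components individually lets users at an HPC center discover and
# reuse each other's building blocks without adopting an entire workflow.
component_metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            # The metadata descriptor itself, pointing at the crate root.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "about": {"@id": "./"},
        },
        {
            # The component being shared (illustrative values throughout).
            "@id": "./",
            "@type": ["Dataset", "SoftwareSourceCode"],
            "name": "fft-preprocessing-step",
            "description": "Preprocessing step tuned for a specific HPC system",
            "programmingLanguage": "Python",
            "runtimePlatform": "example-hpc-center",  # hypothetical facility tag
            "license": "https://spdx.org/licenses/MIT",
        },
    ],
}

# Serializing to JSON-LD yields a machine-readable record that a registry
# could index, making the component findable across domain communities.
serialized = json.dumps(component_metadata, indent=2)
print(serialized)
```

The key design point this illustrates is granularity: metadata attaches to the component, so a geoscientist and a bioinformatician can each find and reuse the same preprocessing step even though their end-to-end workflows differ entirely.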
Similar Papers
Designing FAIR Workflows at OLCF: Building Scalable and Reusable Ecosystems for HPC Science
Distributed, Parallel, and Cluster Computing
Helps scientists share and reuse computer tools.
Towards FAIR and federated Data Ecosystems for interdisciplinary Research
Databases
Lets scientists share and reuse each other's data.
Automatic Metadata Capture and Processing for High-Performance Workflows
Distributed, Parallel, and Cluster Computing
Helps scientists track how computer programs run.