Exploring Object Stores for High-Energy Physics Data Storage

Published on Jul 15, 2021 in EPJ Web of Conferences
DOI: 10.1051/EPJCONF/202125102066
Javier López-Gómez, Jakob Blomer
Abstract
Over the last two decades, ROOT TTree has been used to store more than one exabyte of High-Energy Physics (HEP) events. The TTree columnar on-disk layout has proven ideal for analyses of HEP data, which typically require access to many events but only to a subset of the information stored for each of them. Future colliders, and particularly the HL-LHC, will increase the volume of generated data by at least one order of magnitude. Therefore, the use of modern storage hardware, such as low-latency, high-bandwidth NVMe devices and distributed object stores, becomes more important. However, TTree was not designed to exploit modern hardware optimally and may become a bottleneck for data retrieval. The ROOT RNTuple I/O system aims to overcome TTree's limitations and to provide improved efficiency on modern storage systems. In this paper, we extend RNTuple with a backend that uses Intel DAOS as the underlying storage, demonstrating that the RNTuple architecture can accommodate high-performance object stores. From the user's perspective, data can be accessed with minimal code changes, namely by replacing a filesystem path with a DAOS URI. Our performance evaluation shows that the new backend can be used for realistic analyses while outperforming the compatibility solution provided by the DAOS project.
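The claim that switching storage backends needs only a different location string can be illustrated with a factory that picks a backend from the URI scheme. The following is a minimal C++ sketch under stated assumptions, not ROOT's actual API: the names `PageSource`, `FilePageSource`, `DaosPageSource` and `CreatePageSource` are hypothetical, and the `daos://<pool>/<container>` URI layout is an assumption made for illustration.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for an RNTuple storage backend ("page source").
// Illustrative only; this is NOT ROOT's real class hierarchy.
class PageSource {
public:
  virtual ~PageSource() = default;
  virtual std::string Describe() const = 0;
};

// Hypothetical backend reading from a file on a POSIX filesystem.
class FilePageSource : public PageSource {
public:
  explicit FilePageSource(std::string path) : path_(std::move(path)) {}
  std::string Describe() const override { return "file:" + path_; }

private:
  std::string path_;
};

// Hypothetical backend reading from a DAOS object store.
// Assumes the location has the form daos://<pool>/<container>.
class DaosPageSource : public PageSource {
public:
  DaosPageSource(std::string pool, std::string container)
      : pool_(std::move(pool)), container_(std::move(container)) {}
  std::string Describe() const override {
    return "daos: pool=" + pool_ + " container=" + container_;
  }

private:
  std::string pool_;
  std::string container_;
};

// Chooses the backend from the location string alone, so the calling code
// stays the same and only the string it passes in changes.
std::unique_ptr<PageSource> CreatePageSource(const std::string &location) {
  const std::string scheme = "daos://";
  if (location.rfind(scheme, 0) == 0) {  // location starts with "daos://"
    const std::string rest = location.substr(scheme.size());
    const auto slash = rest.find('/');
    // Require a non-empty pool and a non-empty container.
    if (slash == std::string::npos || slash == 0 || slash + 1 == rest.size())
      throw std::invalid_argument("expected daos://<pool>/<container>");
    return std::make_unique<DaosPageSource>(rest.substr(0, slash),
                                            rest.substr(slash + 1));
  }
  // Anything without the DAOS scheme is treated as a filesystem path.
  return std::make_unique<FilePageSource>(location);
}
```

With this design, `CreatePageSource("/data/events.root")` and `CreatePageSource("daos://mypool/mycontainer")` are the same call at the user's site; the scheme prefix alone routes the request to a different backend, which mirrors the "replace a path with a URI" workflow the abstract describes.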