3rd International Workshop on the Lustre Ecosystem:
Support for Deepening Memory and Storage Hierarchy
7:30–8:30am: Registration and Working Breakfast (Breakfast provided)
Agenda: Networking and discussion of panel topics
8:30–9:00am: Welcome and Introductions: Neena Imam, Oak Ridge National Laboratory
9:00–9:45am: Keynote Address 1: Tiered Data Management for Lustre: Enabling New Use Cases and Deployment Models, Mark Seamans, HPE
Abstract: The overall capabilities of Lustre have matured significantly over the past few years, and a major advance has been the HSM infrastructure that allows Lustre to be paired with external data management platforms. This architecture takes the use cases for Lustre beyond its legacy reputation as a 'fast scratch' file system to an environment that can be leveraged for persistent data storage, backed by capabilities for capacity management, data assurance, and disaster recovery. This session will provide an overview of the Lustre HSM framework and will walk through use case examples of how pairing data management platforms with Lustre enables the integration of storage technologies such as tape, object storage, and the cloud, with the goal of supporting both storage cost management and long-term data assurance.
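As a concrete illustration of the framework discussed in this session, Lustre exposes HSM operations to users through the lfs utility (a sketch; the file path is a placeholder, option details vary by Lustre version, and an archive backend with a running copytool is assumed):

```shell
# Query the HSM state of a file (shows flags such as 'exists' and 'archived')
lfs hsm_state /mnt/lustre/results.dat

# Push a copy of the file to the archive tier (tape, object store, cloud, ...)
lfs hsm_archive /mnt/lustre/results.dat

# Release the Lustre copy: the file stays visible in the namespace, but its
# data now lives only in the archive; capacity on the fast tier is reclaimed
lfs hsm_release /mnt/lustre/results.dat

# Reading the file later triggers an implicit restore, or restore explicitly:
lfs hsm_restore /mnt/lustre/results.dat
```

These commands require a mounted Lustre file system with HSM enabled, so they are shown here as a CLI fragment rather than a runnable script.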
Bio: Mark Seamans is the Director of HPC Data Management and Storage at Hewlett Packard Enterprise. In this role, Mark manages the strategy, engineering, and delivery teams focused on storage and data management solutions for HPE's high-performance computing customer base, including Lustre file system solutions and the HPE Data Management Framework (DMF) tiered data platform. Prior to HPE, Mark was the Senior Director for HPC Storage Solutions at Silicon Graphics Inc. (SGI), which was acquired by HPE in November 2016. Mark joined SGI following its acquisition of FileTek, a leading provider of tiered data management software, where he served as Chief Technology Officer.
9:45–10:30am: Invited Talk 1: DAOS: A New Storage Paradigm, Mohamad Chaarawi, Intel
Abstract: The Distributed Asynchronous Object Storage (DAOS) is an open-source (https://github.com/daos-stack/daos) storage stack designed from the ground up to exploit NVRAM and NVMe storage. DAOS provides true byte-granular data and metadata I/O using persistent memory, combined with NVMe storage for bulk data. This, combined with end-to-end OS bypass, results in ultra-low I/O latency. The DAOS stack aims to increase data velocity by several orders of magnitude over conventional storage stacks, while providing extreme scalability and resilience.
The DAOS API abstracts the underlying multi-tier storage architecture and offers a unified storage model over which domain-specific data models can be developed, such as HDF5, ADIOS, HDFS, and Spark. POSIX access is similarly built on top of the DAOS API. The core data model is a byte-granular key-value store, which allows I/O middleware to overcome POSIX limitations and to access advanced capabilities such as nonblocking I/O, ad-hoc concurrency control, distributed snapshots, native producer-consumer pipelines, end-to-end data integrity, index/query, and in-situ analysis. DAOS also provides scalable, distributed transactions to I/O middleware, with improved data consistency and automated recovery.
Bio: Mohamad Chaarawi is a senior software engineer in the High Performance Data Division at Intel. He is responsible for the DAOS client library and works closely with application and I/O middleware developers to migrate to the new DAOS storage model. Before that, Mohamad was a lead developer at The HDF Group and participated in the design and development of several DOE-funded projects such as ExaHDF5, Exascale FastForward, and Extreme Scale Storage IO (ESSIO), in addition to supporting HDF5 users. Mohamad earned his doctoral degree in computer science from the University of Houston. His doctoral research focused on parallel I/O.
10:30–11:00am: Morning Break
11:00–11:45am: Tutorial 1: Lustre and Memory: More than Allocation, James Simmons, Oak Ridge National Laboratory
Abstract: While most people focus on Lustre performance in terms of CPU cycles, the management of memory plays a critical role as well. This talk presents an in-depth examination of how memory is managed and optimized across the modular components of the Lustre kernel stack. Today we have tools that give us a wealth of information about memory utilization, which can inform everything from improving Lustre code to optimizing node configuration. The details presented here will benefit both software developers exploring the Lustre code base and administrators looking to provide the best settings for consumers of the Lustre file system.
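The kernel already exposes much of the memory-utilization information this tutorial draws on; for example, Lustre's slab caches can be inspected directly (a sketch; exact cache names vary across Lustre and kernel versions, and a node with Lustre modules loaded is assumed):

```shell
# Per-cache object counts and sizes for Lustre-related kernel slab caches
grep -E 'ldlm|lustre|lov|osc' /proc/slabinfo

# One-shot view of the largest kernel slab consumers, sorted by cache size
slabtop -o --sort=c | head -n 20
```

On a node without Lustre loaded, the grep simply returns nothing, so this is shown as a CLI fragment rather than a runnable script.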
Bio: James Simmons has been involved in Linux kernel development since the late 90s. Over his career, he has contributed to the Linux kernel to support everything from small embedded devices to supercomputers. He also has been the maintainer and driver of several open source projects. Shortly after joining ORNL in 2008, James started working on Lustre and since then he has become the largest contributor to Lustre outside of Intel. Besides contributing to the OpenSFS tree, he is one of the maintainers of the Lustre client in the Linux kernel proper.
11:45am–12:30pm: Tutorial 2: Operational Preparation for Large-Scale Robinhood, Jesse Hanley, Oak Ridge National Laboratory
Abstract: In preparation for newer large-scale Lustre file systems arriving at NCCS, the storage team of HPC Operations at Oak Ridge National Laboratory has assessed Robinhood's ability to manage these systems. This talk will cover tuning, benchmarking, and an evaluation of Robinhood. Additional focus will be placed on operational readiness and deployment.
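For readers who want to reproduce this kind of evaluation, a typical Robinhood v3 workflow looks roughly like this (a sketch; the filesystem name lustre_fs is a placeholder, and flags may differ between Robinhood releases):

```shell
# One-time full namespace scan to populate the Robinhood database
robinhood --scan --once -f lustre_fs

# Steady state: consume Lustre changelogs instead of rescanning the namespace
robinhood --readlog --detach -f lustre_fs

# Query the database: filesystem summary and the heaviest users
rbh-report -f lustre_fs --fs-info
rbh-report -f lustre_fs --top-users
```

These commands require a configured Robinhood installation backed by a database and a Lustre mount, so they are shown as a CLI fragment rather than a runnable script.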
Bio: Jesse Hanley is a systems administrator at the Oak Ridge Leadership Computing Facility of Oak Ridge National Laboratory, where he is a member of the storage team in HPC Operations. He focuses mainly on metrics, monitoring, and automation of Lustre and Spectrum Scale file systems. He holds a Bachelor of Science degree in Computer Science from Wofford College.
12:30–1:30pm: Working Lunch (Lunch provided)
Agenda: Discussion and feedback on morning presentations
1:30–2:15pm: Invited Talk 2: Accelerating HPC I/O with DDN Infinite Memory Engine, Jason Cope, DataDirect Networks
Abstract: This talk will describe IME, DDN's unique application accelerator: the product itself, how it interacts with back-end file systems such as Lustre, and its performance characteristics. Some information on future directions will also be provided.
Bio: Jason Cope is a Senior Software Engineer at DataDirect Networks (DDN) and a veteran of storage and I/O technologies. At DDN, Jason is responsible for the Infinite Memory Engine (IME) product, which aims to improve application performance using intelligent caching of data. Jason holds a PhD from the University of Colorado Boulder; prior to joining DDN, he worked at Argonne National Laboratory and the University of Chicago.
2:15–3:00pm: Invited Talk 3: DC-RAM: Indiana's Newest Data Capacitor, Stephen Simms, Indiana University
Abstract: In 2006, Indiana University (IU) constructed the NSF-funded Data Capacitor to function as a high-bandwidth, high-capacity storage system to help smooth mismatched I/O rates between different resources. In 2009, IU implemented a system called DC-WAN to provide similar service across wide-area networks in support of the TeraGrid. Last year, the newest Data Capacitor at Indiana was constructed using SSDs. This talk will cover IU's experience constructing DC-RAM and operating this small but powerful solid-state storage system.
Bio: Stephen Simms works for the Research Technologies division of University Information Technology Services (UITS) at IU. Simms has worked in High Performance Computing at IU since 1999 and currently leads IU's High Performance File Systems Team. He and his team have been active in Lustre research, pioneering the use of Lustre across the WAN. Simms has served as the president of OpenSFS, a user-driven non-profit organization dedicated to promoting development and use of the Lustre file system.
3:00–3:15pm: 1st Afternoon Break
3:15–4:00pm: Invited Talk 4: Multi-level Security in Lustre, Henry Newman, Seagate
Abstract: Multi-level Security (MLS) allows control over the data a user can access, based on the user's authentication information; these capabilities are provided by SELinux in Red Hat Enterprise Linux together with a security-enhanced Lustre file system. The MLS ecosystem provides a number of platforms and tools that deliver important capabilities for protecting various types of data at all security levels.
Bio: Mr. Henry Newman has over 36 years of expertise in advanced systems architecture and performance analysis, solving complex challenges for customers in government, scientific research, and industry around the world. His experience includes hardware and software requirements analysis and design; file system and HSM design and optimization; system performance analysis and optimization; storage system architecture; high-performance networking; capacity planning; and hierarchical storage management, with a focus on high-performance computing and advanced UNIX, Linux, and Windows systems. For the last few years, Mr. Newman has been focusing on security issues for large storage systems. He worked at Cray Research in a variety of capacities for over 11 years until 1992, and was then the CTO/CEO of Instrumental until its recent acquisition by Seagate. Mr. Newman is now the CTO for Seagate Government Solutions and works with engineering teams across Seagate's product lines, from individual storage components to complete systems, with a special emphasis on security.
4:00–4:35pm: Tutorial 3: Lustre Over Long-Haul Connections Using LNet Routers, Nageswara Rao, Jesse Hanley, Sarp Oral, Neena Imam, Oak Ridge National Laboratory
Abstract: The Lustre file system over wide-area networks provides a number of desired features: (i) file transfers are natively supported by the copy operation, obviating the need for file transfer tools such as XDD, Aspera, and GridFTP; and (ii) applications involving file operations can be supported transparently over wide-area networks. Typical site Lustre file systems are mounted over IB networks, which are subject to the 2.5ms latency limitation of IB flow control; solutions using IB range extenders do not scale well due to their high cost, since a pair is needed for each long-haul connection. We present a solution that utilizes LNet routers on Linux hosts to extend Lustre over existing wide-area Ethernet infrastructures: Ethernet clients on remote servers use LNet routers at the source site to mount Lustre from the site IB network. Throughput measurements are collected over 10GigE emulated connections between two pairs of hosts: 48-core stand-alone systems used for data transfers, and 32-core hosts that are part of a compute cluster. The hosts are configured to use Hamilton TCP, with buffers set to the largest allowable values. These results demonstrate the Lustre file system mounted over wide-area connections with 0–366ms RTT, much longer than in previous work.
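The routing setup described in this tutorial can be expressed with standard LNet module options (a sketch; the NID, interface names, and network numbers below are placeholders for illustration):

```shell
# /etc/modprobe.d/lustre.conf on the LNet router at the Lustre site:
# the router straddles the site IB fabric (o2ib0) and the WAN Ethernet (tcp0)
options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding="enabled"

# On a remote Ethernet-only client: reach o2ib0 through the router's tcp0 NID
options lnet networks="tcp0(eth0)" routes="o2ib0 192.0.2.10@tcp0"

# After loading the modules, verify the local NIDs on each host
lctl list_nids
```

This is a configuration fragment; the modprobe options take effect on the next module load, and the client can then mount the file system using the MGS NID on the o2ib0 network.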
Bio: Dr. Nageswara Rao is a Corporate Fellow at Oak Ridge National Laboratory. He is currently involved in developing wide-area network transport solutions for memory and file transfers over dedicated connections for DOE and DOD scenarios. His work involves developing systematic test designs and analysis methods to assess and optimize data transfer methods based on UDP and TCP variants for Lustre and XFS file systems.
4:35–4:45pm: 2nd Afternoon Break
4:45–5:30pm: Panel: Burst Buffer for Lustre in Exascale, Panelists: Mark Seamans, Mohamad Chaarawi, Jason Cope, Stephen Simms, Henry Newman, John Bent. Moderator: Sarp Oral, Oak Ridge National Laboratory
7:00–10:00pm: Social Event Sponsored by Warp Mechanics
Dinner and Reception
7:30–8:30am: Registration and Working Breakfast (Breakfast provided)
Agenda: Networking and feedback on day 1 talks
8:30–9:00am: Welcome and First Day Recap: Sarp Oral, Oak Ridge National Laboratory
9:00–9:45am: Keynote Address 2: The Correct Number of Tiers is 2, John Bent, Seagate
Abstract: An ever-increasing portfolio of storage media (NVDIMM, SSD, SMR, etc.) has storage developers excited and storage users burdened. Auto-tiering techniques seem to allow multiple media types with differing performance and capacity characteristics to be blended transparently into a single logical storage system. Such a system would satisfy developers and users alike. Unfortunately, several challenging requirements of HPC workflows force such an abstraction to be broken. This need for layer violation is the reason for our claim that the correct number of tiers is 2. In this talk, we will examine these challenging workflow requirements and the resulting two-tiered system needed to satisfy them.
Bio: John Bent is the Chief Architect at Seagate Government Solutions and has researched storage and I/O throughout his career. His recent focus is on parallel storage systems for High Performance Computing (HPC). At Seagate Government Solutions, John collaborates with United States government storage architects to predict future storage challenges and address them with specifically designed solutions. A graduate of Amherst College with a focus on anthropology, John served for two years as a Peace Corps volunteer in the Republic of Palau before earning a PhD in computer science from the University of Wisconsin-Madison. John worked at Los Alamos National Laboratory and at EMC before joining Seagate Government Solutions in 2016.
9:45–10:30am: Tutorial 4: Lustre Distributed Lock Management, Oleg Drokin, Intel
Abstract: To guarantee POSIX compliance, Lustre uses a distributed lock management subsystem. The Lustre Distributed Lock Manager (LDLM) is therefore an essential part of the Lustre technology, controlling many important aspects from concurrency to caching. This talk will provide an overview of the LDLM and will help system operators better configure and tune operational Lustre file systems. The talk will also provide a basis for better understanding some common Lustre problems.
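As a small taste of the tuning this tutorial covers, LDLM behavior is visible and adjustable on a client through lctl (a sketch; the value 1200 is an arbitrary illustrative cap, and parameter defaults differ across Lustre versions):

```shell
# Inspect the lock LRU size for each namespace (0 usually means dynamic sizing)
lctl get_param ldlm.namespaces.*.lru_size

# Cap the number of cached locks held by the OSC (data) namespaces
lctl set_param ldlm.namespaces.*osc*.lru_size=1200

# Flush unused locks immediately, e.g. before a cold-cache benchmark run
lctl set_param ldlm.namespaces.*.lru_size=clear
```

These commands require a mounted Lustre client, so they are shown as a CLI fragment rather than a runnable script.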
Bio: Oleg Drokin joined the Lustre team in 2003 and has amassed a wealth of Lustre knowledge since. He has been involved in numerous ground-breaking developments with Lustre over the years, including a spell working in the ORNL Lustre Centre of Excellence, which culminated in the Jaguar cluster topping the top500.org list in 2009. He currently works at Intel, serves as the gatekeeper for community releases, and is recognized as one of the foremost experts on the Lustre Distributed Lock Manager (LDLM) subsystem.
10:30–11:00am: Morning Break
11:00–11:45am: Tutorial 5: Open Methodology for Managing Self-Encrypting Hard Drives with Lustre/ZFS, Josh Judd, Warp Mechanics
Abstract: Encrypting data at rest for HPC systems is almost always desirable, but doing so in an open Lustre system involves cost and performance trade-offs. Software encryption imposes performance penalties undesirable in HPC systems, whereas hardware approaches have traditionally required expensive, vendor-locked proprietary solutions. This talk will discuss how WARP Mechanics architected a simple software solution for managing COTS self-encrypting HDDs, striking a balance between high-performance hardware encryption and low-cost open software.
Bio: Josh Judd has been a leader in the IT industry for over 20 years, in roles ranging from senior UNIX systems administrator, to mechanical engineer and machinist, to programmer, to best-selling author, to Chief Architect and Chief Technology Officer. While he was a pre-IPO employee at Brocade, he helped create the SAN industry. He has written numerous storage-related patents, invented and created architectural specifications for dozens of award-winning products, and published numerous technical books on storage and networking. For the past eight years, he has been CTO at WARP Mechanics, the leading provider of commercially supported ZFS on Linux and Lustre/ZFS appliances.
11:45am–12:00pm: Closing Remarks and Adjourn