Subfiling and Multiple dataset APIs: An introduction to two new features in HDF5 version 1.14
For parallel I/O, the principle behind Subfiling is to find the middle ground between a single shared file and one file per process, thereby avoiding the complexity of one file per process and minimizing the locking issues of a single shared file on a parallel file system. The first part of the talk will cover Subfiling’s implementation, its usage, and the performance benefits observed compared to a single shared file. The second part of the talk will introduce the new HDF5 multiple dataset APIs and highlight the performance benefits of using them. A standard HDF5 data access operation accesses one dataset at a time, so accessing multiple datasets requires the user to issue a separate I/O call for each dataset. The new multiple dataset APIs instead allow users to access multiple datasets with a single I/O call. In addition, the new routines can improve performance, especially when data is accessed across several datasets from all processes.
Presenters: Neil Fortner and Jordan Henderson
The webinar will be held on September 30, 2022.
The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), organizes the webinar series on Best Practices for HPC Software Developers.
As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The October webinar is titled Investing in Code Reviews for Better Research Software and will be presented by Thibault Lestang (Imperial College London), Dominik Krzemiński (University of Cambridge), and Valerio Maggio (Software Sustainability Institute). The webinar will take place on Wednesday, October 12, 2022 at 1:00 pm ET.
Abstract:
Code review is a development practice that improves readability and maintainability of software projects, in addition to making collaboration easier and teamwork more effective. Typically, code review is a conversation between reviewer(s) and the author(s) of the code under review. The code is dissected and analyzed in order to find areas of improvement according to the focus of the review. Examples include, but are not limited to, readability, security, or performance improvements. Although code review is an effective tool for improving software quality, it is still not a standard practice within the scientific software development process. The webinar will detail the benefits that code review can bring to scientific software developers, particularly improvements in software quality, teamwork, and knowledge transfer. The presenters will highlight common difficulties researchers face in setting up, performing, and maintaining frequent code reviews, and they will discuss several approaches and good practices to mitigate these difficulties. The presenters will also describe common tools that make code reviews easier and give examples of how to use them effectively, while explaining a typical code development cycle with continuous integration and automatic code checks.
Abstract:
In April, the United States presidential administration announced a whole-of-government effort focused largely on gathering and increasing access to disaggregated data on the experiences of historically underserved groups. The importance of disaggregating the data on specific subpopulations can easily be overlooked in efforts that target diversity broadly. Drawing inspiration from astrophysics, this talk will focus on data and analyses related to the hiring of a specific population that is underrepresented in scientific research: African-American doctoral degree holders. Using the Drake equation to frame the discussion, the talk will address the extent to which the search for African-American terrestrial intelligence (SATI) can be understood through the analytical lens of the search for extra-terrestrial intelligence (SETI). With this framing, we will tackle an oft-cited cause for underrepresentation, the pipeline, in light of statistical arguments suggesting the implausibility that pipeline problems fully explain the observed underrepresentation in some elite settings. The talk will briefly touch on some unexpected benefits of involving a more diverse population in science, arguing that diverse groups both do scientific research differently and do different scientific research. The talk will conclude with a call for accountability through disaggregating data in diversity, equity, and inclusion (DEI) initiatives.
Closed captions will be available for this talk.
This webinar is brought to you by the Exascale Computing Project (ECP) HPC Workforce Development and Retention Action Group, which organizes a webinar series on topics related to developing a diverse, equitable, and inclusive work culture in the computing sciences.
The talk will be recorded and posted to our archive, but the Q&A session will not be recorded.
The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), organizes the webinar series on Best Practices for HPC Software Developers.
As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The November webinar is titled Managing Academic Software Development and will be presented by Sam Mangham (University of Southampton). The webinar will take place on Wednesday, November 9, 2022 at 1:00 pm ET.
Abstract:
Developing academic software can be an unusual exercise, especially compared to traditional software development. The goals and inputs can be undefined and fluctuating, whilst the code itself has traditionally been a stepping stone – a byproduct on the way to papers, ending up ad-hoc, unplanned and undocumented. Fortunately, things are changing. There are tools and techniques that make it easier to design, use, distribute and cite scientific software. This webinar discusses approaches to managing the development and release of academic software, ranging from coding best practices and project boards, to development environments and automated documentation that can help you write sustainable code that is easy to use, cite and collaborate with and on.
In response to the COVID-19 pandemic and transition to remote work, ECP and the IDEAS Productivity project launched the panel series Strategies for Working Remotely, which explores important topics in this area.
Abstract:
- This panel features brief presentations followed by engaging discussion from contributors to the SC22 Early Career Program invited talks on life/work balance conducted at the annual International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22). Speakers will provide tips and lessons shared with the SC22 Early Career Program panel on work/life balance, parenting, strategies for working remotely, and on how everyone, especially those early in their careers, can apply lessons learned from pandemic-driven change and resiliency.
Speakers:
- Scott Callaghan, University of Southern California (USC)
- Julia Mullen, Massachusetts Institute of Technology (MIT) Lincoln Laboratory
- Elaine Raybourn, Sandia National Laboratories
Moderators:
- Osni Marques, Lawrence Berkeley National Laboratory
- Suzanne Parete-Koon, Oak Ridge National Laboratory
The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), organizes the webinar series on Best Practices for HPC Software Developers.
As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The December webinar is titled Lab Notebooks for Computational Mathematics, Sciences & Engineering and will be presented by Jared O’Neal (Argonne National Laboratory). The webinar will take place on Wednesday, December 14, 2022 at 1:00 pm ET.
Abstract:
As computational mathematics, science, and engineering problems become larger, more ambitious, and more complex, it is increasingly important to develop and use tools and techniques that ensure that computational research is based on a strong foundation of general, low-level scientific best practices. In this webinar, the speaker will relate his experience of transitioning from working in the worlds of experimental and observational sciences to the world of computational sciences as well as his experience adapting experimental tools and techniques to computational research. In particular, the speaker will focus on the role of lab notebooks in experimental sciences and present concrete examples to address the challenges associated with adapting lab notebooks to computational research.
The IDEAS Productivity project, in partnership with the DOE Computing Facilities of the ALCF, OLCF, and NERSC, and the DOE Exascale Computing Project (ECP), organizes the webinar series on Best Practices for HPC Software Developers.
As part of this series, we offer one-hour webinars on topics in scientific software development and high-performance computing, approximately once a month. The January webinar is titled Openscapes: supporting better science for future us and will be presented by Julia Stewart Lowndes (Openscapes). The webinar will take place on Wednesday, January 11, 2023 at 1:00 pm ET.
Abstract:
Openscapes champions open practices in environmental science to help uncover data-driven solutions faster. In this webinar the speaker will share how she transitioned from doing her own marine ecology research to founding Openscapes to support other researchers and grow the global Open Science movement. The speaker will share lessons learned from her work mentoring government, non-profit, and academic environmental and Earth teams, with specific stories from projects with NASA and NOAA Fisheries. The webinar will reuse parts of a recent keynote at RStudio::conf that was the global launch of Quarto, a new, open-source, scientific and technical publishing system. The webinar will include a demo on some features of Quarto for R and Python users and highlight how more reusing and less reinventing is critical for science. The speaker will also discuss how open source/science is a daily practice, and an important avenue to increase inclusion in science and contribute to the climate movement.
Join us February 6–10, 2023, for the virtual ECP Project Tutorial Days covering best practices for exascale-era systems. Topics include power management on exascale platforms with Variorum, performance evaluation using the TAU performance system, auto-tuning tools, and developing robust and scalable next-generation workflows, applications, systems, and much more. Interested participants need to sign up.
The Exascale Computing Project (ECP) 2023 Community Birds-of-a-Feather (BOF) Days will take place February 14–16, with multiple sessions each day.
The annual BOF Days provide an opportunity for the high-performance computing community to engage with ECP teams to discuss the project’s latest development efforts.
Each of the 2023 BOF sessions on a given topic will last from 60 to 90 minutes and include a brief overview and a Q&A. The BOFs will be conducted via Zoom.