Chasing Exascale

At SC22, Dr. Dan Stanzione, associate vice president for research at the Texas Advanced Computing Center (TACC) in Austin, Texas, provided insights on the ways his team helps the scientific and development community prepare for the exascale era and how oneAPI multiarchitecture programming is critical to that effort.

TACC’s Frontera supercomputer is currently the fastest supercomputer on a university campus in the U.S. While TACC had previous experience with Stampede and other HPC systems, one of the biggest challenges with Frontera was scaling up beyond anything the team had attempted before. Maximizing the cutting-edge Intel technology it contains is enabling TACC to advance science across many different disciplines and helping researchers make the most of Frontera. Here are some of Stanzione’s insights about how his team is aiding developers.

Q: For those preparing to make the best use of Frontera, do you have advice about what things programmers should focus on? How much can your team help the developers who may be great scientists but maybe not code optimization experts?

A: Arguably, chips are the most complicated structures humans have ever engineered, but software is an inherently complex living organism. We are working with about 20 of our biggest codes and doing a deep-dive analysis to determine which architectures are most efficient for them. By improving poorly structured I/O, for example, we’ve seen a five-fold speed-up without changing the hardware or even the core algorithm. At TACC, we have many more users than staff to assist them with optimization, so it’s important to propagate best practices to developers. We also appreciate support from big companies like Intel in helping developers address these types of challenges moving forward.
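As a hypothetical illustration of the kind of I/O restructuring Stanzione describes (the file names and functions below are invented for this sketch, not TACC code), batching output into one large sequential write instead of issuing many tiny flushed writes is a common way to recover that kind of speed-up:

```cpp
// Illustrative sketch only: the same bytes written two ways.
// Issuing one flushed write per value produces many tiny I/O requests,
// while batching the data into a single large write lets the filesystem
// work at full bandwidth.
#include <cstdio>
#include <vector>

// Slow pattern: one small, immediately flushed write per value.
void write_per_value(const std::vector<double>& data, const char* path) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return;
    for (double x : data) {
        std::fwrite(&x, sizeof(double), 1, f);
        std::fflush(f);  // defeats buffering; every value becomes its own request
    }
    std::fclose(f);
}

// Faster pattern: identical data, issued as one large sequential write.
void write_batched(const std::vector<double>& data, const char* path) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return;
    std::fwrite(data.data(), sizeof(double), data.size(), f);
    std::fclose(f);
}

int main() {
    std::vector<double> data(1 << 20, 3.14);  // ~8 MB of sample output
    write_per_value(data, "naive.bin");
    write_batched(data, "batched.bin");
}
```

Both functions produce identical files; the difference is purely in how the requests reach the filesystem, which is why gains like this are possible without touching the core algorithm.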

Q: How would you differentiate the work TACC supports with Frontera versus the Department of Energy [DoE] projects running on it?

A: The big difference between them is the breadth of applications. The Department of Energy has done a tremendous job with the Exascale Computing Project. They’ve ported a set of applications across different architectures, using the most performant abstractions to accomplish that, and their work pushes the field forward. The big difference in our other scientific work is the sheer volume of codes. The DoE has worked on 36 specialized codes with a one-and-a-half-billion-dollar budget. In contrast, we have hundreds of thousands of codes in the mix. At least 26 teams use our Frontera system, and some of those projects tapped half the scale of the machine. That’s a huge statement considering Frontera has more than a quarter million cores and counting! Plus, most of those 26 software teams are very small, made up of a few grad students, a postdoctoral researcher, and one research scientist. For us, it’s all about enabling that ecosystem of programmers. This is where oneAPI is critical, since it provides the abstractions needed on multiarchitecture systems, even those with multiple vendors’ hardware.
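As a rough illustration of what that single-source abstraction looks like (a minimal sketch, not any of the codes mentioned above), a SYCL/oneAPI kernel can be written once and dispatched to whichever CPU, GPU, or accelerator the runtime selects:

```cpp
// Minimal SYCL (oneAPI) sketch of single-source multiarchitecture code.
// The same kernel runs on whatever device the runtime picks at queue creation.
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
    sycl::queue q{sycl::default_selector_v};  // picks the best available device
    std::cout << "Running on: "
              << q.get_device().get_info<sycl::info::device::name>() << "\n";

    const size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
    {
        sycl::buffer<float> A(a), B(b), C(c);
        q.submit([&](sycl::handler& h) {
            sycl::accessor x(A, h, sycl::read_only);
            sycl::accessor y(B, h, sycl::read_only);
            sycl::accessor z(C, h, sycl::write_only, sycl::no_init);
            // Vector addition: identical source, whatever hardware is underneath.
            h.parallel_for(sycl::range<1>(n),
                           [=](sycl::id<1> i) { z[i] = x[i] + y[i]; });
        });
    }  // buffer destruction copies results back into c
    std::cout << "c[0] = " << c[0] << "\n";  // expect 3
}
```

Swapping sycl::default_selector_v for sycl::cpu_selector_v or sycl::gpu_selector_v targets a specific device class without changing the kernel itself, which is the sense in which one abstraction serves many architectures.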

Q: People are changing the world with Frontera. Of those hundreds of thousands of codes, can you describe a few of the most impactful projects?

A: Over the last few years, COVID-related codes have remained a top priority for us. About 30% of our cycles in mid-2020 went into COVID projects supporting vaccine development and other drug treatments. For that effort, Frontera supported some enormous molecular dynamics workloads to identify how one molecule docks with another, which required about 200 million simulations. TensorFlow-based artificial intelligence (AI) code helped evaluate about four billion molecules to find the best drug candidates. Ultimately, Frontera helped identify good drug candidates in the first three months, which was a fantastic achievement. During that process, we found that some workloads ran best on GPUs and others ran better on CPUs. A mix of programming languages supported those workloads, including C++, Python, and Fortran.

We also supported many astronomy projects, including image processing for the Event Horizon Telescope’s black hole imaging and the James Webb Space Telescope’s imaging of very distant celestial bodies. That’s inspiring work too.

Q: It’s been said we’re in a golden age of computing. In what ways do you see that manifesting?

A: We’ve witnessed many architectural innovations to accommodate complex physics. We also see how the FLOPS-per-watt ratio remains relatively constant across generations of processors like Intel’s multiple lines of CPUs and GPUs, including Intel® Xeon® Scalable processors, the Intel® Data Center GPU Max Series, and Intel’s Habana® Gaudi® AI training processor. There’s quite a bit of complexity in the software ecosystem, and it’s essential to pick the right abstractions to reduce that complexity. That’s where oneAPI has an advantage: we can virtualize computers to make them work in an era of increasingly diverse architectures.

Q: A few decades ago, the cost to cool a computer was similar to the cost to run it. Those numbers have come down, but what can we do to further reduce the carbon footprint of these systems?

A: HPC systems and the processors behind them have become more energy efficient. However, we’re also seeing that the cost of electricity — especially in Europe — has increased substantially. That trend is expected to continue. Frontera will require about $11 million a year to power. We currently have a PUE [power usage effectiveness] of 1.21, and we’re exploring several ways to reduce power requirements further. One key is exploring how to use our system’s processing power more efficiently. For some workloads, we’re running at about 70% power. But we could run the same workload at 80% power for one-quarter of that time. That means we can use fewer joules for a given workload. Software also plays a critical role. We examine how efficient — or inefficient — particular algorithms are. By improving the data center infrastructure, we can reduce our carbon footprint by maybe 10 percent. We can achieve a much more significant reduction by enhancing the software to make it run more efficiently.
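Taking those figures at face value, the arithmetic behind the joules comparison is straightforward (a back-of-the-envelope sketch, with P standing for the system’s full power draw and T for the original runtime):

```latex
% Energy = power x time, using the percentages quoted above.
E_{\text{current}} = 0.7\,P \times T = 0.70\,PT
\qquad
E_{\text{faster}} = 0.8\,P \times \tfrac{T}{4} = 0.20\,PT
```

Even though the instantaneous draw is higher, the shorter run finishes with well under a third of the energy of the longer one, which is the sense in which fewer joules are spent on the same workload.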

It’s been quite a journey getting Frontera ready to support breakthrough science. It is exciting to see how it’s making a significant impact worldwide.

Learn more

· HPCWire: Visualization & Filesystem Use Cases Show Value of Large Memory Fat Nodes on Frontera

· Visualize photorealistic, high-resolution climate data with the German Climate Computing Centre (Deutsches Klimarechenzentrum, DKRZ) on Frontera: DKRZ simulation, 180-degree display
DKRZ is experimenting with 1-kilometer global data modeled for the years 1850–2300.

· Intel® oneAPI Tools

This article was produced as part of Intel’s editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC and AI communities through advanced technology. The publisher of the content has final editing rights and determines what articles are published.
