Optimized Math Operators for Arm contributed to oneMath

Announcement

SiPearl has joined the UXL Foundation as a General Member. 

The Unified Acceleration (UXL) Foundation is a cross-industry, open-source initiative focused on establishing open standards for accelerated computing. It operates under the umbrella of the Linux Foundation’s Joint Development Foundation. The UXL Foundation is a collaborative effort by major technology companies (with Steering members from Arm, Broadcom, Fujitsu, GE HealthCare, Google, Imagination, Intel, Qualcomm and Samsung) to create an open, standardized software ecosystem for the era of accelerated computing, so that developers can write high-performance code that runs efficiently on virtually any hardware.

This Blog

The oneMath library aims to deliver a unified, vendor-neutral API for the most commonly used math operators across a wide range of software. This lets software developers use a single API to target a range of CPUs and GPUs. This blog post shows how SiPearl has added a new Arm target to the oneMath project, and how this brings you both performance and portability.

SiPearl

Founded in 2019, SiPearl is building high-performance, energy-efficient European processors for HPC and AI. Born out of the European Processor Initiative (EPI) consortium and funded by the European Union, SiPearl contributes to Europe’s technological sovereignty and independence. Its first-generation processor, Rhea1, is designed on the high-performance, energy-efficient Arm Neoverse V1 platform and will include in a single package 80 Arm Neoverse V1 cores, each with two 256-bit Scalable Vector Extension (SVE) units, four stacks of built-in High Bandwidth Memory (HBM) and four DDR5 interfaces. It can be integrated seamlessly with third-party accelerators such as GPUs, AI or quantum accelerators. In addition to High Performance Computing (HPC) workloads, its initial target market, Rhea1 is also suitable for AI inference workloads. Tailored to address critical challenges in defense, security, health, energy, and climate by enabling rapid and secure data processing with a reduced carbon footprint, SiPearl’s first product represents a significant step toward Europe’s technological sovereignty in HPC. It will equip the general-purpose cluster module of JUPITER, Europe’s first exascale supercomputer, which is owned by EuroHPC and operated by Forschungszentrum Jülich.

Integration of Arm CPUs in oneMath by SiPearl

Governed by the UXL Foundation, oneMath aims to provide a portable, vendor-neutral interface for calling highly optimized mathematical functions across various devices. Math functionality is at the core of most scientific applications. With such an interface, developers write their code once, and each call is transparently dispatched to the most suitable optimized math library for the target hardware (CPU, GPU or other accelerator). This dispatch mechanism lets vendors provide dedicated, optimized implementations that get the most out of their hardware, so developers obtain good performance regardless of the combination of hardware used. Because a single, consistent API is used, this flexibility comes at minimal cost. In the context of HPC, such flexibility, combined with vendor-tuned performance, is crucial for targeting multiple supercomputers built on a variety of hardware.
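The dispatch idea can be illustrated with a toy model. The C++ sketch below is not oneMath's API: oneMath selects a backend from the device attached to a `sycl::queue` (run-time dispatch) or from a backend selector fixed at compile time, whereas here a hypothetical `Device` enum, `Backend` struct and `registry()` table stand in for that machinery. Every backend calls the same naive loop where a real one would call MKL, ArmPL or cuBLAS. What it shows is the property described above: application code calls one `dot` function, and the table decides which library serves it.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical device tags. oneMath itself derives the target from the
// device attached to a sycl::queue; this enum only stands in for that.
enum class Device { IntelCpu, ArmCpu, NvidiaGpu };

using DotFn = std::function<double(const std::vector<double>&,
                                   const std::vector<double>&)>;

// One entry per backend: a label for the vendor library it would wrap
// (illustrative only) and the function that does the work.
struct Backend {
    std::string name;
    DotFn dot;
};

// Naive placeholder kernel. A real backend would instead call into
// MKL, ArmPL, cuBLAS, ... chosen for the target hardware.
double reference_dot(const std::vector<double>& x,
                     const std::vector<double>& y) {
    if (x.size() != y.size()) throw std::invalid_argument("size mismatch");
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
}

// Hypothetical dispatch table mapping each device to its backend.
const std::map<Device, Backend>& registry() {
    static const std::map<Device, Backend> table = {
        {Device::IntelCpu, Backend{"mklcpu", reference_dot}},
        {Device::ArmCpu, Backend{"armpl", reference_dot}},
        {Device::NvidiaGpu, Backend{"cublas", reference_dot}},
    };
    return table;
}

// The single entry point application code calls, whatever the hardware.
double dot(Device dev, const std::vector<double>& x,
           const std::vector<double>& y) {
    auto it = registry().find(dev);
    if (it == registry().end()) throw std::runtime_error("no backend for device");
    return it->second.dot(x, y);
}

// Reports which backend label would serve a given device.
std::string backend_name(Device dev) { return registry().at(dev).name; }
```

In this model, adding Arm support amounts to registering one more table entry; the application-side `dot(...)` call does not change, which is the portability argument made above.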

oneMath provides interfaces for many mathematical domains critical for HPC and AI applications:

  •  BLAS (Basic Linear Algebra Subprograms): These libraries handle fundamental linear algebra operations, such as vector and matrix computations. Examples include NETLIB’s original BLAS, cuBLAS (NVIDIA), rocBLAS (AMD), or MKL BLAS (Intel).
  •  LAPACK (Linear Algebra PACKage): Focused on more advanced linear algebra routines such as solving systems of equations, eigenvalue problems, and singular value decomposition. Examples include the original NETLIB version, MKL LAPACK (Intel), rocSOLVER (AMD) and cuSOLVER (NVIDIA).
  •  FFT (Fast Fourier Transform): Libraries in this domain are dedicated to the efficient computation of Fourier transforms, crucial for signal processing and other applications.
  •  Sparse Libraries: These handle operations on sparse matrices, including solving sparse linear systems.
  •  RNG (Random Number Generation): For stochastic processes, simulations, and other applications requiring random numbers.
  •  Sparse Solver Libraries: This domain specializes in solving linear systems and computing decompositions of sparse matrices.

oneMath also provides backends for HPC-oriented devices, particularly CPUs and GPUs from various vendors, through either vendor-provided or open-source libraries. As Arm-based architectures have spread through both datacenters and HPC supercomputers thanks to their compute and power efficiency, Arm developed a suite of libraries called Arm Performance Libraries (ArmPL). ArmPL provides optimized implementations of the most popular mathematical libraries used in the HPC world (BLAS, LAPACK, RNG and more). These implementations get the most out of the two SIMD instruction sets available on AArch64 Neoverse V-series cores, namely SVE and Neon.

Since SiPearl is developing a high-performance Arm-based CPU, bringing oneMath to Arm CPUs with ArmPL as the underlying library was a natural step. SiPearl’s effort started with the BLAS domain, the standard building block of many HPC codes. This first milestone demonstrated the viability of the “one code” approach of oneAPI for Arm CPUs. Because ArmPL implements BLAS along with many of the later extensions introduced by Intel’s MKL, SiPearl was able to integrate ArmPL with oneMath successfully. Other domains have since been integrated or are in the process of being integrated:

  • LAPACK is also provided in ArmPL through the LAPACKE C interface, which allowed most of oneMath’s LAPACK features to be integrated quickly; batched calls are not supported in the current version. This backend was submitted and merged into oneMath in March.
  • For random number generation (RNG), Arm develops the open-source OpenRNG library. Included in ArmPL for Arm processors, it is a drop-in replacement for the Intel® Vector Statistics Library, which is part of Intel® oneMKL. Since a oneMKL backend was already available for oneMath’s RNG domain, providing an OpenRNG/ArmPL backend for Arm CPUs was natural, and this backend was merged into the oneMath repository in March.
  • ArmPL provides an FFTW3-compatible interface for fast discrete Fourier transforms (DFT). Developing an ArmPL backend will also benefit the FFTW3 community, as native FFTW will also be made compatible with oneMath. This work is still ongoing and is planned to be upstreamed to oneMath in the future.

Status of Integration as of June 2025

The ArmPL backend was accepted and merged into oneMath in January 2025. Today, 93% of oneMath API calls can be routed to an efficient implementation for Arm; only 7% remain unimplemented. Along the way, collaboration with both the oneMath community and the Arm team led to the discovery and correction of several issues. This has made the environment more robust for users of all the projects involved, and public CI has recently been added for the Arm target in oneMath. We also presented this work at the UXL Foundation Math SIG meeting; you can catch up with the recording.

Our experiments show that, thanks to the addition of ArmPL to oneMath, BLAS calls made through oneMath achieve performance comparable to native calls to ArmPL. This opens a wide set of opportunities for HPC application developers to simplify their workflow while ensuring good performance. Typical use cases are well-known HPC applications such as BigDFT (materials science) or GROMACS (molecular dynamics). SiPearl is continuing its contribution to oneMath with the planned integration of the DFT domain, so that these applications run well on Arm CPUs.

The AERO Project

This work was carried out as part of the AERO Project.

SiPearl is a member of the AERO Project, an EU-funded collaborative research project that aims to enable the future heterogeneous European cloud infrastructure. This project complements the effort of the European Processor Initiative (EPI) consortium, the initiative behind SiPearl’s inception. It aims to develop the open-source software ecosystem required not only to improve the efficiency of SiPearl’s processor but also to accelerate and ease its integration into the cloud. The outcome of the AERO Project will be a set of compilers, runtime systems, operating systems, system software, and auxiliary software deployment services that can seamlessly exploit the heterogeneity of the processor with regard to high performance, energy efficiency, and security. Apart from SiPearl, the partners of the project are the Foundation for Research and Technology – Hellas (FORTH, Greece), the Institute of Communication and Computer Systems (ICCS, Greece), Pierer Innovation (Germany), Red Hat (US), Sednai (Switzerland), Ubitech (Greece), Università di Pisa (Italy), Université de Genève (Switzerland), University of Manchester (UK), Virtual Open Systems (France) and Codeplay Software (UK).

By Etienne Renault and Augustin Degomme, SiPearl
