
How We Ported oneDAL to Arm for Accelerated AI Workloads on FUJITSU-MONAKA

November 5, 2024

Introduction

The open-source ecosystem plays a critical role in the rapid advancement and widespread adoption of new digital technologies. By porting Intel’s oneAPI Data Analytics Library (oneDAL) to Arm-based architectures and leveraging Arm’s Scalable Vector Extension (SVE), Fujitsu demonstrates how open-source contributions can drive significant performance improvements and energy efficiency in AI workloads.

This effort supports the vision of green data centers and sustainable digital transformation. Fujitsu’s active participation in open-source community collaboration highlights how open ecosystems enable the seamless integration of cutting-edge technologies across diverse platforms, promoting accessibility and innovation in high-performance computing.

According to Vivek Mahajan, Corporate Executive Officer, Global CTO, CPO, in charge of System Platform at Fujitsu, “An Open-Source Ecosystem Is Vital for the Swift Propagation of New Digital Technologies, Enabling Their Implementation Anywhere and Everywhere.” Fujitsu’s Open-Source Efforts, Contributing to a Sustainable World, December 2023, https://activate.fujitsu/en/insight/tl-mahajan-art-oss-20231205

Background

Digital transformation is driving rapid growth in data processing, which in turn increases demand for large-scale data centers (DCs). However, this growth brings higher power consumption, contributing to a larger carbon footprint and worsening global warming concerns. Fujitsu’s vision is to develop an energy-efficient AI software stack that maximizes Arm CPU performance and supports green data centers for sustainable digital transformation.

Fig 1: Fujitsu. “The Environmental Impact of Growing Data Processing Demand.” Linaro Connect 2024 Conference, https://resources.linaro.org/en/resource/jbEniWjE2s5gTopqUEjSTr. Accessed 2024

In data center operations, machine learning applications account for 80% of the workload. Our focus is therefore on AI software acceleration, to help build low-power green data centers with Arm-based HPC technologies.

Fig 2: Fujitsu. “DC Market Size Analysis.” Statista Research Report, presented during Linaro Connect 2024 Conference, https://resources.linaro.org/en/resource/jbEniWjE2s5gTopqUEjSTr. Accessed 2024

Introduction to Scikit-Learn-Intelex and oneDAL

Scikit-learn is one of the most popular ML libraries, used for classification, regression, clustering, dimensionality reduction, model selection, preprocessing, and more. Intel® Extension for Scikit-learn (scikit-learn-intelex) speeds up scikit-learn applications on x86 CPUs and GPUs across single- and multi-node configurations using oneDAL (Data Analytics Library). It is a free software AI accelerator designed to deliver 10x to 100x acceleration for existing scikit-learn code.

Achieves up to 100x faster training and inference with equivalent mathematical accuracy.
Easily integrate the extension into Scikit-learn applications without code changes.
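The "without code changes" point rests on patching: scikit-learn-intelex documents calling `sklearnex.patch_sklearn()` before importing scikit-learn estimators, after which supported estimators resolve to oneDAL-backed versions. The stdlib-only sketch below illustrates that general monkey-patching idea with a stand-in module so it runs without scikit-learn installed. The module `toy_sklearn`, both estimator classes, and `patch_toy_sklearn()` are hypothetical and are not scikit-learn-intelex's actual implementation.

```python
import sys
import types


class StockRandomForest:
    """Stand-in for an unmodified estimator (hypothetical)."""
    backend = "stock"

    def fit(self, X, y):
        return self


class AcceleratedRandomForest(StockRandomForest):
    """Stand-in for an accelerated replacement with the same interface (hypothetical)."""
    backend = "onedal"


# Register a fake module so the example is self-contained.
toy_sklearn = types.ModuleType("toy_sklearn")
toy_sklearn.RandomForestClassifier = StockRandomForest
sys.modules["toy_sklearn"] = toy_sklearn

_ORIGINALS = {}


def patch_toy_sklearn():
    """Swap supported estimators in the module namespace (hypothetical helper)."""
    module = sys.modules["toy_sklearn"]
    # setdefault keeps the true original even if patching is called twice.
    _ORIGINALS.setdefault("RandomForestClassifier", module.RandomForestClassifier)
    module.RandomForestClassifier = AcceleratedRandomForest


def unpatch_toy_sklearn():
    """Restore the original estimators."""
    module = sys.modules["toy_sklearn"]
    for name, cls in _ORIGINALS.items():
        setattr(module, name, cls)
    _ORIGINALS.clear()


if __name__ == "__main__":
    # A name imported *before* patching keeps pointing at the stock class.
    from toy_sklearn import RandomForestClassifier as EarlyImport

    patch_toy_sklearn()
    # Unchanged user code that imports *after* patching gets the replacement.
    from toy_sklearn import RandomForestClassifier

    print(EarlyImport.backend)             # stock
    print(RandomForestClassifier.backend)  # onedal
    unpatch_toy_sklearn()
```

The early-import case shows why the patch call has to come before the estimator imports: `from module import Name` binds the class object at import time, so only lookups made after patching see the replacement.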

Historically, oneDAL could be compiled only for the x86 architecture because it relied on the binary-only Intel Math Kernel Library (MKL) backend to accelerate machine learning algorithms. As a result, its software acceleration, including vector instructions, hardware-specific memory optimizations, and threading, was limited to Intel hardware. The library, formerly part of oneAPI, is now part of the UXL Foundation (Unified Acceleration Foundation).

Fujitsu Successfully Ported oneDAL on Arm

Fujitsu’s work enabled the porting of oneDAL to the Arm architecture, a significant open-source contribution to the UXL Foundation. This contribution enables multi-architecture builds of oneDAL, with reference backend selection on Arm, and brings optimized scikit-learn algorithms to Arm.

Fig 4: Fujitsu. “oneDAL PR (#2614) Successful Porting on Arm by Fujitsu.” Copyright contributors to the oneDAL project, https://github.com/oneapi-src/oneDAL/pull/2614

oneDAL depends on many critical optimized compute kernels for BLAS/LAPACK operations and FFT computations that are provided by Intel MKL. To enable oneDAL on the Arm architecture, Fujitsu replaced the MKLFPK backend used on x86 with open-source optimized compute kernels from OpenBLAS. In addition, compiler macros were introduced throughout the codebase to isolate x86-specific code and accommodate Arm-specific optimizations, supported by reference backends and updated makefile compiler options.

Fig 3: Fujitsu. “oneDAL Arm Porting Design and Methodology.” Linaro Connect 2024 Conference, https://resources.linaro.org/en/resource/jbEniWjE2s5gTopqUEjSTr. Accessed 2024
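oneDAL makes the x86/Arm split at build time, through compiler macros and makefile options. The Python sketch below only mirrors that decision logic so it can run on any host: it maps a machine architecture string to a compute backend label and falls back to a portable reference backend when no optimized one is known. The function names, table contents, and backend labels are hypothetical and do not correspond to oneDAL's actual macros or make variables.

```python
import platform
from typing import Optional

# Hypothetical architecture -> backend table. In oneDAL this choice is made
# at build time; here it is only a runnable illustration of the idea.
BACKENDS = {
    "x86_64": "mklfpk",     # MKL-based kernels, available on x86 only
    "aarch64": "openblas",  # open-source kernels with NEON/SVE code paths
}

# Platforms report the same architecture under different names.
ALIASES = {
    "amd64": "x86_64",
    "x64": "x86_64",
    "arm64": "aarch64",
}

REFERENCE_BACKEND = "reference"  # portable fallback without ISA-specific code


def normalize_arch(machine: str) -> str:
    """Lower-case the machine string and map known aliases to one name."""
    m = machine.strip().lower()
    return ALIASES.get(m, m)


def select_backend(machine: Optional[str] = None) -> str:
    """Return the backend label for `machine` (defaults to the current host)."""
    arch = normalize_arch(machine if machine is not None else platform.machine())
    return BACKENDS.get(arch, REFERENCE_BACKEND)


if __name__ == "__main__":
    print(platform.machine(), "->", select_backend())
```

Keeping an explicit reference fallback means an unknown architecture still gets a working, if slower, build instead of a failure, which matches the role the reference backends play in the port.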

oneDAL’s machine learning acceleration capabilities, originally limited to Intel hardware, have been extended to Arm platforms by replacing proprietary Intel Math Kernel Library (MKL) calls with open-source alternatives such as OpenBLAS, which take advantage of NEON and SVE on Arm.
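Part of SVE's appeal for a library like OpenBLAS is that it is vector-length agnostic: the same binary runs on hardware whose vector length is any supported multiple of 128 bits up to 2048 bits, and a per-lane predicate handles the loop tail instead of a separate scalar cleanup loop. The scalar Python sketch below simulates that loop structure for an AXPY-style update (y = a*x + y) on 64-bit doubles. `vl_bits` stands in for the hardware vector length; the function is a hypothetical model of the idea, not SVE intrinsics or an OpenBLAS kernel.

```python
from typing import List


def sve_style_axpy(a: float, x: List[float], y: List[float], vl_bits: int = 256) -> List[float]:
    """Simulate a vector-length-agnostic SVE loop computing a*x + y on doubles.

    vl_bits models the hardware vector length; the loop body never hard-codes it.
    Returns a new list and leaves `y` unmodified.
    """
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("SVE vector lengths are multiples of 128 bits from 128 to 2048")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    lanes = vl_bits // 64  # number of 64-bit doubles per vector register
    out = list(y)
    n = len(x)
    i = 0
    while i < n:  # loop continues while at least one lane is active
        # Predicate: a lane is active only while its element index is < n,
        # similar to what a WHILELT instruction produces.
        pred = [i + lane < n for lane in range(lanes)]
        for lane, active in enumerate(pred):
            if active:  # inactive lanes are left untouched
                out[i + lane] = a * x[i + lane] + out[i + lane]
        i += lanes  # advance by the vector length, like INCD
    return out


if __name__ == "__main__":
    x = [1.0] * 10
    y = [0.0] * 10
    # Same source, same result, whatever vector length the "hardware" has.
    for vl in (128, 256, 512):
        print(vl, sve_style_axpy(2.0, x, y, vl))
```

With 10 elements and a 256-bit vector (4 doubles per register), the last iteration runs with only 2 of 4 lanes active; no remainder loop is needed, and the same code handles 128-bit or 512-bit hardware unchanged.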

Performance Results

Porting oneDAL to Arm yields significant performance gains across various ML algorithms. In particular, training for the two most widely used ML algorithms in Fujitsu AutoML improves markedly: Random Forest achieves a 31x speedup and Logistic Regression a 40x speedup.

Fig 5: Fujitsu. “Performance Results on AWS Graviton3 Arm-Based CPU c7g.8xlarge 32-Cores.” Linaro Connect 2024 Conference, https://resources.linaro.org/en/resource/jbEniWjE2s5gTopqUEjSTr. Accessed 2024
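A training speedup such as 31x is a ratio of wall-clock times: the baseline training time divided by the accelerated training time, measured on the same hardware and data. The stdlib-only sketch below shows one common way to compute such a ratio, taking the best of several repeated runs to reduce timer noise. `best_time`, `speedup`, and the stand-in workloads are hypothetical and are not the benchmark harness behind the Graviton3 results.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Speedup = baseline time / accelerated time (hypothetical 62 s vs 2 s -> 31.0)."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated time must be positive")
    return baseline_seconds / accelerated_seconds


def baseline_workload() -> int:
    # Hypothetical stand-in for "train with the baseline implementation".
    return sum(i * i for i in range(200_000))


def accelerated_workload() -> int:
    # Hypothetical stand-in for "train with the accelerated implementation".
    return sum(i * i for i in range(20_000))


if __name__ == "__main__":
    ratio = speedup(best_time(baseline_workload), best_time(accelerated_workload))
    print(f"speedup: {ratio:.1f}x")
```

Taking the minimum rather than the mean is a common choice for short benchmarks, since interference from other processes can only make a run slower, never faster.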

oneDAL Multi-Architecture Enablement

These contributions by Ajay Kumar Patel, Chandan Sharma, and Rakshith G B, software engineers in the FRIPL MONAKA Software R&D Unit, advance AI performance and cross-platform compatibility on the Arm architecture, resulting in more efficient development processes, improved performance, and greater platform versatility. This work represents a collaborative effort to push the boundaries of technology and enhance the capabilities of modern computing environments.

Left to Right: Rakshith G B, Chandan Sharma & Ajay Kumar Patel

Enable scikit-learn-intelex on Arm

Ajay Kumar Patel, Senior Software Engineer, says “I’ve had the opportunity to integrate scikit-learn-intelex on Arm CPUs with OpenBLAS reference backend, and the accelerated computation on Arm has significantly boosted performance for machine learning workloads” https://github.com/intel/scikit-learn-intelex/pull/1771

Enable Cross-Compilation of Arm SVE on x86 CI

Rakshith G B, Senior Software Engineer, shares: “Adding Arm cross-compilation support for x86 CI systems has simplified the integration of cross-compilers and Arm architectures, making it easier to manage and deploy the oneDAL library across different platforms.” https://github.com/oneapi-src/oneDAL/pull/2691

oneDAL makefile refactoring for Arm

According to Chandan Sharma, Senior Software Engineer, “Working on makefile improvements in oneDAL has helped streamline platform support and maintainability for multi-architecture builds, making the development process smoother and more efficient.” https://github.com/oneapi-src/oneDAL/pull/2672

Going forward, we will focus on optimizing AI algorithms and math kernels, refining block-size optimization for Arm architectures, and extending Bazel build support to more architectures. These efforts aim to broaden oneDAL’s multi-architecture support and keep our solutions at the forefront of computational efficiency in high-performance computing on Arm.

FUJITSU’s Unified Acceleration Collaboration

Fujitsu remains committed to fostering an ecosystem in partnership with the open-source community. Most recently, Fujitsu announced its leadership in new open-source projects in the area of unified acceleration and became an inaugural member of The Linux Foundation’s Unified Acceleration Foundation (UXL) on October 5, 2023.

Fig 6: Fujitsu’s goals and focus areas for a multi-architecture unified acceleration ecosystem

Through our active participation in the UXL open-source community, we are enabling customers to operate various accelerators, including FUJITSU-MONAKA, using standard protocols. This initiative aims to expand hardware options for our customers. The UXL Foundation seeks to simplify accelerator programming by uniting industry experts to drive innovation and implement cross-platform solutions.

Conclusion

Fujitsu’s dedication to open-source collaboration is exemplified through its leadership in software development for the FUJITSU-MONAKA processor. By porting oneDAL to Arm architecture, Fujitsu reinforces its commitment to open-source development, particularly in mission-critical systems.

Fig 7: Fujitsu. “Fujitsu’s key contributions to open-source community.” Linaro Connect 2024 Conference, https://resources.linaro.org/en/resource/jbEniWjE2s5gTopqUEjSTr. Accessed 2024

According to Fujitsu Global CTO Vivek Mahajan, “Fujitsu is committed to fostering a sustainable world through innovation and trust-building. The company believes that open-source technologies are instrumental to this vision.” Fujitsu’s Open-Source Efforts, Contributing to a Sustainable World, December 2023, https://activate.fujitsu/en/insight/tl-mahajan-art-oss-20231205

The FUJITSU-MONAKA processor epitomizes Fujitsu’s commitment to open ecosystems and energy-efficient CPUs, advancing AI democratization. Fujitsu’s strategic focus on open collaboration spans numerous software stacks and communities, including UXL, Linaro, oneAPI, and Arm. This commitment drives innovation across a broad spectrum for societal benefit.

Fig 8: Fujitsu. “Software Ecosystem for AI and HPC with FUJITSU-MONAKA.” Linaro Connect 2024 Conference, https://resources.linaro.org/en/resource/jbEniWjE2s5gTopqUEjSTr. Accessed 2024

Through extensive collaboration with open-source communities, Fujitsu recognizes the vital role of open-source technology in addressing global challenges and promoting sustainable value. This vision is deeply rooted in Fujitsu’s ethos and serves as a driving force behind its continued efforts to contribute to the open-source ecosystem.

Acknowledgements

This presentation is based on results obtained from a project, JPNP21029, subsidized by the New Energy and Industrial Technology Development Organization (NEDO).

Comment from developers

FUJITSU-MONAKA Software R&D unit team members

Our project focuses on AI framework engineering and performance optimization for FUJITSU-MONAKA, using cutting-edge AI technologies such as machine learning, deep learning, real-time data processing, cloud-based data security, natural language processing (NLP), computer vision, generative AI, and large language models (LLMs). All our teams actively collaborate with the open-source software (OSS) community to drive innovation in advanced computing with core HPC and AI technologies, delivering accelerated software for data-intensive workloads.
