Everything you need to know about oneAPI DevSummit 2023

We have exciting updates from the recent oneAPI DevSummit 2023, a virtual community conference of technical talks and tutorials that highlighted the capabilities of oneAPI tools and aimed to motivate you to start building your own oneAPI projects within the community.

The event offered attendees a rich program of discussions on a range of topics. These included experiences migrating from CUDA to SYCL, with an emphasis on how smooth the transition between the two programming models can be. Attendees also learned about the vendor portability that SYCL and oneAPI offer across architectures such as CPUs, GPUs, and FPGAs. Demonstrations showed application development with SYCL, oneAPI, and the oneAPI libraries. Speakers from both enterprise and academia shared their perspectives on oneAPI and SYCL, and live tutorials engaged the community with hands-on experience in developing applications with oneAPI.

oneAPI State of the Union with Tony Mongkolsmai

This talk, led by Tony Mongkolsmai with other notable speakers, kicked off the event. Mongkolsmai, an Intel software architect and evangelist, discussed why oneAPI matters, how it has evolved, and where it is headed.

The oneAPI ecosystem has gained momentum rapidly since its inception. Leading technology companies, including Google, Microsoft, Accenture, Fujitsu, Adobe, and Anaconda, have embraced oneAPI’s openness and incorporated it into their workflows. Academic institutions such as Berkeley, the University of Illinois, and the KTH Royal Institute of Technology, as well as national labs such as Oak Ridge, Lawrence Livermore, Sandia, and Argonne, are using oneAPI to tackle complex problems and improve the performance of their computational workloads.

Close cooperation between software and hardware companies is crucial for optimizing performance. Hardware vendors recognize the value of oneAPI in creating a cohesive software-hardware development stack, and support for oneAPI extends beyond software firms. The RISC-V community, built around the leading open-standard CPU instruction set architecture, has embraced the open stack approach that oneAPI offers. This collaboration across the industry drives the advancement of accelerated computing.

Beyond theory, the State of the Union talk highlighted practical examples of oneAPI’s impact. Organizations worldwide are using oneAPI to solve real-world problems, improve performance, and scale their workloads. Notable examples include GROMACS, a molecular dynamics package used across many computational fields, and the integration of oneAPI on supercomputers such as Aurora and Frontier, which are pushing the boundaries of high-performance computing.

oneAPI is also making waves in the content creation industry, where popular software suites have integrated it to deliver more realistic lighting effects and better rendering performance. For example, Blender’s Cycles renderer now supports oneAPI, allowing millions of users to speed up their content creation workflows on oneAPI-enabled hardware. Similarly, Chaos V-Ray, used in content creation suites such as 3ds Max, Maya, Houdini, and Cinema 4D, uses oneAPI to achieve these lighting effects. DreamWorks Animation has also embraced oneAPI by integrating it into its production renderer, MoonRay, which is used to create the visually captivating movies we love to watch.

Artificial Intelligence (AI) has swiftly moved into the mainstream, transforming how we work and interact with technology. Large language models such as GPT-3 have captured widespread attention with their language generation abilities. Companies like Microsoft are integrating AI features into products such as Office and GitHub Copilot to give users intelligent assistance. Developers, in turn, build on AI frameworks such as TensorFlow and PyTorch, both of which have integrated oneAPI for optimized deep learning.

oneAPI integrates with these popular deep learning frameworks. The oneAPI Deep Neural Network Library (oneDNN) is integrated into TensorFlow’s core codebase, improving performance for deep learning tasks. PyTorch users can likewise use oneAPI and oneDNN to accelerate their workflows. Because oneAPI supports different hardware architectures, including CPUs and GPUs, developers can get optimized performance across a range of devices.

The impact of oneAPI reaches beyond big industry players: it has become a popular choice among students, innovators, and startups. The oneAPI Student Ambassadors program supports university students in using oneAPI for their projects. For instance, Joshua Shiells and Harvey Johnson optimized their AI chatbot’s data pipeline with oneAPI, while Melbin Martin used it to classify waste materials for recycling. The oneAPI Innovator Program supports developers working on groundbreaking applications. The Liftoff for Startups program helps early-stage companies use oneAPI to accelerate their solutions and bring them to market more quickly. To encourage collaboration and knowledge sharing, the oneAPI community also provides resources that help developers build their own oneAPI solutions.

The Ginkgo project, led by Professor Hartwig Anzt of the University of Tennessee, shows what developers can achieve with oneAPI. Ginkgo, a high-performance sparse linear algebra library used in simulations, has been ported to the oneAPI ecosystem. Through hand tuning and the use of specialized hardware functionality, the project achieved substantial performance gains, enabling OpenFOAM simulations to run significantly faster on Intel GPUs.

One of oneAPI’s standout features is its ability to scale efficiently across GPUs, so accelerated workloads can run at large scale. Because the same code targets different hardware architectures, developers can run their programs across platforms and still achieve strong performance. The ease and speed with which Ginkgo’s code was migrated from CUDA to oneAPI further underline these advantages.

BeeSearch, a collaboration between Beewant and Weaviate, shows how oneAPI can be used to build AI solutions that make unstructured data useful and searchable. Paris-based Beewant specializes in large-scale search across vast collections of images, videos, and documents. It uses Weaviate, an open-source vector database designed for machine learning, to store embeddings and run fast similarity queries. With oneAPI, Beewant and Weaviate gained the hardware acceleration needed to process their data at scale. Their image processing time was dramatically reduced, and data loading became faster.
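To make the idea of storing embeddings and running similarity queries concrete, here is a minimal brute-force sketch in plain C++. It does not use any oneAPI or Weaviate APIs, and the function names `cosine_similarity` and `nearest_neighbor` are hypothetical. Production vector databases such as Weaviate typically rely on approximate indexes such as HNSW rather than scanning every vector. This scan only shows how a query embedding is scored against stored ones, which is the kind of data-parallel loop that hardware acceleration targets.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Cosine similarity of two embeddings: dot(a, b) / (|a| * |b|).
double cosine_similarity(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size() && "embeddings must have the same dimension");
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0;  // treat zero vectors as unrelated
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Brute-force nearest neighbor: index of the stored embedding most similar to the query.
std::size_t nearest_neighbor(const std::vector<std::vector<double>>& stored,
                             const std::vector<double>& query) {
    if (stored.empty()) throw std::invalid_argument("no stored embeddings");
    std::size_t best = 0;
    double best_score = cosine_similarity(stored[0], query);
    for (std::size_t i = 1; i < stored.size(); ++i) {
        const double score = cosine_similarity(stored[i], query);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

For example, with stored embeddings {1, 0}, {0, 1}, and {1, 1}, the query {0.9, 0.1} is closest to the first one.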

As the talk concluded, the focus shifted to the oneAPI initiative as a platform for developers to create innovative solutions. To showcase what is possible with oneAPI, the community has established a oneAPI repository on GitHub, including a curated list of awesome oneAPI and SYCL projects from industry and the community. The list spans domains such as AI, energy, gaming, and manufacturing. Whether you’re a gamer, an entrepreneur, or working on cutting-edge AI, you can use it to showcase your own oneAPI-based projects. By submitting your project, you share your achievements with the community and inspire others to build amazing things with oneAPI. Don’t forget to use the hashtag #checkitout when sharing your project with others.

Driving oneAPI Presence in Accelerated Computing

In a panel discussion, Intel engineer James Reinders led a talk on accelerated computing, focusing on performance portability, the role of libraries, and the relationship between SYCL and C++. The panelists, who included developers, data scientists, and academics, emphasized the importance of performance portability for scientific computing, big data analytics, and supercomputing. SYCL and oneAPI were recognized as ways to achieve portability and efficient execution across diverse devices. The discussion highlighted how libraries facilitate interoperability and collaboration. The panel also discussed SYCL’s agility and adaptability relative to standard C++, along with its potential to expose hardware-specific features. The panelists expressed optimism about SYCL’s future and its community-driven development.

The panel also noted Python’s prevalence in AI applications running on GPUs and discussed the vision of oneAPI: providing multiple entry points for different applications and levels of abstraction. Panelists debated how other languages should integrate with oneAPI, with one perspective favoring closer integration with Python. Interoperability challenges were acknowledged, and projects such as RAJA and Kokkos Kernels were mentioned as solutions. The panelists agreed that no single language suits all applications, but they want higher-level scripting languages to integrate seamlessly with hardware acceleration.

The panel also discussed working with CUDA. It delivers notable performance gains, but panelists raised concerns about its documentation and ease of use. SYCL was praised for delivering performance competitive with optimized CUDA code, especially on NVIDIA devices, though issues with Intel GPUs were mentioned. The panelists highlighted SYCL’s advantage for performance portability: close-to-optimal performance across hardware platforms without rewriting code.

The panelists stressed the need for portable programming languages and frameworks that simplify the use of accelerators across hardware architectures. SYCL’s ability to run on AMD, NVIDIA, and Intel GPUs was discussed, but concerns were raised about support for certain hardware, such as Intel’s AI tile. Vendor support was deemed crucial for making hardware accessible and for the adoption of simpler programming models.

Target NVIDIA and AMD with oneAPI and SYCL

Codeplay Software is dedicated to making AI and high-performance computing accessible through open standards and open-source software. In collaboration with Intel, Codeplay is extending oneAPI to support NVIDIA and AMD hardware. Rafael Bielski, a particle physics researcher, presented Codeplay’s work on bringing NVIDIA and AMD GPU support to oneAPI with SYCL.

Codeplay emphasizes open standards for software projects because they provide scalability and foster competitive hardware prices. The company builds on C++ and SYCL, an open-standard programming model for high-performance computing. SYCL is single-source: host and device code live in the same C++ file, with no separate kernel files or special language extensions. Codeplay’s plugins for oneAPI offer code portability, comparable performance, and open-source availability. Developers can compile a single C++ source file with the oneAPI DPC++/C++ Compiler and run it on NVIDIA and AMD GPUs without sacrificing performance. The plugins are open source, customizable, and compatible with both professional and consumer GPUs.

During a live demo, Bielski walked viewers through using Codeplay’s plugins on multiple architectures, covering prerequisites, checking GPU availability, installation, and software optimization. He showed single-target and multi-target compilation, which lets the same program run on different hardware devices. Debugging and profiling tools such as CUDA-GDB and gdb-oneapi help identify problems and optimize code performance. The workflow included navigating timelines and GPU summaries and using NVIDIA Nsight for metrics. Optimization techniques were also discussed, including choosing the work-group size, indexing, inlining, avoiding thread divergence, and improving memory access patterns. The tutorial concluded with a live demo of code optimization and profiling using a crowd simulation example.
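Work-group size and indexing interact in a common way in SYCL. With an `nd_range`, the global size must be a multiple of the work-group size, so a typical pattern is to round the problem size up and have the kernel skip the padding work-items. The plain C++ helpers below sketch that arithmetic. Their names are hypothetical, not part of Codeplay’s plugins or any oneAPI API. The comment shows roughly where the result would go in a SYCL launch.

```cpp
#include <cassert>
#include <cstddef>

// Smallest multiple of the work-group size wg that is >= n. An nd_range launch
// needs a global size divisible by the work-group size, e.g. (sketch):
//   q.parallel_for(sycl::nd_range<1>(sycl::range<1>(round_up_global_size(n, wg)),
//                                    sycl::range<1>(wg)),
//                  [=](sycl::nd_item<1> it) {
//                      if (it.get_global_id(0) >= n) return;  // skip padding work-items
//                      // ... per-element work ...
//                  });
std::size_t round_up_global_size(std::size_t n, std::size_t wg) {
    assert(wg > 0 && "work-group size must be positive");
    return ((n + wg - 1) / wg) * wg;
}

// Number of work-items in the padded launch that fall past the end of the data.
std::size_t padding_items(std::size_t n, std::size_t wg) {
    return round_up_global_size(n, wg) - n;
}
```

For example, with 1,000 elements and a work-group size of 256, the launch covers 1,024 work-items, and 24 of them return early.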

Overview of oneAPI ISC DevSummit 2023

In a featured tech talk, Associate Professor Istvan Zoltan Reguly, Ph.D., explored techniques for formulating structured and unstructured mesh computations in the SYCL programming model. The presentation covered multidimensional parallel_for computations and two methods for avoiding race conditions in unstructured meshes, atomics and coloring. It compared them with other parallelization approaches such as OpenMP and CUDA. Modern parallel architectures from Intel, AMD, and NVIDIA, including CPUs and GPUs, were evaluated using the DPC++ compiler and hipSYCL. These evaluations offered insights into performance, efficiency, and scalability to guide hardware and software tool selection and the optimization of stencil computations in diverse contexts. The aim was to give developers and researchers a comprehensive set of formulation options along with practical performance insights.
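The coloring approach for unstructured meshes can be sketched in plain C++, with SYCL kernels omitted so the example runs anywhere. In an edge-based loop, each edge adds a contribution to both of its nodes, so running all edges in parallel would race on shared nodes. Greedy coloring assigns edges to colors so that no two edges of the same color share a node. The colors then run one after another, and the edges within a color could run in parallel without atomics. The function names and the greedy strategy are illustrative assumptions, not the speaker’s implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;  // the two nodes an edge connects

// Greedy edge coloring: each edge gets the smallest color not yet used by any other
// edge touching either of its endpoints, so same-colored edges never share a node.
std::vector<std::size_t> color_edges(const std::vector<Edge>& edges, std::size_t num_nodes) {
    std::vector<std::vector<bool>> used_at(num_nodes);  // used_at[n][c]: color c already touches node n
    std::vector<std::size_t> color(edges.size());
    for (std::size_t e = 0; e < edges.size(); ++e) {
        const std::size_t a = edges[e].first, b = edges[e].second;
        assert(a < num_nodes && b < num_nodes);
        std::size_t c = 0;
        auto taken = [&](std::size_t n) { return c < used_at[n].size() && used_at[n][c]; };
        while (taken(a) || taken(b)) ++c;
        for (std::size_t n : {a, b}) {
            if (used_at[n].size() <= c) used_at[n].resize(c + 1, false);
            used_at[n][c] = true;
        }
        color[e] = c;
    }
    return color;
}

// True if no two edges with the same color share a node.
bool coloring_is_conflict_free(const std::vector<Edge>& edges, const std::vector<std::size_t>& color) {
    for (std::size_t i = 0; i < edges.size(); ++i)
        for (std::size_t j = i + 1; j < edges.size(); ++j) {
            if (color[i] != color[j]) continue;
            const Edge& x = edges[i];
            const Edge& y = edges[j];
            if (x.first == y.first || x.first == y.second ||
                x.second == y.first || x.second == y.second) return false;
        }
    return true;
}

// Edge-based accumulation: every edge adds its weight to both endpoints.
// Colors run one after another; inside one color the edge loop is race-free,
// so it is the part a parallel_for could run concurrently. It runs serially here,
// and real code would pre-group edges by color instead of rescanning them.
std::vector<double> accumulate_colored(const std::vector<Edge>& edges,
                                       const std::vector<double>& weight,
                                       std::size_t num_nodes) {
    const std::vector<std::size_t> color = color_edges(edges, num_nodes);
    std::size_t num_colors = 0;
    for (std::size_t c : color) num_colors = std::max(num_colors, c + 1);
    std::vector<double> node_sum(num_nodes, 0.0);
    for (std::size_t c = 0; c < num_colors; ++c) {
        for (std::size_t e = 0; e < edges.size(); ++e) {  // parallelizable within one color
            if (color[e] != c) continue;
            node_sum[edges[e].first] += weight[e];
            node_sum[edges[e].second] += weight[e];
        }
    }
    return node_sum;
}
```

The atomic alternative keeps a single parallel loop over all edges and makes each node update an atomic add, for example with SYCL 2020’s `sycl::atomic_ref`. Coloring typically trades extra passes and poorer data locality for race-free plain stores.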

Ph.D. student James Bickerstaff gave a talk on accelerating graph analytics with oneAPI and Intel FPGAs. Improving analytical throughput and handling larger data sizes requires specialized techniques and hardware for graph processing. Using the oneAPI toolkit for FPGAs, Bickerstaff and his collaborators developed accelerators for minimum spanning tree (MST) and breadth-first search (BFS). Their BFS design reached up to 75 million traversed edges per second, a 3.0× speedup over an Intel Xeon 6128 CPU baseline, while requiring 5.85× fewer lines of code. The MST designs achieved speedups of approximately 1.5× over the CPU baseline.
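The BFS throughput figure is expressed in traversed edges per second (TEPS). The sketch below is a plain C++, level-synchronous CPU BFS over a graph in compressed sparse row (CSR) form, plus the TEPS formula. It is not the FPGA design from the talk, and the names are hypothetical. It counts every adjacency entry it examines as a traversed edge, which is an assumed definition; benchmarks such as Graph 500 define traversed edges somewhat differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct BfsResult {
    std::vector<int> level;       // BFS depth of each vertex, -1 if unreachable
    std::size_t edges_traversed;  // adjacency entries examined during the search
};

// Level-synchronous BFS on a CSR graph: row_offsets has num_vertices + 1 entries,
// and the neighbors of v are col_indices[row_offsets[v] .. row_offsets[v + 1]).
BfsResult bfs_csr(const std::vector<std::size_t>& row_offsets,
                  const std::vector<std::size_t>& col_indices,
                  std::size_t source) {
    assert(!row_offsets.empty() && source + 1 < row_offsets.size());
    const std::size_t n = row_offsets.size() - 1;
    BfsResult r{std::vector<int>(n, -1), 0};
    std::vector<std::size_t> frontier{source}, next;
    r.level[source] = 0;
    for (int depth = 1; !frontier.empty(); ++depth) {
        next.clear();
        for (std::size_t v : frontier) {
            for (std::size_t i = row_offsets[v]; i < row_offsets[v + 1]; ++i) {
                ++r.edges_traversed;
                const std::size_t w = col_indices[i];
                if (r.level[w] == -1) {  // first visit: record depth, add to next frontier
                    r.level[w] = depth;
                    next.push_back(w);
                }
            }
        }
        frontier.swap(next);
    }
    return r;
}

// Traversed edges per second: edges traversed divided by elapsed seconds.
double teps(std::size_t edges_traversed, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(edges_traversed) / seconds;
}
```

As a worked example of the metric, 750 million traversed edges in 10 seconds is 75 million TEPS, the peak BFS rate reported in the talk.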

Another Ph.D. student, Zhibo Li, presented declarative data collections for portable parallel performance based on oneAPI. Collection Skeletons offer a novel approach to data collections, using properties to give app developers a declarative, parallel implementation. By encapsulating implicit parallelism and offering explicit properties that match parallel algorithmic skeletons, they let developers abstract away parallel implementation details. Implicit parallelism is realized through concurrent data structures whose member functions are based on oneTBB, while explicit parallelism is achieved through parallel algorithmic skeletons based on SYCL.

Master of Computer Applications (MCA) student Melbin Martin gave a talk on smart garbage classification using oneDNN for improved efficiency and sustainability. Applying machine learning to garbage classification helps recycling efforts accurately sort waste materials such as plastic, paper, and metal. oneDNN optimizes the underlying deep learning operations, giving faster execution and better performance on modern CPUs, so the classifier can process more waste in less time. Fast, accurate classification is critical for maximizing the recovery of recyclable materials and minimizing human error. The work shows how oneDNN-accelerated models can support a more sustainable future through better recycling processes.

Aksel Alpay, a researcher and software engineer, gave a presentation entitled “HipSYCL’s Quest for Universal SYCL Binaries: One Binary from NVIDIA and AMD to Intel Data Center GPU Max Series.” Distributing SYCL binaries to end users is challenging because they must work across different hardware configurations. Current SYCL implementations such as DPC++ and hipSYCL use multipass compilation. They parse the code separately for each targeted backend and, for AMD hardware, potentially for each target device, which can lead to impractical compile times. To address this, Alpay presented hipSYCL’s new generic single-pass compiler, which parses the source code only once and generates universal binaries. It offers runtime performance comparable to existing compilers while compiling significantly faster. It also provides instant binary portability across NVIDIA, AMD, and Intel GPUs, including the Intel Data Center GPU Max Series. He also showed the first performance numbers for hipSYCL on the Intel Data Center GPU Max Series.

The latest on the oneAPI industry spec: Direct programming and API-based programming

On June 5th, Codeplay CEO Andrew Richards unveiled his company’s newest addition to the oneAPI ecosystem: an open-source project that enables SYCL code to run on custom architectures for high-performance computing and AI. Richards introduced the oneAPI Construction Kit, which includes a reference implementation for RISC-V® vector processors. Beyond RISC-V®, the kit can be adapted to support a wide range of other processors. This development marks another milestone in Codeplay’s work to build a robust, inclusive open ecosystem for accelerated computing.

As software evolves rapidly, hardware vendors are increasingly developing specialized AI processors that outperform off-the-shelf hardware. While these custom processors deliver greater efficiency, they present challenges for developers. A primary one is keeping the latest software compatible with the latest generation of processors. This often requires complex porting to proprietary, non-standard programming models. That burden falls on customers, who must invest significant time porting and optimizing their applications, leading to delays and increased maintenance effort.

The oneAPI Construction Kit tackles these challenges by bringing simplified heterogeneous programming to custom hardware. It extends oneAPI to custom architectures and gives them access to a rich selection of supported SYCL libraries. Developers benefit because they can use SYCL to write high-performance applications instead of learning a new custom language for each hardware platform. As a result, they can reduce porting effort and avoid maintaining separate codebases for different architectures, leaving more time for innovation.

Codeplay has already demonstrated the oneAPI Construction Kit with a complete software programming environment for next-generation RISC-V vector processors using oneAPI and SYCL. In a short video clip, CEO and co-founder Andrew Richards ran code on both an Intel CPU and a custom FPGA simulating a RISC-V accelerator. The open-source kit includes a framework that simplifies moving code to new architectures, making it easier for developers to adopt custom hardware and unlock its full potential.

Watch the entire oneAPI DevSummit 2023 on demand to build your knowledge and skills for creating applications and solving complex problems without worrying too much about the technical details of each device.

Go to oneAPI.io to learn more about how oneAPI streamlines development, helping you create efficient, optimized software that uses the full potential of diverse hardware. The result is better performance and productivity, so you can focus on your application’s functionality rather than the intricacies of specific hardware architectures.

Check out upcoming oneAPI events, and stay updated by joining the oneAPI Community on LinkedIn.
