Comparative Analysis of Intel HLS Design Tools on a Case Study in Neuromorphic
Academic computing clusters and cloud-based systems, such as Amazon Web Services and Google Cloud, have been integrating high-end FPGAs for high-performance computing (HPC) into their ecosystems, making FPGAs available to a broader community. These platforms feature high-level synthesis (HLS) tools that enable developers to describe FPGA designs using familiar, high-level languages such as C/C++. As HLS tools continue to mature, it is critical to understand how well they can produce efficient FPGA designs. One key domain of interest is state-of-the-art algorithms for machine learning (ML), such as convolutional neural networks (CNNs), which are expensive in terms of the memory and compute resources required for high-accuracy classification. By contrast, neuromorphic object-classification algorithms have lower memory and compute complexities than CNNs at similar accuracies, which can improve the scalability of ML applications. This research explores and evaluates the efficacy of HLS design tools from Intel (OpenCL SDK for FPGAs and oneAPI DPC++ for FPGAs) in terms of design latency and hardware resources on a case study featuring a novel, neuromorphic ML algorithm. Evaluated on both Intel Stratix 10 and Arria 10 Programmable Acceleration Cards, oneAPI-based designs achieved on average 10% lower latency while using significantly fewer of the available FPGA-board resources, enabling faster, more scalable designs.
