Heterogeneous Computing on Cloud FPGAs

Course Overview

This course introduces the latest research and development in heterogeneous computer systems on the cloud. It emphasizes the need for specialization and domain-specific computing platforms on the cloud, and extends the principles of computer architecture to larger systems that combine heterogeneous processing elements. In particular, the course focuses on FPGAs for cloud computing, with an emphasis on Xilinx's cloud FPGA solution. In this context, the course covers the fundamental differences between general-purpose latency-optimized processors (e.g., ARM and RISC-V), throughput-optimized processors (e.g., NVIDIA and AMD GPUs), and reconfigurable fabrics (e.g., Xilinx FPGAs) for application- and domain-specific specialization. The lectures also discuss performance models and emerging architecture models for hardware accelerators. The course then provides an in-depth understanding of FPGA programming models, OpenCL-HLS tools, SDx, and the Xilinx ML design flow on AWS.

On the programming side, the course primarily focuses on OpenCL as the major unified programming model across heterogeneous devices (multi-core CPUs, GPUs, and FPGAs). It also covers the OpenCL-HLS design flow and the fundamental programming and architectural differences between FPGAs and GPUs when running massively parallel applications. In addition, the course provides an introduction to other heterogeneous programming models such as CUDA and OpenACC.

From a broader perspective, the course also covers system-level challenges of heterogeneous systems, including communication fabrics, unified memory, cache coherency, and memory virtualization.

Textbook

The course is highly research-intensive and aims to expose students to the latest technology and architectural research in heterogeneous computing systems. There is no specific textbook associated with this course. Throughout the semester, students will continuously review recently published papers, Xilinx technical documents, manuals, and tutorials, as well as Xilinx public GitHub repositories.

Topics

Application trends, emerging machine learning and deep neural network algorithms, and the need for hardware acceleration and cloud computing
Overview of heterogeneous architectures for cloud platforms (trends, opportunities, and challenges)
GPUs and hardware architecture for massive thread-level parallelism
Introduction to FPGA reconfigurable fabrics and the need for domain/application-specific hardware accelerators
Benefits of FPGAs over ASICs for adaptive computing (FPGA vs. ASIC trade-offs)
Performance models and emerging architecture models for accelerators
Open Computing Language (OpenCL) – detailed presentation on OpenCL v2
OpenCL for FPGAs and OpenCL High-Level Synthesis (OpenCL-HLS)
Overview of CUDA and OpenACC
Xilinx OpenCL-HLS
Understanding memory coalescing in FPGAs and GPUs
OpenCL source-level optimization for FPGAs, with a focus on memory optimization and compute unit parallelism
Overview of the Xilinx ML stack for development, training, and synthesis of neural network models
Deep learning systolic array accelerators, with a focus on the Xilinx XDNN hardware architecture
Overview of the XfDNN Compiler, XfDNN Quantizer, and XfDNN APIs

Outcomes

By the end of the semester, students should demonstrate the following competencies:

Basic understanding of machine learning and deep learning algorithms
Understanding of the principles of heterogeneous computer systems, including GPUs, FPGAs, and systolic arrays
In-depth understanding of the benefits of FPGAs for adaptive high-performance computing
In-depth knowledge of OpenCL programming and OpenCL-HLS for FPGAs
Knowledge of CUDA and OpenACC programming
Comprehensive understanding of the Xilinx solution on Amazon AWS, including the Xilinx ML stack and hardware accelerators for machine learning applications