Part 8: Compiling our CNN
We now have our complete model and must make it ready to run on the FPGA. To do this, we compile our model with the Vitis AI compiler, which converts and optimises it into a format that is runnable on the DPU. This tutorial will focus on the compilation process.
- Introduction
- Getting Started
- Transforming Kaggle Data and Convolutional Neural Networks (CNNs)
- Training our Neural Network
- Optimising our CNN
- Converting and Freezing our CNN
- Quantising our graph
- Compiling our CNN (current)
- Running our code on the DPU
- Conclusion: Improving Convolutional Neural Networks: The weaknesses of the MNIST based datasets and tips for improving poor datasets
- Conclusion Part 2: Sign Language Recognition: Hand Object detection using R-CNN and YOLO
The Sign Language MNIST GitHub
The Vitis AI compiler: VAI_C
The Vitis AI compiler, or VAI_C, works in a multi-stage process:
- The compiler parses the quantised CNN model and produces a computation graph consisting of a data flow and a control flow
- It then optimises the data and control flow through processes such as fusing the batch normalization layers into the preceding convolutions and exploiting data re-use (a sketch of this fusion follows the list)
- Finally, it generates the code to be run. The model is split into kernels, some of which run on the DPU whilst others run on the CPU
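To build some intuition for the batch-normalization fusion step, here is a minimal numpy sketch of the underlying arithmetic (the function name, shapes and epsilon are illustrative; this is not VAI_C's actual implementation):

import numpy as np

def fuse_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-3):
    """Fold a batch-norm layer's learned statistics into the
    convolution that feeds it, so inference needs only the conv.

    weights: (kh, kw, in_ch, out_ch) convolution kernel
    bias:    (out_ch,) convolution bias
    gamma, beta, mean, var: (out_ch,) batch-norm parameters
    """
    scale = gamma / np.sqrt(var + eps)           # per-channel rescale
    fused_weights = weights * scale              # broadcasts over out_ch axis
    fused_bias = (bias - mean) * scale + beta    # shift absorbed into bias
    return fused_weights, fused_bias

Because the scale and shift are folded into the convolution's weights and bias, the fused graph performs one fewer pass over each feature map at inference time.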
We can run VAI_C through the following command:
BOARD=ZCU104
ARCH=/opt/vitis_ai/compiler/arch/dpuv2/${BOARD}/${BOARD}.json

vai_c_tensorflow \
    --frozen_pb=./quantize/deploy_model.pb \
    --arch=${ARCH} \
    --output_dir=launchmodel \
    --net_name=SignLanguageMNISTnet \
    --options "{'mode':'normal'}"
Depending on your Vitis AI version, the compiler may warn that this architecture file path is deprecated and substitute the new one automatically:

WARNING: arch/dpuv2/ZCU104/ZCU104.json is deprecated. Replacing with arch/DPUCZDX8G/ZCU104/arch.json
The --options parameter provides flow-specific options for edge or cloud FPGAs. Through it we can specify, for instance, whether the device should dump debug files or whether to use the split IO memory model. Most of the time, the only thing we need to specify is whether we want to run in 'debug' or 'normal' mode. In 'debug' mode, the nodes of the DPU kernel are run one at a time, so we can explore debugging or performance profiles of each node. In 'normal' mode, the DPU runs without interruption, which makes it the best choice for release models.
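For example, to produce a debug build we would re-run the same command, changing only the options string:

vai_c_tensorflow \
    --frozen_pb=./quantize/deploy_model.pb \
    --arch=${ARCH} \
    --output_dir=launchmodel \
    --net_name=SignLanguageMNISTnet \
    --options "{'mode':'debug'}"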
Once we run VAI_C, the compiler produces a summary of the kernels it has deployed. In our case, the only node the DPU cannot run is the final softmax layer, so the compiler produces two kernels: one to run the majority of the model on the DPU and another to run the final softmax on the CPU:
[VAI_C][Warning] layer [activation_4_1_Softmax] (type: Softmax) is not supported in DPU, deploy it in CPU instead.

Kernel topology "SignLanguageMNISTnet_kernel_graph.jpg" for network "SignLanguageMNISTnet"
kernel list info for network "SignLanguageMNISTnet"
                               Kernel ID : Name
                                       0 : SignLanguageMNISTnet_0
                                       1 : SignLanguageMNISTnet_1

                             Kernel Name : SignLanguageMNISTnet_0
--------------------------------------------------------------------------------
                             Kernel Type : DPUKernel
                               Code Size : 0.01MB
                              Param Size : 1.56MB
                           Workload MACs : 10.02MOPS
                         IO Memory Space : 0.04MB
                              Mean Value : 0, 0, 0,
                      Total Tensor Count : 9
                Boundary Input Tensor(s)   (H*W*C)
                            input_1_1:0(0) : 28*28*1
               Boundary Output Tensor(s)   (H*W*C)
                     dense_2_1_MatMul:0(0) : 1*1*25
                        Total Node Count : 8
                           Input Node(s)   (H*W*C)
                 conv2d_1_1_convolution(0) : 28*28*1
                          Output Node(s)   (H*W*C)
                       dense_2_1_MatMul(0) : 1*1*25

                             Kernel Name : SignLanguageMNISTnet_1
--------------------------------------------------------------------------------
                             Kernel Type : CPUKernel
                Boundary Input Tensor(s)   (H*W*C)
               activation_4_1_Softmax:0(0) : 1*1*25
               Boundary Output Tensor(s)   (H*W*C)
               activation_4_1_Softmax:0(0) : 1*1*25
                           Input Node(s)   (H*W*C)
                    activation_4_1_Softmax : 1*1*25
                          Output Node(s)   (H*W*C)
                    activation_4_1_Softmax : 1*1*25
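Since the softmax ends up in a CPUKernel, our application will eventually have to apply it to the 1*1*25 boundary tensor that the DPU kernel outputs. Conceptually, that step is just the following (a minimal numpy sketch; the tensor shape comes from the kernel summary above, while the variable and function names are illustrative):

import numpy as np

def cpu_softmax(logits):
    """Numerically stable softmax over the 25 class logits
    produced by the DPU kernel (the dense_2_1_MatMul output)."""
    shifted = logits - np.max(logits)  # guard against overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# e.g. with a dummy stand-in for the DPU's 1*1*25 output tensor
dpu_output = np.random.randn(25).astype(np.float32)
probabilities = cpu_softmax(dpu_output)
predicted_class = int(np.argmax(probabilities))

This is a single pass over 25 values, so running it on the CPU rather than the DPU costs essentially nothing per inference.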
We can take a look at the kernel graph by converting the output .gv file into a .png:
dot -Tpng -o launchmodel/SignLanguageMNISTnet_kernel_graph.png launchmodel/SignLanguageMNISTnet_kernel_graph.gv
which should look like this:
Each kernel entry summarises which nodes of our model are handled by that kernel. It specifies the boundary input and output tensors, as well as the total node count between those boundaries, and it reports the size of the workload and the memory usage. Specifics on all the information presented can be found here.
The model is then compiled into an .elf file held in the launchmodel folder, which contains all the information we need to run our model on our FPGA board. Next time, we will take a look at running our kernels on the FPGA itself.
Which specific --options value would one have to provide to compile for the split IO memory model?