Converting a Neural Network to TensorRT. Part 2: Creating a Custom Layer.
In Part 1, I described how to convert a neural network with supported layers to a TensorRT plan. In this part, I’ll describe how to create a custom layer for TensorRT. The example is the “l2norm_helper” plugin that I created to support TensorFlow’s l2_normalize operation.
Source code: https://github.com/r7vme/tensorrt_l2norm_helper
A TensorRT plugin requires two major parts to be implemented: the CUDA kernels that do the actual computation, and the plugin classes (IPluginV2 and IPluginCreator) that wrap them for TensorRT.
Why is l2_normalize not supported by TensorRT?
This is a reasonable question. First, let’s check what l2_normalize is in TensorFlow.
It actually consists of a bunch of operations (not just one), and two of them throw errors in TensorRT (checked with 5.0.6). It seems these two do not pass some internal TensorRT restrictions (see the related thread):
l2_normalize/Maximum: Unsupported binary op max with constant right
l2_normalize/Rsqrt: Unary not supported for other non-constant node
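For reference, TF’s l2_normalize amounts to x * rsqrt(max(sum(x²), eps)). A minimal CPU sketch of that composition (my own illustration, not code from the plugin) makes clear where the rejected Maximum and Rsqrt ops sit:

```cpp
#include <cmath>
#include <vector>

// CPU reference of TensorFlow's l2_normalize over a whole vector:
// out = x * rsqrt(max(sum(x^2), eps)).
// The maximum and rsqrt steps are exactly the two ops TensorRT rejects.
std::vector<float> l2_normalize(const std::vector<float>& x, float eps = 1e-12f) {
    float sum_sq = 0.0f;
    for (float v : x) sum_sq += v * v;                      // reduce_sum(square(x))
    float scale = 1.0f / std::sqrt(std::max(sum_sq, eps));  // rsqrt(maximum(sum_sq, eps))
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = x[i] * scale;                              // multiply
    return out;
}
```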
Let’s start with the workhorse: the CUDA kernels. If you’re new to CUDA, I highly recommend this presentation from NVIDIA. The rsqrt kernel computes “1/sqrt(x)” for every element of the input vector and assigns the value to the output vector. Easy.
Initially I thought I would implement the whole l2_normalize, but after realizing what it costs to implement even a “simple” reduce_sum, I fell back to a solution with just rsqrt and max. Check out the awesome presentation by Mark Harris about parallel reduction.
OK, on to the second kernel: max. It compares each vector element with “eps” (a really small number that prevents division by zero) and takes the bigger one.
Implementing this kernel can actually be avoided by replacing “math_ops.maximum” with “math_ops.add”, but that requires reimplementing l2_normalize in your network definition. That’s it: just two CUDA kernels.
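Both kernels are pure element-wise operations. As a CPU sketch of what they compute (in the real kernels each loop body runs once per CUDA thread, with i derived from the block and thread indices):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// CPU equivalents of the two element-wise kernels.
// On the GPU, each loop body runs once per thread with
// i = blockIdx.x * blockDim.x + threadIdx.x.

// rsqrt kernel: out[i] = 1 / sqrt(in[i])
void rsqrt_kernel(const std::vector<float>& in, std::vector<float>& out) {
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = 1.0f / std::sqrt(in[i]);
}

// max kernel: out[i] = max(in[i], eps), guarding against division by zero
void max_kernel(const std::vector<float>& in, std::vector<float>& out, float eps) {
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = std::max(in[i], eps);
}
```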
Implementing IPluginV2 and IPluginCreator
In general, you just have to implement two classes, IPluginV2 and IPluginCreator. This can even be a single .cpp file, as in “samplePlugin”.
NOTE: for TensorRT 5.1 you have to go with IPluginV2Ext, but it’s not supported yet in 5.0.
NOTE: in older plugin versions, you had to implement IPluginFactory; in newer versions (from 5.0?) it’s no longer necessary, but IPluginCreator has to be implemented instead.
Before implementing your own plugin, I recommend starting with the official docs (which contain many important details). A good starting point is the official normalizePlugin from the TensorRT repo (thanks for making it open source, NVIDIA). Also, under “samples” there are a bunch of trivial ones (e.g., samplePlugin is just one file). Below I’ll highlight the most important details that took me some time to realize on my own.
Plugin input can consist of:
- custom parameters (in my case “op_type” and “eps”)
- input vector dimensions C, H, W
- weights (no weights in my case)
At runtime, input values are read from a serialized buffer (a separate constructor is required).
Here are useful read/write snippets for serialization/deserialization from the NVIDIA samples.
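They look roughly like this (adapted from the samples; I use memcpy here to sidestep alignment issues — the point is that serialize() writes the fields in order and the deserializing constructor reads them back in the same order):

```cpp
#include <cstring>

// Append a POD value to the buffer and advance the pointer (used in serialize()).
template <typename T>
void write(char*& buffer, const T& val) {
    std::memcpy(buffer, &val, sizeof(T));
    buffer += sizeof(T);
}

// Read a POD value from the buffer and advance the pointer
// (used in the deserializing constructor).
template <typename T>
T read(const char*& buffer) {
    T val;
    std::memcpy(&val, buffer, sizeof(T));
    buffer += sizeof(T);
    return val;
}
```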
CUDA kernels should be launched from enqueue.
Input is always a 4D tensor (in NCHW dimension order), as far as I can tell. Even if in TF you had a 3D tensor (e.g., shape (?,6,2)), in TRT it will be the 4D tensor (?,6,2,1). In my case, batch members are processed sequentially (TODO: this can actually easily be done in parallel).
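To illustrate the layout: with (?,6,2) padded to (?,6,2,1), each batch member occupies a contiguous block of C*H*W = 12 elements, and enqueue loops over the batch with that offset. A CPU sketch of that loop (an illustration, not the actual enqueue from the repo; the per-element copy stands in for the kernel launch):

```cpp
#include <cstddef>

// Sequential batch processing over an NCHW buffer: batch member n starts at
// offset n * C * H * W. The inner loop is where a kernel launch would go.
void process_batch(const float* in, float* out, int batchSize,
                   int C, int H, int W) {
    const std::size_t volume = static_cast<std::size_t>(C) * H * W;
    for (int n = 0; n < batchSize; ++n) {      // one batch member at a time
        const float* src = in + n * volume;
        float* dst = out + n * volume;
        for (std::size_t i = 0; i < volume; ++i)
            dst[i] = src[i];                   // placeholder for the kernel
    }
}
```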
IPluginCreator is mostly boilerplate. Its main function is createPlugin, which gets values from the “PluginFieldCollection” and then creates a new instance of the plugin.
Plugin registration happens automatically by adding a special line in the header (the REGISTER_TENSORRT_PLUGIN macro applied to the creator class).
Once this is done, an application can use the plugin simply by including the header file.
CMake 3.8+ has built-in support for CUDA. The only problem I was not able to solve is autodetection of GPU compute capabilities (CMake 3.10.2); the value for Xavier is hard-coded (in the fp16 branch). Help appreciated.
Last but not least: FP16 (half-precision) support. I implemented all the pieces that are necessary, in my opinion (branch f16support), but I still see that TensorRT adds reformat layers before and after the plugin. Is it even possible to use FP16 in a custom layer?
Adding reformat layer: leaky_re_lu_3/LeakyRelu output to be reformatted 0 (leaky_re_lu_3/LeakyRelu) from Half(1,1,1,12) to Float(1,1,1,12)
Adding reformat layer: orientation/l2_normalize output to be reformatted 0 (orientation/l2_normalize) from Float(1,1,2,12) to Half(1,1,2,12)
This was my first TensorRT plugin; any feedback is appreciated.