
What does your typical work day as a (software) developer look like?

You probably spend some time writing actual code, a lot of time integrating it with existing systems, and finally weeks delivering it to production.

In this short post I want to emphasize the importance of unified tooling.

By “tooling” I mean everything — from debuggers to monitoring systems.

By “unified” I mean that the tooling stack is explicitly defined, i.e. set as a “standard” across the company.

The working process without “unified tooling” is deplorable: at best you’ll have a long feedback loop and will not be able to scale development of your product; at worst your product quality will suffer. In my professional experience, if you are trying to build a scalable product, you definitely need unified tooling. …

In this blog post I want to share a quick way (one command) of recompiling the Linux kernel with the PREEMPT_RT patch for the NVIDIA Jetson AGX Xavier.

Supported version: Jetpack 4.2.1 (L4T 32.2.1), kernel 4.9.140

Supported hardware: Jetson AGX Xavier

Contributions for new platforms (Nano, TX2, etc.) and new versions are welcome here.

  • Cross-compilation happens inside Docker on your laptop.
  • The kernel update does not require a full re-flash, so the Jetson filesystem is preserved (kittens are safe). This is done with NVIDIA’s OTA update service.

Short version

To build the RT kernel for Xavier, execute the following commands on your x86_64 laptop:

git clone
cd xavier-base-docker-images/realtime-kernel
docker build -t xavier-rt-kernel:32.2.1 -f Dockerfile.l4t_32_2_1 …


Part 1:

In part 1 I described how to convert a neural network with supported layers to a TensorRT plan. In this part I’ll try to describe how to create a custom layer for TensorRT. The example will be an “l2norm_helper” plugin that I created to support the TensorFlow l2_normalize operation.

Source code:

A TensorRT plugin requires two major parts to be implemented:

  1. CUDA kernels (aka device code)
  2. TensorRT classes: IPluginV2 and IPluginCreator

Why is l2_normalize not supported by TensorRT?

This is a reasonable question. First, let’s check what l2_normalize is in TensorFlow.

It actually consists of a bunch of operations (not just one), and two of them throw errors in TensorRT (checked with 5.0.6). …
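That decomposition can be sketched in plain Python. Below is a minimal sketch of what tf.math.l2_normalize computes for a 1-D vector; the op names in the comments follow TensorFlow’s graph, but the helper itself (and its default epsilon) is illustrative:

```python
import math

def l2_normalize(x, epsilon=1e-12):
    """Pure-Python sketch of tf.math.l2_normalize over a 1-D vector.

    TensorFlow builds this from several elementary ops rather than
    one fused op: Square, Sum, Maximum, Rsqrt, Mul.
    """
    square = [v * v for v in x]       # Square
    sum_sq = sum(square)              # Sum (reduce_sum)
    safe = max(sum_sq, epsilon)       # Maximum (guards against division by zero)
    scale = 1.0 / math.sqrt(safe)     # Rsqrt
    return [v * scale for v in x]     # Mul

print([round(v, 3) for v in l2_normalize([3.0, 4.0])])  # [0.6, 0.8]
```

It is this chain of small ops, not “l2_normalize” itself, that the TensorRT parser has to map to supported layers — which is where the failures come from.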

Every 3D bounding box estimation in the image above took only 6 milliseconds.

What is TensorRT?

TensorRT is a framework from NVIDIA that allows you to significantly speed up inference of a neural network. TensorRT does this by fusing multiple layers together and selecting optimized CUDA kernels. In addition, lower-precision data types can be used (e.g. float16 or int8). In the end, up to a 4x-5x performance boost can be achieved, which is critical for real-time applications.
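To get a feel for the precision side of that trade-off, here is a small sketch using only Python’s standard library. It mimics float16 storage via struct’s 'e' (IEEE 754 half-precision) format; this is an illustration of narrowing, not TensorRT’s own precision machinery:

```python
import struct

def to_float16(x):
    """Round-trip a Python float through IEEE 754 half precision
    to see how much accuracy the narrower type keeps."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Some values survive exactly, others get rounded.
for w in [1.0, 0.5, 0.1, 3.14159]:
    h = to_float16(w)
    print(f"{w:>8} -> {h}  (abs error {abs(w - h):.2e})")
```

If this kind of rounding stays within tolerance for your model’s accuracy, the lower-precision path is usually worth the speed and memory savings; otherwise sensitive layers can be kept in higher precision.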

TensorRT usually comes bundled with JetPack (NVIDIA’s software suite for the Jetson series of embedded devices). For non-JetPack installations, check out the TensorRT installation guide.

DISCLAIMER: This post describes specific pitfalls and an advanced example with a custom layer. For “out-of-the-box” TensorRT examples, please check out the tf_to_trt_image_classification repo. …


This post describes the mathematics behind TensorFlow’s tutorial example “Partial differential equations”.

Please find a Jupyter notebook accompanying this blog post at the link below.
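Schematically, the tutorial’s model is a damped wave equation on a 2-D grid, stepped explicitly with a finite-difference Laplacian. Here is a pure-Python sketch of that update rule; the notebook itself uses TensorFlow ops, and the step-size and damping constants below are illustrative:

```python
def laplacian(u):
    """Five-point finite-difference Laplacian of a 2-D grid.

    Interior points only; the boundary is left at zero for simplicity.
    """
    n, m = len(u), len(u[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = (u[i - 1][j] + u[i + 1][j]
                         + u[i][j - 1] + u[i][j + 1]
                         - 4.0 * u[i][j])
    return out

def step(u, ut, eps=0.03, damping=0.04):
    """One explicit step of u_tt = laplacian(u) - damping * u_t.

    u is the surface height, ut its time derivative,
    eps the (illustrative) time-step size.
    """
    lap = laplacian(u)
    n, m = len(u), len(u[0])
    new_u = [[u[i][j] + eps * ut[i][j] for j in range(m)]
             for i in range(n)]
    new_ut = [[ut[i][j] + eps * (lap[i][j] - damping * ut[i][j])
               for j in range(m)] for i in range(n)]
    return new_u, new_ut

# A single "raindrop" in the middle of a small pond:
u = [[0.0] * 5 for _ in range(5)]
ut = [[0.0] * 5 for _ in range(5)]
u[2][2] = 1.0
u, ut = step(u, ut)
```

Repeating step produces the expanding, slowly decaying ripples shown in the tutorial; the Laplacian spreads the disturbance to neighbouring cells while the damping term drains energy over time.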


Roman Sokolkov

Systems Software Engineer
