
Distributed-computing-for-lane-detection-models

Distributed inference between edge devices for lane detection models.

Framework used

PytorchAutoDrive is a pure Python framework that includes semantic segmentation and lane detection models based on PyTorch. It provides full-stack support, from research (model training, testing, and fair benchmarking by simply writing configs) to application (visualization and model deployment).

Paper: Rethinking Efficient Lane Detection via Curve Modeling (CVPR 2022)

Poster: PytorchAutoDrive: Toolkit & Fair Benchmark for Autonomous Driving Research (PyTorch Developer Day 2021)

This repository is under active development, but results for the uploaded models are stable. Legacy code users should check deprecations for changes.

A demo video from ERFNet:

demo_3.0.mp4

Highlights

Various methods on a wide range of backbones, config-based implementations, modular and easily understood code, image/keypoint loading, transformations and visualizations, mixed-precision training, TensorBoard logging, and deployment support with ONNX and TensorRT.

Models from this repo are faster to train (trainable on a single card) and often perform better than other implementations; see the wiki for the reasons and the technical specifications of the models. A minimal sketch of the ONNX deployment path is shown below.
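
As a hedged illustration of the ONNX deployment path mentioned above (the toy model, file name, and input resolution are placeholder assumptions, not the toolkit's own deployment scripts):

```python
import torch
import torch.nn as nn

# Placeholder network; in practice this would be a lane detection model
# built from a PytorchAutoDrive config.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 5, 1),  # e.g. per-pixel lane class scores (assumed)
).eval()

dummy_input = torch.randn(1, 3, 288, 800)  # assumed inference resolution

# Export to ONNX; the resulting file can then be consumed by TensorRT.
torch.onnx.export(
    model,
    dummy_input,
    "lane_model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
```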

Supported datasets:

| Task | Dataset |
|------|---------|
| semantic segmentation | PASCAL VOC 2012 |
| semantic segmentation | Cityscapes |
| semantic segmentation | GTAV* |
| semantic segmentation | SYNTHIA* |
| lane detection | CULane |
| lane detection | TuSimple |
| lane detection | LLAMAS |
| lane detection | BDD100K (in progress) |

* The UDA baseline setup, with Cityscapes val set as validation.

Supported models:

| Task | Backbone | Model/Method |
|------|----------|--------------|
| semantic segmentation | ResNet-101 | FCN |
| semantic segmentation | ResNet-101 | DeeplabV2 |
| semantic segmentation | ResNet-101 | DeeplabV3 |
| semantic segmentation | - | ENet |
| semantic segmentation | - | ERFNet |
| lane detection | ENet, ERFNet, VGG16, ResNets (18, 34, 50, 101), MobileNets (V2, V3-Large), RepVGGs (A0, A1, B0, B1g2, B2), Swin (Tiny) | Baseline |
| lane detection | ERFNet, VGG16, ResNets (18, 34, 50, 101), RepVGGs (A1) | SCNN |
| lane detection | ResNets (18, 34, 50, 101), MobileNets (V2, V3-Large) | RESA |
| lane detection | ERFNet, ENet | SAD (postponed) |
| lane detection | ERFNet | PRNet (in progress) |
| lane detection | ResNets (18, 34, 50, 101), ResNet18-reduced | LSTR |
| lane detection | ResNets (18, 34) | BézierLaneNet |

Testing models on Jetson TX2 and Jetson AGX Xavier

[Figure: collaborative inference method applied to road marking detection models]

Our design explores the trade-off between latency and power consumption while taking into account different user requirements on latency and dynamic network environments. End-to-end latency includes task execution, data transmission, and serialization (converting the data to be transmitted into a byte stream). We now formally formulate the task-sharing problem. Let i ∈ {0, 1, ..., n + 1}, where i is the i-th candidate partition point in our model and n is the number of layers of the deep neural network (DNN). The corresponding partition points are shown in the figure below.

[Figure: candidate partition points of the DNN]
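
As an illustration of the latency objective (our own sketch of the components listed above, not a formula quoted from the paper), the end-to-end latency of a candidate partition point i can be written as

$$T(i) = \sum_{j=1}^{i} t_j^{\mathrm{TX2}} + t_{\mathrm{tx}}(i) + \sum_{j=i+1}^{n} t_j^{\mathrm{AGX}},$$

where $t_j^{\mathrm{TX2}}$ and $t_j^{\mathrm{AGX}}$ denote the execution time of layer j on each device and $t_{\mathrm{tx}}(i)$ the serialization plus transmission time of layer i's output.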

For each DNN model and each hardware configuration (CPU and GPU clock frequencies, number of cores, number of IoT devices, etc.), we calculate the execution time and the transmission time of each layer of the DNN in order to find the best partitioning point that satisfies the energy-consumption criteria.

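A minimal sketch of such a partition-point search, assuming the per-layer timing profiles have already been collected for both devices (the function name, the bandwidth argument, and the coarse energy model are illustrative assumptions, not the exact algorithm used in this repo):

```python
def best_partition(exec_tx2, exec_agx, out_bytes, bandwidth_bps,
                   power_tx2_w, power_agx_w, energy_budget_j):
    """Pick the partition point that minimizes estimated end-to-end latency
    while keeping a coarse energy estimate within the budget.

    exec_tx2[j], exec_agx[j]: execution time (s) of layer j on each device.
    out_bytes[j]: serialized size (bytes) of layer j's output.
    Partition point i means layers 0..i-1 run on the Jetson TX2 and
    layers i..n-1 run on the AGX Xavier (input transfer for i == 0
    is ignored in this sketch).
    """
    n = len(exec_tx2)
    best_i, best_latency = None, float("inf")
    for i in range(n + 1):
        head = sum(exec_tx2[:i])                          # executed locally
        tail = sum(exec_agx[i:])                          # executed remotely
        tx = out_bytes[i - 1] / bandwidth_bps if 0 < i < n else 0.0
        latency = head + tx + tail
        energy = power_tx2_w * head + power_agx_w * tail  # coarse energy model
        if energy <= energy_budget_j and latency < best_latency:
            best_i, best_latency = i, latency
    return best_i, best_latency
```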

For the CNN inference, we used two embedded computing devices with GPUs that NVIDIA recently started selling. The two platforms differ completely in computational and storage capabilities as well as in GPU micro-architecture; these differences allowed us to test the portability of our approach. The table below gives the main characteristics of the two platforms:

[Table: main characteristics of the Jetson TX2 and the Jetson AGX Xavier]

Results of the collaborative inference between the Jetson TX2 and the Jetson AGX Xavier

For each hardware configuration, we run the partitioning algorithm before executing the pre-trained models in order to find the optimal partitioning point: one part of the model is executed on the Jetson TX2 and the other on the Jetson AGX Xavier, with the goal of optimizing model latency and power consumption, as shown in the figure below.

[Figure: collaborative inference between the Jetson TX2 and the Jetson AGX Xavier]
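
A minimal sketch of how such a split can be realized for a sequential PyTorch model, with the first part intended for the Jetson TX2 and the second for the AGX Xavier (the toy model, the partition point, and the in-process hand-off are placeholders; in the real system the intermediate tensor is serialized and sent over the network):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a lane detection backbone.
layers = nn.ModuleList([
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 4),
])

partition_point = 4  # chosen by the partitioning algorithm

head = nn.Sequential(*layers[:partition_point])   # runs on the Jetson TX2
tail = nn.Sequential(*layers[partition_point:])   # runs on the AGX Xavier

x = torch.randn(1, 3, 288, 800)  # assumed inference resolution
with torch.no_grad():
    intermediate = head(x)
    # In the real deployment this tensor is serialized to a byte stream
    # and transmitted to the second device before running the tail.
    y = tail(intermediate)
```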

The latency of the models before and after collaborative inference is reported in the figure below. The orange bars show the latency before running the collaborative inference algorithm between the Jetson TX2 and the AGX Xavier, and the blue bars show the latency after optimization. In some scenarios we reduced latency by a factor of 5 (see the SCNN models).

[Figure: model latency before (orange) and after (blue) collaborative inference]
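
For reference, a simple way to obtain such latency numbers with PyTorch (a generic timing helper written under our own assumptions, not the exact measurement code behind the figure):

```python
import time
import torch

def measure_latency(model, x, warmup=5, runs=20):
    """Average end-to-end inference latency in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1000.0
```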
