Comparative Analysis of YOLO, Faster R-CNN, RetinaNet, and DETR for Autonomous Vehicle Object Detection

Kandlakunta Sumana Mounya

Abstract

Object detection is a cornerstone of autonomous vehicle (AV) perception, enabling
the identification of vehicles, pedestrians, traffic signs, and other road objects in real
time. This paper presents a comprehensive literature review comparing four leading object
detection architectures: YOLO (You Only Look Once), Faster R-CNN (with Feature
Pyramid Networks), RetinaNet (with Bidirectional FPN enhancements), and the Detection
Transformer (DETR). The review focuses on their application in autonomous driving systems.
We examine their architectures, training methodologies, inference speeds, detection accuracy,
and suitability for deployment under stringent AV constraints. Challenges such as
multi-scale detection, occlusions, class imbalance, and adverse environmental conditions
are analyzed using results from domain-specific benchmarks (KITTI, BDD100K, nuScenes).
Our findings indicate that one-stage detectors (YOLO, RetinaNet) generally achieve higher
frame rates suitable for on-board inference, while the two-stage Faster R-CNN often
offers superior accuracy at the cost of speed. The transformer-based DETR introduces a
new paradigm with fewer heuristics and a streamlined pipeline, though it requires specialized
improvements for small-object detection and efficient training. We conclude with future
research directions, including convergence between convolutional and transformer architectures,
multi-modal sensor fusion, and efficiency optimizations to meet AV safety and latency
requirements.
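
As an illustration of the class-imbalance challenge mentioned above, the following is a minimal sketch of the binary focal loss that RetinaNet is known for, written in plain Python for a single prediction. The function names `focal_loss` and `cross_entropy` are illustrative, and the defaults `alpha=0.25` and `gamma=2.0` are commonly used values assumed here; they are not taken from this article.

```python
import math

def cross_entropy(p, y):
    """Standard binary cross-entropy for one prediction.

    p: predicted probability of the positive class, with 0 < p < 1
    y: ground-truth label, 1 (object) or 0 (background)
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction (alpha and gamma are assumed defaults).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples, so the many easy background samples contribute less.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

if __name__ == "__main__":
    # An easy background sample (p = 0.1, y = 0) versus a hard object sample (p = 0.1, y = 1).
    for p, y in [(0.1, 0), (0.1, 1)]:
        print(p, y, cross_entropy(p, y), focal_loss(p, y))
```

In this sketch, an easy background sample is down-weighted far more strongly relative to cross-entropy than a hard, misclassified object sample, which is the behavior intended to counter the foreground/background imbalance in dense one-stage detectors.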

Article Details

How to Cite
Kandlakunta Sumana Mounya. (2025). Comparative Analysis of YOLO, Faster R-CNN, RetinaNet, and DETR for Autonomous Vehicle Object Detection. Doupe Journal of Top Trending Technologies, 1(2). Retrieved from https://www.doupe.in/index.php/ttt/article/view/12
Section
Articles