Estudio de sistemas basados en cámara para la detección tridimensional de objetos en conducción autónoma [Study of camera-based systems for three-dimensional object detection in autonomous driving]

Antunes García, Miguel

Date

2023-09-15

Affiliation

Universidad de Alcalá

Bibliographic citation

Antunes García, Miguel. Estudio de sistemas basados en cámara para la detección tridimensional de objetos en conducción autónoma. Trabajo Fin de Máster. Universidad de Alcalá, 2023.

Keywords

Deep learning

Transfer learning

KITTI

CARLA

3D object detection

Document type

info:eu-repo/semantics/masterThesis

Version

info:eu-repo/semantics/acceptedVersion

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Access rights

info:eu-repo/semantics/openAccess

Abstract

Perceiving the environment is essential to the operation of autonomous systems. It is especially critical for autonomous vehicles, which must interpret highly complex scenes while travelling at variable and often high speeds. In this context, the camera has become one of the most widely used sensors thanks to its low cost and ease of integration, and recent advances in recognition and detection techniques have notably increased its potential. The main objective of this master's thesis is to analyze the state of the art in techniques that detect objects in the three-dimensional environment of a vehicle using only camera information. Specifically, it quantifies the differences between two types of perception systems: monocular end-to-end systems, which obtain object detections directly from a single camera image in one stage, and modular stereo systems, which require multiple stages. In the latter approach, a first stage estimates depth in the image before the objects are detected, a common practice in the state of the art. After an initial survey of the main techniques, several systems are selected, either for their outstanding results or for their processing speed, and subjected to several training and evaluation runs.
The data used in the experiments come from the KITTI dataset [1]. However, because this work is part of a complete autonomous driving architecture that is typically tested in simulated environments, data from the CARLA simulator [2], collected in the SHIFT dataset [3], are also used to quantify how well the systems generalize across the two environments. The ultimate goal is to study the capabilities and limitations of these techniques through a fair evaluation that supports choosing the best implementation for deployment on a real autonomous vehicle, and to assess how well the learned knowledge generalizes across real and simulated environments.

Files in this item

Files	Size	Format	View
TFM_Antunes_Garcia_2023.pdf	26.87Mb	PDF

Collections

TFM - Máster Universitario en Ingeniería de Telecomunicación [40]

This item is subject to a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.