Deep learning has revolutionized computer vision over the past decade. Many vision tasks, such as object recognition, semantic segmentation, and optical flow estimation, can now be solved with unprecedented accuracy using deep neural networks. Recent advances in GPU technology and scalable algorithms have driven these deep learning breakthroughs. In particular, convolutional neural networks (CNNs) achieve state-of-the-art results on long-standing vision problems such as image recognition and object detection. However, autonomous agents that navigate and interact with our world need to understand it in 3D.
Let’s talk more about 3D data and representations.
- Overview of Representations of 3D Data
- Deep Learning Architecture on Different 3D Data Representations
- 3D Euclidean-Structured Data Deep Learning Architectures
- Deep Learning Architectures on 3D Non-Euclidean Structured Data
Overview of Representations of 3D Data
Raw 3D data captured by different scanning devices comes in various types that differ in both structure and properties. This section classifies 3D data representations into two prominent families: Euclidean-structured data and non-Euclidean data. Previous work has addressed one or more representations of 3D data, but not both categories; this overview aims to be more detailed and to give in-depth information on the various 3D data representations.
Some 3D data representations have an underlying Euclidean structure in which the properties of grid-structured data are retained, such as a global parametrization and a common coordinate system. The main 3D data representations in this group are descriptors, projections, RGB-D data, volumetric data, and multi-view data.
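To make the idea of a shared grid concrete, the sketch below converts a few 3D points into a binary occupancy grid, a simple form of volumetric data. It is a minimal, dependency-free illustration under assumed inputs: the function name `voxelize`, the axis-aligned bounding box given by `lo` and `hi`, and the cubic `grid_size` are hypothetical choices, not part of any particular library or method.

```python
def voxelize(points, grid_size, lo, hi):
    """Map (x, y, z) points to a grid_size^3 binary occupancy grid.

    lo and hi are the (assumed) corners of a non-degenerate, axis-aligned
    bounding box; points outside it are ignored.
    """
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    for p in points:
        if any(c < a or c > b for c, a, b in zip(p, lo, hi)):
            continue  # outside the bounding box
        # Normalize each coordinate to [0, 1], scale to the grid, clamp the max edge.
        i, j, k = (min(int((c - a) / (b - a) * grid_size), grid_size - 1)
                   for c, a, b in zip(p, lo, hi))
        grid[i][j][k] = 1  # mark the voxel as occupied
    return grid


points = [(0.0, 0.0, 0.0), (0.9, 0.9, 0.9), (0.1, 0.6, 0.2)]
grid = voxelize(points, 2, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
```

Because every occupied cell is addressed by integer indices in one coordinate frame, a grid like this can be fed to standard convolution layers in much the same way as a 2D image.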
Non-Euclidean data is the second family of 3D data representations. This type of data has neither a global parametrization nor a common coordinate system. It also lacks a vector space structure, which makes extending 2D deep learning (DL) paradigms to it far from straightforward. Significant effort has therefore gone into applying DL techniques to such data. Point clouds, 3D meshes, and graphs are the primary forms of non-Euclidean data, and they share many characteristics that are addressed in this section. Note that, depending on the scale at which processing occurs (global or local), both point clouds and meshes can be treated as either Euclidean or non-Euclidean data. Despite this dual nature, we list them as non-Euclidean data: even though they look Euclidean locally, they can suffer from infinite curvature and self-intersections.
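A triangle mesh shows what the lack of a regular grid looks like in practice. In the sketch below, a mesh is assumed to be given as a list of triangles, each a triple of vertex indices; the helper `vertex_adjacency` is hypothetical and simply collects each vertex's neighbors from the triangle edges.

```python
from collections import defaultdict


def vertex_adjacency(faces):
    """Return {vertex: set of neighboring vertices} for a triangle mesh.

    faces is assumed to be a list of (a, b, c) vertex-index triples.
    """
    neighbors = defaultdict(set)
    for a, b, c in faces:
        # Each triangle contributes its three edges, in both directions.
        for u, v in ((a, b), (b, c), (c, a)):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return dict(neighbors)


# A square split into two triangles that share the edge (0, 2).
faces = [(0, 1, 2), (0, 2, 3)]
adjacency = vertex_adjacency(faces)
print({v: sorted(n) for v, n in sorted(adjacency.items())})
# {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
```

Vertex 0 has three neighbors while vertex 1 has two, unlike a pixel's fixed 3x3 neighborhood, and the neighbors carry no natural ordering. This is one reason a standard convolution kernel does not transfer directly to meshes.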
Deep Learning Architecture on Different 3D Data Representations
Deep learning has contributed remarkably to state-of-the-art results on many 2D computer vision tasks. Following this success, DL began to gain prominence in the 3D domain, where it is used to exploit the rich 3D data now available.
However, because of the complex geometric nature of 3D structures and the broad structural variations among the various 3D representations, applying DL models to 3D data is not straightforward. These different representations have led researchers to follow various DL routes, adapting the learning process to the properties of the data. This section reviews the DL paradigms applied to 3D data and classifies them into two families based on the data representation: Euclidean data DL architectures and non-Euclidean data DL architectures.
3D Euclidean-Structured Data Deep Learning Architectures
The first family of 3D DL approaches comprises Euclidean approaches, which work on data with an underlying Euclidean structure. Because such data is grid-like, already developed 2D DL approaches can be applied to it directly. Representing 3D data in 2D means that the initial 3D representation is first processed into a simplified 2D representation on which classical 2D DL techniques can operate. Over the years, this processing has evolved, producing various 2D representations of 3D data with different characteristics and properties, each with DL architectures tailored to capture its geometric properties.
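One simple way to turn 3D data into a 2D representation is an orthographic depth map. The sketch below is an illustrative assumption, not a method taken from the text: it projects points along the z axis onto an image plane, keeps the nearest (smallest) z per pixel as a basic z-buffer, and leaves empty pixels at infinity. The function name `depth_image` and its parameters are hypothetical.

```python
import math


def depth_image(points, width, height, x_range, y_range):
    """Orthographically project (x, y, z) points onto a height x width depth map.

    Assumes a viewer looking along +z, so each pixel keeps the smallest z
    that lands in it (a basic z-buffer). Pixels that no point hits stay at
    math.inf.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    image = [[math.inf] * width for _ in range(height)]
    for x, y, z in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # the point falls outside the image plane
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        image[row][col] = min(image[row][col], z)
    return image


points = [(0.1, 0.1, 5.0), (0.1, 0.1, 2.0), (0.9, 0.9, 3.0)]
depth = depth_image(points, 2, 2, (0.0, 1.0), (0.0, 1.0))
```

The resulting 2D array can be processed by an ordinary 2D CNN; repeating the projection for several rotated copies of an object is one way to obtain a set of views.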
Initially, researchers relied on established feature extraction methods that turn 3D data into grid-like shallow features, to which 2D DL models can be applied directly. Given the complexity of 3D data, however, such features do not discriminatively capture the intrinsic geometric properties of a shape and may discard important 3D object details. This motivated the use of DL models that directly exploit the depth modality of RGB-D data, which proved useful in some applications. Still, this type of data is not well suited to analyzing complex scenarios. This in turn inspired researchers to apply DL methods directly to 3D representations such as volumetric data and multi-view data.
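Volumetric approaches extend 2D convolution to a third spatial dimension. The naive sketch below computes a "valid" 3D cross-correlation (stride 1, no padding, no kernel flip, as is usual in deep learning frameworks) of a D x H x W grid with a small kernel. It is written in plain Python for clarity and is only an assumption-level illustration: a real volumetric CNN would use an optimized library layer with learned kernels, multiple channels, and nonlinearities, and the name `conv3d_valid` is hypothetical.

```python
def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation (stride 1, no padding).

    volume is a D x H x W nested list and kernel a kd x kh x kw nested list;
    the output has shape (D - kd + 1) x (H - kh + 1) x (W - kw + 1).
    """
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - kd + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                # Weighted sum of the kd x kh x kw window anchored at (z, y, x).
                total = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            total += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


# A 3x3x3 occupancy grid with a single occupied center voxel.
volume = [[[0] * 3 for _ in range(3)] for _ in range(3)]
volume[1][1][1] = 1
ones = [[[1] * 3 for _ in range(3)] for _ in range(3)]
print(conv3d_valid(volume, ones))  # [[[1.0]]]
```

The nested loops also hint at why volumetric methods are costly: for a fixed kernel, the work grows with the cube of the grid resolution.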
Deep Learning Architectures on 3D Non-Euclidean Structured Data
The second family of 3D DL approaches comprises non-Euclidean approaches, which try to extend DL to geometric data. However, the nature of this data makes it challenging to carry out core DL operations such as convolution. Several architectures have been proposed to extend DL to the geometric domain of 3D. Some of them operate on 3D point clouds to learn the geometry of a 3D shape and use it for modeling tasks. Their results encouraged researchers to also leverage the surface information provided by 3D meshes, where the connectivity between vertices can be exploited to define local pseudo-coordinates and perform a convolution-like operation.
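Point clouds are unordered sets, so a network that consumes them should produce the same output for any ordering of the points. A common way to achieve this, popularized by PointNet, applies the same function to every point and then aggregates the results with a symmetric operation such as max pooling. The sketch below shows only that core idea, with a single shared linear layer plus ReLU; the weights are fixed, hypothetical values rather than learned parameters, and the function names are illustrative.

```python
def shared_layer(point, weights, biases):
    """One linear layer + ReLU applied to a single point.

    The same weights and biases are reused for every point (the 'shared' part).
    """
    return [max(0.0, sum(w * c for w, c in zip(row, point)) + b)
            for row, b in zip(weights, biases)]


def global_feature(points, weights, biases):
    """Max-pool per-point features into one order-independent vector."""
    per_point = [shared_layer(p, weights, biases) for p in points]
    return [max(f[i] for f in per_point) for i in range(len(biases))]


# Hypothetical fixed weights: three features copy x, y, z; the fourth is x + y + z - 1.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
b = [0, 0, 0, -1]
cloud = [(1, 2, 0), (3, 0, 1)]
print(global_feature(cloud, W, b))  # [3, 2, 1, 3]
```

Reordering `cloud` leaves the result unchanged because max is symmetric in its arguments; an actual network would stack several such layers and learn their weights.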
The continuing evolution of scanning devices has significantly increased the amount of 3D data available to the 3D computer vision research community. Despite the difficulties the data itself imposes, this opens the door to new opportunities for AI developers to explore the properties of various 3D structures and learn their geometric characteristics. DL techniques had already revolutionized performance on many 2D computer vision tasks, which inspired the 3D research community to follow the same direction. However, extending 2D DL to 3D data is not a simple task, and its difficulty depends on both the data representation and the task at hand.