Deep Learning on Point Clouds for 3D Scene Understanding

Author: Ruizhongtai Qi
Release: 2018

A point cloud is a commonly used geometric data type with many applications in computer vision, computer graphics, and robotics. The availability of inexpensive 3D sensors has made point cloud data widely available, and the current interest in self-driving vehicles has highlighted the importance of reliable and efficient point cloud processing. Due to its irregular format, however, current convolutional deep learning methods cannot be directly used with point clouds. Most researchers transform such data to regular 3D voxel grids or collections of images, which renders the data unnecessarily voluminous and causes quantization and other issues. In this thesis, we present novel types of neural networks (PointNet and PointNet++) that directly consume point clouds, in ways that respect the permutation invariance of points in the input. Our network provides a unified architecture for applications ranging from object classification and part segmentation to semantic scene parsing, while being efficient and robust against various input perturbations and data corruption. We provide a theoretical analysis of our approach, showing that our network can approximate any continuous set function, and explain its robustness. In PointNet++, we further exploit local contexts in point clouds, investigate the challenge of non-uniform sampling density in common 3D scans, and design new layers that learn to adapt to varying sampling densities. The proposed architectures have opened doors to new 3D-centric approaches to scene understanding. We show how we can adapt and apply PointNets to two important perception problems in robotics: 3D object detection and 3D scene flow estimation. In 3D object detection, we propose a new frustum-based detection framework that achieves 3D instance segmentation and 3D amodal box estimation in point clouds.
Our model, called Frustum PointNets, benefits from the accurate geometry provided by 3D points and is able to canonicalize the learning problem by applying both non-parametric and data-driven geometric transformations on the inputs. Evaluated on large-scale indoor and outdoor datasets, our real-time detector significantly advances the state of the art. In scene flow estimation, we propose a new deep network called FlowNet3D that learns to recover 3D motion flow from two frames of point clouds. Compared with previous work that focuses on 2D representations and optimizes for optical flow, our model directly optimizes 3D scene flow and shows great advantages in evaluations on real LiDAR scans. As point clouds are prevalent, our architectures are not restricted to the above two applications or even to 3D scene understanding. This thesis concludes with a discussion of other potential application domains and directions for future research.
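The permutation invariance the thesis describes can be illustrated with a minimal sketch (not the thesis's actual implementation): a shared per-point layer followed by a symmetric max-pool produces the same feature vector for any ordering of the input points. The layer sizes and random weights below are illustrative assumptions.

```python
import numpy as np

def set_feature(points, w, b):
    """Shared per-point ReLU layer followed by a symmetric max-pool.

    Because max() over rows ignores row order, the output is identical
    for any permutation of the N input points.
    """
    h = np.maximum(points @ w + b, 0.0)  # (N, 3) -> (N, 8), applied per point
    return h.max(axis=0)                 # symmetric aggregation -> (8,)

rng = np.random.default_rng(0)
w, b = rng.normal(size=(3, 8)), np.zeros(8)
cloud = rng.normal(size=(16, 3))
shuffled = cloud[rng.permutation(16)]    # same points, different order
assert np.allclose(set_feature(cloud, w, b), set_feature(shuffled, w, b))
```

PointNet composes deeper shared MLPs with learned weights, but the symmetric pooling is the same trick that lets the network approximate continuous set functions regardless of input ordering.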

3D Point Cloud Analysis

Author: Shan Liu
Publisher: Springer Nature
Total Pages: 156
Release: 2021-12-10
Genre: Computers
ISBN: 3030891801

This book introduces the point cloud, its applications in industry, and the most frequently used datasets. It mainly focuses on three computer vision tasks -- point cloud classification, segmentation, and registration -- which are fundamental to any point cloud-based system. An overview of traditional point cloud processing methods helps readers build background knowledge quickly, while the coverage of deep learning methods for point clouds includes a comprehensive analysis of the breakthroughs from the past few years. Brand-new explainable machine learning methods for point cloud learning, which are lightweight and easy to train, are then thoroughly introduced. Quantitative and qualitative performance evaluations are provided, and comparisons and analysis across the three types of methods help readers gain a deeper understanding. With the rich deep learning literature in 2D vision, a natural inclination for 3D vision researchers is to develop deep learning methods for point cloud processing. Deep learning on point clouds has gained popularity since 2017, and the number of conference papers in this area continues to increase. Unlike 2D images, point clouds do not have a specific order, which makes point cloud processing by deep learning quite challenging. In addition, due to the geometric nature of point clouds, traditional methods are still widely used in industry. This book therefore aims to familiarize readers with the area by providing a comprehensive overview of both the traditional methods and the state-of-the-art deep learning methods. A major portion of the book focuses on explainable machine learning as a different approach to deep learning; these explainable methods offer a series of advantages over both traditional and deep learning methods, and they are a main highlight and novelty of the book.
By tackling three research tasks -- 3D object recognition, segmentation, and registration -- using this methodology, readers will get a sense of how to solve problems in a different way and can apply the frameworks to other 3D computer vision tasks, giving them inspiration for their own future research. Numerous experiments, analyses, and comparisons on the three tasks are provided so that readers can learn how to solve difficult computer vision problems.

Multimodal Scene Understanding

Author: Michael Yang
Publisher: Academic Press
Total Pages: 422
Release: 2019-07-16
Genre: Computers
ISBN: 0128173599

Multimodal Scene Understanding: Algorithms, Applications and Deep Learning presents recent advances in multi-modal computing, with a focus on computer vision and photogrammetry. It provides the latest algorithms and applications that involve combining multiple sources of information, and describes the role and approaches of multi-sensory data and multi-modal deep learning. The book is ideal for researchers from the fields of computer vision, remote sensing, robotics, and photogrammetry, helping foster interdisciplinary interaction and collaboration between these realms. Researchers collecting and analyzing multi-sensory data collections (for example, the KITTI benchmark, which combines stereo and laser data) from different platforms, such as autonomous vehicles, surveillance cameras, UAVs, planes, and satellites, will find this book very useful.
- Contains state-of-the-art developments on multi-modal computing
- Focuses on algorithms and applications
- Presents novel deep learning topics on multi-sensor fusion and multi-modal deep learning

3D Deep Learning with Python

Author: Xudong Ma
Publisher: Packt Publishing Ltd
Total Pages: 236
Release: 2022-10-31
Genre: Computers
ISBN: 1803233680

Visualize and build deep learning models with 3D data using PyTorch3D and other Python frameworks to conquer real-world application challenges with ease.

Key Features
- Understand 3D data processing with rendering, PyTorch optimization, and heterogeneous batching
- Implement differentiable rendering concepts with practical examples
- Discover how you can ease your work with the latest 3D deep learning techniques using PyTorch3D

Book Description
With this hands-on guide to 3D deep learning, developers working with 3D computer vision will be able to put their knowledge to work and get up and running in no time. Complete with step-by-step explanations of essential concepts and practical examples, this book lets you explore and gain a thorough understanding of state-of-the-art 3D deep learning. You'll see how to use PyTorch3D for basic 3D mesh and point cloud data processing, including loading and saving PLY and OBJ files, projecting 3D points into camera coordinates using perspective or orthographic camera models, rendering point clouds and meshes to images, and much more. As you implement some of the latest 3D deep learning algorithms, such as differentiable rendering, NeRF, SynSin, and Mesh R-CNN, you'll realize how coding for these deep learning models becomes easier with the PyTorch3D library. By the end of this deep learning book, you'll be ready to implement your own 3D deep learning models confidently.

What You Will Learn
- Develop 3D computer vision models for interacting with the environment
- Get to grips with 3D data handling with point clouds, meshes, and the PLY and OBJ file formats
- Work with 3D geometry, camera models, and coordinate systems, and convert between them
- Understand concepts of rendering, shading, and more with ease
- Implement differentiable rendering for many 3D deep learning models
- Explore advanced state-of-the-art 3D deep learning models such as NeRF, SynSin, and Mesh R-CNN

Who This Book Is For
This book is for beginner to intermediate-level machine learning practitioners, data scientists, ML engineers, and DL engineers who are looking to become well-versed with computer vision techniques using 3D data.
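The projection of 3D points into image coordinates that the book covers via PyTorch3D camera models can be sketched framework-free with a plain pinhole (perspective) model; the focal length and principal point below are made-up values, and this is not the PyTorch3D API.

```python
import numpy as np

def project_perspective(points_cam, f, cx, cy):
    """Pinhole projection of camera-frame 3D points to 2D pixel coordinates."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = f * x / z + cx  # horizontal pixel coordinate
    v = f * y / z + cy  # vertical pixel coordinate
    return np.stack([u, v], axis=1)

pts = np.array([[0.0, 0.0, 2.0],    # on the optical axis
                [0.5, -0.5, 1.0]])
uv = project_perspective(pts, f=500.0, cx=320.0, cy=240.0)
# the on-axis point lands exactly at the principal point (320, 240)
```

An orthographic camera, the other family the book mentions, simply drops the division by depth (u = s * x + cx), so distant and near points project at the same scale.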

2020 IEEE CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Author: IEEE Staff
Release: 2020-06-13
ISBN: 9781728171692

CVPR is the premier annual computer vision event, comprising the main conference and several co-located workshops and short courses. With its high quality and low cost, it provides exceptional value for students, academics, and industry researchers.

Representations and Techniques for 3D Object Recognition and Scene Interpretation

Author: Derek Hoiem
Publisher: Morgan & Claypool Publishers
Total Pages: 172
Release: 2011
Genre: Computers
ISBN: 1608457281

One of the grand challenges of artificial intelligence is to enable computers to interpret 3D scenes and objects from imagery. This book organizes and introduces major concepts in 3D scene and object representation and inference from still images, with a focus on recent efforts to fuse models of geometry and perspective with statistical machine learning. The book is organized into three sections: (1) Interpretation of Physical Space; (2) Recognition of 3D Objects; and (3) Integrated 3D Scene Interpretation. The first section discusses representations of spatial layout and techniques to interpret physical scenes from images. The second section introduces representations for 3D object categories that account for the intrinsically 3D nature of objects and provide robustness to changes in viewpoint. The third section discusses strategies to unite inference of scene geometry and object pose and identity into a coherent scene interpretation. Each section broadly surveys important ideas from cognitive science and artificial intelligence research, organizes and discusses key concepts and techniques from recent work in computer vision, and describes a few sample approaches in detail. Newcomers to computer vision will benefit from introductions to basic concepts, such as single-view geometry and image classification, while experts and novices alike may find inspiration from the book's organization and discussion of the most recent ideas in 3D scene understanding and 3D object recognition. Specific topics include: mathematics of perspective geometry; visual elements of the physical scene; structural 3D scene representations; techniques and features for image and region categorization; historical perspective, computational models, and datasets and machine learning techniques for 3D object recognition; inference of geometric attributes of objects, such as size and pose; and probabilistic and feature-passing approaches for contextual reasoning about 3D objects and scenes.
Table of Contents: Background on 3D Scene Models / Single-view Geometry / Modeling the Physical Scene / Categorizing Images and Regions / Examples of 3D Scene Interpretation / Background on 3D Recognition / Modeling 3D Objects / Recognizing and Understanding 3D Objects / Examples of 2-1/2D Layout Models / Reasoning about Objects and Scenes / Cascades of Classifiers / Conclusion and Future Directions

Deep Learning for Understanding Dynamic Visual Data

Author: Xingyu Liu (Researcher in artificial intelligence)
Release: 2019

Teaching machines to interpret the visual observations of our dynamic world as humans do is a central topic in Artificial Intelligence. The goal is to process various types of visual data and generate symbolic or numerical descriptions similar to human understanding to support decision making of autonomous agents. Compared to an individual visual snapshot, a dynamic visual data sequence accumulates more relevant information over time, allows motion information to be leveraged, and therefore potentially enables better generation of such descriptions. The recent success of deep learning inspires us to utilize deep neural networks to analyze the complex patterns of dynamic visual data, in contrast to traditional approaches which rely on hand-crafted spatiotemporal descriptors. Different from previous related deep learning methods, in this thesis, we argue that the correspondences of positions across frames are the dynamic component of visual data and should be modeled by the deep network architectures. We discuss the design philosophies for the deep architecture in terms of selecting correspondence candidates, generating representations from the candidates through learning, and deploying the network to various applications. Accordingly, we present four deep learning methods for processing and understanding dynamic visual data. The processed visual data modality covers two or multiple frames of 2D RGB images or 3D point clouds. We start by introducing FlowNet3D, a deep neural network for estimating scene flow between point clouds at consecutive timestamps in an end-to-end fashion. Our method lets points in one point cloud find correspondence candidates in another point cloud to learn the true correspondences and shows great advantages while being evaluated on existing benchmarks. We then present CPNet and MeteorNet, two deep learning backbone architectures that learn representations for RGB videos and 3D point cloud sequences respectively. 
Both methods effectively learn temporal relations by proposing and aggregating correspondence candidates. We showcase their leading performance on tasks including action recognition, semantic segmentation, and scene flow estimation. We also describe KeyPose, a deep learning architecture for estimating 3D keypoint locations of objects from stereo RGB images, as well as a new dataset for studying transparent objects. Through extensive experiments, we demonstrate that estimating 3D object poses by modeling correspondences in stereo images has advantages over depth-based methods. This thesis concludes with a discussion of other potential application domains and directions for future research.
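The correspondence idea behind FlowNet3D can be contrasted with the naive baseline it improves on. Below is a minimal NumPy sketch of hard nearest-neighbour flow, not the learned network: FlowNet3D replaces the hard argmin with soft, learned correspondence features trained end-to-end.

```python
import numpy as np

def nn_flow(p1, p2):
    """Naive scene-flow baseline: each frame-1 point moves to its
    nearest neighbour in frame 2; the flow is that displacement."""
    d = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=2)  # (N1, N2) distances
    return p2[d.argmin(axis=1)] - p1

p1 = np.array([[0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0]])
p2 = p1 + np.array([0.1, 0.0, 0.0])  # the scene shifts 0.1 m along x
flow = nn_flow(p1, p2)
# every flow vector recovers the rigid shift (0.1, 0, 0)
```

The hard assignment fails under occlusion, noise, and non-rigid motion, which is precisely where learned correspondence candidates help.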

Computer Vision – ECCV 2022

Author: Shai Avidan
Publisher: Springer Nature
Total Pages: 806
Release: 2022-10-20
Genre: Computers
ISBN: 3031198158

The 39-volume set, comprising LNCS volumes 13661 through 13699, constitutes the refereed proceedings of the 17th European Conference on Computer Vision, ECCV 2022, held in Tel Aviv, Israel, during October 23–27, 2022. The 1645 papers presented in these proceedings were carefully reviewed and selected from a total of 5804 submissions. The papers deal with topics such as computer vision; machine learning; deep neural networks; reinforcement learning; object recognition; image classification; image processing; object detection; semantic segmentation; human pose estimation; 3D reconstruction; stereo vision; computational photography; neural networks; image coding; image reconstruction; and motion estimation.

A Review of Point Cloud Registration Algorithms for Mobile Robotics

Author: Francois Pomerleau
Total Pages: 122
Release: 2015-05-27
Genre: Technology & Engineering
ISBN: 9781680830248

This review deals with the topic of geometric registration in robotics. It provides a historical perspective on the registration problem and shows that the various available solutions can be organized and differentiated within a framework according to a few elements. It also reviews several applications of this framework in mobile robotics.
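As a concrete instance of the registration problem the review organizes, here is a minimal sketch of one point-to-point ICP iteration: nearest-neighbour matching followed by a closed-form rigid fit (Kabsch/SVD). The pipelines surveyed in the review add the data filtering, outlier rejection, and iteration strategies this toy version omits.

```python
import numpy as np

def icp_step(src, dst):
    """One point-to-point ICP iteration.

    Matches each source point to its nearest destination point, then
    solves for the best rigid transform (R, t) in closed form via SVD.
    """
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
    matched = dst[d.argmin(axis=1)]          # nearest-neighbour matching
    mu_s, mu_d = src.mean(axis=0), matched.mean(axis=0)
    H = (src - mu_s).T @ (matched - mu_d)    # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

With exact correspondences and a small rigid motion, a single step recovers the transform; in practice the match-and-fit loop repeats until the alignment error stops improving.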