Paul Amayo, Oxford University, UK
A unified representation for application of architectural constraints in large-scale mapping
By construction, the built urban environment contains a multitude of geometric features, such as lines and planes, in a variety of configurations relative to one another. In this work we offer a representation that captures, in a natural and unified way, the variety of architectural constraints that can be discovered and applied in laser-based urban reconstructions driven by these geometric features. Furthermore, we present an evaluation of this approach against ground truth collected via a third-party professional survey using high-end (static) 3D laser scanners.
Jesus Briales, University of Malaga, Spain
Fast and Global 3D Registration of Points, Lines and Planes
The registration of 3D models by a Euclidean transformation is a fundamental task at the core of many applications in robotics and computer vision. This goes beyond the mere registration of point clouds, often involving other geometric features such as lines and planes. This task turns into an optimization problem that is non-convex due to the presence of an unknown rotation, making traditional iterative optimization methods prone to getting stuck in local minima.
In this talk I will focus on the ubiquitous scenario of registering a set of 3D points to a model formed by points, lines and/or planes, assuming the correspondences are given. First, I will present a unified and very compact (still non-convex) formulation of this problem, rendering its complexity linear in the number of correspondences. Then, based on this formulation, I will show that the non-convexity due to rotation can be circumvented through an appropriate convex relaxation. Using this relaxation (a semidefinite program), the problem can be reliably solved by iterative methods, in a global fashion, regardless of the provided initialization. This is a very appealing trait for many scenarios, and it is our hope that these findings may drive further advancements towards solving any kind of registration problem fast, reliably, and efficiently.
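The classical special case of this problem, with point-to-point correspondences only, already admits a closed-form global solution (the Kabsch/Horn method). The minimal NumPy sketch below (our own illustration, not the talk's semidefinite relaxation, which additionally handles line and plane correspondences) shows the kind of registration being generalized:

```python
import numpy as np

def kabsch_registration(src, dst):
    """Closed-form rigid registration of corresponding 3D points.
    Returns R, t such that dst ~= src @ R.T + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                             # proper rotation, det = +1
    t = mu_d - R @ mu_s
    return R, t

# Recover a known rigid transform from noiseless correspondences.
rng = np.random.default_rng(0)
src = rng.standard_normal((50, 3))
c, s = np.cos(0.7), np.sin(0.7)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ R_true.T + t_true
R_est, t_est = kabsch_registration(src, dst)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))  # True True
```

Once lines or planes enter the model, no such closed form exists and the cost becomes non-convex in the rotation, which is precisely where the convex relaxation of the talk comes in.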
José Martínez Carranza, INAOE, Mexico
Filling the gaps of 3D mapping in Monocular SLAM: from inverse depth for planes to super-pixels
Visual SLAM systems that rely on visual features and produce 3D point maps are widely used. However, 3D points are not very useful unless they are first processed to give sense to the 3D structure they represent. In this regard, we will discuss a couple of techniques aimed at "filling the gaps" with planar structures that can be inferred by processing camera images in real time. The first approach is a stochastic technique based on inverse depth but extended to planar structures; the second uses super-pixels and the GPU for rapid plane estimation.
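As a rough illustration of the inverse-depth idea that the first approach builds on: a monocular feature is parameterized by its anchor camera centre, a ray direction, and an inverse depth rho, so that rho near zero gracefully encodes far-away, low-parallax points. A minimal sketch (the angle convention below is one common choice, assumed for illustration):

```python
import numpy as np

def bearing(theta, phi):
    """Unit ray from azimuth theta and elevation phi (one common convention)."""
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def inverse_depth_to_point(c, theta, phi, rho):
    """Euclidean point from inverse-depth parameters: anchor centre c,
    ray direction (theta, phi), and inverse depth rho.  As rho -> 0 the
    point recedes to infinity without numerical blow-up in the state."""
    return c + bearing(theta, phi) / rho

c = np.zeros(3)
p = inverse_depth_to_point(c, theta=0.0, phi=0.0, rho=0.25)
print(p)  # point 4 m along the x-axis
```

Extending this parameterization from points to planar structures is the subject of the first technique in the talk.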
Single-View and Multi-View Planar Models for Dense Monocular Mapping
Estimating dense 3D maps from multiple monocular views is a challenging task. Image pixels with a high color gradient can be reliably matched and accurately triangulated (given sufficient parallax). The matching might be noisier if small textureless areas are present, but we can still reconstruct them accurately through multi-view correspondence and regularization. Large textureless areas, e.g. blank walls, however, usually present high reconstruction errors. Notice that large textureless areas often correspond to 3D planes, particularly in man-made environments.
In this talk we will summarize our work on incorporating multi-planar priors for 3D monocular mapping. We will detail algorithms for 3D mapping using 1) plane discovery, triangulation and fitting from superpixels in multiple views, and 2) detection of Manhattan and mid-level planar patterns from single and multiple views. Our experimental results will show that the addition of such planar priors reduces the reconstruction error. We will also emphasize the limitations and strengths of each of the presented techniques and their complementary nature.
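A building block common to these pipelines is fitting a plane to the 3D points associated with a superpixel. A minimal total-least-squares sketch (our own illustration, not the authors' code):

```python
import numpy as np

def fit_plane(points):
    """Total-least-squares plane fit.  Returns a unit normal n and offset d
    such that n . p + d ~= 0 for points p on the plane."""
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid)
    n = Vt[-1]                      # direction of least variance
    return n, -n @ centroid

# Points sampled on the plane z = 2, i.e. normal [0, 0, 1] and offset d = -2.
rng = np.random.default_rng(1)
xy = rng.standard_normal((100, 2))
pts = np.column_stack([xy, np.full(100, 2.0)])
n, d = fit_plane(pts)
if n[2] < 0:                        # the normal's sign is arbitrary; fix it
    n, d = -n, -d
print(n, d)
```

In practice such a fit would be wrapped in an outlier-robust loop (e.g. RANSAC) and its residual used to decide whether a superpixel is planar at all.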
W. Nicholas Greene, MIT, USA
Fast Lightweight Mesh Estimation using Variational Smoothing on Delaunay Graphs
We propose a lightweight method for dense online monocular depth estimation capable of reconstructing 3D meshes on computationally constrained platforms. Our main contribution is to pose the reconstruction problem as a non-local variational optimization over a time-varying Delaunay graph of the scene geometry, which allows for an efficient approach to depth estimation. The graph can be tuned to favor reconstruction quality or speed and is continuously smoothed and augmented as the camera explores the scene.
FLaME (Fast Lightweight Mesh Estimation) can generate mesh reconstructions at upwards of 230 Hz using less than one Intel i7 CPU core, which enables operation on size-, weight-, and power-constrained platforms. We present results from both benchmark datasets and experiments running FLaME in-the-loop onboard a small quadrotor.
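The flavor of smoothing over a mesh graph can be sketched with a toy quadratic version: a data-fidelity term per vertex plus a pairwise smoothness term per graph edge. This linear stand-in is our own simplification; the actual method optimizes a non-local variational cost over a time-varying Delaunay graph:

```python
import numpy as np

# Toy: smooth noisy per-vertex inverse depths z over a fixed graph by
# minimizing  sum_i (x_i - z_i)^2 + lam * sum_{(i,j)} (x_i - x_j)^2.
# The minimizer solves (I + lam * L) x = z, with L the graph Laplacian.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]     # small example mesh graph
z = np.array([1.0, 1.2, 0.8, 3.0])           # noisy data; vertex 3 is an outlier
lam = 1.0

n = len(z)
A = np.eye(n)                                 # build I + lam * L
for i, j in edges:
    A[i, i] += lam; A[j, j] += lam
    A[i, j] -= lam; A[j, i] -= lam
x = np.linalg.solve(A, z)
print(x)   # the outlier vertex is pulled toward its neighbours
```

The real cost is non-local and robustified, and the graph itself changes as features are added and removed, but the structure — per-vertex data terms coupled by edge terms of the triangulation — is the same.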
Ming Hsiao, Carnegie Mellon University, USA
Dense Planar-Inertial SLAM for Large-scale Indoor 3D Reconstruction
Planes are one of the most common geometric structures found in indoor environments. In this work, we demonstrate that properly making use of these planar observations in a SLAM framework can not only reduce the drift and distortion but also improve the efficiency over state-of-the-art dense 3D reconstruction systems. In addition, we add an inertial sensor and fuse its measurements with planar observations to increase robustness and accuracy. We compare the outputs from our novel CPU-based dense planar-inertial SLAM solution with the dense point cloud models generated from 3D laser scans on large-scale indoor datasets, showing its capability for accurate dense 3D reconstruction in real-time.
Lingni Ma, Technical University of Munich, Germany
Exploiting planar structures for RGB-D tracking and mapping
In this talk, I will discuss how to exploit planar structure in dense 3D indoor reconstruction and real-time RGB-D SLAM. The talk will first cover efficient algorithms for detecting planar structures in point clouds, TSDF volumetric data, and depth images. I will then introduce our previous work on simplifying dense triangular meshes with underlying planes using a quadtree-based decimation technique. Last but not least, I will discuss how to incorporate planes into direct camera tracking methods and further integrate them as extra constraints in the global graph optimization for RGB-D SLAM.
Rubén Gómez Ojeda, University of Malaga, Spain
Visual Odometry and SLAM using Line Segment Features
Traditional approaches to Visual Odometry (VO) or SLAM typically rely on point features to estimate the camera trajectory and map the environment. However, the performance of such approaches deteriorates dramatically in low-textured environments, where it is difficult to find a reliable set of point features. In contrast, line segments are usually abundant in human-made scenarios, which are characterized by regular structures rich in edges and linear shapes. We will present two approaches to visual odometry that combine point and line segment features: PLVO (stereo) and PL-SVO (monocular). Finally, we will briefly introduce our recent approach to stereo visual SLAM with line segments, PL-SLAM, where line segments are also employed for keyframe selection, bundle adjustment, and loop closing.
Julian Straub, Oculus, USA
Nonparametric Directional Perception
Artificial perception systems, like autonomous cars and augmented reality headsets, rely on dense 3D sensing technology such as RGB-D cameras and LiDAR scanners. Due to the structural simplicity of man-made environments, understanding and leveraging not only the 3D data but also the local orientations of the constituent surfaces, has huge potential. From an indoor scene to large-scale urban environments, a large fraction of the surfaces can be described by just a few planes with even fewer different normal directions. This sparsity is evident in the surface normal distributions, which exhibit a small number of concentrated clusters. In this work, I draw a rigorous connection between surface normal distributions and 3D structure, and explore this connection in light of different environmental assumptions to further 3D perception. Specifically, I propose the concepts of the Manhattan Frame and the unconstrained directional segmentation. These capture, in the space of surface normals, scenes composed of multiple Manhattan Worlds and more general Stata Center Worlds, in which the orthogonality assumption of the Manhattan World is not applicable. This exploration is theoretically founded in Bayesian nonparametric models, which capture two key properties of the 3D sensing process of an artificial perception system: (1) the inherent sequential nature of data acquisition and (2) that the required model complexity grows with the amount of observed data. The inference algorithms I derive herein inherently exploit and respect these properties. The fundamental insights gleaned from the connection between surface normal distributions and 3D structure lead to practical advances in scene segmentation, drift-free rotation estimation, global point cloud registration and real-time direction-aware 3D reconstruction to aid artificial perception systems.
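The clustering of surface normals that this line of work exploits can be illustrated with a deliberately simplified stand-in: a spherical k-means with a fixed number of clusters (the talk's models are Bayesian nonparametric, so the number of directions is inferred rather than fixed; all names below are ours):

```python
import numpy as np

def spherical_kmeans(normals, k, iters=20):
    """Cluster unit normals by cosine similarity; centers stay on the sphere.
    A fixed-K stand-in for the nonparametric directional models in the talk."""
    centers = normals[:: len(normals) // k][:k].copy()   # deterministic init
    for _ in range(iters):
        labels = np.argmax(normals @ centers.T, axis=1)  # most-aligned center
        for j in range(k):
            m = normals[labels == j].sum(axis=0)
            if np.linalg.norm(m) > 0:
                centers[j] = m / np.linalg.norm(m)       # renormalized mean
    labels = np.argmax(normals @ centers.T, axis=1)
    return centers, labels

# Synthetic normals concentrated around the three axes of a Manhattan frame.
rng = np.random.default_rng(3)
normals = np.concatenate(
    [ax + 0.05 * rng.standard_normal((100, 3)) for ax in np.eye(3)])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
centers, labels = spherical_kmeans(normals, k=3)
print(centers.round(2))
```

On real scenes the cluster count is unknown and the clusters may be constrained (Manhattan Frame) or not (Stata Center World), which is exactly what the nonparametric formulation addresses.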
Yuichi Taguchi, MERL, USA
RGB-D SLAM Using Hybrid Correspondences
We present RGB-D SLAM systems using a mixed set of correspondences. In addition to typical 3D point correspondences, our systems use (1) 3D plane correspondences, (2) 2D-to-3D point correspondences, and (3) correspondences between objects represented as a set of points and planes. We describe how to find those correspondences and use them in initial RANSAC registration and subsequent bundle adjustment. We show that our systems provide accurate registration, compact plane-based 3D models, and distinctive object models for detection and localization, which can be useful for several robotics applications such as navigation and manipulation.
Osman Ulusoy, Microsoft
Towards Probabilistic Volumetric Reconstruction using Ray Potentials
Occlusions and textureless or reflective surfaces cause ambiguities in dense 3D reconstruction from RGB images. In this talk, I will present a probabilistic framework for multi-view stereo that exposes these ambiguities. We formulate 3D reconstruction as inference in a Markov random field whose ray potentials accurately capture the dependencies between the input pixels and the estimated voxel occupancies and colors. This probabilistic approach also allows incorporating surface priors in a principled way. In the second half of my talk, I will present a prior that encourages piecewise planarity, which effectively resolves ambiguities in reconstructing large textureless surfaces.
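The core quantity behind such ray potentials is the probability that a given voxel is the first occupied one along a viewing ray. A minimal sketch (simplified: per-voxel occupancies treated as independent, which the full MRF formulation does not assume):

```python
import numpy as np

def first_hit_probabilities(occ):
    """P(voxel i is the first occupied voxel along the ray), given per-voxel
    occupancy probabilities occ ordered from the camera outward."""
    # Probability that all voxels before i are free, times P(voxel i occupied).
    free_before = np.concatenate(([1.0], np.cumprod(1.0 - occ[:-1])))
    return occ * free_before

occ = np.array([0.1, 0.2, 0.9, 0.5])   # occupancies along one ray
p = first_hit_probabilities(occ)
print(p, 1.0 - p.sum())                # leftover mass: the ray escapes
```

This visibility chain is what couples a pixel's observed color to the occupancy and color of every voxel along its ray, and what lets the model represent, rather than suppress, reconstruction ambiguity.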
Shichao Yang, Carnegie Mellon University, USA
Joint semantic and geometric SLAM using high level features
In this talk, we will present a monocular SLAM system that uses high-level features such as lines, planes, and objects. It improves over traditional SLAM by working more robustly in challenging low-texture environments while producing a dense yet compact map. There has been great progress in image-based semantic understanding, such as object detection and layout estimation, though layout estimation is usually limited to Manhattan, box-shaped rooms; we propose improvements to make it more robust to various indoor scenarios. The lines and planes from single-image understanding are then further optimized by multi-view SLAM. Experiments demonstrate that semantic understanding and SLAM benefit each other: the combined system tracks the camera more accurately and generates more meaningful maps.
Alejandro Perez-Yus, University of Zaragoza, Spain
Wide RGB-D for Scaled Layout Reconstruction
The possibilities of RGB-D cameras are often diminished by the narrow field of view most current systems have. In this talk, we propose a method that uses a wide-angle camera (e.g. a fisheye) to extend the basic depth information to its whole field of view in a single shot. This is achieved by estimating the best-fitting Manhattan layout of the scene, combining line intersections from the fisheye image with planes from the depth information. The resulting 3D reconstruction combines the initial depth with the layout of the wider scene at its real scale.
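Once a layout plane is known, assigning metric depth to fisheye rays outside the depth sensor's field of view reduces to ray-plane intersection. A minimal sketch of that geometric step (our own illustration, under the assumption of a camera at the origin):

```python
import numpy as np

def depth_from_plane(n, d, v):
    """Distance along unit ray v (camera at origin) to the plane n . p + d = 0.
    Returns inf for rays parallel to the plane or pointing away from it."""
    denom = n @ v
    if abs(denom) < 1e-9:
        return np.inf
    t = -d / denom
    return t if t > 0 else np.inf

# A wall at x = 3 (n = [1, 0, 0], d = -3), viewed along a 45-degree ray.
v = np.array([1.0, 1.0, 0.0])
v /= np.linalg.norm(v)
t = depth_from_plane(np.array([1.0, 0.0, 0.0]), -3.0, v)
print(t)   # 3 * sqrt(2) ~= 4.243
```

Applying this to every fisheye ray against the estimated Manhattan layout yields scaled depth over the full wide-angle field of view.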