Generating Synthetic RGB-D Datasets for Texture-less Surfaces Reconstruction

Samples from the synthetic dataset. There are 35 different texture-less objects in the dataset, which are enumerated in the top seven rows of the figure. The last row shows selected objects from the dataset with color enabled in the RGB image.

Abstract

Monocular 3D reconstruction is a major computer vision task. State-of-the-art approaches mainly focus on datasets with highly textured images; most of these methods are trained on datasets like ShapeNet, which contain rendered images of well-textured objects. However, in real scenes, many objects are texture-less and difficult to reconstruct. Unlike textured surfaces, the reconstruction of texture-less surfaces has not received as much attention, mainly because of a lack of large-scale annotated datasets. Some recent works have focused on texture-less surfaces as well, many of which are trained on a small real-world dataset containing 26k images of 5 different texture-less clothing items. To facilitate further research in this direction, we present a dataset generation strategy for texture-less images. We also make available a large dataset containing 302k images with corresponding ground-truth depth maps and surface normal maps. In addition to clothing items, our dataset also contains images of more everyday objects, including animals, furniture, statues, vehicles, and other miscellaneous items. There are 35 different objects in total. This will enable future work on reconstructing a wider variety of texture-less surfaces.

In this paper, we introduce two new datasets. The first is a large dataset of synthetic texture-less 3D objects, rendered as images in Blender with no textures and under different lighting setups; it contains 302k samples of 35 different objects. The second is a small supplementary dataset of real-world objects containing 4k samples. For both datasets, ground-truth depth maps and surface normal maps are provided in addition to the RGB images. The datasets can be downloaded on this page.

Synthetic RGB-D Dataset of Texture-less Surfaces

The dataset includes 35 different 3D models of varying degrees of realism in terms of deformations and polygon count. See the table below for a summary of the included objects.

| Category  | Objects                                                      | # of Objects |
| --------- | ------------------------------------------------------------ | ------------ |
| animals   | asian_dargon, bunny, cats, dragon, duck, pig                 | 6            |
| clothing  | cape, dress, hoodie, jacket, shirt, suit, tracksuit, tshirt  | 8            |
| furniture | armchair, bed, chair, rocking_chair, sofa, table             | 6            |
| misc      | diego, kettle, plants, skeleton, teapot                      | 5            |
| statues   | armadillo, buddha, lucy, roman, thai_statue                  | 5            |
| vehicles  | bicycle, car, jeep, ship, spaceship                          | 5            |


We rendered these 3D objects as images using the 3D modeling software Blender, with no textures and under different lighting setups. The intended use of the dataset is 3D reconstruction from a single RGB image, and each sample is labeled with a ground-truth (GT) depth map and normal map.

Data Acquisition

The dataset was rendered in Blender 2.93.6 as RGB images and corresponding depth map arrays with a resolution of 512 x 512 px. Surface normals were computed by differentiating the depth map.
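As an illustration of that last step, surface normals can be approximated from a depth map with finite differences. The NumPy sketch below shows the idea in image-space units (ignoring camera intrinsics); it is not necessarily the exact procedure used during rendering.

```python
import numpy as np

def normals_from_depth(depth):
    """Approximate per-pixel surface normals from a depth map
    using finite differences along the image axes."""
    dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    # A surface z = f(x, y) has a normal proportional to (-df/dx, -df/dy, 1).
    normals = np.dstack((-dz_dx, -dz_dy, np.ones_like(dz_dx)))
    # Normalize each normal vector to unit length.
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    return normals  # H x W x 3 array of unit normals
```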

The Blender scene used for data generation. The 3D model is surrounded by four different lights and three different cameras. The ambient floor light always stays on, whereas different combinations of the remaining lights are used to render the scene, looking at the object from one camera at a time. Sequences are named after the configuration used.

Each sequence renders a high-polygon 3D model of a common everyday object with realistic deformations, gradually rotated through a full 360° turn about its vertical axis. One sample is saved at each degree of rotation, ensuring sufficiently different samples while still capturing the object completely from all sides. This process is repeated in various configurations, as detailed in the following subsections. We obtain 8,640 samples per object, and with 35 objects in total, a dataset of 302,400 labeled RGB-D samples is generated.
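A minimal sketch of this rotate-and-render loop using Blender's Python API is shown below. The object name and output path are placeholders, not the names used by the released generation script.

```python
import math
import bpy

scene = bpy.context.scene
obj = bpy.data.objects["model"]  # placeholder name for the loaded 3D model

# Render one sample at each degree of a full rotation about the vertical axis.
for deg in range(360):
    obj.rotation_euler[2] = math.radians(deg)
    scene.render.filepath = f"//renders/sample_{deg:03d}.png"  # path relative to the .blend file
    bpy.ops.render.render(write_still=True)
```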

Lighting

Five different lighting combinations are used, with each setup producing different shadows and making the dataset robust to lighting variation. There are four lights in the scene: a cool-blue, slightly tilted sunlight far above the object; two pale-yellow halogen lamps facing the object from the front, on the right and left respectively; and another halogen lamp facing the object from behind. The following combinations of lights are used:

  1. Ls: Sunlight only.
  2. Ll: Front-left lamp (plus sunlight).
  3. Lr: Front-right lamp (plus sunlight).
  4. Lb: Back lamp (plus sunlight).
  5. La: All lamps (plus sunlight).

This way, each sequence has at least two light sources, and a wide variety of shadows is generated on the same surfaces.
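As a minimal sketch, these combinations could be toggled through Blender's Python API as below; the lamp object names are placeholder assumptions, and the released script may organize this differently.

```python
import bpy

# Placeholder names for the three halogen lamps; the sun and the ambient
# floor light stay on in every configuration.
LAMPS = ("lamp_left", "lamp_right", "lamp_back")
CONFIGS = {
    "Ls": (),              # sunlight only
    "Ll": ("lamp_left",),  # front-left lamp (plus sunlight)
    "Lr": ("lamp_right",), # front-right lamp (plus sunlight)
    "Lb": ("lamp_back",),  # back lamp (plus sunlight)
    "La": LAMPS,           # all lamps (plus sunlight)
}

def apply_lighting(config):
    """Hide every lamp that is not part of the chosen configuration."""
    for name in LAMPS:
        bpy.data.objects[name].hide_render = name not in CONFIGS[config]
```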

Perspective

The camera is positioned directly in front of the object, and its height and viewing angle are adjusted in three different configurations:

  1. front: Same height as the object, looking directly at it.
  2. down: Above the object, looking down at it.
  3. up: Below the object, looking up towards it.

The exact camera angles for the down and up views, as well as the camera's distance from the object, vary per object depending on the shape and size of the model.
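For illustration, the three viewpoints might be configured as in the sketch below. The locations and tilt angles are placeholder values, since the real ones are tuned per object.

```python
import math
import bpy

# Placeholder camera poses for the three viewpoints (tuned per object in practice).
CAMERAS = {
    "front": {"location": (0.0, -6.0, 1.0), "tilt_deg": 90.0},
    "down":  {"location": (0.0, -6.0, 4.0), "tilt_deg": 65.0},
    "up":    {"location": (0.0, -6.0, -1.0), "tilt_deg": 110.0},
}

def apply_camera(view):
    cam = bpy.data.objects["Camera"]
    cfg = CAMERAS[view]
    cam.location = cfg["location"]
    # Tilt around the X axis so the camera looks down or up at the object.
    cam.rotation_euler = (math.radians(cfg["tilt_deg"]), 0.0, 0.0)
    bpy.context.scene.camera = cam
```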

Color

All sequences are rendered once using a bare, colorless model with no texture, and once more with a diffuse material of a random but uniform color applied to the whole surface.
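A minimal sketch of assigning such a random uniform diffuse color in Blender is shown below; the material name and shading details are assumptions rather than the released script's exact setup.

```python
import random
import bpy

def apply_random_color(obj):
    """Replace the object's materials with a single uniform, randomly colored one."""
    mat = bpy.data.materials.new(name="uniform_color")
    mat.diffuse_color = (random.random(), random.random(), random.random(), 1.0)  # RGBA
    obj.data.materials.clear()
    obj.data.materials.append(mat)
```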

Data Sources

These models were obtained from several publicly available sources, as listed in the following subsections.

The Stanford 3D Scanning Repository

Models obtained from this repository include 5 Stanford models (bunny, dragon, buddha, armadillo, lucy) and 2 XYZ RGB models (asian_dargon, thai_statue).

Keenan’s 3D Model Repository

This repository was published by Keenan Crane of Carnegie Mellon University under the CC0 1.0 Universal (CC0 1.0) Public Domain License. The duck, pig, skeleton, and diego models were obtained from here.

Other Sources

The teapot is Martin Newell's Utah Teapot, and the remaining 23 models were all obtained for free from CGTrader under a Royalty Free License. A complete list of sources for each individual model can be found here.

Supplementary Dataset of Real Objects

A small supplementary dataset containing 4k samples of real-world objects captured with a Microsoft Kinect camera is also provided.

Source Code

All code used for data generation as well as PyTorch data loaders for reading the dataset are available on GitHub.
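For reference, a minimal PyTorch Dataset for one rendered sequence might look like the sketch below. The directory layout and file formats are assumptions, so prefer the official loaders in the repository.

```python
import os
from glob import glob

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class TexturelessSamples(Dataset):
    """Minimal loader sketch. Assumed layout per sequence:
    root/rgb/*.png, root/depth/*.npy, root/normal/*.npy."""

    def __init__(self, root):
        self.root = root
        self.stems = [
            os.path.splitext(os.path.basename(p))[0]
            for p in sorted(glob(os.path.join(root, "rgb", "*.png")))
        ]

    def __len__(self):
        return len(self.stems)

    def __getitem__(self, idx):
        stem = self.stems[idx]
        rgb = np.asarray(
            Image.open(os.path.join(self.root, "rgb", stem + ".png")).convert("RGB"),
            dtype=np.float32,
        ) / 255.0
        depth = np.load(os.path.join(self.root, "depth", stem + ".npy"))
        normal = np.load(os.path.join(self.root, "normal", stem + ".npy"))
        return {
            "rgb": torch.from_numpy(rgb).permute(2, 0, 1),                # 3 x H x W
            "depth": torch.from_numpy(depth).float().unsqueeze(0),        # 1 x H x W
            "normal": torch.from_numpy(normal).float().permute(2, 0, 1),  # 3 x H x W
        }
```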

License

Our datasets are available under the CC BY 4.0 license (see the license summary). The source code is provided under the MIT License.

Acknowledgements

This dataset was collected as part of a research project at the German Research Center for Artificial Intelligence (DFKI).