The dataset features several common 3D objects rendered as images in Blender, without textures and under varying lighting conditions. Depth and normal maps are provided.
The intended use for the dataset is 3D reconstruction from a single RGB image; each sample is labelled with a ground-truth (GT) depth map and normal map.
The dataset was rendered in Blender 2.93.6 as RGB images with corresponding depth-map arrays at a resolution of 512 × 512 px. Surface normals were computed from the depth map using the following algorithm:
```python
import cv2
import numpy as np

def dmap2norm(dmap):
    """Computes surface normals from a depth map.

    :param dmap: A grayscale depth map image as a numpy array of size (H, W).
    :return: The corresponding surface-normal map as a numpy array of size (H, W, 3).
    """
    # Estimate the depth gradients with Sobel filters.
    zx = cv2.Sobel(dmap, cv2.CV_64F, 1, 0, ksize=5)
    zy = cv2.Sobel(dmap, cv2.CV_64F, 0, 1, ksize=5)
    # Build (-dz/dx, -dz/dy, 1) vectors and normalize them to unit length.
    normal = np.dstack((-zx, -zy, np.ones_like(dmap)))
    n = np.linalg.norm(normal, axis=2)
    normal[:, :, 0] /= n
    normal[:, :, 1] /= n
    normal[:, :, 2] /= n
    # Offset and rescale values from [-1, 1] to [0, 1].
    normal += 1
    normal /= 2
    # Reverse the channel order for writing with OpenCV (BGR).
    return normal[:, :, ::-1]
```
Each sequence renders a high-polygon 3D model of a common everyday object with realistic deformations, gradually rotated through a full 360° turn about its own axis. One sample is saved at each degree of rotation, ensuring sufficiently different samples while still capturing the object completely from all sides. This process is repeated in various configurations, as detailed in the following subsections. We obtain 10,800 samples per object, and with 38 objects in total, a database of 410,400 labelled RGB-D samples is generated.
Five different lighting conditions are used, with each setup producing different shadows and making the dataset invariant to lighting. There are four lights in the scene: a cool-blue colored, slightly tilted sunlight far above the object; two pale-yellow halogen lamps facing the object from the front, one on the right and one on the left; and another halogen lamp facing the object from behind. The following combinations of lights are used:
Ls: Sunlight only.
Ll: Front-left lamp (plus sunlight).
Lr: Front-right lamp (plus sunlight).
Lb: Back lamp (plus sunlight).
La: All lamps (plus sunlight).
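The five setups can be summarized programmatically. Note that the light names below (`"sun"`, `"lamp_left"`, etc.) are assumptions for this sketch, not the dataset's actual identifiers; only the setup labels come from the list above.

```python
# Illustrative mapping of lighting-setup labels to active light sources.
# Light names are hypothetical; the labels (Ls, Ll, Lr, Lb, La) follow the list above.
LIGHT_SETUPS = {
    "Ls": ["sun"],
    "Ll": ["sun", "lamp_left"],
    "Lr": ["sun", "lamp_right"],
    "Lb": ["sun", "lamp_back"],
    "La": ["sun", "lamp_left", "lamp_right", "lamp_back"],
}

# The sunlight is present in every setup, so lamp shadows always combine
# with a consistent base illumination.
print(all("sun" in lights for lights in LIGHT_SETUPS.values()))  # True
```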
This way, every sequence is lit by the sunlight plus zero or more lamps, and a wide variety of shadows is generated on the same surfaces.
The camera is positioned directly in front of the object, and its height and viewing angle are adjusted in three different configurations:
front: Same height as the object, and looking directly at it.
down: Above the object and looking down at it.
up: Below the object and looking up towards it.
The exact camera angles when looking down and up, as well as the distance of the camera from the object, vary per object, depending on the shape and size of the model.
All sequences are rendered once using a bare, colorless model with no texture added, and again with a diffuse material of a random but uniform color mapped onto the whole surface.
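Putting the configurations together, the per-object sample count follows directly from the rotation steps and the lighting, camera, and material variants (a quick arithmetic check):

```python
rotation_steps = 360   # one sample per degree of rotation
light_setups = 5       # Ls, Ll, Lr, Lb, La
camera_views = 3       # front, down, up
materials = 2          # bare colorless model, and random uniform color

samples_per_object = rotation_steps * light_setups * camera_views * materials
print(samples_per_object)            # 10800

objects = 38
print(samples_per_object * objects)  # 410400
```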
The dataset includes 38 different 3D models with varying degrees of realism in terms of deformations and polygon count. See the table below for a summary of the included objects.
| Category | Objects | # of Objects |
| --- | --- | --- |
These models were obtained from several sources in the public domain, as listed in the following subsections.
Models obtained from this repository include the following 5 Stanford models and 2 XYZ RGB models:
- Stanford Bunny
- Stanford Dragon
- Happy Buddha
- Armadillo
- Lucy
- Asian Dragon
- Thai Statue
This repository was published by Keenan Crane of Carnegie Mellon University under the CC0 1.0 Universal (CC0 1.0) Public Domain License. The following 6 models were obtained from here:
- Bob
- Blub
- Spot
- Yeah Right
- Chiba City Blues
- San Diego Convention Center
The teapot model is Martin Newell’s Utah Teapot, and the remaining 24 models were all obtained for free from CGTrader under a Royalty Free License. A complete list of sources for each individual model can be found here.
This dataset was collected while working with Dr. Muhammad Zeshan Afzal at the Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI).