arXiv 2019

DeepHuman: 3D Human Reconstruction from a Single Image


Zerong Zheng¹, Tao Yu¹˒², Yixuan Wei¹, Qionghai Dai¹, Yebin Liu¹

¹ Tsinghua University   ² Beihang University



We propose DeepHuman, a deep-learning-based framework for 3D human reconstruction from a single RGB image. Since this problem is highly intractable, we adopt a stage-wise, coarse-to-fine method consisting of three steps: inner body estimation, outer surface reconstruction, and frontal surface detail refinement. Once an inner body is estimated from the given image, our method generates from it a dense semantic representation that encodes body shape and pose and bridges the 2D image plane and 3D space. An image-guided volume-to-volume translation CNN is introduced to reconstruct the outer surface given the input image and the dense semantic representation. One key feature of our network is that it fuses image features at different scales into the 3D space through volumetric feature transformation, which helps to recover details of the subject's outer surface geometry. The details on the frontal areas of the outer surface are further refined through a normal map refinement network, which is connected to the volume generation network through our proposed volumetric normal projection layer. We also contribute THuman, a dataset of approximately 7000 real-world 3D human models. The whole network is trained on data generated from this dataset. Overall, owing to the specific design of our network and the diversity of our dataset, our method reconstructs a 3D human from only a single image and outperforms state-of-the-art approaches.
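The abstract names a volumetric normal projection layer that turns the generated volume into a normal map for the 2D refinement network, but does not give its formulation. The sketch below only illustrates the general idea under our own assumptions: an occupancy volume indexed `vol[z][y][x]`, an orthographic camera at z = 0 looking along +z, a fixed 0.5 occupancy threshold, and finite-difference normals computed from the resulting depth map. It is a hard, non-differentiable first-hit projection in pure Python, whereas a layer trained end-to-end inside a network must pass gradients; the names `project_depth` and `project_normals` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Volume = List[List[List[float]]]        # occupancy values, indexed vol[z][y][x]
DepthMap = List[List[Optional[int]]]    # None where the ray hits nothing
Normal = Tuple[float, float, float]


def project_depth(vol: Volume, threshold: float = 0.5) -> DepthMap:
    """For each (y, x) ray, return the index of the first voxel along +z
    whose occupancy exceeds `threshold` (assumed orthographic camera at z = 0)."""
    depth_count, height, width = len(vol), len(vol[0]), len(vol[0][0])
    depth: DepthMap = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for z in range(depth_count):
                if vol[z][y][x] > threshold:
                    depth[y][x] = z
                    break  # first hit is the visible (frontal) surface
    return depth


def _slope(depth: DepthMap, y: int, x: int, dy: int, dx: int) -> float:
    """Finite-difference derivative of depth along (dy, dx): central when both
    neighbours lie on the surface, one-sided when only one does, else 0."""
    height, width = len(depth), len(depth[0])

    def at(yy: int, xx: int) -> Optional[int]:
        if 0 <= yy < height and 0 <= xx < width:
            return depth[yy][xx]
        return None

    centre = depth[y][x]
    fwd, bwd = at(y + dy, x + dx), at(y - dy, x - dx)
    if fwd is not None and bwd is not None:
        return (fwd - bwd) / 2.0
    if fwd is not None:
        return float(fwd - centre)
    if bwd is not None:
        return float(centre - bwd)
    return 0.0


def project_normals(vol: Volume, threshold: float = 0.5) -> List[List[Optional[Normal]]]:
    """Unit normals of the visible surface, pointing toward the camera (-z)."""
    depth = project_depth(vol, threshold)
    normals: List[List[Optional[Normal]]] = [[None] * len(depth[0]) for _ in depth]
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            if d is None:
                continue  # background pixel: no surface along this ray
            gx = _slope(depth, y, x, 0, 1)
            gy = _slope(depth, y, x, 1, 0)
            # The surface z = d(x, y) has camera-facing normal (dz/dx, dz/dy, -1).
            length = math.sqrt(gx * gx + gy * gy + 1.0)
            normals[y][x] = (gx / length, gy / length, -1.0 / length)
    return normals


if __name__ == "__main__":
    # A tilted plane whose depth equals x: voxel (z, y, x) is occupied when z >= x.
    vol = [[[1.0 if z >= x else 0.0 for x in range(4)] for y in range(3)] for z in range(6)]
    print(project_depth(vol)[1])  # [0, 1, 2, 3]
    print(project_normals(vol)[1][1])
```

Working from a depth map keeps the example short; a learned layer could instead soften the first-hit search so that the normal map stays differentiable with respect to the volume.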


Fig. 1. Illustration of the network architecture.





Fig. 2. Example results reconstructed by our method.




Fig. 3. 3D human reconstruction from single-view videos using our method.


