Learning to Infer Inner-Body under Clothing from Monocular Video

TVCG 2022

Xiongzheng Li1, Jing Huang1, Jinsong Zhang1, Xiaokun Sun1, Haibiao Xuan1, Yu-Kun Lai2, Yingdi Xie3, Jingyu Yang1, Kun Li1,*
1Tianjin University, 2Cardiff University, 3VRC Inc., *Corresponding author

Abstract

In this paper, we propose the first method that allows anyone to easily reconstruct their own 3D inner-body under everyday clothing from a self-captured video, with a mean reconstruction error of 0.73 cm within 15 s. Because the subject stays clothed, this avoids the privacy concerns that arise from capturing people nude or in minimal clothing. Specifically, we propose a novel two-stage framework consisting of a Semantic-guided Undressing Network (SUNet) and an Intra-Inter Transformer Network (IITNet). SUNet learns semantically related body features to reduce the complexity and uncertainty of directly estimating 3D inner-bodies under clothing. IITNet reconstructs the 3D inner-body model by making full use of intra-frame and inter-frame information, which addresses the misalignment caused by inconsistent poses across frames. Experimental results on both public datasets and our collected dataset demonstrate the effectiveness of the proposed method. The code and the dataset will be provided for research purposes.
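The idea of combining intra-frame and inter-frame information can be illustrated with a toy attention block. This is a minimal sketch, not the authors' IITNet: it assumes each frame has already been encoded into the same number of feature tokens, it uses plain scaled dot-product self-attention with identity query/key/value projections in place of learned ones, and the names `self_attention` and `intra_inter_block` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention.

    Assumption: identity Q/K/V projections for illustration only;
    a real transformer layer learns these projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a convex combination of the input tokens.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def intra_inter_block(video: List[List[Vector]]) -> List[List[Vector]]:
    """Hypothetical block: video[t][n] is the feature of token n in frame t."""
    # Intra-frame step: tokens attend only to tokens of the same frame.
    intra = [self_attention(frame) for frame in video]
    # Inter-frame step: for each token index, attend across all frames.
    num_frames, num_tokens = len(intra), len(intra[0])
    out = [[None] * num_tokens for _ in range(num_frames)]
    for n in range(num_tokens):
        track = [intra[t][n] for t in range(num_frames)]
        mixed = self_attention(track)
        for t in range(num_frames):
            out[t][n] = mixed[t]
    return out
```

Attending within each frame first and then across frames at each token position lets every frame's features draw on evidence from the others, which is one plausible way to soften the effect of pose differences between frames. A practical layer would add learned projections, multiple heads, and residual connections.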


Teaser


Given a self-captured video of a clothed person, our method can infer the inner-body masks under clothing and further reconstruct the 3D inner-body model with high accuracy, which enables convenient body measurement and virtual try-on applications.
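The abstract reports accuracy as a mean reconstruction error in centimetres. The exact evaluation protocol is not given on this page; a common choice, assumed here, is the mean Euclidean distance between corresponding vertices of the predicted and ground-truth meshes, which requires both meshes to share the same vertex ordering. The function name `mean_vertex_error_cm` and the metre input units are illustrative assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def mean_vertex_error_cm(pred: List[Point], gt: List[Point]) -> float:
    """Mean per-vertex Euclidean error, returned in centimetres.

    Assumptions (not from the paper): vertices are in metres and
    pred[i] corresponds to gt[i] (shared mesh topology).
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("meshes must be non-empty and have corresponding vertices")
    total_m = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return 100.0 * total_m / len(pred)
```

For example, two vertices that are off by 1 cm and 3 cm give a mean error of 2 cm.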


Framework Overview


Overview of our method.



BibTeX

@article{li2022tvcg,
    author = {Xiongzheng Li and Jing Huang and Jinsong Zhang and Xiaokun Sun and Haibiao Xuan and Yu-Kun Lai and Yingdi Xie and Jingyu Yang and Kun Li},
    title = {Learning to Infer Inner-Body under Clothing from Monocular Video},
    journal = {IEEE Transactions on Visualization and Computer Graphics},
    year = {2022},
}