AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose

Huichao Zhang¹* Bowen Chen¹* Hao Yang¹ Liao Qu¹,² Xu Wang¹
Li Chen¹ Chao Long¹ Feida Zhu¹ Daniel K. Du¹ Shilei Wen¹
*Equal Contribution.
¹ByteDance ²Carnegie Mellon University

[arXiv Report] [GitHub] [BibTeX]

🎉🎉🎉 The code has been released: AvatarVerse.

🎉🎉🎉 AvatarVerse has been accepted by AAAI 2024!

Elsa in Frozen Disney

Woody in Toy Story

Captain America

Super Saiyan Goku

Buzz Lightyear

Link in Zelda

Methodology

Creating expressive, diverse, and high-quality 3D avatars from highly customized text and pose is challenging: 3D modeling and texturing are intricate, and the result must preserve fine detail across a range of styles (realistic, fictional, etc.).

In this project, we present AvatarVerse, a stable pipeline for generating high-quality 3D avatars controlled by both text descriptions and pose guidance. At the core of the framework, we train a DensePose-conditioned 2D diffusion model that provides precise and flexible view-consistency control between 2D and 3D, even in partially observed scenarios, and thus effectively addresses the Janus problem. Our progressive high-resolution strategies further contribute a substantial improvement in avatar quality.
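To make the idea of view-aligned conditioning more concrete, here is a minimal sketch, not the released implementation. It assumes that each optimization step samples one camera, that the same camera is used to render both the avatar and its DensePose map, and that render resolution grows in equal-length stages. The function names, the angle ranges, the radius, and the 128/256/512 schedule are all hypothetical choices made for illustration.

```python
import math
import random


def camera_position(azimuth_deg, elevation_deg, radius):
    """Convert spherical camera coordinates to a Cartesian position (y-up)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = radius * math.cos(el) * math.sin(az)
    y = radius * math.sin(el)
    z = radius * math.cos(el) * math.cos(az)
    return (x, y, z)


def view_label(azimuth_deg):
    """Coarse view label; 0 degrees is assumed to face the avatar's front."""
    a = azimuth_deg % 360.0
    if a < 45.0 or a >= 315.0:
        return "front"
    if 135.0 <= a < 225.0:
        return "back"
    return "side"


def sample_view(rng, radius=2.5):
    """Sample one camera. In this sketch, the avatar render and the DensePose
    condition map would both be produced from this single camera, so the 2D
    condition stays aligned with the 3D view."""
    azimuth = rng.uniform(0.0, 360.0)
    elevation = rng.uniform(-10.0, 30.0)  # hypothetical elevation range
    return {
        "azimuth": azimuth,
        "elevation": elevation,
        "position": camera_position(azimuth, elevation, radius),
        "label": view_label(azimuth),
    }


def render_resolution(step, total_steps, stages=(128, 256, 512)):
    """Hypothetical coarse-to-fine schedule with equal-length stages."""
    stage = min(step * len(stages) // total_steps, len(stages) - 1)
    return stages[stage]


if __name__ == "__main__":
    rng = random.Random(0)
    total_steps = 6
    for step in range(total_steps):
        view = sample_view(rng)
        res = render_resolution(step, total_steps)
        print(step, res, view["label"], round(view["azimuth"], 1))
```

Because the pose map changes with the camera, a back view yields a back-facing condition rather than a second face, which is the intuition behind using dense pose guidance against the Janus problem.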

Gallery

Here we demonstrate best-quality Head-Only, Half-Body, Full-Body, and Pose-Control 3D avatars generated by our method.
Click to play the following animations.

Head-Only

Half-Body

Full-Body

Pose-Control

BibTeX

  @misc{zhang2023avatarverse,
        title={AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose},
        author={Huichao Zhang and Bowen Chen and Hao Yang and Liao Qu and Xu Wang and Li Chen and Chao Long and Feida Zhu and Kang Du and Min Zheng},
        year={2023},
        eprint={2308.03610},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
  }

The project page template is borrowed from AnimateDiff.