AvatarVerse: High-quality & Stable 3D Avatar Creation from Text and Pose

Huichao Zhang¹*  Bowen Chen¹*  Hao Yang¹  Liao Qu¹,²  Xu Wang¹
Li Chen¹  Chao Long¹  Feida Zhu¹  Daniel K. Du¹  Shilei Wen¹
*Equal contribution.
¹ByteDance  ²Carnegie Mellon University


🎉🎉🎉 The AvatarVerse code has been released.

🎉🎉🎉 AvatarVerse has been accepted to AAAI 2024!

Elsa in Frozen Disney

Woody in Toy Story

Captain America

Super Saiyan Goku

Buzz Lightyear

Link in Zelda


Creating expressive, diverse, and high-quality 3D avatars from highly customized text and poses is challenging because 3D modeling and texturing must preserve fine details across a wide range of styles (realistic, fictional, etc.).

In this project, we present AvatarVerse, a stable pipeline for generating high-quality 3D avatars controlled by both text descriptions and pose guidance. At the core of the framework is a DensePose-conditioned 2D diffusion model that we trained to provide precise and flexible 2D-3D view-consistency control, even in partially observed scenarios, thereby effectively addressing the Janus problem (the multi-face artifact common in text-to-3D generation). Our progressive high-resolution strategies further improve the quality of the avatars substantially.
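The progressive high-resolution idea can be illustrated with a coarse-to-fine schedule in which the rendering resolution used during optimization only grows as training proceeds. The sketch below is a minimal, hypothetical illustration: the `Stage` class, the `render_resolution` function, and every step and resolution value are assumptions made for this example and are not taken from the AvatarVerse code or paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of a HYPOTHETICAL coarse-to-fine schedule (not AvatarVerse's)."""
    start_step: int   # first optimization step at which this stage applies
    render_res: int   # side length, in pixels, of the rendered training view


# Illustrative values only; stages must be sorted by start_step.
SCHEDULE = [
    Stage(start_step=0, render_res=64),
    Stage(start_step=2000, render_res=128),
    Stage(start_step=5000, render_res=256),
    Stage(start_step=8000, render_res=512),
]


def render_resolution(step: int, schedule: list[Stage] = SCHEDULE) -> int:
    """Return the render resolution for a given optimization step.

    Picks the last stage whose start_step <= step, so the resolution
    never decreases as optimization proceeds.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    res = schedule[0].render_res
    for stage in schedule:
        if step >= stage.start_step:
            res = stage.render_res
        else:
            break  # later stages have not started yet
    return res


if __name__ == "__main__":
    for s in (0, 1999, 2000, 6000, 10000):
        print(s, render_resolution(s))
```

Keeping the schedule monotonic lets early low-resolution steps settle the coarse shape cheaply while later high-resolution steps spend compute on detail; how AvatarVerse actually schedules resolution should be checked against its released code.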


Here we show the best-quality Head-Only, Half-Body, Full-Body, and Pose-Control 3D avatars generated by our method.






  title={AvatarVerse: High-quality \& Stable 3D Avatar Creation from Text and Pose},
  author={Huichao Zhang and Bowen Chen and Hao Yang and Liao Qu and Xu Wang and Li Chen and Chao Long and Feida Zhu and Kang Du and Min Zheng},

The project page template is borrowed from AnimateDiff.