TextMesh: Generation of Realistic 3D Meshes From Text Prompts

Christina Tsalicoglou1, *     Fabian Manhardt2     Alessio Tonioni2     Michael Niemeyer2     Federico Tombari2,3    

1ETH Zurich     2Google    3Technical University of Munich

* Work done while interning at Google.

Accepted at 3DV
A great open-source reimplementation of the first stage, based on Neuralangelo and DeepFloyd IF, can be found at threestudio.


The ability to generate highly realistic 2D images from mere text prompts has recently made huge progress in terms of speed and quality, thanks to the advent of image diffusion models. Naturally, the question arises whether this can also be achieved for the generation of 3D content from such text prompts. To this end, a new line of methods has recently emerged that harnesses diffusion models, trained on 2D images, to supervise 3D model generation using view-dependent prompts. While achieving impressive results, these methods have two major drawbacks. First, rather than the commonly used 3D meshes, they generate neural radiance fields (NeRFs), making them impractical for most real applications. Second, these approaches tend to produce over-saturated models, giving the output a cartoonish look. Therefore, in this work we propose a novel method for the generation of highly realistic-looking 3D meshes. To this end, we extend NeRF to employ an SDF backbone, leading to improved 3D mesh extraction. In addition, we propose a novel way to finetune the mesh texture, removing the effect of high saturation and improving the details of the output 3D mesh.

Exemplary Generated 3D Meshes

A frog wearing a red sweater

An animal with the head of a rabbit, the body of a squirrel, the antlers of a deer, and legs of a pheasant

A lemur writing into a notepad

A squirrel-octopus hybrid

A beautifully carved wooden knight chess piece

A 3D model of an adorable cottage with a thatched roof

A small, marble statue of a cat, sitting on a mat and licking its paws

A chimpanzee dressed as a football player

DSLR photo of a chimpanzee holding a cup of hot coffee

DSLR photo of a marble bust of a fox head

Reference

Paper PDF | arXiv
@inproceedings{tsalicoglou2024textmesh,
    title={TextMesh: Generation of Realistic 3D Meshes From Text Prompts}, 
    author={Christina Tsalicoglou and Fabian Manhardt and Alessio Tonioni and Michael Niemeyer and Federico Tombari},
    booktitle={International Conference on 3D Vision (3DV)},
    year={2024},
}
    

Contact

For questions regarding the method, please contact us.