MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets

(*Equal contribution ✉Corresponding author)
1Beijing University of Posts and Telecommunications, 2Hong Kong Polytechnic University, 3Institute of Automation, Chinese Academy of Sciences, 4CAIR, HKISI-CAS, 5University of Chinese Academy of Sciences, 6Beijing University of Aeronautics and Astronautics, 7University of Science and Technology Beijing

Renderings of processed assets with material information under different lighting conditions. Given a raw asset, our MaterialSeg3D actively predicts and refines dense, explicit surface materials based on 2D priors. Equipped with material definitions, 3D assets support physically based rendering, leading to photorealistic visual effects.

Abstract

Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting 2D generative priors into 3D space. However, such a 2D generative image prior bakes the effects of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably contain spuriously correlated components. The absence of a precise material definition makes it infeasible to relight the generated assets reasonably in novel scenes, which limits their application in downstream scenarios. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of an object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework that infers the underlying material from 2D semantic priors. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then apply region unification to ensure the coherence of object parts. To fuel the learning of the semantic prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
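
To make the fusion step concrete, the following is a minimal NumPy sketch of weighted voting over a UV stack. It assumes each view has already been unprojected into a per-texel material-label UV map with accompanying confidence weights; it illustrates the idea and is not the authors' released implementation.

import numpy as np

def fuse_uv_stack(label_stack, weight_stack, num_classes, ignore_label=-1):
    """Fuse per-view material labels into a single UV map by weighted voting.

    label_stack:  (V, H, W) integer array of material classes per texel and view;
                  texels not visible from a view carry ignore_label.
    weight_stack: (V, H, W) float array of per-texel confidence weights.
    Returns an (H, W) integer array of fused material labels.
    """
    num_views, height, width = label_stack.shape
    votes = np.zeros((num_classes, height, width), dtype=np.float32)
    for v in range(num_views):
        # Accumulate each view's weight into the bin of the class it predicts.
        ys, xs = np.nonzero(label_stack[v] != ignore_label)
        votes[label_stack[v, ys, xs], ys, xs] += weight_stack[v, ys, xs]
    fused = votes.argmax(axis=0)
    fused[votes.sum(axis=0) == 0] = ignore_label  # texels unseen from every view
    return fused

The weights can encode, for example, viewing-angle foreshortening or segmentation confidence, so that oblique or uncertain observations contribute less to the vote.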

Comparison

Overall framework of the MaterialSeg3D workflow. The material segmentation model is trained on MIO beforehand. Multi-view renderings are first generated from pre-defined and randomly selected camera angles, then processed by the material segmentation model, and the predicted labels are unprojected onto a stack of temporary UV maps. Weighted voting and region unification are then applied to produce the final material UV map.
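
The region unification step can likewise be sketched as a per-region majority vote over the fused UV labels. The snippet below is only an illustrative sketch: it assumes a per-texel region-id map (e.g., derived from UV islands or a part segmentation) is available, which is an assumption of this example rather than a published interface.

import numpy as np

def unify_regions(fused_labels, region_ids, weights=None, ignore_label=-1):
    """Assign every texel of a region the region's (weighted) majority material."""
    unified = fused_labels.copy()
    if weights is None:
        weights = np.ones(fused_labels.shape, dtype=np.float32)
    for region in np.unique(region_ids):
        mask = region_ids == region
        labels = fused_labels[mask]
        valid = labels != ignore_label
        if not valid.any():
            continue  # region never observed from any view; leave it unlabeled
        # Weighted majority vote over the labeled texels of this region.
        majority = np.bincount(labels[valid],
                               weights=weights[mask][valid]).argmax()
        unified[mask] = majority
    return unified

Voting per region rather than per texel suppresses isolated mislabeled texels and keeps each object part assigned to a single material, matching the coherence goal described above.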

Video Introduction

Visualizing Performances

To evaluate the effectiveness of the proposed material generation method, we compare against previous approaches from three perspectives: single-image-to-3D generation methods, texture generation methods, and public 3D assets. The corresponding results are shown in the figure. For single-image-to-3D generation, we compare with the state-of-the-art Wonder3D, TripoSR, and OpenLRM. Given a reference view as input, each of these methods generates a 3D object with a texture matching the reference. We can observe that MaterialSeg3D significantly outperforms these prior works owing to its use of a well-defined 3D mesh and albedo information. For a fair comparison with texture generation methods, we adapt Fantasia3D, Text2Tex, and the online service provided by Meshy (https://app.meshy.ai/). Given a well-defined geometry mesh, these methods produce texturing results from a text prompt, as shown in the figure. The results demonstrate that our method yields far more realistic renderings under different lighting conditions. Note that for Fantasia3D, we only adopt its texture generation (Appearance Modeling) stage in this comparison. Moreover, we also provide material generation results for 3D assets obtained from public websites such as tripo3d (https://www.tripo3d.ai/app/) and TurboSquid (https://www.turbosquid.com/). From the results in the figure, we can observe that MaterialSeg3D generates precise PBR material information while significantly improving the overall quality of the assets.

Quantitative Results


Furthermore, we provide quantitative results comparing our method with existing image-to-3D methods, including Wonder3D, TripoSR, and OpenLRM. We adopt CLIP Similarity, PSNR, and SSIM as evaluation metrics; the corresponding results are shown in the table. We choose assets from the Objaverse-1.0 dataset as test samples and randomly select three camera angles as novel views. The ground-truth reference and novel views are rendered from assets with ground-truth material information under fixed lighting conditions. Given a well-defined 3D mesh and albedo, our workflow provides reliable PBR materials, resulting in more realistic renderings.
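
For reference, below is a hedged sketch of how such metrics can be computed between a ground-truth rendering and a rendering of the predicted asset. The CLIP backbone (ViT-B/32) and the use of scikit-image for PSNR/SSIM are assumptions of this example; the exact test protocol (views, lighting, resolution) follows the paper, not this snippet.

import numpy as np
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(gt_path, pred_path):
    """Compute PSNR, SSIM, and CLIP similarity for one view pair (same resolution)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    gt = np.asarray(Image.open(gt_path).convert("RGB"))
    pred = np.asarray(Image.open(pred_path).convert("RGB"))

    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)

    # CLIP similarity: cosine similarity between the two image embeddings.
    model, preprocess = clip.load("ViT-B/32", device=device)
    with torch.no_grad():
        feats = model.encode_image(torch.stack([
            preprocess(Image.fromarray(gt)),
            preprocess(Image.fromarray(pred)),
        ]).to(device))
    feats = feats / feats.norm(dim=-1, keepdim=True)
    clip_sim = (feats[0] @ feats[1]).item()
    return psnr, ssim, clip_sim

In practice, the scores would be averaged over all test assets and over the reference view plus the three randomly selected novel views.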

BibTeX

@article{li2024materialseg3d,
  title     = {MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets},
  author    = {Li, Zeyu and Gan, Ruitong and Luo, Chuanchen and Wang, Yuxi and Liu, Jiaheng and Zhang, Ziwei Zhu Man and Li, Qing and Yin, Xucheng and Zhang, Zhaoxiang and Peng, Junran},
  journal   = {arXiv preprint arXiv:2404.13923},
  year      = {2024}
}

This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
