CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

CVPR 2024


Abstract

We propose CorrespondentDream, an effective method that leverages annotation-free, cross-view correspondences obtained from the diffusion U-Net to provide an additional 3D prior for the NeRF optimization process. We find that these correspondences are strongly consistent with human perception, and by adopting them in our loss design, we are able to produce NeRF models whose geometries are more coherent with common sense, e.g., smoother object surfaces, yielding higher 3D fidelity. We demonstrate the efficacy of our approach through a range of qualitative comparisons.

Architecture
We optimize the NeRF by alternating between the SDS loss and the cross-view correspondence loss. To compute the cross-view correspondence loss, we render two adjacent view sets from the NeRF with identical noise and feed them into a frozen pre-trained multi-view diffusion model. We then extract multi-layer features from the diffusion U-Net’s upsampling layers to establish diffusion correspondences between each view pair. Using the ground-truth camera parameters and the NeRF-rendered depth, we reproject pixels from one view into the other to obtain NeRF correspondences. By minimizing the discrepancy between the diffusion correspondences (the pseudo ground-truth) and the NeRF correspondences, we correct the 3D infidelities in the NeRF’s depth.
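The three ingredients described above (feature matching, depth-based reprojection, and the discrepancy between the two) can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function names `match_features`, `reproject`, and `correspondence_loss` are hypothetical, the matching is a plain cosine-similarity nearest neighbour, the camera model is assumed to be an OpenCV-style pinhole (x right, y down, z forward) with camera-to-world poses, and the loss is assumed to be a mean Euclidean pixel distance. The actual method may filter matches and weight the loss differently.

```python
import numpy as np

def match_features(feat_a, feat_b):
    """Diffusion correspondences (sketch): for each pixel of view A, pick the
    pixel of view B with the highest cosine similarity.
    feat_a, feat_b: (H, W, C) feature maps. Returns (H, W, 2) as (x, y) in B."""
    h, w, c = feat_a.shape
    a = feat_a.reshape(-1, c)
    b = feat_b.reshape(-1, c)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    idx = (a @ b.T).argmax(axis=1)      # best match in B for every pixel of A
    ys, xs = np.divmod(idx, w)          # flat index -> (row, column)
    return np.stack([xs, ys], axis=-1).reshape(h, w, 2).astype(np.float64)

def reproject(depth_a, K, pose_a, pose_b):
    """NeRF correspondences (sketch): lift pixels of view A to 3D with the
    rendered depth, then project them into view B.
    depth_a: (H, W) z-depth; K: (3, 3) intrinsics shared by both views
    (assumption); pose_a, pose_b: (4, 4) camera-to-world matrices."""
    h, w = depth_a.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)
    pix = pix.reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T               # camera-space rays with z = 1
    pts_a = rays * depth_a.reshape(-1, 1)         # 3D points in camera A
    pts_h = np.concatenate([pts_a, np.ones((h * w, 1))], axis=1)
    a_to_b = np.linalg.inv(pose_b) @ pose_a       # camera A -> world -> camera B
    pts_b = (pts_h @ a_to_b.T)[:, :3]
    proj = pts_b @ K.T
    uv = proj[:, :2] / proj[:, 2:3]               # perspective divide
    return uv.reshape(h, w, 2)

def correspondence_loss(nerf_uv, diffusion_uv):
    """Mean pixel distance between NeRF and diffusion correspondences
    (assumed form of the discrepancy, for illustration only)."""
    return np.linalg.norm(nerf_uv - diffusion_uv, axis=-1).mean()
```

In an actual training loop the depth would come from a differentiable renderer and the loss would be written in an autodiff framework so that gradients flow back into the NeRF through the reprojected coordinates, while the diffusion correspondences are held fixed as the pseudo ground-truth.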

Top: Baseline (MVDream)

Bottom: Ours (CorrespondentDream)

CorrespondentDream generates multi-view consistent objects and scenes with improved 3D fidelity

Zoom in for clearer visualization of the infidelities in 3D geometry such as concavities / missing surfaces.

A bichon frise wearing academic regalia
A cute steampunk elephant
A DSLR photo of a corgi puppy
a zoomed out DSLR photo of a pug made out of modeling clay
a zoomed out DSLR photo of a yorkie dog dressed as a maid
an astronaut riding a horse
an orangutan holding a paint palette in one hand and a paintbrush in the other hand
Wall-E, cute, render, super detailed, best quality, 4K, HD
A zoomed out DSLR photo of a corgi wearing a top hat