Correlation-aware Encoder-Decoder with Adapters for SVBRDF Acquisition
Proceedings of SIGGRAPH Asia 2024
-
Di Luo*
Nankai University
-
Hanxiao Sun*
Nankai University
-
Lei Ma
Peking University
-
Jian Yang
Nankai University
-
Beibei Wang✝
Nanjing University
Abstract
Capturing materials from the real world avoids laborious manual material authoring. However, recovering high-fidelity Spatially Varying Bidirectional Reflectance Distribution Function (SVBRDF) maps from a few captured images is challenging due to the ill-posed nature of the problem. Existing approaches make extensive efforts to alleviate this ambiguity, either by leveraging generative models with latent-space optimization or by extracting features with various encoder-decoder architectures. Although the resulting renderings at the input views can match the input images, the problematic decomposition among the maps leads to significant differences when rendering under novel views and lighting. We observe that, for human eyes, besides the individual images, the correlation among the input images (i.e., the variation of highlights across them) also serves as an important cue for recognizing the materials of objects. Hence, our key insight is to explicitly model this correlation in the SVBRDF acquisition network. To this end, we propose a correlation-aware encoder-decoder network that models the correlation features among the input images via a graph convolutional network, treating the channel features from each image as graph nodes. This reduces the ambiguity among the maps significantly. However, several SVBRDF maps still tend to be over-smooth, leading to a mismatch in novel-view renderings. The main reason is the uneven update of different maps caused by using a single decoder to interpret all maps. To address this issue, we further design an adapter-equipped decoder consisting of a main decoder and four tiny per-map adapters, where the adapters interpret the individual maps and can be fine-tuned to enhance flexibility. As a result, our framework allows optimization of the latent space, with the input image feature embeddings as the initial latent vector, followed by fine-tuning of the per-map adapters. Consequently, our method outperforms existing approaches both visually and quantitatively on synthetic and real data.
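The correlation modeling described above can be illustrated with a minimal NumPy sketch: each input image contributes its channel features as graph nodes, and a single graph-convolution layer propagates information across the images. All names, shapes, and the fully connected adjacency are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

# Toy setup: 4 input images, each contributing C = 8 channel-feature nodes,
# with all nodes connected so correlation flows across images (assumed layout).
rng = np.random.default_rng(0)
n_images, C, F = 4, 8, 16
n_nodes = n_images * C
H = rng.standard_normal((n_nodes, F))              # node features
A = np.ones((n_nodes, n_nodes)) - np.eye(n_nodes)  # fully connected graph
W = rng.standard_normal((F, F)) * 0.1              # learnable weights (random here)
H_out = gcn_layer(H, A, W)
print(H_out.shape)  # (32, 16)
```

In the actual network this layer would sit at the front of the encoder, before the NAFNet-based feature encoder; the toy graph here only demonstrates the node construction.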
Pipeline
Structure: our network has an encoder-decoder structure. The encoder consists of a graph convolutional network that learns the correlation among the input images, followed by an encoder from the Nonlinear Activation Free Network (NAFNet) that encodes the features into a latent vector z. The decoder includes a material decoder together with several map adapters that output the SVBRDFs.
Training: the network is trained end-to-end with the rendering loss and the map loss, and all components are updated.
Optimization: during optimization, the network is optimized for each material with the rendering loss. The input images are fed into the encoder to obtain an initial latent vector z, and the decoder then performs latent-space optimization (①) starting from z for several iterations. Afterwards, the map adapters are fine-tuned (②) for 1K iterations with the found latent vector and the material decoder both frozen, to output the final SVBRDFs.
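The two optimization stages can be sketched with a toy linear stand-in: stage ① updates only the latent vector z through a frozen decoder, and stage ② freezes z and the decoder and fine-tunes tiny per-map adapters (here, the simplest possible adapter: one additive residual per map). The decoder, loss, and step sizes are all assumptions for illustration, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim, map_dim, n_maps = 16, 32, 4

# Frozen "material decoder": latent vector -> per-map features (toy linear stand-in).
W_dec = rng.standard_normal((latent_dim, n_maps * map_dim)) * 0.1
target = rng.standard_normal(n_maps * map_dim)     # stand-in for the rendering target

def loss_and_grad(pred, target):
    r = pred - target
    return float((r ** 2).mean()), 2.0 * r / r.size

# Stage ①: latent-space optimization -- update only z, adapters stay at identity.
z = rng.standard_normal(latent_dim)
for _ in range(500):
    loss, g = loss_and_grad(z @ W_dec, target)
    z -= 20.0 * (g @ W_dec.T)                      # gradient step on z only

# Stage ②: freeze z and the decoder, fine-tune the tiny per-map adapters.
adapters = np.zeros((n_maps, map_dim))
for _ in range(500):
    pred = (z @ W_dec).reshape(n_maps, map_dim) + adapters
    loss, g = loss_and_grad(pred.ravel(), target)
    adapters -= 20.0 * g.reshape(n_maps, map_dim)  # update adapters only

print(f"final loss: {loss:.3e}")
```

The point of the split is visible even in the toy: the low-dimensional latent cannot fit the target exactly, while the per-map adapters add just enough per-map freedom to close the remaining gap without retraining the shared decoder.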
Multi-image SVBRDF recovery
Comparison between our method, MaterialGAN, and DIR on synthetic and real data, where the number of input images is set to four. Our model outperforms the other methods in terms of both the recovered SVBRDFs and the renderings. The error maps show the difference between the novel-view renderings and the reference images; the lowest error is marked in bold.
Single image SVBRDF recovery
Comparison between our method and DIR, MaterialGAN, LAT, and DeepBasis on synthetic and real data, with a single image as input. On the synthetic data, our model produces the closest SVBRDF maps in most cases, resulting in the highest-quality renderings at both the input and novel views. On the real data, our method suffers less highlight burn-in than the other methods, leading to the lowest error in the novel-view renderings.
Anisotropic materials
We validate our method on anisotropic materials, where the roughness is encoded in the red/green channels, following OpenSVBRDF. The renderings at both the input and novel views closely match the ground truth.
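As a concrete illustration of the red/green roughness encoding, the sketch below evaluates the standard anisotropic GGX normal distribution, reading αx from the red channel and αy from the green channel. The channel assignment and per-pixel values are assumptions for illustration; the paper does not specify this exact mapping.

```python
import numpy as np

def anisotropic_ggx_ndf(h, alpha_x, alpha_y):
    """Anisotropic GGX NDF; half-vector h = (hx, hy, hz) is expressed in the
    local tangent frame, with z aligned to the shading normal."""
    hx, hy, hz = h
    denom = (hx / alpha_x) ** 2 + (hy / alpha_y) ** 2 + hz ** 2
    return 1.0 / (np.pi * alpha_x * alpha_y * denom ** 2)

# Roughness read from the red/green channels of the roughness map
# (hypothetical per-pixel values; alpha_x from R, alpha_y from G).
roughness_rg = np.array([0.2, 0.6])
alpha_x, alpha_y = roughness_rg
h = np.array([0.1, 0.2, 0.97])
h = h / np.linalg.norm(h)
print(anisotropic_ggx_ndf(h, alpha_x, alpha_y))
```

When αx = αy the expression reduces to the familiar isotropic GGX lobe, so the two-channel encoding is a strict generalization of a single-channel roughness map.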
Other Results
Citation
Acknowledgements
The website template was borrowed from BakedSDF.