Overview of NAS3R. Unconstrained images are patchified into visual tokens and concatenated with a learnable camera token for camera prediction. A masked decoder regulates cross-view interactions and prevents target-to-context leakage. Refined context tokens are then processed by a Gaussian head to predict Gaussian parameters, while a depth head estimates depth maps that are lifted into 3D Gaussian centers using the predicted context poses. The predicted target poses are finally used to render novel views, providing photometric supervision for end-to-end training.
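The "lifting" step above is standard pinhole unprojection: each pixel's ray is scaled by its predicted depth and transformed into the world frame by the predicted camera pose. A minimal sketch of that step is below; the function name, the use of explicit intrinsics `K`, and a 4x4 camera-to-world matrix are assumptions for illustration, not NAS3R's actual interface.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to world-frame 3D points, which a model
    like NAS3R would use as Gaussian centers.

    depth        : (H, W) per-pixel depth along the camera z-axis
    K            : (3, 3) camera intrinsics (assumed known here)
    cam_to_world : (4, 4) predicted camera-to-world pose
    returns      : (H, W, 3) 3D points in the world frame
    """
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates: u is the column, v the row.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    # Back-project pixels to camera-frame rays, then scale by depth.
    rays = np.linalg.inv(K) @ pix
    pts_cam = rays * depth.reshape(1, -1)
    # Transform camera-frame points into the world frame.
    pts_h = np.vstack([pts_cam, np.ones((1, H * W))])
    pts_world = (cam_to_world @ pts_h)[:3].T
    return pts_world.reshape(H, W, 3)
```

With identity intrinsics and pose, a unit-depth map maps pixel (u, v) to the point (u, v, 1), which is a quick sanity check for the ray geometry.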
@article{huang2026nas3r,
  title   = {From None to All: Self-Supervised 3D Reconstruction via Novel View Synthesis},
  author  = {Ranran Huang and Weixun Luo and Ye Mao and Krystian Mikolajczyk},
  journal = {arXiv preprint arXiv:2603.27455},
  year    = {2026}
}