No Pose at All

Self-Supervised Pose-Free 3D Gaussian Splatting
from Sparse Views

Imperial College London
ICCV 2025 Highlight

Qualitative comparison on DTU.

Qualitative comparison on RE10K.

Qualitative comparison on ACID.

Abstract

We introduce SPFSplat, an efficient framework for 3D Gaussian splatting from sparse multi-view images that requires no ground-truth camera poses during either training or inference. It simultaneously predicts Gaussians and camera poses in a canonical space from unposed images in a single feed-forward step. During training, the estimated target-view poses are used to enforce a rendering loss against ground-truth images, while the estimated input-view poses enforce pixel-aligned Gaussian representations via a reprojection loss. This pose-free training paradigm and efficient one-step feed-forward inference make SPFSplat well-suited for practical applications. Despite the absence of pose supervision, our self-supervised SPFSplat achieves state-of-the-art (SOTA) performance in novel view synthesis (NVS), even under significant viewpoint changes. Furthermore, it surpasses recent methods trained with geometry priors in relative pose estimation, demonstrating its effectiveness in both 3D scene reconstruction and camera pose learning.
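Read as an objective, this training signal has two terms; the form below is our own hedged paraphrase of the description above (the exact distance functions and weight λ are not stated here):

```latex
\mathcal{L} \;=\;
\underbrace{\mathcal{L}_{\mathrm{render}}\!\left(\hat{I}\!\left(\mathcal{G},\,\hat{T}_{\mathrm{tgt}}\right),\, I_{\mathrm{tgt}}\right)}_{\text{rendering loss with estimated target poses}}
\;+\;
\lambda \,\underbrace{\sum_{v}\sum_{p}\left\lVert \pi\!\left(K_{v}\,\hat{T}_{v}\,\mu_{p}^{v}\right) - p \right\rVert_{1}}_{\text{reprojection loss with estimated input-view poses}}
```

where \(\mathcal{G}\) are the predicted Gaussians, \(\mu_{p}^{v}\) is the Gaussian center predicted at pixel \(p\) of input view \(v\) (expressed in the canonical frame of the first view), \(\hat{T}\) are the estimated poses, \(K_{v}\) the intrinsics, and \(\pi\) perspective projection.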

Methodology

SPFSplat consists of four main components: an encoder, a decoder, a pose head, and Gaussian prediction heads. These specialized heads are integrated into a shared ViT backbone, simultaneously predicting Gaussian centers, additional Gaussian parameters, and camera poses from unposed images in a canonical space, where the first input view serves as the reference. Only the context-only branch (above) is used during inference, while the context-with-target branch (below) is employed exclusively during training to estimate target poses, which are used for rendering loss supervision. Additionally, estimated context poses from both branches enforce alignment between Gaussian centers and their corresponding pixels via a reprojection loss. 3D Gaussians and poses are jointly optimized to improve geometric consistency and reconstruction quality.
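To make the reprojection constraint concrete, the following is a minimal, self-contained sketch of how such a loss could be computed. It assumes per-pixel Gaussian centers, shared pinhole intrinsics, and estimated world-to-camera poses; the shapes and names are our illustration, not the released implementation.

```python
import torch

def reprojection_loss(centers, w2c_est, K, eps=1e-6):
    """centers: (V, H, W, 3) Gaussian centers in the canonical (first-view) frame.
    w2c_est: (V, 4, 4) estimated world-to-camera transforms for the input views.
    K: (3, 3) shared pinhole intrinsics."""
    V, H, W, _ = centers.shape
    ones = torch.ones(V, H, W, 1, dtype=centers.dtype, device=centers.device)
    pts_h = torch.cat([centers, ones], dim=-1)                     # homogeneous points
    cam = torch.einsum('vij,vhwj->vhwi', w2c_est, pts_h)[..., :3]  # into each camera frame
    uv = torch.einsum('ij,vhwj->vhwi', K, cam)                     # pinhole projection
    uv = uv[..., :2] / uv[..., 2:3].clamp(min=eps)
    # target: the pixel each Gaussian center was predicted from
    ys, xs = torch.meshgrid(torch.arange(H, device=centers.device),
                            torch.arange(W, device=centers.device), indexing='ij')
    grid = torch.stack([xs, ys], dim=-1).to(centers.dtype)
    return (uv - grid).abs().mean()                                # L1 reprojection error
```

Minimizing this term alongside the target-view rendering loss is what couples the predicted Gaussian centers to the estimated input-view poses.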

Quantitative Results

| Method | Small (PSNR↑ / SSIM↑ / LPIPS↓) | Medium (PSNR↑ / SSIM↑ / LPIPS↓) | Large (PSNR↑ / SSIM↑ / LPIPS↓) | Average (PSNR↑ / SSIM↑ / LPIPS↓) | Time (s) |
|---|---|---|---|---|---|
| Pose-Required | | | | | |
| pixelSplat | 20.277 / 0.719 / 0.265 | 23.726 / 0.811 / 0.180 | 27.152 / 0.880 / 0.121 | 23.859 / 0.808 / 0.184 | 0.152 |
| MVSplat | 20.371 / 0.725 / 0.250 | 23.808 / 0.814 / 0.172 | 27.466 / 0.885 / 0.115 | 24.012 / 0.812 / 0.175 | 0.059 |
| Supervised Pose-Free | | | | | |
| CoPoNeRF | 17.393 / 0.585 / 0.462 | 18.813 / 0.616 / 0.392 | 20.464 / 0.652 / 0.318 | 18.938 / 0.619 / 0.388 | - |
| Splatt3R | 17.789 / 0.582 / 0.375 | 18.828 / 0.607 / 0.330 | 19.243 / 0.593 / 0.317 | 18.688 / 0.337 / 0.596 | 0.042 |
| NoPoSplat* | 22.514 / 0.784 / 0.210 | 24.899 / 0.839 / 0.160 | 27.411 / 0.883 / 0.119 | 25.033 / 0.838 / 0.160 | 0.042 |
| Self-supervised Pose-Free | | | | | |
| SelfSplat | 14.828 / 0.543 / 0.469 | 18.857 / 0.679 / 0.328 | 23.338 / 0.798 / 0.208 | 19.152 / 0.680 / 0.328 | 0.101 |
| PF3plat | 18.358 / 0.668 / 0.298 | 20.953 / 0.741 / 0.231 | 23.491 / 0.795 / 0.179 | 21.042 / 0.739 / 0.233 | 0.848 |
| SPFSplat | 22.897 / 0.792 / 0.201 | 25.334 / 0.847 / 0.153 | 27.947 / 0.894 / 0.110 | 25.484 / 0.847 / 0.153 | 0.044 |
| SPFSplat* | 23.178 / 0.796 / 0.200 | 25.695 / 0.853 / 0.151 | 28.377 / 0.899 / 0.111 | 25.845 / 0.852 / 0.152 | 0.044 |

Table 1. Performance comparison of NVS on RE10K. The best and second-best results are highlighted. * denotes the evaluation-time pose alignment (EPA) strategy.

Only the time for 3D Gaussian reconstruction is reported.
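For reference, evaluation-time pose alignment is commonly realized as a brief test-time optimization of the target pose under a photometric loss, so that NVS metrics are not dominated by small pose errors. The sketch below shows one such form under our own assumptions; the paper's exact EPA procedure, optimizer, and step count may differ, and `render` stands in for any differentiable Gaussian rasterizer.

```python
import torch
import torch.nn.functional as F

def align_target_pose(render, gaussians, pose_init, target_img, steps=100, lr=1e-3):
    """Refine an estimated target pose by photometric alignment before evaluation.
    `render(gaussians, pose)` must be differentiable w.r.t. the pose parameters."""
    pose = pose_init.clone().requires_grad_(True)   # e.g. an se(3) or quaternion+translation vector
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.l1_loss(render(gaussians, pose), target_img)
        loss.backward()
        opt.step()
    return pose.detach()
```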

| Method | Small (PSNR↑ / SSIM↑ / LPIPS↓) | Medium (PSNR↑ / SSIM↑ / LPIPS↓) | Large (PSNR↑ / SSIM↑ / LPIPS↓) | Average (PSNR↑ / SSIM↑ / LPIPS↓) |
|---|---|---|---|---|
| Pose-Required | | | | |
| pixelSplat | 22.088 / 0.655 / 0.284 | 25.525 / 0.777 / 0.197 | 28.527 / 0.854 / 0.139 | 25.889 / 0.780 / 0.194 |
| MVSplat | 21.412 / 0.640 / 0.290 | 25.150 / 0.772 / 0.198 | 28.457 / 0.854 / 0.137 | 25.561 / 0.775 / 0.195 |
| Supervised Pose-Free | | | | |
| CoPoNeRF | 18.651 / 0.551 / 0.485 | 20.654 / 0.595 / 0.418 | 22.654 / 0.652 / 0.343 | 20.950 / 0.606 / 0.406 |
| Splatt3R | 17.419 / 0.501 / 0.434 | 18.257 / 0.514 / 0.405 | 18.134 / 0.508 / 0.395 | 18.060 / 0.510 / 0.407 |
| NoPoSplat* | 23.087 / 0.685 / 0.258 | 25.624 / 0.777 / 0.193 | 28.043 / 0.841 / 0.144 | 25.961 / 0.781 / 0.189 |
| Self-supervised Pose-Free | | | | |
| SelfSplat | 18.301 / 0.568 / 0.408 | 21.375 / 0.676 / 0.314 | 25.219 / 0.792 / 0.214 | 22.089 / 0.694 / 0.298 |
| PF3plat | 18.112 / 0.537 / 0.376 | 20.732 / 0.615 / 0.307 | 23.607 / 0.710 / 0.228 | 21.206 / 0.632 / 0.293 |
| SPFSplat | 22.667 / 0.665 / 0.262 | 25.620 / 0.773 / 0.192 | 28.607 / 0.856 / 0.136 | 26.070 / 0.781 / 0.186 |
| SPFSplat* | 23.676 / 0.708 / 0.243 | 26.351 / 0.801 / 0.182 | 29.170 / 0.870 / 0.131 | 26.796 / 0.807 / 0.176 |

Table 2. Performance comparison of NVS on the ACID dataset. The best and second-best results are highlighted. * denotes the evaluation-time pose alignment (EPA) strategy.

| Method | ACID (PSNR↑ / SSIM↑ / LPIPS↓) | DTU (PSNR↑ / SSIM↑ / LPIPS↓) |
|---|---|---|
| pixelSplat | 25.477 / 0.770 / 0.207 | 15.067 / 0.539 / 0.341 |
| MVSplat | 25.525 / 0.773 / 0.199 | 14.542 / 0.537 / 0.324 |
| NoPoSplat* | 25.764 / 0.776 / 0.199 | 17.899 / 0.629 / 0.279 |
| SelfSplat | 22.204 / 0.686 / 0.316 | 13.249 / 0.434 / 0.441 |
| PF3plat | 20.726 / 0.610 / 0.308 | 12.972 / 0.407 / 0.464 |
| SPFSplat | 25.965 / 0.781 / 0.190 | 16.550 / 0.579 / 0.270 |
| SPFSplat* | 26.697 / 0.806 / 0.181 | 18.297 / 0.660 / 0.255 |

Table 3. Cross-dataset generalization on the NVS task. All methods are trained on RE10K and evaluated in a zero-shot setting on ACID and DTU. The best and second-best results are highlighted. * denotes the evaluation-time pose alignment (EPA) strategy.

| Method | RE10K (5°↑ / 10°↑ / 20°↑) | ACID (5°↑ / 10°↑ / 20°↑) |
|---|---|---|
| SP + SG | 0.234 / 0.406 / 0.569 | 0.228 / 0.363 / 0.500 |
| DUSt3R | 0.336 / 0.541 / 0.702 | 0.118 / 0.279 / 0.470 |
| MASt3R | 0.281 / 0.494 / 0.671 | 0.138 / 0.312 / 0.507 |
| NoPoSplat | 0.571 / 0.728 / 0.833 | 0.335 / 0.497 / 0.645 |
| SelfSplat | 0.207 / 0.392 / 0.576 | 0.205 / 0.363 / 0.531 |
| PF3plat | 0.187 / 0.398 / 0.636 | 0.203 / 0.353 / 0.541 |
| SPFSplat (PnP) | 0.613 / 0.754 / 0.845 | 0.355 / 0.516 / 0.658 |
| SPFSplat | 0.617 / 0.755 / 0.845 | 0.364 / 0.520 / 0.662 |

Table 4. Pose estimation performance on RE10K and ACID. For all splat-based methods, evaluation on ACID uses the model trained only on RE10K. The best and second-best results are highlighted.
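The "SPFSplat (PnP)" row recovers the relative pose without relying on the pose head's output directly at test time. One plausible reading, sketched below under our own assumptions, is to solve PnP-RANSAC between the second view's pixel-aligned Gaussian centers (3D points expressed in the first view's canonical frame) and their own pixel coordinates; the function and variable names are illustrative.

```python
import cv2
import numpy as np

def relative_pose_from_pnp(centers_view2, K):
    """centers_view2: (H, W, 3) predicted Gaussian centers of view 2, in view 1's frame.
    K: (3, 3) pinhole intrinsics. Returns the 4x4 view-1-to-view-2 transform."""
    H, W, _ = centers_view2.shape
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))                 # each center's own pixel
    pix = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)
    pts = centers_view2.reshape(-1, 3).astype(np.float64)
    ok, rvec, tvec, _ = cv2.solvePnPRansac(pts, pix, K.astype(np.float64), None,
                                           flags=cv2.SOLVEPNP_ITERATIVE)
    assert ok, "PnP failed"
    R, _ = cv2.Rodrigues(rvec)                                       # rotation vector -> matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T
```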

Qualitative Results

Qualitative comparison on RE10K (top three rows) and ACID (bottom row).

Qualitative comparison on RE10K → ACID (top two rows) and RE10K → DTU (bottom two rows).

Qualitative comparisons of 3D Gaussians and rendered results.

BibTeX

@inproceedings{huang2025spfsplat,
  title={No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views},
  author={Huang, Ranran and Mikolajczyk, Krystian},
  booktitle={IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}