Recently, integrating the efficient feed-forward scheme into 3D Gaussian Splatting (3DGS) has been actively explored. However, existing methods mostly focus on sparse-view reconstruction of small regions and cannot produce satisfactory whole-scene reconstruction results in terms of either quality or efficiency. In this paper, we propose FreeSplat++, which extends generalizable 3DGS into an alternative approach to large-scale indoor scene reconstruction, with the potential to significantly accelerate reconstruction and improve geometric accuracy. We first propose a Low-cost Cross-View Aggregation framework that efficiently processes extremely long input sequences, thereby facilitating whole-scene reconstruction. We then introduce a carefully designed pixel-wise triplet fusion method that incrementally aggregates the overlapping 3D Gaussians from multiple views and adaptively reduces their redundancy. Furthermore, given the fused 3D Gaussians with the weights accumulated during the fusion step, we propose a weighted floater removal strategy that effectively reduces floaters; it serves as an explicit depth fusion approach tailored to generalizable 3DGS methods and is crucial in whole-scene reconstruction. To further enhance performance, we investigate a depth-regularized per-scene fine-tuning process that leverages the dense, multi-view consistent depth maps obtained during the feed-forward prediction phase, aiming to improve rendering quality while maintaining geometric accuracy. Empirical results show that FreeSplat++ significantly outperforms existing generalizable 3DGS methods, especially in whole-scene reconstruction, and that the per-scene fine-tuned results demonstrate substantial improvements in reconstruction accuracy and a notable reduction in training time relative to conventional per-scene optimized 3DGS approaches.
Framework of FreeSplat++. The high-level design of FreeSplat++ includes: (a) Feed-Forward Gaussians Initialization; (b) Weighted Floater Removal; (c) Depth-Regularized Fine-tuning.