Ex4DGS

Fully Explicit Dynamic Gaussian Splatting


NeurIPS 2024


Junoh Lee     Changyeon Won     HyunJun Jung     Inhwan Bae     Hae-Gon Jeon1,2
Gwangju Institute of Science and Technology    

Summary: 4D Gaussian Splatting with static & dynamic separation using an incrementally extensible, keyframe-based model

Abstract

3D Gaussian Splatting has shown fast and high-quality rendering results in static scenes by leveraging dense 3D prior and explicit representations. Unfortunately, the benefits of the prior and representation do not carry over to novel view synthesis for dynamic motions. Ironically, the main barrier is the reliance on them, which requires increasing training and rendering times to account for dynamic motions. In this paper, we design Explicit 4D Gaussian Splatting (Ex4DGS). Our key idea is to first separate static and dynamic Gaussians during training, and to explicitly sample positions and rotations of the dynamic Gaussians at sparse timestamps. The sampled positions and rotations are then interpolated to represent both spatially and temporally continuous motions of objects in dynamic scenes while also reducing computational cost. Additionally, we introduce a progressive training scheme and a point-backtracking technique that improve Ex4DGS's convergence. We initially train Ex4DGS using short timestamps and progressively extend timestamps, which makes it work well with a few point clouds. The point-backtracking is used to quantify the cumulative error of each Gaussian over time, enabling the detection and removal of erroneous Gaussians in dynamic scenes. Comprehensive experiments on various scenes demonstrate the state-of-the-art rendering quality of our method, achieving fast rendering of 62 fps on a single 2080Ti GPU.

Model Pipeline

The proposed method begins by initializing 3D Gaussians as static and models their motion linearly. During the optimization process, static and dynamic objects are automatically separated based on their magnitude of motion. For dynamic Gaussians, we use a keyframe-based interpolation strategy: the positions and rotations of the 3D Gaussians are interpolated between keyframes to enable efficient modeling over time. The progressive training scheme ensures adaptability to varying durations, while the optimization process incorporates pruning and point backtracking to refine rendering results.
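To make the keyframe idea concrete, the following is a minimal sketch of how one dynamic Gaussian's pose could be evaluated between sparse keyframes. It assumes plain linear interpolation for positions and spherical linear interpolation (slerp) for unit quaternions; the interpolation scheme actually used by Ex4DGS may differ, and the names `lerp`, `slerp`, and `pose_at` are hypothetical helpers introduced only for illustration.

```python
import math

def lerp(p0, p1, u):
    """Linearly interpolate two 3D positions, with u in [0, 1]."""
    return tuple((1.0 - u) * a + u * b for a, b in zip(p0, p1))

def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: blend linearly to avoid dividing by ~0.
        q = tuple((1.0 - u) * a + u * b for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
        s1 = math.sin(u * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)

def pose_at(t, key_times, key_positions, key_rotations):
    """Return (position, rotation) at time t from at least two sorted keyframes."""
    # Clamp t to the range covered by the keyframes.
    t = min(max(t, key_times[0]), key_times[-1])
    # Find the keyframe segment [i, i + 1] that contains t.
    i = 0
    while i < len(key_times) - 2 and t > key_times[i + 1]:
        i += 1
    u = (t - key_times[i]) / (key_times[i + 1] - key_times[i])
    position = lerp(key_positions[i], key_positions[i + 1], u)
    rotation = slerp(key_rotations[i], key_rotations[i + 1], u)
    return position, rotation
```

Only the keyframe poses are stored per Gaussian, so memory grows with the number of keyframes rather than the number of frames, and any intermediate timestamp is evaluated with a handful of arithmetic operations.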

Results

Separation of Static and Dynamic Gaussians

Our model processes static and dynamic objects separately. This separation is performed automatically during the training process, without any prior input, such as a mask. The following video demonstrates the rendering results of the static and dynamic parts individually.

Separation result of static and dynamic Gaussians on the Coffee Martini scene

Separation result of static and dynamic Gaussians on the Fabien scene
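The mask-free separation shown above can be illustrated with a small sketch. It assumes each Gaussian carries a linear motion term (a velocity vector) and is flagged as dynamic when its displacement over the clip exceeds a fixed cutoff. The fixed `threshold` and the function name `split_static_dynamic` are assumptions made for illustration; the actual criterion used during Ex4DGS training may differ.

```python
import math

def split_static_dynamic(velocities, duration, threshold):
    """Partition Gaussian indices by how far their linear motion moves them.

    velocities: per-Gaussian (vx, vy, vz) from the linear motion term
    duration:   clip length, in the same time units as the velocities
    threshold:  hypothetical displacement cutoff, in scene units
    """
    static_ids, dynamic_ids = [], []
    for i, v in enumerate(velocities):
        # Total displacement under linear motion is speed times duration.
        displacement = math.sqrt(sum(c * c for c in v)) * duration
        if displacement > threshold:
            dynamic_ids.append(i)
        else:
            static_ids.append(i)
    return static_ids, dynamic_ids

# Example: three Gaussians, only the second one moves appreciably.
static_ids, dynamic_ids = split_static_dynamic(
    [(0.0, 0.0, 0.0), (0.3, 0.4, 0.0), (0.001, 0.0, 0.0)],
    duration=10.0,
    threshold=0.5,
)
```

Because the decision depends only on the learned motion magnitude, no segmentation mask or other prior input is needed to tell the two groups apart.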

Comparisons

The separation of the static and dynamic regions, along with the keyframe-based model, enhances the stability of the training process. The following image demonstrates that our model outperforms others, even when using a sparse point cloud.

BibTeX

@inproceedings{lee2024ex4dgs,
      title     = {Fully Explicit Dynamic Gaussian Splatting},
      author    = {Lee, Junoh and Won, ChangYeon and Jung, Hyunjun and Bae, Inhwan and Jeon, Hae-Gon},
      booktitle = {Advances in Neural Information Processing Systems},
      year      = {2024}
}
