EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS (2024)


Sharath Girish (sgirish@cs.umd.edu)  Kamal Gupta (kamalgupta308@gmail.com)  Abhinav Shrivastava (abhinav@cs.umd.edu)
University of Maryland, College Park

Abstract

Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. It, however, demands substantial memory for both training and storage, as it requires millions of Gaussians in its point cloud representation for each scene. We present a technique utilizing quantized embeddings to significantly reduce per-point memory storage requirements and a coarse-to-fine training strategy for faster and more stable optimization of the Gaussian point clouds. Our approach also develops a pruning stage which results in scene representations with fewer Gaussians, leading to faster training times and rendering speeds for real-time rendering of high-resolution scenes. We reduce storage memory by more than an order of magnitude while preserving reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes, preserving visual quality while consuming 10–20× less memory and achieving faster training/inference speeds. The project page and code are available here.


1 Introduction

Neural Radiance Fields [29] (NeRFs) have become widespread as 3D scene representations, achieving high visual quality by training implicit neural networks via differentiable volume rendering. However, they suffer from high training and rendering costs. While more recent works such as Plenoxels [14] or multiresolution hash grids [30] have significantly reduced training times, they are still slow to render high-resolution scenes and do not reach the visual quality of NeRF methods such as [3, 4]. To overcome these issues, 3D Gaussian splatting [22] (3D-GS) proposed to learn 3D Gaussian point clouds as scene representations. Unlike the slow volume rendering of NeRFs, it utilizes a fast differentiable rasterizer to project the points onto the 2D plane for rendering views. It achieves state-of-the-art (SOTA) reconstruction quality while obtaining training times similar to the efficient NeRF variants. Through its fast tile-based rasterizer, it also achieves real-time rendering speeds at 1080p resolutions, significantly faster than NeRF approaches.

While 3D-GS has several advantages over NeRFs for novel view synthesis, it comes at the cost of high memory usage. Each high-resolution scene is represented with several million Gaussians in order to achieve high-quality view reconstructions. Each point consists of several attributes such as position, color, rotation, opacity, and scaling. This leads to scene representations requiring large amounts of storage memory (>1 GB). The GPU runtime memory requirements during training and rendering are also much higher compared to standard NeRF methods, requiring almost 20 GB of GPU RAM for several high-resolution scenes. 3D-GS is thus not very practical for graphics systems with strict constraints on storage or runtime memory, or in low-bandwidth applications.

Our approach aims to decrease both storage and runtime memory costs while enhancing training and rendering speeds, and maintaining view synthesis quality on par with the SOTA, 3D-GS. The color attribute, represented by spherical harmonic (SH) coefficients, and the rotation attribute, represented by covariance matrices, account for more than 80% of the memory cost of all attributes. Our approach significantly reduces the memory usage of each Gaussian by compressing the color and rotation attributes via a latent quantization framework. We also quantize the opacity coefficients of the Gaussians, improving the optimization and leading to fewer floaters or visual artifacts in novel view reconstructions. Additionally, we propose a coarse-to-fine training strategy which improves training stability and convergence speed while also obtaining better reconstructions. Finally, to reduce the number of redundant Gaussians resulting from frequent densification (via cloning and splitting), we utilize a pruning stage that identifies the Gaussians with the least influence on the full reconstruction. This further reduces the memory cost of the scene representation while improving rendering and training speed due to faster rasterization. To summarize, our contributions are as follows:

  • We propose a simple yet powerful approach for compressing 3D Gaussian point clouds by quantizing per-point attributes leading to lower storage memory.

  • We further improve the optimization of the Gaussians by quantizing the opacity coefficients and utilizing a progressive training strategy while controlling the number of Gaussians with a pruning stage.

  • We provide ablations of the different components of our approach to show their effectiveness in producing efficient 3D Gaussian representations. We evaluate our approach on a variety of datasets achieving comparable quality as 3D-GS while being faster and more efficient.

2 Related Work

Neural fields or Implicit Neural Representations (INRs) have recently become a dominant representation for not just 3D objects[29, 30], but also audio[27, 38], images[11, 38, 39], and videos[6, 28]. Consequently, there is a big focus on improving the speed and efficiency of this line of methods. Since neural fields essentially use a neural network to represent a physical field, a number of works have been inspired by and have borrowed from the neural network compression techniques that we discuss first.

Compression for neural networks. Since the explosion of neural networks and their proliferation in industry and applications, neural network compression and efficiency have gained a lot of attention. A typical compression scheme for neural networks is quantization or discretization of the parameters to a smaller, finite precision, followed by entropy coding or other lossless compression methods to store the parameters. While some approaches directly train binary or finite-precision networks [9, 25, 33, 10], others attempt to quantize the network using non-uniform scalar quantization [15, 45, 2, 31] or vector quantization [7, 8, 18]. The advantage of the former techniques is typically a cheaper setup cost and training time; however, they can often result in sub-optimal network performance at inference time. Another line of work attempts to prune networks either during training [24, 34, 19] or in a post-hoc optimization step [12, 13, 35, 16], which may require retraining the entire network. While pruning can often be a good compression strategy, these methods may require substantially more training to reach performance competitive with an unpruned network.

Compression for neural fields. Several neural field compression approaches [37, 42, 39] propose a meta-learning approach that learns a network on auxiliary datasets which can provide a good initialization for the downstream network. While our method can benefit from meta-learning as well, we restrict our current approach to compressing a single scene for brevity. VQAD [40] proposes vector quantization for the hierarchical feature grids used in NGLOD [41]. Their method achieves higher compression than other feature-grid methods such as Instant NGP [30]; however, its training can be memory intensive, and it struggles to achieve the same reconstruction quality as other NeRF variants such as MipNeRF. [26] proposes a similar compression approach using voxel pruning and codebook quantization. Scalar quantization approaches [17, 5] reparameterize the network weights with integers and apply further entropy regularization to compress the scene even more. While these approaches require lower training memory compared to [41], they are sensitive to hyperparameters, and their reconstruction quality remains lower than that of MipNeRF360 or Gaussian splatting.

In this work, we show, for the first time, that it is possible to compress 3D Gaussian point cloud representations while retaining high reconstruction quality with much smaller memory and higher FPS for inference.

3 Background

3D Gaussian splatting consists of a Gaussian point cloud representation in 3D space. Each Gaussian consists of various attributes such as the position (for the mean), scaling and rotation coefficients (for the covariance), opacity, and color. These Gaussians represent a 3D scene and are used for rendering images from given viewpoints by anisotropic volumetric "splatting" [46, 47] of 3D Gaussians onto a 2D plane. This is done by projecting the 3D points to 2D and then using a differentiable tile-based rasterizer to blend the different Gaussians together.

3D Gaussians with a mean 3D position vector $\boldsymbol{x}$ and covariance matrix $\Sigma$ can be defined as

$$G(\boldsymbol{x}) = e^{-\frac{1}{2}\boldsymbol{x}^{T}\Sigma^{-1}\boldsymbol{x}} \qquad (1)$$

The 3D covariance matrix is in turn defined using a scale matrix $S$ (represented using a 3D scale vector $\boldsymbol{s}$) and a rotation matrix $R$ (represented using a 4D rotation vector $\boldsymbol{r}$) as

$$\Sigma = R S S^{T} R^{T} \qquad (2)$$

For a camera viewpoint with a projective transform $P$ (world-to-camera matrix) and $J$ as the Jacobian of the affine approximation of the projective transform, the corresponding covariance matrix projection [21] to 2D is written as:

$$\Sigma' = J P \Sigma P^{T} J^{T} \qquad (3)$$

The color $C$ of a pixel is then computed using the set $\mathcal{N}$ of Gaussian points overlapping the pixel. The points are sorted based on their depth values and blended as:

$$C = \sum_{i \in \mathcal{N}} \boldsymbol{c}_{i}\,\alpha_{i} \prod_{j=1}^{i-1}(1-\alpha_{j}) \qquad (4)$$

where $\alpha_{i}$ is obtained by evaluating the 2D Gaussian at the pixel location and multiplying it with a per-point scalar opacity value. The color $\boldsymbol{c}_{i}$ of each Gaussian is computed using spherical harmonic coefficients [36].
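
As a concrete reference for Eq. 4, the snippet below blends the depth-sorted Gaussians covering a single pixel. This is a minimal PyTorch sketch for illustration only, not the tile-based CUDA rasterizer; the tensor names and shapes are our own assumptions.

```python
import torch

def blend_pixel(colors: torch.Tensor, alphas: torch.Tensor) -> torch.Tensor:
    """Front-to-back alpha blending (Eq. 4) for a single pixel.

    colors: (N, 3) per-Gaussian colors c_i, sorted by depth.
    alphas: (N,) per-Gaussian alpha_i, i.e. the 2D Gaussian evaluated at the
            pixel multiplied by the point's opacity.
    """
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), via a shifted cumulative product.
    transmittance = torch.cumprod(
        torch.cat([alphas.new_ones(1), 1.0 - alphas[:-1]]), dim=0)
    weights = alphas * transmittance               # blending weight of each Gaussian
    return (weights[:, None] * colors).sum(dim=0)  # final pixel color C
```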

The Gaussians are initialized using the sparse point clouds created by Structure from Motion (SfM) [43]. The attributes are then optimized using stochastic gradient descent, as the rendering process is fully differentiable. For each view sampled from the training dataset, the corresponding image is projected and rasterized with the forward process explained above. The reconstruction loss is then computed by combining $\mathcal{L}_{1}$ with an SSIM loss as

$$\mathcal{L} = (1-\lambda)\mathcal{L}_{1} + \lambda\mathcal{L}_{\text{SSIM}} \qquad (5)$$

with $\lambda$ set to 0.2.

Another key step in the optimization is controlling the number of Gaussians. After a warm-up phase, Gaussians with an opacity value $\alpha$ below a threshold are removed every 100 iterations. Additionally, large Gaussians (bigger than the corresponding geometry) are split, while small Gaussians are cloned in order to better fit the underlying geometric shape. Only Gaussians with positional gradients above a threshold $\tau_{thresh}$ are split or cloned every 100 iterations.


4 Method

4.1 Attribute quantization

Each Gaussian point consists of a position vector $\boldsymbol{p} \in \mathbb{R}^{3}$, scaling coefficients $\boldsymbol{s} \in \mathbb{R}^{3}$, a rotation quaternion vector $\boldsymbol{r} \in \mathbb{R}^{4}$, an opacity scalar $o \in \mathbb{R}$, and spherical harmonics coefficients $\boldsymbol{c} \in \mathbb{R}^{d}$, with $d = 3f^{2}$, where $f$ corresponds to the harmonics degree. Thus, for a degree of 4 (as used in [22]), the color coefficients make up more than 80% of the dimensions of the full attribute vector. 3D-GS typically requires millions of Gaussians to represent a scene with high quality; a set of 1 million Gaussians consumes around 236 MB of disk space when the full attribute vector is stored in 32-bit floating point. Thus, to reduce the memory required for storing each attribute vector, we propose to use a set of quantized representations. A visualization of the various components of our approach is provided in Fig. 2.

For any given attribute, we maintain a quantized latent vector $\boldsymbol{q} \in \mathbb{Z}^{l}$ with dimension $l$, consisting of integer values. We then use an MLP decoder $D: \mathbb{Z}^{l} \rightarrow \mathbb{R}^{k}$ to decode the latents and obtain the attributes. As quantized vectors are not differentiable, we maintain continuous approximations $\widehat{\boldsymbol{q}}$ during training and use the Straight-Through Estimator (STE), which rounds $\widehat{\boldsymbol{q}}$ to the nearest integer in the forward pass and passes the gradient through directly during backpropagation. We get

$$\boldsymbol{a} = D(\mathrm{STE}(\widehat{\boldsymbol{q}})) \qquad (6)$$

The latents are thus trained end-to-end, similar to the standard 3D-GS procedure. Post training, we round $\widehat{\boldsymbol{q}}$ to the nearest integer and use entropy coding to efficiently store the latents along with the decoder $D$. While each vector in the attribute set $\mathbb{A} = \{\boldsymbol{p}, \boldsymbol{s}, \boldsymbol{r}, \boldsymbol{c}, o\}$ can be quantized, we do not encode the base band color SH coefficients, the scaling coefficients, or the position vector, as they are sensitive to initialization and result in large performance drops when quantized. While it is possible to improve feature compression with additional tools such as more complex decoders, learnable probability models [1], or Gumbel annealing [44], these introduce a large overhead in metrics such as runtime GPU memory and training speed. We instead aim for an approach that quantizes per-point attributes at little to no cost to these efficiency metrics while still maintaining reconstruction quality.
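
The snippet below is a minimal PyTorch sketch of this latent quantization with a straight-through estimator, assuming for simplicity a single linear layer as the decoder; the class and variable names are illustrative rather than the paper's actual implementation.

```python
import torch
import torch.nn as nn

class QuantizedAttribute(nn.Module):
    """Per-point quantized latents q_hat decoded to a Gaussian attribute (Eq. 6)."""

    def __init__(self, num_points: int, latent_dim: int, attr_dim: int):
        super().__init__()
        # Continuous approximations q_hat of the integer latents, one row per Gaussian.
        self.q_hat = nn.Parameter(torch.zeros(num_points, latent_dim))
        # Decoder D: Z^l -> R^k (a single linear layer here for simplicity).
        self.decoder = nn.Linear(latent_dim, attr_dim)

    def forward(self) -> torch.Tensor:
        # STE: round in the forward pass, pass gradients straight through in backward.
        q = self.q_hat + (torch.round(self.q_hat) - self.q_hat).detach()
        return self.decoder(q)  # decoded attribute a = D(STE(q_hat))

# After training, the latents are rounded to integers and entropy coded, e.g.:
# q_int = torch.round(module.q_hat).to(torch.int32)
```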


Opacity quantization for improved optimization.

While the color and rotation attributes are quantized to reduce the memory footprint, quantizing the opacity coefficients not only reduces memory but also improves the optimization process, resulting in fewer artifacts in the rendered views. In Fig. 3 (left), we visualize the histogram of opacity coefficients of all Gaussian points, with and without quantization. We see that most points converge to 0 or 1 without quantization, which is primarily due to large-magnitude gradients (top right). While a large negative gradient reduces a Gaussian's opacity so that it can be pruned, a large positive gradient saturates the opacity to 1, leading to artifacts in the rasterization process that are never removed. In contrast, the gradient distribution with quantized opacity coefficients (bottom right) shows fewer outlier gradients and produces a relatively more uniform set of opacities (left). Quantization acts as a soft regularizer, as it requires more gradient updates to move an opacity value from one quantization bin to the next higher bin, thus preventing opacity saturation. In Sec. 5.3, we show how opacity quantization has the added benefit of removing artifacts normally present in 3D-GS.

4.2 Progressive training

Standard training of the Gaussians proceeds by computing the loss over the full image resolution. This results in a more complex loss landscape, as the Gaussians are forced to fit fine features of the scene early in training. As the SfM initialization is sparse and several attributes are initialized with rough estimates, the optimization can be suboptimal and result in floating artifacts from Gaussians which cannot be removed later in the optimization. We thus propose a coarse-to-fine training strategy: we initially render at a small scene resolution and gradually increase the size of the rendered image views over a period of the training iterations until reaching the full resolution. By starting with small images, the Gaussian points easily converge to a good loss minimum. This produces better initializations for the creation of further Gaussians through the densification process of cloning and splitting. As the render resolution increases, more Gaussians can be fit to better reconstruct the finer features of the scene. Such a progressive training procedure also helps remove artifacts typically obtained from the rasterization of ill-optimized Gaussians, as we show in Sec. 5.3. This serves as a soft regularization scheme for the creation and deletion of Gaussians. Another benefit of progressive training is that fewer Gaussians are required to represent coarser scenes while also rendering fewer pixel locations, leading to faster rendering and backpropagation during training. This directly lowers training times while still improving the reconstruction quality of the scene upon convergence.
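
As an illustration of this coarse-to-fine schedule, the helper below ramps the render-resolution scale from a low starting value to full resolution with a cosine curve over the first portion of training (the 0.3 start, 1.0 end, and 70% span match the settings reported in Sec. 5.1); the exact functional form here is our own sketch, not necessarily the paper's implementation.

```python
import math

def render_scale(iteration: int, total_iters: int,
                 start: float = 0.3, end: float = 1.0, frac: float = 0.7) -> float:
    """Cosine coarse-to-fine schedule for the render-resolution scale."""
    t = min(iteration / (frac * total_iters), 1.0)  # progress in [0, 1]
    return end - (end - start) * 0.5 * (1.0 + math.cos(math.pi * t))

# Usage: render at (int(scale * H), int(scale * W)) and compare against the
# ground-truth image downsampled to the same resolution, e.g.
# scale = render_scale(it, 30_000)
```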


4.3 Influence Pruning

The densification process of cloning and splitting occurs every 100 iterations. This, however, leads to an explosion in the number of Gaussians, as a large number of them exceed the gradient threshold and are either cloned or split. While this can allow for representing finer details in the scene, a significant fraction of the Gaussians are redundant and lead to long training and rendering times as well as high memory usage. 3D-GS utilizes an opacity reset stage to remove transparent Gaussians. However, Gaussian points can have high opacity values while still not influencing the rasterization process due to occlusions (when the transmittance $T$ reaches 0 after rendering previous Gaussians in depth order). The points can also have small scale values, influencing very few pixels. To identify the Gaussians most important for reconstructing the full scene, we utilize an influence metric during the rasterization process. More specifically, for the $i^{\text{th}}$ Gaussian to be rendered at pixel location $p$, we define its influence on the pixel and its influence on the full scene as

$$W_{i,p} = \alpha_{i} T_{i} = \alpha_{i} \prod_{j=1}^{i-1}(1-\alpha_{j}), \qquad W_{i} = \sum_{p} W_{i,p} \qquad (7)$$

where $T_{i}$ measures the transmittance up to Gaussian $i$. We thus obtain a weight vector $\mathbf{W}$, with each element representing the importance of the corresponding Gaussian for rendering the full scene. Gaussians with small scale values or low opacity values influence fewer pixels and have lower weight values when summed across all pixel locations. Additionally, Gaussians which do not influence the rasterization process ($T_{i} = 0$) have a weight value of zero. This is further visualized in Fig. 4, where for a given view render and a set of Gaussians (left), we obtain nearly identical reconstruction quality (center) with fewer Gaussians. On the right, we visualize the pruned Gaussians, which correspond to either highly saturated regions with low transmittance or very small Gaussians with low scale. This metric has no computational overhead, as the weight values are calculated directly during the rasterization process in Eq. 4. We thus obtain a weight vector at each iteration and accumulate the weight values over N iterations (set as a hyperparameter) to account for all training views of the full scene. After computing the weight vector, we identify a percentage of the Gaussians with the lowest weights, prune them, and continue the training process. We show further ablations in Sec. 5.3 on the effect of the pruning stage in reducing the number of Gaussians while maintaining reconstruction quality. The proposed pruning stage thus removes the Gaussians with the least footprint for scene rendering, while the densification process allows the number of Gaussians to grow to fit finer scene details.
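
A minimal sketch of this pruning step is shown below; it assumes the accumulated influence $W_i$ per Gaussian is already available from the rasterizer, and the dictionary-of-tensors layout for the point attributes is an illustrative assumption.

```python
import torch

def prune_lowest_influence(gaussians: dict[str, torch.Tensor],
                           influence: torch.Tensor,
                           prune_frac: float = 0.15) -> dict[str, torch.Tensor]:
    """Drop the fraction of Gaussians with the smallest accumulated influence W_i.

    gaussians: per-point attribute tensors, each of shape (N, ...).
    influence: shape (N,), the sum of alpha_i * T_i over all rendered pixels,
               accumulated over the preceding training iterations.
    """
    num_points = influence.shape[0]
    num_prune = int(prune_frac * num_points)
    keep = torch.ones(num_points, dtype=torch.bool, device=influence.device)
    keep[torch.argsort(influence)[:num_prune]] = False  # lowest-influence points
    # Keep only the surviving points for every per-point attribute tensor.
    return {name: attr[keep] for name, attr in gaussians.items()}
```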

5 Experiments

5.1 Implementation and evaluation

We implemented our method by building on [22] which uses a PyTorch framework[32] with a CUDA backend for the rasterization operation.

A full list of the hyperparameters (learning rates, architecture, initialization of the latents) is provided in the supplementary material. For the progressive scaling, we start with a scale factor of 0.3 and increase it to 1.0 with a cosine schedule. We provide a sensitivity analysis of this scale factor in Sec. 5.3. We apply the scaling schedule for 70% of the total iterations, after which training continues at the full resolution. We fix the opacity reset interval to every 2500 iterations, the densification frequency to every 175 iterations, and run the pruning stage every 5000 iterations until 25000 iterations. At each pruning stage we remove 15% of the Gaussians, although a higher value can lead to even larger reductions at the cost of reconstruction quality. We optimize for 30000 iterations, though this can be adjusted based on the time and memory budget for training. We fix the SH degree to 3 for the color attribute, as higher values result in little performance gain for a large increase in memory cost, even with quantization. We use this configuration of hyperparameters for all of our experiments unless mentioned otherwise.

We provide results on 9 scenes from the Mip-NeRF360 dataset [4] and 2 scenes each from Tanks&Temples [23] and Deep Blending [20], for a total of 13 scenes. These datasets consist of real-world high-resolution scenes which can be unbounded and provide a challenging scenario with parts of the scene scarcely seen during training. We follow the methodology of [22, 4], with every 8th view used for evaluation and the rest for training. We evaluate the quality of reconstructions primarily with PSNR, and also with the SSIM and LPIPS metrics. We calculate the storage size as the memory of all quantized and non-quantized parameters of the Gaussians. The training and rendering memory measures the peak GPU RAM over the full training/rendering phase. We measure the frame rate, or Frames Per Second (FPS), based on the time taken to render from all cameras in the scene dataset. Before measuring FPS, we decode all latent attributes using our decoder, which is a one-time cost amortized with loading the parameters. For a fair benchmark, the quantitative comparisons with other works in Tab. 1 use the numbers reported in [22], unless mentioned otherwise. The qualitative results are from our own runs of the respective methods.

5.2 Benchmark comparison

For NeRFs, we compare against the SOTA method Mip-NeRF360 [4] and two recent fast NeRF approaches, INGP [30] and Plenoxels [14]. For our primary baseline, 3D-GS, we provide numbers as reported in [22] and also from our own runs. We show results of our approach for 3 variants: a) training for 30K iterations, i.e., until convergence; b) a smaller configuration corresponding to more pruning; and c) training for 21K iterations, the end of the progressive training schedule. We summarize the results on all 3 datasets in Tables 1 and 2.

Table 1: Results on the Mip-NeRF360 and Tanks&Temples datasets. 3D-GS* denotes our own runs of 3D-GS.

Mip-NeRF360:
| Method | PSNR↑ | SSIM↑ | LPIPS↓ | Storage Mem↓ | FPS↑ | Train Time↓ |
|---|---|---|---|---|---|---|
| Plenoxels | 23.08 | 0.63 | 0.46 | 2.1 GB | 7 | 25m 49s |
| INGP | 25.59 | 0.70 | 0.33 | 48 MB | 9 | 7m 30s |
| M-NeRF360 | 27.69 | 0.79 | 0.24 | 9 MB | 0.06 | 48h |
| 3D-GS | 27.21 | 0.82 | 0.21 | 734 MB | 134 | 41m 33s |
| 3D-GS* | 27.45 | 0.81 | 0.22 | 745 MB | 110 | 23m 20s |
| EAGLES (Ours) | 27.23 | 0.81 | 0.24 | 54 MB | 131 | 21m 34s |
| EAGLES-Small | 26.94 | 0.80 | 0.25 | 47 MB | 166 | 17m 3s |
| EAGLES-Fast | 26.99 | 0.81 | 0.23 | 71 MB | 111 | 16m 24s |

Tanks&Temples:
| Method | PSNR↑ | SSIM↑ | LPIPS↓ | Storage Mem↓ | FPS↑ | Train Time↓ |
|---|---|---|---|---|---|---|
| Plenoxels | 21.08 | 0.72 | 0.38 | 2.3 GB | 13 | 25m 5s |
| INGP | 21.92 | 0.75 | 0.31 | 48 MB | 14 | 6m 59s |
| M-NeRF360 | 22.22 | 0.76 | 0.26 | 9 MB | 0.14 | 48h |
| 3D-GS | 23.61 | 0.84 | 0.18 | 411 MB | 154 | 26m 54s |
| 3D-GS* | 23.63 | 0.85 | 0.18 | 430 MB | 157 | 12m 5s |
| EAGLES (Ours) | 23.37 | 0.84 | 0.20 | 29 MB | 227 | 11m 39s |
| EAGLES-Small | 23.10 | 0.82 | 0.22 | 19 MB | 272 | 10m 7s |
| EAGLES-Fast | 23.02 | 0.83 | 0.20 | 38 MB | 190 | 8m 43s |

Table 2: Results on the Deep Blending dataset. 3D-GS* denotes our own runs of 3D-GS.

| Method | PSNR↑ | SSIM↑ | LPIPS↓ | Storage Mem↓ | FPS↑ | Train Time↓ |
|---|---|---|---|---|---|---|
| Plenoxels | 23.06 | 0.80 | 0.51 | 2.7 GB | 11 | 27m 49s |
| INGP | 24.96 | 0.82 | 0.39 | 48 MB | 3 | 8m |
| M-NeRF360 | 29.40 | 0.90 | 0.25 | 8.6 MB | 0.09 | 48h |
| 3D-GS | 29.41 | 0.90 | 0.24 | 676 MB | 137 | 36m 2s |
| 3D-GS* | 29.55 | 0.90 | 0.25 | 656 MB | 123 | 23m 5s |
| EAGLES (Ours) | 29.86 | 0.91 | 0.25 | 52 MB | 130 | 21m 50s |
| EAGLES-Small | 29.92 | 0.90 | 0.25 | 33 MB | 160 | 17m 40s |
| EAGLES-Fast | 29.85 | 0.91 | 0.25 | 63 MB | 108 | 16m 30s |



We outperform the voxel-grid based method Plenoxels on all datasets and metrics. Compared to INGP [30], another fast NeRF-based method, our approach at 21K iterations obtains better-quality reconstructions at comparable training times on Deep Blending and Tanks&Temples, but higher training times on the Mip-NeRF360 dataset. We bridge the gap between NeRF-based methods and Gaussian splatting in terms of storage memory, obtaining smaller sizes than INGP (EAGLES-Small configuration) while still obtaining better reconstruction metrics. We also obtain much higher rendering speeds (>15×) compared to INGP on all datasets, paving the way for compact 3D representations with high-quality reconstructions and real-time rendering. Against the Mip-NeRF360 approach, we perform competitively in terms of PSNR, with a 0.45 dB drop on their dataset and 1.15 dB and 0.56 dB gains on Tanks&Temples and Deep Blending, respectively. While their model is compact in terms of the number of parameters, it is extremely slow to train (∼48 h) and render (<1 FPS). Finally, our reconstructions are on par with 3D-GS, with minimal PSNR drops of 0.22 dB and 0.26 dB on the Mip-NeRF360 and Tanks&Temples datasets, respectively, while gaining 0.31 dB on Deep Blending. We reduce storage size by ∼15×, making the representation suitable for devices with limited memory budgets. Additionally, we accelerate training and rendering compared to 3D-GS, obtaining higher FPS and lower train times on all scenes. We additionally see that our approach gets close to convergence with good visual quality at 21K iterations (the end of the progressive scaling period). Note that a fair amount of the training time is spent after 21K iterations due to the full-scale render resolution.

We show qualitative results of our approach and other baselines on unseen test views from indoor and outdoor scenes in Fig. 6. Mip-NeRF360 exhibits blurry artifacts, such as the grass in the Stump scene (2nd from left), and even incorrect reconstructions, as seen at the edges of the leaf in Kitchen (right). We obtain reconstructions on par with 3D-GS, and even better reconstructions close to scene boundaries, such as the branches in Bicycle (left) and the grass in Stump (2nd from left). Notably, 3D-GS tends to exhibit numerous floaters at the edges, especially in areas not frequently observed during training. We provide additional visualizations of this in Fig. 5, showcasing notably smoother reconstructions at scene boundaries, such as the Room's ceiling (3rd from left). This points to a more refined optimization of the point cloud using our approach. We ablate the different components of our approach and analyze the effect of each in Sec. 5.3 below.

Table 3: Ablation of our components on the Train (Tanks&Temples), Playroom (Deep Blending), and Bicycle (Mip-NeRF360) scenes. Each cell reports PSNR / Storage Mem / Num. Gaussians / FPS.

| Method | Train | Playroom | Bicycle |
|---|---|---|---|
| Vanilla | 21.94 / 262 MB / 1.11M / 177 | 30.07 / 542 MB / 2.29M / 144 | 25.13 / 1254 MB / 5.31M / 61 |
| + Quantization | 21.60 / 46 MB / 1.03M / 179 | 30.48 / 82 MB / 1.81M / 142 | 24.86 / 192 MB / 4.19M / 65 |
| + Progressive | 21.63 / 38 MB / 0.85M / 194 | 30.39 / 75 MB / 1.67M / 140 | 25.07 / 190 MB / 4.19M / 71 |
| + Densification | 21.62 / 29 MB / 0.64M / 202 | 30.40 / 54 MB / 1.20M / 146 | 25.02 / 142 MB / 3.11M / 82 |
| + Pruning | 21.65 / 21 MB / 0.46M / 234 | 30.38 / 36 MB / 0.80M / 169 | 25.04 / 104 MB / 2.26M / 87 |

5.3 Ablations

For a deeper understanding of our approach, we provide qualitative visualizations for the Train scene and quantitative results for three scenes, one from each of the 3 datasets, gradually incorporating each component step by step. Results are summarized in Table 3 and Fig. 7. "Vanilla" effectively corresponds to the baseline 3D-GS. First, we quantize the color, rotation, and opacity attributes for each Gaussian. We get a significant reduction in storage memory with a small drop in PSNR or reconstruction quality. Note that the bulk of the memory post quantization comes from the non-quantized attributes of scale, position, and base color. The quantized attributes are compressed from 220 MB, 452 MB, and 1046 MB to 6 MB, 12 MB, and 28 MB for the 3 scenes respectively, achieving a ∼20–30× memory reduction. We visualize the effect of color and rotation quantization for a single unseen view from the "Train" scene in Fig. 7. Notice the floaters/rendering artifacts at the top left of the scene, which has little overlap with the training views, for the vanilla configuration. Quantizing color and rotation does not directly remove these artifacts, but opacity quantization significantly improves the visual quality of the rendering, as erroneous Gaussians do not saturate quickly.


We then include progressive scaling, increasing the rendering resolution with a cosine schedule. We achieve gains in PSNR with fewer floating artifacts due to a more stable optimization, while significantly reducing training time, as we show in Tab. 4. Progressive scaling also provides a better optimization of the loss landscape, removing any remaining foggy artifacts as seen in Fig. 7. Next, increasing the densification interval to 175 leads to fewer Gaussians without loss in reconstruction quality (penultimate row); beyond this value, we observe a sharp drop-off in reconstruction quality. Finally, the pruning stage further decreases the number of Gaussians, resulting in lower storage memory, lower training time, and higher FPS without sacrificing reconstruction quality in terms of PSNR. This is depicted in the views in Figs. 4 and 7, where the reconstruction quality is similar to that without pruning, although pruning reintroduces minor artifacts.

To further analyze the strength of progressive training, we vary the resize scale and visualize the PSNR-model size tradeoff as well as the convergence speed in Fig. 8(a),(b). We run experiments on the Truck scene and average over 3 random seeds, reporting error intervals. From (a), we see that decreasing the scale down to 0.3 has no effect on PSNR but reduces the number of Gaussians needed to represent the scene. Below this value, drop-offs in PSNR are observed in exchange for lower storage memory. In (b), we analyze the convergence speed in terms of the iteration time over the course of training for various scaling values. As expected, we consistently obtain lower iteration times for lower scale values, even with no loss in PSNR as seen in (a).

5.4 Progressive scaling variants

As explained previously, progressive scaling of the scene during training provides stable optimization. We now analyze the effect of applying different types of filters to the image as part of the coarse-to-fine training procedure. Results are summarized in Table 4. We try different strategies: a) a mean filter, which corresponds to downsampling and re-upsampling the image with bilinear interpolation; b) a Gaussian filter; c) the standard downsampling procedure used in our experiments; and d) no progressive training at all. For downsampling and mean filtering, we start with a scale of 0.3 and end at 1.0, which corresponds to resizing the image to 30% of its dimensions and gradually scaling up to its original size over 70% of the iterations. For Gaussian filtering, we progressively decrease the filter size from the initial value specified in the table down to 1×1, which essentially equates to no filtering. Compared to the no-filter case, all other filter types result in fewer Gaussians, leading to lower memory, lower training time, and higher FPS. Both Gaussian and mean filters provide large gains in efficiency metrics with little to no drop in PSNR. The Gaussian filter naturally provides a coarse-to-fine schedule for training Gaussian points. Nonetheless, training still proceeds at full resolution, and the largest gains in training time are produced with downsampling. The 5×5 Gaussian filter produces similar results to downsampling, albeit with higher training times, while a larger 15×15 Gaussian filter leads to much higher efficiency at the cost of PSNR.

Table 4: Effect of different filter types for progressive training.

| Filter Type | PSNR | Storage Mem | Num. Gaussians | FPS | Training Time |
|---|---|---|---|---|---|
| None | 23.34 dB | 43 MB | 0.95M | 211 | 13m 27s |
| Mean | 23.31 dB | 27 MB | 0.61M | 280 | 11m 41s |
| Gaussian (5×5) | 23.41 dB | 34 MB | 0.74M | 248 | 12m 8s |
| Gaussian (7×7) | 23.36 dB | 28 MB | 0.61M | 276 | 11m 32s |
| Gaussian (15×15) | 23.17 dB | 21 MB | 0.46M | 321 | 10m 51s |
| Downsample | 23.41 dB | 34 MB | 0.75M | 244 | 9m 49s |

Table 5: Peak GPU memory during training and rendering.

| Method | Bicycle (Train / Render) | Truck (Train / Render) | Playroom (Train / Render) |
|---|---|---|---|
| 3D-GS | 17.4 G / 9.5 G | 8.5 G / 4.8 G | 9.6 G / 6.0 G |
| EAGLES | 10 G / 7.4 G | 5.3 G / 3.6 G | 7.1 G / 5.3 G |

5.5 Training and Rendering Memory

In this section, we show the memory consumption of our approach and 3D-GS on the 3 datasets in Table 5. We measure the peak GPU memory used during the training or rendering phase by our approach and 3D-GS. We see that we require much less memory during training, even with the latents and decoders. Since our quantization decodes the latents to floating-point values before a forward or backward pass, no gains are obtained in runtime memory consumption per Gaussian. However, with progressive training and the pruning stage, we obtain a significantly lower number of Gaussians, leading to lower runtime memory during training/rendering. For the Bicycle scene especially, compared to the 17.4 G required by 3D-GS, we consume only 10 G of GPU RAM during training, making our approach practical for many consumer GPUs with 12 G of RAM.

6 Conclusion

In this work, we proposed a simple yet powerful approach for 3D reconstruction and novel view synthesis. We build upon the seminal work on 3D Gaussian splatting [22] and propose major improvements that not only reduce the storage requirements for each scene by 10–20×, but do so with lower training cost, faster inference, and on-par reconstruction quality. We achieve this via three major improvements over the prior work: attribute quantization for per-point compression, progressive training for faster training and better reconstruction, and a pruning stage that reduces the number of points in the scene representation. Our extensive quantitative and qualitative analyses show the efficacy of our approach for 3D representation.

Acknowledgements: This work was partially supported by IARPA via Department of Interior/Interior Business Center (DOI/IBC) contract number 140D0423C0076. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The authors acknowledge UMD’s supercomputing resources made available for conducting this research. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government. We also thank Jon Barron for providing additional scenes from the Mip-NeRF360 dataset for our experiments.

References

  • [1]Ballé, J., Minnen, D., Singh, S., Hwang, S.J., Johnston, N.: Variational image compression with a scale hyperprior. arXiv preprint arXiv:1802.01436 (2018)
  • [2]Banner, R., Nahshan, Y., Hoffer, E., Soudry, D.: Post-training 4-bit quantization of convolution networks for rapid-deployment. arXiv preprint arXiv:1810.05723 (2018)
  • [3]Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srinivasan, P.P.: Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5855–5864 (2021)
  • [4]Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5470–5479 (2022)
  • [5]Bird, T., Ballé, J., Singh, S., Chou, P.A.: 3d scene compression through entropy penalized neural representation functions. In: 2021 Picture Coding Symposium (PCS). pp.1–5. IEEE (2021)
  • [6]Chen, H., He, B., Wang, H., Ren, Y., Lim, S.N., Shrivastava, A.: Nerv: Neural representations for videos. Advances in Neural Information Processing Systems 34, 21557–21568 (2021)
  • [7]Chen, W., Wilson, J., Tyree, S., Weinberger, K., Chen, Y.: Compressing neural networks with the hashing trick. In: International conference on machine learning. pp. 2285–2294. PMLR (2015)
  • [8]Chen, W., Wilson, J., Tyree, S., Weinberger, K.Q., Chen, Y.: Compressing convolutional neural networks in the frequency domain. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1475–1484 (2016)
  • [9]Courbariaux, M., Bengio, Y., David, J.P.: Binaryconnect: Training deep neural networks with binary weights during propagations. In: Advances in neural information processing systems. pp. 3123–3131 (2015)
  • [10]Dettmers, T., Lewis, M., Belkada, Y., Zettlemoyer, L.: LLM.int8(): 8-bit matrix multiplication for transformers at scale. arXiv preprint arXiv:2208.07339 (2022)
  • [11]Dupont, E., Goliński, A., Alizadeh, M., Teh, Y.W., Doucet, A.: Coin: Compression with implicit neural representations. arXiv preprint arXiv:2103.03123 (2021)
  • [12]Frankle, J., Carbin, M.: The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635 (2018)
  • [13]Frankle, J., Dziugaite, G.K., Roy, D.M., Carbin, M.: Pruning neural networks at initialization: Why are we missing the mark? arXiv preprint arXiv:2009.08576 (2020)
  • [14]Fridovich-Keil, S., Yu, A., Tancik, M., Chen, Q., Recht, B., Kanazawa, A.: Plenoxels: Radiance fields without neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5501–5510 (2022)
  • [15]Girish, S., Gupta, K., Singh, S., Shrivastava, A.: Lilnetx: Lightweight networks with extreme model compression and structured sparsification. arXiv preprint arXiv:2204.02965 (2022)
  • [16]Girish, S., Maiya, S.R., Gupta, K., Chen, H., Davis, L.S., Shrivastava, A.: The lottery ticket hypothesis for object recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 762–771 (2021)
  • [17]Girish, S., Shrivastava, A., Gupta, K.: Shacira: Scalable hash-grid compression for implicit neural representations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 17513–17524 (2023)
  • [18]Gong, Y., Liu, L., Yang, M., Bourdev, L.: Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115 (2014)
  • [19]Han, S., Pool, J., Tran, J., Dally, W.J.: Learning both weights and connections for efficient neural networks. arXiv preprint arXiv:1506.02626 (2015)
  • [20]Hedman, P., Philip, J., Price, T., Frahm, J.M., Drettakis, G., Brostow, G.: Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics (ToG) 37(6), 1–15 (2018)
  • [21]Hoaglin, D.C., Welsch, R.E.: The hat matrix in regression and anova. The American Statistician 32(1), 17–22 (1978)
  • [22]Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (ToG) 42(4), 1–14 (2023)
  • [23]Knapitsch, A., Park, J., Zhou, Q.Y., Koltun, V.: Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG) 36(4), 1–13 (2017)
  • [24]LeCun, Y., Denker, J.S., Solla, S.A.: Optimal brain damage. In: Advances in neural information processing systems. pp. 598–605 (1990)
  • [25]Li, F., Zhang, B., Liu, B.: Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016)
  • [26]Li, L., Shen, Z., Wang, Z., Shen, L., Bo, L.: Compressing volumetric radiance fields to 1 mb. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4222–4231 (2023)
  • [27]Luo, A., Du, Y., Tarr, M., Tenenbaum, J., Torralba, A., Gan, C.: Learning neural acoustic fields. Advances in Neural Information Processing Systems 35, 3165–3177 (2022)
  • [28]Maiya, S.R., Girish, S., Ehrlich, M., Wang, H., Lee, K.S., Poirson, P., Wu, P., Wang, C., Shrivastava, A.: Nirvana: Neural implicit representations of videos with adaptive networks and autoregressive patch-wise modeling. arXiv preprint arXiv:2212.14593 (2022)
  • [29]Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. In: ECCV (2020)
  • [30]Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG) 41(4), 1–15 (2022)
  • [31]Oktay, D., Ballé, J., Singh, S., Shrivastava, A.: Scalable model compression by entropy penalized reparameterization. arXiv preprint arXiv:1906.06624 (2019)
  • [32]Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)
  • [33]Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: Xnor-net: Imagenet classification using binary convolutional neural networks. In: European conference on computer vision. pp. 525–542. Springer (2016)
  • [34]Reed, R.: Pruning algorithms-a survey. IEEE transactions on Neural Networks 4(5), 740–747 (1993)
  • [35]Savarese, P., Silva, H., Maire, M.: Winning the lottery with continuous sparsification. Advances in Neural Information Processing Systems 33, 11380–11390 (2020)
  • [36]Seeley, R.T.: Spherical harmonics. The American Mathematical Monthly 73(4P2), 115–121 (1966)
  • [37]Sitzmann, V., Chan, E., Tucker, R., Snavely, N., Wetzstein, G.: Metasdf: Meta-learning signed distance functions. Advances in Neural Information Processing Systems 33, 10136–10147 (2020)
  • [38]Sitzmann, V., Martel, J., Bergman, A., Lindell, D., Wetzstein, G.: Implicit neural representations with periodic activation functions. Advances in neural information processing systems 33, 7462–7473 (2020)
  • [39]Strümpler, Y., Postels, J., Yang, R., Gool, L.V., Tombari, F.: Implicit neural representations for image compression. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVI. pp. 74–91. Springer (2022)
  • [40]Takikawa, T., Evans, A., Tremblay, J., Müller, T., McGuire, M., Jacobson, A., Fidler, S.: Variable bitrate neural fields. In: ACM SIGGRAPH 2022 Conference Proceedings. pp.1–9 (2022)
  • [41]Takikawa, T., Litalien, J., Yin, K., Kreis, K., Loop, C., Nowrouzezahrai, D., Jacobson, A., McGuire, M., Fidler, S.: Neural geometric level of detail: Real-time rendering with implicit 3d shapes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11358–11367 (2021)
  • [42]Tancik, M., Mildenhall, B., Wang, T., Schmidt, D., Srinivasan, P.P., Barron, J.T., Ng, R.: Learned initializations for optimizing coordinate-based neural representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2846–2855 (2021)
  • [43]Ullman, S.: The interpretation of structure from motion. Proceedings of the Royal Society of London. Series B. Biological Sciences 203(1153), 405–426 (1979)
  • [44]Yang, Y., Bamler, R., Mandt, S.: Improving inference for neural image compression. Advances in Neural Information Processing Systems 33, 573–584 (2020)
  • [45]Zhang, D., Yang, J., Ye, D., Hua, G.: Lq-nets: Learned quantization for highly accurate and compact deep neural networks. In: Proceedings of the European conference on computer vision (ECCV). pp. 365–382 (2018)
  • [46]Zwicker, M., Pfister, H., VanBaar, J., Gross, M.: Ewa volume splatting. In: Proceedings Visualization, 2001. VIS’01. pp. 29–538. IEEE (2001)
  • [47]Zwicker, M., Pfister, H., VanBaar, J., Gross, M.: Surface splatting. In: Proceedings of the 28th annual conference on Computer graphics and interactive techniques. pp. 371–378 (2001)

Supplementary - EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS

Sharath Girish Kamal Gupta Abhinav Shrivastava

7 Hyperparameters

We compress the color, rotation, and opacity attributes of each Gaussian as explained in the main paper. Each attribute has several hyperparameters: mainly the latent dimension, the decoder parameter learning rate, the latent learning rate, and the decoder initialization. The decoder parameters are initialized using a normal distribution with a given standard deviation. As the uncompressed attributes $\boldsymbol{a}$ are initialized using SfM for 3D-GS [22], we obtain the latent initialization (with continuous approximations $\widehat{\boldsymbol{q}}$) by inverting the decoder $D$:

$$\widehat{\boldsymbol{q}} = D^{-1}(\boldsymbol{a}) \qquad (8)$$

For a decoder that is only a linear layer, a least-squares approximation provides the latent values. The learning rate of the latents is obtained by scaling the original attribute learning rate with a scale factor and dividing by the norm of the decoder weights (for a linear layer). This improves training stability and convergence when the decoder norm is either too high or too low. Values used for all the compressible attributes are provided in Tab. 6. We use these values for all of our experiments and find them to be stable across various datasets. All other hyperparameter values are kept at the defaults of [22].
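
For the linear-decoder case, the least-squares inversion of Eq. 8 can be sketched as below; the tensor shapes follow PyTorch's nn.Linear convention, and the function name is our own rather than the released code's.

```python
import torch

def init_latents_least_squares(attrs: torch.Tensor, weight: torch.Tensor,
                               bias: torch.Tensor) -> torch.Tensor:
    """Invert a linear decoder D(q) = q @ weight.T + bias in the least-squares sense.

    attrs:  (N, k) SfM-initialized attribute values a.
    weight: (k, l) decoder weight (nn.Linear convention).
    bias:   (k,)   decoder bias.
    Returns the continuous latent approximations q_hat of shape (N, l).
    """
    # Solve min_q || weight @ q - (a - bias) ||^2 for every point at once.
    rhs = (attrs - bias).T                            # (k, N)
    q_hat = torch.linalg.lstsq(weight, rhs).solution  # (l, N)
    return q_hat.T                                    # (N, l)
```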

Table 6: Quantization hyperparameters for each compressed attribute.

| Attribute | Latent Dimension | Decoder LR | Decoder Std. | Latent LR Scale |
|---|---|---|---|---|
| Color | 16 | 0.0001 | 0.0005 | 1.0 |
| Rotation | 8 | 0.0001 | 0.01 | 1.0 |
| Opacity | 1 | 0.0001 | 0.5 | 1.0 |

8 Per scene metrics

We provide metrics for each scene across the 3 datasets of Mip-NeRF360, Tanks&Temples, and Deep Blending in Tables 7, 8, and 9, respectively.

Table 7: Per-scene results on the Mip-NeRF360 dataset.

| Scene | Method | PSNR | SSIM | LPIPS | Storage Mem | FPS | Train Time | Num. Gaussians |
|---|---|---|---|---|---|---|---|---|
| Bicycle | Ours | 25.04 | 0.75 | 0.24 | 104 MB | 87 | 24m 53s | 2.26M |
| Bicycle | 3D-GS | 25.13 | 0.75 | 0.24 | 1254 MB | 61 | 28m 44s | 5.31M |
| Bonsai | Ours | 31.32 | 0.94 | 0.19 | 29 MB | 177 | 17m 9s | 0.64M |
| Bonsai | 3D-GS | 32.19 | 0.95 | 0.18 | 295 MB | 187 | 18m 11s | 1.25M |
| Counter | Ours | 28.40 | 0.90 | 0.20 | 25 MB | 138 | 19m 55s | 0.56M |
| Counter | 3D-GS | 29.11 | 0.91 | 0.18 | 276 MB | 139 | 21m 29s | 1.17M |
| Flowers | Ours | 21.29 | 0.58 | 0.37 | 60 MB | 144 | 18m 48s | 1.33M |
| Flowers | 3D-GS | 21.37 | 0.59 | 0.36 | 818 MB | 105 | 22m 14s | 3.47M |
| Garden | Ours | 26.91 | 0.84 | 0.15 | 74 MB | 119 | 23m 7s | 1.65M |
| Garden | 3D-GS | 27.32 | 0.86 | 0.12 | 1343 MB | 65 | 28m 57s | 5.69M |
| Kitchen | Ours | 30.77 | 0.93 | 0.13 | 45 MB | 116 | 25m 5s | 1.00M |
| Kitchen | 3D-GS | 31.53 | 0.93 | 0.12 | 417 MB | 109 | 24m 57s | 1.77M |
| Room | Ours | 31.47 | 0.92 | 0.20 | 30 MB | 123 | 21m 38s | 0.67M |
| Room | 3D-GS | 31.59 | 0.92 | 0.20 | 353 MB | 131 | 21m 37s | 1.50M |
| Stump | Ours | 26.78 | 0.77 | 0.24 | 100 MB | 128 | 20m 2s | 2.22M |
| Stump | 3D-GS | 26.73 | 0.77 | 0.24 | 1042 MB | 97 | 22m 8s | 4.42M |
| Treehill | Ours | 22.69 | 0.64 | 0.34 | 72 MB | 129 | 21m 49s | 1.60M |
| Treehill | 3D-GS | 22.61 | 0.64 | 0.35 | 807 MB | 102 | 21m 46s | 3.42M |
| Average | Ours | 27.23 | 0.81 | 0.24 | 54 MB | 131 | 21m 34s | 1.33M |
| Average | 3D-GS | 27.45 | 0.81 | 0.22 | 745 MB | 110 | 23m 20s | 3.11M |

Table 8: Per-scene results on the Tanks&Temples dataset.

| Scene | Method | PSNR | SSIM | LPIPS | Storage Mem | FPS | Train Time | Num. Gaussians |
|---|---|---|---|---|---|---|---|---|
| Train | Ours | 21.65 | 0.80 | 0.24 | 21 MB | 234 | 11m 27s | 0.46M |
| Train | 3D-GS | 21.94 | 0.81 | 0.20 | 262 MB | 177 | 11m 43s | 1.11M |
| Truck | Ours | 25.09 | 0.87 | 0.16 | 38 MB | 220 | 11m 50s | 0.83M |
| Truck | 3D-GS | 25.31 | 0.88 | 0.15 | 599 MB | 139 | 12m 27s | 2.54M |
| Average | Ours | 23.37 | 0.84 | 0.20 | 29 MB | 227 | 11m 39s | 0.65M |
| Average | 3D-GS | 23.63 | 0.85 | 0.18 | 430 MB | 157 | 12m 5s | 1.83M |

Table 9: Per-scene results on the Deep Blending dataset.

| Scene | Method | PSNR | SSIM | LPIPS | Storage Mem | FPS | Train Time | Num. Gaussians |
|---|---|---|---|---|---|---|---|---|
| Drjohnson | Ours | 29.35 | 0.90 | 0.24 | 69 MB | 92 | 25m 47s | 1.57M |
| Drjohnson | 3D-GS | 28.77 | 0.90 | 0.25 | 769 MB | 102 | 25m 9s | 3.26M |
| Playroom | Ours | 30.38 | 0.91 | 0.25 | 36 MB | 169 | 17m 52s | 0.80M |
| Playroom | 3D-GS | 30.07 | 0.90 | 0.25 | 542 MB | 144 | 21m 0s | 2.29M |
| Average | Ours | 29.86 | 0.91 | 0.25 | 52 MB | 130 | 21m 50s | 1.19M |
| Average | 3D-GS | 29.42 | 0.90 | 0.25 | 656 MB | 123 | 23m 5s | 2.78M |
