INFER: Learning Implicit Neural Frequency Response Fields for Confined Car Cabin

Abstract

Accurate modeling of spatial acoustics is critical for immersive and intelligible audio in confined, resonant environments such as car cabins. Current tuning methods are manual, hardware-intensive, and static; they fail to account for frequency-selective behavior and for dynamic changes such as passenger presence or seat adjustments. To address this, we propose INFER (Implicit Neural Frequency Response fields), a frequency-domain neural framework that is jointly conditioned on source and receiver positions and orientations and directly learns complex-valued frequency response fields in such environments.

We introduce three key innovations over current neural acoustic modeling methods: (1) a novel end-to-end frequency-domain forward model that directly learns the frequency response field and frequency-specific attenuation in 3D space; (2) perceptual and hardware-aware spectral supervision that emphasizes critical auditory frequency bands and de-emphasizes unstable crossover regions; and (3) a physics-based Kramers–Kronig consistency constraint that regularizes frequency-dependent attenuation and delay.
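
To make the idea behind innovation (3) concrete, the sketch below checks one standard discrete form of the Kramers–Kronig relations: for a causal impulse response, the full complex spectrum can be rebuilt from its real part alone. This is an illustration of the underlying physics only, not INFER's actual constraint, which is described as acting on frequency-dependent attenuation and delay. The function name `kk_residual` and the test signal are assumptions introduced here.

```python
import numpy as np

def kk_residual(H):
    """Relative mismatch between H and the causal spectrum rebuilt from Re(H).

    H is assumed to be a full two-sided FFT spectrum of even length N.
    A value near zero means Re(H) and Im(H) are Kramers-Kronig consistent.
    """
    N = len(H)
    # The inverse FFT of the real part is the even part of the impulse response.
    h_even = np.fft.ifft(H.real).real
    # Fold the even part onto non-negative time: keep n = 0 and n = N/2,
    # double 0 < n < N/2, and zero out the "negative time" half.
    window = np.zeros(N)
    window[0] = 1.0
    window[1:N // 2] = 2.0
    window[N // 2] = 1.0
    H_causal = np.fft.fft(h_even * window)
    return np.linalg.norm(H - H_causal) / np.linalg.norm(H)

# Hypothetical test signal: a decaying noise burst confined to the first half.
rng = np.random.default_rng(0)
N = 256
h = np.zeros(N)
h[:N // 2] = rng.standard_normal(N // 2) * np.exp(-np.arange(N // 2) / 20.0)

causal_residual = kk_residual(np.fft.fft(h))
# A circular advance wraps energy into negative time, which breaks causality.
acausal_residual = kk_residual(np.fft.fft(np.roll(h, -10)))
```

The causal response yields a residual at floating-point precision, while the advanced one does not. A differentiable version of such a residual is one way a consistency penalty could be attached to a predicted spectrum.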

We evaluate our method on simulated data and on real-world data collected in multiple car cabins. INFER significantly outperforms time- and hybrid-domain baselines on both, cutting average magnitude and phase reconstruction errors by over 39% and 51%, respectively, and sets a new state of the art for neural acoustic modeling in automotive spaces.
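
The exact error metrics are not defined in this summary. As a hedged sketch of how magnitude and phase reconstruction errors of this kind are commonly computed, the snippet below uses a mean absolute magnitude error in dB and a mean absolute phase error with wrap-around handled. All function names and the choice of metrics are assumptions, not the paper's definitions.

```python
import numpy as np

def magnitude_error_db(H_true, H_pred, eps=1e-12):
    """Mean absolute difference of the magnitude responses, in dB."""
    mag_true = 20.0 * np.log10(np.abs(H_true) + eps)
    mag_pred = 20.0 * np.log10(np.abs(H_pred) + eps)
    return np.mean(np.abs(mag_true - mag_pred))

def phase_error_rad(H_true, H_pred):
    """Mean absolute phase difference in radians, wrapped to [-pi, pi]."""
    # The angle of H_pred * conj(H_true) is the phase difference, already
    # wrapped, so phases of +3.1 and -3.1 rad count as close, not 6.2 apart.
    return np.mean(np.abs(np.angle(H_pred * np.conj(H_true))))

def relative_reduction(baseline_error, model_error):
    """Fractional error reduction relative to a baseline (0.39 means 39%)."""
    return 1.0 - model_error / baseline_error

# Hypothetical single-bin example near the +/- pi wrap-around point.
H_true = np.array([np.exp(1j * 3.1)])
H_pred = np.array([np.exp(-1j * 3.1)])
wrapped_error = phase_error_rad(H_true, H_pred)
```

Comparing wrapped rather than raw phase matters because a naive subtraction would heavily penalize predictions that are nearly correct but fall on the other side of the branch cut.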

Method Overview

[Figure: INFER method architecture, showing frequency-domain forward modeling with time-of-flight (TOF) based phase compensation, Kramers–Kronig-consistent complex attenuation, and perceptual frequency weighting.]

INFER's frequency-domain rendering pipeline. The model predicts complex frequency responses and attenuation fields at sampled voxels. TOF-based phase shifts and material-based absorption are accumulated along rays, producing the final frequency response, which is trained with perceptual and hardware-aware spectral supervision.
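
The accumulation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `render_ray`, its argument layout, the attenuation units (nepers per metre), and the fixed speed of sound are all hypothetical choices made for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def render_ray(freqs, voxel_response, voxel_alpha, distances):
    """Accumulate a frequency response along one ray.

    freqs:          (F,) frequencies in Hz
    voxel_response: (S, F) complex response predicted at each ray sample
    voxel_alpha:    (S, F) non-negative attenuation at each sample, nepers/m
    distances:      (S,) increasing distance of each sample from the receiver, m
    """
    step = np.diff(distances, prepend=0.0)          # segment length per sample
    depth = voxel_alpha * step[:, None]             # attenuation per segment
    # Exclusive cumulative sum: absorption accumulated before each sample.
    accumulated = np.cumsum(depth, axis=0) - depth
    transmittance = np.exp(-accumulated)
    # Time-of-flight phase shift: a delay of d / c rotates phase by 2*pi*f*d/c.
    tof = distances / SPEED_OF_SOUND
    phase = np.exp(-2j * np.pi * freqs[None, :] * tof[:, None])
    return np.sum(transmittance * phase * voxel_response, axis=0)

# Hypothetical check: a single unit source 3.43 m away, no absorption.
freqs = np.array([25.0, 100.0])
H = render_ray(freqs,
               voxel_response=np.ones((1, 2), dtype=complex),
               voxel_alpha=np.zeros((1, 2)),
               distances=np.array([3.43]))
```

With a 10 ms time of flight, the 100 Hz component completes exactly one cycle and returns to phase zero, while the 25 Hz component is delayed by a quarter cycle. In a learned model the per-sample responses and attenuations would come from the network rather than being fixed arrays.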

Results

[Figure: Spatial acoustic field reconstruction at multiple frequencies (180 Hz to 2880 Hz), with magnitude and phase plots comparing ground truth with INFER predictions.]

Qualitative results showing spatial acoustic field reconstruction across frequency bands. INFER accurately captures both magnitude and phase patterns, from low-frequency standing waves (180 Hz) to complex high-frequency interference patterns (2880 Hz), demonstrating robust performance across the evaluated frequency range.

Evaluation on Real-World Room-Scale Datasets

Citation

@inproceedings{takawale2026infer,
  title     = {INFER: Learning Implicit Neural Frequency Response Fields for Confined Car Cabin},
  author    = {Takawale, Harshvardhan and Roy, Nirupam and Brown, Phil},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2026}
}