Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations

Xunzhi Zheng     Dan Xu
Department of Computer Science and Engineering, HKUST
CVPR 2025

Given an unposed image sequence, our Flow-NeRF model can simultaneously infer novel-view images, novel-view depth, and long-range novel-view flow. In this figure, t+8 and t-8 denote the novel-view forward and backward flow, respectively, with a frame interval of 8.


Abstract

Learning accurate scene reconstruction without pose priors in neural radiance fields is challenging due to inherent geometric ambiguity. Recent developments either rely on correspondence priors for regularization or use off-the-shelf flow estimators to derive analytical poses. However, the potential for jointly learning scene geometry, camera poses, and dense flow within a unified neural representation remains largely unexplored. In this paper, we present Flow-NeRF, a unified framework that simultaneously optimizes scene geometry, camera poses, and dense optical flow on the fly. To enable the learning of dense flow within the neural radiance field, we design and build a bijective mapping for flow estimation, conditioned on pose. To make scene reconstruction benefit from flow estimation, we develop an effective feature enhancement mechanism that passes canonical-space features to world-space representations, significantly improving scene geometry. We validate our model on four important tasks, i.e., novel view synthesis, depth estimation, camera pose prediction, and dense optical flow estimation, across several datasets. Our approach surpasses previous methods on almost all metrics for novel view synthesis and depth estimation, and yields both qualitatively sound and quantitatively accurate novel-view flow.
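As a rough sketch of the feature enhancement mechanism mentioned above, the snippet below fuses canonical-space point features into the world-space geometry branch. The concatenation-plus-MLP fusion, the residual connection, and the feature dimensions are illustrative assumptions, not the paper's exact message-passing design.

    import torch
    import torch.nn as nn

    class FeatureEnhancer(nn.Module):
        # Passes canonical-volume features to world-space representations.
        # Fusion by concatenation + MLP with a residual connection is an
        # illustrative choice, not necessarily the paper's exact design.
        def __init__(self, world_dim=256, canon_dim=256):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Linear(world_dim + canon_dim, world_dim),
                nn.ReLU(inplace=True),
                nn.Linear(world_dim, world_dim),
            )

        def forward(self, world_feat, canon_feat):
            # world_feat: (N, world_dim) features of world-space sample points
            # canon_feat: (N, canon_dim) features of the same points after the
            #             bijective map into the canonical volume
            fused = self.fuse(torch.cat([world_feat, canon_feat], dim=-1))
            return world_feat + fused  # residual keeps the geometry branch stable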

Method


Our method takes a sequence of images as input and jointly learns camera poses, scene geometry, and dense optical flow within a unified neural representation framework. We propose a shared point-sampling mechanism to ensure feature consistency between the geometry and flow branches. We build a bijective mapping, conditioned on pose, that queries per-pixel motion from the sampled points; a minimal sketch of this mapping follows below. Leveraging the complementary nature of features between world space and the 3D canonical volume, we enhance the feature representation of the geometry branch through message passing. We also develop effective loss functions to simultaneously learn flow and scene reconstruction while imposing constraints on relative poses.
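For intuition, here is a minimal PyTorch sketch of the pose-conditioned bijective flow mapping. It assumes a hypothetical invertible network (invertible_net) exposing forward and inverse passes conditioned on pose, plus standard pinhole intrinsics K; this is an illustration of the idea, not the released implementation. Shared sample points from view i are mapped into the canonical volume, mapped back out under view j's pose, and the difference of the two pixel projections gives the dense flow.

    import torch

    def project(points_cam, K):
        # Pinhole projection of camera-space points to pixel coordinates.
        uv = points_cam @ K.T            # (N, 3)
        return uv[:, :2] / uv[:, 2:3]    # perspective divide -> (N, 2)

    def to_cam(points_world, pose_c2w):
        # Transform world-space points into a camera frame.
        homog = torch.cat([points_world, torch.ones_like(points_world[:, :1])], dim=1)
        return (torch.inverse(pose_c2w) @ homog.T).T[:, :3]

    def novel_view_flow(points_i, pose_i, pose_j, K, invertible_net):
        # points_i: (N, 3) world-space points sampled along rays of view i
        # pose_*:   (4, 4) camera-to-world matrices; K: (3, 3) intrinsics
        # invertible_net: hypothetical bijective map between world space and
        #                 the canonical volume, conditioned on pose.
        canonical = invertible_net.forward(points_i, cond=pose_i)  # into canonical volume
        points_j = invertible_net.inverse(canonical, cond=pose_j)  # back out under pose j
        uv_i = project(to_cam(points_i, pose_i), K)
        uv_j = project(to_cam(points_j, pose_j), K)
        return uv_j - uv_i               # (N, 2) dense flow from view i to view j

Because the mapping is bijective, flow queried forward (t+k) and backward (t-k) stays consistent through the same canonical volume, which is what enables the long-range novel-view flow shown above.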

Novel-view synthesis

The proposed flow-enhanced NeRF can generate more photo-realistic novel-view images.


[Side-by-side novel-view synthesis comparisons: Nope-NeRF vs. Ours]

Depth estimation

Our model produces smoother depth maps with fewer artifacts.


[Side-by-side depth map comparisons: Nope-NeRF vs. Ours]

Long-range flow prediction

Our model achieves holistic scene modeling by simultaneously inferring 2D appearance and dense novel-view flow.


[Paired results: input RGB and the corresponding t+8 novel-view flow]

Video demonstration of the novel-view image and the corresponding t+8 novel-view flow predictions.

BibTeX

@inproceedings{zheng2025flownerf,
  author    = {Xunzhi Zheng and Dan Xu},
  title     = {Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations},
  booktitle = {CVPR},
  year      = {2025},
}