FloED: Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion

Bohai Gu1,2, Hao Luo2*, Song Guo1*, Peiran Dong1
1 Hong Kong University of Science and Technology
2 Alibaba Group
* Co-Corresponding authors

Abstract

Diffusion-based methods have recently achieved substantial improvements in video inpainting. However, they still face challenges, notably maintaining temporal consistency and long processing times. This paper proposes an advanced video inpainting framework using optical Flow-guided Efficient Diffusion, called FloED. Specifically, FloED employs a dual-branch architecture in which a flow branch first restores the corrupted flow and a multi-scale flow adapter then provides motion guidance to the main inpainting branch. In addition, a training-free latent interpolation method uses flow warping to accelerate the multi-step denoising process. A flow attention cache mechanism further reduces the computational cost introduced by incorporating optical flow. Comprehensive experiments on both background restoration and object removal show that FloED outperforms state-of-the-art methods in both performance and efficiency.

Method Overview

Our method employs a dual-branch architecture trained in two stages, with additional acceleration applied at inference:

  1. In the first training stage, we focus on the upper branch, optimizing the motion layer to adapt it specifically to the video inpainting domain.
  2. In the second training stage, we add a dedicated flow branch complemented by a multi-scale flow adapter, which provides flow guidance to the up blocks of the primary UNet. During inference, we improve efficiency by integrating the flow attention cache.
  3. At inference, we introduce a training-free latent interpolation technique that leverages optical flow to speed up the multi-step denoising process. Together with the flow attention cache, this reduces the additional computational cost introduced by the flow.
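The two inference-time ideas above can be illustrated with a minimal NumPy sketch. This page does not specify how FloED implements them, so everything here is an assumption made for illustration: `warp_latent` shows only the generic flow-warping primitive (backward bilinear sampling of a latent frame along a flow field), and `FlowKVCache` shows the general caching pattern of projecting flow features to attention keys and values once and reusing them on later denoising steps, assuming the flow features do not change between steps. The names, tensor shapes, and projection matrices are hypothetical.

```python
import numpy as np


def warp_latent(latent, flow):
    """Backward-warp a latent (C, H, W) with a flow field (2, H, W).

    flow[0] is the x displacement and flow[1] the y displacement, in
    latent-pixel units. The output at (y, x) is sampled bilinearly from
    latent at (y + flow[1], x + flow[0]); coordinates are clamped to
    the border. This is a generic primitive, not FloED's exact scheme.
    """
    _, h, w = latent.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(xs + flow[0], 0, w - 1)
    src_y = np.clip(ys + flow[1], 0, h - 1)

    # Integer corners surrounding each sampling location.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets act as bilinear interpolation weights.
    wx = src_x - x0
    wy = src_y - y0
    top = latent[:, y0, x0] * (1 - wx) + latent[:, y0, x1] * wx
    bottom = latent[:, y1, x0] * (1 - wx) + latent[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


class FlowKVCache:
    """Project flow tokens to keys/values once, then reuse them.

    Hypothetical helper: w_k and w_v stand in for the key/value
    projections of a flow cross-attention layer.
    """

    def __init__(self, w_k, w_v):
        self.w_k = w_k
        self.w_v = w_v
        self._kv = None
        self.num_projections = 0  # counts how often K/V were computed

    def get(self, flow_tokens):
        if self._kv is None:
            self._kv = (flow_tokens @ self.w_k, flow_tokens @ self.w_v)
            self.num_projections += 1
        return self._kv


if __name__ == "__main__":
    latent = np.arange(12, dtype=float).reshape(1, 3, 4)
    flow = np.zeros((2, 3, 4))
    flow[0] = 0.5  # sample half a pixel to the right
    print(warp_latent(latent, flow)[0, 0])

    rng = np.random.default_rng(0)
    cache = FlowKVCache(rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
    tokens = rng.normal(size=(16, 8))
    for _ in range(10):  # ten denoising steps, one projection
        keys, values = cache.get(tokens)
    print(cache.num_projections)
```

In a real pipeline the warping would typically run on the GPU (for example with a grid-sampling operator) and the cache would live inside the attention layers; the sketch only shows why both steps are cheap compared with running a full denoising step.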

Qualitative Results on Object Removal

"Forest with a stream running through it."

"Fire burning in a fireplace, with a log burning."

"A series of staircases, 8K."

"A living room with the white tall bookshelf."

"A body of sea with a setting sun."

"Billowing dust and sandy terrain."

"A green lake with sparkling surface."

"A large outdoor area with a dirt track."

Qualitative Results on Background Restoration

"The golden lake surface at sunset."

"The blue sky filled with huge clouds."

"Sea waves crashing against the cliffs."

"Water appears to be flowing with iced rock."

"Large fire burning on logs in the fireplace."

"Mist draping the mountains like snow."

"Beautiful starry sky accompanied by a shooting star."

Qualitative Comparison on Background Restoration

"Water appears to be flowing, the rock is covered in ice."

Qualitative Comparison on Object Removal

"A series of staircases, 8K."

VideoComposer

CoCoCo

ProPainter

Ours

BibTeX

 
@article{gu2024advanced,
  title={Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion},
  author={Gu, Bohai and Luo, Hao and Guo, Song and Dong, Peiran},
  journal={arXiv preprint arXiv:2412.00857},
  year={2024}
}