Dynamic mode decomposition via convolutional autoencoders for dynamics modeling in videos

Israr Ul Haq, Tomoharu Iwata, Yoshinobu Kawahara

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)


Extracting the underlying dynamics of objects in image sequences is a challenging problem in computer vision. Meanwhile, dynamic mode decomposition (DMD) has recently attracted attention as a method for obtaining modal representations of nonlinear dynamics from general multivariate time-series data without explicit prior information about the dynamics. In this paper, we propose a convolutional autoencoder (CAE)-based DMD (CAE-DMD) to accurately model the underlying dynamics in videos. We develop a modified CAE model that encodes images into latent vectors and apply DMD to these latent vectors to extract DMD modes. These modes are either split into background and foreground modes for foreground modeling in videos or used for video classification. A decoder maps the latent vectors back to the input image sequences. We train the network end to end by minimizing the mean squared error between the original and reconstructed images, which yields accurate extraction of the underlying dynamic information in the videos. We empirically investigate the performance of CAE-DMD in two applications, background/foreground extraction and video classification, on synthetic and publicly available datasets.
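The DMD step on latent vectors can be illustrated with a minimal NumPy sketch. This is standard exact DMD applied to a synthetic latent sequence, not the authors' implementation: the convolutional encoder and decoder are omitted, and the function names `exact_dmd` and `split_modes`, the rank, and the tolerance `tol` are all assumptions made for illustration. The background/foreground split uses a common convention from DMD-based background subtraction (eigenvalues close to 1 describe content that does not change between frames), which may differ from the criterion used in the paper.

```python
import numpy as np

def exact_dmd(Z, rank):
    """Exact DMD of a latent sequence Z with shape (latent_dim, T).

    Columns of Z are latent vectors in time order. Returns the DMD
    eigenvalues and the corresponding DMD modes (as columns).
    """
    X, Y = Z[:, :-1], Z[:, 1:]                 # snapshot pairs: Y[:, t] follows X[:, t]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    Y_V_Sinv = (Y @ Vh.conj().T) / s           # Y V S^{-1} (column-wise division)
    A_tilde = U.conj().T @ Y_V_Sinv            # reduced linear operator (rank x rank)
    eigvals, W = np.linalg.eig(A_tilde)
    modes = Y_V_Sinv @ W                       # exact DMD modes
    return eigvals, modes

def split_modes(eigvals, tol=1e-2):
    """Hypothetical split: eigenvalues within tol of 1 are treated as background."""
    background = np.abs(eigvals - 1.0) < tol
    return background, ~background

# Synthetic stand-in for encoder outputs: a static component plus one oscillation.
rng = np.random.default_rng(0)
t = np.arange(50)
b, c1, c2 = rng.standard_normal((3, 8))
Z = b[:, None] + np.outer(c1, np.cos(0.3 * t)) + np.outer(c2, np.sin(0.3 * t))

eigvals, modes = exact_dmd(Z, rank=3)
bg, fg = split_modes(eigvals)
```

In this toy sequence the static component gives one eigenvalue at 1 (background) and the oscillation gives a complex-conjugate pair on the unit circle (foreground). In an end-to-end setting, the same linear algebra would have to be written in a differentiable framework so that the reconstruction loss can be backpropagated through it.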

Original language: English
Article number: 103355
Journal: Computer Vision and Image Understanding
Publication status: Published - Feb 2022

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition

