Analysis of Adapter in Attention of Change Detection Vision Transformer

Ryunosuke Hamada, Tsubasa Minematsu, Cheng Tang, Atsushi Shimada

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Vision Transformer (ViT) contributes to accurate change detection that is robust to background changes. However, retraining a ViT to adapt it to unseen scenes requires a large amount of computation. This study investigates adding learnable parameters to a change detection ViT to reduce the computational cost of retraining. We introduce an MLP adapter that is added to the attention output and the residual connection of the change detection ViT, and we also apply the LoRA method to the model. We evaluate retraining of the models with additional parameters under various background changes and analyze the proper settings of the additional parameters for adapting to the target scenes. Introducing the MLP adapter and LoRA into the change detection ViT improves accuracy on the target scenes without competition between the two additional-parameter methods.
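The abstract gives no implementation details, so the following is a minimal PyTorch sketch of the two techniques it names: a bottleneck MLP adapter applied to the attention output before the block's residual connection, and LoRA-style low-rank updates on the attention projections. All names (LoRALinear, AdapterMLP, AdaptedAttention), the bottleneck width, and the rank r are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch: bottleneck MLP adapter + LoRA in a ViT attention block.
# Hyperparameters (rank r, bottleneck width) are assumptions, not the paper's values.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank factors are retrained
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale


class AdapterMLP(nn.Module):
    """Bottleneck MLP adapter: down-project, nonlinearity, up-project, plus a skip connection."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as identity to preserve pretrained behavior
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


class AdaptedAttention(nn.Module):
    """Multi-head self-attention with LoRA on the QKV/output projections and an
    MLP adapter on the attention output, i.e. just before the residual add."""

    def __init__(self, dim: int = 768, heads: int = 12, r: int = 8):
        super().__init__()
        self.heads = heads
        self.head_dim = dim // heads
        self.scale = self.head_dim ** -0.5
        self.qkv = LoRALinear(nn.Linear(dim, dim * 3), r=r)
        self.proj = LoRALinear(nn.Linear(dim, dim), r=r)
        self.adapter = AdapterMLP(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.heads, self.head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv.unbind(0)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)
        out = self.adapter(self.proj(out))  # adapter on the attention output
        return x + out  # residual connection of the transformer block
```

In retraining, one would freeze the pretrained ViT weights and optimize only the adapter and LoRA parameters, e.g. by filtering model.parameters() on requires_grad. Whether the paper combines both methods in the same layer or places the adapter elsewhere on the residual path is not stated in the abstract.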

Original language: English
Title of host publication: Computer Vision – ACCV 2024 Workshops - 17th Asian Conference on Computer Vision, Revised Selected Papers
Editors: Minsu Cho, Ivan Laptev, Du Tran, Angela Yao, Hong-Bin Zha
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 36-51
Number of pages: 16
ISBN (Print): 9789819626403
Publication status: Published - 2025
Event: 17th Asian Conference on Computer Vision, ACCV 2024 - Hanoi, Viet Nam
Duration: Dec 8, 2024 - Dec 12, 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15482 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th Asian Conference on Computer Vision, ACCV 2024
Country/Territory: Viet Nam
City: Hanoi
Period: 12/8/24 - 12/12/24

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
