CTC2: End-to-End Drum Transcription Based on Connectionist Temporal Classification With Constant Tempo Constraint

Daichi Kamakura, Eita Nakamura, Kazuyoshi Yoshii

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

This paper describes end-to-end automatic drum transcription for directly estimating a drum score from an audio signal of popular music using non-aligned paired data. We aim to convert a sequence of frame-level acoustic features into a sequence of tatum-level score fragments (three-dimensional multi-hot vectors) representing the presence or absence of the onsets of the bass and snare drums and the hi-hats. The main challenge of this task lies in estimating the correct number of inactive tatums having no onset between active tatums. One may use the connectionist temporal classification (CTC) for end-to-end training of a deep neural network (DNN) that infers a frame-level state sequence (alignment path) including the special “blank” states representing the tatum boundaries. At run-time, a drum score is obtained by annexing repeated states and removing all blank states from the most likely frame-level state sequence. This approach, however, tends to yield a shortened drum score in which repeated inactive tatums are annexed mistakenly because the blank state (tatum boundary) cannot be distinguished acoustically from the inactive state (onset absence) at the frame level. In this paper, we propose a sophisticated version of the CTC with constant tempo constraint, CTC2 in short, that encourages each tatum to be aligned with almost the same number of frames. Although the loss function can be computed efficiently as in the basic CTC, the backpropagation over the huge computation graph made through the forward algorithm is computationally prohibitive. To solve this problem, we propose to perform the backpropagation with only an alignment path stochastically drawn with Gibbs sampling. The experiment showed that the proposed method worked well as expected.
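The run-time decoding step described in the abstract (merge repeated states, then remove blanks) and the resulting shortening of inactive runs can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `BLANK`, `KICK`, `REST`, and `ctc_collapse`, and the two example frame paths, are assumptions made up for illustration. States are the three-dimensional multi-hot vectors from the abstract, ordered here as (bass, snare, hi-hat).

```python
from itertools import groupby

# Hypothetical marker for the CTC "blank" state (tatum boundary).
BLANK = None

# Tatum-level score fragments as (bass, snare, hi-hat) multi-hot tuples.
KICK = (1, 0, 0)   # bass drum onset only
REST = (0, 0, 0)   # inactive tatum: no onset


def ctc_collapse(path):
    """Standard CTC decoding: merge consecutive repeats, then drop blanks."""
    merged = [state for state, _ in groupby(path)]
    return [state for state in merged if state is not BLANK]


# Frame-level path where a blank separates the two inactive tatums.
ok_path = [KICK, KICK, BLANK, REST, REST, BLANK, REST, REST, BLANK, KICK, KICK]

# Same audio, but the blank between the two inactive tatums was labelled
# as inactive (illustrative: the abstract notes the two states cannot be
# distinguished acoustically at the frame level).
bad_path = [KICK, KICK, BLANK, REST, REST, REST, REST, REST, BLANK, KICK, KICK]

print(len(ctc_collapse(ok_path)))   # 4 tatums: kick, rest, rest, kick
print(len(ctc_collapse(bad_path)))  # 3 tatums: one rest has been lost
```

In the second path the two inactive tatums collapse into one, which is the shortened-score failure the abstract attributes to basic CTC and which the constant tempo constraint is meant to discourage.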

Original language: English
Title of host publication: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 158-164
Number of pages: 7
ISBN (Electronic): 9798350300673
DOI
Publication status: Published - 2023
Externally published: Yes
Event: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 - Taipei, Taiwan
Duration: 31 Oct 2023 to 3 Nov 2023

Publication series

Name: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

Conference

Conference: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Country/Territory: Taiwan
City: Taipei
Period: 10/31/23 to 11/3/23

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Signal Processing
  • Artificial Intelligence
  • Computer Science Applications
