Text-Guided Diverse Scene Interaction Synthesis by Disentangling Actions From Scenes

Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi

Research output: Contribution to journal › Journal article › Peer-reviewed

Abstract

Generating human motion within 3D scenes from textual descriptions remains a challenging task because of the scarcity of hybrid datasets that jointly cover text, 3D scenes, and motion. Existing approaches suffer from two fundamental limitations: this lack of integrated datasets and a reliance on end-to-end methods, both of which constrain the diversity and realism of generated human-scene interactions. In this paper, we propose a novel method to generate motions of humans interacting with objects in a 3D scene given a textual prompt. Our key innovation is to decompose the motion generation task into two distinct steps: 1) generating key poses from textual and scene contexts, and 2) synthesizing full motion trajectories guided by these key poses and path planning. This decomposition eliminates the need for hybrid datasets by leveraging independent text-motion and pose datasets, significantly expanding action diversity and overcoming the constraints of prior work. Unlike previous methods, which focus on limited action types or rely on scarce datasets, our approach enables scalable and adaptable motion generation. Through extensive experiments, we demonstrate that our framework achieves unparalleled diversity and contextually accurate motions, advancing the state of the art in human-scene interaction synthesis.
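The abstract describes a two-stage pipeline: key poses are first generated from the text and scene, and full-body trajectories are then synthesized around them with path planning. The following is a minimal Python sketch of that control flow only; the function names (generate_key_poses, plan_path, synthesize_motion) and the simplified Pose type are illustrative assumptions, not the authors' actual implementation.

    # Hypothetical sketch of the two-stage decomposition described above.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Pose:
        joints: list                       # e.g. per-joint rotations or 3D positions
        root_position: Tuple[float, float, float]  # location in the scene

    def generate_key_poses(prompt: str, scene) -> List[Pose]:
        """Stage 1: predict scene-grounded key poses from the text prompt.

        In the paper this stage is learned from independent text-motion and
        pose datasets; here it is a stub.
        """
        raise NotImplementedError

    def plan_path(scene, start: Pose, goal: Pose) -> List[Tuple[float, float, float]]:
        """Plan a collision-free root trajectory between two key poses
        (e.g. a graph search over the walkable area of the scene)."""
        raise NotImplementedError

    def synthesize_motion(key_poses: List[Pose], paths) -> List[Pose]:
        """Stage 2: fill in full motion that follows the planned paths
        and passes through the key poses."""
        raise NotImplementedError

    def text_to_scene_motion(prompt: str, scene) -> List[Pose]:
        key_poses = generate_key_poses(prompt, scene)        # Stage 1
        paths = [plan_path(scene, a, b)                      # path planning
                 for a, b in zip(key_poses, key_poses[1:])]
        return synthesize_motion(key_poses, paths)           # Stage 2

Because the two stages only communicate through key poses and planned paths, each stage can in principle be trained or replaced independently, which is the property the abstract credits for removing the need for a hybrid text-scene-motion dataset.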

Original language: English
Pages (from-to): 73818-73830
Number of pages: 13
Journal: IEEE Access
Volume: 13
DOI
Publication status: Published - 2025

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering
