Image Generation from Hyper Scene Graph with Multiple Types of Trinomial Hyperedges

Research output: Contribution to journal › Article › peer-review

Abstract

Generating realistic images is an important problem in computer vision. Generating images that are consistent with a user's input is called conditional image generation. Owing to recent advances in generating high-quality images with Generative Adversarial Networks, many conditional image generation models have been proposed, such as text-to-image, scene-graph-to-image, and layout-to-image models. Among them, scene-graph-to-image models have the advantage of being able to generate an image of a complex situation that follows the structure of a scene graph. However, existing scene-graph-to-image models have difficulty capturing positional relations among three or more objects, since a binomial edge in a scene graph can only represent a relation between two objects. In this paper, we propose a novel image generation model, hsg2im, which addresses this shortcoming by generating images from a hyper scene graph with trinomial edges. We validate the effectiveness of hsg2im on multiple types of trinomial hyperedges. In addition, we introduce loss functions on the relative positions of objects to improve the accuracy of the positions of generated bounding boxes. Experiments on the COCO-Stuff and Visual Genome datasets show that the proposed model generates images more consistent with users' inputs than our previous model.
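As a rough illustration of the difference between binomial edges and trinomial hyperedges, the Python sketch below stores both in one structure and checks a single three-object relation against a layout of bounding boxes. Everything here is an assumption made for illustration: the `HyperSceneGraph` class, the tuple encoding, the "between" predicate, and the horizontal-center test are hypothetical and are not the data format, hyperedge types, or loss functions used by hsg2im.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Bounding box as (x0, y0, x1, y1) in normalized image coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class HyperSceneGraph:
    """Hypothetical container; not the data format used by hsg2im."""
    objects: List[str]
    # Binomial edge: (subject index, predicate, object index), relates two objects.
    edges: List[Tuple[int, str, int]] = field(default_factory=list)
    # Trinomial hyperedge: (a, predicate, b, c), relates three objects at once.
    hyperedges: List[Tuple[int, str, int, int]] = field(default_factory=list)


def center_x(box: Box) -> float:
    """Horizontal center of a box."""
    return (box[0] + box[2]) / 2


def holds(hyperedge: Tuple[int, str, int, int], boxes: List[Box]) -> bool:
    """Check one trinomial hyperedge against a layout of boxes.

    Only a hypothetical "between" predicate is supported: object a is
    horizontally between objects b and c, in either left-right order.
    """
    a, predicate, b, c = hyperedge
    if predicate != "between":
        raise ValueError(f"unsupported predicate: {predicate}")
    lo, hi = sorted((center_x(boxes[b]), center_x(boxes[c])))
    return lo < center_x(boxes[a]) < hi


graph = HyperSceneGraph(
    objects=["sheep", "tree", "dog"],
    edges=[(1, "left of", 2)],          # binomial: only two objects involved
    hyperedges=[(0, "between", 1, 2)],  # trinomial: sheep between tree and dog
)

# Boxes are listed in the same order as graph.objects.
good_layout = [(0.4, 0.5, 0.6, 0.9), (0.0, 0.1, 0.2, 0.9), (0.8, 0.6, 1.0, 0.9)]
bad_layout = [(0.0, 0.5, 0.1, 0.9), (0.3, 0.1, 0.5, 0.9), (0.8, 0.6, 1.0, 0.9)]

print([holds(e, good_layout) for e in graph.hyperedges])  # [True]
print([holds(e, bad_layout) for e in graph.hyperedges])   # [False]
```

In this sketch, two binomial edges such as "tree left of sheep" and "sheep left of dog" would fix one left-to-right order, whereas the single "between" hyperedge accepts either order; it is one example of a three-object relation that pairwise edges do not express directly.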

Original language: English
Article number: 624
Journal: SN Computer Science
Volume: 5
Issue number: 5
Publication status: Published, June 2024

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Computer Science Applications
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Computational Theory and Mathematics
  • Artificial Intelligence
