TY - GEN
T1 - Image Generation from Text and Segmentation
AU - Osugi, Masato
AU - Vargas, Danilo Vasconcellos
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Although text-to-image generation has advanced in recent years, the controlled generation of images that represent the layout of multiple complex objects remains a challenging problem. Specifically, the GLIDE text-to-image model struggles with controlling the number of objects, scale conversion, unstable generation (objects indicated in the text do not appear in the image), and unnatural object structure. In addition, generating images from textual information alone is so unconstrained that the relationship between text and image is difficult to learn, and a large amount of training data is required. In fact, current generative models are often trained on hundreds of millions of text-image pairs, and the large scale of the models themselves makes the training cost enormous. In this study, we propose and validate a new image generation method that takes both segmentation and text as input; that is, image generation from text is assisted by segmentation information. This should address the issues that GLIDE had difficulty with, such as controlling the number of objects, scale conversion, unstable generation, and unnatural object structure. A further goal is to reduce the amount of data needed for training by making the relationship between textual information and the generated image easier to learn. In our evaluation, the model achieved an FID-10k score of 17.12 after training on only about 120,000 training samples, and we confirmed that it can handle complex layouts and maintain natural object structure even with a large number of objects.
UR - http://www.scopus.com/inward/record.url?scp=85158884449&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85158884449&partnerID=8YFLogxK
U2 - 10.1109/CANDARW57323.2022.00041
DO - 10.1109/CANDARW57323.2022.00041
M3 - Conference contribution
AN - SCOPUS:85158884449
T3 - Proceedings - 2022 10th International Symposium on Computing and Networking Workshops, CANDARW 2022
SP - 206
EP - 211
BT - Proceedings - 2022 10th International Symposium on Computing and Networking Workshops, CANDARW 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th International Symposium on Computing and Networking Workshops, CANDARW 2022
Y2 - 21 November 2022 through 22 November 2022
ER -