TY - JOUR
T1 - A multi-label convolutional neural network for automatic image annotation
AU - Vallet, Alexis
AU - Sakamoto, Hiroyasu
N1 - Publisher Copyright:
© 2015 Information Processing Society of Japan.
PY - 2015/11/15
Y1 - 2015/11/15
AB - Over the past few years, convolutional neural networks (CNN) have set the state of the art in a wide variety of supervised computer vision problems. Most research effort has focused on single-label classification, due to the availability of the large scale ImageNet dataset. Via pre-training on this dataset, CNNs have also shown the ability to outperform traditional methods for multi-label classification. Such methods, however, typically require evaluating many expensive forward passes to produce a multi-label distribution. Furthermore, due to the lack of a large scale multi-label dataset, little effort has been invested into training CNNs from scratch with multi-label data. In this paper, we address both issues by introducing a multi-label cost function adequate for deep CNNs, and a prediction method requiring only a single forward pass to produce multi-label predictions. We show the performance of our method on a newly introduced large scale multi-label dataset of animation images. Here, our method reaches 75.1% precision and 66.5% accuracy, making it suitable for automated annotation in practice. Additionally, we apply our method to the Pascal VOC 2007 dataset of natural images, and show that our prediction method outperforms a comparable model for a fraction of the computational cost.
UR - http://www.scopus.com/inward/record.url?scp=84947245532&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84947245532&partnerID=8YFLogxK
U2 - 10.2197/ipsjjip.23.767
DO - 10.2197/ipsjjip.23.767
M3 - Article
AN - SCOPUS:84947245532
SN - 0387-5806
VL - 23
SP - 767
EP - 775
JO - Journal of information processing
JF - Journal of information processing
IS - 6
ER -