Convolutional Recurrent Neural Networks for Better Image Understanding

Alexis Vallet, Hiroyasu Sakamoto

    Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

    3 Citations (Scopus)

    Abstract

    Although deep convolutional neural networks have brought basic computer vision tasks to unprecedented accuracy, the best models still struggle to produce higher-level image understanding. Indeed, current models for tasks such as visual question answering, which are often based on recurrent neural networks, have difficulty surpassing baseline methods. We suspect this is partly because spatial information in the image is not properly leveraged. We attempt to address these difficulties by introducing a recurrent unit able to keep and process spatial information throughout the network. On a simple task, we show that our method is significantly more accurate than baselines that discard spatial information. We also demonstrate that higher-resolution input performs surprisingly better than lower-resolution input, even when the input features are less discriminative. Notably, we show that our approach based on higher-resolution input is better able to detect image details such as the precise number of objects and the presence of smaller objects, while being less sensitive to biases in the label distribution of the training set.
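    The abstract does not give the equations of the proposed recurrent unit. As an illustration only, the sketch below assumes a generic convolutional GRU-style cell (a swapped-in, well-known technique, not necessarily the authors' unit): the matrix products of a standard GRU are replaced by 2D convolutions, so the hidden state remains an H×W feature map instead of being flattened into a vector. The names `conv2d_same` and `ConvGRUCell`, the channel counts, kernel size, random weights, the omission of bias terms, and the input sequence are all hypothetical. For contrast, the last lines show a global-average-pooling baseline that discards the spatial layout.

```python
import numpy as np


def conv2d_same(x, w):
    """'Same'-padded 2D cross-correlation (what deep-learning libraries call convolution).

    x: input feature map, shape (C_in, H, W)
    w: kernel, shape (C_out, C_in, k, k) with odd k
    returns: output feature map, shape (C_out, H, W)
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    H, W = x.shape[1], x.shape[2]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H and W unchanged
    out = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + H, j:j + W]  # shifted view, shape (C_in, H, W)
            # Mix input channels into output channels at every pixel.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvGRUCell:
    """Hypothetical convolutional GRU cell: GRU gates computed with convolutions
    instead of matrix products, so the hidden state stays a spatial map.
    Bias terms are omitted for brevity."""

    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)

        def init(c_in):
            return rng.normal(0.0, 0.1, size=(hid_ch, c_in, k, k))

        self.Wz, self.Uz = init(in_ch), init(hid_ch)  # update gate
        self.Wr, self.Ur = init(in_ch), init(hid_ch)  # reset gate
        self.Wh, self.Uh = init(in_ch), init(hid_ch)  # candidate state

    def step(self, x, h):
        z = sigmoid(conv2d_same(x, self.Wz) + conv2d_same(h, self.Uz))
        r = sigmoid(conv2d_same(x, self.Wr) + conv2d_same(h, self.Ur))
        h_cand = np.tanh(conv2d_same(x, self.Wh) + conv2d_same(r * h, self.Uh))
        # Per-pixel interpolation between the old state and the candidate.
        return (1.0 - z) * h + z * h_cand


# Usage: a hypothetical sequence of 3 CNN feature maps (4 channels, 8x8 grid).
rng = np.random.default_rng(1)
seq = [rng.normal(size=(4, 8, 8)) for _ in range(3)]

cell = ConvGRUCell(in_ch=4, hid_ch=6)
h = np.zeros((6, 8, 8))
for x_t in seq:
    h = cell.step(x_t, h)
print(h.shape)  # (6, 8, 8): the spatial grid survives every recurrent step

# Baseline for contrast: global average pooling collapses the grid to one vector.
pooled = seq[-1].mean(axis=(1, 2))
print(pooled.shape)  # (4,)
```

    Because every gate is computed by a convolution, each location of the hidden state is updated from its local neighbourhood, which is one plausible way to keep and process spatial information across recurrent steps; the pooled baseline, by contrast, retains only per-channel averages.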

    Original language: English
    Title of host publication: 2016 International Conference on Digital Image Computing
    Subtitle of host publication: Techniques and Applications, DICTA 2016
    Editors: Alan Wee-Chung Liew, Jun Zhou, Yongsheng Gao, Zhiyong Wang, Clinton Fookes, Brian Lovell, Michael Blumenstein
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    ISBN (Electronic): 9781509028962
    Publication status: Published - Dec 22 2016
    Event: 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016 - Gold Coast, Australia
    Duration: Nov 30 2016 to Dec 2 2016

    Publication series

    Name: 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016


    All Science Journal Classification (ASJC) codes

    • Computational Theory and Mathematics
    • Software
    • Computer Graphics and Computer-Aided Design
    • Computer Science Applications
