TY - GEN
T1 - Toward Three-Stage Automation of Annotation for Human Values
AU - Ishita, Emi
AU - Fukuda, Satoshi
AU - Oga, Toru
AU - Oard, Douglas W.
AU - Fleischmann, Kenneth R.
AU - Tomiura, Yoichi
AU - Cheng, An Shou
N1 - Funding Information:
This work has been supported in part by JSPS KAKENHI Grant Number JP18H03495.
PY - 2019
Y1 - 2019
AB - Prior work on automated annotation of human values has sought to train text classification techniques to label text spans with labels that reflect specific human values such as freedom, justice, or safety. This confounds three tasks: (1) selecting the documents to be labeled, (2) selecting the text spans that express or reflect human values, and (3) assigning labels to those spans. This paper proposes a three-stage model in which separate systems can be optimally trained for each of the three stages. Experiments from the first stage, document selection, indicate that annotation diversity trumps annotation quality, suggesting that when multiple annotators are available, the traditional practice of adjudicating conflicting annotations of the same documents is not as cost effective as an alternative in which each annotator labels different documents. Preliminary results for the second stage, selecting value sentences, indicate that high recall (94%) can be achieved on that task with levels of precision (above 80%) that seem suitable for use as part of a multi-stage annotation pipeline. The annotations created for these experiments are being made freely available, and the content that was annotated is available from commercial sources at modest cost.
UR - http://www.scopus.com/inward/record.url?scp=85064044280&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064044280&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-15742-5_18
DO - 10.1007/978-3-030-15742-5_18
M3 - Conference contribution
AN - SCOPUS:85064044280
SN - 9783030157418
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 188
EP - 199
BT - Information in Contemporary Society - 14th International Conference, iConference 2019, Proceedings
A2 - Taylor, Natalie Greene
A2 - Christian-Lamb, Caitlin
A2 - Nardi, Bonnie
A2 - Martin, Michelle H.
PB - Springer Verlag
T2 - 14th International Conference on Information in Contemporary Society, iConference 2019
Y2 - 31 March 2019 through 3 April 2019
ER -