-- survey data available now --
The survey's results are presented at the Workshop on Active Learning for NLP at NAACL, June 2009:
Katrin Tomanek and Fredrik Olsson. A Web Survey on the Use of Active Learning to support Annotation of Text Data. Proc. Workshop on Active Learning for NLP at NAACL 2009, 2009.
The survey data is available as PDF files here:
If you are interested in the raw survey data (which is in an XLS format), please send an e-mail to Katrin Tomanek.
This survey aims to investigate the extent to which Active Learning (AL) is used in the context of Natural Language Processing (NLP), and to identify the reasons why AL has (or has not) been found applicable to a specific task.
Supervised machine learning methods have been successfully applied to many NLP tasks in the last few decades. While these techniques have been shown to work well, they require large amounts of labeled training data in order to achieve high performance. Such training material, however, is typically not available for specific domains or problems. Thus, one could say that the annotation of data is becoming a major bottleneck in the development of NLP systems. AL addresses this problem. The main idea is to let the learning mechanism itself decide which data points need to be annotated, in order to achieve high model performance with as few labeled examples as possible.
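To make the main idea concrete, here is a minimal sketch of one common selection strategy, uncertainty sampling, in which the learner asks for labels on the examples it is least sure about. The strategy, the function names, and the probability values are assumptions chosen for illustration; the survey itself does not prescribe any particular AL method.

```python
from typing import List, Sequence


def uncertainty(prob_positive: float) -> float:
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(prob_positive - 0.5) * 2.0


def select_for_annotation(pool_probs: Sequence[float], batch_size: int) -> List[int]:
    """Return indices of the `batch_size` pool items the model is least sure about."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: uncertainty(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:batch_size]


# Hypothetical model outputs P(label = positive) for five unlabeled sentences.
pool_probs = [0.97, 0.52, 0.10, 0.45, 0.80]
chosen = select_for_annotation(pool_probs, batch_size=2)
print(chosen)  # indices of the sentences to hand to the annotator next
```

In a full AL loop, the selected items would be labeled by a human annotator, added to the training set, and the model retrained before the next selection round.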
This survey targets participants who were or are currently involved in the annotation of training material for any kind of NLP task. You do not need to be familiar with AL or to have used it. It will take approximately 10 minutes to fill out the survey. Results will be published online here.
The survey serves purely non-commercial, academic purposes. Any user-specific data will be kept in confidence and not passed to a third party. The survey is initiated by Katrin Tomanek (Jena University, Germany) and Fredrik Olsson (SICS, Sweden). If you have any questions, please feel free to contact us.
(the survey was open in February 2009)