Applicant | Prof. dr. Frank Wijnen |
Developers | Mees van Stiphout, Sheean Spoel |
Links | Download AuChAnn via PyPI; AuChAnn library: TBA |
AuChAnn, the Automatic CHAT Annotation tool, is a Python library that reads a Dutch transcript and interpretation pair and generates a matching CHAT annotation.
The analysis of spontaneous language transcripts is an important instrument in research into language development and language disorders. However, analyzing transcripts by hand is a long and tedious process, and the longer a researcher has to analyze, the less accurate the analyses become. To help linguists create these analyses, the Digital Humanities Lab, in cooperation with prof. dr. Jan Odijk, is developing SASTA (Semi-Automatic Analysis of Spontaneous Language). SASTA generates automatic analyses of transcripts in a fraction of the time a human researcher needs, improving both the efficiency and the accuracy of the researcher's work.
SASTA works most effectively with CHAT-annotated transcripts. CHAT is a transcription framework introduced by the CHILDES and TalkBank community that adds metalinguistic information to a transcript, such as whether an utterance contains an error, whether a token is a filler, or what an utterance means in context.
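To make these kinds of information concrete, here is a minimal Python sketch that reads one CHAT-annotated line (the AuChAnn output from the example further down) and separates what was literally said from what was meant, plus any error codes. It is not part of AuChAnn or of the CHILDES tools: the function name `parse_chat` is hypothetical, and it only handles the few markers that appear in that example (single-word `[: x]` replacements, `[* x]` error codes, `&-` fillers and parenthesised omissions such as `do(or)`), not the full CHAT specification.

```python
import re

# Matches bracketed CHAT codes such as "[: ging]" or "[* s:r:gc:art]",
# and otherwise any run of non-space characters (a word or a filler).
TOKEN_RE = re.compile(r"\[[:*] [^\]]+\]|\S+")
# Parenthesised part of an incomplete word, e.g. the "(or)" in "do(or)".
OMITTED_RE = re.compile(r"\(([^)]*)\)")


def parse_chat(line: str) -> tuple[str, str, list[str]]:
    """Split a simplified CHAT line into (literal reading, target reading, error codes).

    Hypothetical helper: handles only single-word replacements [: x], error
    codes [* x], &- fillers and parenthesised omissions. Multi-word <...>
    groups and the rest of the CHAT specification are out of scope.
    """
    literal: list[str] = []
    target: list[str] = []
    errors: list[str] = []
    for tok in TOKEN_RE.findall(line):
        if tok.startswith("[: "):
            # Replacement: the intended form replaces the preceding word in the target.
            target[-1] = tok[3:-1]
        elif tok.startswith("[* "):
            # Error code: metadata only, not part of either reading.
            errors.append(tok[3:-1])
        elif tok.startswith("&-"):
            # Filler: it was spoken, so it appears in the literal reading only.
            literal.append(tok[2:])
        else:
            # Ordinary word: drop parenthesised material for what was said,
            # keep it (without parentheses) for what was meant.
            literal.append(OMITTED_RE.sub("", tok))
            target.append(OMITTED_RE.sub(r"\1", tok))
    return " ".join(literal), " ".join(target), errors


literal, target, errors = parse_chat(
    "Hij gaatte [: ging] &-eh loeihard do(or) het [: de] [* s:r:gc:art] start [: stad]"
)
print(literal)  # Hij gaatte eh loeihard do het start
print(target)   # Hij ging loeihard door de stad
print(errors)   # ['s:r:gc:art']
```

The target reading recovered this way matches the interpretation in the example, which is exactly the information a plain transcript alone cannot carry.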
However, many linguists and researchers do not use the CHAT format; instead, they work with a literal transcript combined with a separate interpretation (or correction). Therefore, the Digital Humanities Lab developed AuChAnn (Automatic CHAT Annotation tool): a Python library that reads a Dutch transcript and interpretation pair and generates a matching CHAT annotation.
For example:
Transcript:
Hij gaatte eh loeihard do het start
Interpretation:
Hij ging loeihard door de stad
CHAT Annotation (produced by AuChAnn):
Hij gaatte [: ging] &-eh loeihard do(or) het [: de] [* s:r:gc:art] start [: stad]
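For intuition about the direction AuChAnn works in, the sketch below builds a rough CHAT line from a transcript and interpretation pair by aligning the two word sequences with Python's standard `difflib.SequenceMatcher`. This is not AuChAnn's algorithm: the filler list, the function names `naive_chat` and `annotate_pair`, and the pairing heuristics are hypothetical assumptions chosen to fit this one example. In particular, it does not assign error codes such as [* s:r:gc:art], which would need further linguistic analysis of the replaced words.

```python
import difflib

# Hypothetical filler inventory; a real tool would need a much fuller list.
FILLERS = {"eh", "ehm", "uh", "uhm"}


def annotate_pair(said: str, meant: str) -> str:
    """Annotate one spoken word with its interpreted form (simplified CHAT)."""
    if meant.startswith(said) and meant != said:
        # Incomplete word: the unspoken remainder goes in parentheses, e.g. do(or).
        return f"{said}({meant[len(said):]})"
    # Otherwise mark a replacement: spoken form followed by [: target].
    return f"{said} [: {meant}]"


def naive_chat(transcript: str, interpretation: str) -> str:
    """Hypothetical word-level alignment of a transcript/interpretation pair."""
    src, tgt = transcript.split(), interpretation.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    out: list[str] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        said, meant = src[i1:i2], tgt[j1:j2]
        if op == "equal":
            out.extend(said)
            continue
        content = [w for w in said if w.lower() not in FILLERS]
        if op == "insert":
            # Interpreted words that were never spoken: CHAT marks omissions with 0.
            out.extend("0" + w for w in meant)
        elif len(content) == len(meant):
            # Pair non-filler words one-to-one; fillers stay in place as &-fillers.
            targets = iter(meant)
            for w in said:
                if w.lower() in FILLERS:
                    out.append("&-" + w)
                else:
                    out.append(annotate_pair(w, next(targets)))
        elif content and meant:
            # Word counts differ: fall back to one group replacement, <a b> [: c].
            out.append(f"<{' '.join(said)}> [: {' '.join(meant)}]")
        else:
            # Not handled here (e.g. extra non-filler words): keep them unchanged.
            out.extend(said)
    return " ".join(out)


print(naive_chat("Hij gaatte eh loeihard do het start",
                 "Hij ging loeihard door de stad"))
# Hij gaatte [: ging] &-eh loeihard do(or) het [: de] start [: stad]
```

Run on the example pair, taking the spoken verb to be gaatte as the CHAT line marks it, this reproduces the AuChAnn annotation except for the error code. A plain word-by-word diff like this would likely break down once words are split, merged or reordered, which is where a dedicated tool earns its keep.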
AuChAnn consistently outperforms human annotators and produces its annotations in a fraction of the time. This makes it a useful addition to SASTA, and also to any other linguists who want to use information-rich CHAT annotations in their research.