The goal of the shared task is to encourage the developers of NLP applications to adapt their tools and resources to the lemmatization of German Web pages and written German discourse in genres of computer-mediated communication (CMC). Examples of CMC genres include chats, forums, wiki talk pages, tweets, blog comments, social networks, SMS and WhatsApp dialogues.
The shared task is a follow-up to the EmpiriST 2015 shared task, which focused on tokenization and POS tagging. The current task addresses the next fundamental step in the NLP pipeline: lemmatization, which is crucial for general corpus indexing as well as for many applications in lexicography, text classification, discourse analysis, etc.
Participants will receive pre-tokenized and pre-tagged text files and will have to provide surface-oriented lemmata and/or normalized lemmata. Surface-oriented lemmata are mainly based on the inflectional suffixes of the token and retain, as far as possible, any non-standard orthographical features of the token. For normalized lemmata, on the other hand, obvious spelling errors are corrected and non-standard forms are treated as standard forms.
Example with surface-oriented lemmata (token, POS tag, lemma):

XD        EMOASC  XD
du        PPER    du
killst    VVFIN   killen
mich      PPER    mich
!         $.      !
Soooo     PTKIFG  soooo
herrlich  ADJD    herrlich
xDD       EMOASC  xDD

Example with normalized lemmata (token, POS tag, lemma):

XD        EMOASC  XD
du        PPER    du
killst    VVFIN   killen
mich      PPER    mich
!         $.      !
Soooo     PTKIFG  so
herrlich  ADJD    herrlich
xDD       EMOASC  xDD
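To make the difference between the two lemma layers concrete, here is a minimal Python sketch. It assumes the whitespace-separated token/tag/lemma layout of the examples above (the actual file format of the shared task data may differ). It parses the surface-oriented example, derives a normalized layer with a hypothetical repetition-collapsing heuristic (not the organizers' normalization procedure), and compares the two layers with plain token-level accuracy, which is likewise an assumption and not necessarily the official evaluation measure.

```python
import re
from typing import List, Tuple

# Assumed layout: one token per line, "token POS-tag lemma", separated by
# whitespace (copied from the surface-oriented example).
SURFACE = """\
XD EMOASC XD
du PPER du
killst VVFIN killen
mich PPER mich
! $. !
Soooo PTKIFG soooo
herrlich ADJD herrlich
xDD EMOASC xDD
"""

Row = Tuple[str, str, str]  # (token, POS tag, lemma)


def parse(text: str) -> List[Row]:
    """Split every non-empty line into a (token, tag, lemma) triple."""
    rows: List[Row] = []
    for line in text.splitlines():
        if line.strip():
            token, tag, lemma = line.split()
            rows.append((token, tag, lemma))
    return rows


# Hypothetical heuristic: collapse a run of three or more identical letters
# to a single letter ("soooo" -> "so"). Emoticons (EMOASC) are left untouched.
# Caveat: it also mangles legitimate triples ("Schifffahrt" -> "Schifahrt").
REPEATED_LETTER = re.compile(r"([^\W\d_])\1{2,}")


def normalize(rows: List[Row]) -> List[Row]:
    """Derive a crude normalized lemma layer from surface-oriented lemmata."""
    out: List[Row] = []
    for token, tag, lemma in rows:
        if tag != "EMOASC":
            lemma = REPEATED_LETTER.sub(r"\1", lemma)
        out.append((token, tag, lemma))
    return out


def accuracy(predicted: List[Row], gold: List[Row]) -> float:
    """Share of tokens whose predicted lemma equals the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("token sequences must be aligned")
    hits = sum(p[2] == g[2] for p, g in zip(predicted, gold))
    return hits / len(gold)


if __name__ == "__main__":
    surface = parse(SURFACE)
    normalized = normalize(surface)
    print([lemma for _, _, lemma in normalized])
    # ['XD', 'du', 'killen', 'mich', '!', 'so', 'herrlich', 'xDD']
    print(accuracy(surface, normalized))  # 0.875
```

Scoring the surface-oriented layer against the normalized one gives 7/8 = 0.875, since only "soooo" vs. "so" differs. The heuristic is deliberately crude: genuine normalization also has to correct spelling errors and map other non-standard forms to standard ones, which a single regular expression cannot do.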
The shared task will be held as a pre-conference workshop of the Conference on Natural Language Processing ("Konferenz zur Verarbeitung natürlicher Sprache", KONVENS) on October 8, 2019 at FAU Erlangen-Nuremberg; see http://2019.konvens.org/.
Participants in the shared task need to register by sending an e-mail with the following information to firstname.lastname@example.org:
All participants and other interested parties are invited to subscribe to our mailing list.
The training data were individually lemmatized by four student annotators according to our lemmatization guidelines. Unclear cases were decided in group meetings with the task organizers.
The shared task is organized by: