... | @@ -16,3 +16,5 @@ This work presents the corpus DROC (Deutsches ROman Corpus), comprised of 90 car |

5. The speaker and addressee of each instance of direct speech have been manually marked.

To the best of our knowledge, no comparable corpus of literary texts is available to the academic community, especially for German. DROC comprises about 393,000 annotated tokens with more than 50,000 labelled character references.

The paper is structured as follows: we first give a brief overview of existing corpora for named entities and coreference resolution, followed by a description of the textual sources of the fragments. We then describe our annotation guidelines and the annotation process in detail, including the inter-annotator agreement (IAA). Next, we explain the two formats in which we release the data, and we conclude with a brief description of the corpus statistics.