# WueNLP



## Getting started

Install the library using `pip install git+https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/wuenlp.git` or by putting `git+https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/wuenlp.git` into your `requirements.txt` file.
You can also use the shorthand https://wuenlp.professor-x.de

Below is a simple example script that uses the library.

You can load XMI files that use the WueNLP typesystem. For example, the DROC corpus is available in this format here: https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/DROC-Release/-/tree/master/droc/DROC-wuenlp

As an example, we're going to use this file: https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/DROC-Release/-/raw/master/droc/DROC-wuenlp/Ahlefeld,-Charlotte-von__Erna.xmi.xmi.xmi
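
To follow along, the file has to exist locally. The sketch below is one way to fetch it into a `data/` directory using only the Python standard library. The helper names `local_path_for` and `download` are made up for this illustration and are not part of WueNLP. It also assumes that the raw file URL is reachable without authentication.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse
from urllib.request import urlretrieve

# URL of the example file, copied from the text above.
EXAMPLE_URL = (
    "https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/DROC-Release/"
    "-/raw/master/droc/DROC-wuenlp/Ahlefeld,-Charlotte-von__Erna.xmi.xmi.xmi"
)


def local_path_for(url: str, target_dir: Path = Path("data")) -> Path:
    """Build the local path from the last segment of the URL path (hypothetical helper)."""
    file_name = PurePosixPath(unquote(urlparse(url).path)).name
    return target_dir / file_name


def download(url: str, target_dir: Path = Path("data")) -> Path:
    """Download `url` into `target_dir`, creating the directory if needed (hypothetical helper)."""
    destination = local_path_for(url, target_dir)
    destination.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, destination)
    return destination
```

Calling `download(EXAMPLE_URL)` should then place the file at `data/Ahlefeld,-Charlotte-von__Erna.xmi.xmi.xmi`, which is the path used in the script below.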

```python
from wuenlp.impl.UIMANLPStructs import UIMADocument
from pathlib import Path

doc = UIMADocument.from_xmi(Path("data/Ahlefeld,-Charlotte-von__Erna.xmi.xmi.xmi"))

# access all speeches in the file:
for speech in doc.speeches:
    print(speech.begin, speech.end, speech.text)

# access all character mentions in the file:
for mention in doc.character_references:
    print(mention.begin, mention.end, mention.text)

# create a new character mention spanning character offsets 1 to 5 and add it to the document
doc.create_character_reference(begin=1, end=5, reference_type="Core", add_to_document=True)

# save the modified document
doc.serialize(Path("/tmp/new_doc.xmi"))
```
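
Because speeches and character mentions both expose `begin` and `end`, they can be combined with plain Python. The sketch below lists the mentions that fall inside each speech. It assumes `begin` and `end` are character offsets into the same document text and that `end` is exclusive; check this against your data. `Span` is a hypothetical stand-in so the snippet runs without the library. With a loaded document you would pass `doc.speeches` and `doc.character_references` instead.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for an annotation; NOT a WueNLP class."""
    begin: int
    end: int
    text: str


def mentions_in_speeches(speeches, mentions):
    """Pair each speech with the mentions whose offsets lie fully inside it."""
    result = []
    for speech in speeches:
        inside = [
            mention for mention in mentions
            if speech.begin <= mention.begin and mention.end <= speech.end
        ]
        result.append((speech, inside))
    return result


# Toy data standing in for doc.speeches and doc.character_references.
text = 'Anna said: "Come here, Karl!" Karl came.'
speeches = [Span(11, 29, text[11:29])]
mentions = [Span(0, 4, text[0:4]), Span(23, 27, text[23:27]), Span(30, 34, text[30:34])]

for speech, inside in mentions_in_speeches(speeches, mentions):
    print(speech.text, [mention.text for mention in inside])
```

The function only reads the `begin` and `end` attributes, so it should work with any annotation objects that provide them.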

You can find an overview of all existing Types and their attributes/features here: https://wuenlp-docs.professor-x.de/wuenlp/impl/UIMANLPStructs.html

The typesystem can be extended with custom types, for example as done by the modules in https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/wuenlp-extensions.git.