Markus Krug authored
ATHEN Syntactic Parsing
If you plan to annotate your corpus with syntactic parsing information, ATHEN provides two views to support you: one for constituency parsing and one for dependency parsing. This tutorial shows how to use both. Syntactic annotation is done in the analyzer, the part of the window located below the editor.
Opening a document in the analyzer
To work on a document in the analyzer, you first need to open a document that has valid sentence annotations. In this tutorial we start with a blank .txt file and annotate everything that is required.
Step 1: Create a document called syntacticParsing.txt and add the following text to it:
The dog saw a bone in the park.
The dog ate the bone.
Import the data into ATHEN. If you do not know how, please refer to the guide ATHEN -Importing data. Once the document is imported, double-click it and confirm that you want the document to be converted to .xmi. The editor should now contain our two sentences and look similar to this:
On the right side, the annotation browser shows that our document contains no annotations besides a document annotation (which can be used to store meta information such as the language).
Step 2: Select the ConstituencyParseAnalyzer at the bottom of the editor.
The analyzer should appear empty.
Step 3: Create two valid sentence annotations in your document. To do this, right-click in the editor and select the menu entry Configurate Styles. A dialog listing all types currently known to ATHEN will pop up. Enter Sentence in the search field to filter out all other types. Select the type de.uniwue.kalimachos.coref.type.Sentence and double-click the Visible column. Make sure it displays true, then click OK.
We can now select our first sentence, The dog saw a bone in the park., and press ENTER. Repeat this for the second sentence, The dog ate the bone. You should now have two sentences as displayed below.
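As a data-level view of what these two annotations amount to, the following minimal sketch computes one (begin, end) character span per sentence. It assumes that annotations are stored as character offsets into the document text, as is usual for UIMA-style .xmi documents, and that each sentence sits on its own line. The function name sentence_spans is hypothetical and not part of ATHEN.

```python
# Illustrative sketch only; this is not ATHEN code.
# Assumption: a sentence annotation is a (begin, end) pair of character offsets.
text = "The dog saw a bone in the park.\nThe dog ate the bone."


def sentence_spans(text):
    """Return one (begin, end) span per non-empty line of the text."""
    spans = []
    offset = 0
    for line in text.split("\n"):
        if line.strip():
            spans.append((offset, offset + len(line)))
        offset += len(line) + 1  # +1 skips the newline character
    return spans


spans = sentence_spans(text)
for begin, end in spans:
    print(begin, end, repr(text[begin:end]))
```

Selecting a sentence in the editor and pressing ENTER corresponds to fixing one such span by hand.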
Step 4: Add the syntactic parsing information. This is done by selecting a sentence in the editor. Let us start with the first sentence. After you click this sentence, the ConstituencyParseAnalyzer should display it.
We can now start to mark the phrases in the sentence. First, let us mark The dog as a nominal phrase (NP). To do this, select the corresponding snippet in the analyzer and press ENTER. A bracket labeled NOT_SET will appear.
By double-clicking on the bracket you can now open the feature assigning dialog. The selected bracket should now appear in a blue font.
Write the text NP into the feature slot Phrasetype and press OK. You should now have your first successfully created phrase. Continue to assign the remaining phrases of the sentence. The result should look as follows:
If you make a mistake during annotation, simply select the annotation and delete it by pressing DEL on your keyboard or by using the feature assigning dialog. If you also want to annotate syntactic roles, fill in the slot SyntacticRole in the feature assigning dialog.
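The phrases marked in Step 4 can be thought of as labeled spans over the sentence. The sketch below lists one possible bracketing of the first sentence and checks that no two brackets cross, which is a property any constituency tree must have. The bracketing itself, the tuple layout and the helper names (span_of, crossing, crossing_pairs) are assumptions made for illustration; they do not describe ATHEN's internal representation.

```python
# Illustrative sketch only; this is not ATHEN code.
sentence = "The dog saw a bone in the park."


def span_of(snippet):
    """Return the (begin, end) offsets of the first occurrence of snippet."""
    begin = sentence.index(snippet)
    return (begin, begin + len(snippet))


# One possible analysis: (begin, end, phrase type) for each bracket.
phrases = [
    (*span_of("The dog"), "NP"),
    (*span_of("a bone"), "NP"),
    (*span_of("the park"), "NP"),
    (*span_of("in the park"), "PP"),
    (*span_of("saw a bone in the park"), "VP"),
    (*span_of(sentence), "S"),
]


def crossing(a, b):
    """True if the two spans overlap without one containing the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def crossing_pairs(phrases):
    """Return all pairs of phrases whose brackets cross."""
    return [
        (a, b)
        for i, a in enumerate(phrases)
        for b in phrases[i + 1:]
        if crossing(a, b)
    ]


print(crossing_pairs(phrases))
```

An empty list means the brackets are properly nested or disjoint, so they can be drawn as a tree.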
Step 5: Click back into the main editor and save the document by either clicking the disk symbol or pressing CTRL+S. The document is now saved. You can now continue and click our second sentence, The dog ate the bone. This sentence should again appear in the ConstituencyParseAnalyzer, and you can annotate it accordingly.
Note: If you need annotations that have exactly the same span, press ALT GR+ENTER and ATHEN will create an annotation on top of the already existing annotation. This happens a lot if you want to annotate POS tags as part of the tree. Deleting works without any special key bindings.
The Analyzer always updates its content based on the selected sentence in the editor. Editing annotations in the Analyzer will be reflected immediately in the editor and the complete document.
Step 6 - Annotate Dependencies
If you want to annotate dependencies, switch to the DependencyParseAnalyzer. After you select a sentence in the editor, its text should appear in the analyzer without any annotations shown.
The first step is to create the tokens in the analyzer. To do this, select the corresponding span of text and press ENTER. After you have marked every token in the sentence with an annotation, you should get something like this:
By default, every token is attached to the root node. To change this, select a token by clicking its box. The box should appear in a blue font, and all arcs that do not start or end at this token will disappear.
After you have selected a token, move your mouse over the red dots below the annotations. By clicking a red dot you assign the selected token to its new head. Let us select the token The and click the red dot below dog. The result should look like this:
By pressing ENTER while a token is selected, you can assign the dependency relation. After all tokens have been assigned, the result might look like this, depending on your label set:
NOTE: If you click in the blank space, all arcs are displayed. Once you select a single token, only its incoming and outgoing arcs are displayed.
NOTE 2: Do not forget to save the document in the editor, otherwise your changes will be lost!
NOTE 3: Assign a token to itself if you want it to be assigned to ROOT!
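The head assignments described in Step 6 can be sketched as a list with one head index per token. The model below is an assumption made for illustration: borrowing the convention from NOTE 3, a token whose head is itself counts as attached to ROOT, so the default state has every token pointing at itself. The attachments chosen for The dog ate the bone. are one plausible analysis, and the helper names (assign_head, roots, has_cycle) are hypothetical and not part of ATHEN.

```python
# Illustrative sketch only; this is not ATHEN's internal representation.
tokens = ["The", "dog", "ate", "the", "bone", "."]

# Default state: every token is its own head, which stands for "attached to ROOT".
heads = list(range(len(tokens)))


def assign_head(heads, dependent, head):
    """Attach the token at index dependent to the token at index head."""
    heads[dependent] = head


def roots(heads):
    """Return the indices of all tokens attached to ROOT (head == self)."""
    return [i for i, h in enumerate(heads) if h == i]


def has_cycle(heads):
    """True if following heads from some token never reaches a ROOT token."""
    for start in range(len(heads)):
        seen = set()
        node = start
        while heads[node] != node:
            if node in seen:
                return True
            seen.add(node)
            node = heads[node]
    return False


assign_head(heads, 0, 1)  # The  -> dog
assign_head(heads, 1, 2)  # dog  -> ate
assign_head(heads, 3, 4)  # the  -> bone
assign_head(heads, 4, 2)  # bone -> ate
assign_head(heads, 5, 2)  # .    -> ate
# "ate" keeps itself as head, so it stays attached to ROOT.

print(heads, roots(heads), has_cycle(heads))
```

A finished sentence would typically have exactly one ROOT token and no cycles; both checks are cheap to run on exported annotations.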