
ATHEN's Rule-Based Ontology-Centered IE (Work in progress)

This tutorial walks you through ontology-based information extraction in ATHEN, which is centered around a rule-based extraction algorithm. It is a special-purpose tool tailored to information extraction from medical discharge letters. Its primary goal is to create a lightweight medical ontology for a given domain. The workflow involves the user in a loop, described below:

  1. The user gets access to a series of documents of a given domain used in the ontology creation process.
  2. The user separates those documents into a training folder and a testing folder.
  3. The user aggregates the documents from the training folder into a single large document.
  4. The user loads this aggregated document into the ATHEN editor.
  5. For additional ease of use, the user can create an index of the training documents to quickly query the training corpus for additional insight into the documents.
  6. The user uses the semi-automatic component of ATHEN to generate an automatically derived base ontology, or alternatively creates a new ontology or loads an existing one.
  7. The user applies the active ontology to the opened aggregated document to get an idea of the extraction within this document.
  8. The user refines the ontology manually until the extraction seems satisfactory.
  9. The user applies the ontology to the training data.
  10. The user opens the training documents one by one to verify that the extraction is correct.
  11. The user accepts correct extractions or corrects wrong extractions to create gold data of the extraction process.
  12. Finally, the user evaluates the extraction on the training data and on the test data.
  13. Remaining errors are categorized to obtain a detailed evaluation that explains the error sources.
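
Steps 2 and 3 happen outside ATHEN and could be scripted. The following is a minimal sketch, assuming the documents are plain-text `.txt` files (the format ATHEN actually expects may differ). The function name `split_and_aggregate`, the 80/20 split, the folder names `train` and `test`, the output file name `train_aggregated.txt`, and the `=====` separator line are illustrative assumptions, not part of ATHEN.

```python
import random
import shutil
from pathlib import Path


def split_and_aggregate(source_dir, work_dir, test_fraction=0.2, seed=0):
    """Copy documents into train/ and test/ folders and build one aggregated training file."""
    source = Path(source_dir)
    work = Path(work_dir)
    # Assumption: one plain-text document per .txt file.
    docs = sorted(source.glob("*.txt"))

    # Shuffle reproducibly, then cut off the test portion.
    random.Random(seed).shuffle(docs)
    n_test = int(len(docs) * test_fraction)
    test_docs, train_docs = docs[:n_test], docs[n_test:]

    # Step 2: separate the documents into a training and a testing folder.
    for name, subset in (("train", train_docs), ("test", test_docs)):
        target = work / name
        target.mkdir(parents=True, exist_ok=True)
        for doc in subset:
            shutil.copy2(doc, target / doc.name)

    # Step 3: concatenate the training documents, each preceded by a marker line
    # (hypothetical separator) so the original file can still be identified.
    aggregated = work / "train_aggregated.txt"
    with aggregated.open("w", encoding="utf-8") as out:
        for doc in sorted(train_docs):
            out.write(f"===== {doc.name} =====\n")
            out.write(doc.read_text(encoding="utf-8").rstrip("\n") + "\n\n")

    return train_docs, test_docs, aggregated
```

Fixing the random seed keeps the split stable across runs, so the test documents stay unseen while the ontology is refined on the training data.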

Most of the work is done in steps 7-8, which are executed in a loop until a satisfactory result has been achieved.
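
The tutorial does not state which measures are used for the evaluation in step 12. A common choice in information extraction is precision, recall and F1 over exact matches between extracted annotations and gold annotations. The sketch below assumes that choice; the function name `evaluate`, the tuple representation `(begin, end, label)` and the example labels are hypothetical and do not reflect ATHEN's internal data model.

```python
def evaluate(gold, predicted):
    """Return (precision, recall, f1) for two collections of annotations.

    An annotation counts as correct only if it matches a gold annotation exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical annotations as (begin offset, end offset, concept label).
gold = {(0, 8, "Diagnosis"), (15, 22, "Medication"), (30, 41, "Procedure")}
predicted = {(0, 8, "Diagnosis"), (15, 22, "Diagnosis"), (30, 41, "Procedure")}

precision, recall, f1 = evaluate(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.67 recall=0.67 f1=0.67
```

Running the same function on the training data and on the test data gives a rough indication of how well the refined ontology generalizes to unseen documents. The set differences `predicted - gold` and `gold - predicted` list the spurious and missed annotations, which could serve as a starting point for the error categorization in step 13.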

Setting up the process in ATHEN

This section goes through all the steps described above and shows how to set everything up in ATHEN. It is assumed that you have already completed the tutorial "ATHEN - First time use" and have access to a set of documents you want to use for your information extraction process.

If you have a project ready, start ATHEN and switch to the Information Extraction perspective. To do so, click the symbol shown below in the ATHEN toolbar:

[Figure: perspective switch symbol in the ATHEN toolbar]

The switch perspective dialog should pop up:

[Figure: switch perspective dialog]

Select the Information Extraction perspective, and the active views in the application should change. The resulting perspective should look like this:

[Figure: Information Extraction perspective]