
Opened May 15, 2017 by Markus Krug (@mak28ma, Owner)

TEI Compatibility for ATHEN

In digital humanities communities, TEI-XML is the usual standard for encoding meta information. The format is extremely complicated in its entirety. This feature should enable ATHEN to read and write TEI documents.

TEI to XMI Conversion

A TEI document comes as a plain .xml file. Whenever one is available, the user may provide an XML schema in addition to the input document.

The conversion into an .xmi file would follow this model:

  1. If a schema is provided, it is converted into a UIMA typesystem (which is itself an XML file); otherwise, the typesystem is generated by analyzing all the different XML elements and their attributes. Elements are converted into types and attributes into features.
  2. The resulting typesystem is merged with the existing typesystem of ATHEN.
  3. All XML elements, together with their attributes, are removed from the text and stored as UIMA annotations with the correct span. All annotations will be of a special type TEI-XML-Type. This will fail if the XML document has no text at all: there are XML formats (such as the TueBa/DZ XML format) that store the entire text as attributes in the <word> tag.
  4. In a first pass through the annotations, all TEI-XML-Type annotations are converted into the corresponding type of the UIMA typesystem. This mapping is logged and saved by the application in order to guarantee reversibility.
  5. In a second pass through the annotations, all features are interpreted and stored. Features might be references to other annotations, which is why two passes are required.
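
For the schema-less case in step 1, the inference of types and features could look roughly like the following sketch. It is only an illustration in Python using the standard library: the function name `infer_typesystem` is hypothetical, and the result is a plain dictionary, not an actual UIMA typesystem descriptor.

```python
import xml.etree.ElementTree as ET


def infer_typesystem(xml_text):
    """Collect every element name (type) with the attribute names seen on it (features)."""
    types = {}
    for elem in ET.fromstring(xml_text).iter():
        # The same element may appear with different attributes, so take the union.
        types.setdefault(elem.tag, set()).update(elem.attrib.keys())
    return {tag: sorted(attrs) for tag, attrs in types.items()}


sample = (
    '<text><p n="1">Hello <name ref="#a" type="person">Ann</name></p>'
    "<p>Bye</p></text>"
)
typesystem = infer_typesystem(sample)
```

Note that for real TEI documents the element names reported by `ElementTree` carry the namespace in braces, so a real implementation would have to decide how namespaces map onto type names.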

The resulting document is stored alongside the mapping of TEI elements and attributes to UIMA Types and features.
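
Step 3, removing the markup and keeping it as annotations with spans, can be sketched as follows. This is a minimal Python illustration, not ATHEN code: `to_standoff` and the dictionary layout of an annotation are assumptions made for the example, and the annotations are plain dictionaries instead of UIMA annotations.

```python
import xml.etree.ElementTree as ET


def to_standoff(xml_text):
    """Return the plain text and one annotation (element, span, attributes) per XML element."""
    parts = []        # text fragments in document order
    annotations = []  # filled in document order (parents before children)
    offset = 0

    def walk(elem):
        nonlocal offset
        ann = {"element": elem.tag, "begin": offset, "end": None,
               "attributes": dict(elem.attrib)}
        annotations.append(ann)
        if elem.text:
            parts.append(elem.text)
            offset += len(elem.text)
        for child in elem:
            walk(child)
            if child.tail:  # text following the child still belongs to elem
                parts.append(child.tail)
                offset += len(child.tail)
        ann["end"] = offset

    walk(ET.fromstring(xml_text))
    return "".join(parts), annotations


plain, annotations = to_standoff(
    '<text><p n="1">Hello <name ref="#a">Ann</name>!</p></text>'
)
```

A document that keeps its text in attributes (the TueBa/DZ case mentioned in step 3) would yield an empty string and only zero-length spans here, which is exactly the failure described above.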

Note to step 5

This step should try to find links to other annotations, either by analyzing the schema or by comparing attribute values to existing element ids.
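
The second variant, comparing attribute values to existing element ids, might be sketched like this. It is an assumption-laden Python illustration: the function name `resolve_references`, the annotation dictionaries, and the choice of `xml:id` as the id attribute with `#`-prefixed pointers are picked for the example and are not taken from ATHEN.

```python
def resolve_references(annotations, id_attribute="xml:id"):
    """Find attributes whose value points at the id of another annotation."""
    # First pass: index all annotations that carry an id.
    by_id = {
        ann["attributes"][id_attribute]: ann
        for ann in annotations
        if id_attribute in ann["attributes"]
    }
    # Second pass: interpret every other attribute value as a possible pointer.
    links = []
    for ann in annotations:
        for name, value in ann["attributes"].items():
            if name == id_attribute:
                continue
            for token in value.split():  # an attribute may hold several pointers
                target = by_id.get(token.lstrip("#"))
                if target is not None:
                    links.append((ann["element"], name, target["element"]))
    return links


example = [
    {"element": "person", "attributes": {"xml:id": "ann"}},
    {"element": "name", "attributes": {"ref": "#ann", "type": "person"}},
]
links = resolve_references(example)
```

A pure value comparison like this can produce false links when an ordinary attribute value happens to equal an id, which is one reason to prefer the schema when it is available.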

In summary, the process is depicted here:

[Figure: MVC of the editor and the views]

XMI to TEI Conversion

The reverse process is a little harder:

  1. If a previously created mapping is available, we can use it to revert the process (as long as a schema is available).
  2. If neither a mapping nor a schema is available => ????
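
For case 1, reverting the process with a saved mapping could look roughly like the sketch below. It is written in Python for illustration only and rests on several assumptions: annotations are dictionaries with `type`, `begin`, `end` and `features`, the saved mapping is a simple dictionary from type name to element name (`type_to_element`, a hypothetical name), and all spans are non-empty and properly nested.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml(text, annotations, type_to_element):
    """Re-insert tags into the text; assumes non-empty, properly nested spans."""
    # Outer elements first: earlier begin, then longer span (sort is stable).
    ordered = sorted(annotations, key=lambda a: (a["begin"], -a["end"]))
    out, stack, pos = [], [], 0

    def close_until(limit):
        nonlocal pos
        # Close every open element that ends at or before the given offset.
        while stack and stack[-1]["end"] <= limit:
            top = stack.pop()
            out.append(escape(text[pos:top["end"]]))
            pos = top["end"]
            out.append("</%s>" % type_to_element[top["type"]])

    for ann in ordered:
        close_until(ann["begin"])
        out.append(escape(text[pos:ann["begin"]]))
        pos = ann["begin"]
        attrs = "".join(
            " %s=%s" % (k, quoteattr(v)) for k, v in ann["features"].items()
        )
        out.append("<%s%s>" % (type_to_element[ann["type"]], attrs))
        stack.append(ann)
    close_until(len(text))
    out.append(escape(text[pos:]))
    return "".join(out)


mapping = {"TeiText": "text", "TeiP": "p", "TeiName": "name"}  # hypothetical names
xml_out = to_xml(
    "Hello Ann!",
    [
        {"type": "TeiText", "begin": 0, "end": 10, "features": {}},
        {"type": "TeiP", "begin": 0, "end": 10, "features": {"n": "1"}},
        {"type": "TeiName", "begin": 6, "end": 9, "features": {"ref": "#a"}},
    ],
    mapping,
)
```

The sketch does not handle overlapping annotations or empty elements, both of which can occur in an XMI document and have no direct counterpart in a single XML hierarchy.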

Special functionalities

  1. If we happen to have multiple documents with the same text, it should be possible to aggregate the information stored in those documents one after another and save it in a single xmi. This can afterwards be converted into an aggregate TEI (if the problems regarding the backwards conversion are solved).

  2. Convenience method to convert annotations in ATHEN into other types. The user needs to be able to create a mapping of types and features that get converted into each other. This would, e.g., make it possible to parse TEI documents primitively and then convert the resulting annotations into existing types (such as the ones used in DKPro or in ATHEN).
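
Both special functionalities can be sketched together, again in Python and only as an illustration. The names `merge_documents` and `convert_annotations`, the dictionary layout of an annotation, and the type and feature names (`TeiName`, `NamedEntity`, `value`) are hypothetical and chosen for the example.

```python
def merge_documents(documents):
    """Aggregate annotations of (text, annotations) pairs that share the same text."""
    text = documents[0][0]
    merged, seen = [], set()
    for doc_text, annotations in documents:
        if doc_text != text:
            raise ValueError("documents do not share the same text")
        for ann in annotations:
            key = (ann["type"], ann["begin"], ann["end"],
                   tuple(sorted(ann["features"].items())))
            if key not in seen:  # skip exact duplicates
                seen.add(key)
                merged.append(ann)
    return text, merged


def convert_annotations(annotations, type_map, feature_map):
    """Rename types and features according to a user-defined mapping."""
    converted = []
    for ann in annotations:
        features = {
            # feature_map is keyed by (old type, old feature); unmapped names are kept
            feature_map.get((ann["type"], name), name): value
            for name, value in ann["features"].items()
        }
        converted.append({**ann,
                          "type": type_map.get(ann["type"], ann["type"]),
                          "features": features})
    return converted


ann_a = {"type": "TeiName", "begin": 0, "end": 3, "features": {"type": "person"}}
ann_b = {"type": "TeiName", "begin": 8, "end": 11, "features": {"type": "person"}}
text, merged = merge_documents([("Ann met Bob", [ann_a]),
                                ("Ann met Bob", [dict(ann_a), ann_b])])
converted = convert_annotations(
    merged,
    type_map={"TeiName": "NamedEntity"},
    feature_map={("TeiName", "type"): "value"},
)
```

Merging here only drops exact duplicates; conflicting annotations from different documents are kept side by side and would still need a policy of their own.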

Reference: kallimachos/Athen#42