The Lucene View

ATHEN contains a window called LuceneView in which you can index XMI documents. Afterwards you can search the indexed documents not only for plain text but also for UIMA Annotations and their Features (as long as those Features are "primitive", e.g. Strings or Integers).

Opening the Lucene View

The Lucene View is opened via Utility -> Show Lucene View in the main menu bar.

Drawing

Overview of the Lucene View

The following picture depicts a newly opened LuceneView where no index has been created or loaded yet:

Drawing

The Lucene View contains the following elements:

  1. A label that displays the location of the current index (no index is selected in the picture above, so the label displays "-")
  2. A spinner where you can set the number of results that are displayed in the table after a query
  3. A button that opens the wizard for the creation of a new index
  4. A button that allows you to load an existing index
  5. A button that opens a dialog where you can select a directory. The files of this directory are then added to the index. This button is deactivated when there is no current index.
  6. A button that opens a window which shows information about the current index. This button is deactivated as well when there is no index selected.
  7. A text field where you can enter queries. The queries are executed when the ENTER key is pressed.

Below these elements is the table which contains the results of the last query. Each row represents one "lucene document" (during the creation of an index you choose which Annotation a "lucene document" corresponds to).

The left column of the table shows the name of the file that contains the "lucene document". The right column shows the text of the "lucene document", with the part that matches the query highlighted in red.

Double-clicking a "lucene document" opens the file that contains it in ATHEN's editor.

The Index Creation Wizard

The Index Creation Wizard guides you through the process of creating a new Lucene Index and consists of two pages.

On the first page you select the Annotation Types and Features that you want to have indexed (at least one Type has to be selected because it is needed on the second page). The Types and Features listed here are those contained in the application's TypeSystem (Features have to be "primitive", e.g. Strings or Integers, to be shown here). Since the default TypeSystem is quite large, you can filter the Types and Features via the text field to quickly find the ones you need.

Note: Selecting a Feature automatically selects the Type it belongs to; deselecting a Type deselects all of its Features.
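
The selection rule in the note can be sketched as a small model. This is only an illustration of the described behaviour, not ATHEN's implementation; the class `IndexSelection` and the Type and Feature names used below are hypothetical.

```python
class IndexSelection:
    """Toy model of the first wizard page (hypothetical, not ATHEN code)."""

    def __init__(self, type_system):
        # type_system maps a Type name to the names of its primitive Features.
        self.type_system = type_system
        self.selected_types = set()
        self.selected_features = set()  # (type_name, feature_name) pairs

    def select_type(self, type_name):
        if type_name not in self.type_system:
            raise KeyError(type_name)
        self.selected_types.add(type_name)

    def select_feature(self, type_name, feature_name):
        if feature_name not in self.type_system[type_name]:
            raise KeyError(feature_name)
        self.selected_features.add((type_name, feature_name))
        # Selecting a Feature automatically selects the Type it belongs to.
        self.selected_types.add(type_name)

    def deselect_type(self, type_name):
        self.selected_types.discard(type_name)
        # Deselecting a Type deselects all of its Features.
        self.selected_features = {
            (t, f) for (t, f) in self.selected_features if t != type_name
        }


selection = IndexSelection({"POS": ["POSTag"], "Sentence": []})
selection.select_feature("POS", "POSTag")
print(selection.selected_types)
selection.deselect_type("POS")
print(selection.selected_features)
```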

Drawing

On the second page you have to make the following selections:

  1. The Token-Type, chosen from the Annotation Types selected on the first page. It is used for tokenizing the documents (a token is the smallest entity of a document, e.g. a POS-Tag). Parts of the document that are not covered by this Token-Type cannot be found by queries later!
  2. The Document-Type, chosen from the Annotation Types selected on the first page. Each part of a document that is covered by an Annotation of this Type becomes a separate document in the index. Make sure that the whole document is covered by Annotations of the Type you select here (e.g. Sentences). Parts of the document that are not covered by this Document-Type cannot be found by queries later!
  3. A directory as the Document Folder. This is where the documents you want to have indexed are located.
  4. A directory as the Index Folder. This is where the new index will be stored.
Drawing

When everything is selected on the second page, you can press Finish. This creates the new index at the location you specified as the Index Folder and adds the documents found in the Document Folder to it.
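
Because text that is not covered by the Token-Type or the Document-Type cannot be found later, it can help to check the coverage of your annotations before indexing. The following is a minimal sketch of such a check, assuming annotations are available as `(begin, end)` character offsets with an exclusive end; the function `uncovered_spans` and the sample text are hypothetical and not part of ATHEN.

```python
def uncovered_spans(text_length, annotations):
    """Return the (begin, end) ranges of the text not covered by any annotation.

    `annotations` is an iterable of (begin, end) character offsets,
    where `end` is exclusive.
    """
    gaps = []
    position = 0  # end of the region covered so far
    for begin, end in sorted(annotations):
        if begin > position:
            gaps.append((position, begin))
        position = max(position, end)
    if position < text_length:
        gaps.append((position, text_length))
    return gaps


text = "Der Hund bellt. Die Katze schläft."
sentences = [(0, 15)]  # assumption: only the first sentence is annotated
for begin, end in uncovered_spans(len(text), sentences):
    print(repr(text[begin:end]))
```

Note that whitespace between two annotations is also reported as a gap; whether that matters depends on the Type you check.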

Information window

This window displays the following information about the current index:

  • The Type that is used as Token-Type
  • The Type that is used as Document-Type
  • The Types and Features that are indexed
Drawing

Queries

As stated at the beginning, you can search not only for plain text but also for Annotation Types and Features.

To search for an Annotation Type, simply use its short name (the part of the name after the last '.'). The parts of the documents that are covered by an Annotation of this Type are returned as results and highlighted in the table.

The scheme for a Feature query is slightly different because a Feature has a value. In this case you enter the Feature's short name followed by '=' and the value you are looking for: <FeatureName>=<value>. For example, if you have the Feature POSTag and NN is the value for nouns, then the query "POSTag=NN" gives you all nouns in the indexed documents.
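
The two query forms described above can be put together as plain strings. The following sketch only illustrates the naming scheme; the helper functions and the Type name `de.example.types.Sentence` are hypothetical.

```python
def short_name(qualified_name):
    """Return the part of a name after the last '.' (the whole name if there is none)."""
    return qualified_name.rsplit(".", 1)[-1]


def type_query(qualified_type_name):
    """Query for an Annotation Type: just its short name."""
    return short_name(qualified_type_name)


def feature_query(feature_short_name, value):
    """Query for a Feature: short name, '=', then the value."""
    return f"{feature_short_name}={value}"


print(type_query("de.example.types.Sentence"))  # prints Sentence
print(feature_query("POSTag", "NN"))            # prints POSTag=NN
```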

If you want to construct more complex queries, see for example the QueryParser documentation (queries to other fields are not supported).

Examples

The query POSTag=NN gives you all the nouns in the document. For an index containing an example document, the results look like this:

Drawing

If you want only the nouns that follow the word "der", the query "der POSTag=NN" returns the following results instead:

Drawing
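
To make the difference between the two example queries concrete, here is a toy matcher over a hand-made token list. It is one possible reading of the examples (each whitespace-separated term must match one of a run of consecutive tokens) and does not use Lucene or reflect how ATHEN evaluates queries internally; the token data and function names are hypothetical.

```python
def term_matches(term, token):
    """A term is either plain text or a Feature query of the form Name=value."""
    if "=" in term:
        feature, value = term.split("=", 1)
        return feature in token and str(token[feature]) == value
    return token["text"] == term


def find_matches(query, tokens):
    """Return the texts of every run of consecutive tokens matching all terms."""
    terms = query.split()
    if not terms:
        return []
    hits = []
    for start in range(len(tokens) - len(terms) + 1):
        window = tokens[start:start + len(terms)]
        if all(term_matches(term, tok) for term, tok in zip(terms, window)):
            hits.append([tok["text"] for tok in window])
    return hits


# Hypothetical tokens for the sentence "Heute sieht der Hund die Katze".
tokens = [
    {"text": "Heute", "POSTag": "ADV"},
    {"text": "sieht", "POSTag": "VVFIN"},
    {"text": "der", "POSTag": "ART"},
    {"text": "Hund", "POSTag": "NN"},
    {"text": "die", "POSTag": "ART"},
    {"text": "Katze", "POSTag": "NN"},
]

print(find_matches("POSTag=NN", tokens))      # both nouns
print(find_matches("der POSTag=NN", tokens))  # only the noun after "der"
```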