# Paperboat

## Project Description
Paperboat is an application and library that unifies several projects (Baseline Recognition, Binarization, Calamari OCR) into one accessible package.
In addition, Paperboat provides a flexible framework that makes it simple to integrate new processing steps.


## Installation

```
$ git clone "https://gitlab2.informatik.uni-wuerzburg.de/nof36hn/paperboat"
$ cd paperboat
$ python3.8 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ python setup.py install
```

## Running Segmentation + OCR

To run the whole pipeline (Binarization + Segmentation + OCR + PageXML output), use:
```
$ paperboat -o output_directory -- image_file.png
```
There are different presets available:

- default (for historical documents)
- modern (for modern documents)

Set the preset using the -p switch:
```
$ paperboat -p modern img1.png
```

It's possible to specify multiple image files to run at once.
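As a rough sketch of batch processing, the following Python helper collects the images in a directory and builds a single `paperboat` call for all of them. The helper names (`build_batch_command`, `run_batch`) and the `*.png` pattern are illustrative and not part of Paperboat. It also assumes that several image paths may follow the `--` separator in the same way a single one does.

```python
import subprocess
from pathlib import Path


def build_batch_command(image_dir, output_dir, pattern="*.png"):
    """Build one paperboat call covering every matching image in image_dir."""
    images = sorted(str(p) for p in Path(image_dir).glob(pattern))
    if not images:
        raise FileNotFoundError(f"no files matching {pattern!r} in {image_dir}")
    # Assumption: options come first, then "--", then the image files.
    return ["paperboat", "-o", str(output_dir), "--", *images]


def run_batch(image_dir, output_dir, pattern="*.png"):
    """Run the batch; raises CalledProcessError if paperboat exits non-zero."""
    subprocess.run(build_batch_command(image_dir, output_dir, pattern), check=True)
```

Calling `run_batch("scans", "output_directory")` would then process every PNG in the hypothetical `scans` directory in one invocation, provided `paperboat` is on the `PATH` of the active virtual environment.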

Different models for segmentation or OCR can be specified using command-line arguments:

```
--Dcalamari.model new_model.ckpt.json   # set calamari model
--Dsegmentation.model new_model.torch   # set segmentation model
```
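When scripting runs with different models, it can help to generate these arguments from a dictionary. The sketch below is a hypothetical wrapper, not part of Paperboat: `override_args` and `build_command` are illustrative names. It assumes that the `-p` switch and the `--D...` arguments can be combined in one call, placed before a `--` separator.

```python
def override_args(overrides):
    """Turn {"calamari.model": "m.json"} into ["--Dcalamari.model", "m.json"]."""
    args = []
    for key, value in overrides.items():
        args += [f"--D{key}", str(value)]
    return args


def build_command(images, preset=None, overrides=None):
    """Assemble a paperboat argument list; pass it to subprocess.run to execute."""
    cmd = ["paperboat"]
    if preset is not None:
        cmd += ["-p", preset]
    cmd += override_args(overrides or {})
    # Assumption: "--" separates the options from the image files.
    return cmd + ["--", *images]
```

For example, `build_command(["img1.png"], preset="modern", overrides={"calamari.model": "new_model.ckpt.json"})` yields a list that can be handed to `subprocess.run(..., check=True)`.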







## Usage as a Library
You can use Paperboat as a library from Python in order to 

## Usage