mirror of
https://gitlab.ub.uni-bielefeld.de/sfb1288inf/ocr.git
synced 2024-12-26 17:34:18 +00:00
Update README
parent
18b659684a
commit
ca4f218d2a
11
README.md
@@ -35,9 +35,7 @@ docker pull gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/ocr:latest
mkdir -p /<mydatalocation>/files_for_ocr /<mydatalocation>/files_from_ocr
```
-2. Place your files inside the `/<mydatalocation>/files_for_ocr` directory. Files can either be
-multipage TIFF (.tiff, .tif) or PDF (.pdf) files. Files should all contain text
-of the same language.
+2. Place your files inside the `/<mydatalocation>/files_for_ocr` directory. Files can either be PDF (.pdf) or multipage TIFF (.tiff, .tif) files. Files should all contain text of the same language.
3. Start the OCR process.
```
@@ -67,9 +65,7 @@ The specified below `sfb1288inf/ocr:latest` are described in the [OCR arguments]
`-l languagecode`
* Tells tesseract which language will be used.
-* options = deu (German), deu_frak (German Fraktur), eng (English),
-enm (Middle English), fra (French), frm (Middle French), por (Portuguese),
-spa (Spanish)
+* options = deu (German), deu_frak (German Fraktur), eng (English), enm (Middle English), fra (French), frm (Middle French), ita (Italian), por (Portuguese), spa (Spanish)
* required = True
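Because `-l` is required and only accepts the codes listed above, a language code can be checked before a run is started. The following is a minimal shell sketch, not part of the OCR image: the helper name `is_supported_ocr_language` is hypothetical, and the code list is simply copied from the options in this README.

```shell
# Hypothetical helper (an assumption, not shipped with sfb1288inf/ocr):
# succeeds only for the language codes listed in the README options.
is_supported_ocr_language() {
    case "$1" in
        deu|deu_frak|eng|enm|fra|frm|ita|por|spa) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: check a code before passing it to -l.
code="deu_frak"
if is_supported_ocr_language "$code"; then
    echo "language ok: $code"
else
    echo "unsupported language code: $code" >&2
fi
```

Keeping the list in a single `case` pattern means a newly supported code only has to be added in one place.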
`--keep-intermediates`
@@ -84,8 +80,7 @@ kept.
* required = False
`--skip-binarisation`
-* Used to skip binarization with ocropus. If skipped, only the tesseract
-binarization is used.
+* Used to skip binarization with ocropus. If skipped, only the tesseract binarization is used.
* default = False
Example with all arguments used: