mirror of
https://gitlab.ub.uni-bielefeld.de/sfb1288inf/ocr.git
synced 2024-12-27 07:54:18 +00:00
Update README
parent 18b659684a
commit ca4f218d2a
11
README.md
@@ -35,9 +35,7 @@ docker pull gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/ocr:latest
 mkdir -p /<mydatalocation>/files_for_ocr /<mydatalocation>/files_from_ocr
 ```
 
-2. Place your files inside the `/<mydatalocation>/files_for_ocr` directory. Files can either be
-multipage TIFF (.tiff, .tif) or PDF (.pdf) files. Files should all contain text
-of the same language.
+2. Place your files inside the `/<mydatalocation>/files_for_ocr` directory. Files can either be PDF (.pdf) or multipage TIFF (.tiff, .tif) files. Files should all contain text of the same language.
 
 3. Start the OCR process.
 ```
@@ -67,9 +65,7 @@ The specified below `sfb1288inf/ocr:latest` are described in the [OCR arguments]
 
 `-l languagecode`
 * Tells tesseract which language will be used.
-* options = deu (German), deu_frak (German Fraktur), eng (English),
-enm (Middle englisch), fra (French), frm (Middle french), por (Portuguese),
-spa (Spanish)
+* options = deu (German), deu_frak (German Fraktur), eng (English), enm (Middle englisch), fra (French), frm (Middle french), ita (Italian), por (Portuguese), spa (Spanish)
 * required = True
 
 `--keep-intermediates`
@@ -84,8 +80,7 @@ kept.
 * required = False
 
 `--skip-binarisation`
-* Used to skip binarization with ocropus. If skipped, only the tesseract
-binarization is used.
+* Used to skip binarization with ocropus. If skipped, only the tesseract binarization is used.
 * default = False
 
 Example with all arguments used: