Update README

Patrick Jentsch 2019-05-17 01:07:39 +02:00
parent 18b659684a
commit ca4f218d2a


@@ -35,9 +35,7 @@ docker pull gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/ocr:latest
 mkdir -p /<mydatalocation>/files_for_ocr /<mydatalocation>/files_from_ocr
 ```
-2. Place your files inside the `/<mydatalocation>/files_for_ocr` directory. Files can either be
-multipage TIFF (.tiff, .tif) or PDF (.pdf) files. Files should all contain text
-of the same language.
+2. Place your files inside the `/<mydatalocation>/files_for_ocr` directory. Files can either be PDF (.pdf) or multipage TIFF (.tiff, .tif) files. Files should all contain text of the same language.
 3. Start the OCR process.
 ```
@@ -67,9 +65,7 @@ The specified below `sfb1288inf/ocr:latest` are described in the [OCR arguments]
 `-l languagecode`
 * Tells tesseract which language will be used.
-* options = deu (German), deu_frak (German Fraktur), eng (English),
-enm (Middle englisch), fra (French), frm (Middle french), por (Portuguese),
-spa (Spanish)
+* options = deu (German), deu_frak (German Fraktur), eng (English), enm (Middle englisch), fra (French), frm (Middle french), ita (Italian), por (Portuguese), spa (Spanish)
 * required = True
 `--keep-intermediates`
@@ -84,8 +80,7 @@ kept.
 * required = False
 `--skip-binarisation`
-* Used to skip binarization with ocropus. If skipped, only the tesseract
-binarization is used.
+* Used to skip binarization with ocropus. If skipped, only the tesseract binarization is used.
 * default = False
 Example with all arguments used: