From ca4f218d2a3d5f601b91b1b6ce50c494d95f9718 Mon Sep 17 00:00:00 2001
From: Patrick Jentsch
Date: Fri, 17 May 2019 01:07:39 +0200
Subject: [PATCH] Update README

---
 README.md | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 5f57819..a2720be 100644
--- a/README.md
+++ b/README.md
@@ -35,9 +35,7 @@ docker pull gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/ocr:latest
 mkdir -p //files_for_ocr //files_from_ocr
 ```

-2. Place your files inside the `//files_for_ocr` directory. Files can either be
-multipage TIFF (.tiff, .tif) or PDF (.pdf) files. Files should all contain text
-of the same language.
+2. Place your files inside the `//files_for_ocr` directory. Files can either be PDF (.pdf) or multipage TIFF (.tiff, .tif) files. Files should all contain text of the same language.

 3. Start the OCR process.
 ```
@@ -67,9 +65,7 @@ The specified below `sfb1288inf/ocr:latest` are described in the [OCR arguments]

 `-l languagecode`
 * Tells tesseract which language will be used.
-* options = deu (German), deu_frak (German Fraktur), eng (English),
-enm (Middle englisch), fra (French), frm (Middle french), por (Portuguese),
-spa (Spanish)
+* options = deu (German), deu_frak (German Fraktur), eng (English), enm (Middle englisch), fra (French), frm (Middle french), ita (Italian), por (Portuguese), spa (Spanish)
 * required = True

 `--keep-intermediates`
@@ -84,8 +80,7 @@ kept.
 * required = False

 `--skip-binarisation`
-* Used to skip binarization with ocropus. If skipped, only the tesseract
-binarization is used.
+* Used to skip binarization with ocropus. If skipped, only the tesseract binarization is used.
 * default = False

 Example with all arguments used: