Update README

2025-08-14 06:13:26 +00:00 · 2019-05-17 01:07:39 +02:00
parent 18b659684a
commit ca4f218d2a
1 changed files with 3 additions and 8 deletions
--- a/README.md
+++ b/README.md
@@ -35,9 +35,7 @@ docker pull gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/ocr:latest
 mkdir -p /<mydatalocation>/files_for_ocr /<mydatalocation>/files_from_ocr
 ```

-2. Place your files inside the `/<mydatalocation>/files_for_ocr` directory. Files can either be
-multipage TIFF (.tiff, .tif) or PDF (.pdf) files. Files should all contain text
-of the same language.
+2. Place your files inside the `/<mydatalocation>/files_for_ocr` directory. Files can be either PDF (.pdf) or multipage TIFF (.tiff, .tif) files. All files should contain text in the same language.

 3. Start the OCR process.
 ```
@@ -67,9 +65,7 @@ The specified below `sfb1288inf/ocr:latest` are described in the [OCR arguments]

 `-l languagecode`
 * Tells tesseract which language will be used.
-* options = deu (German), deu_frak (German Fraktur), eng (English),
-enm (Middle englisch), fra (French), frm (Middle french), por (Portuguese),
-spa (Spanish)
+* options = deu (German), deu_frak (German Fraktur), eng (English), enm (Middle English), fra (French), frm (Middle French), ita (Italian), por (Portuguese), spa (Spanish)
 * required = True

 `--keep-intermediates`
@@ -84,8 +80,7 @@ kept.
 * required = False

 `--skip-binarisation`
-* Used to skip binarization with ocropus. If skipped, only the tesseract
-binarization is used.
+* Used to skip binarization with ocropus. If skipped, only the tesseract binarization is used.
 * default = False

 Example with all arguments used: