From 08d9e594bbaf46dc7bb71bff0c54bd176e230789 Mon Sep 17 00:00:00 2001
From: Stephan Porada
Date: Tue, 2 Apr 2019 15:43:41 +0200
Subject: [PATCH] Add Readme.md

---
 README.md | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/README.md b/README.md
index e69de29..699e5ba 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1,54 @@
+# Installation
+
+## Install additional packages
+1. Install `screen`. We will use it to run commands in their own terminal session.
+
+## Build your own image
+
+1. Clone this repository and navigate into it.
+2. Build the image from the Dockerfile. `docker build -t : .` For example: `docker build -t ocr_container:latest .`
+
+Alternatively, build directly from git.
+1. Use the following command to build directly from GitLab. `docker build -t : https://gitlab.ub.uni-bielefeld.de/sfb1288inf/ocr.git`.
+
+## Folder setup
+
+1. Create input and output folders for the OCR files.
+2. `mkdir -p /some/path//ocr/files_for_ocr /some/path//ocr/files_from_ocr`
+
+## Run the container
+
+1. Run a container from an image. and /some/path are the same as in the section Folder setup. The two `-v` options mount the folders created in that section as volumes.
+```
+docker run \
+  --name \
+  -dit \
+  -v /some/path//files_for_ocr:/root/files_for_ocr \
+  -v /some/path//files_from_ocr:/root/files_from_ocr \
+
+```
+
+## Start an OCR job
+1. Place some files inside the folder _files\_for\_ocr_. Files can be either multipage TIFF or PDF files. Each file needs its own folder. All files should be in the same language.
+2. Start a screen session with `screen -dmS `
+3. Enter the screen session with `screen -r `. (If this fails with an error, try `script -q -c "screen -r " /dev/null`.)
+4. Start the OCR process for all files placed in _files\_for\_ocr_ with `docker exec -it ocr -i files_for_ocr -o files_from_ocr -l `.
+
+Valid language codes are:
+- deu (German)
+- deu_frak (German Fraktur)
+- eng (English)
+- enm (Middle English)
+- fra (French)
+- frm (Middle French)
+- por (Portuguese)
+- spa (Spanish)
+
+## Exit and re-enter the currently running OCR process
+1. You can leave the currently running OCR process by pressing `Ctrl+a` followed by `d`, which detaches the screen session.
+2. Re-enter the screen session to check the status of the running OCR job with `screen -r `. (If this fails with an error, try `script -q -c "screen -r " /dev/null`.)
+
+## Use prebuilt image
+
+## Add additional trained data for OCR of additional languages.
+TBD