
# Installation

## Install additional packages
1. Install `screen`. It is used to run the OCR commands in a terminal session that you can detach from and re-enter.

## Build your own image

1. Clone this repository and navigate into it.
2. Build the image from the Dockerfile with `docker build -t <image_name>:<tag> .`, for example `docker build -t ocr_container:latest .`

Alternatively, build directly from Git.
1. Use the following command to build directly from GitLab: `docker build -t <image_name>:<tag> https://gitlab.ub.uni-bielefeld.de/sfb1288inf/ocr.git`

## Folder setup

1. Create input and output folders for the OCR files. The same `/some/path` and `<container-name>` are used again when running the container.
2. `mkdir -p /some/path/<container-name>/files_for_ocr /some/path/<container-name>/files_from_ocr`
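The folder setup can also be scripted, as in the following minimal sketch. `BASE_DIR` and `CONTAINER_NAME` are hypothetical variable names with placeholder defaults, and the layout assumes the host paths that the `docker run` command mounts as volumes.

```shell
# Sketch: create the host folders that will be mounted into the container.
# BASE_DIR and CONTAINER_NAME are illustrative names, not part of this repository.
BASE_DIR="${BASE_DIR:-$HOME/ocr_data}"
CONTAINER_NAME="${CONTAINER_NAME:-ocr_container}"

# -p creates missing parent directories and does not fail if the folders exist.
mkdir -p "$BASE_DIR/$CONTAINER_NAME/files_for_ocr" \
         "$BASE_DIR/$CONTAINER_NAME/files_from_ocr"

# List the result to confirm both folders are present.
ls -d "$BASE_DIR/$CONTAINER_NAME"/files_*_ocr
```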

## Run the container

1. Run a container from the image. `<container-name>` and `/some/path` are the same values used in the Folder setup section. The two `-v` options mount the folders created there as volumes inside the container.
```
docker run \
  --name <container-name> \
  -dit \
  -v /some/path/<container-name>/files_for_ocr:/root/files_for_ocr \
  -v /some/path/<container-name>/files_from_ocr:/root/files_from_ocr \
  <image_name>
```
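To keep the volume paths in sync with the folder setup, the command above could be assembled from variables, as in this sketch. The function name `ocr_run_command`, the variable names, and the default values are all assumptions made for illustration. The function only prints the command so it can be reviewed first, and it assumes paths without whitespace.

```shell
# Sketch: build the `docker run` invocation from variables.
# All names and default values below are illustrative assumptions.
BASE_DIR="${BASE_DIR:-$HOME/ocr_data}"
CONTAINER_NAME="${CONTAINER_NAME:-ocr_container}"
IMAGE="${IMAGE:-ocr_container:latest}"

# Print the command instead of running it.
ocr_run_command() {
  printf 'docker run --name %s -dit -v %s:/root/files_for_ocr -v %s:/root/files_from_ocr %s\n' \
    "$CONTAINER_NAME" \
    "$BASE_DIR/$CONTAINER_NAME/files_for_ocr" \
    "$BASE_DIR/$CONTAINER_NAME/files_from_ocr" \
    "$IMAGE"
}

ocr_run_command
```

Once the printed command looks right, it can be executed with `ocr_run_command | sh`.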

## Start an OCR job
1. Place the files inside the folder _files\_for\_ocr_. Files can be either multipage TIFFs or PDF files. Each file needs its own folder. All files should be in the same language.
2. Start a detached screen session with `screen -dmS <container-name>`.
3. Enter the screen session with `screen -r <container-name>`. If this fails with an error, try `script -q -c "screen -r <container-name>" /dev/null` instead.
4. Start the OCR process for all files placed in _files\_for\_ocr_ with `docker exec -it <container-name> ocr -i files_for_ocr -o files_from_ocr -l <language_code>`.

Valid language codes are:
- deu (German)
- deu_frak (German Fraktur)
- eng (English)
- enm (Middle English)
- fra (French)
- frm (Middle French)
- por (Portuguese)
- spa (Spanish)
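The input folder from step 1 can be prepared with a small helper like the sketch below. It assumes that "one folder per file" means each input file sits in its own subfolder named after the file. The function name `one_folder_per_file` is hypothetical and not part of this repository.

```shell
# Sketch: move every PDF or TIFF found directly in the input folder into its
# own subfolder named after the file (without extension).
# Assumption: "one folder per file" means one subfolder per input file.
one_folder_per_file() {
  input_dir="$1"
  for f in "$input_dir"/*.pdf "$input_dir"/*.tif "$input_dir"/*.tiff; do
    [ -f "$f" ] || continue            # skip patterns that matched nothing
    name="$(basename "$f")"
    mkdir -p "$input_dir/${name%.*}"   # e.g. scan.pdf -> scan/
    mv "$f" "$input_dir/${name%.*}/"
  done
}

# Example (the path is a placeholder):
# one_folder_per_file /some/path/<container-name>/files_for_ocr
```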

## Exit and re-enter the currently running OCR process
1. Detach from the screen session by pressing `Ctrl+a` followed by `d`. The OCR process keeps running in the background.
2. Re-enter the screen session to check the status of the running OCR job with `screen -r <container-name>`. If this fails with an error, try `script -q -c "screen -r <container-name>" /dev/null` instead.

## Use prebuilt image

## Add additional trained data for OCR of additional languages.
TBD
