# Installation

## Install additional packages

1. Install `screen`. It is used to run the OCR job in its own terminal session, which you can detach from and re-attach to while the job keeps running.
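
A minimal preflight sketch for checking that the tools this README relies on are available on the host. The function name `check_tools` is hypothetical and not part of this repository. How you install a missing tool depends on your host OS (on Debian or Ubuntu, for example, `sudo apt-get install screen`).

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints one "missing: <tool>" line per absent tool on stderr and returns
# 0 if every tool was found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools docker screen` returns a non-zero status if either `docker` or `screen` is not on your `PATH`.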
## Build your own image

1. Clone this repository and navigate into it: `git clone https://gitlab.ub.uni-bielefeld.de/sfb1288inf/ocr.git && cd ocr`
2. Build the image from the Dockerfile with `docker build -t <image_name>:<tag> .`, for example `docker build -t ocr_container:latest .`

Alternatively, build the image directly from the Git repository:

1. Use the following command to build directly from GitLab: `docker build -t <image_name>:<tag> https://gitlab.ub.uni-bielefeld.de/sfb1288inf/ocr.git`
## Folder setup

1. Create the input and output folders for the OCR files on the host: `mkdir -p /some/path/<container-name>/files_for_ocr /some/path/<container-name>/files_from_ocr`
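
The folder setup can also be scripted. This sketch assumes the layout `/some/path/<container-name>/files_for_ocr` and `/some/path/<container-name>/files_from_ocr`, matching the host paths passed to `-v` in `docker run`; `make_ocr_dirs` is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper: create the input and output folders that are later
# mounted into the container. Arguments: <base_dir> (/some/path) and
# <container-name>. Returns 2 on missing arguments, otherwise mkdir's status.
make_ocr_dirs() {
  base_dir=$1
  container_name=$2
  if [ -z "$base_dir" ] || [ -z "$container_name" ]; then
    echo "usage: make_ocr_dirs <base_dir> <container-name>" >&2
    return 2
  fi
  mkdir -p "$base_dir/$container_name/files_for_ocr" \
           "$base_dir/$container_name/files_from_ocr"
}
```

Usage: `make_ocr_dirs /some/path <container-name>`.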
## Run the container

1. Run a container from the image. Use the same `<container-name>` and `/some/path` as in the Folder setup section: the two `-v` options mount those folders into the container as volumes. `-d` runs the container in the background, `-i` keeps STDIN open, and `-t` allocates a pseudo-terminal.

```
docker run \
  --name <container-name> \
  -dit \
  -v /some/path/<container-name>/files_for_ocr:/root/files_for_ocr \
  -v /some/path/<container-name>/files_from_ocr:/root/files_from_ocr \
  <image_name>:<tag>
```
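
When a host path given to `-v` does not exist, Docker creates it as an empty directory instead of failing, so a typo in `/some/path` goes unnoticed until the OCR job finds no input. The sketch below wraps the same `docker run` call in a hypothetical `run_ocr_container` function that checks both folders first; it assumes the folder layout from the Folder setup section.

```shell
# Hypothetical wrapper around the docker run command above.
# Arguments: <base_dir> (/some/path), <container-name>, <image_name>:<tag>.
# Refuses to start if either host folder is missing, because `-v` would
# otherwise silently create an empty directory.
run_ocr_container() {
  base_dir=$1
  container_name=$2
  image=$3
  for d in files_for_ocr files_from_ocr; do
    if [ ! -d "$base_dir/$container_name/$d" ]; then
      echo "missing folder: $base_dir/$container_name/$d (see Folder setup)" >&2
      return 1
    fi
  done
  docker run \
    --name "$container_name" \
    -dit \
    -v "$base_dir/$container_name/files_for_ocr:/root/files_for_ocr" \
    -v "$base_dir/$container_name/files_from_ocr:/root/files_from_ocr" \
    "$image"
}
```

Usage: `run_ocr_container /some/path <container-name> <image_name>:<tag>`.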
## Start an OCR job

1. Place the files you want to process inside the _files\_for\_ocr_ folder. Files can be either multipage TIFFs or PDFs. Each file must be placed in its own folder, and all files should be in the same language.
2. Start a detached screen session with `screen -dmS <container-name>`.
3. Attach to the screen session with `screen -r <container-name>`. If this fails, try `script -q -c "screen -r <container-name>" /dev/null` instead.
4. Start the OCR process for all files in _files\_for\_ocr_ with `docker exec -it <container-name> ocr -i files_for_ocr -o files_from_ocr -l <language_code>`.
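
If the input folder contains loose PDF or TIFF files, a sketch like the following gives each file its own folder. The README only states that one folder per file is needed; naming each folder after the file without its extension, and running this on the host folder behind _files\_for\_ocr_, are assumptions. `split_into_folders` is a hypothetical helper.

```shell
# Hypothetical helper: move each loose PDF/TIFF in <input_dir> into a
# subfolder named after the file (assumed naming scheme), e.g.
# letter.pdf -> letter/letter.pdf. Other files and existing folders are left
# alone. Note: files sharing a name stem (a.pdf, a.tif) land in the same folder.
split_into_folders() {
  input_dir=$1
  if [ ! -d "$input_dir" ]; then
    echo "not a directory: $input_dir" >&2
    return 1
  fi
  for f in "$input_dir"/*; do
    [ -f "$f" ] || continue
    case $f in
      *.pdf|*.PDF|*.tif|*.TIF|*.tiff|*.TIFF) ;;
      *) continue ;;
    esac
    name=$(basename "$f")
    stem=${name%.*}
    mkdir -p "$input_dir/$stem" || return 1
    mv "$f" "$input_dir/$stem/" || return 1
  done
  return 0
}
```

Usage: `split_into_folders /some/path/<container-name>/files_for_ocr`.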

Valid language codes are:

- deu (German)
- deu_frak (German Fraktur)
- eng (English)
- enm (Middle English)
- fra (French)
- frm (Middle French)
- por (Portuguese)
- spa (Spanish)
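
To catch a mistyped language code before a job starts, the `docker exec` call from step 4 can be wrapped in a check against the codes listed above. `is_valid_lang` and `start_ocr_job` are hypothetical helpers, and the code list is hard-coded from this README, so it would need extending if trained data for further languages is added.

```shell
# Hypothetical check: succeed only for the language codes listed in this README.
is_valid_lang() {
  case $1 in
    deu|deu_frak|eng|enm|fra|frm|por|spa) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper for step 4. Arguments: <container-name>, <language_code>.
# Rejects unknown codes before calling docker exec.
start_ocr_job() {
  container_name=$1
  lang=$2
  if ! is_valid_lang "$lang"; then
    echo "unsupported language code: $lang" >&2
    return 1
  fi
  docker exec -it "$container_name" ocr -i files_for_ocr -o files_from_ocr -l "$lang"
}
```

Run it inside the screen session, for example `start_ocr_job <container-name> deu`.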
## Exit and re-enter the running OCR process

1. To leave the running OCR process without stopping it, detach from the screen session by pressing `Ctrl+a` followed by `d`.
2. To check the status of the running OCR job, re-attach to the screen session with `screen -r <container-name>`. If this fails, try `script -q -c "screen -r <container-name>" /dev/null` instead.
## Use prebuilt image

## Add additional trained data for OCR of additional languages

TBD