Patrick Jentsch | e1b78b6ba4 | Update to Tesseract 5.0.0, Set version 0.1.0 | 2022-01-04 11:42:55 +01:00
Patrick Jentsch | a0760487ae | Don't process files in subdirectories | 2021-04-12 13:22:28 +02:00
Patrick Jentsch | a798457c43 | Add missing --log-dir argument to wrapper script | 2021-04-12 09:53:59 +02:00
Patrick Jentsch | e2da0fb839 | Tweak the README and pipeline help. | 2021-03-26 10:03:59 +01:00
Patrick Jentsch | e78f667438 | Use more descriptive argument names than i and o (now: input and output) | 2021-03-18 10:32:55 +01:00
Patrick Jentsch | 41f70da8eb | Update the hocrtotei script | 2021-03-17 16:58:13 +01:00
Patrick Jentsch | 6db7f70446 | Add back German language models | 2021-03-17 14:26:24 +01:00
Patrick Jentsch | 947658a7d8 | Change intermediate image name in order to fix issues with building multiple branches/tags at the same time | 2021-03-15 14:11:23 +01:00
Patrick Jentsch | acbf61be05 | Cleanup and make use of globbing for input files for binarization and ocr | 2021-03-15 12:45:05 +01:00
Patrick Jentsch | 104598039e | Dockerfile codestyle | 2021-02-24 15:28:04 +01:00
Patrick Jentsch | da29659a9b | Add back missing author mention | 2021-02-24 15:17:42 +01:00
Patrick Jentsch | 613bceb4ff | Add new models | 2021-02-23 11:11:50 +01:00
Patrick Jentsch | ca7df6d0ed | First work on version 1.0.0 | 2021-02-19 13:04:03 +01:00
Patrick Jentsch | 07635dcdfa | Use "buster" instead of "10" in FROM | 2020-10-08 23:17:48 +02:00
Patrick Jentsch | c0069d5453 | Use new Dockerfile structure | 2020-10-08 23:09:10 +02:00
Patrick Jentsch | e941f64ee4 | test new ci config | 2020-10-07 16:44:38 +02:00
Stephan Porada | cb68d6de2d | One thread per page ocr patch | 2020-10-07 13:46:22 +02:00
Patrick Jentsch | 4b84488fe6 | fix gitlab ci | 2020-09-23 16:58:07 +02:00
Patrick Jentsch | 7d52ad9f68 | Update | 2020-09-23 15:52:24 +02:00
Patrick Jentsch | ac4b5c2fd8 | Add possibility to use an intermediate dir | 2020-09-22 17:44:32 +02:00
Patrick Jentsch | 6d90d43699 | fix cleanup attempt | 2020-09-21 15:36:03 +02:00
Patrick Jentsch | 4bd0d3bb01 | Use commit_sha for intermediate image | 2020-09-21 15:02:04 +02:00
Patrick Jentsch | 15061bfaaf | add tag to clean stage | 2020-09-21 15:00:09 +02:00
Patrick Jentsch | 7cc8ebd666 | compile tesseract in container | 2020-09-21 14:46:03 +02:00
Patrick Jentsch | 82285a8e6c | better multithreading | 2020-07-02 11:49:35 +02:00
Patrick Jentsch | 7322a5bc7c | More GhostScript, fewer dependencies! | 2020-07-02 11:47:43 +02:00
Patrick Jentsch | 2b63ba9e59 | Remove unused dependencies and use ghostscript for image split | 2020-07-01 11:03:34 +02:00
Patrick Jentsch | aee9628e5e | fix pipeline | 2020-06-23 15:19:27 +02:00
Stephan Porada | ec5b4eb521 | Add PDF compression | 2020-06-16 09:31:34 +02:00
Stephan Porada | b77ca5914f | Set relative file paths in hocr | 2020-06-10 11:48:58 +02:00
Stephan Porada | 018939ae55 | Add PoCo zips part 1 | 2020-06-09 16:58:22 +02:00
Patrick Jentsch | 64fe706126 | Keep uncompressed output files after zip jobs. | 2020-05-13 09:11:01 +02:00
Patrick Jentsch | a75b32ca1d | Bump versions | 2020-04-06 09:21:52 +02:00
Patrick Jentsch | 364e3d626d | Fix zip creation | 2020-04-04 15:37:21 +02:00
Patrick Jentsch | 36a86887b0 | Update OCR Pipeline | 2020-04-03 17:35:30 +02:00
stephan | eb5ccf4e21 | Add ocr to filenames | 2020-02-18 10:16:24 +01:00
stephan | c1f5252633 | Some cosmetics | 2020-02-17 14:59:34 +01:00
stephan | 880f0efcf9 | Add zip filename argument | 2020-02-17 14:26:50 +01:00
Patrick Jentsch | 6c4a642cb7 | Add a switch for zip functionality | 2020-02-03 15:00:27 +01:00
Patrick Jentsch | dfc05be7db | add zip creation of results | 2020-01-20 15:04:55 +01:00
Patrick Jentsch | 3a4cc16e5b | Update | 2019-11-04 15:14:59 +01:00
Patrick Jentsch | 8a4d006687 | Update .gitlab-ci.yml | 2019-09-16 15:39:02 +02:00
Patrick Jentsch | 3e43c8eab5 | Update .gitlab-ci.yml | 2019-09-16 15:33:35 +02:00
Patrick Jentsch | f1d1434e1a | Update .gitlab-ci.yml | 2019-09-16 15:30:11 +02:00
Patrick Jentsch | 62a435e8c2 | Update .gitlab-ci.yml | 2019-09-16 15:28:33 +02:00
Patrick Jentsch | 088cf49b89 | set charset again! | 2019-09-12 11:30:52 +02:00
Patrick Jentsch | cebc53da03 | Codestyle | 2019-09-11 15:15:00 +02:00
Patrick Jentsch | 1fd85d1b44 | Change CI script. | 2019-07-31 11:23:41 +02:00
Patrick Jentsch | fa4a798351 | Use language models from repository. Remove workaround for the legacy German Fraktur model. | 2019-07-31 11:13:55 +02:00
Patrick Jentsch | 1a3d7175fe | Remove comments | 2019-06-11 14:18:46 +02:00