151 Commits

Author SHA1 Message Date
Patrick Jentsch
e3fd679b38 Mark all scripts as executable 2022-01-04 13:21:38 +01:00
Patrick Jentsch
8a3816121c fix image tag 2022-01-04 12:10:26 +01:00
Patrick Jentsch
e1b78b6ba4 Update to Tesseract 5.0.0, Set version 0.1.0 2022-01-04 11:42:55 +01:00
Patrick Jentsch
a0760487ae Don't process files in subdirectories 2021-04-12 13:22:28 +02:00
Patrick Jentsch
a798457c43 Add missing --log-dir argument to wrapper script 2021-04-12 09:53:59 +02:00
Patrick Jentsch
e2da0fb839 Tweak the README and pipeline help. 2021-03-26 10:03:59 +01:00
Patrick Jentsch
e78f667438 Use more descriptive argument names than i and o (now: input and output) 2021-03-18 10:32:55 +01:00
Patrick Jentsch
41f70da8eb Update the hocrtotei script 2021-03-17 16:58:13 +01:00
Patrick Jentsch
6db7f70446 Add back german language models 2021-03-17 14:26:24 +01:00
Patrick Jentsch
947658a7d8 Change intermediate image name in order to fix issues with building multiple branches/tags at the same time 2021-03-15 14:11:23 +01:00
Patrick Jentsch
acbf61be05 Cleanup and make use of globbing for input files for binarization and ocr 2021-03-15 12:45:05 +01:00
Patrick Jentsch
104598039e Dockerfile codestyle 2021-02-24 15:28:04 +01:00
Patrick Jentsch
da29659a9b Add back missing author mention 2021-02-24 15:17:42 +01:00
Patrick Jentsch
613bceb4ff Add new models 2021-02-23 11:11:50 +01:00
Patrick Jentsch
ca7df6d0ed First work on version 1.0.0 2021-02-19 13:04:03 +01:00
Patrick Jentsch
07635dcdfa Use "buster" instead of "10" in FROM 2020-10-08 23:17:48 +02:00
Patrick Jentsch
c0069d5453 Use new Dockerfile structure 2020-10-08 23:09:10 +02:00
Patrick Jentsch
e941f64ee4 test new ci config 2020-10-07 16:44:38 +02:00
Stephan Porada
cb68d6de2d One thread per page ocr patch 2020-10-07 13:46:22 +02:00
Patrick Jentsch
4b84488fe6 fix gitlab ci 2020-09-23 16:58:07 +02:00
Patrick Jentsch
7d52ad9f68 Update 2020-09-23 15:52:24 +02:00
Patrick Jentsch
ac4b5c2fd8 Add possibility to use an intermediate dir 2020-09-22 17:44:32 +02:00
Patrick Jentsch
6d90d43699 fix cleanup attempt 2020-09-21 15:36:03 +02:00
Patrick Jentsch
4bd0d3bb01 Use commit_sha for intermediate image 2020-09-21 15:02:04 +02:00
Patrick Jentsch
15061bfaaf add tag to clean stage 2020-09-21 15:00:09 +02:00
Patrick Jentsch
7cc8ebd666 compile tesseract in container 2020-09-21 14:46:03 +02:00
Patrick Jentsch
82285a8e6c better multithreading 2020-07-02 11:49:35 +02:00
Patrick Jentsch
7322a5bc7c More GhostScript, less dependencies! 2020-07-02 11:47:43 +02:00
Patrick Jentsch
2b63ba9e59 Remove unused dependencies and use ghostscript for image split 2020-07-01 11:03:34 +02:00
Patrick Jentsch
aee9628e5e fix pipeline 2020-06-23 15:19:27 +02:00
Stephan Porada
ec5b4eb521 Add PDF compression 2020-06-16 09:31:34 +02:00
Stephan Porada
b77ca5914f Set relative file paths in hocr 2020-06-10 11:48:58 +02:00
Stephan Porada
018939ae55 Add PoCo zips part 1 2020-06-09 16:58:22 +02:00
Patrick Jentsch
64fe706126 Keep uncompressed output files after zip jobs. 2020-05-13 09:11:01 +02:00
Patrick Jentsch
a75b32ca1d Bump versions 2020-04-06 09:21:52 +02:00
Patrick Jentsch
364e3d626d Fix zip creation 2020-04-04 15:37:21 +02:00
Patrick Jentsch
36a86887b0 Update OCR Pipeline 2020-04-03 17:35:30 +02:00
stephan
eb5ccf4e21 Add ocr to filenames 2020-02-18 10:16:24 +01:00
stephan
c1f5252633 Some cosmetics 2020-02-17 14:59:34 +01:00
stephan
880f0efcf9 Add zip filename argument 2020-02-17 14:26:50 +01:00
Patrick Jentsch
6c4a642cb7 Add a switch for zip functionality 2020-02-03 15:00:27 +01:00
Patrick Jentsch
dfc05be7db add zip creation of results 2020-01-20 15:04:55 +01:00
Patrick Jentsch
3a4cc16e5b Update 2019-11-04 15:14:59 +01:00
Patrick Jentsch
8a4d006687 Update .gitlab-ci.yml 2019-09-16 15:39:02 +02:00
Patrick Jentsch
3e43c8eab5 Update .gitlab-ci.yml 2019-09-16 15:33:35 +02:00
Patrick Jentsch
f1d1434e1a Update .gitlab-ci.yml 2019-09-16 15:30:11 +02:00
Patrick Jentsch
62a435e8c2 Update .gitlab-ci.yml 2019-09-16 15:28:33 +02:00
Patrick Jentsch
088cf49b89 set charset again! 2019-09-12 11:30:52 +02:00
Patrick Jentsch
cebc53da03 Codestyle 2019-09-11 15:15:00 +02:00
Patrick Jentsch
1fd85d1b44 Change CI script. 2019-07-31 11:23:41 +02:00