47 Commits

Author SHA1 Message Date
Patrick Jentsch
7322a5bc7c More Ghostscript, fewer dependencies! 2020-07-02 11:47:43 +02:00
Patrick Jentsch
2b63ba9e59 Remove unused dependencies and use ghostscript for image split 2020-07-01 11:03:34 +02:00
Patrick Jentsch
aee9628e5e fix pipeline 2020-06-23 15:19:27 +02:00
Stephan Porada
ec5b4eb521 Add PDF compression 2020-06-16 09:31:34 +02:00
Stephan Porada
b77ca5914f Set relative file paths in hocr 2020-06-10 11:48:58 +02:00
Stephan Porada
018939ae55 Add PoCo zips part 1 2020-06-09 16:58:22 +02:00
Patrick Jentsch
64fe706126 Keep uncompressed output files after zip jobs. 2020-05-13 09:11:01 +02:00
Patrick Jentsch
364e3d626d Fix zip creation 2020-04-04 15:37:21 +02:00
Patrick Jentsch
36a86887b0 Update OCR Pipeline 2020-04-03 17:35:30 +02:00
stephan
eb5ccf4e21 Add ocr to filenames 2020-02-18 10:16:24 +01:00
stephan
c1f5252633 Some cosmetics 2020-02-17 14:59:34 +01:00
stephan
880f0efcf9 Add zip filename argument 2020-02-17 14:26:50 +01:00
Patrick Jentsch
6c4a642cb7 Add a switch for zip functionality 2020-02-03 15:00:27 +01:00
Patrick Jentsch
dfc05be7db add zip creation of results 2020-01-20 15:04:55 +01:00
Patrick Jentsch
fa4a798351 Use language models from repository. Remove workaround for the legacy German Fraktur model. 2019-07-31 11:13:55 +02:00
Patrick Jentsch
1a3d7175fe Remove comments 2019-06-11 14:18:46 +02:00
Patrick Jentsch
e1462152fe Codestyle 2019-05-20 11:10:40 +02:00
Patrick Jentsch
93de923b4e Some variable renaming 2019-05-17 12:00:56 +02:00
Patrick Jentsch
46bb0efd14 Add description to hocrtotei 2019-05-16 14:59:22 +02:00
Patrick Jentsch
b81ad4cc67 Use argparse in hocrtotei 2019-05-16 14:21:01 +02:00
Patrick Jentsch
4c0ba270db Update 2019-05-16 00:09:19 +02:00
Patrick Jentsch
03b1054560 Sort all lists before processing 2019-05-15 14:55:36 +02:00
Patrick Jentsch
b9dba80d7f update for better graph 2019-05-15 13:54:08 +02:00
Patrick Jentsch
e5c0d53a03 Add some output messages and code formatting. 2019-05-15 11:56:24 +02:00
Patrick Jentsch
843151e547 Correct order for output files. 2019-05-13 15:03:43 +02:00
Patrick Jentsch
efbf6f24e6 Update 2019-04-25 11:40:27 +02:00
Patrick Jentsch
d25204d6a9 Change tif split handling, sort files before merging 2019-04-24 17:01:49 +02:00
Patrick Jentsch
10b473ae37 Implement the workaround a bit differently 2019-04-16 11:38:36 +02:00
Patrick Jentsch
a533ef76c6 Update 2019-04-15 10:40:08 +02:00
Patrick Jentsch
84bcac0fc7 Update 2019-04-15 10:34:28 +02:00
Patrick Jentsch
f3fe886335 Update 2019-04-15 10:33:20 +02:00
Patrick Jentsch
5e43e09beb Update 2019-04-15 10:25:57 +02:00
Patrick Jentsch
5e11fcae01 Rename output directories. 2019-04-15 10:13:08 +02:00
Patrick Jentsch
8e6868194d Fix bug 2019-04-15 10:02:02 +02:00
Patrick Jentsch
fdc53fd16c Use a single core for deu_frak 2019-04-15 09:56:47 +02:00
Patrick Jentsch
d84db585fa Sort files in output. 2019-04-15 09:47:30 +02:00
Patrick Jentsch
eb6327aed3 Rename job. 2019-04-15 09:28:05 +02:00
Patrick Jentsch
0d3efe167e Update 2019-04-14 14:33:40 +02:00
Patrick Jentsch
9f3c71a118 Fixed bug 2019-04-12 15:36:47 +02:00
Patrick Jentsch
ac9b25271f Add skip binarization 2019-04-12 15:28:24 +02:00
Patrick Jentsch
0a25afbd51 len not length... Thx python 2019-04-11 13:55:35 +02:00
Patrick Jentsch
1e740aec66 Update ocr 2019-04-11 13:51:07 +02:00
Patrick Jentsch
d4218fcd7c Update ocr 2019-04-11 13:46:24 +02:00
Patrick Jentsch
fd7ad08e1e Update ocr 2019-04-11 13:04:03 +02:00
Patrick Jentsch
3131174676 Fix input file 2019-04-11 11:55:42 +02:00
Patrick Jentsch
a947e36997 Start one ocropus-nlbin job per page instead of one per document 2019-04-11 11:50:09 +02:00
Patrick Jentsch
26757eda03 Some renaming and cleanup. 2019-03-10 20:59:30 +01:00