## Web Server

- **Solutions**:
  - Apache (with mod_wsgi)
  - or nginx (with Gunicorn, presumably)
- **Goal/Function**:
  - serves content
  - handles HTTP requests etc.
  - handles encryption
    - with SSL/TLS and Let's Encrypt
  - serves forms for user requests and input
  - has a copy of the Joblist to display jobs to the user according to requests etc.
  - talks to the Manager service
    - Users CANNOT talk directly with the manager
  - has a list of all currently running user sessions (maybe used for authentication)
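
The list of currently running user sessions mentioned above could look roughly like the following. This is only a minimal in-memory sketch using the standard library; `SessionRegistry` and its method names are hypothetical, and in the planned setup Flask-Login or Flask-Session would most likely take over this role.

```python
import secrets
import time


class SessionRegistry:
    """In-memory list of active user sessions (hypothetical helper)."""

    def __init__(self, lifetime_seconds=3600):
        self.lifetime_seconds = lifetime_seconds
        self._sessions = {}  # token -> (username, expiry timestamp)

    def open_session(self, username, now=None):
        """Register a session and return its random token."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._sessions[token] = (username, now + self.lifetime_seconds)
        return token

    def lookup(self, token, now=None):
        """Return the username for a valid token, or None."""
        now = time.time() if now is None else now
        entry = self._sessions.get(token)
        if entry is None:
            return None
        username, expires_at = entry
        if now >= expires_at:
            # Expired sessions are dropped on access.
            del self._sessions[token]
            return None
        return username

    def close_session(self, token):
        """Remove a session, e.g. on logout; unknown tokens are ignored."""
        self._sessions.pop(token, None)

    def active_users(self, now=None):
        """Sorted usernames that have at least one unexpired session."""
        now = time.time() if now is None else now
        return sorted({user for user, exp in self._sessions.values() if exp > now})
```

The `now` parameter exists only to make the expiry logic easy to try out without waiting; a real deployment would also need storage that survives restarts and is shared between worker processes.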
## Application Server

- **Solution**: Flask

### Authentication and session management

- **Solutions**:
  - Flask-Login (minimal)
  - Flask-Session (maybe somewhat more functionality)
- **Goal/Function**:
  - handles both internal and external users
- Relational Database
  - **Solutions**:
    - PostgreSQL
    - MariaDB
- Object Relational Mapper
  - **Solutions**:
    - Flask-SQLAlchemy
### Manager Service

- Part of the Application Server
- also manages files on the file server
- Joblist
  - **Solution**:
    - http://www.celeryproject.org/
  - Thread safe
  - Scheduling
  - Resource management
- REST API
  - **Solution**:
    - Flask internal
    - and also part of Celery
  - **Goal/Function**:
    - Passes requests to the joblist/Celery
    - Functions:
      - create_job
      - delete_job
      - get_job (JSON object or metadata or both?)
      - alter_job
- Mail notifications
  - **Solution**:
    - Flask-Mail
  - **Goal/Function**:
    - Sends mails to users when an OCR job has finished
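
To make the job functions and the mail notification above more concrete, here is a rough standard-library sketch. It is not the planned Celery/Flask implementation: `JobList`, the `Job` fields, `build_finished_mail` and the sender address are all assumptions for illustration. A lock stands in for the "thread safe" requirement, and `get_job` returns a plain dict, which is one possible answer to the open "JSON object or metadata" question.

```python
import itertools
import threading
from dataclasses import asdict, dataclass, field
from email.message import EmailMessage


@dataclass
class Job:
    # Hypothetical job record; the real fields are not decided yet.
    job_id: int
    owner_email: str
    status: str = "queued"
    metadata: dict = field(default_factory=dict)


class JobList:
    """Thread-safe in-memory job list (stand-in for the Celery-backed one)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}
        self._ids = itertools.count(1)

    def create_job(self, owner_email, **metadata):
        with self._lock:
            job = Job(next(self._ids), owner_email, metadata=dict(metadata))
            self._jobs[job.job_id] = job
            return job.job_id

    def get_job(self, job_id):
        """Return a JSON-serialisable copy of the job, or None."""
        with self._lock:
            job = self._jobs.get(job_id)
            return asdict(job) if job else None

    def alter_job(self, job_id, **changes):
        """Update status and/or metadata; other fields are rejected."""
        unknown = set(changes) - {"status", "metadata"}
        if unknown:
            raise ValueError(f"fields cannot be altered: {sorted(unknown)}")
        with self._lock:
            job = self._jobs[job_id]  # raises KeyError for unknown jobs
            for key, value in changes.items():
                setattr(job, key, value)

    def delete_job(self, job_id):
        """Return True if a job was removed, False if it did not exist."""
        with self._lock:
            return self._jobs.pop(job_id, None) is not None


def build_finished_mail(job, sender="noreply@example.org"):
    """Build (not send) the notification for a finished job dict."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = job["owner_email"]
    msg["Subject"] = f"OCR job {job['job_id']} finished"
    msg.set_content(f"Your OCR job {job['job_id']} has finished.")
    return msg
```

With Flask-Mail, the same subject, recipient and body would presumably be passed to its `Message` class and sent through the application's mail instance instead of being built with `email.message`.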
## OCR containers with tesseract

- **Goal/Function**:
  - Celery checks the joblists continuously
  - job start commands are passed to the containers
  - jobs are started accordingly
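
A job start command passed to a container might be assembled as in the sketch below. It only builds argument lists and does not run anything. The image name, the `/data` mount point and the helper names are assumptions; the `tesseract` arguments follow its usual `tesseract <input> <outputbase> -l <lang> pdf` form.

```python
def tesseract_args(input_name, output_base, language="eng"):
    """Arguments for one OCR run that writes <output_base>.pdf."""
    return ["tesseract", input_name, output_base, "-l", language, "pdf"]


def docker_run_args(job_dir, image, command):
    """Wrap a command in `docker run`, mounting the job directory.

    job_dir is a host path holding the job's files; it is mounted at
    /data (an assumed convention) and used as the working directory.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{job_dir}:/data",
        "-w", "/data",
        image,
        *command,
    ]


# Hypothetical job: directory layout and image name are placeholders.
start_command = docker_run_args(
    "/srv/jobs/42",
    "ocr-tesseract:latest",
    tesseract_args("scan.tif", "result", language="deu"),
)
```

The resulting list could be handed to `subprocess.run` by a worker, or translated into the equivalent service definition once Kubernetes or Docker Swarm is chosen.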
## Compute pool: Docker Cluster

- **Solutions**:
  - Kubernetes
  - Docker Swarm
- **Goal/Function**:
  - How to handle job and resource management for user jobs and processes
  - gets requests and tasks from the manager
## File Server (scans, PDFs etc.)

- **Goal/Function**:
  - stores user input and output files
  - Upload
  - Download
- **Solutions**:
  - WebDAV/Samba/Docker Volume
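
Whichever storage backend is chosen, uploads and downloads need a predictable place for each job's input and output files. The sketch below shows one possible layout; the directory scheme `<root>/<user>/<job>/<input|output>/<file>` and the function name are assumptions, not something the notes above prescribe.

```python
from pathlib import PurePosixPath


def job_file_path(root, user_id, job_id, kind, filename):
    """Return the storage path for an uploaded or generated file.

    Only the final component of `filename` is kept, so a name such as
    "../../etc/passwd" cannot escape the job directory.
    """
    if kind not in ("input", "output"):
        raise ValueError(f"unknown file kind: {kind!r}")
    # Normalise Windows separators before taking the last component.
    name = PurePosixPath(filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError(f"invalid file name: {filename!r}")
    return PurePosixPath(root) / str(user_id) / str(job_id) / kind / name
```

For example, `job_file_path("/srv/files", 7, 42, "input", "scan.tif")` yields a path under the `input` directory of job 42 for user 7.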
# Additional Functions

## Information retrieval system

- **Solutions**:
  - CEQUL with CWB Server
  - Lucene
- **Functions/Goals**:
  - KWIC
  - KWIC with complex queries (POS, NER, lemma queries)
  - Frequency lists
  - n-grams
  - complex n-grams
  - etc.
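
The basic functions listed above (KWIC, frequency lists, n-grams) can be illustrated in plain Python. This is only a toy sketch on whitespace tokens to show what each function returns; it does not reflect how CWB or Lucene work, and the complex queries (POS, NER, lemma) would need annotated corpora and a real query engine.

```python
from collections import Counter


def tokenize(text):
    """Naive lowercase whitespace tokenizer (placeholder for real tokenization)."""
    return text.lower().split()


def kwic(tokens, keyword, window=2):
    """Keyword in context: (left context, keyword, right context) per hit."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), token, " ".join(right)))
    return hits


def ngrams(tokens, n):
    """All contiguous n-token sequences, in order of occurrence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(items):
    """(item, count) pairs sorted by descending count."""
    return Counter(items).most_common()


tokens = tokenize("the cat sat on the mat")
concordance = kwic(tokens, "the", window=1)
bigram_counts = frequency_list(ngrams(tokens, 2))
```

Because `frequency_list` accepts any hashable items, the same helper covers both word frequencies and n-gram frequencies.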