diff --git a/README.md b/README.md
index 4e65c3b1..2bc1f880 100644
--- a/README.md
+++ b/README.md
@@ -19,8 +19,27 @@ As a last step texts can be loaded into an information retrieval system to query
 ## Configuration and startup
 
 1. **Create Docker swarm:**
+
+The following part is for **users** and not the development team. The development team uses a script which sets up a local development swarm.
+
 The generated computational workload is handled by a [Docker](https://docs.docker.com/) swarm. A swarm is a group of machines that are running Docker and joined into a cluster. It consists out of two different kinds of members, managers and workers. Currently it is not possible to specify a dedicated Docker host, instead Opaque expects the executing system to be a swarm manager of a cluster with at least one dedicated worker machine. The swarm setup process is described best in the [Docker documentation](https://docs.docker.com/engine/swarm/swarm-tutorial/).
+
+The dev team can use dind_swarm_setup.sh. If the workers cannot join the manager node. Try opening the following ports using the ubuntu firewall ufw:
+```bash
+sudo ufw allow 2376/tcp \
+&& sudo ufw allow 7946/udp \
+&& sudo ufw allow 7946/tcp \
+&& sudo ufw allow 80/tcp \
+&& sudo ufw allow 2377/tcp \
+&& sudo ufw allow 4789/udp
+
+sudo ufw reload && sudo ufw enable
+sudo systemctl restart docker
+```
+
 2. **Create a network storage:**
+The dind_swarm_setup.sh script handles this step for the dev team aswell.
+
 A shared network space is necessary so that all swarm members have access to all the data. To achieve this a [Samba](https://www.samba.org/) can be used.
 ``` bash
 # Example: Create a Samba share via Docker
@@ -55,7 +74,7 @@ $ nopaque.env # Fill out the empty variables within this file.
 ``` bash
 # Execute the following 3 steps only on first startup
 $ docker-compose run web flask db upgrade
-$ docker-compose run web flask db insert-initial-database-entries
+$ docker-compose run web flask insert-initial-database-entries
 $ docker-compose down
 
 $ docker-compose up
diff --git a/app/jobs/forms.py b/app/jobs/forms.py
index 82e574b6..ddf9a917 100644
--- a/app/jobs/forms.py
+++ b/app/jobs/forms.py
@@ -25,6 +25,7 @@ class AddNLPJobForm(FlaskForm):
                           choices=[('2.2.0', 'Latest (2.2.0)'),
                                    ('2.2.0', '2.2.0')],
                           validators=[DataRequired()])
+    check_encoding = BooleanField('Check encoding')
 
     def validate_files(form, field):
         for file in field.data:
diff --git a/app/services/views.py b/app/services/views.py
index 971a1868..875ff700 100644
--- a/app/services/views.py
+++ b/app/services/views.py
@@ -8,6 +8,7 @@ from werkzeug.utils import secure_filename
 from . import services
 import json
 import os
+from app import logger
 
 
 SERVICES = {'corpus_analysis': {'name': 'Corpus analysis'},
@@ -36,7 +37,12 @@ def service(service):
             return make_response(add_job_form.errors, 400)
         service_args = []
         if service == 'nlp':
+            logger.warning(add_job_form.check_encoding)
             service_args.append('-l {}'.format(add_job_form.language.data))
+            logger.warning("Service args: {}".format(service_args))
+            if add_job_form.check_encoding.data:
+                service_args.append('--check-encoding')
+            logger.warning("Service args: {}".format(service_args))
         if service == 'ocr':
             service_args.append('-l {}'.format(add_job_form.language.data))
             if not add_job_form.binarization.data:
diff --git a/app/templates/services/nlp.html.j2 b/app/templates/services/nlp.html.j2
index 4c4c0b13..300dbba0 100644
--- a/app/templates/services/nlp.html.j2
+++ b/app/templates/services/nlp.html.j2
@@ -101,6 +101,18 @@
+
+
+ Check Encoding +

If the input files are not created with the nopaque OCR service and you do not know if your text files are UTF-8 encoded, check this switch. We will try to automatically determine the right encoding for your texts.

+
+ +
+
+