# Opaque
Opaque is a virtual research environment (VRE) bundling OCR, NLP and additional computational linguistics methods for research purposes in the field of Digital Humanities.
Opaque is designed as a web application that researchers can easily use to support their research process.
In particular, researchers can use Opaque to start OCR jobs for digitized sources. The text output of these OCR jobs can then be used as input for tagging processes (POS, NER, etc.).
As a last step, texts can be loaded into an information retrieval system to query for specific words and phrases in connection with linguistic features.

## Dependencies
- cifs-utils
- Docker
- Docker Compose
## Configuration and startup
1. **Create Docker swarm:**
The following part is for **users** and not the development team. The development team uses a script which sets up a local development swarm.
The generated computational workload is handled by a [Docker](https://docs.docker.com/) swarm. A swarm is a group of machines that are running Docker and joined into a cluster. It consists of two kinds of members: managers and workers. Currently it is not possible to specify a dedicated Docker host; instead, Opaque expects the executing system to be a swarm manager of a cluster with at least one dedicated worker machine. The swarm setup process is best described in the [Docker documentation](https://docs.docker.com/engine/swarm/swarm-tutorial/).
The dev team can use `dind_swarm_setup.sh`. If the workers cannot join the manager node, try opening the following ports using the Ubuntu firewall ufw:
```bash
sudo ufw allow 2376/tcp \
&& sudo ufw allow 7946/udp \
&& sudo ufw allow 7946/tcp \
&& sudo ufw allow 80/tcp \
&& sudo ufw allow 2377/tcp \
&& sudo ufw allow 4789/udp
sudo ufw reload && sudo ufw enable
sudo systemctl restart docker
```
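Before continuing, it can help to confirm that the machine really is a swarm manager with at least one worker, as this step requires. The following is a minimal sketch; the function names `is_swarm_manager` and `count_workers` are hypothetical helpers and not part of Opaque.

```bash
# Hypothetical helpers (not part of Opaque) to check the swarm requirements.

# Succeeds only if the local Docker engine is an active swarm manager.
is_swarm_manager() {
  local state
  state="$(docker info --format '{{.Swarm.LocalNodeState}} {{.Swarm.ControlAvailable}}' 2>/dev/null)" || return 1
  [ "$state" = "active true" ]
}

# Prints the number of nodes with the worker role.
# "docker node ls" only works on a manager node.
count_workers() {
  docker node ls --filter role=worker --format '{{.Hostname}}' | wc -l | tr -d ' '
}
```

A possible use is `is_swarm_manager && [ "$(count_workers)" -ge 1 ] || echo "swarm is not ready"`.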
2. **Create a network storage:**
The `dind_swarm_setup.sh` script handles this step for the dev team as well.
A shared network space is necessary so that all swarm members have access to all the data. To achieve this, a [Samba](https://www.samba.org/) share can be used.
``` bash
# Example: Create a Samba share via Docker
# More details can be found under https://hub.docker.com/r/dperson/samba/
sudo mkdir -p /srv/nopaque/storage
# The -s and -u values are quoted so the shell does not treat ';' as a command separator
docker run \
  --name opaque_storage \
  -v /srv/nopaque/storage:/srv/nopaque/storage \
  -p 445:445 \
  dperson/samba \
  -p \
  -s "storage.nopaque;/srv/nopaque/storage;no;no;no;nopaque" \
  -u "nopaque;nopaque"
# Mount the Samba share on all swarm member nodes with the following commands
sudo mkdir /mnt/nopaque
sudo mount --types cifs --options gid=${USER},password=nopaque,uid=${USER},user=nopaque,vers=3.0 //<YOUR IP>/storage.nopaque /mnt/nopaque
```
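Since every swarm member needs access to the share, a quick check on each node can catch a missing or read-only mount early. This is a sketch only: `check_storage` is a hypothetical helper, and it assumes the `mountpoint` utility (part of util-linux on Ubuntu) is installed.

```bash
# Hypothetical helper (not part of Opaque).
# Succeeds if the given directory (default: /mnt/nopaque) is a mount point
# and a file can be created and removed in it.
check_storage() {
  local dir="${1:-/mnt/nopaque}"
  mountpoint -q "$dir" || { echo "$dir is not mounted" >&2; return 1; }
  local probe="$dir/.write_test_$$"
  touch "$probe" && rm -f "$probe"
}
```

Running `check_storage` without arguments tests `/mnt/nopaque`, the mount path used above.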
3. **Download Opaque**
``` bash
git clone https://gitlab.ub.uni-bielefeld.de/sfb1288inf/opaque.git
cd opaque
docker-compose pull
```
4. **Configure your instance:**
For production environments it is recommended to activate and secure the Docker HTTP API. You can read more [here](https://gitlab.ub.uni-bielefeld.de/sfb1288inf/opaque_daemon).
``` bash
mkdir logs
cp nopaque.env.tpl nopaque.env
<YOUR EDITOR> nopaque.env # Fill out the empty variables within this file. For the GitLab login, either use your credentials (not recommended) or create a GitLab token.
```
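To see which variables still need a value after editing, the file can be scanned for empty assignments. The sketch below assumes that `nopaque.env` contains one `KEY=VALUE` assignment per line; `list_empty_vars` is a hypothetical helper, not part of Opaque.

```bash
# Hypothetical helper (not part of Opaque).
# Prints the names of variables that have no value in an env file
# (default: nopaque.env). Assumes one KEY=VALUE assignment per line.
list_empty_vars() {
  local file="${1:-nopaque.env}"
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=[[:space:]]*$' "$file" | cut -d= -f1
}
```

Calling `list_empty_vars nopaque.env` prints nothing once every variable has been filled in.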
5. **Further development instructions**
Use the following command to allow Docker to pull images from your GitLab registry. TODO: Check if this could also work with a token?
```bash
docker login gitlab.ub.uni-bielefeld.de:4567
```
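Regarding the open question about tokens: `docker login` can read a password from standard input, so a token could be passed without typing it interactively. Whether this registry accepts a token in place of the password is an assumption that has not been verified here. `registry_login`, `GITLAB_USER` and `GITLAB_TOKEN` are hypothetical names, not defined by Opaque.

```bash
# Hypothetical helper (not part of Opaque).
# Assumes GITLAB_USER and GITLAB_TOKEN are set in the environment and that
# the registry accepts the token as a password (unverified assumption).
registry_login() {
  printf '%s' "$GITLAB_TOKEN" \
    | docker login --username "$GITLAB_USER" --password-stdin gitlab.ub.uni-bielefeld.de:4567
}
```

Passing the token on standard input keeps it out of the shell history and the process list.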
6. **Start your instance**
``` bash
# Execute the following 3 steps only on first startup
docker-compose run web flask db upgrade
docker-compose run web flask insert-initial-database-entries
docker-compose down

# Start Opaque (on every startup)
docker-compose up
```
7. **Alter Database Models**
``` bash
docker-compose run web flask db migrate
docker-compose run web flask db upgrade
```