# Opaque
Opaque is a virtual research environment (VRE) bundling OCR, NLP and additional computational linguistics methods for research purposes in the field of Digital Humanities.
Opaque is designed as a web application that researchers can easily use to support their research process.
In particular, researchers can use Opaque to start OCR jobs for digitized sources. The text output of these OCR jobs can then be used as input for tagging processes (POS, NER etc.).
As a last step, texts can be loaded into an information retrieval system to query for specific words or phrases in combination with POS tags.
## Dependencies
- Docker: https://www.docker.com/
- Python 3.5+
- cifs-utils
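
Whether these tools are present can be checked from a shell. The helper below is only a sketch: `missing_commands` is a hypothetical name, and looking for `mount.cifs` assumes that cifs-utils places that binary in a directory on the current user's `PATH`, which is not the case on every distribution (it often lives in `/sbin`). The Python version itself can be read from `python3 --version`.

```bash
# Print each given command name that cannot be found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" > /dev/null 2>&1 || echo "$cmd"
    done
}

# Example: no output means all three tools were found.
#   missing_commands docker python3 mount.cifs
```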
## Setup
0. **Create the log directory:**
``` bash
mkdir /logs
```
1. **Create Docker swarm:**
The generated computational workload is handled by a [Docker](https://docs.docker.com/) swarm. A swarm is a group of machines that are running Docker and are joined into a cluster. It consists of two kinds of members: managers and workers. Currently it is not possible to specify a dedicated Docker host; instead, Opaque expects the executing system to be a manager of a swarm with at least one dedicated worker machine. The [swarm setup](https://docs.docker.com/engine/swarm/swarm-tutorial/) process is best described in the Docker documentation.
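
For orientation, the swarm setup could look roughly like the sketch below, which only wraps standard Docker CLI calls. The function names and the `<MANAGER_IP>` placeholder are assumptions made for this illustration; the Docker documentation linked above remains the authoritative guide. The calls are wrapped in functions so that nothing runs when the snippet is pasted.

```bash
# Run on the machine that should become the swarm manager.
# $1: address of the manager that the other swarm members can reach
init_swarm() {
    docker swarm init --advertise-addr "$1"
}

# Run on the manager: prints the complete "docker swarm join ..." command
# that has to be executed on each worker machine.
print_worker_join_command() {
    docker swarm join-token worker
}

# Run on the manager: lists all swarm members with their status and role.
list_swarm_nodes() {
    docker node ls
}

# Example (on the manager):
#   init_swarm <MANAGER_IP>
#   print_worker_join_command   # execute the printed command on every worker
#   list_swarm_nodes
```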
2. Create a dedicated user `opaque` on all swarm members with `sudo useradd opaque`.
3. Create shared network storage
A shared network space is necessary so that all swarm members have access to all the data. To achieve this, a [Samba](https://www.samba.org/) share is used.
``` bash
# Start a samba service on a swarm manager node
SAMBA_DIRECTORY=</ABSOLUTE/PATH>
SAMBA_HOSTNAME=<HOSTNAME>
SAMBA_PASSWORD=<SET_PASSWORD>
docker service create \
--constraint node.hostname==$SAMBA_HOSTNAME \
--mount type=bind,src=$SAMBA_DIRECTORY,dst=/storage.opaque \
--name samba_opaque \
--publish published=139,target=139,mode=host \
--publish published=445,target=445,mode=host \
dperson/samba \
-p \
-s "storage.opaque;/storage.opaque;no;no;no;opaque" \
-u "opaque;$SAMBA_PASSWORD"
# The following steps need to be executed on all swarm members
# Login as opaque user (this starts a new shell, so set SAMBA_HOSTNAME and
# SAMBA_PASSWORD again before mounting)
sudo su opaque
# Create mount point for opaque storage
mkdir -p $HOME/mnt/opaque
# Mount the samba share
sudo mount -t cifs -o gid=opaque,password=$SAMBA_PASSWORD,uid=opaque,user=opaque,vers=3.0 //$SAMBA_HOSTNAME/storage.opaque $HOME/mnt/opaque
```
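The mount command above passes the password on the command line, where it is visible in the process list. As an alternative sketch, `mount.cifs` also accepts a `credentials=` option that points to a file containing `username=` and `password=` lines. The helper name `write_cifs_credentials` and the file location `$HOME/.opaque-cifs-credentials` are assumptions made for this illustration and are not part of Opaque.

```bash
# Write a CIFS credentials file that only its owner can read.
# $1: target file, $2: user name, $3: password
write_cifs_credentials() {
    # The subshell keeps the restrictive umask from leaking into the caller.
    (
        umask 077
        printf 'username=%s\npassword=%s\n' "$2" "$3" > "$1"
        # Also tighten the mode in case the file already existed.
        chmod 600 "$1"
    )
}

# Example (hypothetical file location):
#   write_cifs_credentials "$HOME/.opaque-cifs-credentials" opaque "$SAMBA_PASSWORD"
#   sudo mount -t cifs \
#     -o credentials=$HOME/.opaque-cifs-credentials,gid=opaque,uid=opaque,vers=3.0 \
#     //$SAMBA_HOSTNAME/storage.opaque $HOME/mnt/opaque
```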
4. Clone the Opaque repository to the swarm manager that should run the Opaque server software
```
git clone https://gitlab.ub.uni-bielefeld.de/sfb1288inf/opaque.git
cd opaque
```
4.1 Create a configuration file
``` bash
touch .env
# Account information of a mail account for sending emails to opaque users.
echo "MAIL_USERNAME=opaque@example.com" >> .env
echo "MAIL_PASSWORD=password" >> .env
echo "MAIL_SERVER=smtp.example.com" >> .env
echo "MAIL_PORT=587" >> .env
echo "MAIL_USE_TLS=true" >> .env
# A user registering with this email address will automatically be promoted to admin.
echo "NOPAQUE_ADMIN=admin.opaque@example.com" >> .env
# Absolute path to an existing directory to save all opaque files.
echo "OPAQUE_STORAGE=/home/opaque/mnt/opaque" >> .env
```
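Before starting the server it can help to confirm that `.env` is complete. The following sketch checks for the seven keys written above; the function names are hypothetical, and the check only looks at key names, not at whether the values are valid. Because the commands above append with `>>`, running them a second time leaves duplicate entries, which the second helper reports.

```bash
# Print every required key that has no KEY=value line in the given file.
# $1: path to the configuration file (defaults to .env)
missing_env_keys() {
    for key in MAIL_USERNAME MAIL_PASSWORD MAIL_SERVER MAIL_PORT \
               MAIL_USE_TLS NOPAQUE_ADMIN OPAQUE_STORAGE; do
        grep -q "^${key}=" "${1:-.env}" || echo "$key"
    done
}

# Print every key that is assigned more than once in the given file.
# $1: path to the configuration file (defaults to .env)
duplicate_env_keys() {
    grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "${1:-.env}" | cut -d= -f1 | sort | uniq -d
}

# Example: both calls print nothing for a complete file without duplicates.
#   missing_env_keys .env
#   duplicate_env_keys .env
```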
4.2 Create a Python virtual environment, activate it and install the required Python packages.
```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
5. Start the server: `python opaque.py`