
Opaque

Opaque is a virtual research environment (VRE) that bundles OCR, NLP, and additional computational linguistics methods for research in the Digital Humanities.

Opaque is designed as a web application that researchers can easily use to support their research process.

In particular, researchers can use Opaque to start OCR jobs for digitized sources. The text output of these OCR jobs can then be used as input for tagging processes (POS, NER, etc.).

As a last step, texts can be loaded into an information retrieval system to query for specific words or phrases in combination with linguistic features.

Dependencies

  • cifs-utils
  • Docker
  • Docker Compose

Configuration and startup

  1. Create Docker swarm: The computational workload is handled by a Docker swarm. A swarm is a group of machines that run Docker and are joined into a cluster. It consists of two kinds of members: managers and workers. Currently, it is not possible to specify a dedicated Docker host; instead, Opaque expects the executing system to be a swarm manager of a cluster with at least one dedicated worker machine. The swarm setup process is best described in the Docker documentation.
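
A swarm is usually created with `docker swarm init` on the manager and `docker swarm join` on each worker. The following is a minimal sketch for checking the requirement stated above (this host is a manager and at least one worker has joined). The function name `check_swarm` is hypothetical and not part of Opaque.

```shell
# Hypothetical helper, not part of Opaque: verify the swarm requirement.
check_swarm() {
    # .Swarm.ControlAvailable is "true" only on manager nodes.
    if [ "$(docker info --format '{{.Swarm.ControlAvailable}}' 2>/dev/null)" != "true" ]; then
        echo "this host is not a swarm manager" >&2
        return 1
    fi
    # -q prints one node ID per line; count the worker nodes.
    workers=$(docker node ls --filter role=worker -q | grep -c . || true)
    if [ "$workers" -lt 1 ]; then
        echo "no worker nodes have joined the swarm" >&2
        return 1
    fi
    echo "swarm manager with $workers worker node(s)"
}
```

Calling `check_swarm` on the machine that will run Opaque reports the number of workers, or fails with a message on stderr.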
  2. Create a network storage: A shared network storage is necessary so that all swarm members have access to all data. A Samba share can be used for this.
# Example: Create a Samba share via Docker
# More details can be found under https://hub.docker.com/r/dperson/samba/
$ sudo mkdir -p /srv/nopaque/storage
$ docker run \
    --name opaque_storage \
    -v /srv/nopaque/storage:/srv/nopaque/storage \
    -p 445:445 \
    dperson/samba \
      -p \
      -s "storage.nopaque;/srv/nopaque/storage;no;no;no;nopaque" \
      -u "nopaque;nopaque"

# Mount the Samba share on all swarm member nodes with the following code
$ sudo mkdir /mnt/nopaque
$ sudo mount --types cifs --options gid=${USER},password=nopaque,uid=${USER},user=nopaque,vers=3.0 //<YOUR IP>/storage.nopaque /mnt/nopaque
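
After mounting, it can help to confirm on each swarm member that the storage directory exists and is writable. The sketch below assumes the mount point `/mnt/nopaque` from the example above. The helper `check_storage` is hypothetical, and it only tests write access; it does not verify that the directory is actually a CIFS mount.

```shell
# Hypothetical helper, not part of Opaque: check that a directory is writable.
check_storage() {
    dir=${1:-/mnt/nopaque}
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    # Create and remove a probe file to test write access.
    probe="$dir/.nopaque_write_test.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "ok: $dir"
    else
        echo "not writable: $dir" >&2
        return 1
    fi
}
```

Run `check_storage` without arguments to test the default mount point, or pass another path.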
  3. Download Opaque
$ git clone https://gitlab.ub.uni-bielefeld.de/sfb1288inf/opaque.git
$ cd opaque
$ docker-compose pull
  4. Configure your instance: For production environments, it is recommended to activate and secure the Docker HTTP API. You can read more here.
$ cp nopaque.env.tpl nopaque.env
$ <YOUR EDITOR> nopaque.env # Fill out the empty variables within this file.
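
To see which variables still need a value, the file can be scanned for assignments with an empty right-hand side. This is a sketch under the assumption that `nopaque.env` uses plain `NAME=value` lines; the helper name `list_empty_vars` is hypothetical.

```shell
# Hypothetical helper, not part of Opaque: print the names of variables
# that are assigned an empty (or whitespace-only) value.
list_empty_vars() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=[[:space:]]*$/\1/p' "${1:-nopaque.env}"
}
```

`list_empty_vars nopaque.env` prints one variable name per line and prints nothing once every variable has a value.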
  5. Start your instance
# Execute the following 3 steps only on first startup
$ docker-compose run web flask db upgrade
$ docker-compose run web flask db insert-initial-database-entries
$ docker-compose down

$ docker-compose up
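
The first-startup steps can be guarded so that they are not repeated by accident. The following sketch wraps the commands above in a function that records a successful initialization in a marker file. The function name `start_instance`, the `COMPOSE` variable, and the marker file name `.nopaque_initialized` are assumptions for illustration, not part of Opaque.

```shell
# Command used to talk to Docker Compose; can be overridden by the caller.
COMPOSE=${COMPOSE:-docker-compose}

# Hypothetical wrapper, not part of Opaque: run the one-time database
# setup only if the marker file does not exist yet, then start the stack.
start_instance() {
    marker=${1:-.nopaque_initialized}
    if [ ! -e "$marker" ]; then
        $COMPOSE run web flask db upgrade &&
            $COMPOSE run web flask db insert-initial-database-entries &&
            $COMPOSE down &&
            touch "$marker" || return 1
    fi
    $COMPOSE up
}
```

Deleting the marker file forces the initialization steps to run again on the next call.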
  6. Alter database models
$ docker-compose run web flask db migrate
$ docker-compose run web flask db upgrade