Update README.md

Stephan Porada 2019-03-03 21:24:44 +01:00
parent ba3102f559
commit 357f47e689


@@ -36,12 +36,17 @@ The actual data can be found here: https://gitlab.ub.uni-bielefeld.de/sporada/bu
## Import the data into the database
1. Before importing the data we have to set up the tables in the PostgreSQL database.
    - Do this with `docker-compose run web python manage.py makemigrations`
    - followed by `docker-compose run web python manage.py migrate` (both commands are also part of the command recap after this list).
2. Now the data for the ngrams, speeches, and speakers has to be imported into the database of the app.
3. Shut down the app with the command `docker-compose down`.
4. Change the owner rights of all files in the repository. (This step should only be necessary for Linux systems.)
    - This has to be done because every process inside a Docker container is always executed with root privileges, so the created volumes are no longer accessible to your user.
    - Change the rights with `sudo chown -R $USER:$USER .`
5. Download the folders *MdB\_data* and *outputs* from the link mentioned in [this repository](https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_data).
    - Copy them into the folder *input_volume*, which is located at the root level of the web app repository.
    - If the downloaded folders are inside an archive, extract them first.
    - The folder *input_volume* is a volume that is mounted into the web app container. The container can read all data inside that volume. Note that inside the container the volume is accessed via the path */usr/src/app/input_data*, not */usr/src/app/input_volume* (a quick way to check the mount is sketched after this list).
6. Restart the app with `docker-compose up`.
7. First we have to import the speaker data. This is done by executing the following command in the second terminal: `docker-compose run web python manage.py import_speakers /usr/src/app/input_data/MdB_data/MdB_Stammdaten.xml`
8. After that we can import all the protocols and thus all speeches for every person. The command to do that is `docker-compose run web python manage.py import_protocols /usr/src/app/input_data/outputs/markup/full_periods`. (Importing all protocols takes up to 2 days. For testing purposes *dev\_data/beautiful\_xml* or *test\_data/beautiful\_xml* can be used; a smaller test run is sketched after this list.)
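
For reference, here is the database setup and import sequence from the steps above as a single shell recap. It assumes all commands are run from the root of the web app repository, with the data copied into *input_volume* where noted:

```bash
# Create and apply the database tables
docker-compose run web python manage.py makemigrations
docker-compose run web python manage.py migrate

# Stop the app and fix file ownership (Linux only)
docker-compose down
sudo chown -R $USER:$USER .

# Copy MdB_data and outputs into input_volume at this point, then restart the app
docker-compose up

# In a second terminal: import speakers, then protocols
docker-compose run web python manage.py import_speakers /usr/src/app/input_data/MdB_data/MdB_Stammdaten.xml
docker-compose run web python manage.py import_protocols /usr/src/app/input_data/outputs/markup/full_periods
```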
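
Because the host folder *input_volume* appears inside the container under a different path, it can be worth checking the mount before starting the long import. A minimal sketch, assuming the `web` service from the compose setup above; the `ls` call is only illustrative and not part of the documented steps:

```bash
# List the mounted input data from inside the web container.
# Note the container-side path is /usr/src/app/input_data, not /usr/src/app/input_volume.
docker-compose run web ls /usr/src/app/input_data
# This should list the copied folders, e.g. MdB_data and outputs.
```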
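
For a quicker trial run instead of the full two-day import, the smaller *dev\_data/beautiful\_xml* or *test\_data/beautiful\_xml* sets mentioned above can be used. The exact location of these folders under the mounted volume is an assumption here; adjust the path to wherever they end up inside *input_volume*:

```bash
# Hypothetical test import with the smaller development data set;
# the path assumes dev_data was also copied into input_volume.
docker-compose run web python manage.py import_protocols /usr/src/app/input_data/dev_data/beautiful_xml
```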