Update README.md

Stephan Porada 2019-03-03 21:24:44 +01:00
parent ba3102f559
commit 357f47e689

@@ -36,12 +36,17 @@ The actual data can be found here: https://gitlab.ub.uni-bielefeld.de/sporada/bu
## Import the data into the database
1. Before importing the data, we have to set up the tables in the PostgreSQL database.
1. Do this with `docker-compose run web python manage.py makemigrations`
1. followed by `docker-compose run web python manage.py migrate`.
- Do this with `docker-compose run web python manage.py makemigrations`
- followed by `docker-compose run web python manage.py migrate`.
11. Now the data for the ngrams, speeches, and speakers has to be imported into the database of the app.
12. Shut down the app with the command `docker-compose down`.
13. Change the ownership of all files in the repository. This has to be done because processes inside the Docker container run with root privileges by default, so the created volumes are no longer accessible to your user. Change the ownership with `sudo chown -R $USER:$USER .` This is only needed on Linux systems.
12. Download the folders *MdB\_data* and *outputs* from the link mentioned in [this repository](https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_data) and copy them into the folder *input_volume*, which is located at the root level of the web app repository. If the downloaded folders are inside an archive, extract them first. This folder is a volume that is mounted into the web app container. The container can read all data inside that volume. Note that inside the container the volume is accessed with the path */usr/src/app/input_data*, not */usr/src/app/input_volume*.
13. Change the ownership of all files in the repository. (This step should only be necessary on Linux systems.)
- This has to be done because processes inside the Docker container run with root privileges by default, so the created volumes are no longer accessible to your user.
- Change the ownership with `sudo chown -R $USER:$USER .`
12. Download the folders *MdB\_data* and *outputs* from the link mentioned in [this repository](https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_data).
- Copy them into the folder *input_volume*, which is located at the root level of the web app repository.
- If the downloaded folders are inside an archive, extract them first.
- The folder *input_volume* is a volume that is mounted into the web app container. The container can read all data inside that volume. Note that inside the container the volume is accessed with the path */usr/src/app/input_data*, not */usr/src/app/input_volume*.
13. Restart the app with `docker-compose up`
13. First we have to import the speaker data. Do this by executing the following command in the second terminal: `docker-compose run web python manage.py import_speakers /usr/src/app/input_data/MdB_data/MdB_Stammdaten.xml`
14. After that we can import all the protocols and thus all speeches for every person. The command to do that is `docker-compose run web python manage.py import_protocols /usr/src/app/input_data/outputs/markup/full_periods` (Importing all protocols takes up to 2 days. For testing purposes *dev\_data/beautiful\_xml* or *test\_data/beautiful\_xml* can be used.)
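
The steps above can be collected into a small helper script. This is only a sketch and not part of the repository: the `run` and `import_all` function names and the `DRY_RUN` switch are assumptions made for illustration. It also starts the app with `docker-compose up -d` (detached) so the script can continue, whereas the steps above start it in the foreground and use a second terminal. The data folders must already have been copied into *input_volume* before the import commands run.

```shell
#!/bin/sh
# Sketch of the import sequence from the list above.
# DRY_RUN, run and import_all are hypothetical names for this example.
# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

import_all() {
    # Create the tables in the PostgreSQL database.
    run docker-compose run web python manage.py makemigrations
    run docker-compose run web python manage.py migrate

    # Shut down the app and fix file ownership (only needed on Linux).
    run docker-compose down
    run sudo chown -R "$USER:$USER" .

    # Restart the app. "-d" is this sketch's choice; the README runs
    # "docker-compose up" in the foreground and uses a second terminal.
    run docker-compose up -d

    # Import the speakers first, then the protocols (can take up to 2 days).
    run docker-compose run web python manage.py import_speakers \
        /usr/src/app/input_data/MdB_data/MdB_Stammdaten.xml
    run docker-compose run web python manage.py import_protocols \
        /usr/src/app/input_data/outputs/markup/full_periods
}

import_all
```

Run it once as is to check the printed commands, then call it with `DRY_RUN=0` to actually execute them. For a quick test, the `import_protocols` path can be replaced by one of the smaller test folders mentioned above.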