Update "README.md"

sporada 2021-01-19 15:31:25 +01:00
parent bdbef9c1c2
commit ca42b27969

@@ -7,7 +7,7 @@ which member of parliament hold what speech etc.
This software can mark every protocol from 1949 till 2017 automatically. The
software identifies speakers, their speeches, metadata etc. For detailed information
on why this software was made and how it works, read the corresponding master's thesis
-uploaded [here](https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_web_app/raw/24641c2959796659d428514c9cdd3782d4248da0/2019-02-04_Stephan_Porada_Masterthesis_semi.pdf?inline=false) (It is written in german though).
+uploaded [here](https://gitea.sporada.eu/sporada/bundesdata_web_app/src/branch/master/2019-02-04_Stephan_Porada_Masterthesis_semi.pdf) (It is written in german though).
Besides the markup the software can also calculate ngrams for all automatically
marked protocols either from lemmatized or just tokenized text with or without
@@ -24,12 +24,11 @@ The web app also provides an Ngram Viewer based on the produced ngram data that
displays ngram frequencies for all protocols from 1949 till 2017. The Ngram Viewer
is similar to the [Google Ngram Viewer](https://books.google.com/ngrams).
-The source code of the web application can be found here: https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_web_app.
-A live version of the app is accessible from inside the University Bielefeld
-network by visiting http://129.70.12.88:8000/.
+The source code of the web application can be found here: https://gitea.sporada.eu/sporada/bundesdata_web_app.
+A live version of the web application can be visited via the link: https://bundesdata.sporada.eu/.
## Input and Output data
-The input and output data of this software can be found here: https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_data.
+The input and output data of this software can be found here: https://gitea.sporada.eu/sporada/bundesdata_markup_nlp_data.
You can find all automatically marked protocols and ngrams there. Also the
official protocols used as input data are included.
@@ -61,7 +60,7 @@ official protocols used as input data are included.
### Markup process
1. Download some protocols to use them as input for the markup process.
-- You can either download some files from https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_data including the _development\_data\_xml_ data set found in _inputs_.
+- You can either download some files from https://gitea.sporada.eu/sporada/bundesdata_markup_nlp_data including the _development\_data\_xml_ data set found in _inputs_.
- Or download the protocols directly from https://www.bundestag.de/services/opendata.
- Only protocols from the 1st to 18th period can be used as an input.
2. Place the protocols you want to mark in one directory. The directory can contain one level of subdirectories, for example for protocols of different periods. This tutorial will continue using the folder _development\_data\_xml_.
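
The directory layout described in step 2 can be checked before starting the markup process. The following is a minimal sketch and not part of the software itself: the function name `collect_protocols` and the assumption that protocols carry a `.xml` suffix are illustrative only. It lists the files that sit directly in the input folder or exactly one subdirectory below it and ignores anything nested deeper.

```python
from __future__ import annotations

from pathlib import Path


def collect_protocols(input_dir: str, suffix: str = ".xml") -> list[Path]:
    """Return protocol files in input_dir or exactly one subdirectory below it.

    Hypothetical helper for illustration; the suffix is an assumption.
    """
    root = Path(input_dir)
    # Files placed directly in the input directory.
    top_level = [p for p in root.glob(f"*{suffix}") if p.is_file()]
    # Files one level down, e.g. one subdirectory per period.
    nested = [p for p in root.glob(f"*/*{suffix}") if p.is_file()]
    return sorted(top_level + nested)


if __name__ == "__main__":
    # Folder name taken from the tutorial text above.
    for path in collect_protocols("development_data_xml"):
        print(path)
```

Files nested more than one level deep are skipped on purpose, which makes it easy to spot protocols that were placed too deep in the folder tree.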