Update 'README.md'
parent bdbef9c1c2
commit ca42b27969

README.md (11 changed lines)
@@ -7,7 +7,7 @@ which member of parliament hold what speech etc.
 This software can mark every protocol from 1949 till 2017 automatically. The
 software identifies speakers, their speeches, metadata etc. For detailed information
 why this software was made and how it works, read the corresponding master thesis
-uploaded [here](https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_web_app/raw/24641c2959796659d428514c9cdd3782d4248da0/2019-02-04_Stephan_Porada_Masterthesis_semi.pdf?inline=false) (It is written in German though).
+uploaded [here](https://gitea.sporada.eu/sporada/bundesdata_web_app/src/branch/master/2019-02-04_Stephan_Porada_Masterthesis_semi.pdf) (It is written in German though).

 Besides the markup the software can also calculate ngrams for all automatically
 marked protocols either from lemmatized or just tokenized text with or without
@@ -24,12 +24,11 @@ The web app also provides an Ngram Viewer based on the produced ngram data that
 displays ngram frequencies for all protocols from 1949 till 2017. The Ngram Viewer
 is similar to the [Google Ngram Viewer](https://books.google.com/ngrams).

-The source code of the web application can be found here: https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_web_app.
-A live version of the app is accessible from inside the University Bielefeld
-network by visiting http://129.70.12.88:8000/.
+The source code of the web application can be found here: https://gitea.sporada.eu/sporada/bundesdata_web_app.
+A live version of the web application can be visited via the link: https://bundesdata.sporada.eu/.

 ## Input and Output data
-The input and output data of this software can be found here: https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_data.
+The input and output data of this software can be found here: https://gitea.sporada.eu/sporada/bundesdata_markup_nlp_data.
 You can find all automatically marked protocols and ngrams there. Also the
 official protocols used as input data are included.
@@ -61,7 +60,7 @@ official protocols used as input data are included.
 ### Markup process

 1. Download some protocols to use them as input for the markup process.
-- You can either download some files from https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_data including the _development\_data\_xml_ data set found in _inputs_.
+- You can either download some files from https://gitea.sporada.eu/sporada/bundesdata_markup_nlp_data including the _development\_data\_xml_ data set found in _inputs_.
 - Or download the protocols directly from https://www.bundestag.de/services/opendata.
 - Only protocols from the 1st to 18th period can be used as input.
 2. Place the protocols you want to mark in one directory. The directory can contain one level of subdirectories, for example for protocols of different periods. This tutorial will continue using the folder _development\_data\_xml_.
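Step 2 only says that the input directory may hold the protocols directly or in one level of subdirectories (for example one per period). The Python sketch below illustrates what that layout means, assuming the protocols are `.xml` files. `collect_protocols` is a hypothetical helper written for illustration; it is not part of this software, and the actual markup scripts may discover their input files differently.

```python
from pathlib import Path
from typing import List


def collect_protocols(input_dir: str) -> List[Path]:
    """Return the *.xml files in input_dir and in its direct subdirectories.

    Hypothetical helper illustrating the layout from step 2; it is not part
    of the markup software. Files nested deeper than one level are ignored.
    """
    root = Path(input_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    # "*.xml" matches protocols directly inside root,
    # "*/*.xml" matches protocols exactly one subdirectory down (e.g. per period).
    candidates = list(root.glob("*.xml")) + list(root.glob("*/*.xml"))
    # Keep regular files only and return them in a stable, sorted order.
    return sorted(p for p in candidates if p.is_file())
```

For example, `collect_protocols("development_data_xml")` would return the XML files in that folder and in its immediate subfolders, while anything nested two levels deep is skipped.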