Update README.md

Stephan Porada 2019-03-03 18:50:37 +01:00
parent b73c788e97
commit a49b27bd35

5. To calculate n-grams per year from tokenized protocols without stop words, use this command: `./bundesdata_nlp.py -cn year tk_ns_year -sp /path/to/nlp_output/nlp_beuatiful_xml/ /path/to/some/folder/for/the/output/`.
6. To calculate n-grams per speaker from tokenized protocols with stop words, use this command: `./bundesdata_nlp.py -cn speaker tk_ws_speaker -sp /path/to/nlp_output/nlp_beuatiful_xml/ /path/to/some/folder/for/the/output/`.
7. The parameter `-cn` is always followed by two arguments (example: `-cn year lm_ns_year`). The first argument specifies how the n-grams are counted. It can be set to "year", "mont_year", "speaker" or "speech", and the n-grams are then counted by year, speaker and so on. The second argument is a user-specified string that identifies the kind of protocols the n-grams have been calculated from. For example, "lm_ns_year" describes input protocols that have been lemmatized (lm) and contain no stop words (ns); the last part (year) specifies that the n-grams have been counted by year. Likewise, in the commands above "tk" stands for tokenized and "ws" for with stop words.
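
The following Python sketch illustrates what counting n-grams "by year" or "by speaker" means and how a label such as "lm_ns_year" can be read. It is not the implementation in `bundesdata_nlp.py`: the helper names `ngrams`, `count_ngrams_by_group` and `describe_label`, the dictionary layout of the speeches and the toy tokens are all hypothetical, and the label decoding only works if the naming convention from item 7 is followed.

```python
from collections import Counter, defaultdict

# Hypothetical decoding tables for labels such as "lm_ns_year". The second -cn
# argument is free text, so this only works if the naming convention is followed.
PROCESSING = {"lm": "lemmatized", "tk": "tokenized"}
STOP_WORDS = {"ns": "no stop words", "ws": "with stop words"}


def describe_label(label):
    """Split a label like "lm_ns_year" into processing, stop word and grouping parts."""
    # maxsplit=2 keeps groupings that contain "_" (e.g. "mont_year") in one piece
    processing, stop_words, grouping = label.split("_", 2)
    return f"{PROCESSING[processing]}, {STOP_WORDS[stop_words]}, counted by {grouping}"


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams_by_group(speeches, n, group_key):
    """Count n-grams separately for each value of group_key (e.g. "year", "speaker")."""
    counts = defaultdict(Counter)
    for speech in speeches:
        counts[speech[group_key]].update(ngrams(speech["tokens"], n))
    return counts


# Toy, made-up input: lemmatized speeches without stop words (hypothetical layout).
speeches = [
    {"year": 1949, "speaker": "A", "tokens": ["bundestag", "beschließen", "gesetz"]},
    {"year": 1949, "speaker": "B", "tokens": ["bundestag", "beschließen", "antrag"]},
    {"year": 1950, "speaker": "A", "tokens": ["bundestag", "beschließen", "gesetz"]},
]

by_year = count_ngrams_by_group(speeches, 2, "year")
by_speaker = count_ngrams_by_group(speeches, 2, "speaker")
print(by_year[1949][("bundestag", "beschließen")])  # 2
print(by_speaker["A"][("beschließen", "gesetz")])   # 2
print(describe_label("lm_ns_year"))  # lemmatized, no stop words, counted by year
```

Keeping one `Counter` per group means switching between groupings only changes the key passed as `group_key`, which mirrors the choice offered by the first `-cn` argument.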

# Used packages and software

- js-beautify
  - Lielmanis, E.; Newman, L.; Stockman, D. & Sanfilippo, S.
- lxml
  - Behnel, S.; Faassen, M.; Bicking, I.; Joukl, H.; Sapin, S.; Parent, M.-A.; Grisel, O.; Buchcik, K.; Wagner, F.; Kroymann, E.; Everitt, P.; Ng, V.; Kern, R.; Pakulat, A.; Sankel, D.; Kasperski, M.; da Silva, S. & Oberndörfer, P.
- Babel (2018)
  - Ronacher, A.