Initial commit

Stephan Porada 2019-02-28 14:09:53 +01:00
commit 96e84d083d
97 changed files with 66293 additions and 0 deletions

15
.gitignore vendored Normal file

@ -0,0 +1,15 @@
*.pyc
__pycache
.DS_Store
*_initial.py
.idea
**/staticfiles/*
**/input_data/*
/input_volume/*
postgres_data/*
static_volume/*
!**/.gitkeep

50
README.md Normal file

@ -0,0 +1,50 @@
# What is this?
This Django web app DESCRIPTION.
The needed data was created by using this software: https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_software
The actual data can be found here: https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_data
# Installation
## System requirements
* docker
* docker-compose
* unix-like OS
## Install requirements
1. First, install `docker` for your OS by following this guide: [https://docs.docker.com/install/](https://docs.docker.com/install/)
2. After that, install `docker-compose` for your system by following this guide: [https://docs.docker.com/compose/install/](https://docs.docker.com/compose/install/)
3. Clone this repository with `git clone https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_web_app.git` to a location of your choice.
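
Before continuing, it can help to confirm that the required command line tools are actually available. The following is only a minimal sketch and not part of the app: the helper name `missing_tools` is made up for illustration, and `git` is included on the assumption that the repository is cloned as described above.

```python
import shutil

# Tools named in the requirements above; git is assumed for the clone step.
REQUIRED_TOOLS = ("docker", "docker-compose", "git")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` parameter only exists so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.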
## Build the app/start the app with docker
1. Navigate into the repository with `cd path/to/the/repository`.
2. Start the web app with `docker-compose up`. The first run takes a while because all required images (postgres, python and nginx) have to be downloaded and all packages defined in the Pipfile are installed into the corresponding image. Keep this terminal open; it shows the log messages of the three running containers.
3. Check that the app is running by visiting [127.0.0.1:8000](http://127.0.0.1:8000).
4. The website should show up but look broken, because the static files (CSS, images, JavaScript etc.) have not been collected yet. To fix this, open a second terminal in the same location as the running one and execute `docker-compose run web python manage.py collectstatic`.
5. Answer the question with `yes` if prompted.
6. Reload [127.0.0.1:8000](http://127.0.0.1:8000) to verify that the files have been collected successfully. The website should now be styled correctly.
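
Instead of reloading the browser by hand, the availability check can also be scripted. The sketch below is an illustration only, using nothing but the Python standard library; the function name `wait_for_app` and the default values for `attempts` and `delay` are assumptions, not something the app provides.

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url="http://127.0.0.1:8000", attempts=30, delay=2.0,
                 opener=urllib.request.urlopen, sleep=time.sleep):
    """Return True as soon as `url` answers with HTTP 200, else False."""
    for _ in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused or an HTTP error status: the containers
            # are probably still starting, so try again after a pause.
            pass
        sleep(delay)
    return False


if __name__ == "__main__":
    print("app is up" if wait_for_app() else "app did not respond in time")
```

Note that a 200 response only shows that the web container answers; whether the static files were collected still has to be checked in the browser.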
## Import the data into the database
1. Before importing the data, set up the tables in the PostgreSQL database. Do this with `docker-compose run web python manage.py makemigrations` followed by `docker-compose run web python manage.py migrate`.
2. Now the data for the n-grams, speeches, and speakers has to be imported into the database of the app.
3. Shut down the app with the command `docker-compose down`.
4. Change the owner of all files in the repository. This is necessary because the processes inside the docker containers run as root, so the volumes they create are otherwise not accessible to your user. Change the owner with `sudo chown -R $USER:$USER .`
5. Download the folders *MdB\_data* and *outputs* from the link mentioned in [this repository](https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_data) and copy them into the folder *input_volume*, which is located at the root level of the web app repository. This folder is a volume which is mounted into the web app container. The container is able to read all data inside that volume. Note that inside the container the volume is accessed with the path */usr/src/app/input_data*, not */usr/src/app/input_volume*.
6. Restart the app with `docker-compose up`.
7. First, import the speaker data by executing the following command in the second terminal: `docker-compose run web python manage.py import_speakers /usr/src/app/input_data/MdB_data/MdB_Stammdaten.xml`.
8. After that, import all the protocols and thus all speeches for every person. The command to do that is `docker-compose run web python manage.py import_protocols /usr/src/app/input_data/outputs/markup/full_periods`. (Importing all protocols takes up to 2 days. For testing purposes *dev\_data/beautiful\_xml* or *test\_data/beautiful\_xml* can be used.)
9. Now the n-grams can be imported by using `docker-compose run web python manage.py import_ngrams_bulk 1 /usr/src/app/input_data/outputs/nlp/full_periods/n-grams/lm_ns_year/1_grams lm_ns_year`. This command imports the alphabetically split n-grams into their corresponding tables. The first parameter of this command is *1*. It tells the function to import the n-grams from the input path as 1-grams. The second parameter is the input path */usr/src/app/input_data/outputs/nlp/full_periods/n-grams/lm_ns_year/1_grams* where the 1-grams are located. The last part of the input path clearly identifies the n-grams as 1-grams. Finally, the third parameter identifies what kind of n-grams are being imported. In this case the parameter is set to *lm_ns_year*, which means the n-grams are based on lemmatized text without stopwords, counted by year. An example to import 2-grams would look like this: `docker-compose run web python manage.py import_ngrams_bulk 2 /usr/src/app/input_data/outputs/nlp/full_periods/n-grams/lm_ns_year/2_grams lm_ns_year`. To import 3-grams from a different corpus, the command should look for example like this: `docker-compose run web python manage.py import_ngrams_bulk 3 /usr/src/app/input_data/outputs/nlp/full_periods/n-grams/tk_ws_speaker_\(1-3\)/3_grams tk_ws_speaker`. Be careful when importing the n-grams. If the parameters are set incorrectly, the n-grams will be imported into the wrong tables, which leads to incorrect findings in the Ngram Viewer. It is possible to import different n-gram sets at the same time using multiple commands in multiple terminals. Just keep an eye on the CPU and RAM usage. There is also an optional fourth parameter to set the batch size of one insert. By default 1 million rows are read from the csv and inserted into the database at once. The parameter `-bs 10000000` would set it to 10 million. Increasing that value also increases the RAM usage, so be careful with that.
10. Repeat the step above for every kind of n-gram data you want to import. Importing 1-grams only takes a few minutes, while importing 5-grams takes several hours. (For testing purposes the n-grams from the test or development data can be used.)
11. After importing the n-grams the web app is all set up.
12. The app can be shut down with `docker-compose down`. All imported data is stored persistently in the database volume *postgres_data*.
13. To restart the app use `docker-compose up`, or `docker-compose up -d` to start it detached.
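
Because a wrong parameter combination silently imports n-grams into the wrong tables, it may be worth generating the `import_ngrams_bulk` commands instead of typing them. The following is a hedged sketch, not part of the app: the helper `ngram_import_command` and the constant `INPUT_ROOT` are hypothetical names, and the path layout is simply the one used in the example commands above. The helper derives the `<n>_grams` folder from `n`, so the first parameter and the input path cannot disagree.

```python
import shlex

# Container path of the n-gram data, taken from the example commands above.
INPUT_ROOT = "/usr/src/app/input_data/outputs/nlp/full_periods/n-grams"


def ngram_import_command(n, corpus_dir, ngram_type, batch_size=None):
    """Build the docker-compose call that imports one set of n-grams.

    n          -- size of the n-grams, also used for the "<n>_grams" folder
    corpus_dir -- folder below INPUT_ROOT, e.g. "lm_ns_year"
    ngram_type -- kind of n-grams (third parameter), e.g. "lm_ns_year"
    batch_size -- optional value for the -bs parameter
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    input_path = f"{INPUT_ROOT}/{corpus_dir}/{n}_grams"
    command = ["docker-compose", "run", "web", "python", "manage.py",
               "import_ngrams_bulk", str(n), input_path, ngram_type]
    if batch_size is not None:
        command += ["-bs", str(batch_size)]
    return command


if __name__ == "__main__":
    # Print shell-quoted commands for the 1-grams and 2-grams of one corpus.
    for size in (1, 2):
        parts = ngram_import_command(size, "lm_ns_year", "lm_ns_year")
        print(" ".join(shlex.quote(part) for part in parts))
```

The folder name and the n-gram type are kept as separate arguments because they can differ, as in the `tk_ws_speaker_(1-3)` folder that is imported as `tk_ws_speaker`.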
# Live version
A live version of the app is running at http://129.70.12.88:8000/ inside the Bielefeld University network. To use the live version you have to access the university network via VPN (https://www.ub.uni-bielefeld.de/search/vpn/).

20
app/Dockerfile Normal file

@ -0,0 +1,20 @@
# pull official base image
FROM python:3.7.2
# set environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
# set work directory
WORKDIR /usr/src/app
# install dependencies
RUN pip install --upgrade pip
RUN pip install pipenv
COPY ./Pipfile /usr/src/app/Pipfile
RUN pipenv install --skip-lock --system --dev
# copy project
COPY . /usr/src/app/

22
app/Pipfile Normal file

@ -0,0 +1,22 @@
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"
[packages]
django= "==2.1.4"
psycopg2= "==2.7.6.1"
gunicorn= "==19.9.0"
lxml= "==4.2.5"
tqdm= "==4.28.1"
django-watson= "==1.5.2"
django-tables2= "==2.0.3"
django-jchart= "==0.4.2"
[requires]
python_version = "3.7"

42
app/Pipfile.lock generated Normal file

@ -0,0 +1,42 @@
{
"_meta": {
"hash": {
"sha256": "653830713010356a8c80045a262ac3ea3bce011557c04f8cb5f7305a86068d02"
},
"pipfile-spec": 6,
"requires": {
"python_version": "3.7"
},
"sources": [
{
"name": "pypi",
"url": "https://pypi.python.org/simple",
"verify_ssl": true
}
]
},
"default": {
"django": {
"hashes": [
"sha256:7f246078d5a546f63c28fc03ce71f4d7a23677ce42109219c24c9ffb28416137",
"sha256:ea50d85709708621d956187c6b61d9f9ce155007b496dd914fdb35db8d790aec"
],
"version": "==2.1"
},
"gunicorn": {
"hashes": [
"sha256:aa8e0b40b4157b36a5df5e599f45c9c76d6af43845ba3b3b0efe2c70473c2471",
"sha256:fa2662097c66f920f53f70621c6c58ca4a3c4d3434205e608e121b5b3b71f4f3"
],
"version": "==19.9.0"
},
"pytz": {
"hashes": [
"sha256:31cb35c89bd7d333cd32c5f278fca91b523b0834369e757f4c5641ea252236ca",
"sha256:8e0f8568c118d3077b46be7d654cc8167fa916092e28320cde048e54bfc9f1e6"
],
"version": "==2018.7"
}
},
"develop": {}
}

0
app/blog/__init__.py Executable file

3
app/blog/admin.py Executable file

@ -0,0 +1,3 @@
from django.contrib import admin
# Register your models here.

5
app/blog/apps.py Executable file

@ -0,0 +1,5 @@
from django.apps import AppConfig
class BlogConfig(AppConfig):
    name = 'blog'

3
app/blog/models.py Executable file

@ -0,0 +1,3 @@
from django.db import models
# Create your models here.


@ -0,0 +1 @@
(c) Deutscher Bundestag / Marc-Steffen Unger


@ -0,0 +1 @@
(c) Deutscher Bundestag / Thomas Köhler/photothek.net


@ -0,0 +1 @@
(c) Deutscher Bundestag / Achim Melde

77
app/blog/static/blog/main.css Executable file

@ -0,0 +1,77 @@
html {
word-break: break-word;
}
body {
display: flex;
min-height: 100vh;
flex-direction: column;
}
main {
flex: 1 0 auto;
}
/* span.comment {
color: white;
float: right;
position: relative;
border: 5px solid #607D8B;
text-align: center;
background: #607D8B;
-webkit-border-radius: 10px;
-moz-border-radius: 10px;
border-radius: 10px;
margin-bottom: 10px;
} */
span.comment {
position: relative;
background: #607D8B;
border: 5px solid #607D8B;
border-radius: 0.6em;
color: white;
float: right;
margin-bottom: 20px;
}
span.comment:after {
content: '';
position: absolute;
bottom: 0;
left: 75%;
width: 0;
height: 0;
border: 0.938em solid transparent;
border-top-color: #607D8B;
border-bottom: 0;
border-right: 0;
margin-left: -0.469em;
margin-bottom: -0.937em;
}
.data-size {
height: inherit;
width: 90%;
left: 0;
top: 0;
margin-left: 5%;
margin-top: 3.66em;
padding: 0.75em;
position: fixed;
}
/* changes label color of switches */
.switch label input[type=checkbox]:checked+.lever:after {
background-color: #558B2F !important;
}
.switch label input[type=checkbox]:checked+.lever {
background-color: #558B2F !important;
}
/* Height for parallax container */
.parallax-container {
height: 500px;
}


@ -0,0 +1,9 @@
The recommended way to use the Material Icons font is by linking to the web font hosted on Google Fonts:
```html
<link href="https://fonts.googleapis.com/icon?family=Material+Icons"
rel="stylesheet">
```
Read more in our full usage guide:
http://google.github.io/material-design-icons/#icon-font-for-the-web


@ -0,0 +1,932 @@
3d_rotation e84d
ac_unit eb3b
access_alarm e190
access_alarms e191
access_time e192
accessibility e84e
accessible e914
account_balance e84f
account_balance_wallet e850
account_box e851
account_circle e853
adb e60e
add e145
add_a_photo e439
add_alarm e193
add_alert e003
add_box e146
add_circle e147
add_circle_outline e148
add_location e567
add_shopping_cart e854
add_to_photos e39d
add_to_queue e05c
adjust e39e
airline_seat_flat e630
airline_seat_flat_angled e631
airline_seat_individual_suite e632
airline_seat_legroom_extra e633
airline_seat_legroom_normal e634
airline_seat_legroom_reduced e635
airline_seat_recline_extra e636
airline_seat_recline_normal e637
airplanemode_active e195
airplanemode_inactive e194
airplay e055
airport_shuttle eb3c
alarm e855
alarm_add e856
alarm_off e857
alarm_on e858
album e019
all_inclusive eb3d
all_out e90b
android e859
announcement e85a
apps e5c3
archive e149
arrow_back e5c4
arrow_downward e5db
arrow_drop_down e5c5
arrow_drop_down_circle e5c6
arrow_drop_up e5c7
arrow_forward e5c8
arrow_upward e5d8
art_track e060
aspect_ratio e85b
assessment e85c
assignment e85d
assignment_ind e85e
assignment_late e85f
assignment_return e860
assignment_returned e861
assignment_turned_in e862
assistant e39f
assistant_photo e3a0
attach_file e226
attach_money e227
attachment e2bc
audiotrack e3a1
autorenew e863
av_timer e01b
backspace e14a
backup e864
battery_alert e19c
battery_charging_full e1a3
battery_full e1a4
battery_std e1a5
battery_unknown e1a6
beach_access eb3e
beenhere e52d
block e14b
bluetooth e1a7
bluetooth_audio e60f
bluetooth_connected e1a8
bluetooth_disabled e1a9
bluetooth_searching e1aa
blur_circular e3a2
blur_linear e3a3
blur_off e3a4
blur_on e3a5
book e865
bookmark e866
bookmark_border e867
border_all e228
border_bottom e229
border_clear e22a
border_color e22b
border_horizontal e22c
border_inner e22d
border_left e22e
border_outer e22f
border_right e230
border_style e231
border_top e232
border_vertical e233
branding_watermark e06b
brightness_1 e3a6
brightness_2 e3a7
brightness_3 e3a8
brightness_4 e3a9
brightness_5 e3aa
brightness_6 e3ab
brightness_7 e3ac
brightness_auto e1ab
brightness_high e1ac
brightness_low e1ad
brightness_medium e1ae
broken_image e3ad
brush e3ae
bubble_chart e6dd
bug_report e868
build e869
burst_mode e43c
business e0af
business_center eb3f
cached e86a
cake e7e9
call e0b0
call_end e0b1
call_made e0b2
call_merge e0b3
call_missed e0b4
call_missed_outgoing e0e4
call_received e0b5
call_split e0b6
call_to_action e06c
camera e3af
camera_alt e3b0
camera_enhance e8fc
camera_front e3b1
camera_rear e3b2
camera_roll e3b3
cancel e5c9
card_giftcard e8f6
card_membership e8f7
card_travel e8f8
casino eb40
cast e307
cast_connected e308
center_focus_strong e3b4
center_focus_weak e3b5
change_history e86b
chat e0b7
chat_bubble e0ca
chat_bubble_outline e0cb
check e5ca
check_box e834
check_box_outline_blank e835
check_circle e86c
chevron_left e5cb
chevron_right e5cc
child_care eb41
child_friendly eb42
chrome_reader_mode e86d
class e86e
clear e14c
clear_all e0b8
close e5cd
closed_caption e01c
cloud e2bd
cloud_circle e2be
cloud_done e2bf
cloud_download e2c0
cloud_off e2c1
cloud_queue e2c2
cloud_upload e2c3
code e86f
collections e3b6
collections_bookmark e431
color_lens e3b7
colorize e3b8
comment e0b9
compare e3b9
compare_arrows e915
computer e30a
confirmation_number e638
contact_mail e0d0
contact_phone e0cf
contacts e0ba
content_copy e14d
content_cut e14e
content_paste e14f
control_point e3ba
control_point_duplicate e3bb
copyright e90c
create e150
create_new_folder e2cc
credit_card e870
crop e3be
crop_16_9 e3bc
crop_3_2 e3bd
crop_5_4 e3bf
crop_7_5 e3c0
crop_din e3c1
crop_free e3c2
crop_landscape e3c3
crop_original e3c4
crop_portrait e3c5
crop_rotate e437
crop_square e3c6
dashboard e871
data_usage e1af
date_range e916
dehaze e3c7
delete e872
delete_forever e92b
delete_sweep e16c
description e873
desktop_mac e30b
desktop_windows e30c
details e3c8
developer_board e30d
developer_mode e1b0
device_hub e335
devices e1b1
devices_other e337
dialer_sip e0bb
dialpad e0bc
directions e52e
directions_bike e52f
directions_boat e532
directions_bus e530
directions_car e531
directions_railway e534
directions_run e566
directions_subway e533
directions_transit e535
directions_walk e536
disc_full e610
dns e875
do_not_disturb e612
do_not_disturb_alt e611
do_not_disturb_off e643
do_not_disturb_on e644
dock e30e
domain e7ee
done e876
done_all e877
donut_large e917
donut_small e918
drafts e151
drag_handle e25d
drive_eta e613
dvr e1b2
edit e3c9
edit_location e568
eject e8fb
email e0be
enhanced_encryption e63f
equalizer e01d
error e000
error_outline e001
euro_symbol e926
ev_station e56d
event e878
event_available e614
event_busy e615
event_note e616
event_seat e903
exit_to_app e879
expand_less e5ce
expand_more e5cf
explicit e01e
explore e87a
exposure e3ca
exposure_neg_1 e3cb
exposure_neg_2 e3cc
exposure_plus_1 e3cd
exposure_plus_2 e3ce
exposure_zero e3cf
extension e87b
face e87c
fast_forward e01f
fast_rewind e020
favorite e87d
favorite_border e87e
featured_play_list e06d
featured_video e06e
feedback e87f
fiber_dvr e05d
fiber_manual_record e061
fiber_new e05e
fiber_pin e06a
fiber_smart_record e062
file_download e2c4
file_upload e2c6
filter e3d3
filter_1 e3d0
filter_2 e3d1
filter_3 e3d2
filter_4 e3d4
filter_5 e3d5
filter_6 e3d6
filter_7 e3d7
filter_8 e3d8
filter_9 e3d9
filter_9_plus e3da
filter_b_and_w e3db
filter_center_focus e3dc
filter_drama e3dd
filter_frames e3de
filter_hdr e3df
filter_list e152
filter_none e3e0
filter_tilt_shift e3e2
filter_vintage e3e3
find_in_page e880
find_replace e881
fingerprint e90d
first_page e5dc
fitness_center eb43
flag e153
flare e3e4
flash_auto e3e5
flash_off e3e6
flash_on e3e7
flight e539
flight_land e904
flight_takeoff e905
flip e3e8
flip_to_back e882
flip_to_front e883
folder e2c7
folder_open e2c8
folder_shared e2c9
folder_special e617
font_download e167
format_align_center e234
format_align_justify e235
format_align_left e236
format_align_right e237
format_bold e238
format_clear e239
format_color_fill e23a
format_color_reset e23b
format_color_text e23c
format_indent_decrease e23d
format_indent_increase e23e
format_italic e23f
format_line_spacing e240
format_list_bulleted e241
format_list_numbered e242
format_paint e243
format_quote e244
format_shapes e25e
format_size e245
format_strikethrough e246
format_textdirection_l_to_r e247
format_textdirection_r_to_l e248
format_underlined e249
forum e0bf
forward e154
forward_10 e056
forward_30 e057
forward_5 e058
free_breakfast eb44
fullscreen e5d0
fullscreen_exit e5d1
functions e24a
g_translate e927
gamepad e30f
games e021
gavel e90e
gesture e155
get_app e884
gif e908
golf_course eb45
gps_fixed e1b3
gps_not_fixed e1b4
gps_off e1b5
grade e885
gradient e3e9
grain e3ea
graphic_eq e1b8
grid_off e3eb
grid_on e3ec
group e7ef
group_add e7f0
group_work e886
hd e052
hdr_off e3ed
hdr_on e3ee
hdr_strong e3f1
hdr_weak e3f2
headset e310
headset_mic e311
healing e3f3
hearing e023
help e887
help_outline e8fd
high_quality e024
highlight e25f
highlight_off e888
history e889
home e88a
hot_tub eb46
hotel e53a
hourglass_empty e88b
hourglass_full e88c
http e902
https e88d
image e3f4
image_aspect_ratio e3f5
import_contacts e0e0
import_export e0c3
important_devices e912
inbox e156
indeterminate_check_box e909
info e88e
info_outline e88f
input e890
insert_chart e24b
insert_comment e24c
insert_drive_file e24d
insert_emoticon e24e
insert_invitation e24f
insert_link e250
insert_photo e251
invert_colors e891
invert_colors_off e0c4
iso e3f6
keyboard e312
keyboard_arrow_down e313
keyboard_arrow_left e314
keyboard_arrow_right e315
keyboard_arrow_up e316
keyboard_backspace e317
keyboard_capslock e318
keyboard_hide e31a
keyboard_return e31b
keyboard_tab e31c
keyboard_voice e31d
kitchen eb47
label e892
label_outline e893
landscape e3f7
language e894
laptop e31e
laptop_chromebook e31f
laptop_mac e320
laptop_windows e321
last_page e5dd
launch e895
layers e53b
layers_clear e53c
leak_add e3f8
leak_remove e3f9
lens e3fa
library_add e02e
library_books e02f
library_music e030
lightbulb_outline e90f
line_style e919
line_weight e91a
linear_scale e260
link e157
linked_camera e438
list e896
live_help e0c6
live_tv e639
local_activity e53f
local_airport e53d
local_atm e53e
local_bar e540
local_cafe e541
local_car_wash e542
local_convenience_store e543
local_dining e556
local_drink e544
local_florist e545
local_gas_station e546
local_grocery_store e547
local_hospital e548
local_hotel e549
local_laundry_service e54a
local_library e54b
local_mall e54c
local_movies e54d
local_offer e54e
local_parking e54f
local_pharmacy e550
local_phone e551
local_pizza e552
local_play e553
local_post_office e554
local_printshop e555
local_see e557
local_shipping e558
local_taxi e559
location_city e7f1
location_disabled e1b6
location_off e0c7
location_on e0c8
location_searching e1b7
lock e897
lock_open e898
lock_outline e899
looks e3fc
looks_3 e3fb
looks_4 e3fd
looks_5 e3fe
looks_6 e3ff
looks_one e400
looks_two e401
loop e028
loupe e402
low_priority e16d
loyalty e89a
mail e158
mail_outline e0e1
map e55b
markunread e159
markunread_mailbox e89b
memory e322
menu e5d2
merge_type e252
message e0c9
mic e029
mic_none e02a
mic_off e02b
mms e618
mode_comment e253
mode_edit e254
monetization_on e263
money_off e25c
monochrome_photos e403
mood e7f2
mood_bad e7f3
more e619
more_horiz e5d3
more_vert e5d4
motorcycle e91b
mouse e323
move_to_inbox e168
movie e02c
movie_creation e404
movie_filter e43a
multiline_chart e6df
music_note e405
music_video e063
my_location e55c
nature e406
nature_people e407
navigate_before e408
navigate_next e409
navigation e55d
near_me e569
network_cell e1b9
network_check e640
network_locked e61a
network_wifi e1ba
new_releases e031
next_week e16a
nfc e1bb
no_encryption e641
no_sim e0cc
not_interested e033
note e06f
note_add e89c
notifications e7f4
notifications_active e7f7
notifications_none e7f5
notifications_off e7f6
notifications_paused e7f8
offline_pin e90a
ondemand_video e63a
opacity e91c
open_in_browser e89d
open_in_new e89e
open_with e89f
pages e7f9
pageview e8a0
palette e40a
pan_tool e925
panorama e40b
panorama_fish_eye e40c
panorama_horizontal e40d
panorama_vertical e40e
panorama_wide_angle e40f
party_mode e7fa
pause e034
pause_circle_filled e035
pause_circle_outline e036
payment e8a1
people e7fb
people_outline e7fc
perm_camera_mic e8a2
perm_contact_calendar e8a3
perm_data_setting e8a4
perm_device_information e8a5
perm_identity e8a6
perm_media e8a7
perm_phone_msg e8a8
perm_scan_wifi e8a9
person e7fd
person_add e7fe
person_outline e7ff
person_pin e55a
person_pin_circle e56a
personal_video e63b
pets e91d
phone e0cd
phone_android e324
phone_bluetooth_speaker e61b
phone_forwarded e61c
phone_in_talk e61d
phone_iphone e325
phone_locked e61e
phone_missed e61f
phone_paused e620
phonelink e326
phonelink_erase e0db
phonelink_lock e0dc
phonelink_off e327
phonelink_ring e0dd
phonelink_setup e0de
photo e410
photo_album e411
photo_camera e412
photo_filter e43b
photo_library e413
photo_size_select_actual e432
photo_size_select_large e433
photo_size_select_small e434
picture_as_pdf e415
picture_in_picture e8aa
picture_in_picture_alt e911
pie_chart e6c4
pie_chart_outlined e6c5
pin_drop e55e
place e55f
play_arrow e037
play_circle_filled e038
play_circle_outline e039
play_for_work e906
playlist_add e03b
playlist_add_check e065
playlist_play e05f
plus_one e800
poll e801
polymer e8ab
pool eb48
portable_wifi_off e0ce
portrait e416
power e63c
power_input e336
power_settings_new e8ac
pregnant_woman e91e
present_to_all e0df
print e8ad
priority_high e645
public e80b
publish e255
query_builder e8ae
question_answer e8af
queue e03c
queue_music e03d
queue_play_next e066
radio e03e
radio_button_checked e837
radio_button_unchecked e836
rate_review e560
receipt e8b0
recent_actors e03f
record_voice_over e91f
redeem e8b1
redo e15a
refresh e5d5
remove e15b
remove_circle e15c
remove_circle_outline e15d
remove_from_queue e067
remove_red_eye e417
remove_shopping_cart e928
reorder e8fe
repeat e040
repeat_one e041
replay e042
replay_10 e059
replay_30 e05a
replay_5 e05b
reply e15e
reply_all e15f
report e160
report_problem e8b2
restaurant e56c
restaurant_menu e561
restore e8b3
restore_page e929
ring_volume e0d1
room e8b4
room_service eb49
rotate_90_degrees_ccw e418
rotate_left e419
rotate_right e41a
rounded_corner e920
router e328
rowing e921
rss_feed e0e5
rv_hookup e642
satellite e562
save e161
scanner e329
schedule e8b5
school e80c
screen_lock_landscape e1be
screen_lock_portrait e1bf
screen_lock_rotation e1c0
screen_rotation e1c1
screen_share e0e2
sd_card e623
sd_storage e1c2
search e8b6
security e32a
select_all e162
send e163
sentiment_dissatisfied e811
sentiment_neutral e812
sentiment_satisfied e813
sentiment_very_dissatisfied e814
sentiment_very_satisfied e815
settings e8b8
settings_applications e8b9
settings_backup_restore e8ba
settings_bluetooth e8bb
settings_brightness e8bd
settings_cell e8bc
settings_ethernet e8be
settings_input_antenna e8bf
settings_input_component e8c0
settings_input_composite e8c1
settings_input_hdmi e8c2
settings_input_svideo e8c3
settings_overscan e8c4
settings_phone e8c5
settings_power e8c6
settings_remote e8c7
settings_system_daydream e1c3
settings_voice e8c8
share e80d
shop e8c9
shop_two e8ca
shopping_basket e8cb
shopping_cart e8cc
short_text e261
show_chart e6e1
shuffle e043
signal_cellular_4_bar e1c8
signal_cellular_connected_no_internet_4_bar e1cd
signal_cellular_no_sim e1ce
signal_cellular_null e1cf
signal_cellular_off e1d0
signal_wifi_4_bar e1d8
signal_wifi_4_bar_lock e1d9
signal_wifi_off e1da
sim_card e32b
sim_card_alert e624
skip_next e044
skip_previous e045
slideshow e41b
slow_motion_video e068
smartphone e32c
smoke_free eb4a
smoking_rooms eb4b
sms e625
sms_failed e626
snooze e046
sort e164
sort_by_alpha e053
spa eb4c
space_bar e256
speaker e32d
speaker_group e32e
speaker_notes e8cd
speaker_notes_off e92a
speaker_phone e0d2
spellcheck e8ce
star e838
star_border e83a
star_half e839
stars e8d0
stay_current_landscape e0d3
stay_current_portrait e0d4
stay_primary_landscape e0d5
stay_primary_portrait e0d6
stop e047
stop_screen_share e0e3
storage e1db
store e8d1
store_mall_directory e563
straighten e41c
streetview e56e
strikethrough_s e257
style e41d
subdirectory_arrow_left e5d9
subdirectory_arrow_right e5da
subject e8d2
subscriptions e064
subtitles e048
subway e56f
supervisor_account e8d3
surround_sound e049
swap_calls e0d7
swap_horiz e8d4
swap_vert e8d5
swap_vertical_circle e8d6
switch_camera e41e
switch_video e41f
sync e627
sync_disabled e628
sync_problem e629
system_update e62a
system_update_alt e8d7
tab e8d8
tab_unselected e8d9
tablet e32f
tablet_android e330
tablet_mac e331
tag_faces e420
tap_and_play e62b
terrain e564
text_fields e262
text_format e165
textsms e0d8
texture e421
theaters e8da
thumb_down e8db
thumb_up e8dc
thumbs_up_down e8dd
time_to_leave e62c
timelapse e422
timeline e922
timer e425
timer_10 e423
timer_3 e424
timer_off e426
title e264
toc e8de
today e8df
toll e8e0
tonality e427
touch_app e913
toys e332
track_changes e8e1
traffic e565
train e570
tram e571
transfer_within_a_station e572
transform e428
translate e8e2
trending_down e8e3
trending_flat e8e4
trending_up e8e5
tune e429
turned_in e8e6
turned_in_not e8e7
tv e333
unarchive e169
undo e166
unfold_less e5d6
unfold_more e5d7
update e923
usb e1e0
verified_user e8e8
vertical_align_bottom e258
vertical_align_center e259
vertical_align_top e25a
vibration e62d
video_call e070
video_label e071
video_library e04a
videocam e04b
videocam_off e04c
videogame_asset e338
view_agenda e8e9
view_array e8ea
view_carousel e8eb
view_column e8ec
view_comfy e42a
view_compact e42b
view_day e8ed
view_headline e8ee
view_list e8ef
view_module e8f0
view_quilt e8f1
view_stream e8f2
view_week e8f3
vignette e435
visibility e8f4
visibility_off e8f5
voice_chat e62e
voicemail e0d9
volume_down e04d
volume_mute e04e
volume_off e04f
volume_up e050
vpn_key e0da
vpn_lock e62f
wallpaper e1bc
warning e002
watch e334
watch_later e924
wb_auto e42c
wb_cloudy e42d
wb_incandescent e42e
wb_iridescent e436
wb_sunny e430
wc e63d
web e051
web_asset e069
weekend e16b
whatshot e80e
widgets e1bd
wifi e63e
wifi_lock e1e1
wifi_tethering e1e2
work e8f9
wrap_text e25b
youtube_searched_for e8fa
zoom_in e8ff
zoom_out e900
zoom_out_map e56b


@ -0,0 +1,36 @@
@font-face {
font-family: 'Material Icons';
font-style: normal;
font-weight: 400;
src: url(MaterialIcons-Regular.eot); /* For IE6-8 */
src: local('Material Icons'),
local('MaterialIcons-Regular'),
url(MaterialIcons-Regular.woff2) format('woff2'),
url(MaterialIcons-Regular.woff) format('woff'),
url(MaterialIcons-Regular.ttf) format('truetype');
}
.material-icons {
font-family: 'Material Icons';
font-weight: normal;
font-style: normal;
font-size: 24px; /* Preferred icon size */
display: inline-block;
line-height: 1;
text-transform: none;
letter-spacing: normal;
word-wrap: normal;
white-space: nowrap;
direction: ltr;
/* Support for all WebKit browsers. */
-webkit-font-smoothing: antialiased;
/* Support for Safari and Chrome. */
text-rendering: optimizeLegibility;
/* Support for Firefox. */
-moz-osx-font-smoothing: grayscale;
/* Support for IE. */
font-feature-settings: 'liga';
}


@ -0,0 +1,21 @@
The MIT License (MIT)
Copyright (c) 2014-2018 Materialize
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@ -0,0 +1,91 @@
<p align="center">
<a href="http://materializecss.com/">
<img src="http://materializecss.com/res/materialize.svg" width="150">
</a>
</p>
<h3 align="center">MaterializeCSS</h3>
<p align="center">
Materialize, a CSS Framework based on material design.
<br>
<a href="http://materializecss.com/"><strong>-- Browse the docs --</strong></a>
<br>
<br>
<a href="https://travis-ci.org/Dogfalo/materialize">
<img src="https://travis-ci.org/Dogfalo/materialize.svg?branch=master" alt="Travis CI badge">
</a>
<a href="https://badge.fury.io/js/materialize-css">
<img src="https://badge.fury.io/js/materialize-css.svg" alt="npm version badge">
</a>
<a href="https://cdnjs.com/libraries/materialize">
<img src="https://img.shields.io/cdnjs/v/materialize.svg" alt="CDNJS version badge">
</a>
<a href="https://david-dm.org/Dogfalo/materialize">
<img src="https://david-dm.org/Dogfalo/materialize/status.svg" alt="dependencies Status badge">
</a>
<a href="https://david-dm.org/Dogfalo/materialize#info=devDependencies">
<img src="https://david-dm.org/Dogfalo/materialize/dev-status.svg" alt="devDependency Status badge">
</a>
<a href="https://gitter.im/Dogfalo/materialize">
<img src="https://badges.gitter.im/Join%20Chat.svg" alt="Gitter badge">
</a>
</p>
## Table of Contents
- [Quickstart](#quickstart)
- [Documentation](#documentation)
- [Supported Browsers](#supported-browsers)
- [Changelog](#changelog)
- [Testing](#testing)
- [Contributing](#contributing)
- [Copyright and license](#copyright-and-license)
## Quickstart:
Read the [getting started guide](http://materializecss.com/getting-started.html) for more information on how to use materialize.
- [Download the latest release](https://github.com/Dogfalo/materialize/releases/latest) of materialize directly from GitHub. ([Beta](https://github.com/Dogfalo/materialize/releases/))
- Clone the repo: `git clone https://github.com/Dogfalo/materialize.git` (Beta: `git clone -b v1-dev https://github.com/Dogfalo/materialize.git`)
- Include the files via [cdnjs](https://cdnjs.com/libraries/materialize). More [here](http://materializecss.com/getting-started.html). ([Beta](https://cdnjs.com/libraries/materialize/1.0.0-beta))
- Install with [npm](https://www.npmjs.com): `npm install materialize-css` (Beta: `npm install materialize-css@next`)
- Install with [Bower](https://bower.io): `bower install materialize` ([DEPRECATED](https://bower.io/blog/2017/how-to-migrate-away-from-bower/))
- Install with [Atmosphere](https://atmospherejs.com): `meteor add materialize:materialize` (Beta: `meteor add materialize:materialize@=1.0.0-beta`)
## Documentation
The documentation can be found at <http://materializecss.com>. To run the documentation locally, you need [Node.js](https://nodejs.org/en/) installed.
### Running documentation locally
Run these commands to set up the documentation:
```bash
git clone https://github.com/Dogfalo/materialize
cd materialize
npm install
```
Then run `grunt monitor` to compile the documentation. When it finishes, open a new browser window and navigate to `localhost:8000`. We use [BrowserSync](https://www.browsersync.io/) to display the documentation.
### Documentation for previous releases
Previous releases and their documentation are available for [download](https://github.com/Dogfalo/materialize/releases).
## Supported Browsers
Materialize is compatible with:
- Chrome 35+
- Firefox 31+
- Safari 9+
- Opera
- Edge
- IE 11+
## Changelog
For changelogs, check out [the Releases section of materialize](https://github.com/Dogfalo/materialize/releases) or the [CHANGELOG.md](CHANGELOG.md).
## Testing
We use Jasmine as our testing framework and we're trying to write a robust test suite for our components. If you want to help, [here's a starting guide on how to write tests in Jasmine](CONTRIBUTING.md#jasmine-testing-guide).
## Contributing
Check out the [CONTRIBUTING document](CONTRIBUTING.md) in the root of the repository to learn how you can contribute. You can also browse the [help-wanted](https://github.com/Dogfalo/materialize/labels/help-wanted) tag in our issue tracker to find things to do.
## Copyright and license
Code Copyright 2018 Materialize. Code released under the MIT license.


View File

@ -0,0 +1,9 @@
{% extends "blog/base.html" %}
{% block content %}
<div class="container">
<div class="row">
<h1>About page!</h1>
</div>
</div>
{% endblock content %}

View File

@ -0,0 +1,90 @@
{% load static %}
<!DOCTYPE html>
<html lang="de">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<!--Let browser know website is optimized for mobile-->
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<!--Import materialize.css-->
<link type="text/css" rel="stylesheet" href="{% static "blog/materialize/css/materialize.css" %}" media="screen,projection" />
<!-- import materialize icons css -->
<link type="text/css" rel="stylesheet" href="{% static "blog/material-design-icons/iconfont/material-icons.css"%}">
<!-- Import own css -->
<link type="text/css" rel="stylesheet" href="{% static "blog/main.css" %}" media="screen,projection" />
{% if title %}
<title>Bundesdata {{ title }}</title>
{% else %}
<title>Bundesdata</title>
{% endif %}
</head>
<body class="blue-grey lighten-5">
<!-- Dropdown Structure -->
<ul id="dropdown_protocols" class="dropdown-content">
<li><a href="{% url "Protokoll-list" %}"><i class="material-icons left">insert_drive_file</i>Protokolle</a></li>
<li><a href="{% url "Reden" %}"><i class="material-icons left">insert_comment</i>Reden</a></li>
</ul>
<nav class="light-green darken-1 nav-extended">
<div class="container">
<div class="nav-wrapper">
<a href="/" class="brand-logo">Bundesdata</a>
<a href="#" data-target="mobile-demo" class="sidenav-trigger"><i class="material-icons">menu</i></a>
<ul class="right hide-on-med-and-down">
<li><a href="{% url "MdBs" %}"><i class="material-icons left">account_circle</i>MdBs</a></li>
<!-- Dropdown Trigger -->
<li><a class="dropdown-trigger" href="#!" data-target="dropdown_protocols"><i class="material-icons left">insert_drive_file</i>Dokumente<i class="material-icons right">arrow_drop_down</i></a></li>
<li><a href="{% url "ngram-viewer-jahr" %}"><i class="material-icons left">trending_up</i>Ngram Viewer</a></li>
<li><a href="{% url "about-page" %}"><i class="material-icons left">info_outline</i>Info</a></li>
</ul>
</div>
{% block nav-tabs %}
{% endblock nav-tabs %}
</div>
</nav>
<ul id="dropdown_protocols_side" class="dropdown-content">
<li><a href="{% url "Protokoll-list" %}"><i class="material-icons left">insert_drive_file</i>Protokolle</a></li>
<li><a href="{% url "Reden" %}"><i class="material-icons left">insert_comment</i>Reden</a></li>
</ul>
<ul class="sidenav" id="mobile-demo">
<li><a href="/" class="brand-logo">Bundesdata</a></li>
<li><a href="{% url "MdBs" %}"><i class="material-icons">account_circle</i>MdBs</a></li>
<li><a class="dropdown-trigger" href="#!" data-target="dropdown_protocols_side"><i class="material-icons left">insert_drive_file</i>Dokumente<i class="material-icons right">arrow_drop_down</i></a></li>
<li><a href="{% url "ngram-viewer-jahr" %}"><i class="material-icons">trending_up</i>Ngram Viewer</a></li>
<li><a href="{% url "about-page" %}"><i class="material-icons">info_outline</i>Info</a></li>
</ul>
<main>
{% block content %}
{% endblock content %}
</main>
<footer class="page-footer light-green darken-1">
<div class="container">
<div class="row">
<div class="col s12">
<h5 class="white-text">Bundesdata</h5>
<p class="grey-text text-lighten-4">Ein Projekt für Politikinteressierte.</p>
</div>
</div>
</div>
<div class="footer-copyright">
<div class="container">
© 2019 Copyright <a class="grey-text text-lighten-4" href="#">Stephan Porada</a>
<a class="grey-text text-lighten-4 right" href="{% url "impressum" %}">Impressum</a>
</div>
</div>
</footer>
<!-- Optional JavaScript -->
<script src="{% static "blog/materialize/js/materialize.min.js" %}"></script>
<script>
M.AutoInit();
M.Dropdown.init(
document.querySelectorAll('.dropdown-trigger'),
{"coverTrigger": false}
)
</script>
<script src="{% static "blog/chartjs/Chart.bundle.js"%}"></script>
</body>
</html>

View File

@ -0,0 +1,9 @@
{% extends "blog/base.html" %}
{% block content %}
<div class="container">
<div class="row">
<h1>Blog page!</h1>
</div>
</div>
{% endblock content %}

208
app/blog/templates/blog/home.html Executable file
View File

@ -0,0 +1,208 @@
{% extends "blog/base.html" %}
{% load static %}
{% block content %}
<div class="parallax-container">
<div class="parallax"><img src="{% static "/blog/images/4116197.jpg" %}"></div>
<div class="section white hide-on-large-only">
<div class="row container">
<ul class="collection">
<li class="collection-item avatar">
<a href="{% url "MdBs" %}"><i class="material-icons circle blue-grey">account_circle</i>
<span class="title">Abgeordnete (MdBs)</span></a>
<p class="grey-text text-darken-3 lighten-3">Profile aller Mitglieder des Deutschen Bundestags von 1949 bis heute.</p>
<p class="grey-text text-darken-3 lighten-3">Durchsuch- und sortierbare Liste.</p>
</li>
<li class="collection-item avatar">
<a href="{% url "Protokoll-list" %}"><i class="material-icons circle blue-grey">insert_drive_file</i>
<span class="title">Protokolle</span></a>
<p>Liste der Plenarprotokolle von 1949 bis einschließlich 2017.</p>
</li>
<li class="collection-item avatar">
<a href="{% url "Reden" %}"><i class="material-icons circle blue-grey">insert_comment</i>
<span class="title">Reden</span></a>
<p>Übersicht über alle einzelnen Reden und Redebeiträge der Mitglieder des Bundestags von 1949 bis 2017.</p>
</li>
<li class="collection-item avatar">
<a href="{% url "ngram-viewer-jahr" %}"><i class="material-icons circle blue-grey">trending_up</i>
<span class="title">Ngram Viewer</span></a>
<p>Mit dem Ngram Viewer können Begriffshäufigkeiten dargestellt werden.</p>
</li>
</ul>
</div>
</div>
<div class="container hide-on-med-and-down">
<div class="row" style="margin-top: 25px;">
<div class="col s12 m3 l3">
<div class="card medium hoverable">
<div class="card-content">
<p class="center-align"><i class="large material-icons blue-grey-text darken-4">account_circle</i></p>
<span class="card-title">Abgeordnete (MdBs)</span>
<p class="grey-text text-darken-3 lighten-3">Profile aller Mitglieder des Deutschen Bundestags von 1949 bis heute.</p>
</div>
<div class="card-action">
<a href="{% url "MdBs" %}"><span class="light-green-text text-darken-4"><i class="material-icons right">send</i>Zu den MdBs</span></a>
</div>
</div>
</div>
<div class="col s12 m3 l3">
<div class="card medium hoverable">
<div class="card-content">
<p class="center-align"><i class="large material-icons blue-grey-text darken-4">insert_drive_file</i></p>
<span class="card-title">Protokolle</span>
<p class="grey-text text-darken-3 lighten-3">Liste der Plenarprotokolle von 1949 bis einschließlich 2017.</p>
</div>
<div class="card-action">
<a href="{% url "Protokoll-list" %}"><span class="light-green-text text-darken-4"><i class="material-icons right">send</i>Zu den Protokollen</span></a>
</div>
</div>
</div>
<div class="col s12 m3 l3">
<div class="card medium hoverable">
<div class="card-content">
<p class="center-align"><i class="large material-icons blue-grey-text darken-4">insert_comment</i></p>
<span class="card-title">Reden</span>
<p class="grey-text text-darken-3 lighten-3">Übersicht über alle einzelnen Reden und Redebeiträge der Mitglieder des Bundestags von 1949 bis 2017.</p>
</div>
<div class="card-action">
<a href="{% url "Reden" %}"><span class="light-green-text text-darken-4"><i class="material-icons right">send</i>Zu den Reden</span></a>
</div>
</div>
</div>
<div class="col s12 m3 l3">
<div class="card medium hoverable">
<div class="card-content">
<p class="center-align"><i class="large material-icons blue-grey-text darken-4">trending_up</i></p>
<span class="card-title">Ngram Viewer</span>
<p class="grey-text text-darken-3 lighten-3">Stellt Begriffshäufigkeiten der Reden über den Zeitraum von 1949 bis 2017 dar.</p>
</div>
<div class="card-action">
<a href="{% url "ngram-viewer-jahr" %}"><span class="light-green-text text-darken-4"><i class="material-icons right">send</i>Zum Ngram Viewer</span></a>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="parallax-container">
<div class="section white">
<div class="row">
<div class="container grey-text text-darken-3 lighten-3">
<h4 class="header black-text">Das Projekt</h4>
<p>Das Projekt Bundesdata
möchte die Bundestagsplenarprotokolle für alle
Bürger und Bürgerinnen in einer strukturierten und einfachen Form
zugänglich und analysierbar machen sowie interaktive Statistiken zu diesen liefern.</p>
<p>Für dieses Unterfangen
wurden die von der Bundesregierung bereitgestellten XML-Versionen der
Bundestagsplenarprotokolle automatisch mit zusätzlichen Informationen
versehen und strukturiert. Mit den verschiedenen Tools auf Bundesdata
können die Protokolle von 1949 bis 2017 einfach durchsucht und
deren Inhalte analysiert werden.</p>
</div>
</div>
</div>
<div class="parallax"><img src="{% static "/blog/images/4116197.jpg" %}"></div>
</div>
<div class="parallax-container">
<div class="section white">
<div class="row">
<div class="container grey-text text-darken-3 lighten-3">
<h4 class="header black-text">Die Tools</h4>
<p>Politikinteressierte können z.B.
recherchieren, welches <a href="{% url "MdBs" %}">Mitglied des Bundestags</a> zu welcher Zeit der
deutschen Geschichte welche <a href="{% url "Reden"%}">Reden oder Redebeiträge</a> im Bundestag
gehalten hat. Mit Hilfe des <a href="{% url "ngram-viewer-jahr"%}">Ngram Viewers</a> können Begriffe im Verlauf
der Zeit dargestellt werden, um geschichtliche und politische Ereignisse,
die sich in der Sprache der Reden widerspiegeln, sichtbar und analysierbar
zu machen.</p>
<p>Das Projekt ist im Rahmen einer Masterarbeit entstanden.
Mehr Informationen zum Projekt, der Arbeit, der Datengrundlage sowie der
automatischen Auszeichnung mit zusätzlichen Informationen und
Metadaten gibt es auf der <a href="{% url "about-page" %}">Info-Seite</a>.</p>
</div>
</div>
</div>
<div class="parallax"><img src="{% static "/blog/images/4116197.jpg" %}"></div>
</div>
<div class="parallax-container">
<div class="section white">
<div class="row ">
<div class="container grey-text text-darken-3 lighten-3">
<h4 class="header black-text">Datengrundlage</h4>
<p>Die Ausgangsdaten, welche für das Projekt genutzt wurden, sind für
alle Bürger und Bürgerinnen auf der
<a href="https://www.bundestag.de/service/opendata">Webseite des Bundestags</a>
frei zugänglich.</p>
<p>Im Rahmen einer Open
Data-Initiative stellt der deutsche Bundestag alle Plenarprotokolle
sowie die biografischen Daten aller Abgeordneten seit 1949 als
XML-Dateien zur Verfügung.</p>
<p>
Das Projekt Bundesdata umfasst alle XML-Protokolle der Wahlperioden 1.
bis 18. und deckt somit den Zeitraum von 1949 bis 2017 ab.</p>
</div>
</div>
</div>
<div class="parallax"><img src="{% static "/blog/images/4094966.jpg" %}"></div>
</div>
<div class="parallax-container">
<div class="section white">
<div class="row ">
<div class="container grey-text text-darken-3 lighten-3">
<h4 class="header black-text">Automatische Auszeichnung</h4>
<p>Da die von der Bundesregierung bereitgestellten XML-Protokolle nur wenige
bis keine maschinenlesbaren Informationen dazu enthalten, welcher
Abgeordnete oder welche Abgeordnete zu welchem Zeitpunkt einen Redebeitrag
im Bundestag hatte, sind die Ausgangsdaten im Rahmen des Projekts
automatisch mit weiteren Informationen angereichert und strukturiert
worden. Hierfür wurde eine eigene Software entwickelt, die die öffentlich verfügbaren XML-Protokolle automatisch mit zusätzlichen Metadaten auszeichnet. Diese Auszeichnung ermöglicht es, die Protokolle auf der Website strukturiert darzustellen und durchsuchbar zu machen. Erst dadurch können auch die N-Gramme für den Ngram Viewer berechnet werden.</p>
</div>
</div>
</div>
<div class="parallax"><img src="{% static "/blog/images/4094966.jpg" %}"></div>
</div>
<div class="parallax-container">
<div class="section white">
<div class="row ">
<div class="container grey-text text-darken-3 lighten-3">
<h4 class="header black-text">Quellcode für Software und Webanwendung</h4>
<p>
Der Quellcode für die eigens entwickelte Software, welche die automatische
Auszeichnung erstellt hat, kann auf <a href="https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_software">GitLab</a> eingesehen und
heruntergeladen werden. Der Quellcode für die Webseite ist ebenfalls
auf <a href="https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_web_app">GitLab</a> verfügbar.</p>
</div>
</div>
</div>
<div class="parallax"><img src="{% static "/blog/images/4094966.jpg" %}"></div>
</div>
<div class="parallax-container">
<div class="section white">
<div class="row ">
<div class="container grey-text text-darken-3 lighten-3">
<h4 class="header black-text">Download der ausgezeichneten Daten</h4>
<p>
Die für das Projekt mittels der eigenen Software erstellten XML-Protokolle sowie weitere Forschungsdaten können <a href="https://gitlab.ub.uni-bielefeld.de/sporada/bundesdata_markup_nlp_data"> hier heruntergeladen werden</a>.</p>
</div>
</div>
</div>
<div class="parallax"><img src="{% static "/blog/images/4094966.jpg" %}"></div>
</div>
<div class="parallax-container">
<div class="section white">
<div class="row ">
<div class="container grey-text text-darken-3 lighten-3">
<h4 class="header black-text">Fehlerquoten und Probleme der Ausgangsdaten</h4>
<p>Die automatische Auszeichnung der Protokolle ist nicht gänzlich fehlerfrei.
Somit können Fehler bei der Darstellung der Reden auf der Website auftreten.
Wie hoch genau die einzelnen Fehlerquoten sind, sowie weitere Informationen zum
Projekt, der Arbeit, der Datengrundlage und der automatischen Auszeichnung mit
zusätzlichen Informationen und Metadaten gibt es auf der
<a href="{% url "about-page" %}">Info-Seite</a>.</p>
</div>
</div>
</div>
<div class="parallax"><img src="{% static "/blog/images/4094966.jpg" %}"></div>
</div>
{% endblock content %}

View File

@ -0,0 +1,62 @@
{% extends "blog/base.html" %}
{% block content %}
<div class="container">
<div class="row">
<div class="col s12">
<div class="card">
<div class="card-content">
<span class="card-title">Impressum</span>
<b>Angaben gemäß § 5 TMG</b>
<br />
<p>Stephan Porada<br />
Bremer Straße 43<br />
33613 Bielefeld</p>
<br />
<b>Kontakt</b>
<br />
<p>E-Mail: sporada@uni-bielefeld.de</p>
<br />
<b>Verantwortlich für den Inhalt nach § 55 Abs. 2 RStV</b>
<br />
<p>Stephan Porada<br />
Bremer Straße 43<br />
33613 Bielefeld</p>
<br />
<b>Haftung für Inhalte</b>
<br />
<p>Als Diensteanbieter sind wir gemäß § 7 Abs.1 TMG für eigene Inhalte auf diesen Seiten nach den
allgemeinen Gesetzen verantwortlich. Nach §§ 8 bis 10 TMG sind wir als Diensteanbieter jedoch nicht
verpflichtet, übermittelte oder gespeicherte fremde Informationen zu überwachen oder nach Umständen zu
forschen, die auf eine rechtswidrige Tätigkeit hinweisen.
Verpflichtungen zur Entfernung oder Sperrung der Nutzung von Informationen nach den allgemeinen
Gesetzen bleiben hiervon unberührt. Eine diesbezügliche Haftung ist jedoch erst ab dem Zeitpunkt der
Kenntnis einer konkreten Rechtsverletzung möglich. Bei Bekanntwerden von entsprechenden
Rechtsverletzungen werden wir diese Inhalte umgehend entfernen.</p>
<br />
<b>Haftung für Links</b>
<br />
<p>Unser Angebot enthält Links zu externen Websites Dritter, auf deren Inhalte wir keinen Einfluss haben.
Deshalb können wir für diese fremden Inhalte auch keine Gewähr übernehmen. Für die Inhalte der
verlinkten Seiten ist stets der jeweilige Anbieter oder Betreiber der Seiten verantwortlich. Die verlinkten
Seiten wurden zum Zeitpunkt der Verlinkung auf mögliche Rechtsverstöße überprüft. Rechtswidrige Inhalte
waren zum Zeitpunkt der Verlinkung nicht erkennbar.
Eine permanente inhaltliche Kontrolle der verlinkten Seiten ist jedoch ohne konkrete Anhaltspunkte einer
Rechtsverletzung nicht zumutbar. Bei Bekanntwerden von Rechtsverletzungen werden wir derartige Links
umgehend entfernen.</p>
<br />
<p>Quelle:
eRecht24</p>
<br />
<b>Quelle der Bilder:</b>
<p>
Bild 4094966.jpg auf der Homepage: (c) Deutscher Bundestag / Marc-Steffen Unger <br />
Bild 4116197.jpg auf der Homepage: (c) Deutscher Bundestag / Thomas Köhler/photothek.net <br />
</p>
</div>
</div>
</div>
</div>
</div>
{% endblock content %}

3
app/blog/tests.py Executable file
View File

@ -0,0 +1,3 @@
from django.test import TestCase
# Create your tests here.

9
app/blog/urls.py Executable file
View File

@ -0,0 +1,9 @@
from django.urls import path
from . import views
urlpatterns = [
    path("blog/", views.blog, name="blog"),
    path("about/", views.about, name="about-page"),
    path("", views.home, name="home"),
    path("impressum/", views.impressum, name="impressum"),
]

17
app/blog/views.py Executable file
View File

@ -0,0 +1,17 @@
from django.shortcuts import render


def home(request):
    return render(request, "blog/home.html", {"title": "Homepage"})


def blog(request):
    return render(request, "blog/blog.html")


def about(request):
    return render(request, "blog/about.html", {"title": "About"})


def impressum(request):
    return render(request, "blog/impressum.html", {"title": "Impressum"})

0
app/bundesdata_app/__init__.py Executable file
View File

163
app/bundesdata_app/settings.py Executable file
View File

@ -0,0 +1,163 @@
"""
Django settings for bundesdata_app project.
Generated by 'django-admin startproject' using Django 2.1.4.
For more information on this file, see
https://docs.djangoproject.com/en/2.1/topics/settings/
For the full list of settings and their values, see
https://docs.djangoproject.com/en/2.1/ref/settings/
"""
import os
# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/2.1/howto/deployment/checklist/
# SECURITY WARNING: keep the secret key used in production secret!
# This is just a randomly generated key for testing the app. If you want to
# run your own public version of this app, replace it with a new key and keep
# that key secret!
SECRET_KEY = '=7n(1!he%todz-)jo))$upf0(vor9v9ke5rn&fli%6l562!_=0'
# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True
ALLOWED_HOSTS = ["127.0.0.1", "localhost"]
# Application definition
INSTALLED_APPS = [
    'jchart',
    'watson',
    'django_tables2',
    'blog.apps.BlogConfig',
    'ngram_viewer.apps.NgramViewerConfig',
    'speakers.apps.SpeakersConfig',
    'speeches.apps.SpeechesConfig',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
    'watson.middleware.SearchContextMiddleware'
]

ROOT_URLCONF = 'bundesdata_app.urls'

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
WSGI_APPLICATION = 'bundesdata_app.wsgi.application'
# Database
# https://docs.djangoproject.com/en/2.1/ref/settings/#databases
# Change the NAME, USER and PASSWORD values before deploying your own public
# version of this app.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'databaseName',
        'USER': 'databaseUserName',
        'PASSWORD': 'totalSecurePassword',
        'HOST': 'db',
        'PORT': '5432',
    }
}
# Password validation
# https://docs.djangoproject.com/en/2.1/ref/settings/#auth-password-validators
AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]
# Internationalization
# https://docs.djangoproject.com/en/2.1/topics/i18n/
LANGUAGE_CODE = 'de-de'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/2.1/howto/static-files/
STATIC_URL = '/staticfiles/'
STATIC_ROOT = os.path.join(BASE_DIR, 'staticfiles')
# WATSON
WATSON_POSTGRES_SEARCH_CONFIG = "pg_catalog.german"
# LOGGING: uncomment for logging
# LOGGING = {
#     'version': 1,
#     'disable_existing_loggers': False,
#     'handlers': {
#         'file': {
#             'level': 'DEBUG',
#             'class': 'logging.FileHandler',
#             'filename': '/usr/src/app/input_data/debug.log',
#         },
#     },
#     'loggers': {
#         'django': {  # Logger for Django framework code
#             'handlers': ['file'],
#             'level': 'DEBUG',
#             'propagate': True,
#         },
#         'ngram_viewer': {  # Specific logger for your app
#             'handlers': ['file'],
#             'level': 'DEBUG',
#             'propagate': True,
#         },
#     },
# }
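
The comments in this settings file ask anyone deploying a public instance to replace the secret key and the database credentials. One common way to do that without editing the file is to read the values from environment variables. The following is only a minimal sketch under that assumption and is not part of this commit: the helper names `env_setting` and `database_settings` and all `BUNDESDATA_*` variable names are hypothetical.

```python
import os


def env_setting(name, default=None, environ=os.environ):
    """Return the environment variable `name`, or `default` if it is unset.

    Raises KeyError when the variable is unset and no default is given, so a
    missing secret fails loudly instead of falling back to a placeholder.
    """
    value = environ.get(name, default)
    if value is None:
        raise KeyError("Missing required setting: " + name)
    return value


def database_settings(environ=os.environ):
    """Build a DATABASES['default']-style dict from hypothetical env vars."""
    return {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': env_setting('BUNDESDATA_DB_NAME', 'databaseName', environ),
        'USER': env_setting('BUNDESDATA_DB_USER', 'databaseUserName', environ),
        # No default on purpose: the password must always be provided.
        'PASSWORD': env_setting('BUNDESDATA_DB_PASSWORD', environ=environ),
        'HOST': env_setting('BUNDESDATA_DB_HOST', 'db', environ),
        'PORT': env_setting('BUNDESDATA_DB_PORT', '5432', environ),
    }
```

In a settings module this could be used as `DATABASES = {'default': database_settings()}`, with `SECRET_KEY` read through `env_setting` in the same way.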

25
app/bundesdata_app/urls.py Executable file
View File

@ -0,0 +1,25 @@
"""bundesdata_app URL Configuration
The `urlpatterns` list routes URLs to views. For more information please see:
https://docs.djangoproject.com/en/2.1/topics/http/urls/
Examples:
Function views
1. Add an import: from my_app import views
2. Add a URL to urlpatterns: path('', views.home, name='home')
Class-based views
1. Add an import: from other_app.views import Home
2. Add a URL to urlpatterns: path('', Home.as_view(), name='home')
Including another URLconf
1. Import the include() function: from django.urls import include, path
2. Add a URL to urlpatterns: path('blog/', include('blog.urls'))
"""
from django.contrib import admin
from django.urls import path, include
urlpatterns = [
    path('admin/', admin.site.urls),
    path('ngram-viewer/', include("ngram_viewer.urls")),
    path('mdbs/', include("speakers.urls")),
    path('protokolle/', include("speeches.urls")),
    path('', include("blog.urls")),
]

16
app/bundesdata_app/wsgi.py Executable file
View File

@ -0,0 +1,16 @@
"""
WSGI config for bundesdata_app project.
It exposes the WSGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/2.1/howto/deployment/wsgi/
"""
import os
from django.core.wsgi import get_wsgi_application
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'bundesdata_app.settings')
application = get_wsgi_application()

15
app/manage.py Executable file
View File

@ -0,0 +1,15 @@
#!/usr/bin/env python
import os
import sys
if __name__ == '__main__':
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'bundesdata_app.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)

0
app/ngram_viewer/__init__.py Executable file
View File

3
app/ngram_viewer/admin.py Executable file
View File

@ -0,0 +1,3 @@
from django.contrib import admin
# Register your models here.

6
app/ngram_viewer/apps.py Executable file
View File

@ -0,0 +1,6 @@
from django.apps import AppConfig
from watson import search as watson
class NgramViewerConfig(AppConfig):
    name = 'ngram_viewer'

92
app/ngram_viewer/charts.py Executable file
View File

@ -0,0 +1,92 @@
from jchart import Chart
from jchart.config import Axes, DataSet, rgba, Tick
from random import randint


class TimeChart(Chart):
    """
    Class to configure the Ngram Viewer line chart over time. The method
    get_datasets() receives the data sets and creates one DataSet object for
    each label/data pair.
    """

    chart_type = "line"
    responsive = True
    scales = {
        'xAxes': [Axes(type='time', position="bottom")],
    }

    def __init__(self):
        super(TimeChart, self).__init__()
        self.data_sets = None

    def get_datasets(self, **kwargs):
        # The value of the last keyword argument is used as the list of data
        # sets.
        for key, value in kwargs.items():
            self.data_sets = value
        lable_names = []
        data_sets = []
        for data_set_dict in self.data_sets:
            for key, value in data_set_dict.items():
                lable_names.append(key)
                data_sets.append(value)
        data_set_objects = []
        # Every data set gets a random line color.
        for lable_name, data_set in zip(lable_names, data_sets):
            data_set_objects.append(DataSet(type="line",
                                            label=lable_name,
                                            borderColor=rgba(randint(0, 255), randint(0, 255), randint(0, 255)),
                                            data=data_set,
                                            lineTension=0))
        return data_set_objects


class BarChart(Chart):
    """
    Class to configure the Ngram Viewer bar chart per speaker.
    The method create_data() prepares the labels and bar values, and
    get_datasets() creates one DataSet object for each bar name.
    """

    chart_type = "horizontalBar"
    responsive = True

    def __init__(self, speaker_range=10):
        super(BarChart, self).__init__()
        self.data_sets = None
        self.speaker_range = int(speaker_range)
        self.lable_names = []
        self.bar_data = []
        self.bar_names = []

    def get_labels(self):
        try:
            # Flatten the list of label lists and keep the first
            # speaker_range entries.
            tmp_list = self.lable_names
            self.lable_names = sum(tmp_list, [])[:self.speaker_range]
        except TypeError:
            # lable_names is already a flat list of labels.
            pass
        return self.lable_names

    def create_data(self, **kwargs):
        # The value of the last keyword argument is used as the list of data
        # sets.
        for key, value in kwargs.items():
            self.data_sets = value
        for d in self.data_sets:
            entry_lable_names = []
            entry_bar_data = []
            entry_bar_name = []
            for key, value in d.items():
                for point in value:
                    entry_lable_names.append(point["x"])
                    entry_bar_data.append(point["y"])
                self.lable_names.append(entry_lable_names)
                entry_bar_name.append(key)
                self.bar_names.extend(entry_bar_name)
                self.bar_data.append(entry_bar_data[:self.speaker_range])

    def get_datasets(self):
        data_set_objects = []
        # Every bar data set gets a random background color.
        for bar_data, bar_name in zip(self.bar_data, self.bar_names):
            data_set_objects.append(DataSet(type="horizontalBar",
                                            label=bar_name,
                                            backgroundColor=rgba(randint(0, 255), randint(0, 255), randint(0, 255)),
                                            data=bar_data[:self.speaker_range]))
        return data_set_objects
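
To make the data flow of `BarChart.create_data()` and `get_labels()` easier to follow, here is a standalone sketch without the `jchart` dependency. It assumes, based only on the `["x"]` and `["y"]` lookups above, that each input entry is a dict mapping one n-gram to a list of `{"x": speaker, "y": count}` points. The function name `split_bar_data` is hypothetical and does not exist in the project.

```python
def split_bar_data(data_sets, speaker_range=10):
    """Split [{ngram: [{"x": speaker, "y": count}, ...]}, ...] into chart parts.

    Returns a tuple (bar_names, labels, bar_data):
    - bar_names: one name per n-gram, used as the data set label
    - labels: speaker labels, flattened and cut to speaker_range entries
    - bar_data: one list of counts per n-gram, cut to speaker_range entries
    """
    bar_names, label_lists, bar_data = [], [], []
    for entry in data_sets:
        for ngram, points in entry.items():
            top = points[:speaker_range]
            bar_names.append(ngram)
            label_lists.append([point["x"] for point in top])
            bar_data.append([point["y"] for point in top])
    # Flatten the per-n-gram label lists, as get_labels() does with sum().
    labels = [label for sub in label_lists for label in sub][:speaker_range]
    return bar_names, labels, bar_data
```

A call such as `split_bar_data(data, speaker_range=2)` keeps only the first two speakers per n-gram. The sketch assumes the points are already ordered by count, since it does no sorting of its own.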

39
app/ngram_viewer/forms.py Executable file
View File

@ -0,0 +1,39 @@
from django import forms


class NgramForm(forms.Form):
    """
    Describes and configures the input html form for the Ngram Viewer per year.
    """
    CORPUS_CHOICE = [('lm_ns_year', 'Lemmatisiert ohne Stoppwörter'),
                     ('tk_ws_year', 'Nicht lemmatisiert mit Stoppwörtern'), ]
    query = forms.CharField(label="Suche Ngramme", max_length=200)
    case_sensitive = forms.BooleanField(label="case-sensitive", required=False)
    search_plus = forms.BooleanField(label="search-plus", required=False)
    ignore_missing = forms.BooleanField(label="fill-zeros", required=False)
    corpus_choice = forms.ChoiceField(label="Wählen Sie einen Corpus", choices=CORPUS_CHOICE)


class NgramFormSpeaker(forms.Form):
    """
    Describes and configures the input html form for the Ngram Viewer per speaker.
    """
    CORPUS_CHOICE = [('lm_ns_speaker', 'Lemmatisiert ohne Stoppwörter'),
                     ('tk_ws_speaker', 'Nicht lemmatisiert mit Stoppwörtern'), ]
    query = forms.CharField(label="Suche Ngramm", max_length=200)
    case_sensitive = forms.BooleanField(label="case-sensitive", required=False)
    search_plus = forms.BooleanField(label="search-plus", required=False)
    ignore_missing = forms.BooleanField(label="fill-zeros", required=False)
    range = forms.IntegerField(label="Anzahl an Rednern")
    corpus_choice = forms.ChoiceField(label="Wählen Sie einen Corpus", choices=CORPUS_CHOICE)

    def clean_query(self):
        # The per-speaker viewer handles exactly one n-gram per request, so a
        # comma-separated list of n-grams is rejected.
        data = self.cleaned_data["query"]
        if len(data.split(",")) > 1:
            raise forms.ValidationError(
                "Es kann nur ein Ngramm gleichzeitig abgefragt werden.")
        return data
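
The rule enforced by `clean_query()` can be shown without Django. The sketch below is a hypothetical standalone version: the function name `single_ngram` is made up for illustration, and `ValueError` stands in for `forms.ValidationError`.

```python
def single_ngram(query):
    """Return `query` unchanged if it holds exactly one n-gram.

    N-grams are separated by commas, so any comma in the query means that
    more than one n-gram was entered and the query is rejected.
    """
    if len(query.split(",")) > 1:
        raise ValueError("Only one n-gram can be queried at a time.")
    return query
```

A query with spaces but no comma, such as a 2-gram, passes this check, because only the comma is treated as a separator between n-grams.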

View File

@ -0,0 +1,107 @@
from django.core.management.base import BaseCommand
from ngram_viewer.models import *
from itertools import islice
from datetime import datetime
from tqdm import tqdm
import csv
import fnmatch
import os
class Command(BaseCommand):
help = ("Adds n-grams to the database using the django models"
" syntax. N-grams will be added from csv files with three columns."
" First column is the n-gram string, second column is the key "
" (e.g. year or speaker) and the third column is the counter."
" Input is a path pointing to one n-gram file. The user must specify"
" if the csv is containing 1-grams, 2-grams ... 5-grams with the"
" parameter 'n_grams'.")
def add_arguments(self, parser):
parser.add_argument("n_grams",
type=int,
choices=[1, 2, 3, 4, 5],
help="Tells the script to either import given input\
csv as 1-grams 2-grams etc.")
parser.add_argument("input_folder",
type=str,
help="File path to the csv containing one kind of \
ngrams.")
parser.add_argument("corpus_type",
choices=["lm_ns_year", "tk_ws_year", "lm_ns_speaker",
"tk_ws_speaker"],
help="user has to choose what kind of ngrams will \
be imported. lm_ns: Lemmatized without stopwords or\
tk_ws not lemmatized with stopwords.",
type=str)
parser.add_argument(
"--batch_size",
"-bs",
type=int,
default=1000000,
required=False,
help="Int to set how many rows(entries) should be \
inserted via bulk at once. Default is 1 million.")
def handle(self, *args, **options):
start_time = datetime.now()
self.stdout.write("Start time of script is: " + str(start_time))
folder_path = options["input_folder"]
n_grams = options["n_grams"]
corpus_type = options["corpus_type"]
batch_size = options["batch_size"]
list_of_files = []
for path, subdirs, files in os.walk(folder_path):
for name in files:
if fnmatch.fnmatch(name, "*.csv"):
list_of_files.append(os.path.join(path, name))
list_of_files = sorted(list_of_files)
for file in tqdm(list_of_files, desc="File status"):
with open(file, newline="") as csvfile:
n_gram_reader = csv.reader(csvfile, delimiter="\t")
row_count = sum(1 for row in n_gram_reader) # closes csvfile
iterations = int(row_count/batch_size) + 1
self.stdout.write("Number of rows in csv is: " + str(row_count))
self.stdout.write("Batch size is " + str(batch_size))
self.stdout.write((str(iterations)
+ " iterations are needed to import the"
" data into the database."))
with open(file, newline="") as csvfile: # reopens csvfile
sort_key = os.path.basename(file)[0:1]
if(sort_key == "_"):
sort_key = "_Non_ASCII"
n_gram_reader = csv.reader(csvfile, delimiter="\t")
if(n_grams == 1):
main_class = "One"
elif(n_grams == 2):
main_class = "Two"
elif(n_grams == 3):
main_class = "Three"
elif(n_grams == 4):
main_class = "Four"
elif(n_grams == 5):
main_class = "Five"
model = "Key{}_{}Gram_{}".format(sort_key, main_class, corpus_type)
print(model)
while True:
batch = [globals()[model](ngram=row[0],
key=row[1],
count=row[2])
for row in tqdm(islice(n_gram_reader, batch_size),
desc="Creating batch from row")]
if not batch:
break
self.stdout.write("Starting bulk insert.")
globals()[model].objects.bulk_create(batch, batch_size)
self.stdout.write("---------------------------------------")
end_time = datetime.now()
self.stdout.write("End time of script is: " + str(end_time))
duration = end_time - start_time
self.stdout.write("Duration of script is: " + str(duration))
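
The batching used by the import command above can be sketched in isolation. This is a minimal sketch, not the command itself: `iter_batches` and `batches_needed` are hypothetical helper names, and plain Python values stand in for the model instances that the command passes to `bulk_create`.

```python
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items until rows is exhausted."""
    iterator = iter(rows)
    while True:
        # islice pulls the next batch_size items without loading the rest
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        yield batch


def batches_needed(row_count, batch_size):
    """Ceiling division: number of non-empty batches for row_count rows."""
    return -(-row_count // batch_size)


if __name__ == "__main__":
    for batch in iter_batches(range(5), 2):
        print(batch)
    print(batches_needed(5, 2))
```

Because only one batch is held in memory at a time, the batch size trades memory use against the number of database round trips.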


9639
app/ngram_viewer/models.py Executable file

File diff suppressed because it is too large

287
app/ngram_viewer/ngram_search.py Executable file

@ -0,0 +1,287 @@
from datetime import datetime
from ngram_viewer.models import *
from speakers.models import Speaker
from watson import search as watson
from collections import defaultdict, OrderedDict
import logging
class NgramSearch(object):
"""
Class that handles the search for ngrams per year. Inputs are the user query
and search options. The user query is split and every part is used
as a single query. Every single query returns a QuerySet which is
filtered again (exact match or regex) to match either full or partial words.
The filtered QuerySets are returned. Data from those is
retrieved and converted to valid chart.js data sets. Besides the query the
user can pass some search options to the class, like case-sensitive or
case-insensitive search. This class handles search per year, which is the default.
"""
def __init__(self, clean_data):
super(NgramSearch, self).__init__()
self.cs_query = clean_data["query"]
self.case_sensitive = clean_data["case_sensitive"]
self.search_plus = clean_data["search_plus"]
self.ignore_missing = clean_data["ignore_missing"]
self.corpus_choice = clean_data["corpus_choice"]
self.sub_querys_dict = defaultdict(list)
self.filtered_sets_dict = defaultdict(list)
self.raw_data = []
def get_time_from_year_str(self, query_data, date_format="%Y"):
"""
This function creates a valid datetime object from an input string.
Works with strings consisting of %Y, %Y-%m or %Y.%m.%d. Not needed for
now.
"""
for ngram_dict in query_data:
for key in ngram_dict:
data_series = ngram_dict[key]
for value_pair in data_series:
valid_time = datetime.strptime(value_pair["x"], date_format)
valid_time_str = valid_time.strftime("%Y-%m-%dT%H:%M:%S")
value_pair["x"] = valid_time_str
return query_data
def get_sub_querys(self):
"""
This function takes the comma separated query string and splits it into
the needed substring and sorts them into a dictionary according to their
length to distinguish between unigrams, bigrams and so on.
"""
# Remove leading and trailing commas from the input query
self.cs_query = self.cs_query.strip(",")
logger = logging.getLogger(__name__)
sub_querys = self.cs_query.split(",")
logger.info(sub_querys)
# Strip surrounding whitespace and drop empty sub queries, which would
# otherwise raise an IndexError when the first character is read below
sub_querys_stripped = []
for sub_query in sub_querys:
sub_query = sub_query.strip()
if(sub_query):
sub_querys_stripped.append(sub_query)
sub_querys_dict = defaultdict(list)
for sub_query in sub_querys_stripped:
# Words starting with a German umlaut, another non-ASCII character or a
# special character like "§$%&" are stored in the _Non_ASCII models
sort_key = sub_query[0].upper()
if(not (sort_key.isascii() and sort_key.isalnum())):
sort_key = "_Non_ASCII"
if(len(sub_query.split()) == 1):
main_class = "One"
elif(len(sub_query.split()) == 2):
main_class = "Two"
elif(len(sub_query.split()) == 3):
main_class = "Three"
elif(len(sub_query.split()) == 4):
main_class = "Four"
elif(len(sub_query.split()) == 5):
main_class = "Five"
else:
sub_querys_dict["invalid"].append(sub_query)
continue
model = "Key{}_{}Gram_{}".format(sort_key,
main_class,
self.corpus_choice)
model = globals()[model]
sub_querys_dict[model].append(sub_query)
self.sub_querys_dict = sub_querys_dict
def enhanced_search(self):
"""
This function takes the sub_querys_dict and searches the database for every
subquery and returns QuerySets for those. In a second step the QuerySets
will be searched again with a regex to assure that QuerySets only contain
objects with an exact word match.
"""
# first broad search to catch every entry containing the query
# Without enhanced search syntax
if(self.search_plus is False):
query_sets_dict = defaultdict(list)
for key, values in self.sub_querys_dict.items():
if(key != "invalid"):
for value in values:
query_set = key.objects.filter(ngram__icontains=value)  # Case-insensitive. Checks for entries that contain the input string, similar to SQL LIKE. The resulting QuerySet is narrowed down by the exact match below.
query_sets_dict[key].append((query_set, value))
# Case-insensitive exact match of entries
if(self.case_sensitive is False):
filtered_sets_dict = defaultdict(list)
for key, query_sets in query_sets_dict.items():
for query_set in query_sets:
r_filtered = query_set[0].filter(ngram__iexact=query_set[1]) # Matches entries that contain the exact query
filtered_sets_dict[key].append((r_filtered, query_set[1]))
# Case-sensitive exact match of entries
elif(self.case_sensitive is True):
filtered_sets_dict = defaultdict(list)
for key, query_sets in query_sets_dict.items():
for query_set in query_sets:
r_filtered = query_set[0].filter(ngram__exact=query_set[1]) # Matches entries that contain the exact query
filtered_sets_dict[key].append((r_filtered, query_set[1]))
# With enhanced search syntax
elif(self.search_plus is True):
# Case-insensitive exact match of entries
if(self.case_sensitive is False):
filtered_sets_dict = defaultdict(list)
for key, values in self.sub_querys_dict.items():
if(key != "invalid"):
for value in values:
if(value.endswith("__")):
r_filtered = key.objects.filter(ngram__iexact=value[:-2])
else:
r_filtered = key.objects.filter(ngram__iregex=value) # Matches entries that contain regex query case-insensitive
filtered_sets_dict[key].append((r_filtered, value))
# Case-sensitive exact match of entries
elif(self.case_sensitive is True):
filtered_sets_dict = defaultdict(list)
for key, values in self.sub_querys_dict.items():
if(key != "invalid"):
for value in values:
if(value.endswith("__")):
r_filtered = key.objects.filter(ngram__exact=value[:-2])
else:
r_filtered = key.objects.filter(ngram__regex=value) # Matches entries that contain regex query case-sensitive
filtered_sets_dict[key].append((r_filtered, value))
self.filtered_sets_dict = filtered_sets_dict
def query_sets_to_data(self):
"""
Converts QuerySets to data dictionaries. Fills missing years with zero
value counts for ngrams. Also sums upper and lower case n-grams to one ngram
with one count.
"""
data = []
for key, query_sets in self.filtered_sets_dict.items():
for query_set in query_sets:
data_line = {}
for ngram in query_set[0]:
if ngram.key in data_line:
data_line[ngram.key] += ngram.count
# print(ngram.key, ngram.count, ngram.one_gram)
else:
data_line[ngram.key] = ngram.count
# print(ngram.key, ngram.count, ngram.one_gram)
# print(data_line)
data.append({query_set[1]: data_line})
# checks for missing years and fills them with zero
if(self.ignore_missing is False):
years = [year for year in range(1949, 2018)]
for data_line in data:
for key, values in data_line.items():
for year in years:
if(str(year) not in values):
values[str(year)] = 0
data_line[key] = dict(sorted(values.items()))
elif(self.ignore_missing is True):
for data_line in data:
for key, values in data_line.items():
data_line[key] = dict(sorted(values.items()))
self.raw_data = data
def convert_to_data_set(self):
"""
Converts the cleaned data from query_sets_to_data into valid chart.js
data set json like objects.
"""
data_set = []
for data_line in self.raw_data:
data_set_line = defaultdict(list)
for key, values in data_line.items():
for year, count in values.items():
new_data_point = {}
new_data_point["y"] = count
new_data_point["x"] = year
data_set_line[key].append(new_data_point)
data_set.append(data_set_line)
self.data_set = data_set
class NgramSearchSpeaker(NgramSearch):
"""
Class that handles the search for ngrams per speaker. Inputs are the user
query and search options. Only the first comma separated part of the user
query is used as a single query. The query returns a QuerySet which is
filtered again (exact match or regex) to match either full or partial words.
The filtered QuerySets are returned. Data from those is
retrieved and converted to valid chart.js data sets. Besides the query the
user can pass some search options to the class, like case-sensitive or
case-insensitive search. Inherits from NgramSearch.
"""
def __init__(self, clean_data):
super(NgramSearch, self).__init__()
self.cs_query = clean_data["query"].split(",")[0]
self.case_sensitive = clean_data["case_sensitive"]
self.search_plus = clean_data["search_plus"]
self.ignore_missing = clean_data["ignore_missing"]
self.corpus_choice = clean_data["corpus_choice"]
self.sub_querys_dict = defaultdict(list)
self.filtered_sets_dict = defaultdict(list)
self.raw_data = []
def get_speaker_name(self, query_data):
"""
This function takes the speaker ID and gets the corresponding speaker
name.
"""
for ngram_dict in query_data:
for key in ngram_dict:
data_series = ngram_dict[key]
for value_pair in data_series:
speaker_id = value_pair["x"]
if(speaker_id != "None"):
speaker_details = Speaker.objects.get(pk=speaker_id)
value_pair["x"] = (speaker_id
+ ": "
+ speaker_details.first_name
+ " "
+ speaker_details.last_name
+ " ({})".format(speaker_details.party))
elif(speaker_id == "None"):
value_pair["x"] = "Redner nicht identifiziert."
return query_data
def query_sets_to_data(self):
"""
Converts QuerySets to data dictionaries. Sums upper and lower case n-grams
to one ngram with one count per speaker and sorts the speakers by count in
descending order.
"""
data = []
for key, query_sets in self.filtered_sets_dict.items():
for query_set in query_sets:
data_line = {}
for ngram in query_set[0]:
if ngram.key in data_line:
data_line[ngram.key] += ngram.count
# print(ngram.key, ngram.count, ngram.one_gram)
else:
data_line[ngram.key] = ngram.count
# print(ngram.key, ngram.count, ngram.one_gram)
# print(data_line)
data.append({query_set[1]: data_line})
for d in data:
for key, value in d.items():
value = OrderedDict(sorted(value.items(), key=lambda t: t[1], reverse=True))
value = dict(value)
d[key] = value
self.raw_data = data
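
The way `get_sub_querys` maps a query string to model names can be sketched without Django. This is a minimal sketch under stated assumptions: `split_query`, `model_name_for` and `N_GRAM_CLASS` are hypothetical names, and the functions return the model name as a string instead of looking the class up via `globals()`. Like the original, it needs Python 3.7+ for `str.isascii`.

```python
# Maps the number of words in a sub query to the model name component
N_GRAM_CLASS = {1: "One", 2: "Two", 3: "Three", 4: "Four", 5: "Five"}


def split_query(cs_query):
    """Split a comma separated query and drop empty or blank parts."""
    return [part.strip() for part in cs_query.split(",") if part.strip()]


def model_name_for(sub_query, corpus_choice):
    """Return the model name for one sub query, or None if it is invalid."""
    first = sub_query[0].upper()
    # Only ASCII letters and digits get their own key, the rest is grouped
    if first.isascii() and first.isalnum():
        sort_key = first
    else:
        sort_key = "_Non_ASCII"
    main_class = N_GRAM_CLASS.get(len(sub_query.split()))
    if main_class is None:
        return None  # more than five words
    return "Key{}_{}Gram_{}".format(sort_key, main_class, corpus_choice)


if __name__ == "__main__":
    for part in split_query("Asyl, Änderung des Gesetzes"):
        print(part, "->", model_name_for(part, "lm_ns_year"))
```

Returning `None` for queries of more than five words mirrors the `"invalid"` bucket of the original method.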


@ -0,0 +1,96 @@
{% extends "blog/base.html" %}
{% block nav-tabs %}
<div class="nav-content">
<ul class="tabs tabs-transparent">
<li class="tab"><a target="_self" href="{% url "ngram-viewer-jahr" %}">Pro Jahr</a></li>
<li class="tab"><a target="_self" class="active" href="{% url "ngram-viewer-sprecher" %}">Pro MdB</a></li>
</ul>
</div>
{% endblock nav-tabs %}
{% block content %}
<div class="row">
<div class="col s12 m12 l4">
<div class="card">
<div class="card-content">
<span class="card-title center-align">Suchoptionen</span>
<div class="row">
<form method="GET" class="col s12">
{% csrf_token %}
{% if errors %}
<p class="red-text text-darken-2">Es kann nur jeweils ein Ngramm gesucht werden.</p>
{% endif %}
<div class="input-field col s12">
<i class="material-icons prefix">search</i>
<input id="id_query" type="text" name="{{form.query.html_name}}" class="autocomplete materialize-textarea validate" {% if form.query.value != None %}value="{{form.query.value}}" {% else %}value="Ausländer" {% endif %}>
<label for="id_query">{{form.query.label}}</label>
<button class="btn waves-effect waves-light right light-green darken-3" type="submit" name="ngram-search">Suche
<i class="material-icons right">send</i>
</button>
</div>
<br />
<br />
Corpus:{{form.corpus_choice}}
<div class="section">
<div class="switch section ">
<span>Case-sensitive Suche:</span>
<div style="float: right;">
Aus
<label>
<input type="checkbox" name="{{form.case_sensitive.html_name}}" class="filled-in" {% if form.case_sensitive.value == True %}checked = "checked" {% endif %} />
<span class="lever"></span>
</label>
Ein
</div>
</div>
<div class="divider"></div>
<div class="switch section">
<span>Erweiterter Suchsyntax: <a class="tooltipped" data-position="bottom" data-tooltip="Ist diese Option aktiviert, kann die PostgreSQL interne regex-Syntax für die einzelnen Suchanfragen verwendet werden. Allerdings kann diese nur an Wortenden ('Asyl\w*') verwendet werden. Wörter können am Wortende mit '__' ('Krieg__') quasi escaped werden, so dass diese nicht als regulärer Ausdruck interpretiert werden."><i
class="material-icons tiny blue-grey-text darken-4">info_outline</i></a></span>
<div style="float: right;">
Aus
<label>
<input type="checkbox" name="{{form.search_plus.html_name}}" class="filled-in" {% if form.search_plus.value == True %}checked = "checked" {% endif %} />
<span class="lever"></span>
</label>
Ein
</div>
</div>
<div class="divider"></div>
<div class="section">
<div class="input-field col s12">
<i class="material-icons prefix">filter_9_plus</i>
<input id="id_range" type="text" name="{{form.range.html_name}}" class="autocomplete materialize-textarea validate" {% if form.range.value != None %}value="{{form.range.value}}" {% else %}value="10" {% endif %}>
<label for="id_range">{{form.range.label}}</label>
</div>
</div>
<div class="divider"></div>
</div>
</form>
</div>
</div>
</div>
<ul class="collapsible">
<li>
<div class="collapsible-header"><i class="material-icons blue-grey-text darken-4">info_outline</i>Hilfe und Hinweise</div>
<div class="collapsible-body white">
<h6>Muster der Suchanfrage</h6>
<p></p>
<h6>Suchgeschwindigkeit</h6>
<p></p>
</div>
</li>
</ul>
</div>
<div class="col s12 m12 l8">
<div class="card">
<div class="card-content">
<span class="card-title">Graph</span>
{{ bar_chart.as_html }}
</div>
</div>
</div>
</div>
{% endblock content %}


@ -0,0 +1,96 @@
{% extends "blog/base.html" %}
{% block nav-tabs %}
<div class="nav-content">
<ul class="tabs tabs-transparent">
<li class="tab"><a target="_self" class="active" href="{% url "ngram-viewer-jahr" %}">Pro Jahr</a></li>
<li class="tab"><a target="_self" href="{% url "ngram-viewer-sprecher" %}">Pro MdB</a></li>
</ul>
</div>
{% endblock nav-tabs %}
{% block content %}
<div class="row">
<div class="col s12 m12 l4">
<div class="card">
<div class="card-content">
<span class="card-title center-align">Suchoptionen</span>
<div class="row">
<form method="GET" class="col s12">
{% csrf_token %}
<div class="input-field col s12">
<i class="material-icons prefix">search</i>
<input id="id_query" type="text" name="{{form.query.html_name}}" class="autocomplete materialize-textarea validate" {% if form.query.value != None %}value="{{form.query.value}}" {% else %}value="Kroatien, Krieg, Asyl" {% endif %}>
<label for="id_query">{{form.query.label}}</label>
<button class="btn waves-effect waves-light right light-green darken-3" type="submit" name="ngram-search">Suche
<i class="material-icons right">send</i>
</button>
</div>
<br />
<br />
Corpus:{{form.corpus_choice}}
<div class="section">
<div class="switch section ">
<span>Case-sensitive Suche:</span>
<div style="float: right;">
Aus
<label>
<input type="checkbox" name="{{form.case_sensitive.html_name}}" class="filled-in" {% if form.case_sensitive.value == True %}checked = "checked" {% endif %} />
<span class="lever"></span>
</label>
Ein
</div>
</div>
<div class="divider"></div>
<div class="switch section">
<span>Erweiterter Suchsyntax: <a class="tooltipped" data-position="bottom" data-tooltip="Ist diese Option aktiviert, kann die PostgreSQL interne regex-Syntax für die einzelnen Suchanfragen verwendet werden. Allerdings kann diese nur an Wortenden ('Asyl\w*') verwendet werden. Wörter können am Wortende mit '__' ('Krieg__') quasi escaped werden, so dass diese nicht als regulärer Ausdruck interpretiert werden."><i
class="material-icons tiny blue-grey-text darken-4">info_outline</i></a></span>
<div style="float: right;">
Aus
<label>
<input type="checkbox" name="{{form.search_plus.html_name}}" class="filled-in" {% if form.search_plus.value == True %}checked = "checked" {% endif %} />
<span class="lever"></span>
</label>
Ein
</div>
</div>
<div class="divider"></div>
<div class="switch section">
<span>Fehlende Daten ignorieren: <a class="tooltipped" data-position="bottom" data-tooltip="Ist diese Option aus, werden Jahre, die das gesuchte Ngramm nicht enthalten mit Nullwerten auf gefüllt. Wird diese Option aktiviert, werden Jahre, die das gesuchte Ngramm nicht enthalten ignoriert."><i
class="material-icons tiny blue-grey-text darken-4">info_outline</i></a></span>
<div style="float: right;">
Aus
<label>
<input type="checkbox" name="{{form.ignore_missing.html_name}}" class="filled-in" {% if form.ignore_missing.value == True %}checked="checked" {% endif %} />
<span class="lever"></span>
</label>
Ein
</div>
</div>
<div class="divider"></div>
</div>
</form>
</div>
</div>
</div>
<ul class="collapsible">
<li>
<div class="collapsible-header"><i class="material-icons blue-grey-text darken-4">info_outline</i>Hilfe und Hinweise</div>
<div class="collapsible-body white">
<h6>Muster der Suchanfrage</h6>
<p></p>
<h6>Suchgeschwindigkeit</h6>
<p></p>
</div>
</li>
</ul>
</div>
<div class="col s12 m12 l8">
<div class="card">
<div class="card-content">
<span class="card-title">Graph</span>
{{ line_chart.as_html}}
</div>
</div>
</div>
</div>
{% endblock content %}

3
app/ngram_viewer/tests.py Executable file

@ -0,0 +1,3 @@
from django.test import TestCase
# Create your tests here.

7
app/ngram_viewer/urls.py Executable file

@ -0,0 +1,7 @@
from django.urls import path
from . import views
urlpatterns = [
path("pro-jahr/", views.ngram_viewer_year, name="ngram-viewer-jahr"),
path("pro-mdb/", views.ngram_viewer_speaker, name="ngram-viewer-sprecher")
]

94
app/ngram_viewer/views.py Executable file

@ -0,0 +1,94 @@
from django.shortcuts import render
from .charts import TimeChart, BarChart
from .forms import NgramForm, NgramFormSpeaker
from .ngram_search import NgramSearch, NgramSearchSpeaker
# import logging
def ngram_viewer_year(request):
# logger = logging.getLogger(__name__)
if(request.method == "GET"):
form = NgramForm(request.GET)
if(form.is_valid()):
clean_data = form.cleaned_data
search = NgramSearch(clean_data)
search.get_sub_querys()
search.enhanced_search()
search.query_sets_to_data()
search.convert_to_data_set()
line_chart = TimeChart()
line_chart.get_datasets(data_sets=search.data_set)
context = {"title": "Ngram Viewer für: " + clean_data["query"],
"form": form, "line_chart": line_chart}
# logger.info(search.sub_querys_dict)
# logger.info(search.filtered_sets_dict)
# logger.info(search.raw_data)
# logger.info(search.data_set)
return render(request,
"ngram_viewer/ngram_viewer_year.html",
context)
else:
form = NgramForm()
clean_data = {'query': 'Asyl, Kroatien, Krieg',
'case_sensitive': False,
'search_plus': False,
'ignore_missing': False,
'corpus_choice': 'lm_ns_year'}
search = NgramSearch(clean_data)
search.get_sub_querys()
search.enhanced_search()
search.query_sets_to_data()
search.convert_to_data_set()
line_chart = TimeChart()
line_chart.get_datasets(data_sets=search.data_set)
context = {"title": "Ngram Viewer pro Jahr für: " + clean_data["query"],
"form": form, "line_chart": line_chart}
return render(request,
"ngram_viewer/ngram_viewer_year.html",
context)
def ngram_viewer_speaker(request):
if(request.method == "GET"):
form = NgramFormSpeaker(request.GET)
if(form.is_valid()):
clean_data = form.cleaned_data
search = NgramSearchSpeaker(clean_data)
search.get_sub_querys()
search.enhanced_search()
search.query_sets_to_data()
search.convert_to_data_set()
speaker_data = search.get_speaker_name(search.data_set)
bar_chart = BarChart(clean_data["range"])
bar_chart.create_data(data_sets=speaker_data)
bar_chart.get_datasets()
bar_chart.get_labels()
context = {"title": "Ngram Viewer für: " + clean_data["query"],
"form": form, "bar_chart": bar_chart}
return render(request,
"ngram_viewer/ngram_viewer_speaker.html",
context)
else:
errors = form.errors
form = NgramFormSpeaker()
clean_data = {'query': 'Ausländer',
'case_sensitive': False,
'search_plus': False,
'ignore_missing': False,
'corpus_choice': 'lm_ns_speaker',
'range': '10'}
search = NgramSearchSpeaker(clean_data)
search.get_sub_querys()
search.enhanced_search()
search.query_sets_to_data()
search.convert_to_data_set()
speaker_data = search.get_speaker_name(search.data_set)
bar_chart = BarChart(clean_data["range"])
bar_chart.create_data(data_sets=speaker_data)
bar_chart.get_datasets()
bar_chart.get_labels()
context = {"title": "Ngram Viewer pro MdB für: " + clean_data["query"],
"form": form, "bar_chart": bar_chart, "errors": errors}
return render(request,
"ngram_viewer/ngram_viewer_speaker.html",
context)
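
Both views run the same pipeline and hand the resulting data set to a chart object. The reshaping that `query_sets_to_data` and `convert_to_data_set` perform for the per-year view can be sketched on plain dictionaries. This is a minimal sketch: `fill_missing_years` and `to_chart_points` are hypothetical names, and the default year range 1949 to 2017 is taken from `range(1949, 2018)` in `ngram_search.py`.

```python
def fill_missing_years(counts, first_year=1949, last_year=2017):
    """Return counts sorted by year, with absent years set to zero."""
    filled = {str(year): 0 for year in range(first_year, last_year + 1)}
    filled.update(counts)
    return dict(sorted(filled.items()))


def to_chart_points(counts):
    """Convert a year -> count mapping into chart.js style x/y points."""
    return [{"x": year, "y": count} for year, count in counts.items()]


if __name__ == "__main__":
    counts = fill_missing_years({"1951": 3}, first_year=1949, last_year=1952)
    print(to_chart_points(counts))
```

Filling the gaps with zeros keeps the line continuous across all years. With the "ignore missing" option the original skips this step and only sorts the existing years.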

0
app/speakers/__init__.py Executable file

3
app/speakers/admin.py Executable file

@ -0,0 +1,3 @@
from django.contrib import admin
# Register your models here.

10
app/speakers/apps.py Executable file

@ -0,0 +1,10 @@
from django.apps import AppConfig
from watson import search as watson
class SpeakersConfig(AppConfig):
name = 'speakers'
def ready(self):
Speaker = self.get_model("Speaker")
watson.register(Speaker, exclude=["short_vita"])

11
app/speakers/forms.py Executable file

@ -0,0 +1,11 @@
from django import forms
class SearchForm(forms.Form):
"""
Configures the html input form for the speaker search.
"""
query = forms.CharField(label="Suche MdB", max_length=200)
query.widget.attrs.update({"class": "autocomplete materialize-textarea",
"id": "icon_prefix2"})


@ -0,0 +1,115 @@
from django.core.management.base import BaseCommand
from speakers.models import Speaker, LegislativeInfo, LegislativeInstitution
import datetime
from lxml import etree
from tqdm import tqdm
class Command(BaseCommand):
help = ("Adds speakers (MdBs) to the database using the django models"
" syntax. Speakers will be added from the official"
" Stammdatenbank.xml. Input is the Stammdatenbank.xml specified"
" by a path.")
def add_arguments(self, parser):
parser.add_argument("input_path",
type=str)
def handle(self, *args, **options):
file_path = options["input_path"]
# self.stdout.write("Reading data from file: " + file_path)
tree = etree.parse(file_path)
speakers = tree.xpath("//MDB")
for speaker_element in tqdm(speakers, desc="Importing speaker data"):
speaker = Speaker()
id = speaker_element.xpath("./ID")[0]
speaker.id = id.text
last_name = speaker_element.xpath("./NAMEN/NAME/NACHNAME")[0]
speaker.last_name = last_name.text
first_name = speaker_element.xpath("./NAMEN/NAME/VORNAME")[0]
speaker.first_name = first_name.text
nobility = speaker_element.xpath("./NAMEN/NAME/ADEL")[0]
speaker.nobility = nobility.text
name_prefix = speaker_element.xpath("./NAMEN/NAME/PRAEFIX")[0]
speaker.name_prefix = name_prefix.text
# self.stdout.write("Reading data for speaker: "
# + str(id.text)
# + " "
# + str(first_name.text)
# + " "
# + str(last_name.text))
title = speaker_element.xpath("./NAMEN/NAME/ANREDE_TITEL")[0]
speaker.title = title.text
birthday = speaker_element.xpath("./BIOGRAFISCHE_ANGABEN/GEBURTSDATUM")[0]
speaker.birthday = birthday.text
birthplace = speaker_element.xpath("./BIOGRAFISCHE_ANGABEN/GEBURTSORT")[0]
speaker.birthplace = birthplace.text
country_of_birth = speaker_element.xpath("./BIOGRAFISCHE_ANGABEN/GEBURTSLAND")[0]
speaker.country_of_birth = country_of_birth.text
day_of_death = speaker_element.xpath("./BIOGRAFISCHE_ANGABEN/STERBEDATUM")[0]
speaker.day_of_death = day_of_death.text
occupation = speaker_element.xpath("./BIOGRAFISCHE_ANGABEN/BERUF")[0]
speaker.occupation = occupation.text
short_vita = speaker_element.xpath("./BIOGRAFISCHE_ANGABEN/VITA_KURZ")[0]
speaker.short_vita = short_vita.text
party = speaker_element.xpath("./BIOGRAFISCHE_ANGABEN/PARTEI_KURZ")[0]
speaker.party = party.text
speaker.save()
legislative_periods = speaker_element.xpath("./WAHLPERIODEN/WAHLPERIODE/WP")
legislative_period_start_dates = speaker_element.xpath("./WAHLPERIODEN/WAHLPERIODE/MDBWP_VON")
legislative_period_end_dates = speaker_element.xpath("./WAHLPERIODEN/WAHLPERIODE/MDBWP_BIS")
mandate_types = speaker_element.xpath("./WAHLPERIODEN/WAHLPERIODE/MANDATSART")
# The institutions are imported per period further below. They are left out
# of this zip because zip() stops at its shortest input, so a speaker with
# fewer institutions than periods would lose legislative periods.
zipped_infos = zip(legislative_periods,
legislative_period_start_dates,
legislative_period_end_dates,
mandate_types)
for p, sd, ed, m in zipped_infos:
legislative_info = LegislativeInfo()
legislative_info.foreign_speaker = speaker
legislative_info.legislative_period = p.text
if(sd.text is not None):
sd = datetime.datetime.strptime(sd.text, "%d.%m.%Y")
sd = datetime.datetime.strftime(sd, "%Y-%m-%d")
legislative_info.legislative_period_start_date = sd
if(ed.text is not None):
ed = datetime.datetime.strptime(ed.text, "%d.%m.%Y")
ed = datetime.datetime.strftime(ed, "%Y-%m-%d")
legislative_info.legislative_period_end_date = ed
legislative_info.mandate_type = m.text
# legislative_info.legislative_institution = i.text
legislative_info.save()
for period in speaker_element.xpath("./WAHLPERIODEN/WAHLPERIODE"):
legislative_institutions = period.xpath("./INSTITUTIONEN/INSTITUTION/INS_LANG")
institution_start_dates = period.xpath("./INSTITUTIONEN/INSTITUTION/MDBINS_VON")
institution_end_dates = period.xpath("./INSTITUTIONEN/INSTITUTION/MDBINS_BIS")
zipped_institutions = zip(legislative_institutions,
institution_start_dates,
institution_end_dates)
for institution, start_date, end_date in zipped_institutions:
legislative_institution = LegislativeInstitution()
legislative_institution.foreign_speaker = speaker
current_period = period.xpath("./WP")[0]
legislative_institution.current_period = current_period.text
legislative_institution.institution = institution.text
if(start_date.text is not None):
start_date = datetime.datetime.strptime(start_date.text,
"%d.%m.%Y")
start_date = datetime.datetime.strftime(start_date,
"%Y-%m-%d")
legislative_institution.institution_start_date = start_date
if(end_date.text is not None):
end_date = datetime.datetime.strptime(end_date.text,
"%d.%m.%Y")
end_date = datetime.datetime.strftime(end_date,
"%Y-%m-%d")
legislative_institution.institution_end_date = end_date
legislative_institution.save()
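
The import command above converts every date from the `%d.%m.%Y` format of the XML file into the ISO format that a Django `DateField` accepts, and skips empty elements. That conversion can be sketched as one helper. This is a minimal sketch: `to_iso_date` is a hypothetical name that does not exist in the repository.

```python
from datetime import datetime


def to_iso_date(text):
    """Convert 'DD.MM.YYYY' to 'YYYY-MM-DD', passing None through."""
    if text is None:
        # Empty XML elements have no text, the model field stays NULL
        return None
    return datetime.strptime(text, "%d.%m.%Y").strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(to_iso_date("07.09.1949"))
    print(to_iso_date(None))
```

A single helper would replace the four repeated `strptime`/`strftime` pairs for the period and institution dates.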


74
app/speakers/models.py Executable file

@ -0,0 +1,74 @@
from django.db import models
class Speaker(models.Model):
"""
This model contains general data about one MdB. Data will be imported from
the Stammdatenbank.xml via the custom django-admin command import_speakers.py.
"""
id = models.IntegerField(verbose_name="MdB ID", primary_key=True)
last_name = models.CharField(verbose_name="Nachname", max_length=50)
first_name = models.CharField(verbose_name="Vorname", max_length=50)
nobility = models.CharField(verbose_name="Adelstitel", max_length=50,
null=True)
name_prefix = models.CharField(verbose_name="Namenspräfix", max_length=50,
null=True)
title = models.CharField(verbose_name="Akademischer Titel", null=True,
blank=True, max_length=50)
birthday = models.IntegerField(verbose_name="Geburtstag", )
birthplace = models.CharField(verbose_name="Geburtsort", null=True,
blank=True, max_length=50)
country_of_birth = models.CharField(verbose_name="Geburtsland", null=True,
blank=True, max_length=50)
day_of_death = models.IntegerField(verbose_name="Todesjahr", null=True,
blank=True)
occupation = models.TextField(verbose_name="Beruf")
short_vita = models.TextField(verbose_name="Kurzbiographie", default=None,
null=True, blank=True)
party = models.CharField(verbose_name="Partei", null=True, blank=True,
max_length=50)
def __str__(self):
return str(self.id) + " " + self.first_name + " " + self.last_name
class LegislativeInfo(models.Model):
"""
This model contains data about the periods an MdB was an active part of the
German Bundestag. Needs a foreign key which is the corresponding Speaker
entry.
"""
foreign_speaker = models.ForeignKey("Speaker", on_delete=models.CASCADE)
legislative_period = models.IntegerField(verbose_name="Wahlperiode",
null=True)
legislative_period_start_date = models.DateField(verbose_name="MdB von",
null=True)
legislative_period_end_date = models.DateField(verbose_name="MdB bis",
null=True)
mandate_type = models.CharField(verbose_name="Mandatsart", null=True,
blank=True, max_length=50)
def __str__(self):
return str(self.foreign_speaker) + " " + str(self.legislative_period)
class LegislativeInstitution(models.Model):
"""
This model contains data about the institutions an MdB was part of during a
specific legislative period. Needs a foreign key which is the corresponding
Speaker entry.
"""
foreign_speaker = models.ForeignKey("Speaker",
on_delete=models.CASCADE)
current_period = models.IntegerField(verbose_name="Wahlperiode",
null=True)
institution = models.CharField(verbose_name="Institut", null=True,
blank=True, max_length=255)
institution_start_date = models.DateField(verbose_name="Mitglied von",
null=True)
institution_end_date = models.DateField(verbose_name="Mitglied bis",
null=True)
def __str__(self):
return str(self.foreign_speaker) + " " + str(self.institution)

19
app/speakers/tables.py Executable file

@ -0,0 +1,19 @@
import django_tables2 as tables
from .models import Speaker
from django_tables2.utils import A # alias for Accessor
class SpeakerTable(tables.Table):
"""
Configures the table showing all speakers. Inserts a column with links to
the profile of one speaker. Also defines all shown columns.
"""
link = tables.LinkColumn("MdB", text="Profil", args=[A("id")],
orderable=False,
attrs={"a": {"class": "waves-effect waves-light btn light-green darken-3"}})  # Adds a column with a link to the profile
class Meta:
model = Speaker
fields = ("last_name", "first_name", "party", "id")
template_name = "speakers/table.html"
empty_text = ("Für den eingegebenen Suchbegriff gibt es leider keine Ergebnisse.")


@ -0,0 +1,120 @@
{% extends "blog/base.html" %}
{% load render_table from django_tables2 %}
{% block content %}
<div class="container">
<div class="row">
<div class="col s12 m4 l4">
<div class="card">
<div class="card-content">
<span class="card-title center-align">
{% if current_speaker.title %}
{{current_speaker.title}}
{% endif %}
{% if current_speaker.nobility %}
{{current_speaker.nobility}}
{% endif %}
{{current_speaker.first_name}}
{% if current_speaker.name_prefix %}
{{current_speaker.name_prefix}}
{% endif %}
{{current_speaker.last_name}}</span>
<p class="center-align"><i class="large material-icons blue-grey-text darken-4">account_circle</i></p>
<span class="card-title">Biographie</span>
<ul>
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">cake</i> Geburtstag: {{current_speaker.birthday}}</li>
<br />
{% if current_speaker.day_of_death %}
<li><b style="font-size: 2em;" class="blue-grey-text darken-4"></b><span style="position: relative; right: -20px;"> Todesjahr:
{{ current_speaker.day_of_death}}</span></li>
<br />
{% endif %}
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">home</i> Geburtsort: {{current_speaker.birthplace}}</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">work</i> Beruf: {{current_speaker.occupation}}</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">fingerprint</i> Bundestags ID: {{current_speaker.id}}</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">people</i> Partei: {{current_speaker.party}}<a class="tooltipped" data-position="bottom" data-tooltip="Aktuelle eingetragene Partei, bei der die Person Mitglied ist oder war. "><i
class="material-icons blue-grey-text darken-4">info_outline</i></a></li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">filter_9_plus</i> Reden/Redebeiträge insgesamt: {{speech_count}}</li>
</ul>
</div>
</div>
<ul class="collapsible expandable hoverable">
{% if current_speaker.short_vita %}
<li>
<div class="collapsible-header">Kurzbiographie</div>
<div class="collapsible-body white"><span>{{current_speaker.short_vita}}</span></div>
</li>
{% endif %}
{% for period in sorted_l_info %}
<li>
<div class="collapsible-header">
<i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">library_books</i> Wahlperiode {{period.legislative_period}}
</div>
<div class="collapsible-body white">
<ul>
<span class="card-title">Mitglied des Bundestags</span>
<li>Mitglied von {{period.legislative_period_start_date|date:"d.m.Y"}} bis
{% if period.legislative_period_end_date is None %}
heute
{% else %}
{{period.legislative_period_end_date|date:"d.m.Y"}}
{% endif %}
</li>
<br />
<li>Mandatsart: {{period.mandate_type}}</li>
<br />
<span class="card-title">Institutions- und Fraktionszugehörigkeit</span>
{% for institution in sorted_i_info %}
{% if institution.current_period == period.legislative_period %}
<li><b>{{institution.institution}}</b></li>
<br />
{% if institution.institution_start_date is not None %}
<li>Von {{institution.institution_start_date|date:"d.m.Y"}} bis
{% if institution.institution_end_date is None %}
heute
{% else %}
{{institution.institution_end_date|date:"d.m.Y"}}
{% endif %}
</li>
<br />
{% endif %}
{% endif %}
{% endfor %}
</ul>
</div>
</li>
{% endfor %}
</ul>
</div>
<div class="col s12 m8">
<div class="card">
<div class="card-content">
<span class="card-title">Reden</span>
<div class="card-content">
<p>Alle Reden, die von {% if current_speaker.title %}
{{current_speaker.title}}
{% endif %}
{% if current_speaker.nobility %}
{{current_speaker.nobility}}
{% endif %}
{{current_speaker.first_name}}
{% if current_speaker.name_prefix %}
{{current_speaker.name_prefix}}
{% endif %}
{{current_speaker.last_name}} als MdB gehalten wurden.</p>
<div style="overflow-x:auto;">
{% render_table speech_table %}
</div>
</div>
</div>
</div>
</div>
</div>
</div>
{% endblock content %}


@ -0,0 +1,34 @@
{% extends "blog/base.html" %}
{% load render_table from django_tables2 %}
{% block content %}
<div class="container">
<div class="row">
<div class="col s12">
<div class="card">
<div class="card-content">
<span class="card-title">Mitglieder des Bundestags</span>
<p>Dies ist eine Liste aller Abgeordneten seit 1949 bis einschließlich der aktuellen Wahlperiode. Die Liste kann sortiert und durchsucht werden.</p>
<p>Ausgangsdaten für diese Liste können auf der <a href="https://www.bundestag.de/service/opendata">offiziellen Seite des Bundestags</a> heruntergeladen werden.</p>
<p>Für jede Person ist ein Profil angelegt, das Informationen zu dieser bereithält und alle Reden bzw. Redebeiträge dieser gesammelt darstellt.</p>
<br />
<div class="row">
<form method="GET" class="col l4 offset-l8 m6 offset-m6 s12">
<div class="row">
<div class="input-field">
<i class="material-icons prefix">search</i>
{{form}}
</div>
</div>
</form>
</div>
<div style="overflow-x:auto;">
{% render_table table %}
</div>
</div>
</div>
</div>
</div>
</div>
{% endblock content %}


@ -0,0 +1,95 @@
{% load django_tables2 %}
{% load i18n %}
{% block table-wrapper %}
{% block table %}
<table {% render_attrs table.attrs %} class="highlight">
{% block table.thead %}
{% if table.show_header %}
<thead {{ table.attrs.thead.as_html }}>
<tr>
{% for column in table.columns %}
<th {{ column.attrs.th.as_html }}>
{% if column.orderable %}
<a href="{% querystring table.prefixed_order_by_field=column.order_by_alias.next %}"><i class="material-icons ">sort</i> {{ column.header }}</a>
{% else %}
{{ column.header }}
{% endif %}
</th>
{% endfor %}
</tr>
</thead>
{% endif %}
{% endblock table.thead %}
{% block table.tbody %}
<tbody {{ table.attrs.tbody.as_html }}>
{% for row in table.paginated_rows %}
{% block table.tbody.row %}
<tr {{ row.attrs.as_html }}>
{% for column, cell in row.items %}
<td {{ column.attrs.td.as_html }}>{% if column.localize == None %}{{ cell }}{% else %}{% if column.localize %}{{ cell|localize }}{% else %}{{ cell|unlocalize }}{% endif %}{% endif %}</td>
{% endfor %}
</tr>
{% endblock table.tbody.row %}
{% empty %}
{% if table.empty_text %}
{% block table.tbody.empty_text %}
<tr><td colspan="{{ table.columns|length }}">{{ table.empty_text }}</td></tr>
{% endblock table.tbody.empty_text %}
{% endif %}
{% endfor %}
</tbody>
{% endblock table.tbody %}
{% block table.tfoot %}
{% if table.has_footer %}
<tfoot {{ table.attrs.tfoot.as_html }}>
<tr>
{% for column in table.columns %}
<td {{ column.attrs.tf.as_html }}>{{ column.footer }}</td>
{% endfor %}
</tr>
</tfoot>
{% endif %}
{% endblock table.tfoot %}
</table>
{% endblock table %}
{% block pagination %}
{% if table.page and table.paginator.num_pages > 1 %}
<ul class="pagination">
{% if table.page.has_previous %}
{% block pagination.previous %}
<li class="previous waves-effect">
<a href="{% querystring table.prefixed_page_field=table.page.previous_page_number %}">
{% trans '<i class="material-icons">chevron_left</i>' %}
</a>
</li>
{% endblock pagination.previous %}
{% endif %}
{% if table.page.has_previous or table.page.has_next %}
{% block pagination.range %}
{% for p in table.page|table_page_range:table.paginator %}
<li class="waves-effect{% if p == table.page.number %} active light-green darken-3{% endif %}">
{% if p == '...' %}
<a href="#">{{ p }}</a>
{% else %}
<a href="{% querystring table.prefixed_page_field=p %}">
{{ p }}
</a>
{% endif %}
</li>
{% endfor %}
{% endblock pagination.range %}
{% endif %}
{% if table.page.has_next %}
{% block pagination.next %}
<li class="next waves-effect">
<a href="{% querystring table.prefixed_page_field=table.page.next_page_number %}">
{% trans '<i class="material-icons">chevron_right</i>' %}
</a>
</li>
{% endblock pagination.next %}
{% endif %}
</ul>
{% endif %}
{% endblock pagination %}
{% endblock table-wrapper %}

3
app/speakers/tests.py Executable file

@ -0,0 +1,3 @@
from django.test import TestCase
# Create your tests here.

7
app/speakers/urls.py Executable file

@ -0,0 +1,7 @@
from django.urls import path
from . import views

urlpatterns = [
    path("", views.speakers, name="MdBs"),
    path("mdb/<int:id>", views.speaker, name="MdB"),
]

52
app/speakers/views.py Executable file

@ -0,0 +1,52 @@
from django.shortcuts import render
from django_tables2 import RequestConfig
from .models import Speaker, LegislativeInfo, LegislativeInstitution
from speeches.models import Speech
from .tables import SpeakerTable
from speeches.tables import SpeakerSpeechTable
from django.http import Http404
from watson import search as watson
from .forms import SearchForm
from speeches.forms import SearchFormSpeech


def speakers(request):
    if(request.method == "GET"):
        form = SearchForm(request.GET)
        if(form.is_valid()):
            # A search query was submitted: show only the matching speakers.
            query = form.cleaned_data["query"]
            search_results = watson.filter(Speaker, query)
            table = SpeakerTable(search_results)
            RequestConfig(request, paginate={'per_page': 20}).configure(table)
            context = {"title": "Suchergebnisse für " + query,
                       "form": form, "table": table}
            return render(request, "speakers/speakers.html", context)
        else:
            # No valid query: show all speakers sorted by last name.
            form = SearchForm()
            table = SpeakerTable(Speaker.objects.all().order_by("last_name"))
            RequestConfig(request, paginate={'per_page': 20}).configure(table)
            context = {"title": "Suche", "table": table, "form": form}
            return render(request, "speakers/speakers.html", context)


def speaker(request, id):
    try:
        current_speaker = Speaker.objects.get(pk=id)
        # count() lets the database count instead of loading every speech.
        speech_count = Speech.objects.filter(foreign_speaker=id).count()
        current_legislative_info = LegislativeInfo.objects.filter(foreign_speaker=id)
        sorted_l_info = current_legislative_info.order_by("legislative_period")
        institution_info = LegislativeInstitution.objects.filter(foreign_speaker=id)
        sorted_i_info = institution_info.order_by("current_period")
        table = SpeakerSpeechTable(Speech.objects.filter(foreign_speaker=id))
        RequestConfig(request, paginate={'per_page': 20}).configure(table)
    except Speaker.DoesNotExist:
        raise Http404("Speaker does not exist")
    context = {"title": ("MdB "
                         + current_speaker.first_name
                         + " " + current_speaker.last_name),
               "current_speaker": current_speaker,
               "sorted_l_info": sorted_l_info,
               "sorted_i_info": sorted_i_info,
               "speech_table": table,
               "speech_count": speech_count}
    return render(request, "speakers/speaker.html", context)
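
The speaker and speech templates in this commit build a display name from optional parts (`title`, `nobility`, `name_prefix`) placed around the first and last name, and repeat that logic in several places. The same rule could be sketched once as a plain Python helper. Both `SpeakerName` and `display_name` are hypothetical names used only for illustration; they are not part of this commit, and the real `Speaker` model may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakerName:
    """Hypothetical stand-in for the name fields read by the templates."""
    first_name: str
    last_name: str
    title: Optional[str] = None
    nobility: Optional[str] = None
    name_prefix: Optional[str] = None


def display_name(speaker: SpeakerName) -> str:
    """Join the name parts in template order, skipping empty or missing ones."""
    parts = [speaker.title, speaker.nobility, speaker.first_name,
             speaker.name_prefix, speaker.last_name]
    return " ".join(part for part in parts if part)
```

A helper like this could back a model property or a template filter, so that the `{% if %}` chain does not have to be copied into every template.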

0
app/speeches/__init__.py Executable file

3
app/speeches/admin.py Executable file

@ -0,0 +1,3 @@
from django.contrib import admin
# Register your models here.

22
app/speeches/apps.py Executable file

@ -0,0 +1,22 @@
from django.apps import AppConfig
from watson import search as watson


class SpeechesConfig(AppConfig):
    name = 'speeches'

    def ready(self):
        Protocol = self.get_model("Protocol")
        watson.register(Protocol, fields=["protocol_id",
                                          "protocol_period",
                                          "session_date_str"])
        Speech = self.get_model("Speech")
        watson.register(Speech,
                        fields=["speech_id",
                                "foreign_protocol__protocol_id",
                                "foreign_protocol__session_date_str",
                                "foreign_speaker__id",
                                "foreign_speaker__first_name",
                                "foreign_speaker__last_name"],
                        exclude=["speech_content"])

21
app/speeches/forms.py Executable file

@ -0,0 +1,21 @@
from django import forms


class SearchForm(forms.Form):
    """
    Configures the html input form for the protocol search.
    """
    # max_length is given as an integer rather than a string.
    query = forms.CharField(label="Suche Protokoll", max_length=200)
    query.widget.attrs.update({"class": "autocomplete materialize-textarea",
                               "id": "icon_prefix2"})


class SearchFormSpeech(forms.Form):
    """
    Configures the html input form for the speech search.
    """
    query = forms.CharField(label="Suche Rede", max_length=200)
    query.widget.attrs.update({"class": "autocomplete materialize-textarea",
                               "id": "icon_prefix2"})


@ -0,0 +1,120 @@
from django.core.management.base import BaseCommand
from speeches.models import Protocol, Speech
from speakers.models import Speaker
from lxml import etree
import os
import fnmatch
import datetime
from tqdm import tqdm


class Command(BaseCommand):
    help = ("Adds protocols to the database using the django models"
            " syntax. Protocols will be added from the xml protocol files."
            " Input is a path pointing to all/multiple protocols in one"
            " directory with one level of subdirectories. First imports"
            " toc, attachments and metadata with the model Protocol."
            " Speeches are then imported with the model Speech and related"
            " to their protocol.")

    def add_arguments(self, parser):
        parser.add_argument("input_path",
                            type=str)

    def handle(self, *args, **options):
        path = options["input_path"]
        list_of_files = []
        # Walk with a separate name so the input path is not shadowed.
        for root, subdirs, files in os.walk(path):
            for name in files:
                if fnmatch.fnmatch(name, "*.xml"):
                    list_of_files.append(os.path.join(root, name))
        for file_path in tqdm(sorted(list_of_files), desc="Importing protocol data"):
            # self.stdout.write("Reading data from file: " + file_path)
            tree = etree.parse(file_path)
            protocol = Protocol()
            # The protocol ID is the file name without the ".xml" extension.
            protocol.protocol_id = os.path.basename(file_path)[:-4]
            # self.stdout.write("\tProtocol ID is: " + protocol.protocol_id)
            # self.stdout.write("\tReading toc and attachment.")
            protocol_period = tree.xpath("@wahlperiode")[0]
            protocol.protocol_period = protocol_period
            session_date = tree.xpath("//@date")[0]
            protocol.session_date_str = session_date
            session_date = datetime.datetime.strptime(session_date, "%d.%m.%Y")
            session_date = datetime.datetime.strftime(session_date, "%Y-%m-%d")
            protocol.session_date = session_date
            start_of_session = tree.xpath("//@sitzung-start-uhrzeit")[0]
            try:
                protocol.start_of_session = datetime.datetime.strptime(
                    start_of_session, "%H:%M").time()
            except ValueError:
                # Missing or malformed times are stored as NULL.
                protocol.start_of_session = None
            end_of_session = tree.xpath("//@sitzung-ende-uhrzeit")[0]
            try:
                protocol.end_of_session = datetime.datetime.strptime(
                    end_of_session, "%H:%M").time()
            except ValueError:
                protocol.end_of_session = None
            session_nr = tree.xpath("//sitzungsnr")[0]
            protocol.session_nr = session_nr.text
            election_period = tree.xpath("//wahlperiode")[0]
            protocol.election_period = election_period.text
            toc = tree.xpath("//inhaltsverzeichnis")[0]
            protocol.toc = toc.text
            attachment = tree.xpath("//anlagen")[0]
            protocol.attachment = attachment.text
            protocol.save()
            speeches = tree.xpath("//sitzungsbeginn | //rede")
            # Iterate over (previous, current, next) so each speech can store
            # the IDs of its neighbours; the ends are padded with None.
            for previous_e, current_e, next_e in zip([None]+speeches[:-1], speeches, speeches[1:]+[None]):
                # self.stdout.write("\tReading speech from " + protocol.protocol_id)
                speech = Speech()
                speech.foreign_protocol = protocol
                if(previous_e is not None):
                    previous_speech_id = previous_e.xpath("@id")[0]
                    speech.previous_speech_id = previous_speech_id
                speech_id = current_e.xpath("@id")[0]
                speech.speech_id = speech_id
                if(next_e is not None):
                    next_speech_id = next_e.xpath("@id")[0]
                    speech.next_speech_id = next_speech_id
                # self.stdout.write("\tSpeech ID is:" + str(speech.speech_id))
                # self.stdout.write("\tPrevious Speech ID is:" + str(speech.previous_speech_id))
                # self.stdout.write("\tNext Speech ID is:" + str(speech.next_speech_id))
                # Relative path: "//@typ" would always match the first typ
                # attribute of the whole document, not the current element's.
                speaker_type = current_e.xpath(".//@typ")[0]
                speech.speaker_type = speaker_type
                speaker_id = current_e.xpath(".//redner/@id")[0]
                # self.stdout.write("\tCurrent speaker ID is:" + str(speaker_id))
                if(speaker_id != "None"):
                    speech.foreign_speaker = Speaker.objects.filter(pk=speaker_id)[0]
                # self.stdout.write("\tSpeaker ID (Foreign key) is:" + str(speech.foreign_speaker))
                speech_content = current_e.xpath(".//p")
                # Note: etree.tostring() returns bytes, so str() stores the
                # b'...' representation of each paragraph.
                speech_content = [str(etree.tostring(p)) for p in speech_content]
                speech_content = "".join(speech_content)
                speech.speech_content = speech_content
                original_string = current_e.xpath(".//redner/name")[0]
                speech.original_string = original_string.tail
                # self.stdout.write("\t-------------------------------------------")
                speech.save()
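
The import loop above pairs every speech with its predecessor and successor by zipping three shifted copies of the same list. A minimal, self-contained sketch of that pattern follows; the helper name `with_neighbours` and the sample IDs in the usage note are illustrative assumptions, not part of the command.

```python
from typing import List, Optional, Tuple


def with_neighbours(ids: List[str]) -> List[Tuple[Optional[str], str, Optional[str]]]:
    """Return (previous, current, next) triples; both ends are padded with None."""
    previous_ids = [None] + ids[:-1]   # shifted right by one position
    next_ids = ids[1:] + [None]        # shifted left by one position
    return list(zip(previous_ids, ids, next_ids))
```

Called with three IDs such as `["r1", "r2", "r3"]`, the first triple has no predecessor and the last has no successor, which matches the `is not None` checks in the loop.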


44
app/speeches/models.py Executable file

@ -0,0 +1,44 @@
from django.db import models


class Protocol(models.Model):
    """
    This model contains the data about one protocol. Data will be imported from
    the XML protocols via the custom django-admin command import_protocols.py.
    Does not contain speeches. Speeches will be related to this model though.
    Only contains table of contents, metadata etc.
    """
    protocol_id = models.IntegerField(primary_key=True, verbose_name="Protokoll ID")
    protocol_period = models.IntegerField(verbose_name="Wahlperiode", null=True, blank=True)
    session_nr = models.IntegerField(verbose_name="Sitzungsnummer", null=True, blank=True)
    session_date = models.DateField(verbose_name="Datum")
    session_date_str = models.CharField(verbose_name="Datums String", max_length=12, blank=True, default=None, null=True)
    start_of_session = models.TimeField(null=True, verbose_name="Startuhrzeit")
    end_of_session = models.TimeField(null=True, verbose_name="Enduhrzeit")
    toc = models.TextField(verbose_name="Inhaltsverzeichnis")
    attachment = models.TextField(verbose_name="Anlagen")

    def __str__(self):
        return str(self.protocol_id) + " " + str(self.session_date)


class Speech(models.Model):
    """
    This model contains the data about one speech. Data will be imported from
    the XML protocols via the custom django-admin command import_speeches.py.
    """
    foreign_protocol = models.ForeignKey("Protocol", on_delete=models.CASCADE,
                                         verbose_name="Foreign Protokoll",
                                         default=None)
    speech_id = models.CharField(verbose_name="Rede ID", primary_key=True, max_length=14)
    previous_speech_id = models.CharField(verbose_name="Vorherige Rede ID", max_length=14, blank=True, default=None, null=True)
    next_speech_id = models.CharField(verbose_name="Nächste Rede ID", max_length=14, blank=True, default=None, null=True)
    speaker_type = models.CharField(verbose_name="Rolle des MdBs", max_length=50)
    foreign_speaker = models.ForeignKey("speakers.Speaker", on_delete=models.CASCADE,
                                        null=True, blank=True, verbose_name="MdB ID", )
    speech_content = models.TextField(verbose_name="Redeinhalt")  # import as XML element to string
    original_string = models.TextField(verbose_name="Original String")

    def __str__(self):
        return (str(self.foreign_protocol) + " " + str(self.speech_id) + " "
                + self.speech_content[:20])
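
`start_of_session` and `end_of_session` are declared with `null=True` because the import command stores no time when the value in the XML cannot be parsed as `HH:MM`. A minimal sketch of that fallback using only the standard library is shown below; `parse_session_time` is a hypothetical helper name chosen for illustration and is not part of this commit.

```python
import datetime
from typing import Optional


def parse_session_time(raw: str) -> Optional[datetime.time]:
    """Parse an "HH:MM" string; return None when the value does not match."""
    try:
        return datetime.datetime.strptime(raw, "%H:%M").time()
    except ValueError:
        # Anything that is not a valid HH:MM time is treated as unknown.
        return None
```

The returned value can be assigned directly to a nullable `TimeField`, so unparseable times end up as `NULL` in the database instead of aborting the import.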

53
app/speeches/tables.py Executable file

@ -0,0 +1,53 @@
import django_tables2 as tables
from .models import Speech, Protocol
from django_tables2.utils import A  # alias for Accessor


class SpeechTable(tables.Table):
    """
    Configures the table showing all speeches. Inserts a column with links to
    the speeches. Also defines all shown columns.
    """
    # Adds a column with a link to the speech.
    link = tables.LinkColumn("Rede", text="Rede", args=[A("speech_id")],
                             orderable=False,
                             attrs={"a": {"class": "waves-effect waves-light btn light-green darken-3"}})

    class Meta:
        model = Speech
        fields = ("speech_id", "foreign_protocol.protocol_id",
                  "foreign_protocol.session_date", "foreign_speaker.id",
                  "foreign_speaker.first_name", "foreign_speaker.last_name")
        template_name = "speeches/table.html"
        empty_text = ("Für den eingegebenen Suchbegriff gibt es leider keine Ergebnisse.")


class SpeakerSpeechTable(tables.Table):
    """
    Configures the table showing all speeches of one speaker in their profile.
    Inserts a column with links to the speeches. Also defines all shown columns.
    """
    # Adds a column with a link to the speech.
    link = tables.LinkColumn("Rede", text="Rede", args=[A("speech_id")],
                             orderable=False,
                             attrs={"a": {"class": "waves-effect waves-light btn light-green darken-3"}})

    class Meta:
        model = Speech
        fields = ("speech_id", "foreign_protocol.protocol_id", "foreign_protocol.session_date")
        template_name = "speeches/table.html"
        empty_text = ("Für den eingegebenen Suchbegriff gibt es leider keine Ergebnisse.")


class ProtocolTable(tables.Table):
    """
    Configures the table showing all protocols.
    Inserts a column with links to the protocols. Also defines all shown columns.
    """
    # Adds a column with a link to the protocol.
    link = tables.LinkColumn("Protokoll", text="Protokoll", args=[A("protocol_id")],
                             orderable=False,
                             attrs={"a": {"class": "waves-effect waves-light btn light-green darken-3"}})

    class Meta:
        model = Protocol
        fields = ("protocol_id", "session_date", "protocol_period")
        template_name = "speeches/table.html"
        empty_text = ("Für den eingegebenen Suchbegriff gibt es leider keine Ergebnisse.")


@ -0,0 +1,84 @@
{% extends "blog/base.html" %}
{% load render_table from django_tables2 %}
{% block content %}
<div class="container">
<div class="row">
<div class="col s12 m4">
<div class="card">
<div class="card-content">
<span class="card-title center-align">
Protokoll:<br />{{current_protocol.protocol_id}}</span>
<p class="center-align"><i class="large material-icons blue-grey-text darken-4">insert_drive_file</i></p>
<span class="card-title">Metadaten</span>
<ul>
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">date_range</i>Wahlperiode: {{current_protocol.protocol_period}}</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">date_range</i>Sitzungsnummer: {{current_protocol.session_nr}}</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">date_range</i>Datum: {{current_protocol.session_date}}</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">access_time</i>Startuhrzeit der Sitzung: {{current_protocol.start_of_session}} Uhr</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">av_timer</i>Enduhrzeit der Sitzung: {{current_protocol.end_of_session}} Uhr</li>
<br />
</ul>
</div>
</div>
<ul class="collapsible hoverable white">
<li>
<div class="collapsible-header"><i class="material-icons left blue-grey-text darken-4">account_circle</i>MdBs dieser Rede</div>
<div class="collapsible-body">
<ul>
{% for speaker in speakers %}
<li>
<a href="/mdbs/mdb/{{speaker.id}}">{{speaker.id}}: {{speaker.last_name}}, {{speaker.first_name}}</a>
</li>
<br />
{% endfor %}
</ul>
</div>
</li>
<li>
<div class="collapsible-header"><i class="material-icons left blue-grey-text darken-4">toc</i>Inhaltsverzeichnis</div>
<div class="collapsible-body"><span>{{current_protocol.toc}}</span></div>
</li>
{% if current_protocol.attachment %}
<li>
<div class="collapsible-header"><i class="material-icons left blue-grey-text darken-4">folder</i>Anlagen</div>
<div class="collapsible-body"><span>{{current_protocol.attachment}}</span></div>
</li>
{% endif %}
</ul>
</div>
<div class="col s12 m8">
<div class="card">
<div class="card-content">
<span class="card-title">Gesamtes Protokoll</span>
{% for speaker, speech, related_speech in speaker_speech_html %}
{% autoescape off%}
{% if speaker.id %}
<h5><a href="/mdbs/mdb/{{speaker.id}}">{% if speaker.title %}
{{speaker.title}}
{% endif %}
{% if speaker.nobility %}
{{speaker.nobility}}
{% endif %}
{{speaker.first_name}}
{% if speaker.name_prefix %}
{{speaker.name_prefix}}
{% endif %}
{{speaker.last_name}}:</a></h5>
{% else %}
<span class="card-title">Rede von: Unbekannt<a class="tooltipped" data-position="bottom" data-tooltip="Dieses Mitglied des Bundestags konnte leider nicht automatisch erkannt werden. Grundlegende Infos zu Namen etc. können beim zugehörigen Redeeintrag unter dem Punkt Original String gefunden werden."><i
class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">info_outline</i></a></span>
{% endif %}
<h6><a href="/protokolle/rede/{{related_speech.speech_id}}">Rede ID: {{related_speech.speech_id}}</a></h6>
{{speech}}
{% endautoescape %}
{% endfor%}
</div>
</div>
</div>
</div>
</div>
{% endblock content %}


@ -0,0 +1,31 @@
{% extends "blog/base.html" %}
{% load render_table from django_tables2 %}
{% block content %}
<div class="container">
<div class="row">
<div class="col s12">
<div class="card">
<div class="card-content">
<span class="card-title">Alle Protokolle des Bundestags</span>
<p>Liste aller Bundestagsplenarprotokolle von der ersten bis zur 18. Wahlperiode. Eine Volltextsuche ist zurzeit noch nicht implementiert.</p>
<div class="row">
<form method="GET" class="col l4 offset-l8 m6 offset-m6 s12">
<div class="row">
<div class="input-field">
<i class="material-icons prefix">search</i>
{{form}}
</div>
</div>
</form>
</div>
<div style="overflow-x:auto;">
{% render_table table %}
</div>
</div>
</div>
</div>
</div>
</div>
{% endblock content %}


@ -0,0 +1,144 @@
{% extends "blog/base.html" %}
{% load render_table from django_tables2 %}
{% block content %}
<div class="container">
<div class="row">
<div class="col s12 m4">
<div class="card">
<div class="card-content">
<span class="card-title center-align">
Rede:<br />{{current_speech.speech_id}}</span>
<p class="center-align"><i class="large material-icons blue-grey-text darken-4">insert_comment</i></p>
<span class="card-title">Metadaten</span>
<ul>
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">insert_drive_file</i><a href="/protokolle/protokoll/{{current_speech.foreign_protocol.protocol_id}}">Aus Protokoll:
{{current_speech.foreign_protocol.protocol_id}}</a></li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">date_range</i>Datum: {{current_speech.foreign_protocol.session_date}}</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">access_time</i>Startuhrzeit der Sitzung: {{current_speech.foreign_protocol.start_of_session}} Uhr</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">av_timer</i>Enduhrzeit der Sitzung: {{current_speech.foreign_protocol.end_of_session}} Uhr</li>
<br />
{% if current_speech.foreign_speaker.id %}
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">fingerprint</i><a href="/mdbs/mdb/{{current_speech.foreign_speaker.id}}">Redner ID: {{current_speech.foreign_speaker.id}}</a></li>
{%else%}
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">fingerprint</i>Redner ID: Nicht erkannt {{current_speech.foreign_speaker.id}}</li>
{% endif %}
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">perm_identity</i>Rednertyp: {{current_speech.speaker_type}}</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">short_text</i>Original String: {{current_speech.original_string}}<a class="tooltipped" data-position="bottom" data-tooltip="Dies ist die Zeichenfolge, mit der der aktuelle Redner oder die aktuelle Rednerin im original Protokoll vor ihrem Redebeitrag genannt wurde. Passt dieser nicht zum Namen, der über der Rede steht, ist leider etwas bei der automatischen Erkennung schiefgegangen. Steht kein Name über der Rede, wurde der Redner nicht automatisch erkannt."><i
class="material-icons blue-grey-text darken-4">info_outline</i></a></li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">record_voice_over</i>Unterbrechungen/Zurufe: {{interruptions}}</li>
<br />
<li><i class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">subject</i>Länge: {{words}} Wörter</li>
</ul>
</div>
</div>
<ul class="collapsible hoverable">
<li>
<div class="collapsible-header"><i class="material-icons left blue-grey-text darken-4">sort_by_alpha</i>Vokabular</div>
<div class="collapsible-body white"><span><b>Vokabeln: {{unique_words}}</b><br /><ol>{{ vocabulary|safe }}</ol></span></div>
</li>
<li>
<div class="collapsible-header"><i class="material-icons left blue-grey-text darken-4">toc</i>Inhaltsverzeichnis</div>
<div class="collapsible-body white"><span>{{current_speech.foreign_protocol.toc}}</span></div>
</li>
{% if current_speech.foreign_protocol.attachment %}
<li>
<div class="collapsible-header"><i class="material-icons left blue-grey-text darken-4">folder</i>Anlagen</div>
<div class="collapsible-body white"><span>{{current_speech.foreign_protocol.attachment}}</span></div>
</li>
{% endif %}
</ul>
</div>
<div class="col s12 m8">
{% if previous_speech.speech_content %}
<ul class="collapsible hoverable white">
<li>
<div class="collapsible-header"><i class="large material-icons blue-grey-text darken-4">insert_comment</i>Vorherige Rede als Kontext</div>
<div class="collapsible-body">
<h6>{% if previous_speech.foreign_speaker.id %}
Rede von {% if previous_speech.foreign_speaker.title %}
{{previous_speech.foreign_speaker.title}}
{% endif %}
{% if previous_speech.foreign_speaker.nobility %}
{{previous_speech.foreign_speaker.nobility}}
{% endif %}
{{previous_speech.foreign_speaker.first_name}}
{% if previous_speech.foreign_speaker.name_prefix %}
{{previous_speech.foreign_speaker.name_prefix}}
{% endif %}
{{previous_speech.foreign_speaker.last_name}}
({{previous_speech.foreign_speaker.party}})
{% else %}
<span class="card-title">Rede von: Unbekannt<a class="tooltipped" data-position="bottom" data-tooltip="Dieses Mitglied des Bundestags konnte leider nicht automatisch erkannt werden. Grundlegende Infos zu Namen etc. können beim zugehörigen Redeeintrag unter dem Punkt Original String gefunden werden."><i
class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">info_outline</i></a></span>
{% endif %}
</h6>
<span>{% autoescape off %}{{previous_speech_html}}{% endautoescape %}</span>
</div>
</li>
</ul>
{% endif %}
<div class="card">
<div class="card-content">
{% if previous_speech %}
<div class="center-align"><a href="/protokolle/rede/{{previous_speech.speech_id}}" class="waves-effect waves-light light-green darken-3 btn"><i class="material-icons left">arrow_upward</i>Zur Rede davor</a></div>
<br />
{% endif %}
{% if current_speech.foreign_speaker.id%}
<span class="card-title">Rede von {% if current_speech.foreign_speaker.title %}
{{current_speech.foreign_speaker.title}}
{% endif %}
{% if current_speech.foreign_speaker.nobility %}
{{current_speech.foreign_speaker.nobility}}
{% endif %}
{{current_speech.foreign_speaker.first_name}}
{% if current_speech.foreign_speaker.name_prefix %}
{{current_speech.foreign_speaker.name_prefix}}
{% endif %}
{{current_speech.foreign_speaker.last_name}}
({{current_speech.foreign_speaker.party}})</span>
{% else %}
<span class="card-title">Rede von: Unbekannt<a class="tooltipped" data-position="bottom" data-tooltip="Dieses Mitglied des Bundestags konnte leider nicht automatisch erkannt werden. Grundlegende Infos zu Namen etc. können links unter 'Original String' gelesen werden."><i
class="material-icons blue-grey-text darken-4" style="margin-right: 10px;">info_outline</i></a></span>
{% endif %}
{% autoescape off %}
{{current_speech_html}}
{% endautoescape %}
{% if next_speech %}
<div class="center-align"><br /><a href="/protokolle/rede/{{next_speech.speech_id}}" class="waves-effect waves-light light-green darken-3 btn"><i class="material-icons left">arrow_downward</i>Zur Rede danach</a></div>
{% endif %}
</div>
</div>
{% if next_speech.speech_content %}
<ul class="collapsible hoverable white">
<li>
<div class="collapsible-header"><i class="large material-icons blue-grey-text darken-4">insert_comment</i>Nächste Rede als Kontext</div>
<div class="collapsible-body">
<h6>Rede von {% if next_speech.foreign_speaker.title %}
{{next_speech.foreign_speaker.title}}
{% endif %}
{% if next_speech.foreign_speaker.nobility %}
{{next_speech.foreign_speaker.nobility}}
{% endif %}
{{next_speech.foreign_speaker.first_name}}
{% if next_speech.foreign_speaker.name_prefix %}
{{next_speech.foreign_speaker.name_prefix}}
{% endif %}
{{next_speech.foreign_speaker.last_name}}
({{next_speech.foreign_speaker.party}})</h6>
<span>{% autoescape off %}{{next_speech_html}}{% endautoescape %}</span>
</div>
</li>
</ul>
{% endif %}
</div>
</div>
</div>
{% endblock content %}


@ -0,0 +1,31 @@
{% extends "blog/base.html" %}
{% load render_table from django_tables2 %}
{% block content %}
<div class="container">
<div class="row">
<div class="col s12">
<div class="card">
<div class="card-content">
<span class="card-title">Reden und Redebeiträge aller Mitglieder des Bundestags seit 1949 bis 2017</span>
<p>Hier ist eine Liste aller Reden, die Mitglieder des Bundestags gehalten haben. Eine Volltextsuche ist zurzeit noch nicht implementiert.</p>
<div class="row">
<form method="GET" class="col l4 offset-l8 m6 offset-m6 s12">
<div class="row">
<div class="input-field">
<i class="material-icons prefix">search</i>
{{form}}
</div>
</div>
</form>
</div>
<div style="overflow-x:auto;">
{% render_table table %}
</div>
</div>
</div>
</div>
</div>
</div>
{% endblock content %}


@ -0,0 +1,95 @@
{% load django_tables2 %}
{% load i18n %}
{% block table-wrapper %}
{% block table %}
<table {% render_attrs table.attrs %} class="highlight">
{% block table.thead %}
{% if table.show_header %}
<thead {{ table.attrs.thead.as_html }}>
<tr>
{% for column in table.columns %}
<th {{ column.attrs.th.as_html }}>
{% if column.orderable %}
<a href="{% querystring table.prefixed_order_by_field=column.order_by_alias.next %}"><i class="material-icons ">sort</i> {{ column.header }}</a>
{% else %}
{{ column.header }}
{% endif %}
</th>
{% endfor %}
</tr>
</thead>
{% endif %}
{% endblock table.thead %}
{% block table.tbody %}
<tbody {{ table.attrs.tbody.as_html }}>
{% for row in table.paginated_rows %}
{% block table.tbody.row %}
<tr {{ row.attrs.as_html }}>
{% for column, cell in row.items %}
<td {{ column.attrs.td.as_html }}>{% if column.localize == None %}{{ cell }}{% else %}{% if column.localize %}{{ cell|localize }}{% else %}{{ cell|unlocalize }}{% endif %}{% endif %}</td>
{% endfor %}
</tr>
{% endblock table.tbody.row %}
{% empty %}
{% if table.empty_text %}
{% block table.tbody.empty_text %}
<tr><td colspan="{{ table.columns|length }}">{{ table.empty_text }}</td></tr>
{% endblock table.tbody.empty_text %}
{% endif %}
{% endfor %}
</tbody>
{% endblock table.tbody %}
{% block table.tfoot %}
{% if table.has_footer %}
<tfoot {{ table.attrs.tfoot.as_html }}>
<tr>
{% for column in table.columns %}
<td {{ column.attrs.tf.as_html }}>{{ column.footer }}</td>
{% endfor %}
</tr>
</tfoot>
{% endif %}
{% endblock table.tfoot %}
</table>
{% endblock table %}
{% block pagination %}
{% if table.page and table.paginator.num_pages > 1 %}
<ul class="pagination">
{% if table.page.has_previous %}
{% block pagination.previous %}
<li class="previous waves-effect">
<a href="{% querystring table.prefixed_page_field=table.page.previous_page_number %}">
{% trans '<i class="material-icons">chevron_left</i>' %}
</a>
</li>
{% endblock pagination.previous %}
{% endif %}
{% if table.page.has_previous or table.page.has_next %}
{% block pagination.range %}
{% for p in table.page|table_page_range:table.paginator %}
<li class="waves-effect{% if p == table.page.number %} active light-green darken-3{% endif %}">
{% if p == '...' %}
<a href="#">{{ p }}</a>
{% else %}
<a href="{% querystring table.prefixed_page_field=p %}">
{{ p }}
</a>
{% endif %}
</li>
{% endfor %}
{% endblock pagination.range %}
{% endif %}
{% if table.page.has_next %}
{% block pagination.next %}
<li class="next waves-effect">
<a href="{% querystring table.prefixed_page_field=table.page.next_page_number %}">
{% trans '<i class="material-icons">chevron_right</i>' %}
</a>
</li>
{% endblock pagination.next %}
{% endif %}
</ul>
{% endif %}
{% endblock pagination %}
{% endblock table-wrapper %}

3
app/speeches/tests.py Executable file

@ -0,0 +1,3 @@
from django.test import TestCase
# Create your tests here.

9
app/speeches/urls.py Executable file

@ -0,0 +1,9 @@
from django.urls import path
from . import views
urlpatterns = [
path("reden/", views.speeches, name="Reden"),
path("liste-protokolle/", views.protocols, name="Protokoll-list"),
path("protokoll/<int:protocol_id>", views.protocol, name="Protokoll"),
path("rede/<str:speech_id>", views.speech, name="Rede")
]

37
app/speeches/utils.py Executable file

@ -0,0 +1,37 @@
import re

from lxml import etree


def create_html_speech(speech_content_xml_string):
    """
    Converts the XML speech content into styled HTML. Also returns the raw
    paragraph text (used for the word count and the vocabulary) and the
    number of interruptions (kommentar elements).
    """
    speech_html = "<div>" + speech_content_xml_string + "</div>"
    speech_html = etree.fromstring(speech_html)
    raw_text = []
    interruptions = 0
    for element in speech_html.iter():
        if element.tag == "p":
            raw_text.append(element.text)
            element.tag = "span"
            element.attrib["class"] = "line"
            element.attrib.pop("klasse", None)
        elif element.tag == "kommentar":
            interruptions += 1
            element.tag = "span"
            element.attrib["class"] = "comment"
            element.attrib.pop("klasse", None)
        elif element.tag == "metadata":
            element.tag = "blockquote"
            element.attrib["class"] = "metadata"
            element.attrib.pop("klasse", None)
            # The literal "\n" marker is turned into <br/> further below.
            element.text = "Metadaten/Kopfzeile:" + "\\n" + (element.text or "")
    # Skip paragraphs without text and join with a space so that words at
    # paragraph boundaries do not merge.
    raw_text = [text for text in raw_text if text is not None]
    raw_text = " ".join(raw_text)
    speech_html = etree.tostring(speech_html, pretty_print=True, encoding="unicode")
    # Clean up residue of byte-string representations: b' prefixes and
    # literal \n and \' sequences.
    speech_html = re.sub(r"b'", "", speech_html)
    speech_html = re.sub(r"\\n\s+\'", "<br/>", speech_html)
    speech_html = re.sub(r"\\n", "<br/>", speech_html)
    speech_html = re.sub(r"\\'", "'", speech_html)
    return speech_html, raw_text, interruptions
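
The tag rewriting done by `create_html_speech` can be tried out without a database or lxml. The following is only a minimal sketch, assuming the standard library's `xml.etree.ElementTree` as a stand-in for lxml; `TAG_MAP`, `convert_speech` and the sample fragment are hypothetical names and data that are not part of this repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source tags to (HTML tag, CSS class).
TAG_MAP = {
    "p": ("span", "line"),
    "kommentar": ("span", "comment"),
    "metadata": ("blockquote", "metadata"),
}


def convert_speech(xml_fragment):
    """Return (html, raw_text, interruptions) for an XML speech fragment."""
    # Wrap the fragment so that it has a single root element.
    root = ET.fromstring("<div>" + xml_fragment + "</div>")
    paragraphs = []
    interruptions = 0
    for element in root.iter():
        if element.tag not in TAG_MAP:
            continue
        if element.tag == "p" and element.text:
            paragraphs.append(element.text.strip())
        elif element.tag == "kommentar":
            interruptions += 1
        # Look up the replacement first, then rename the element.
        element.tag, css_class = TAG_MAP[element.tag]
        element.attrib.pop("klasse", None)
        element.set("class", css_class)
    html = ET.tostring(root, encoding="unicode")
    return html, " ".join(paragraphs), interruptions


# Made-up sample fragment, not taken from the real data set.
sample = '<p klasse="J">Guten Tag.</p><kommentar>(Beifall)</kommentar><p>Danke.</p>'
html, raw_text, interruptions = convert_speech(sample)
```

Keeping the tag-to-class mapping in one dictionary is a design choice of this sketch; the module above uses explicit `if`/`elif` branches instead.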

109
app/speeches/views.py Executable file

@ -0,0 +1,109 @@
from django.shortcuts import render
from django_tables2 import RequestConfig
from .models import Speech, Protocol
from .tables import SpeechTable, ProtocolTable
from django.http import Http404
from .utils import create_html_speech
from .forms import SearchForm, SearchFormSpeech
from watson import search as watson
from collections import Counter


def speech(request, speech_id):
    """Shows a single speech with its neighbouring speeches as context."""
    try:
        current_speech = Speech.objects.get(pk=speech_id)
        if current_speech.previous_speech_id is not None:
            previous_speech = Speech.objects.get(pk=current_speech.previous_speech_id)
            previous_speech_html = create_html_speech(previous_speech.speech_content)[0]
        else:
            previous_speech = None
            previous_speech_html = None
        if current_speech.next_speech_id is not None:
            next_speech = Speech.objects.get(pk=current_speech.next_speech_id)
            next_speech_html = create_html_speech(next_speech.speech_content)[0]
        else:
            next_speech = None
            next_speech_html = None
        current_speech_html, raw_text, interruptions = create_html_speech(current_speech.speech_content)
        # Word frequencies, most frequent first, rendered as list items.
        vocabulary = Counter(raw_text.split()).most_common()
        unique_words = len(vocabulary)
        tmp_str = []
        for word, count in vocabulary:
            tmp_str.append("<li>{}: {}</li>".format(word, count))
        vocabulary = "".join(tmp_str)
    except Speech.DoesNotExist:
        raise Http404("Speech does not exist")
    context = {"title": "Rede " + current_speech.speech_id,
               "current_speech": current_speech,
               "current_speech_html": current_speech_html,
               "previous_speech_html": previous_speech_html,
               "next_speech_html": next_speech_html,
               "previous_speech": previous_speech,
               "next_speech": next_speech,
               "interruptions": interruptions,
               "words": len(raw_text.split()),
               "vocabulary": vocabulary,
               "unique_words": unique_words}
    return render(request, "speeches/speech.html", context)


def speeches(request):
    """Lists all speeches or, for a valid search query, the search results."""
    if request.method == "GET":
        form = SearchFormSpeech(request.GET)
        if form.is_valid():
            query = form.cleaned_data["query"]
            search_results = watson.filter(Speech, query)
            table = SpeechTable(search_results)
            RequestConfig(request, paginate={'per_page': 20}).configure(table)
            context = {"title": "Suchergebnisse für " + query,
                       "form": form, "table": table}
            return render(request, "speeches/speeches.html", context)
    # No valid query: show the unfiltered list with an empty search form.
    form = SearchFormSpeech()
    table = SpeechTable(Speech.objects.all().order_by("speech_id"))
    RequestConfig(request, paginate={'per_page': 20}).configure(table)
    context = {"title": "Suche", "table": table, "form": form}
    return render(request, "speeches/speeches.html", context)


def protocol(request, protocol_id):
    """Shows one protocol together with all of its speeches."""
    try:
        current_protocol = Protocol.objects.get(pk=protocol_id)
        related_speeches = Speech.objects.filter(foreign_protocol=protocol_id).order_by("speech_id")
        speakers = []
        speeches_html = []
        for related_speech in related_speeches:
            speakers.append(related_speech.foreign_speaker)
            speech_html = create_html_speech(related_speech.speech_content)[0]
            speeches_html.append(speech_html)
        speaker_speech_html = zip(speakers, speeches_html, related_speeches)
    except Protocol.DoesNotExist:
        raise Http404("Protocol does not exist")
    context = {"title": ("Protokoll " + str(current_protocol.protocol_id)),
               "current_protocol": current_protocol,
               "related_speeches": related_speeches,
               "speeches_html": speeches_html,
               "speakers": set(speakers),
               "speaker_speech_html": speaker_speech_html}
    return render(request, "speeches/protocol.html", context)


def protocols(request):
    """Lists all protocols or, for a valid search query, the search results."""
    if request.method == "GET":
        form = SearchForm(request.GET)
        if form.is_valid():
            query = form.cleaned_data["query"]
            search_results = watson.filter(Protocol, query)
            table = ProtocolTable(search_results)
            RequestConfig(request, paginate={'per_page': 20}).configure(table)
            context = {"title": "Suchergebnisse für " + query,
                       "form": form, "table": table}
            return render(request, "speeches/protocols.html", context)
    # No valid query: show the unfiltered list with an empty search form.
    form = SearchForm()
    table = ProtocolTable(Protocol.objects.all().order_by("session_date"))
    RequestConfig(request, paginate={'per_page': 20}).configure(table)
    context = {"title": "Suche", "table": table, "form": form}
    return render(request, "speeches/protocols.html", context)
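
The word statistics that the `speech` view computes (word count, number of unique words, frequency list) can be checked in isolation. This is a rough sketch using only the standard library; `vocabulary_stats` is a hypothetical helper, and the `html.escape` call is an addition that the view above does not make.

```python
from collections import Counter
from html import escape


def vocabulary_stats(raw_text):
    """Return word count, unique word count and an HTML frequency list."""
    tokens = raw_text.split()
    # most_common() sorts by frequency, most frequent word first.
    counts = Counter(tokens).most_common()
    items = "".join(
        "<li>{}: {}</li>".format(escape(word), count) for word, count in counts
    )
    return {"words": len(tokens), "unique_words": len(counts), "vocabulary": items}


# Made-up sample text; counting is case-sensitive, as in the view.
stats = vocabulary_stats("Wir danken und wir danken <allen>")
```

Because the counting is case-sensitive, "Wir" and "wir" are treated as two different words here, which matches what `Counter(raw_text.split())` does in the view.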

9620
app/utils/classes.txt Executable file

File diff suppressed because it is too large


@ -0,0 +1,36 @@
"""
Small script creating the models for the N-Gram Viewer. The models hold all
the different n-gram data.
"""

corpus_type_list = ["lm_ns_year", "tk_ws_year", "lm_ns_speaker", "tk_ws_speaker"]
# One model per first character: the digits 0-9, the letters A-Z and one
# catch-all bucket.
sort_key_list = ([i for i in range(10)]
                 + "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z".split()
                 + ["_Non_ASCII"])
ngram_kinds = ["One", "Two", "Three", "Four", "Five"]

template_class = """
class Key{}_{}Gram_{}(models.Model):
    ngram = models.CharField(verbose_name='{}Gram',
                             max_length=255,
                             default=None,
                             null=True,
                             blank=True)
    key = models.CharField(max_length=255)
    count = models.IntegerField()
    def __str__(self):
        return str(self.ngram) + " " + str(self.key)
"""

classes = []
for corpus_type in corpus_type_list:
    for ngram_kind in ngram_kinds:
        for key in sort_key_list:
            cls = template_class.format(key, ngram_kind, corpus_type,
                                        ngram_kind)
            classes.append(cls)

with open("classes.txt", "w") as file:
    for cls in classes:
        file.write("{}\n".format(cls))
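
The naming scheme `Key{}_{}Gram_{}` produced by the script above suggests that each n-gram is stored in a model chosen by its first character. The following is only a sketch of such a lookup under that assumption; `sort_key` and `model_name` are hypothetical helpers that do not appear in this commit, and the rule of upper-casing the first character is a guess.

```python
import string

NGRAM_KINDS = ["One", "Two", "Three", "Four", "Five"]
# Digits and upper-case ASCII letters each get a model of their own.
ASCII_KEYS = set(string.digits + string.ascii_uppercase)


def sort_key(ngram):
    """Return the bucket for an n-gram, based on its first character."""
    first = ngram[:1].upper()
    return first if first in ASCII_KEYS else "_Non_ASCII"


def model_name(ngram, corpus_type):
    """Build the name of the generated model class for an n-gram."""
    n = len(ngram.split())
    if not 1 <= n <= len(NGRAM_KINDS):
        raise ValueError("only 1-grams to 5-grams are supported")
    return "Key{}_{}Gram_{}".format(sort_key(ngram), NGRAM_KINDS[n - 1], corpus_type)


name_letter = model_name("Bundestag", "lm_ns_year")
name_digit = model_name("1949 war", "tk_ws_year")
name_other = model_name("Änderung", "lm_ns_speaker")
```

With four corpus types, five n-gram sizes and 37 keys, the generator script writes 4 × 5 × 37 = 740 class definitions.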

35
docker-compose.yml Normal file

@ -0,0 +1,35 @@
version: '3.7'

services:
  web:
    build: ./app
    command: gunicorn bundesdata_app.wsgi:application --bind 0.0.0.0:8000
    volumes:
      - ./app/:/usr/src/app/
      - ./input_volume/:/usr/src/app/input_data
      - ./static_volume:/usr/src/app/staticfiles
    expose:
      - 8000
    depends_on:
      - db
  db:
    image: postgres:11.2
    environment:
      - POSTGRES_USER=postgresUser
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=bundesdataDB
    volumes:
      - ./postgres_data:/var/lib/postgresql/data/
  nginx:
    build: ./nginx
    volumes:
      - ./static_volume:/usr/src/app/staticfiles
    ports:
      - 8000:80
    depends_on:
      - web

volumes:
  postgres_data:
  static_volume:
  input_volume:

4
nginx/Dockerfile Normal file

@ -0,0 +1,4 @@
FROM nginx:1.15.8
RUN rm /etc/nginx/conf.d/default.conf
COPY nginx.conf /etc/nginx/conf.d

19
nginx/nginx.conf Normal file

@ -0,0 +1,19 @@
upstream bundesdata_app {
server web:8000;
}
server {
listen 80;
location / {
proxy_pass http://bundesdata_app;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $host;
proxy_redirect off;
}
location /staticfiles/ {
alias /usr/src/app/staticfiles/;
}
}