Add author and title tags. Fix some errors in note.txt

Stephan Porada 2019-10-14 14:00:41 +02:00
parent 28cc17a479
commit ebff8fb7f1
2 changed files with 28 additions and 25 deletions


@@ -1,7 +1,9 @@
# Enter bash in cqpwebserver
-docker exec -it cqperver_cqpserver_1 /bin/bash
+docker exec -it cqpserver_cqpserver_1 /bin/bash
## Encode example corpus
mkdir /corpora/data/example
cwb-encode -d /corpora/data/example \
-f /root/files/example.vrt \
-R /usr/local/share/cwb/registry/example \
@@ -11,6 +13,7 @@ cwb-make -V EXAMPLE
cwb-describe-corpus EXAMPLE
## Encode utopien corpus
mkdir /corpora/data/utopien
cwb-encode -d /corpora/data/utopien \
-f /root/files/utopien.vrt \
-sxB \


@@ -1,6 +1,6 @@
<?xml version='1.0' encoding='UTF8'?>
<corpus>
-<text title="jack_london__the_iron_heel">
+<entry title="the_iron_heel" author="jack_london">
<s>
The the AT |Z5|
Project project NN1 |X7+|X2.4|
@@ -118560,7 +118560,7 @@ eBooks ebook NN2 |Q4.1/Y2|
. PUNC YSTP |PUNC|
</s>
</text>
-<text title="george_orwell__1984">
+<entry title="1984" author="george_orwell">
<s>
1984 1984 MC |N1|T1.2|T3|
By by II |Z5|
@@ -251365,7 +251365,7 @@ as as CSA |Z5|
393 393 MC |N1|T1.2|T3|
</s>
</text>
-<text title="eugen_richter__pictures_of_a_socialistic_future">
+<entry title="pictures_of_a_socialistic_future" author="eugen_richter">
<s>
PICTURES picture NN2 |C1|Q4.3|X4.1|
OF of IO |Z5|
@@ -296907,7 +296907,7 @@ ERNST ernst NP1 |Z99|
. PUNC YSTP |PUNC|
</s>
</text>
-<text title="ernest_callenbach__ecotopia">
+<entry title="ecotopia" author="ernest_callenbach">
<s>
Ernest ernest NP1 |Z1m|
gallenbach gallenbach JJ |Z99|
@@ -382648,7 +382648,7 @@ home home RL |H4|M6|
WILL will VM |T1.1.3|
</s>
</text>
-<text title="karin_boye__kallocain">
+<entry title="kallocain" author="karin_boye">
<s>
KALLOCAIN kallocain VV0 |Z99|
BY by II |Z5|
@@ -462032,7 +462032,7 @@ PAIPHO paipho NN1 |Z99|
Censor censor VV0 |Q4/S7.4-|
</s>
</text>
-<text title="anthony_burgess__clockwork_orange">
+<entry title="clockwork_orange" author="anthony_burgess">
<s>
SHEPHERD shepherd NP1 |F4/S2mf|
: PUNC YCOL |PUNC|
@@ -472856,7 +472856,7 @@ not not XX |Z6|
be be VBI |A3+|Z5|
</s>
</text>
-<text title="adam_sternberg__shovel_ready">
+<entry title="shovel_ready" author="adam_sternberg">
<s>
Every every AT1 |N5.1+|
human human JJ |S2mf|
@@ -479322,7 +479322,7 @@ curse curse VV0 |Q2.2|A1.4-|
Goddamn goddamn UH |Z99|
</s>
</text>
-<text title="thomas_moore__utopia">
+<entry title="utopia" author="thomas_moore">
<s>
The the AT |Z5|
Project project NN1 |X7+|X2.4|
@@ -534794,7 +534794,7 @@ eBooks ebook NN2 |Q4.1/Y2|
. PUNC YSTP |PUNC|
</s>
</text>
-<text title="william_gibson__neuromancer">
+<entry title="neuromancer" author="william_gibson">
<s>
Neuromancer neuromancer NP1 |Z99|
William william NP1 |Z1m|
@@ -651295,7 +651295,7 @@ why why RRQ |A2.2|
. PUNC YSTP |PUNC|
</s>
</text>
-<text title="mercier__memoirs_of_the_year_2500">
+<entry title="memoirs_of_the_year_2500" author="mercier">
<s>
Introduction introduction NN1 |T2+|Q4|S1.1.1|
1LITERARY 1literary FO |Z99|
@@ -720473,7 +720473,7 @@ and and CC |Z5|
even even RR |A13.1|
</s>
</text>
-<text title="henry_thomas__the_american">
+<entry title="the_american" author="henry_thomas">
<s>
The the AT |Z5|
Project project NN1 |X7+|X2.4|
@@ -906270,7 +906270,7 @@ eBooks ebook NN2 |Q4.1/Y2|
. PUNC YSTP |PUNC|
</s>
</text>
-<text title="michael_young__rise_of_the_meritocracy">
+<entry title="rise_of_the_meritocracy" author="michael_young">
<s>
The the AT |Z5|
courage courage NN1 |E5+/S1.2|
@@ -922466,7 +922466,7 @@ i0% i0% FO |Z99|
i i ZZ1 |Z5|
</s>
</text>
-<text title="ray_bradbury__fahrenheit_451">
+<entry title="fahrenheit_451" author="ray_bradbury">
<s>
FAHRENHEIT fahrenheit NP1 |Z99|
451 451 MC |N1|T1.2|T3|
@@ -988288,7 +988288,7 @@ THE the AT |Z5|
END end NN1 |M6|T2-|O2|
</s>
</text>
-<text title="edward_bellamy__looking_backward">
+<entry title="looking_backward" author="edward_bellamy">
<s>
The the AT |Z5|
Project project NN1 |X7+|X2.4|
@@ -1093742,7 +1093742,7 @@ eBooks ebook NN2 |Q4.1/Y2|
. PUNC YSTP |PUNC|
</s>
</text>
-<text title="isaac_asimov__i_robot">
+<entry title="i_robot" author="isaac_asimov">
<s>
I i PPIS1 |Z8mf|
, PUNC YCOM |PUNC|
@@ -1188344,7 +1188344,7 @@ end end NN1 |M6|T2-|O2|
---- ---- NN1 |Z99|
</s>
</text>
-<text title="philipp_k_dick__do_androids_dream_of_electric_sheep">
+<entry title="do_androids_dream_of_electric_sheep" author="philipp_k_dick">
<s>
Do do VD0 |A1.1.1|G2.2-|X9.2+|
Androids android NN2 |O3|
@@ -1279301,7 +1279301,7 @@ coffee coffee NN1 |F2|
. PUNC YSTP |PUNC|
</s>
</text>
-<text title="jewgnij_samjatin__we">
+<entry title="we" author="jewgnij_samjatin">
<s>
RECORD record VV0 |Q1.2|K3|
ONE one PN1 |Z8|
@@ -1364770,7 +1364770,7 @@ prevail prevail VVI |X9.2+|S7.1+|
. PUNC YSTP |PUNC|
</s>
</text>
-<text title="william_morris__news_from_nowhere">
+<entry title="news_from_nowhere" author="william_morris">
<s>
Project project NP1 |A10+|X7+|X2.6+|
Gutenberg gutenberg NP1 |Z99|
@@ -1464183,7 +1464183,7 @@ William william NP1 |Z1m|
Morris morris NP1 |Z1mf|
</s>
</text>
-<text title="samuel_butler__erewhon">
+<entry title="erewhon" author="samuel_butler">
<s>
The the AT |Z5|
Project project NN1 |X7+|X2.4|
@@ -1566447,7 +1566447,7 @@ eBooks ebook NN2 |Q4.1/Y2|
. PUNC YSTP |PUNC|
</s>
</text>
-<text title="francis_bacon__new_atlantis">
+<entry title="new_atlantis" author="francis_bacon">
<s>
The the AT |Z5|
Project project NN1 |X7+|X2.4|
@@ -1588246,7 +1588246,7 @@ Francis francis NP1 |Z1m|
Bacon bacon NP1 |F1|
</s>
</text>
-<text title="murray_leinster__a_logic_named_joe">
+<entry title="a_logic_named_joe" author="murray_leinster">
<s>
A a AT1 |Z5|
Logic logic NN1 |X2.1|S1.2.6+|N2|
@@ -1597349,7 +1597349,7 @@ hand hand NN1 |Z4|
maybe maybe RR |A7|
</s>
</text>
-<text title="mary_shelly__the_last_man">
+<entry title="the_last_man" author="mary_shelly">
<s>
The the AT |Z5|
Project project NN1 |X7+|X2.4|
@@ -1817595,7 +1817595,7 @@ LICENSE license NN1 |G1.1/Q1.2|S7.4+|
*** *** FO |Z99|
</s>
</text>
-<text title="kurt_vonnegut__player_piano">
+<entry title="player_piano" author="kurt_vonnegut">
<s>
CONSIDER consider VV0 |X2.1|X2.4|X6|
THE the AT |Z5|
@@ -1957008,7 +1957008,7 @@ March march NPM1 |T1.3|
. PUNC YSTP |PUNC|
</s>
</text>
-<text title="aldous_huxley__brave_new_world">
+<entry title="brave_new_world" author="aldous_huxley">
<s>
BRAVE brave JJ |E5+|
NEW new JJ |T3-|
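
Every hunk in the corpus file makes the same change. It replaces an opening tag `<text title="AUTHOR__TITLE">` with `<entry title="TITLE" author="AUTHOR">` and keeps the closing `</text>` tags as unchanged context. A rewrite like this can be done mechanically. The POSIX shell sketch below is not necessarily how this commit was produced. The function names `split_text_titles` and `list_entries` are hypothetical. The sketch also makes two assumptions: `__` separates author and title exactly once in each old title value, and the new tags keep the attribute order `title`, `author`.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of this commit.

# split_text_titles: rewrite opening tags of the old form
#   <text title="AUTHOR__TITLE">
# into the new form
#   <entry title="TITLE" author="AUTHOR">
# All other lines, including the closing </text> tags, pass through unchanged,
# mirroring the hunks above. Assumes "__" occurs exactly once per title value.
# Reads the files given as arguments, or stdin if there are none.
split_text_titles() {
  sed -E 's/^<text title="([^"]*)__([^"]*)">$/<entry title="\2" author="\1">/' "$@"
}

# list_entries: print "AUTHOR<TAB>TITLE" for every <entry> tag.
# Splitting on double quotes puts the title value in field 2 and the
# author value in field 4, assuming the attribute order title, author.
list_entries() {
  awk -F'"' '/^<entry title="/ { printf "%s\t%s\n", $4, $2 }' "$@"
}
```

For example, `split_text_titles old.vrt > new.vrt` followed by `list_entries new.vrt` prints one author/title pair per text, which makes a missing or malformed tag easy to spot. The file names are placeholders.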