From 62b70a7532a0ce150264652a8acf474c3918a6f1 Mon Sep 17 00:00:00 2001
From: Patrick Jentsch
Date: Thu, 27 Feb 2020 16:26:04 +0100
Subject: [PATCH] Update

---
 .../services/corpus_analysis.html.j2 | 20 ++-
 app/templates/services/nlp.html.j2 | 51 ++++---
 app/templates/services/ocr.html.j2 | 139 ++++++++----------
 3 files changed, 96 insertions(+), 114 deletions(-)

diff --git a/app/templates/services/corpus_analysis.html.j2 b/app/templates/services/corpus_analysis.html.j2
index 68a12ec2..9675c9bd 100644
--- a/app/templates/services/corpus_analysis.html.j2
+++ b/app/templates/services/corpus_analysis.html.j2
@@ -9,23 +9,21 @@
- layersTokenisierung -

Aufteilung eines Textes in Sätze und Wörter. Dies ist zur weiteren Verarbeitung notwendig.

+ layersTokenization +

Your text is split up into sentences and words, so-called tokens, which can then be analyzed.

-
 
- layersLemmatisierung -

Reduktion der Flexionsformen eines Wortes auf dessen Grundform.

+ layersLemmatization +

All inflected forms of a word are grouped together so that it can be analyzed as a single item.

-
 
+

 

- layersPart-of-speech-Tagging -

Kontext- und definitionsbezogene Zuordnung von Wörtern und Satzzeichen zu Wortarten.

+ layersPart-of-speech Tagging +

In accordance with its definition and context, each word is marked up as corresponding to a particular part of speech.

-
 
- layersEigennamenerkennung -

Identifikation von Wörtern, die eine Entitätbeschreiben, wie Firmen- und Personennamen.

+ layersNamed-Entity Recognition +

Named entities are located and classified into specific categories like persons or locations.

diff --git a/app/templates/services/nlp.html.j2 b/app/templates/services/nlp.html.j2
index e53ad862..41f27fa3 100644
--- a/app/templates/services/nlp.html.j2
+++ b/app/templates/services/nlp.html.j2
@@ -9,23 +9,21 @@
- layersTokenisierung -

Aufteilung eines Textes in Sätze und Wörter. Dies ist zur weiteren Verarbeitung notwendig.

+ layersTokenization +

Your text is split up into sentences and words, so-called tokens, which can then be analyzed.

-
 
- layersLemmatisierung -

Reduktion der Flexionsformen eines Wortes auf dessen Grundform.

+ layersLemmatization +

All inflected forms of a word are grouped together so that it can be analyzed as a single item.

-
 
+

 

- layersPart-of-speech-Tagging -

Kontext- und definitionsbezogene Zuordnung von Wörtern und Satzzeichen zu Wortarten.

+ layersPart-of-speech Tagging +

In accordance with its definition and context, each word is marked up as corresponding to a particular part of speech.

-
 
- layersEigennamenerkennung -

Identifikation von Wörtern, die eine Entitätbeschreiben, wie Firmen- und Personennamen.

+ layersNamed-Entity Recognition +

Named entities are located and classified into specific categories like persons or locations.

@@ -95,18 +93,27 @@ {% endfor %}
-
-
-
- Check Encoding -

If the input files are not created with the nopaque OCR service or you do not know if your text files are UTF-8 encoded, check this switch. We will try to automatically determine the right encoding for your texts to process them.

-
- -
+
+ Preprocessing options
+
+

{{ add_job_form.check_encoding.label.text }}

+

If the input files are not created with the nopaque OCR service or you do not know if your text files are UTF-8 encoded, check this switch. We will try to automatically determine the right encoding for your texts to process them.

+
+
+
+ +
+
+
diff --git a/app/templates/services/ocr.html.j2 b/app/templates/services/ocr.html.j2
index 8ae0834c..e064beda 100644
--- a/app/templates/services/ocr.html.j2
+++ b/app/templates/services/ocr.html.j2
@@ -93,95 +93,72 @@ {% endfor %}
-

 

-
-
-
- Page range (N.a.) -

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut

-
-
-
- - -
-
-
-

-
-
-
- - -
-
-
-
-
-
- -
-
+
+ Preprocessing options +
+
+

{{ add_job_form.binarization.label.text }}

+

Based on a brightness threshold, pixels are converted to either black or white. Reduces noise in images.

+
+
+
+
-
-
-
- Page split (N.a.) -

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut

-

- -

-
-
-
- -
-
+

 

+
+

 

+
+

Page range

+

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut

+
+
+
+
-
-
-
- Page rotation (N.a.) -

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut

-

- -

-
-
-
- -
-
+

 

+
+

 

+
+

Page rotation

+

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut

+
+
+
+
-
-
-
-
- Binarization -

Based on a brightness threshold pixels are converted to either black or white. It's usefull to reduce noise in images. (long duration)

-
-
-
- -
-
+

 

+
+

 

+
+

Page split

+

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut

+
+
+
+
+