From 62b70a7532a0ce150264652a8acf474c3918a6f1 Mon Sep 17 00:00:00 2001
From: Patrick Jentsch
Splitting of a text into sentences and words. This is necessary for further processing.
+ Tokenization
+Your text is split up into sentences and words, so-called tokens, which can then be analyzed.
Reduction of a word's inflected forms to its base form.
+ Lemmatization
+All inflected forms of a word are grouped together so that they can be analyzed as a single item.
Context- and definition-based assignment of words and punctuation marks to parts of speech.
+ Part-of-speech Tagging
+In accordance with its definition and context, each word is marked up as corresponding to a particular part of speech.
Identification of words that describe an entity, such as company and person names.
+ Named-Entity Recognition
+Named entities are located and classified into specific categories like persons or locations.
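
The tokenization step described above can be illustrated with a minimal sketch. It is not the tokenizer nopaque uses (the patch does not show it); the helper names `split_sentences` and `tokenize` are hypothetical and the regular expressions are deliberately naive, for example they would split after abbreviations such as "e.g.".

```python
import re

# Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Assumption: a token is a run of word characters or one punctuation mark.
TOKEN = re.compile(r"\w+|[^\w\s]")


def split_sentences(text):
    """Split text after sentence-final punctuation followed by whitespace."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    """Return words and punctuation marks as separate tokens."""
    return TOKEN.findall(sentence)


text = "Your text is split up. Tokens can then be analyzed!"
sentences = [tokenize(s) for s in split_sentences(text)]
```

Here `sentences` holds one token list per sentence. Lemmatization, part-of-speech tagging and named-entity recognition would then operate on these tokens and typically require a trained language model rather than rules like these.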
If the input files were not created with the nopaque OCR service or you do not know whether your text files are UTF-8 encoded, enable this switch. We will then try to detect the correct encoding of your texts automatically before processing them.
-{{ add_job_form.check_encoding.label.text }}
+If the input files were not created with the nopaque OCR service or you do not know whether your text files are UTF-8 encoded, enable this switch. We will then try to detect the correct encoding of your texts automatically before processing them.
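
What such an encoding check might look like is sketched below. The patch does not show how nopaque determines the encoding, so this is only an assumption-laden illustration: the function `decode_text` and its fallback list are hypothetical, and a real service would more likely use a statistical detector than a fixed list of candidates.

```python
def decode_text(raw, fallbacks=("cp1252", "latin-1")):
    """Decode bytes as UTF-8 first, then try the fallback encodings in order.

    Returns the decoded text together with the encoding that succeeded.
    """
    for encoding in ("utf-8",) + tuple(fallbacks):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes, try the next
    raise ValueError("no candidate encoding could decode the input")


utf8_result = decode_text("Wörter".encode("utf-8"))
legacy_result = decode_text("Wörter".encode("cp1252"))
```

Trying UTF-8 first is a reasonable default because non-UTF-8 byte sequences usually fail to decode as UTF-8, whereas single-byte encodings such as latin-1 accept any input and therefore only make sense as a last resort.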
+
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut
-–
-{{ add_job_form.binarization.label.text }}
+Based on a brightness threshold, pixels are converted to either black or white. This reduces noise in images.
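
The thresholding idea behind binarization can be sketched in a few lines. This is an illustration under stated assumptions, not the OCR pipeline's implementation: the function `binarize`, the fixed threshold of 128 and the nested-list image format are all hypothetical, and production tools often pick the threshold adaptively.

```python
def binarize(gray, threshold=128):
    """Map each grayscale value (0-255) to 0 (black) or 255 (white).

    Pixels at or above the threshold become white, all others black.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


# A tiny 2x3 grayscale "image" as nested lists of brightness values.
image = [[12, 200, 130], [127, 128, 90]]
binary = binarize(image)
```

After the call, `binary` contains only the values 0 and 255, which is what removes faint background noise before text recognition.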
+Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut
-- -
-
Page range
+Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut
+Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut
-- -
-
Page rotation
+Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut
+Based on a brightness threshold, pixels are converted to either black or white. This is useful for reducing noise in images. (long duration)
-
Page split
+Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempora invidunt ut
+