more manual updates
		@@ -11,7 +11,7 @@
   <li><b>Image-to-text conversion tools:</b></li>
     <ol style="list-style-type:circle; margin-left:1em; padding-bottom:0;"><li><b>Optical Character Recognition</b> converts photos and 
     scans into text data, making them machine-readable.</li>
-    <li><b>Transkribus HTR (Handwritten Text Recognition) Pipeline (currently deactivated)* </b> 
+    <li><b>Transkribus HTR (Handwritten Text Recognition) Pipeline</b> (currently deactivated)* 
     also converts images into text data, making them machine-readable.</li>
     </ol>
   <li><b>Natural Language Processing</b> extracts information from your text via 
@@ -57,10 +57,21 @@ version* you want to use. When the job is finished, find and download the files
 <b>.json</b> and <b>.vrt</b> format under “Results.”</p>
 <h5 id="creating-a-corpus">Creating a corpus</h5>
 <p>Now, using the files in .vrt format, you can create a corpus. This can be done 
-in the Dashboard or Corpus Analysis under “My Corpora.” Click on “Create corpus” 
-and add a title and description for your corpus. After submitting, navigate down to 
-the “Corpus files” section. Once you have added the desired .vrt files, select “Build” 
-on the corpus page under “Actions.” Now, your corpus is ready for analysis.</p>
+in the <a href="{{ url_for('main.dashboard') }}">Dashboard</a> or 
+<a href="{{ url_for('services.corpus_analysis') }}">Corpus Analysis</a> sections under “My Corpora.” Click on “Create corpus” 
+and add a title and description for your corpus. After submitting, you will automatically 
+be taken to the corpus overview page (which can be called up again via the corpus lists) 
+of your new, still empty corpus. </p>
+<p>
+Further down in the “Corpus files” section, you can add texts in .vrt format 
+(results of the NLP service) to your new corpus. To do this, use the "Add Corpus File" 
+button and fill in the form that appears. Here, you can add 
+metadata to each text. Once all texts have been added to the corpus, it must 
+be prepared for analysis. This process can be initiated by clicking on the 
+"Build" button under "Actions". 
+On the corpus overview page, you can see information about the current status of 
+the corpus in the upper right corner. After the build process, the status "built" should be shown here.
+Now, your corpus is ready for analysis.</p>
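A corpus file in .vrt format (“verticalized text”, the input format of the Open Corpus Workbench) combines XML-like structural tags with one token per line, where tab-separated columns carry the token-level annotations. As a rough sketch only (the exact structural tags and annotation columns depend on the pipeline and model that produced the file; the word/lemma/part-of-speech layout shown here is an assumed example), such a file looks like this:

    <text id="example_text">
    <s>
    This        this        PRON
    is          be          AUX
    a           a           DET
    sample      sample      NOUN
    .           .           PUNCT
    </s>
    </text>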
 <h5 id="analyzing-a-corpus">Analyzing a corpus</h5>
 <p>Navigate to the corpus you would like to analyze and click on the Analyze button. 
 This will take you to an analysis overview page for your corpus. Here, you can find a 
@@ -16,7 +16,7 @@
       A <b>job</b> is an initiated file processing procedure. 
       A <b>model</b> is a mathematical system for pattern recognition based on data examples that have been processed by AI. One can search for jobs as 
       well as corpus listings using the search field displayed above them on the dashboard. 
-      Models can be found and edited by clicking on the corresponding service under <b>My Contributions</b>.      
+      Uploaded models can be found and edited by clicking on the corresponding service under <b>My Contributions</b>.      
     </p>
   </div>
   <div class="col s12"> </div>
@@ -61,12 +61,41 @@ deactivated. We are working on an alternative solution. In the meantime, you can
 try using Tesseract OCR, though the results will likely be poor.</p>
 
 <h4>Natural Language Processing (NLP)</h4>
-<p>Coming soon...</p>
+<p>The <a href="{{ url_for('services.spacy_nlp_pipeline') }}">SpaCy NLP Pipeline</a> extracts 
+information from plain text files (.txt format) via computational linguistic data processing 
+(tokenization, lemmatization, part-of-speech tagging and named-entity recognition). 
+To use this service, select the corresponding .txt file, the language model, and the 
+version you want to use. When the job is finished, find and download the files in 
+<b>.json</b> and <b>.vrt</b> format under “Results.”</p>
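For orientation, the sketch below shows roughly what these annotation steps produce when the spaCy library is used directly. It is only an illustration of the kind of output involved, not how the nopaque service is invoked (jobs are started from the service page as described above), and the model name is just an assumed example.

    import spacy

    # Example model name; in nopaque the language model is chosen in the job form instead.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("This sample sentence mentions Bielefeld University.")

    for token in doc:
        # tokenization, lemmatization and part-of-speech tagging
        print(token.text, token.lemma_, token.pos_)

    for ent in doc.ents:
        # named-entity recognition
        print(ent.text, ent.label_)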
 
 <h4>Corpus Analysis</h4>
 <p>
-  With the corpus analysis service, it is possible to create a text corpus
+  With the <a href="{{ url_for('services.corpus_analysis') }}">Corpus Analysis</a> 
+  service, it is possible to create a text corpus
   and then explore through it with analytical tools. The analysis session is realized
   on the server side by the Open Corpus Workbench software, which enables
-  efficient and complex searches with the help of the CQP Query Language.
+  efficient and complex searches with the help of the CQP Query Language.</p>
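To give a first idea of what CQP queries look like (the attribute names word, lemma and pos used below are typical for corpora built from the .vrt output of the NLP service, but the exact attribute names and tag sets depend on how your corpus was annotated), here are a few assumed examples:

    [word="search"]
    [lemma="go"]
    [pos="ADJ"] [pos="NOUN"]
    <s> [lemma="however"]

The first matches every occurrence of the word form "search", the second matches all inflected forms of the lemma "go", the third matches an adjective immediately followed by a noun, and the last one uses the structural attribute <s> to anchor the match at the beginning of a sentence.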
+  <p>
+  To use this service, navigate to the corpus you would like to analyze and click on the Analyze button. 
+  This will take you to an analysis overview page for your corpus. Here, you can find 
+  a visualization of general linguistic information of your corpus, including tokens, 
+  sentences, unique words, unique lemmas, unique parts of speech and unique simple 
+  parts of speech. You will also find a pie chart of the proportional textual makeup 
+  of your corpus and can view the linguistic information for each individual text file. 
+  A more detailed visualization of token frequencies with a search option is also on 
+  this page.
+  </p>
+  <p>
+  From the corpus analysis overview page, you can navigate to other analysis modules: 
+  the Query Builder (under Concordance) and the Reader. With the Reader, you can read 
+  your corpus texts tokenized with the associated linguistic information. The tokens 
+  can be shown as lemmas, parts of speech, words, and can be displayed in different 
+  ways: visually as plain text with the option of highlighted entities or as chips.
+  </p>
+  <p>
+  The Concordance module allows for more specific, query-oriented text analyses. 
+  Here, you can filter out text parameters and structural attributes in different 
+  combinations. This is explained in more detail in the Query Builder section of the 
+  manual.
+  </p>
 </p>