mirror of https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nopaque.git
synced 2025-11-03 20:02:47 +00:00

Manual additions: Intro / Getting Started
		@@ -1,9 +1,27 @@
<h3 class="manual-chapter-title">Introduction</h3>
<h4>Introduction</h4>
<p>
  nopaque is a web-based digital working environment. It implements a
  workflow based on the research process in the humanities and supports its
  users in processing their data in order to subsequently apply digital
  analysis methods to them. As a web application, nopaque offers different
  services and tools to support researchers working with image and text-based
  data. These services are logically connected and build upon each other.
  They include:
</p>
<ol style="list-style-type:disc; margin-left:2em; padding-bottom:0;">
  <li><b>File setup</b>, which converts and merges different data (e.g., books, letters)
  for further processing.</li>
  <li><b>Image-to-text conversion tools:</b>
    <ol style="list-style-type:circle; margin-left:1em; padding-bottom:0;">
      <li><b>Optical Character Recognition</b> converts photos and
      scans into text data, making them machine-readable.</li>
      <li><b>Transkribus HTR (Handwritten Text Recognition) Pipeline</b>
      also converts images into text data, making them machine-readable.</li>
    </ol>
  </li>
  <li><b>Natural Language Processing</b> extracts information from your text via
  computational linguistic data processing (tokenization, lemmatization, part-of-speech
  tagging, and named-entity recognition).</li>
  <li><b>Corpus analysis</b> makes use of the CQP Query Language to search through text
  corpora with the aid of metadata and Natural Language Processing tags.</li>
</ol>
<p>
nopaque also features a <b>Social Area</b>, where researchers can create a personal profile, connect with other users, and share corpora if desired.
These services can be accessed from the sidebar in nopaque.
All processes are implemented in a specially provided cloud environment with established open-source software. This always ensures that no personal data of the users is disclosed.
</p>
@@ -1,18 +1,76 @@
<h3 class="manual-chapter-title">Registration and Log in</h3>
<div class="row">
  <div class="col s12 m4">
    <img alt="Registration and Log in" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/registration-and-log-in.png') }}">
  </div>
  <div class="col s12 m8">
    <p>
      Before you can start using the web platform, you need to create a user
      account. Only a few details are required: a username, an e-mail
      address, and a password. To register, fill out the form on the
      <a href="{{ url_for('auth.register') }}">registration page</a>. After successful registration, the
      created account must be verified. To do this, follow the instructions
      given in the automatically sent e-mail. Afterwards, you can log in as
      usual with your username/e-mail address and password in the log-in form
      located next to the registration button.
    </p>
  </div>
</div>
<h3 class="manual-chapter-title">Getting Started</h3>
<h4>Getting Started</h4>
<br>

<div style="border: 1px solid; padding-left: 20px; margin-right: 400px; margin-bottom: 40px;">
  <h5>Content</h5>
  <ol style="list-style-type:disc">
    <li><a href="#registration-and-login">Registration and login</a></li>
    <li><a href="#preparing-files">Preparing files for analysis</a></li>
    <li><a href="#converting-a-pdf-into-text">Converting a PDF into text data</a></li>
    <li><a href="#extracting-linguistic-data">Extracting linguistic data from text</a></li>
    <li><a href="#creating-a-corpus">Creating a corpus</a></li>
    <li><a href="#analyzing-a-corpus">Analyzing a corpus</a></li>
  </ol>
</div>

<h5 id="registration-and-login">Registration and login</h5>
<p>Before you can begin using nopaque, you will need to create a personal user account.
Open the menu (three dots) at the top right of the screen and choose “Register.” Enter
the required details listed on the registration page (username, password, e-mail address).
After verifying your account via the link sent to your e-mail, you can log in.</p>
<h5 id="preparing-files">Preparing files for analysis</h5>
<p>A few steps need to be taken before images, scans, or other text data are ready for
analysis in nopaque. The SpaCy NLP Pipeline service can only extract linguistic data
from texts in plain text (.txt) format. If your text is already in this format, you
can skip the next steps and go directly to <b>Extracting linguistic data from text</b>.
Otherwise, the next steps assume that you are starting off with image data.</p>
<p>
First, all data needs to be converted into PDF format. Using the <b>File Setup</b> service,
you can bundle images together – even of different formats – and convert them all into
one PDF file. Note that the File Setup service will sort the images based on their file
names in ascending order. It is thus recommended to name them accordingly, for example:
page-01.png, page-02.jpg, page-03.tiff.
</p>
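The ascending file-name sort mentioned above is why zero-padded names matter: plain lexicographic ordering puts "page-10" before "page-2". A minimal sketch in plain Python (illustrative only, not nopaque's actual code) showing the difference:

```python
# Hypothetical illustration: File Setup orders pages by file name.
# Zero-padded names sort in reading order; unpadded numbers do not.
padded = ["page-01.png", "page-02.jpg", "page-10.tiff"]
unpadded = ["page-1.png", "page-2.jpg", "page-10.tiff"]

print(sorted(padded))    # pages stay in reading order
print(sorted(unpadded))  # "page-10" sorts before "page-2"
```

Naming your files with leading zeros (page-01, page-02, …) therefore keeps the merged PDF in the intended page order.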
<p>
After uploading the images and completing the File Setup job, the list of files added
can be seen under “Inputs.” Further below, under “Results,” you can find and download
the PDF output.</p>
<h5 id="converting-a-pdf-into-text">Converting a PDF into text data</h5>
<p>Select an image-to-text conversion tool depending on whether your PDF is primarily
composed of handwritten text or printed text. For printed text, select the <b>Tesseract OCR
Pipeline</b>. For handwritten text, select the <b>Transkribus HTR Pipeline</b>. Select the desired
language model or upload your own. Select the version of the chosen service you want to use
and click on Submit to start the conversion. When the job is finished, the various output
files can be seen and downloaded further below, under “Results.” You may want to review
the text output for errors and coherence.</p>
<h5 id="extracting-linguistic-data">Extracting linguistic data from text</h5>
<p>The <b>SpaCy NLP Pipeline</b> service extracts linguistic information from plain text files
(in .txt format). Select the corresponding .txt file, the language model, and the
version you want to use. When the job is finished, find and download the files in
<b>.json</b> and <b>.vrt</b> format under “Results.”</p>
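A .vrt (verticalized text) file stores one token per line with tab-separated annotation columns, wrapped in structural tags such as sentence markers. The exact column layout depends on the pipeline; the sketch below assumes a word/lemma/part-of-speech order purely for illustration:

```python
# Hypothetical .vrt excerpt: one token per line, tab-separated columns.
# The column order (word, lemma, part-of-speech) is an assumption for illustration.
vrt_excerpt = """<s>
Dogs\tdog\tNOUN
bark\tbark\tVERB
.\t.\tPUNCT
</s>"""

tokens = [
    line.split("\t")
    for line in vrt_excerpt.splitlines()
    if not line.startswith("<")  # skip structural tags like <s>
]
lemmas = [lemma for word, lemma, pos in tokens]
print(lemmas)  # ['dog', 'bark', '.']
```

This per-token annotation format is what makes the later corpus analysis steps (searching by lemma or part-of-speech) possible.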
<h5 id="creating-a-corpus">Creating a corpus</h5>
<p>Now, using the files in .vrt format, you can create a corpus. This can be done
from the Dashboard or Corpus Analysis under “My Corpora.” Click on “Create corpus”
and add a title and description for your corpus. After submitting, navigate down to
the “Corpus files” section. Once you have added the desired .vrt files, select “Build”
on the corpus page under “Actions.” Your corpus is now ready for analysis.</p>
<h5 id="analyzing-a-corpus">Analyzing a corpus</h5>
<p>Navigate to the corpus you would like to analyze and click on the Analyze button.
This will take you to an analysis overview page for your corpus. Here, you can find a
visualization of general linguistic information about your corpus, including tokens,
sentences, unique words, unique lemmas, unique parts of speech, and unique simple parts
of speech. You will also find a pie chart of the proportional textual makeup of your
corpus and can view the linguistic information for each individual text file. A more
detailed visualization of token frequencies, with a search option, is also on this page.</p>
<p>From the corpus analysis overview page, you can navigate to other analysis modules:
the <b>Query Builder</b> (under <b>Concordance</b>) and the <b>Reader</b>. With the Reader, you can read
your corpus texts tokenized with the associated linguistic information. The tokens can
be shown as lemmas, parts of speech, or words, and can be displayed in different ways:
as plain text with the option of highlighted entities, or as chips.</p>
<p>The <b>Concordance</b> module allows for more specific, query-oriented text analyses.
Here, you can filter by text parameters and structural attributes in different
combinations. This is explained in more detail in the Query Builder section of the
manual.</p>
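The kind of query-oriented search the Concordance module performs can be pictured as keyword-in-context (KWIC) matching over annotated tokens. A simplified sketch, in plain Python rather than CQP, and not nopaque's actual implementation:

```python
# Simplified keyword-in-context (KWIC) search over annotated tokens.
# The token attributes (word, pos) mirror NLP tags; data is illustrative only.
tokens = [
    {"word": "The", "pos": "DET"},
    {"word": "dog", "pos": "NOUN"},
    {"word": "barked", "pos": "VERB"},
    {"word": "loudly", "pos": "ADV"},
]

def concordance(tokens, pos, window=1):
    """Return (left context, match, right context) for tokens with a given POS tag."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok["pos"] == pos:
            left = [t["word"] for t in tokens[max(0, i - window):i]]
            right = [t["word"] for t in tokens[i + 1:i + 1 + window]]
            hits.append((left, tok["word"], right))
    return hits

print(concordance(tokens, "VERB"))  # [(['dog'], 'barked', ['loudly'])]
```

In nopaque itself, such matches are expressed as CQP queries over the corpus rather than written by hand, but the underlying idea is the same: find tokens whose annotations satisfy the query and show them with their surrounding context.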
@@ -1,15 +1,20 @@
<h3 class="manual-chapter-title">Dashboard</h3>
<h4>About the dashboard</h4>
<br>
<div class="row">
  <div class="col s12 m4">
    <img alt="Dashboard" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/dashboard.png') }}">
  </div>
  <div class="col s12 m8">
    <p>
      The <a href="{{ url_for('main.dashboard') }}">dashboard</a> provides a central
      overview of all user-specific resources.
      These are <a href="{{ url_for('main.dashboard', _anchor='corpora') }}">corpora</a>,
      created <a href="{{ url_for('main.dashboard', _anchor='jobs') }}">jobs</a>, and
      model contributions.
      A corpus is a freely composable annotated text collection.
      A job is an initiated file processing procedure. One can search for jobs as
      well as corpus listings using the search field displayed above them on the dashboard.
    </p>
  </div>
  <div class="col s12"> </div>
</div>