Manual additions: Intro / Getting Started

Gloria Glinphratum 2024-03-05 15:41:17 +01:00
parent 5dce269736
commit ffd7a3ad91
3 changed files with 109 additions and 28 deletions


@@ -1,9 +1,27 @@
<h3 class="manual-chapter-title">Introduction</h3>
<h4>Introduction</h4>
<p>
nopaque is a web-based digital working environment. It implements a
workflow based on the research process in the humanities and supports its
users in processing their data in order to subsequently apply digital
analysis methods to them. All processes are implemented in a specially
provided cloud environment with established open source software. This
always ensures that no personal data of the users is disclosed.
Nopaque is a web application that offers different services and tools to support
researchers working with image and text-based data. These services are logically
connected and build upon each other. They include:
</p>
<ol style="list-style-type:disc; margin-left:2em; padding-bottom:0;">
<li><b>File setup</b>, which converts and merges different data (e.g., books, letters)
for further processing.</li>
<li><b>Image-to-text conversion tools:</b>
<ol style="list-style-type:circle; margin-left:1em; padding-bottom:0;"><li><b>Optical Character Recognition</b> converts photos and
scans into text data, making them machine-readable.</li>
<li><b>Transkribus HTR (Handwritten Text Recognition) Pipeline</b>
also converts images into text data, making them machine-readable.</li>
</ol></li>
<li><b>Natural Language Processing</b> extracts information from your text via
computational linguistic data processing (tokenization, lemmatization, part-of-speech
tagging, and named-entity recognition).</li>
<li><b>Corpus analysis</b> makes use of CQP Query Language to search through text
corpora with the aid of metadata and Natural Language Processing tags.</li>
</ol>
<p>
Nopaque also features a <b>Social Area</b>, where researchers can create a personal profile, connect with other users, and share corpora if desired.
All of these services can be accessed from the sidebar in nopaque.
All processes are implemented in a specially provided cloud environment with established open-source software; this ensures that no personal data of the users is ever disclosed.
</p>


@@ -1,18 +1,76 @@
<h3 class="manual-chapter-title">Registration and Log in</h3>
<div class="row">
<div class="col s12 m4">
<img alt="Registration and Log in" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/registration-and-log-in.png') }}">
</div>
<div class="col s12 m8">
<p>
Before you can start using the web platform, you need to create a user
account. This requires only a few details: just a user name, an e-mail
address and a password are needed. In order to register yourself, fill out
the form on the <a href="{{ url_for('auth.register') }}">registration page</a>. After successful registration, the
created account must be verified. To do this, follow the instructions
given in the automatically sent e-mail. Afterwards, you can log in as
usual with your username/email address and password in the log-in form
located next to the registration button.
</p>
</div>
<h3 class="manual-chapter-title">Getting Started</h3>
<h4>Getting Started</h4>
<br>
<div style="border: 1px solid; padding-left: 20px; margin-right: 400px; margin-bottom: 40px;">
<h5>Content</h5>
<ol style="list-style-type:disc">
<li><a href="#registration-and-login">Registration and login</a></li>
<li><a href="#preparing-files">Preparing files for analysis</a></li>
<li><a href="#converting-a-pdf-into-text">Converting a PDF into text data</a></li>
<li><a href="#extracting-linguistic-data">Extracting linguistic data from text</a></li>
<li><a href="#creating-a-corpus">Creating a corpus</a></li>
<li><a href="#analyzing-a-corpus">Analyzing a corpus</a></li>
</ol>
</div>
<p></p>
<h5 id="registration-and-login">Registration and login</h5>
<p>Before you can begin using nopaque, you will need to create a personal user account.
Open the menu (three dots) at the top right of the screen and choose “Register.” Enter
the required details listed on the registration page (username, password, email address).
After verifying your account via the link sent to your email, you can log in.</p>
<h5 id="preparing-files">Preparing files for analysis</h5>
<p>A few steps need to be taken before images, scans, or other text data are ready for
analysis in nopaque. The SpaCy NLP Pipeline service can only extract linguistic data
from texts in plain text (.txt) format. If your text is already in this format, you
can skip the next steps and go directly to <b>Extracting linguistic data from text</b>.
Otherwise, the next steps assume that you are starting off with image data.</p>
<p>
First, all data needs to be converted into PDF format. Using the <b>File Setup</b> service,
you can bundle images together, even ones of different formats, and convert them all into
one PDF file. Note that the File Setup service will sort the images based on their file
name in ascending order. It is thus recommended to name them accordingly, for example:
page-01.png, page-02.jpg, page-03.tiff.
</p>
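<p>
The zero-padding matters because file names are compared lexicographically. A minimal
Python sketch (purely illustrative, not nopaque's actual code) shows the difference:
</p>
<pre><code># Without zero-padding, "page-10" sorts before "page-2".
print(sorted(["page-1.png", "page-10.jpg", "page-2.tiff"]))
# ['page-1.png', 'page-10.jpg', 'page-2.tiff']

# With zero-padding, the intended page order is preserved.
print(sorted(["page-01.png", "page-10.jpg", "page-02.tiff"]))
# ['page-01.png', 'page-02.tiff', 'page-10.jpg']
</code></pre>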
<p>
After uploading the images and completing the File Setup job, the list of files added
can be seen under “Inputs.” Further below, under “Results,” you can find and download
the PDF output.</p>
<h5 id="converting-a-pdf-into-text">Converting a PDF into text data</h5>
<p>Select an image-to-text conversion tool depending on whether your PDF is primarily
composed of handwritten or printed text. For printed text, choose the <b>Tesseract OCR
Pipeline</b>; for handwritten text, choose the <b>Transkribus HTR Pipeline</b>. Then select
the desired language model or upload your own, pick the version of Tesseract OCR you want
to use, and click on Submit to start the conversion. When the job is finished, the various
output files can be viewed and downloaded further below, under “Results.” You may want to
review the text output for errors and coherence.</p>
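<p>
For context, the same kind of conversion can be sketched outside nopaque with the
open-source Tesseract engine and its pytesseract Python wrapper (an illustrative
assumption; nopaque runs the engine for you in its cloud environment):
</p>
<pre><code># Minimal OCR sketch (pip install pytesseract pillow; needs a local
# Tesseract installation with the relevant language pack).
from PIL import Image
import pytesseract

# "deu" selects the German model; use "eng" for English, and so on.
text = pytesseract.image_to_string(Image.open("page-01.png"), lang="deu")
print(text)
</code></pre>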
<h5 id="extracting-linguistic-data">Extracting linguistic data from text</h5>
<p>The <b>SpaCy NLP Pipeline</b> service extracts linguistic information from plain text files
(in .txt format). Select the corresponding .txt file, the language model, and the
version you want to use. When the job is finished, find and download the files in
<b>.json</b> and <b>.vrt</b> format under “Results.”</p>
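<p>
To give a sense of what these annotations contain, here is a minimal spaCy sketch
(purely illustrative: nopaque runs the pipeline for you, and the model name below is
an assumption):
</p>
<pre><code>import spacy

# Load a small English model (install it first with:
#   python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")
doc = nlp("Goethe wrote letters in Weimar.")

# Tokenization, lemmatization, and part-of-speech tagging:
for token in doc:
    print(token.text, token.lemma_, token.pos_)

# Named-entity recognition:
for ent in doc.ents:
    print(ent.text, ent.label_)
</code></pre>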
<h5 id="creating-a-corpus">Creating a corpus</h5>
<p>Now, using the files in .vrt format, you can create a corpus. This can be done
in the Dashboard or Corpus Analysis under “My Corpora.” Click on “Create corpus”
and add a title and description for your corpus. After submitting, navigate down to
the “Corpus files” section. Once you have added the desired .vrt files, select “Build”
on the corpus page under “Actions.” Now, your corpus is ready for analysis.</p>
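<p>
If you are curious what a .vrt (verticalized text) file contains: it stores one token
per line together with its annotation columns, framed by XML-like structural tags.
Schematically it looks like this (column order and tag names are assumptions; nopaque's
actual output may differ):
</p>
<pre><code>&lt;text id="1"&gt;
&lt;s&gt;
Goethe   PROPN   Goethe
wrote    VERB    write
letters  NOUN    letter
in       ADP     in
Weimar   PROPN   Weimar
.        PUNCT   .
&lt;/s&gt;
&lt;/text&gt;
</code></pre>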
<h5 id="analyzing-a-corpus">Analyzing a corpus</h5>
<p>Navigate to the corpus you would like to analyze and click on the Analyze button.
This will take you to an analysis overview page for your corpus. Here, you can find a
visualization of general linguistic information about your corpus, including counts of
tokens, sentences, unique words, unique lemmas, unique parts of speech, and unique simple
parts of speech. You will also find a pie chart of the proportional textual makeup of your
corpus and can view the linguistic information for each individual text file. A more
detailed visualization of token frequencies, with a search option, is also available on this page.</p>
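<p>
For intuition, several of these statistics can be derived directly from a corpus file;
here is a minimal Python sketch over the schematic .vrt example above (same assumptions
as before):
</p>
<pre><code># Count tokens and unique lemmas in a schematic .vrt file.
tokens = 0
lemmas = set()
with open("text.vrt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("&lt;"):  # skip structural tags like &lt;s&gt;
            continue
        word, pos, lemma = line.split()
        tokens += 1
        lemmas.add(lemma)

print(tokens, "tokens,", len(lemmas), "unique lemmas")
</code></pre>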
<p>From the corpus analysis overview page, you can navigate to other analysis modules:
the <b>Query Builder</b> (under <b>Concordance</b>) and the <b>Reader</b>. With the Reader, you can read
your corpus texts in tokenized form together with the associated linguistic information.
Tokens can be shown as words, lemmas, or parts of speech, and they can be displayed in
different ways: as plain text with optionally highlighted entities, or as chips.</p>
<p>The <b>Concordance</b> module allows for more specific, query-oriented text analyses.
Here, you can filter by token attributes and structural attributes in different
combinations. This is explained in more detail in the Query Builder section of the
manual.</p>
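<p>
As a first taste of what such queries look like, here is a small CQP example that finds
all forms of the lemma “letter” preceded by an adjective within one sentence (the
attribute names pos and lemma are assumptions and depend on your corpus annotations):
</p>
<pre><code>[pos="ADJ"] [lemma="letter"] within s;
</code></pre>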


@@ -1,15 +1,20 @@
<h3 class="manual-chapter-title">Dashboard</h3>
<h4>About the dashboard</h4>
<br>
<div class="row">
<div class="col s12 m4">
<img alt="Dashboard" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/dashboard.png') }}">
</div>
<div class="col s12 m8">
<p>
The <a href="{{ url_for('main.dashboard') }}">dashboard</a> provides a central overview of all resources assigned to the
user. These are <a href="{{ url_for('main.dashboard', _anchor='corpora') }}">corpora</a> and created <a href="{{ url_for('main.dashboard', _anchor='jobs') }}">jobs</a>. Corpora are freely composable
annotated text collections and jobs are the initiated file processing
procedures. One can search for jobs as well as corpus listings using
the search field displayed above them.
The <a href="{{ url_for('main.dashboard') }}">dashboard</a> provides a central
overview of all user-specific resources.
These are <a href="{{ url_for('main.dashboard', _anchor='corpora') }}">corpora</a>,
created <a href="{{ url_for('main.dashboard', _anchor='jobs') }}">jobs</a>, and
model contributions.
A corpus is a freely composable annotated text collection.
A job is an initiated file processing procedure. One can search for jobs as
well as corpus listings using the search field displayed above them on the dashboard.
</p>
</div>
<div class="col s12">&nbsp;</div>