<h2>Services</h2>

<div class="row">
  <div class="col s12 m4">
    <br class="hide-on-small-only">
    <img alt="Services" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/services.png') }}">
  </div>

  <div class="col s12 m8">
    <p>
      nopaque was designed from the ground up to be modular. This modularity
      means that the workflow offers variable entry and exit points, so that
      different starting points and goals can be addressed flexibly. Each
      module is implemented as a self-contained service that represents one
      step in the workflow. The services are coordinated so that they can be
      used one after another. The intended order can be taken from either the
      listing of the services in the left sidebar or the roadmap (accessible
      via the pink compass in the upper right corner). All services are
      versioned, so data generated with nopaque is always reproducible.
    </p>
  </div>
</div>

<h3>File Setup</h3>
<p>
  The <a href="{{ url_for('services.file_setup_pipeline') }}">File Setup Service</a>
  bundles image data, such as scans and photos, into a single PDF file. To use
  this service, use the job form to select the images to be bundled, choose
  the desired service version, and specify a title and description. Please
  note that the service orders the images in the resulting PDF file by file
  name, so naming the images correctly is important. It has proven good
  practice to name the files according to the following scheme:
  page-01.png, page-02.jpg, page-03.tiff, etc. In general, you can assume that
  the images will be sorted in the order in which your operating system's file
  explorer lists them when the folder is sorted in ascending order by file
  name.
</p>
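<p>
  The following Python sketch only illustrates the naming advice above; it is
  not the File Setup Service's code, and the exact sorting method nopaque uses
  is an assumption here. It compares plain character-by-character
  (lexicographic) sorting with "natural" sorting, as many file explorers use,
  and shows why zero-padded page numbers are the safer choice. The helper name
  <code>natural_key</code> is hypothetical.
</p>

```python
import re


def natural_key(name: str):
    """Split a file name into text and number parts so "page-2" sorts before "page-10"."""
    # re.split with a capturing group alternates text and digit runs,
    # so strings and ints never end up at the same position.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


unpadded = ["page-10.png", "page-2.jpg", "page-1.tiff"]
padded = ["page-10.png", "page-02.jpg", "page-01.tiff"]

# Plain lexicographic sorting: "page-10" lands before "page-2".
print(sorted(unpadded))  # ['page-1.tiff', 'page-10.png', 'page-2.jpg']

# Natural sorting, similar to what many file explorers display.
print(sorted(unpadded, key=natural_key))  # ['page-1.tiff', 'page-2.jpg', 'page-10.png']

# Zero-padded names give the same order with either strategy.
print(sorted(padded))                    # ['page-01.tiff', 'page-02.jpg', 'page-10.png']
print(sorted(padded, key=natural_key))   # ['page-01.tiff', 'page-02.jpg', 'page-10.png']
```

<p>
  Because padded names sort identically under both strategies, the page order
  no longer depends on how the files happen to be sorted.
</p>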

<h3>Optical Character Recognition (OCR)</h3>
<p>Coming soon...</p>

<h3>Handwritten Text Recognition (HTR)</h3>
<p>Coming soon...</p>

<h3>Natural Language Processing (NLP)</h3>
<p>Coming soon...</p>

<h3>Corpus Analysis</h3>
<p>
  With the Corpus Analysis Service, you can create a text corpus and then
  explore it in an analysis session. On the server side, the analysis session
  is powered by the Open Corpus Workbench (CWB), which enables efficient and
  complex searches using the CQP query language.
</p>

<div class="row">
  <div class="col s12 m4">
    <br class="hide-on-small-only">
    <img alt="Create a Corpus" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/create-a-corpus.png') }}">
  </div>

  <div class="col s12 m8">
    <p>
      To <a href="{{ url_for('corpora.create_corpus') }}">create a corpus</a>,
      use the "New Corpus" button, which can be found both on the Corpus
      Analysis Service page and on the Dashboard below the corpus list. Fill in
      the form to create the corpus. After you have submitted the form, you
      will automatically be taken to the overview page of your new, still empty
      corpus. You can return to this page at any time via the corpus lists.
    </p>
  </div>
</div>

<div class="row">
  <div class="col s12 m4">
    <br class="hide-on-small-only">
    <img alt="Add Corpus File" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/add-corpus-file.png') }}">
  </div>

  <div class="col s12 m8">
    <p>
      Now you can add texts in VRT format (the output of the NLP service) to
      your new corpus. To do this, use the "Add Corpus File" button and fill in
      the form that appears. The form also lets you add metadata to each text.
      After you have added all the desired texts, the corpus must be prepared
      for analysis. Start this process by clicking the "Build" button. The
      current status of the corpus is always shown in the upper right corner of
      the corpus overview page; once the build process has finished, the status
      should read "built".
    </p>
  </div>
</div>
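<p>
  To give an idea of what a VRT file contains, the Python sketch below reads a
  few lines of verticalized text: one token per line, with its positional
  attributes separated by tabs. The sample data, the column order (word, pos,
  lemma), and the function <code>parse_vrt_tokens</code> are assumptions made
  for illustration and may differ from what the NLP service actually writes.
  Real VRT files also contain XML-like structural markup (for example sentence
  and text elements), which this sketch simply skips.
</p>

```python
from typing import Dict, List

# Hypothetical VRT excerpt: one token per line, positional attributes
# separated by tabs. The column order (word, pos, lemma) is an assumption.
SAMPLE_VRT = "\n".join([
    "Corpora\tNNS\tcorpus",
    "were\tVBD\tbe",
    "built\tVBN\tbuild",
    ".\t.\t.",
])

ATTRIBUTES = ("word", "pos", "lemma")


def parse_vrt_tokens(vrt: str, attributes=ATTRIBUTES) -> List[Dict[str, str]]:
    """Turn tab-separated token lines into dictionaries keyed by attribute name."""
    tokens = []
    for line in vrt.splitlines():
        fields = line.split("\t")
        # Lines without the expected number of columns (e.g. structural
        # markup such as sentence or text tags) are skipped in this sketch.
        if len(fields) != len(attributes):
            continue
        tokens.append(dict(zip(attributes, fields)))
    return tokens


tokens = parse_vrt_tokens(SAMPLE_VRT)
print(tokens[0])  # {'word': 'Corpora', 'pos': 'NNS', 'lemma': 'corpus'}
print(len(tokens))  # 4
```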

<h4>Analyze a corpus</h4>
<p>
  Once you have created and built a corpus, you can analyze it by clicking the
  "Analyze" button. The corpus analysis currently offers two modules: the
  Reader and the Concordance module. The Reader module lets you read your
  tokenized corpus in different ways. The token representation option
  determines which property of each token is shown; for example, you can read
  your text completely lemmatized. You can also change how tokens are displayed
  by using the text style switch. The Concordance module offers additional
  options regarding the context size of search results. If the context does
  not provide enough information, you can switch to the Reader module by
  clicking the magnifier icon next to a match.
</p>
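<p>
  The Reader's token representation and the Concordance module's context size
  can be illustrated with a small Python sketch. It is not nopaque's
  implementation: the token properties (<code>word</code>,
  <code>lemma</code>), the sample sentence, and the functions
  <code>render</code> and <code>concordance</code> are hypothetical, and a
  simple lemma comparison stands in for the CQP queries that the Open Corpus
  Workbench evaluates on the server.
</p>

```python
from typing import Dict, List

# Hypothetical tokenized text: each token carries several properties.
TOKENS: List[Dict[str, str]] = [
    {"word": "The", "lemma": "the"},
    {"word": "mice", "lemma": "mouse"},
    {"word": "were", "lemma": "be"},
    {"word": "chasing", "lemma": "chase"},
    {"word": "cats", "lemma": "cat"},
    {"word": ".", "lemma": "."},
]


def render(tokens, representation="word"):
    """Reader-style view: show one chosen property of every token."""
    return " ".join(token[representation] for token in tokens)


def concordance(tokens, lemma, context=2, representation="word"):
    """KWIC-style lines: each match with up to `context` tokens on either side."""
    lines = []
    for i, token in enumerate(tokens):
        if token["lemma"] != lemma:
            continue
        left = tokens[max(0, i - context):i]
        right = tokens[i + 1:i + 1 + context]
        lines.append((render(left, representation),
                      token[representation],
                      render(right, representation)))
    return lines


print(render(TOKENS))           # The mice were chasing cats .
print(render(TOKENS, "lemma"))  # the mouse be chase cat .
print(concordance(TOKENS, "chase", context=1))  # [('were', 'chasing', 'cats')]
print(concordance(TOKENS, "chase", context=2))  # [('mice were', 'chasing', 'cats .')]
```

<p>
  Increasing <code>context</code> widens the window on both sides of a match,
  which mirrors adjusting the context size in the Concordance module.
</p>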