mirror of
https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nopaque.git
synced 2025-07-01 10:20:34 +00:00
Compare commits
6 Commits
access-pip...manual

Commits:
- 48fe7c0702
- 5a2723b617
- 4425d50140
- 39113a6f17
- a53f1d216b
- ffd7a3ad91
@@ -1,9 +1,34 @@
<h3 class="manual-chapter-title">Introduction</h3>
<h4>Introduction</h4>
<p>
  Nopaque is a web application that offers different services and tools to support
  researchers working with image and text-based data. These services are logically
  connected and build upon each other. They include:
</p>
<ol style="list-style-type:disc; margin-left:2em; padding-bottom:0;">
  <li><b>File setup</b>, which converts and merges different data (e.g., books, letters)
  for further processing.</li>
  <li><b>Image-to-text conversion tools:</b></li>
  <ol style="list-style-type:circle; margin-left:1em; padding-bottom:0;">
    <li><b>Optical Character Recognition</b> converts photos and
    scans into text data, making them machine-readable.</li>
    <li><b>Transkribus HTR (Handwritten Text Recognition) Pipeline</b> (currently deactivated)*
    also converts images into text data, making them machine-readable.</li>
  </ol>
  <li><b>Natural Language Processing</b> extracts information from your text via
  computational linguistic data processing (tokenization, lemmatization, part-of-speech
  tagging and named-entity recognition).</li>
  <li><b>Corpus analysis</b> makes use of the CQP Query Language to search through text
  corpora with the aid of metadata and Natural Language Processing tags.</li>
</ol>
<p>
  Nopaque also features a <b>Social Area</b>, where researchers can create a personal profile,
  connect with other users and share corpora if desired.
  These services can be accessed from the sidebar in nopaque.
  All processes are implemented in a specially provided cloud environment with established
  open-source software. This always ensures that no personal data of the users is disclosed.
</p>
<p>
  *Note: the Transkribus HTR Pipeline is currently deactivated; we are working on an
  alternative solution. You can try using Tesseract OCR, though the results will likely be poor.
</p>
app/templates/_base/_modals/_manual/02_getting_started.html.j2 (new file, 104 lines)
@@ -0,0 +1,104 @@
<h3 class="manual-chapter-title">Getting Started</h3>
<h4>Getting Started</h4>
<p>
  In this section, we will take you through all the steps you need to start analyzing your data with nopaque.
</p>

<div style="border: 1px solid; padding-left: 20px; margin-right: 400px; margin-bottom: 40px;">
  <h5>Content</h5>
  <ol style="list-style-type:disc">
    <li><a href="#registration-and-login">Registration and login</a></li>
    <li><a href="#preparing-files">Preparing files for analysis</a></li>
    <li><a href="#converting-a-pdf-into-text">Converting a PDF into text data</a></li>
    <li><a href="#extracting-linguistic-data">Extracting linguistic data from text</a></li>
    <li><a href="#creating-a-corpus">Creating a corpus</a></li>
    <li><a href="#analyzing-a-corpus">Analyzing a corpus</a></li>
  </ol>
</div>

<h5 id="registration-and-login">Registration and login</h5>
<p>Before you can begin using nopaque, you will need to create a personal user account.
Open the menu (three dots) at the top right of the screen and choose “Register.” Enter
the required details listed on the registration page (username, password, email address).
After verifying your account via the link sent to your email, you can log in.</p>

<h5 id="preparing-files">Preparing files for analysis</h5>
<p>A few steps need to be taken before images, scans, or other text data are ready for
analysis in nopaque. The SpaCy NLP Pipeline service can only extract linguistic data
from texts in plain text (.txt) format. If your text is already in this format, you
can skip the next steps and go directly to <b>Extracting linguistic data from text</b>.
Otherwise, the next steps assume that you are starting off with image data.</p>
<p>
First, all data needs to be converted into PDF format. Using the <b>File Setup</b> service,
you can bundle images together – even of different formats – and convert them all into
one PDF file. Note that the File Setup service will sort the images based on their file
name in ascending order. It is thus recommended to name them accordingly, for example:
page-01.png, page-02.jpg, page-03.tiff.
</p>
<p>
Add a title and description to your job and select the File Setup version* you want to use.
After uploading the images and completing the File Setup job, the list of files added
can be seen under “Inputs.” Further below, under “Results,” you can find and download
the PDF output.</p>

<h5 id="converting-a-pdf-into-text">Converting a PDF into text data</h5>
<p>Select an image-to-text conversion tool depending on whether your PDF is primarily
composed of handwritten text or printed text. For printed text, select the <b>Tesseract OCR
Pipeline</b>. For handwritten text, select the <b>Transkribus HTR Pipeline</b>. Select the desired
language model or upload your own. Select the version* of Tesseract OCR you want to use
and click on submit to start the conversion. When the job is finished, various output
files can be seen and downloaded further below, under “Results.” You may want to review
the text output for errors and coherence. (Note: the Transkribus HTR Pipeline is currently
deactivated; we are working on an alternative solution. You can try using Tesseract OCR,
though the results will likely be poor.)
</p>

<h5 id="extracting-linguistic-data">Extracting linguistic data from text</h5>
<p>The <b>SpaCy NLP Pipeline</b> service extracts linguistic information from plain text files
(in .txt format). Select the corresponding .txt file, the language model, and the
version* you want to use. When the job is finished, find and download the files in
<b>.json</b> and <b>.vrt</b> format under “Results.”</p>
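<p>
  To give you a rough idea of what these results look like: a .vrt file is a “verticalized”
  text, i.e. one token per line with its annotations in tab-separated columns, framed by
  structural markup for texts, sentences, and entities. The sketch below is illustrative only;
  it assumes a column layout of word, lemma, part-of-speech tag and simplified part-of-speech
  tag, while the files produced by the SpaCy NLP Pipeline may order or name the columns
  differently and contain additional attributes.
</p>
<pre><code><text>
<s>
The	the	DT	DET
book	book	NN	NOUN
appeared	appear	VBD	VERB
.	.	.	PUNCT
</s>
</text></code></pre>
<p>
  The .vrt files are what you will add to a corpus in the next step.
</p>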
<h5 id="creating-a-corpus">Creating a corpus</h5>
<p>Now, using the files in .vrt format, you can create a corpus. This can be done
in the <a href="{{ url_for('main.dashboard') }}">Dashboard</a> or
<a href="{{ url_for('services.corpus_analysis') }}">Corpus Analysis</a> sections under “My Corpora.” Click on “Create corpus”
and add a title and description for your corpus. After submitting, you will automatically
be taken to the corpus overview page (which can be called up again via the corpus lists)
of your new, still empty corpus.</p>
<p>
Further down in the “Corpus files” section, you can add texts in .vrt format
(results of the NLP service) to your new corpus. To do this, use the “Add Corpus File”
button and fill in the form that appears. Here, you can add
metadata to each text. After adding all texts to the corpus, it must
be prepared for analysis. This process can be initiated by clicking on the
“Build” button under “Actions”.
On the corpus overview page, you can see information about the current status of
the corpus in the upper right corner. After the build process, the status “built” should be shown here.
Now, your corpus is ready for analysis.</p>

<h5 id="analyzing-a-corpus">Analyzing a corpus</h5>
<p>Navigate to the corpus you would like to analyze and click on the Analyze button.
This will take you to an analysis overview page for your corpus. Here, you can find a
visualization of general linguistic information of your corpus, including tokens,
sentences, unique words, unique lemmas, unique parts of speech and unique simple parts
of speech. You will also find a pie chart of the proportional textual makeup of your
corpus and can view the linguistic information for each individual text file. A more
detailed visualization of token frequencies with a search option is also on this page.</p>
<p>From the corpus analysis overview page, you can navigate to other analysis modules:
the <b>Query Builder</b> (under <b>Concordance</b>) and the <b>Reader</b>. With the Reader, you can read
your corpus texts tokenized with the associated linguistic information. The tokens can
be shown as lemmas, parts of speech, words, and can be displayed in different ways:
visually as plain text with the option of highlighted entities or as chips.</p>
<p>The <b>Concordance</b> module allows for more specific, query-oriented text analyses.
Here, you can filter out text parameters and structural attributes in different
combinations. This is explained in more detail in the Query Builder section of the
manual.</p>

<br>
<br>
*For all services, it is recommended to use the latest version unless you need a model
only available in an earlier version or are looking to reproduce data that was originally generated
using an older version.
@@ -1,18 +0,0 @@
<h3 class="manual-chapter-title">Registration and Log in</h3>
<div class="row">
  <div class="col s12 m4">
    <img alt="Registration and Log in" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/registration-and-log-in.png') }}">
  </div>
  <div class="col s12 m8">
    <p>
      Before you can start using the web platform, you need to create a user
      account. This requires only a few details: a user name, an e-mail
      address and a password. In order to register yourself, fill out
      the form on the <a href="{{ url_for('auth.register') }}">registration page</a>. After successful registration, the
      created account must be verified. To do this, follow the instructions
      given in the automatically sent e-mail. Afterwards, you can log in as
      usual with your username/email address and password in the log-in form
      located next to the registration button.
    </p>
  </div>
</div>
@@ -1,15 +1,22 @@
<h3 class="manual-chapter-title">Dashboard</h3>
<h4>About the dashboard</h4>
<br>
<div class="row">
  <div class="col s12 m4">
    <img alt="Dashboard" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/dashboard.png') }}">
  </div>
  <div class="col s12 m8">
    <p>
      The <a href="{{ url_for('main.dashboard') }}">dashboard</a> provides a central
      overview of all user-specific resources.
      These are <a href="{{ url_for('main.dashboard', _anchor='corpora') }}">corpora</a>,
      created <a href="{{ url_for('main.dashboard', _anchor='jobs') }}">jobs</a>, and
      model <a href="{{ url_for('main.dashboard', _anchor='contributions') }}">contributions</a>.
      A <b>corpus</b> is a freely composable annotated text collection.
      A <b>job</b> is an initiated file processing procedure.
      A <b>model</b> is a mathematical system for pattern recognition that has been trained on example data.
      One can search for jobs as well as corpus listings using the search field displayed above them on the dashboard.
      Uploaded models can be found and edited by clicking on the corresponding service under <b>My Contributions</b>.
    </p>
  </div>
  <div class="col s12"> </div>
@@ -1,52 +1,107 @@
<h3 class="manual-chapter-title">Services</h3>
<h4>Services</h4>
<p>
  In this section, we will describe the different services nopaque has to offer.
</p>

<div class="row">
  <div class="col s12 m4">
    <img alt="Services" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/services.png') }}">
  </div>
  <div class="col s12 m8">
    <p>
      Nopaque was designed to be modular. Its modules are implemented in
      self-contained <b>services</b>, each of which represents a step in the
      workflow. The typical workflow involves using the services one after another;
      the order can be taken from the listing of the services in the left sidebar
      or from the nopaque manual (accessible via the pink button in the upper right corner).
      The services can also be applied at different starting and ending points,
      which allows you to conduct your work flexibly.
      All services are versioned, so the data generated with nopaque is always
      reproducible.
    </p>
    <p>For all services, it is recommended to use the latest version (selected
      in the drop-down menu on the service page) unless you need a model
      only available in an earlier version or are looking to reproduce data that was originally generated
      using an older version.</p>
  </div>
</div>

<h4>File Setup</h4>
<p>
  The <a href="{{ url_for('services.file_setup_pipeline') }}">File Setup Service</a> bundles image data, such as scans and photos,
  together in a handy PDF file. To use this service, use the job form to
  select the images to be bundled, choose the desired service version, and
  specify a title and description.
  Note that the File Setup service will sort the images based on their file name in
  ascending order. It is thus important and highly recommended to name
  them accordingly, for example:
  page-01.png, page-02.jpg, page-03.tiff. Generally, you can assume
  that the images will be sorted in the order in which the file explorer of
  your operating system lists them when you view the files in a folder
  sorted in ascending order by file name.
</p>

<h4>Optical Character Recognition (OCR)</h4>
<p>
  The <a href="{{ url_for('services.tesseract_ocr_pipeline') }}">Tesseract OCR Pipeline</a>
  converts image data, like photos and scans, into text data, making them machine-readable.
  This step enables you to proceed with the computational analysis of your documents.
  To use this service, use the job form to select the file you want to convert into text data.
  Then, choose the language model and service version you would like to use. Enter a title
  and description for your job and then submit it. Once the job is finished, the results can
  be found and downloaded further below on the page, under the section labeled “Results.”
</p>

<h4>Handwritten Text Recognition (HTR)</h4>
<p>The Transkribus HTR Pipeline is currently
deactivated. We are working on an alternative solution. In the meantime, you can
try using Tesseract OCR, though the results will likely be poor.</p>

<h4>Natural Language Processing (NLP)</h4>
<p>The <a href="{{ url_for('services.spacy_nlp_pipeline') }}">SpaCy NLP Pipeline</a> extracts
information from plain text files (.txt format) via computational linguistic data processing
(tokenization, lemmatization, part-of-speech tagging and named-entity recognition).
To use this service, select the .txt file that you want to extract this information from.
Then select the language model and the version you want to use. Once the job is finished, you can find and download the files in
<b>.json</b> and <b>.vrt</b> format under the section labeled “Results.”</p>

<h4>Corpus Analysis</h4>
<p>
  With the <a href="{{ url_for('services.corpus_analysis') }}">Corpus Analysis</a>
  service, it is possible to create a text corpus
  and then explore it with analytical tools. The analysis session is realized
  on the server side by the Open Corpus Workbench software, which enables
  efficient and complex searches with the help of the CQP Query Language.</p>
<p>
  To use this service, navigate to the corpus you would like to analyze and click on the Analyze button.
  This will take you to an analysis overview page for your corpus. Here, you can find
  a visualization of general linguistic information of your corpus, including tokens,
  sentences, unique words, unique lemmas, unique parts of speech and unique simple
  parts of speech. You will also find a pie chart of the proportional textual makeup
  of your corpus and can view the linguistic information for each individual text file.
  A more detailed visualization of token frequencies with a search option is also on
  this page.
</p>
<p>
  From the corpus analysis overview page, you can navigate to other analysis modules:
  the Query Builder (under Concordance) and the Reader.
</p>
<p>
  With the <b>Reader</b>, you can read your corpus texts tokenized with the associated linguistic information. The tokens
  can be shown as lemmas, parts of speech, words, and can be displayed in different
  ways: visually as plain text with the option of highlighted entities or as chips.
</p>
<p>
  The Concordance module allows for more specific, query-oriented text analyses.
  Here, you can filter out text parameters and structural attributes in different
  combinations. This is explained in more detail in the <b>Query Builder</b> section of the
  manual.
</p>
@@ -1,5 +1,22 @@
<h3 class="manual-chapter-title">CQP Query Language</h3>
<h4 id="cqp-query-language">CQP Query Language</h4>
<p>In this section, we will explain the main properties of the Corpus Query Language, including
the types of linguistic attributes one can work with and how to use them in your query.</p>

<div style="border: 1px solid; padding-left: 20px; margin-right: 400px; margin-bottom: 40px;">
  <h5>Content</h5>
  <ol style="list-style-type:disc">
    <li><a href="#overview-annotations">Overview of annotation types</a></li>
    <li><a href="#positional-attributes">Positional attributes</a></li>
    <li><a href="#searching-positional-attributes">How to search for positional attributes</a></li>
    <li><a href="#structural-attributes">Structural attributes</a></li>
    <li><a href="#searching-structural-attributes">How to search for structural attributes</a></li>
  </ol>
</div>

<h4 id="overview-annotations">Overview of annotation types</h4>
<p>Within the Corpus Query Language, a distinction is made between two types of annotations: <b>positional attributes</b> and <b>structural attributes</b>. Positional attributes refer to a token, e.g. the word "book" is assigned the part-of-speech tag "NN", the lemma "book" and the simplified part-of-speech tag "NOUN" within the token structure. Structural attributes refer to text structure-giving elements such as sentence and entity markup. For example, the markup of a sentence is represented in the background as follows:</p>
<pre>
<code>
<span class="green-text"><s> structural attribute</span>
@@ -13,7 +30,7 @@
</code>
</pre>

<h4 id="positional-attributes">Positional attributes</h4>
<p>Before you can start searching for positional attributes (also called tokens), it is necessary to know what properties they contain.</p>
<ol>
  <li><span class="blue-text"><b>word</b></span>: The string as it is also found in the original text</li>
@@ -33,7 +50,7 @@
  </li>
</ol>

<h5 id="searching-positional-attributes">How to search for positional attributes</h5>
<div>
  <p>
    <b>Token with no condition on any property (also called <span class="blue-text">wildcard token</span>)</b><br>
@@ -118,7 +135,7 @@
  <pre style="margin-top: 0;" ><code> ^ ^ the braces indicate the start and end of an option group</code></pre>
</div>
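<p>
  Most of the worked examples of this chapter are elided in the excerpt above. As a minimal
  sketch of the general pattern: a positional attribute is queried in square brackets as
  <code>[property="value"]</code>, and several conditions can be combined for a single token.
  The property names follow the list above; the values are illustrative.
</p>
<pre><code>[word="book"]; A token whose word form is "book"
[lemma="book" & pos="NN"]; A token with the lemma "book" that is also tagged "NN"
[word="book"] []; The word "book" followed by exactly one arbitrary (wildcard) token</code></pre>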
<h4 id="structural-attributes">Structural attributes</h4>
<p>nopaque provides several structural attributes for querying. A distinction is made between attributes with and without value.</p>
<ol>
  <li><span class="green-text"><b>s</b></span>: Annotates a sentence</li>
@@ -153,7 +170,7 @@
  </li>
</ol>

<h5 id="searching-structural-attributes">How to search for structural attributes</h5>
<pre><code><ent> [] </ent>; A one token long entity of any type</code></pre>
<pre><code><ent_type="PERSON"> [] </ent_type>; A one token long entity of type PERSON</code></pre>
<pre><code><ent_type="PERSON"> []* </ent_type>; Entity of any length of type PERSON</code></pre>
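<p>
  Structural and positional attributes can also be combined in a single query. The two
  queries below are an illustrative sketch reusing the attribute values shown above:
</p>
<pre><code><ent_type="PERSON"> [pos="NN"]* </ent_type>; An entity of type PERSON consisting only of tokens tagged "NN"
<ent_type="PERSON"> []* </ent_type> [word="said"]; An entity of type PERSON of any length, immediately followed by the word "said"</code></pre>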
@@ -1,27 +1,12 @@
<h3 class="manual-chapter-title">Query Builder Tutorial</h3>
<h4>Query Builder</h4>
<p>In this section, we will provide you with more detailed instructions on how to use the Query Builder,
nopaque's main user-friendly tool for finding and analyzing different linguistic elements of your texts.</p>

<div style="border: 1px solid; padding-left: 20px; margin-right: 400px; margin-bottom: 40px;">
  <h5>Content</h5>
  <ol style="list-style-type:disc">
    <li><a href="#general-overview">General Overview</a></li>
    <li><a href="#add-new-token-tutorial">Add a new token to your query</a></li>
    <li><a href="#edit-options-tutorial">Options for editing your query</a></li>
    <li><a href="#add-structural-attribute-tutorial">Add structural attributes to your query</a></li>
@@ -29,6 +14,33 @@
  </ol>
</div>

<h4 id="general-overview">General Overview</h4>
<p>The Query Builder can be accessed via <a href="{{ url_for('main.dashboard') }}">My Corpora</a> or <a href="{{ url_for('services.corpus_analysis') }}">Corpus Analysis</a> in the sidebar options.
Click on the corpus you wish to analyze. You will be sent to its corpus overview page.
Here, click on <b>Analyze</b> to reach the analysis page.
The analysis page features different options for analyzing your corpus, including
visualizations and a <b>Reader</b> module. In this case, we want to open the query builder.
To do so, click on the <b>Concordance</b> button at the top of the page.</p>
<p>The query builder uses the <b>Corpus Query Language (CQL)</b> to help you make a query for analyzing your texts.
In this way, it is possible to filter out various types of text parameters, for
example, a specific word, a lemma, or you can set part-of-speech
tags (pos) that indicate the type of word you are looking for (a noun, an
adjective, etc.). In addition, you can also search for structural attributes,
or specify your query for a token (word, lemma, pos) via entity typing. And of
course, the different text parameters can be combined.</p>
<p>Tokens and structural attributes can be added by clicking on the <b>"+"</b> button
(what we call the "input marker") in the input field or the labeled buttons below it. Elements
added are shown as chips. These can be reorganized using drag and drop. The input
marker can also be moved in this way. Its position shows where new elements will be added. <br>
A "translation" of your query into Corpus Query Language (CQL) will be displayed underneath the query field.</p>
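<p>
  For example (purely illustrative; the exact CQL depends on the chips you add), a query built
  from a word chip for "book" followed by a part-of-speech chip for nouns would be translated
  roughly as follows:
</p>
<pre><code>[word="book"] [pos="NN"]</code></pre>
<p>
  Switching to expert mode (see below) lets you edit such CQL directly; the CQP Query Language
  chapter of this manual describes the syntax in detail.
</p>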
<p>For more information, see our <b>manual section for the Corpus Query Language</b>.
<br>
Advanced users can make direct use of CQL by switching to <b>expert mode</b> via the toggle button.
</p>
<p>The entire input field can be cleared using the red trash icon on the right.</p>
<br>

{# Add Token Tutorial #}
<div>
  <hr>
@@ -37,8 +49,8 @@
  <h4 id="add-new-token-tutorial">Add new token to your Query</h4>
  <p>If you are only looking for a specific token, you can click on the left
  button and select the type of token you are looking for from the drop-down menu.
  "Word" is selected by default.</p>
  <h5>Word and Lemma</h5>
  <p>If you want to search for a specific word or lemma and the respective
  category is selected in the drop-down menu, you can type in the word or lemma
@@ -3,21 +3,22 @@
<h2>Manual</h2>
<ul class="tabs" id="manual-modal-toc">
  <li class="tab"><a href="#manual-modal-introduction">Introduction</a></li>
  <li class="tab"><a href="#manual-modal-getting-started">Getting Started</a></li>
  <li class="tab"><a href="#manual-modal-dashboard">Dashboard</a></li>
  <li class="tab"><a href="#manual-modal-services">Services</a></li>
  <!-- <li class="tab"><a href="#manual-modal-a-closer-look-at-the-corpus-analysis">A closer look at the Corpus Analysis</a></li> -->
  <li class="tab"><a href="#manual-modal-query-builder">Query Builder</a></li>
  <li class="tab"><a href="#manual-modal-cqp-query-language">CQP Query Language</a></li>
  <li class="tab"><a href="#manual-modal-tagsets">Tagsets</a></li>
</ul>
<div id="manual-modal-introduction">
  <br>
  {% include "_base/_modals/_manual/01_introduction.html.j2" %}
</div>
<div id="manual-modal-getting-started">
  <br>
  {% include "_base/_modals/_manual/02_getting_started.html.j2" %}
</div>
<div id="manual-modal-dashboard">
  <br>
@@ -27,10 +28,10 @@
  <br>
  {% include "_base/_modals/_manual/06_services.html.j2" %}
</div>
<!-- <div id="manual-modal-a-closer-look-at-the-corpus-analysis">
  <br>
  {% include "_base/_modals/_manual/07_a_closer_look_at_the_corpus_analysis.html.j2" %}
</div> -->
<div id="manual-modal-cqp-query-language">
  <br>
  {% include "_base/_modals/_manual/08_cqp_query_language.html.j2" %}