updates and restructuring

Gloria Glinphratum 2024-03-26 15:29:26 +01:00
parent 4425d50140
commit 5a2723b617
5 changed files with 77 additions and 35 deletions

@ -1,6 +1,8 @@
<h3 class="manual-chapter-title">Getting Started</h3>
<h4>Getting Started</h4>
<br>
<p>
In this section, we will take you through all the steps you need to start analyzing your data with nopaque.
</p>
<div style="border: 1px solid; padding-left: 20px; margin-right: 400px; margin-bottom: 40px;">
<h5>Content</h5>
@ -21,6 +23,7 @@
Open the menu (three dots) at the top right of the screen and choose “Register.” Enter
the required details listed on the registration page (username, password, email address).
After verifying your account via the link sent to your email, you can log in.</p>
<h5 id="preparing-files">Preparing files for analysis</h5>
<p>A few steps need to be taken before images, scans, or other text data are ready for
analysis in nopaque. The SpaCy NLP Pipeline service can only extract linguistic data
@ -39,6 +42,7 @@ Add a title and description to your job and select the File Setup version* you w
After uploading the images and completing the File Setup job, the list of files added
can be seen under “Inputs.” Further below, under “Results,” you can find and download
the PDF output.</p>
<h5 id="converting-a-pdf-into-text">Converting a PDF into text data</h5>
<p>Select an image-to-text conversion tool depending on whether your PDF is primarily
composed of handwritten text or printed text. For printed text, select the <b>Tesseract OCR
@ -50,11 +54,13 @@ the text output for errors and coherence. (Note: the Transkribus HTR Pipeline is
deactivated; we are working on an alternative solution. You can try using Tesseract OCR,
though the results will likely be poor.)
</p>
<h5 id="extracting-linguistic-data">Extracting linguistic data from text</h5>
<p>The <b>SpaCy NLP Pipeline</b> service extracts linguistic information from plain text files
(in .txt format). Select the corresponding .txt file, the language model, and the
version* you want to use. When the job is finished, find and download the files in
<b>.json</b> and <b>.vrt</b> format under “Results.”</p>
<h5 id="creating-a-corpus">Creating a corpus</h5>
<p>Now, using the files in .vrt format, you can create a corpus. This can be done
in the <a href="{{ url_for('main.dashboard') }}">Dashboard</a> or
@ -72,6 +78,7 @@ be prepared for analysis. This process can be initiated by clicking on the
On the corpus overview page, you can see information about the current status of
the corpus in the upper right corner. After the build process, the status "built" should be shown here.
Now, your corpus is ready for analysis.</p>
<h5 id="analyzing-a-corpus">Analyzing a corpus</h5>
<p>Navigate to the corpus you would like to analyze and click on the Analyze button.
This will take you to an analysis overview page for your corpus. Here, you can find a

@ -1,6 +1,9 @@
<h3 class="manual-chapter-title">Services</h3>
<h4>Services</h4>
<br>
<p>
In this section, we will describe the different services nopaque has to offer.
</p>
<div class="row">
<div class="col s12 m4">
<img alt="Services" class="materialboxed responsive-img" src="{{ url_for('static', filename='images/manual/services.png') }}">
@ -87,15 +90,17 @@ version you want to use. When the job is finished, find and download the files i
</p>
<p>
From the corpus analysis overview page, you can navigate to other analysis modules:
the Query Builder (under Concordance) and the Reader. With the Reader, you can read
your corpus texts tokenized with the associated linguistic information. The tokens
the Query Builder (under Concordance) and the Reader.
</p>
<p>
With the <b>Reader</b>, you can read your corpus texts tokenized with the associated linguistic information. The tokens
can be shown as words, lemmas, or parts of speech, and can be displayed in different
ways: as plain text, optionally with highlighted entities, or as chips.
</p>
<p>
The Concordance module allows for more specific, query-oriented text analyses.
Here, you can filter for text parameters and structural attributes in different
combinations. This is explained in more detail in the Query Builder section of the
combinations. This is explained in more detail in the <b>Query Builder</b> section of the
manual.
</p>
</p>

@ -1,5 +1,22 @@
<h3 class="manual-chapter-title">CQP Query Language</h3>
<p>Within the Corpus Query Language, a distinction is made between two types of annotations: positional attributes and structural attributes. Positional attributes refer to a token, e.g. the word "book" is assigned the part-of-speech tag "NN", the lemma "book" and the simplified part-of-speech tag "NOUN" within the token structure. Structural attributes refer to text structure-giving elements such as sentence and entity markup. For example, the markup of a sentence is represented in the background as follows:</p>
<h4 id="cqp-query-language">CQP Query Language</h4>
<p>In this section, we will provide some functional explanations of the properties of the Corpus Query Language. This includes
the types of linguistic attributes one can work with and how to use them in your query.</p>
<div style="border: 1px solid; padding-left: 20px; margin-right: 400px; margin-bottom: 40px;">
<h5>Content</h5>
<ol style="list-style-type:disc">
<li><a href="#overview-annotations">Overview of annotation types</a></li>
<li><a href="#positional-attributes">Positional attributes</a></li>
<li><a href="#searching-positional-attributes">How to search for positional attributes</a></li>
<li><a href="#structural-attributes">Structural attributes</a></li>
<li><a href="#searching-structural-attributes">How to search for structural attributes</a></li>
</ol>
</div>
<h4 id="overview-annotations">Overview of annotation types</h4>
<p>Within the Corpus Query Language, a distinction is made between two types of annotations: <b>positional attributes</b> and <b>structural attributes</b>. Positional attributes refer to a token, e.g. the word "book" is assigned the part-of-speech tag "NN", the lemma "book" and the simplified part-of-speech tag "NOUN" within the token structure. Structural attributes refer to elements that give the text its structure, such as sentence and entity markup. For example, the markup of a sentence is represented in the background as follows:</p>
<pre>
<code>
<span class="green-text">&lt;s&gt; structural attribute</span>
@ -13,7 +30,7 @@
</code>
</pre>
<h4>Positional attributes</h4>
<h4 id="positional-attributes">Positional attributes</h4>
<p>Before you can start searching for positional attributes (also called tokens), it is necessary to know what properties they contain.</p>
<ol>
<li><span class="blue-text"><b>word</b></span>: The string as it appears in the original text</li>
@ -33,7 +50,7 @@
</li>
</ol>
<h5>Searching for positional attributes</h5>
<h5 id="searching-positional-attributes">How to search for positional attributes</h5>
<div>
<p>
<b>Token with no condition on any property (also called <span class="blue-text">wildcard token</span>)</b><br>
@ -118,7 +135,7 @@
<pre style="margin-top: 0;" ><code> ^ ^ the braces indicate the start and end of an option group</code></pre>
</div>
<h4>Structural attributes</h4>
<h4 id="structural-attributes">Structural attributes</h4>
<p>nopaque provides several structural attributes for querying. A distinction is made between attributes with and without a value.</p>
<ol>
<li><span class="green-text"><b>s</b></span>: Annotates a sentence</li>
@ -153,7 +170,7 @@
</li>
</ol>
<h5>Searching for structural attributes</h5>
<h5 id="searching-structural-attributes">How to search for structural attributes</h5>
<pre><code>&lt;ent&gt; [] &lt;/ent&gt;; A one token long entity of any type</code></pre>
<pre><code>&lt;ent_type="PERSON"&gt; [] &lt;/ent_type&gt;; A one token long entity of type PERSON</code></pre>
<pre><code>&lt;ent_type="PERSON"&gt; []* &lt;/ent_type&gt;; Entity of any length of type PERSON</code></pre>
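<p>The patterns above can be varied with the usual CQP repetition operators. The two queries below are a sketch, assuming that <code>+</code> ("one or more") and plain token sequences behave in nopaque as they do in standard CQP; they have not been checked against a particular corpus.</p>
<pre><code>&lt;ent_type="PERSON"&gt; [] [] &lt;/ent_type&gt;; A two token long entity of type PERSON</code></pre>
<pre><code>&lt;ent_type="PERSON"&gt; []+ &lt;/ent_type&gt;; Entity of type PERSON that is at least one token long</code></pre>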

@ -1,27 +1,12 @@
<h3 class="manual-chapter-title">Query Builder Tutorial</h3>
<h4>Overview</h4>
<p>The query builder can be accessed via "My Corpora" or "Corpus Analysis" in the sidebar options.
Select the desired corpus and click on the "Analyze" and then "Concordance"
buttons to open the query builder.</p>
<p>The query builder uses the Corpus Query Language (CQL) to help you make a query for analyzing your texts.
In this way, it is possible to filter out various types of text parameters, for
example, a specific word, a lemma, or you can set part-of-speech
tags (pos) that indicate the type of word you are looking for (a noun, an
adjective, etc.). In addition, you can also search for structural attributes,
or specify your query for a token (word, lemma, pos) via entity typing. And of
course, the different text parameters can be combined.</p>
<p>Tokens and structural attributes can be added by clicking on the "+" button
(the "input marker") in the input field or the labeled buttons below it. Elements
added are shown as chips. These can be reorganized using drag and drop. The input
marker can also be moved in this way. Its position shows where new elements will be added. <br>
A "translation" of your query into Corpus Query Language (CQL) is shown below.</p>
<p>Advanced users can make direct use of the Corpus Query Language (CQL) by switching to "expert mode" via the toggle button.</p>
<p>The entire input field can be cleared using the red trash icon on the right.</p>
<br>
<h4>Query Builder</h4>
<p>In this section, we will provide you with more detailed instructions on how to use the Query Builder,
nopaque's main user-friendly tool for finding and analyzing different linguistic elements of your texts.</p>
<div style="border: 1px solid; padding-left: 20px; margin-right: 400px; margin-bottom: 40px;">
<h5>Content</h5>
<ol style="list-style-type:disc">
<li><a href="#general-overview">General Overview</a></li>
<li><a href="#add-new-token-tutorial">Add a new token to your query</a></li>
<li><a href="#edit-options-tutorial">Options for editing your query</a></li>
<li><a href="#add-structural-attribute-tutorial">Add structural attributes to your query</a></li>
@ -29,6 +14,33 @@ A "translation" of your query into Corpus Query Language (CQL) is shown below.</
</ol>
</div>
<h4 id="general-overview">General Overview</h4>
<p>The Query Builder can be accessed via <a href="{{ url_for('main.dashboard') }}">My Corpora</a> or <a href="{{ url_for('services.corpus_analysis') }}">Corpus Analysis</a> in the sidebar options.
Click on the corpus you wish to analyze. You will be sent to its corpus overview page.
Here, click on <b>Analyze</b> to reach the analysis page.
The analysis page features different options for analyzing your corpus, including
visualizations and a <b>Reader</b> module. In this case, we want to open the Query Builder.
To do so, click on the <b>Concordance</b> button at the top of the page.</p>
<p>The Query Builder uses the <b>Corpus Query Language (CQL)</b> to help you build a query for analyzing your texts.
This makes it possible to filter for various types of text parameters, for
example a specific word, a lemma, or part-of-speech
tags (pos) that indicate the type of word you are looking for (a noun, an
adjective, etc.). In addition, you can search for structural attributes,
or narrow your query for a token (word, lemma, pos) by entity type. The
different text parameters can also be combined.</p>
<p>Tokens and structural attributes can be added by clicking on the <b>"+"</b> button
(what we call the "input marker") in the input field or the labeled buttons below it. Elements
added are shown as chips. These can be reorganized using drag and drop. The input
marker can also be moved in this way. Its position shows where new elements will be added. <br>
A "translation" of your query into Corpus Query Language (CQL) will be displayed underneath the query field.</p>
<p>For more information, see our <b>manual section for the Corpus Query Language.</b>
<br>
Advanced users can make direct use of CQL by switching to <b>expert mode</b> via the toggle button.
</p>
<p>The entire input field can be cleared using the red trash icon on the right.</p>
<br>
{# Add Token Tutorial #}
<div>
<hr>
@ -37,8 +49,8 @@ A "translation" of your query into Corpus Query Language (CQL) is shown below.</
<h4 id="add-new-token-tutorial">Add a new token to your query</h4>
<p>If you are only looking for a specific token, you can click on the left
button and select the type of token you are looking for from the drop-down menu.
By default "Word" is selected. </p>
<br>
"Word" is selected by default. </p>
<h5>Word and Lemma</h5>
<p>If you want to search for a specific word or lemma and the respective
category is selected in the drop-down menu, you can type in the word or lemma

@ -6,9 +6,10 @@
<li class="tab"><a href="#manual-modal-getting-started">Getting Started</a></li>
<li class="tab"><a href="#manual-modal-dashboard">Dashboard</a></li>
<li class="tab"><a href="#manual-modal-services">Services</a></li>
<li class="tab"><a href="#manual-modal-a-closer-look-at-the-corpus-analysis">A closer look at the Corpus Analysis</a></li>
<li class="tab"><a href="#manual-modal-cqp-query-language">CQP Query Language</a></li>
<!-- <li class="tab"><a href="#manual-modal-a-closer-look-at-the-corpus-analysis">A closer look at the Corpus Analysis</a></li> -->
<li class="tab"><a href="#manual-modal-query-builder">Query Builder</a></li>
<li class="tab"><a href="#manual-modal-cqp-query-language">CQP Query Language</a></li>
<li class="tab"><a href="#manual-modal-tagsets">Tagsets</a></li>
</ul>
<div id="manual-modal-introduction">
@ -27,10 +28,10 @@
<br>
{% include "_base/_modals/_manual/06_services.html.j2" %}
</div>
<div id="manual-modal-a-closer-look-at-the-corpus-analysis">
<!-- <div id="manual-modal-a-closer-look-at-the-corpus-analysis">
<br>
{% include "_base/_modals/_manual/07_a_closer_look_at_the_corpus_analysis.html.j2" %}
</div>
</div> -->
<div id="manual-modal-cqp-query-language">
<br>
{% include "_base/_modals/_manual/08_cqp_query_language.html.j2" %}