manual sections 01, 02, 06

2026-06-06 13:35:44 +00:00 · 2024-03-14 17:07:53 +01:00
parent a53f1d216b
commit 39113a6f17
3 changed files with 58 additions and 23 deletions
@@ -11,7 +11,7 @@
  <li><b>Image-to-text conversion tools:</b></li>
    <ol style="list-style-type:circle; margin-left:1em; padding-bottom:0;"><li><b>Optical Character Recognition</b> converts photos and 
    scans into text data, making them machine-readable.</li>
-    <li><b>Transkribus HTR (Handwritten Text Recognition) Pipeline</b> 
+    <li><b>Transkribus HTR (Handwritten Text Recognition) Pipeline (currently deactivated)* </b> 
    also converts images into text data, making them machine-readable.</li>
    </ol>
  <li><b>Natural Language Processing</b> extracts information from your text via 
@@ -23,5 +23,12 @@

 Nopaque also features a <b>Social Area</b>, where researchers can create a personal profile, connect with other users and share corpora if desired.
 These services can be accessed from the sidebar in nopaque.
-All processes are implemented in a specially provided cloud environment with established open-source software. This always ensures that no personal data of the users is disclosed.
+All processes are implemented in a specially provided cloud environment with established open-source software. 
+This always ensures that no personal data of the users is disclosed.
+<p>
+*Note: the Transkribus HTR Pipeline is currently 
+deactivated; we are working on an alternative solution. You can try using Tesseract OCR, 
+though the results will likely be poor.
+</p>
+

@@ -35,6 +35,7 @@ name in ascending order. It is thus recommended to name them accordingly, for ex
 page-01.png, page-02.jpg, page-03.tiff.
 </p>
 <p>
+Add a title and description to your job and select the File Setup version* you want to use.
 After uploading the images and completing the File Setup job, the list of files added 
 can be seen under “Inputs.” Further below, under “Results,” you can find and download 
 the PDF output.</p>
@@ -42,14 +43,17 @@ the PDF output.</p>
 <p>Select an image-to-text conversion tool depending on whether your PDF is primarily 
 composed of handwritten text or printed text. For printed text, select the <b>Tesseract OCR 
 Pipeline</b>. For handwritten text, select the <b>Transkribus HTR Pipeline</b>. Select the desired 
-language model or upload your own. Select the version of Tesseract OCR you want to use 
+language model or upload your own. Select the version* of Tesseract OCR you want to use 
 and click on submit to start the conversion. When the job is finished, various output 
 files can be seen and downloaded further below, under “Results.” You may want to review 
-the text output for errors and coherence.</p>
+the text output for errors and coherence. (Note: the Transkribus HTR Pipeline is currently 
+deactivated; we are working on an alternative solution. You can try using Tesseract OCR, 
+though the results will likely be poor.)
+</p>
 <h5 id="extracting-linguistic-data">Extracting linguistic data from text</h5>
 <p>The <b>SpaCy NLP Pipeline</b> service extracts linguistic information from plain text files 
 (in .txt format). Select the corresponding .txt file, the language model, and the 
-version you want to use. When the job is finished, find and download the files in 
+version* you want to use. When the job is finished, find and download the files in 
 <b>.json</b> and <b>.vrt</b> format under “Results.”</p>
 <h5 id="creating-a-corpus">Creating a corpus</h5>
 <p>Now, using the files in .vrt format, you can create a corpus. This can be done 
@@ -74,3 +78,9 @@ visually as plain text with the option of highlighted entities or as chips.</p>
 Here, you can filter out text parameters and structural attributes in different 
 combinations. This is explained in more detail in the Query Builder section of the 
 manual.</p>
+
+<br>
+<br>
+*For all services, it is recommended to use the latest version unless you need a model 
+only available in an earlier version or are looking to reproduce data that was originally generated 
+using an older version.
@@ -7,40 +7,58 @@
  </div>
  <div class="col s12 m8">
    <p>
-      Nopaque was designed to be modular. Its workflow consists of a sequence 
-      of services that can be applied at different starting and ending points. 
-      This allows you to proceed with your work flexibly.
-      Each of these modules are implemented in a self-contained service, each of
-      which represents a step in the workflow. The services are coordinated in
-      such a way that they can be used consecutively. The order can either be
-      taken from the listing of the services in the left sidebar or from the
-      roadmap (accessible via the pink compass in the upper right corner). All
-      services are versioned, so the data generated with nopaque is always
+      Nopaque was designed to be modular. Its modules are implemented in 
+      self-contained <b>services</b>, each of which represents a step in the 
+      workflow. In a typical workflow, the services are used one after 
+      another.
+      This order can be taken from the listing of the 
+      services in the left sidebar or from the nopaque manual (accessible via the pink 
+      button in the upper right corner). 
+      The services can also be applied at different starting and ending points, 
+      which allows you to conduct your work flexibly.
+      All services are versioned, so the data generated with nopaque is always
      reproducible.
+      
+      <p>For all services, it is recommended to use the latest version (selected 
+      in the drop-down menu on the service page) unless you need a model 
+      only available in an earlier version or are looking to reproduce data that was originally generated 
+      using an older version.</p>
    </p>
  </div>
 </div>

-<h4 class="manual-chapter-title">File Setup</h4>
+
+
+<h4>File Setup</h4>
 <p>
  The <a href="{{ url_for('services.file_setup_pipeline') }}">File Setup Service</a> bundles image data, such as scans and photos,
  together in a handy PDF file. To use this service, use the job form to
  select the images to be bundled, choose the desired service version, and
-  specify a title and description. Please note that the service sorts the
-  images into the resulting PDF file based on the file names. So naming the
-  images correctly is of great importance. It has proven to be a good practice
-  to name the files according to the following scheme:
-  page-01.png, page-02.jpg, page-03.tiff, etc. In general, you can assume
+  specify a title and description.
+  Note that the File Setup service sorts the images by file name in 
+  ascending order. It is therefore highly recommended to name 
+  them accordingly, for example: 
+  page-01.png, page-02.jpg, page-03.tiff. Generally, you can assume
  that the images will be sorted in the order in which the file explorer of
  your operating system lists them when you view the files in a folder
  sorted in ascending order by file name.
 </p>

 <h4>Optical Character Recognition (OCR)</h4>
-<p>Coming soon...</p>
+<p>
+  The <a href="{{ url_for('services.tesseract_ocr_pipeline') }}">Tesseract OCR Pipeline</a> 
+  converts image data - like photos and scans - into text data, making them machine-readable. 
+  This step enables you to proceed with the computational analysis of your documents. 
+  To use this service, use the job form to select the file you want to convert, choose 
+  the desired language model and service version, enter the title and description, and 
+  submit your job. The results can be found and downloaded further below, under "Results."
+
+</p>

 <h4>Handwritten Text Recognition (HTR)</h4>
-<p>Coming soon...</p>
+<p>The Transkribus HTR Pipeline is currently 
+deactivated. We are working on an alternative solution. In the meantime, you can 
+try using Tesseract OCR, though the results will likely be poor.</p>

 <h4>Natural Language Processing (NLP)</h4>
 <p>Coming soon...</p>
@@ -48,7 +66,7 @@
 <h4>Corpus Analysis</h4>
 <p>
  With the corpus analysis service, it is possible to create a text corpus
-  and then explore it in an analysis session. The analysis session is realized
+  and then explore it with analytical tools. The analysis session is realized
  on the server side by the Open Corpus Workbench software, which enables
  efficient and complex searches with the help of the CQP Query Language.
 </p>