From ffd7a3ad9158efcae75ed73fa07de7dbd413ef23 Mon Sep 17 00:00:00 2001
From: Gloria Glinphratum
Date: Tue, 5 Mar 2024 15:41:17 +0100
Subject: [PATCH] Manual additions: Intro / Getting Started
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 .../_modals/_manual/01_introduction.html.j2   | 30 ++++--
 .../02_registration_and_log_in.html.j2        | 92 +++++++++++++++----
 .../_modals/_manual/03_dashboard.html.j2      | 15 ++-
 3 files changed, 109 insertions(+), 28 deletions(-)

diff --git a/app/templates/_base/_modals/_manual/01_introduction.html.j2 b/app/templates/_base/_modals/_manual/01_introduction.html.j2
index 0b1d9ad7..8e8b7b7b 100644
--- a/app/templates/_base/_modals/_manual/01_introduction.html.j2
+++ b/app/templates/_base/_modals/_manual/01_introduction.html.j2
@@ -1,9 +1,27 @@

 Introduction

-nopaque is a web-based digital working environment. It implements a
-workflow based on the research process in the humanities and supports its
-users in processing their data in order to subsequently apply digital
-analysis methods to them. All processes are implemented in a specially
-provided cloud environment with established open source software. This
-always ensures that no personal data of the users is disclosed.
+Nopaque is a web application that offers different services and tools to
+support researchers working with image and text-based data. These services
+are logically connected and build upon each other. They include:
+
+1. File setup, which converts and merges different data (e.g., books,
+   letters) for further processing.
+2. Image-to-text conversion tools:
+   a. Optical Character Recognition converts photos and scans into text
+      data, making them machine-readable.
+   b. The Transkribus HTR (Handwritten Text Recognition) Pipeline also
+      converts images into text data, making them machine-readable.
+3. Natural Language Processing extracts information from your text via
+   computational linguistic data processing (tokenization, lemmatization,
+   part-of-speech tagging, and named-entity recognition).
+4. Corpus analysis makes use of the CQP Query Language to search through
+   text corpora with the aid of metadata and Natural Language Processing
+   tags.
+
+Nopaque also features a Social Area, where researchers can create a
+personal profile, connect with other users, and share corpora if desired.
+These services can be accessed from the sidebar in nopaque. All processes
+are implemented in a specially provided cloud environment with established
+open-source software. This always ensures that no personal data of the
+users is disclosed.

diff --git a/app/templates/_base/_modals/_manual/02_registration_and_log_in.html.j2 b/app/templates/_base/_modals/_manual/02_registration_and_log_in.html.j2
index 5f05c543..88be162f 100644
--- a/app/templates/_base/_modals/_manual/02_registration_and_log_in.html.j2
+++ b/app/templates/_base/_modals/_manual/02_registration_and_log_in.html.j2
@@ -1,18 +1,76 @@

-Registration and Log in
-
-Before you can start using the web platform, you need to create a user
-account. This requires only a few details: just a user name, an e-mail
-address and a password are needed. In order to register yourself, fill out
-the form on the registration page. After successful registration, the
-created account must be verified. To do this, follow the instructions
-given in the automatically sent e-mail. Afterwards, you can log in as
-usual with your username/email address and password in the log-in form
-located next to the registration button.
+Getting Started
+
+Content
+
+1. Registration and login
+2. Preparing files for analysis
+3. Converting a PDF into text data
+4. Extracting linguistic data from text
+5. Creating a corpus
+6. Analyzing a corpus
+
+Registration and login
+
+Before you can begin using nopaque, you will need to create a personal user
+account. Open the menu (three dots) at the top right of the screen and
+choose “Register.” Enter the required details listed on the registration
+page (username, password, email address). After verifying your account via
+the link sent to your email, you can log in.
+
+Preparing files for analysis
+
+A few steps need to be taken before images, scans, or other text data are
+ready for analysis in nopaque. The SpaCy NLP Pipeline service can only
+extract linguistic data from texts in plain text (.txt) format. If your
+text is already in this format, you can skip the next steps and go directly
+to Extracting linguistic data from text. Otherwise, the next steps assume
+that you are starting off with image data.
+
+First, all data needs to be converted into PDF format. Using the File Setup
+service, you can bundle images together (even of different formats) and
+convert them all into one PDF file. Note that the File Setup service will
+sort the images based on their file name in ascending order. It is thus
+recommended to name them accordingly, for example: page-01.png,
+page-02.jpg, page-03.tiff (a renaming sketch follows below).
+
+After uploading the images and completing the File Setup job, the list of
+files added can be seen under “Inputs.” Further below, under “Results,” you
+can find and download the PDF output.
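
Because File Setup orders pages by file name, zero-padded numbers keep
page-10 from sorting before page-2. Below is a minimal local sketch of such
a bulk rename in Python; the scans directory, the existing file names, and
the page-NNN scheme are assumptions for illustration, not part of nopaque:

    from pathlib import Path

    scans = Path("scans")  # hypothetical folder holding the page images

    def page_number(path: Path) -> int:
        # Pull the numeric part out of a name like "scan12.jpg" -> 12.
        digits = "".join(ch for ch in path.stem if ch.isdigit())
        return int(digits) if digits else 0

    # Zero-pad so that lexicographic order equals page order
    # (page-002 sorts before page-010, unlike page-2 vs. page-10).
    for i, src in enumerate(sorted(scans.iterdir(), key=page_number), start=1):
        src.rename(scans / f"page-{i:03d}{src.suffix.lower()}")
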
+Converting a PDF into text data
+
+Select an image-to-text conversion tool depending on whether your PDF is
+primarily composed of printed or handwritten text. For printed text, select
+the Tesseract OCR Pipeline; for handwritten text, select the Transkribus
+HTR Pipeline. Select the desired language model or upload your own, choose
+the version of the service you want to use, and click Submit to start the
+conversion. When the job is finished, the various output files can be
+viewed and downloaded further below, under “Results.” You may want to
+review the text output for errors and coherence.
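
nopaque runs the OCR itself, but a rough local analogue of one step of the
Tesseract pipeline may help build intuition. This is a sketch under stated
assumptions: pytesseract with the tesseract binary installed, a
hypothetical page image, and English as the language:

    from PIL import Image   # pip install pillow pytesseract
    import pytesseract      # also needs the tesseract binary on your PATH

    # OCR one page image into plain text, roughly what the hosted
    # pipeline does per page; file name and language are illustrative.
    text = pytesseract.image_to_string(Image.open("page-001.png"), lang="eng")
    print(text[:200])
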
+Extracting linguistic data from text
+
+The SpaCy NLP Pipeline service extracts linguistic information from plain
+text files (in .txt format). Select the corresponding .txt file, the
+language model, and the version you want to use. When the job is finished,
+find and download the files in .json and .vrt format under “Results.”
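
To see the kind of annotations the service produces, here is a minimal
local spaCy sketch; the model name and the sample sentence are assumptions,
and nopaque runs the equivalent server-side with the model you select:

    import spacy  # pip install spacy; python -m spacy download en_core_web_sm

    nlp = spacy.load("en_core_web_sm")  # assumed English model
    doc = nlp("Ada Lovelace wrote the first program in London.")

    for token in doc:
        # tokenization, lemmatization, part-of-speech tagging
        print(token.text, token.lemma_, token.pos_, sep="\t")
    for ent in doc.ents:
        # named-entity recognition
        print(ent.text, ent.label_)

The .vrt output is verticalized text: one token per line with tab-separated
annotation columns, wrapped in XML-like structural tags such as <s>…</s>
for sentences. This is the format the corpus analysis service builds
corpora from.
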
+Creating a corpus
+
+Now, using the files in .vrt format, you can create a corpus. This can be
+done in the Dashboard or Corpus Analysis under “My Corpora.” Click on
+“Create corpus” and add a title and description for your corpus. After
+submitting, navigate down to the “Corpus files” section. Once you have
+added the desired .vrt files, select “Build” on the corpus page under
+“Actions.” Now your corpus is ready for analysis.
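
Before building, a quick local sanity check of a .vrt file can catch empty
or truncated uploads. This sketch assumes only the verticalized layout
described above, and text.vrt is a hypothetical file name:

    from pathlib import Path

    tokens = sentences = 0
    for line in Path("text.vrt").read_text(encoding="utf-8").splitlines():
        if line.startswith("<s>") or line.startswith("<s "):
            sentences += 1          # sentence-opening structural tag
        elif line and not line.startswith("<"):
            tokens += 1             # annotation row: one token per line
    print(f"{tokens} tokens in {sentences} sentences")
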
+Analyzing a corpus
+
+Navigate to the corpus you would like to analyze and click on the Analyze
+button. This will take you to an analysis overview page for your corpus.
+Here you can find a visualization of general linguistic information about
+your corpus, including tokens, sentences, unique words, unique lemmas,
+unique parts of speech, and unique simple parts of speech. You will also
+find a pie chart of the proportional textual makeup of your corpus and can
+view the linguistic information for each individual text file. A more
+detailed visualization of token frequencies, with a search option, is also
+on this page.
+
+From the corpus analysis overview page, you can navigate to other analysis
+modules: the Query Builder (under Concordance) and the Reader. With the
+Reader, you can read your corpus texts tokenized with the associated
+linguistic information. The tokens can be shown as words, lemmas, or parts
+of speech, and can be displayed in different ways: as plain text with the
+option of highlighted entities, or as chips.
+
+The Concordance module allows for more specific, query-oriented text
+analyses. Here, you can filter out text parameters and structural
+attributes in different combinations. This is explained in more detail in
+the Query Builder section of the manual.
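
For orientation, CQP queries describe tokens as attribute-value pairs in
square brackets; a query along the lines of [pos="ADJ"] [lemma="right"],
for instance, finds every inflected form of "right" preceded by an
adjective. The attribute names available depend on the tagset of the NLP
pipeline used, so treat this as an assumed illustration rather than a
guaranteed query.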

diff --git a/app/templates/_base/_modals/_manual/03_dashboard.html.j2 b/app/templates/_base/_modals/_manual/03_dashboard.html.j2
index 51d772a3..c1a9f33a 100644
--- a/app/templates/_base/_modals/_manual/03_dashboard.html.j2
+++ b/app/templates/_base/_modals/_manual/03_dashboard.html.j2
@@ -1,15 +1,20 @@

 Dashboard

+About the dashboard
+
-The dashboard provides a central overview of all resources assigned to the
-user. These are corpora and created jobs. Corpora are freely composable
-annotated text collections and jobs are the initiated file processing
-procedures. One can search for jobs as well as corpus listings using
-the search field displayed above them.
+The dashboard provides a central overview of all user-specific resources:
+corpora, created jobs, and model contributions. A corpus is a freely
+composable annotated text collection. A job is an initiated file processing
+procedure. You can search both the job and corpus listings using the search
+field displayed above them on the dashboard.