April 2022 update
Dear users,
with the April 2022 update we have improved nopaque across the board.
We have significantly reworked our backend code to utilize our servers more efficiently,
integrated a new service, updated all existing ones, rewritten a lot of code and made a few minor design improvements.
Where is my job data?
At the beginning of the year, we realized that our storage limit had been reached.
Around that time, some users may have noticed system instabilities.
Fortunately, we were able to solve this problem temporarily and without data loss
by deleting some data on our system that was unrelated to nopaque (yes, we also do other things than nopaque).
To avoid facing the same problem again, we had to commit to a long-term solution.
This consists of deleting all previous job data with this update and henceforth storing new job data
only for three months after job creation (important note: corpora are not affected).
All job data prior to this update has been backed up for you.
Feel free to contact us at nopaque@uni-bielefeld.de if you would like to get this data back.
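
As an illustration of the three-month rule, here is a minimal Python sketch. It is not nopaque's actual cleanup code: approximating "three months" as 90 days is an assumption, and the names `RETENTION`, `is_expired` and `expired_jobs` are hypothetical and exist only for this example.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumption: "three months" is approximated as 90 days.
RETENTION = timedelta(days=90)


def is_expired(created_at: datetime, now: datetime) -> bool:
    """Return True if a job created at `created_at` is past the retention window."""
    return now - created_at > RETENTION


def expired_jobs(jobs: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the ids of jobs whose data would be eligible for deletion."""
    return [job_id for job_id, created_at in jobs.items() if is_expired(created_at, now)]
```

Under these assumptions, a job created on 1 January would be eligible for deletion by mid-April, while a job created on 1 April would not. Corpora are not covered by this rule.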
What's new?
By partnering up with Transkribus we reached one of our long-term goals: integrating an HTR service into nopaque.
The Transkribus HTR Pipeline service is implemented as a kind of proxied service where the work is split between Transkribus and us.
That means we do the preprocessing, storage and postprocessing, while Transkribus handles the HTR itself.
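
The division of work described above can be sketched roughly as follows. This is a simplified illustration, not the real implementation: the function `run_htr_job`, its parameters and the trivial preprocessing and postprocessing steps are all hypothetical, and the `recognize` callable merely stands in for the remote HTR step handled by Transkribus.

```python
from typing import Callable, Dict, List


def run_htr_job(
    pages: List[str],
    recognize: Callable[[str], str],
    storage: Dict[str, str],
) -> str:
    """Run a proxied job: local pre/postprocessing and storage, remote recognition."""
    # Local preprocessing (hypothetical: just trim whitespace from each page).
    prepared = [page.strip() for page in pages]

    # Remote step: each prepared page is handed to the external HTR service.
    recognized = [recognize(page) for page in prepared]

    # Local postprocessing (hypothetical: join the page results into one text).
    result = "\n".join(recognized)

    # Local storage of the final result.
    storage["result"] = result
    return result
```

In this sketch the caller supplies `recognize`, so the remote part can be swapped out or stubbed without touching the local steps.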
One of the changes in the background was to fix our performance issues. While implementing the Transkribus HTR Pipeline service we
found some optimization potential within different steps of our processing routine. These optimizations are now also
available in our Tesseract OCR Pipeline service, resulting in a speedup of about 4x.
For now, we are done with the most obvious optimizations, but we may include more in the near future, so stay tuned!
The next step was to reorganize our Corpus Analysis code. Unfortunately, it was a bit messy. After a complete rewrite we are
now able to query a corpus without long loading times and with better error handling, resulting in a much more stable user experience.
The Corpus Analysis service is now modularized and comes with two modules that recreate and extend the functionality of the old service.
For now, we had to disable the Query Result viewer. Its code was based on the old Corpus Analysis service, and it will be reintegrated as a module of the new Corpus Analysis service.
The spaCy NLP Pipeline service got some love in the form of smaller updates too.
This is important preliminary work to support more models/languages that do not provide the full set of linguistic features (lemma, ner, pos, simple_pos). It still needs some testing and tweaking but will be ready soon!
Last but not least we made some design changes. Now you can find colors in places where we had just black and white before.
Nothing big, but the new colors will help you identify resources more efficiently!
Database cleanup
We may be a bit late with our spring cleaning, but with this update we have tidied up our database system.
This means we deleted old corpora with no corpus files, unconfirmed user accounts and, in general, unnecessary data fields.
That's it, thank you for using nopaque! We hope you like the update and appreciate all your past and future feedback.