# Natural language processing

This repository provides all code that is needed to build a container image for natural language processing utilizing [spaCy](https://spacy.io).

## Build image

1. Clone this repository and navigate into it:
```
git clone https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git && cd nlp
```

2. Build image:
```
docker build -t sfb1288inf/nlp:latest .
```

Alternatively build from the GitLab repository without cloning:

1. Build image:
```
docker build -t sfb1288inf/nlp:latest https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git
```

## Download prebuilt image

The GitLab registry provides a prebuilt image. It is automatically created, utilizing the conquaire build servers.

1. Download image:
```
docker pull gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest
```

## Run

1. Create input and output directories for the NLP software:
```
mkdir -p //files_for_nlp //files_from_nlp
```

2. Place your text files inside the `//files_for_nlp` directory. Files should all contain text of the same language.

3. Start the NLP process.
```
docker run \
    --rm \
    -it \
    -v //files_for_nlp:/files_for_nlp \
    -v //files_from_nlp:/files_from_nlp \
    sfb1288inf/nlp:latest \
        -i /files_for_nlp \
        -o /files_from_nlp \
        -l
```
The arguments below `sfb1288inf/nlp:latest` are described in the [NLP arguments](#nlp-arguments) part.

If you want to use the prebuilt image, replace `sfb1288inf/nlp:latest` with `gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest`.

4. Check your results in the `//files_from_nlp` directory.

### NLP arguments

`-i path`
* Sets the input directory using the specified path.
* required = True

`-o path`
* Sets the output directory using the specified path.
* required = True

`-l languagecode`
* Tells spaCy which language will be used.
* options = de (German), el (Greek), en (English), es (Spanish), fr (French), it (Italian), nl (Dutch), pt (Portuguese)
* required = True
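
#### Example: parsing the NLP arguments

The argument list above could be expressed with Python's `argparse` roughly as follows. This is only a minimal sketch to illustrate the three required options, not the code shipped in the image: the names `build_parser`, `LANGUAGES`, `input_dir`, `output_dir` and `language` are hypothetical and chosen for illustration.

```python
import argparse

# Language codes taken from the argument description above.
# The dictionary itself is an assumption made for this sketch.
LANGUAGES = {
    "de": "German",
    "el": "Greek",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "it": "Italian",
    "nl": "Dutch",
    "pt": "Portuguese",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three required options -i, -o and -l."""
    parser = argparse.ArgumentParser(
        description="Sketch of the NLP command line interface."
    )
    parser.add_argument(
        "-i", dest="input_dir", metavar="path", required=True,
        help="Sets the input directory using the specified path.",
    )
    parser.add_argument(
        "-o", dest="output_dir", metavar="path", required=True,
        help="Sets the output directory using the specified path.",
    )
    parser.add_argument(
        "-l", dest="language", metavar="languagecode", required=True,
        choices=sorted(LANGUAGES),
        help="Tells spaCy which language will be used.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere; "en" is just an example language code.
args = build_parser().parse_args(
    ["-i", "/files_for_nlp", "-o", "/files_from_nlp", "-l", "en"]
)
print(args.input_dir, args.output_dir, LANGUAGES[args.language])
```

With `choices` set, `argparse` rejects any language code outside the listed options and exits with a usage message, which mirrors the `required = True` and `options = ...` notes above.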