mirror of https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git
synced 2024-12-27 12:04:18 +00:00

Update

parent 5b7bc2a840
commit 6c8b32fad4
@@ -31,6 +31,7 @@ RUN pip3 install wheel && pip3 install -U spacy && \
     python3 -m spacy download en && \
     python3 -m spacy download es && \
     python3 -m spacy download fr && \
+    python3 -m spacy download it && \
     python3 -m spacy download pt
 
 COPY nlp /usr/local/bin
86
README.md
@@ -1,37 +1,73 @@
 # Natural language processing
 
-This repository provides all code that is needed to build a container image for natural language processing utilising [spaCy](https://spacy.io).
-In case you don't want to build the image by yourself, there is also a prebuild image that can be used in the [registry](https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp/container_registry).
+This repository provides all code that is needed to build a container image for natural language processing utilizing [spaCy](https://spacy.io).
 
-## Build the image
+## Build image
 
-```console
-user@machine:~$ cd <path-to-this-repository>
-user@machine:~$ docker build -t gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp .
+1. Clone this repository and navigate into it:
+```
+git clone https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git && cd nlp
 ```
 
-## Starting a container
-
-```console
-user@machine:~$ docker run \
-    --name nlp-container \
-    -dit \
-    -v <your-input-directory>:/root/files_for_nlp \
-    -v <your-output-directory>:/root/files_from_nlp \
-    gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp
+2. Build image:
+```
+docker build -t sfb1288inf/nlp:latest .
 ```
 
-## Start a natural language processing run
+Alternatively build from the GitLab repository without cloning:
 
-```console
-user@machine:~$ docker exec -it nlp-container \
-    nlp -i files_for_nlp -o files_from_nlp -l <language-code>
+1. Build image:
+```
+docker build -t sfb1288inf/nlp:latest https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git
 ```
 
-Where <language-code> needs to be one of the following:
+## Download prebuilt image
 
-* de (German)
-* en (English)
-* es (Spanish)
-* fr (French)
-* pt (Portuguese)
+The GitLab registry provides a prebuilt image. It is automatically created, utilizing the conquaire build servers.
+
+1. Download image:
+```
+docker pull gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest
+```
+
+## Run
+
+1. Create input and output directories for the NLP software:
+```
+mkdir -p /<mydatalocation>/files_for_nlp /<mydatalocation>/files_from_nlp
+```
+
+2. Place your text files inside the `/<mydatalocation>/files_for_nlp` directory. Files should all contain text of the same language.
+
+3. Start the NLP process.
+```
+docker run \
+    --rm \
+    -it \
+    -v /<mydatalocation>/files_for_nlp:/files_for_nlp \
+    -v /<mydatalocation>/files_from_nlp:/files_from_nlp \
+    sfb1288inf/nlp:latest \
+    -i /files_for_nlp \
+    -o /files_from_nlp \
+    -l <languagecode>
+```
+The arguments below `sfb1288inf/nlp:latest` are described in the [NLP arguments](#nlp-arguments) part.
+
+If you want to use the prebuilt image, replace `sfb1288inf/nlp:latest` with `gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest`.
+
+4. Check your results in the `/<mydatalocation>/files_from_nlp` directory.
+
+### NLP arguments
+
+`-i path`
+* Sets the input directory using the specified path.
+* required = True
+
+`-o path`
+* Sets the output directory using the specified path.
+* required = True
+
+`-l languagecode`
+* Tells spaCy which language will be used.
+* options = de (German), el (Greek), en (English), es (Spanish), fr (French), it (Italian), nl (Dutch), pt (Portuguese)
+* required = True
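
The `docker run` invocation from step 3 of the new README's Run section can also be assembled programmatically. The following is only a minimal sketch: `build_docker_command` and `LANGUAGES` are hypothetical names that do not exist in the repository, and the mount points, image tag and flags are copied from the README text rather than verified against the image.

```python
from pathlib import PurePosixPath

# Language codes listed under "NLP arguments" in the README.
LANGUAGES = ("de", "el", "en", "es", "fr", "it", "nl", "pt")


def build_docker_command(data_dir, lang, image="sfb1288inf/nlp:latest"):
    """Return the argument list for the README's `docker run` call (hypothetical helper)."""
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language code: {lang!r}")
    base = PurePosixPath(data_dir)
    return [
        "docker", "run", "--rm", "-it",
        # Host directories are mounted at the container paths used in the README.
        "-v", f"{base / 'files_for_nlp'}:/files_for_nlp",
        "-v", f"{base / 'files_from_nlp'}:/files_from_nlp",
        image,
        "-i", "/files_for_nlp",
        "-o", "/files_from_nlp",
        "-l", lang,
    ]


if __name__ == "__main__":
    # Only print the command; pass the list to subprocess.run to execute it.
    print(" ".join(build_docker_command("/data", "it")))
```

Building the command as a list rather than a single string avoids shell quoting problems when the data directory contains spaces. To use the prebuilt image, pass `image="gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest"`.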
2
nlp
@@ -28,7 +28,7 @@ def parse_arguments():
     )
     parser.add_argument(
         '-l',
-        choices=['de', 'en', 'es', 'fr', 'pt'],
+        choices=['de', 'el', 'en', 'es', 'fr', 'it', 'nl', 'pt'],
         dest='lang',
         required=True
     )
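
For context, the `-l` change in this hunk might sit inside a parser like the one sketched below. Only the `-l` argument (its `choices`, `dest='lang'` and `required=True`) is taken from the diff. The `-i` and `-o` definitions, their `dest` names and the `argv` parameter are assumptions based on the README's argument list, not the script's actual code.

```python
import argparse

# Language codes accepted after this commit (taken from the diff).
LANGUAGE_CHOICES = ['de', 'el', 'en', 'es', 'fr', 'it', 'nl', 'pt']


def parse_arguments(argv=None):
    """Parse -i, -o and -l; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description='NLP wrapper (illustrative sketch).')
    # Assumed definitions: the README documents -i and -o as required paths,
    # but the dest names below are hypothetical.
    parser.add_argument('-i', dest='input_dir', required=True)
    parser.add_argument('-o', dest='output_dir', required=True)
    # Matches the hunk above.
    parser.add_argument(
        '-l',
        choices=LANGUAGE_CHOICES,
        dest='lang',
        required=True
    )
    return parser.parse_args(argv)


if __name__ == '__main__':
    args = parse_arguments(['-i', '/files_for_nlp', '-o', '/files_from_nlp', '-l', 'it'])
    print(args.lang)
```

Because `choices` is enforced by `argparse`, an unsupported code such as `-l xx` makes the parser exit with a usage error before any processing starts.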
@@ -15,7 +15,7 @@ parser.add_argument(
 )
 parser.add_argument(
     '-l',
-    choices=['de', 'en', 'es', 'fr', 'pt'],
+    choices=['de', 'el', 'en', 'es', 'fr', 'it', 'nl', 'pt'],
     dest='lang',
     required=True
 )
@@ -26,8 +26,9 @@ parser.add_argument(
 args = parser.parse_args()
 
 SPACY_MODELS = {
-    'de': 'de_core_news_sm', 'en': 'en_core_web_sm', 'es': 'es_core_news_sm',
-    'fr': 'fr_core_news_sm', 'pt': 'pt_core_news_sm'
+    'de': 'de_core_news_sm', 'el': 'el_core_news_sm', 'en': 'en_core_web_sm',
+    'es': 'es_core_news_sm', 'fr': 'fr_core_news_sm', 'it': 'it_core_news_sm',
+    'nl': 'nl_core_news_sm', 'pt': 'pt_core_news_sm'
 }
 
 # Set the language model for spacy
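
The updated `SPACY_MODELS` table maps each `-l` code to a spaCy model package name. A minimal sketch of how such a lookup might be used follows. The dictionary is copied from the diff, `model_name_for` is a hypothetical helper, and the `spacy.load` call is left as a comment because it needs spaCy and the corresponding model to be installed.

```python
# Copied from the new side of the diff.
SPACY_MODELS = {
    'de': 'de_core_news_sm', 'el': 'el_core_news_sm', 'en': 'en_core_web_sm',
    'es': 'es_core_news_sm', 'fr': 'fr_core_news_sm', 'it': 'it_core_news_sm',
    'nl': 'nl_core_news_sm', 'pt': 'pt_core_news_sm'
}


def model_name_for(lang):
    """Return the model package name for a language code (hypothetical helper)."""
    try:
        return SPACY_MODELS[lang]
    except KeyError:
        supported = ', '.join(sorted(SPACY_MODELS))
        raise ValueError(f'unknown language {lang!r}; expected one of: {supported}') from None


if __name__ == '__main__':
    name = model_name_for('it')
    print(name)
    # With spaCy and the model installed, the next step would presumably be:
    # import spacy
    # nlp = spacy.load(name)
```

Keeping the keys of this table in sync with the `choices` list of the `-l` argument means every accepted language code resolves to a model name.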