Mirror of https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git, synced 2025-10-31 12:52:47 +00:00
	Natural language processing
This repository provides all the code needed to build a container image for natural language processing with spaCy.
Build image
- Clone this repository and navigate into it:
git clone https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git && cd nlp
- Build image:
docker build -t sfb1288inf/nlp:latest .
Alternatively, build directly from the GitLab repository without cloning:
- Build image:
docker build -t sfb1288inf/nlp:latest https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git
Download prebuilt image
The GitLab registry provides a prebuilt image. It is built automatically on the conquaire build servers.
- Download image:
docker pull gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest
Run
- Create input and output directories for the NLP software:
mkdir -p /<mydatalocation>/files_for_nlp /<mydatalocation>/files_from_nlp
- Place your text files inside the /<mydatalocation>/files_for_nlp directory. All files should contain text in the same language.
- Start the NLP process:
docker run \
    --rm \
    -it \
    -u $(id -u $USER):$(id -g $USER) \
    -v /<mydatalocation>/files_for_nlp:/input \
    -v /<mydatalocation>/files_from_nlp:/output \
    sfb1288inf/nlp:latest \
        -i /input \
        -l <languagecode> \
        -o /output
The arguments following sfb1288inf/nlp:latest are described in the NLP arguments section.
If you want to use the prebuilt image, replace sfb1288inf/nlp:latest with gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest.
- Check your results in the /<mydatalocation>/files_from_nlp directory.
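For scripted or repeated runs, the docker run invocation above can be assembled programmatically. The following is a minimal Python sketch and not part of this repository: the helper name build_nlp_command is hypothetical, and it only reproduces the command shown above as an argument list. It uses os.getuid() and os.getgid() in place of id -u and id -g, so it assumes a Unix-like host.

```python
import os
from pathlib import Path

# Language codes listed in the "NLP arguments" section of this README.
SUPPORTED_LANGUAGES = ("de", "el", "en", "es", "fr", "it", "nl", "pt")


def build_nlp_command(input_dir, output_dir, language,
                      image="sfb1288inf/nlp:latest"):
    """Assemble the docker run command from the Run section as a list.

    Hypothetical helper for illustration; it does not start the container.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    # Docker bind mounts expect absolute host paths.
    input_dir = Path(input_dir).resolve()
    output_dir = Path(output_dir).resolve()
    return [
        "docker", "run",
        "--rm",
        "-it",
        # Same user and group mapping as in the shell command above.
        "-u", f"{os.getuid()}:{os.getgid()}",
        "-v", f"{input_dir}:/input",
        "-v", f"{output_dir}:/output",
        image,
        "-i", "/input",
        "-l", language,
        "-o", "/output",
    ]


if __name__ == "__main__":
    # The directory names are placeholders; adjust them to your data location.
    command = build_nlp_command("files_for_nlp", "files_from_nlp", "en")
    print(" ".join(command))
```

The returned list can be passed to subprocess.run(command, check=True) on a machine where Docker and the image are available. Because the command keeps the -it flags, it expects to be started from an interactive terminal. To use the prebuilt image, pass image="gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest".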
NLP arguments
-i path
- Sets the input directory using the specified path.
- required = True
-o path
- Sets the output directory using the specified path.
- required = True
-l languagecode
- Tells spaCy which language will be used.
- options = de (German), el (Greek), en (English), es (Spanish), fr (French), it (Italian), nl (Dutch), pt (Portuguese)
- required = True
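The three options above map naturally onto a command-line parser. The sketch below uses Python's argparse to mirror the documented interface. It is an assumption about how such a parser could look, not the repository's actual code; the function name create_parser and the dest names are hypothetical.

```python
import argparse

# Language codes and names as documented above.
LANGUAGES = {
    "de": "German",
    "el": "Greek",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "it": "Italian",
    "nl": "Dutch",
    "pt": "Portuguese",
}


def create_parser():
    """Build a parser that mirrors the documented -i, -o and -l options."""
    parser = argparse.ArgumentParser(
        description="Illustrative parser for the documented NLP arguments."
    )
    parser.add_argument("-i", dest="input_dir", metavar="path",
                        required=True, help="input directory")
    parser.add_argument("-o", dest="output_dir", metavar="path",
                        required=True, help="output directory")
    parser.add_argument("-l", dest="language", metavar="languagecode",
                        choices=sorted(LANGUAGES), required=True,
                        help="language of the input texts")
    return parser


if __name__ == "__main__":
    # Parse an explicit example argument list instead of sys.argv.
    args = create_parser().parse_args(
        ["-i", "/input", "-l", "de", "-o", "/output"]
    )
    print(args.input_dir, args.language, LANGUAGES[args.language], args.output_dir)
```

With required=True on every option and choices restricted to the listed codes, argparse rejects a missing argument or an unsupported language code before any processing starts.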