Mirror of https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git (synced 2025-10-31 20:43:14 +00:00)
Natural language processing
This repository provides all the code needed to build a container image for natural language processing with spaCy.
Build image
- Clone this repository and navigate into it:
git clone https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git && cd nlp
- Build image:
docker build -t sfb1288inf/nlp:latest .
Alternatively, build directly from the GitLab repository without cloning:
- Build image:
docker build -t sfb1288inf/nlp:latest https://gitlab.ub.uni-bielefeld.de/sfb1288inf/nlp.git
Download prebuilt image
The GitLab registry provides a prebuilt image. It is built automatically on the conquaire build servers.
- Download image:
docker pull gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest
Run
- Create input and output directories for the NLP software:
mkdir -p /<mydatalocation>/files_for_nlp /<mydatalocation>/files_from_nlp
- Place your text files inside the /<mydatalocation>/files_for_nlp directory. All files should contain text in the same language.
- Start the NLP process:
docker run \
    --rm \
    -it \
    -v /<mydatalocation>/files_for_nlp:/files_for_nlp \
    -v /<mydatalocation>/files_from_nlp:/files_from_nlp \
    sfb1288inf/nlp:latest \
        -i /files_for_nlp \
        -o /files_from_nlp \
        -l <languagecode>
The arguments after sfb1288inf/nlp:latest are described in the NLP arguments section.
If you want to use the prebuilt image, replace sfb1288inf/nlp:latest with gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest.
- Check your results in the /<mydatalocation>/files_from_nlp directory.
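For scripted runs, the docker run invocation from the steps above can be assembled programmatically. The following is only a sketch: the function name build_docker_command and its parameters are hypothetical and not part of this repository. It assumes the directory layout and the language codes described in this README.

```python
# Language codes listed in the "NLP arguments" part of this README.
SUPPORTED_LANGUAGES = {"de", "el", "en", "es", "fr", "it", "nl", "pt"}


def build_docker_command(data_location, language, image="sfb1288inf/nlp:latest"):
    """Return the docker run call from the steps above as an argument list.

    Hypothetical helper for illustration; it only builds the command and
    does not start a container.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    base = data_location.rstrip("/")
    return [
        "docker", "run", "--rm", "-it",
        # Mount the host input and output directories into the container.
        "-v", f"{base}/files_for_nlp:/files_for_nlp",
        "-v", f"{base}/files_from_nlp:/files_from_nlp",
        image,
        # Everything after the image name is passed to the NLP software.
        "-i", "/files_for_nlp",
        "-o", "/files_from_nlp",
        "-l", language,
    ]
```

The returned list can be handed to subprocess.run(cmd, check=True). Because -it requests an interactive terminal, that flag may need to be dropped when the command is started from a non-interactive job. To use the prebuilt image, pass image="gitlab.ub.uni-bielefeld.de:4567/sfb1288inf/nlp:latest".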
NLP arguments
-i path
- Sets the input directory using the specified path.
- required = True
-o path
- Sets the output directory using the specified path.
- required = True
-l languagecode
- Tells spaCy which language will be used.
- options = de (German), el (Greek), en (English), es (Spanish), fr (French), it (Italian), nl (Dutch), pt (Portuguese)
- required = True
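The three arguments above map naturally onto Python's argparse module. The sketch below shows one way such a command line could be declared. It is an assumption for illustration, not the parser actually used in this repository; the names build_parser, input_dir, output_dir and language are hypothetical.

```python
import argparse

# Language codes documented for the -l argument.
LANGUAGE_CODES = ["de", "el", "en", "es", "fr", "it", "nl", "pt"]


def build_parser():
    """Build a parser that mirrors the documented -i, -o and -l arguments."""
    parser = argparse.ArgumentParser(description="Run NLP on a directory of text files.")
    parser.add_argument("-i", dest="input_dir", metavar="path", required=True,
                        help="input directory")
    parser.add_argument("-o", dest="output_dir", metavar="path", required=True,
                        help="output directory")
    # choices makes argparse reject any language code outside the documented list.
    parser.add_argument("-l", dest="language", metavar="languagecode", required=True,
                        choices=LANGUAGE_CODES, help="language of the input texts")
    return parser
```

Calling build_parser().parse_args(["-i", "/files_for_nlp", "-o", "/files_from_nlp", "-l", "en"]) yields a namespace with the three values. Omitting a required argument or passing an unlisted language code makes argparse exit with an error message.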