
Python Setup for spaCy

This post describes how to set up Python for spaCy. The various “how to” posts on spaCy programming in Python depend on this setup. This is part of the overall spaCy Python tutorial.

Python Environment Setup

The setup instructions assume a Windows 10 or newer environment. I’m using native Python tools (venv and pip), so you should be able to replicate this on Mac/Linux environments without too much change.

Step 1 – create a python installation

Download and install the latest Python release for Windows from python.org; at the time of writing this was 3.10.7. If you have an existing Python installation it will probably be fine as long as it is 3.x, but no guarantees.
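
If you already have Python installed, a quick way to confirm it is a 3.x release is to ask Python itself. This is only a minimal sketch; the helper name is_python3 is something I made up for illustration.

```python
import sys

def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple is a Python 3.x release."""
    return version_info[0] == 3

# Report the interpreter that is running this script.
print("Running Python", sys.version.split()[0])
print("Is 3.x:", is_python3())
```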

Step 2 – create a python environment

I use the venv module to create Python virtual environments. In Windows 10, start a command line window and run the command shown below. This creates a Python virtual environment based on your current Python installation, so you can install Python libraries in the virtual environment and keep them away from other Python projects you might be working on:

python -m venv C:\Users\xxx\pyenv\spacy

On my system I created a folder called “pyenv” in my user folder (xxx will be your Windows userid). I then created a folder called “spacy” for these projects.
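
The same thing can be done from a Python script using the standard library venv module. The sketch below is an alternative to the command line, not a required step; the function name create_env is hypothetical, and the path in the comment is just the example path used in this post.

```python
import venv
from pathlib import Path

def create_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir, similar to `python -m venv env_dir`."""
    env_path = Path(env_dir)
    venv.create(env_path, with_pip=with_pip)
    # venv writes a pyvenv.cfg marker file in the root of the new environment.
    return env_path / "pyvenv.cfg"

# Example call (not run here): create_env(r"C:\Users\xxx\pyenv\spacy")
```

The function returns the path to pyvenv.cfg so you can check that the environment was actually created.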

Step 3 – activate the NLP python environment

When “venv” created the python virtual environment it created an “activate” batch file. In your command line window run the following command:

	cd C:\Users\xxx\pyenv\spacy
	C:\Users\xxx\pyenv\spacy\scripts\activate.bat

This will “activate” the python virtual environment. You will notice the command prompt changes to include the environment name. You must activate the environment before completing the rest of the setup tasks.

To make life easier, put the two lines above into a batch file named “spacy.bat” and save the file to your “C:\Users\xxx” folder. This is the folder location initially displayed when you start the command prompt window. Now you can simply open the command prompt window, enter “spacy” and the batch file will change directories to the Python environment and run the activate script. At this point you are ready to work.
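
If you are not sure whether the environment is really active, you can check from Python. This is a small sketch that compares sys.prefix with sys.base_prefix, which differ when the interpreter is running inside a venv; the function name in_virtualenv is my own.

```python
import sys

def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix

print("Virtual environment active:", in_virtualenv())
print("Interpreter prefix:", sys.prefix)
```

When the “spacy” environment is active, the prefix printed should point at the environment folder rather than the base Python installation.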

Step 4 – Create a batch file that starts Idle

Idle is a simple python GUI that ships with python. I will use Idle to run the tutorial python scripts. You can enter the following line at the command prompt to start Idle or you can put this in a file called “idle.bat” and save it to the root of your python environment – C:\Users\xxx\pyenv\spacy\idle.bat

python -m idlelib.idle

Now you can simply enter “idle” to start an “Idle” session that is running in the “spacy” python environment.

Step 5 – Install python libraries

There are two python libraries we need to display sentence diagrams and the text highlights with entity types.

	pip install svglib
	pip install tkinterweb

  • svglib is used to read and write the image file generated by displaCy for the sentence diagram
  • tkinterweb is used to display the HTML generated by displaCy that formats the text with entity types
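
To confirm the libraries landed in the active environment, you can ask Python whether it can find them. This is a rough sketch: the helper find_missing is hypothetical, and I am assuming the import names match the pip package names (svglib and tkinterweb).

```python
from importlib.util import find_spec

def find_missing(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]

# Assumption: the import names are the same as the pip package names.
missing = find_missing(["svglib", "tkinterweb"])
print("Missing libraries:", missing or "none")
```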

Step 6 – Install spaCy

spaCy is the python library that does all the heavy lifting. It is installed with a simple pip command.

pip install spacy
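
Once pip finishes, a quick sanity check is to look up the installed version. The sketch below uses the standard library importlib.metadata module (Python 3.8 or newer); installed_version is a name I made up, and it returns None when the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print("spaCy version:", installed_version("spacy"))
```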

Step 7 – Install spaCy Language Models

spaCy uses trained language “models” to analyse your text. There are many models for different languages and purposes. To see a list of the language models supported go to this link: Models and Languages. Find the language you want and click on the link in the column labeled “Packages”. This will take you to another page that shows the packages you can download for that language. In my case, I’m using the “small” English package called “en_core_web_sm”. Once you have identified the name of the package you are ready to install.

spaCy provides a download command that will download and install a language model as a package in your Python environment. Run the following command to install the model:

python -m spacy download en_core_web_sm

You can also use “pip” to install language models. For more advanced information on how to install spaCy language models go to this link: Download and Install
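
To check that the model installed correctly, try loading it and running a sentence through it. This is a minimal sketch, not the tutorial code: load_model_or_none is a hypothetical helper that returns None instead of raising an error when spaCy or the model is missing, and the sample sentence is just an illustration.

```python
def load_model_or_none(name="en_core_web_sm"):
    """Return the loaded spaCy pipeline, or None if spaCy or the model is missing."""
    try:
        import spacy
        return spacy.load(name)
    except (ImportError, OSError):
        # ImportError: spaCy is not installed; OSError: the model was not found.
        return None

nlp = load_model_or_none()
if nlp is None:
    print("spaCy or the en_core_web_sm model is not installed in this environment.")
else:
    doc = nlp("Apple is looking at buying a U.K. startup.")
    # Print each token with its part-of-speech tag and dependency label.
    for token in doc:
        print(token.text, token.pos_, token.dep_)
    # Print the named entities the model found.
    for ent in doc.ents:
        print(ent.text, ent.label_)
```

If you see tokens and entities printed, the environment is ready for the rest of the tutorial.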

Conclusion

That’s it! This setup will allow you to begin experimenting with spaCy for natural language processing (NLP). In the following post we will present the Python code used to accomplish basic tasks in spaCy.

The Complete App

If you want to go straight to the full solution then check out this complete python application.

This post describes how to set up the Python environment for spaCy and the spaCy NLP Workbench.

And Finally…

That’s it for setting up the Python environment and installing the NLP Workbench code. Check out the detail posts if you want to walk through how the Python code works.
