# What is spaCy?
spaCy is an open-source Python library for natural language processing (NLP) tasks.
It is a powerful, pipeline-oriented alternative to the older NLTK library.
# Init
`spacy.load` loads a complete trained pipeline by name and returns a `Language` object, conventionally called `nlp`.
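A minimal sketch of the loading step. The model name `en_core_web_sm` is an assumption (the text does not name one), and the fallback to a blank English pipeline is only there so the snippet still runs when that model is not installed.

```python
import spacy

MODEL_NAME = "en_core_web_sm"  # assumption: the source does not name a model

try:
    # Loads the tokenizer plus all trained components of the pipeline.
    nlp = spacy.load(MODEL_NAME)
except OSError:
    # Raised when the model package is missing; fall back to a
    # tokenizer-only English pipeline so the sketch still runs.
    nlp = spacy.blank("en")

print(nlp.pipe_names)  # component names in processing order (empty for a blank pipeline)

doc = nlp("spaCy turns raw text into a Doc object.")
print(len(doc), "tokens")
```

Calling `nlp` on a string runs every component in `nlp.pipe_names` and returns a `Doc`.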
# Tokenization
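Tokenization splits raw text into `Token` objects. The sketch below uses `spacy.blank("en")`, which contains only the rule-based tokenizer, so it should run without a downloaded model; the example sentence is made up for illustration.

```python
import spacy

# A blank pipeline contains only the tokenizer, which is all we need here.
nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying a startup.")

# Iterating over a Doc yields Token objects; .text is the raw token string.
tokens = [token.text for token in doc]
print(tokens)
```

Note that the final period becomes its own token rather than staying attached to "startup".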
# Removing Stopwords
Stopwords are words that appear very often but are not necessary for the meaning of a sentence.
To find stopwords, use the `.is_stop` property of a token.
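As a rough illustration, the following filters stopwords out of a short, made-up sentence. The stop-word flags come from spaCy's built-in English language data, so a blank pipeline is assumed to be sufficient.

```python
import spacy

# Stop-word flags are lexical attributes, so no trained model is required.
nlp = spacy.blank("en")
doc = nlp("the cat sat on the mat")

stopwords = [token.text for token in doc if token.is_stop]
content_words = [token.text for token in doc if not token.is_stop]

print("stopwords:    ", stopwords)
print("content words:", content_words)
```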
# Part Of Speech Tagging - POS
POS tagging assigns each word its word class (part of speech).
For example: verb, adjective, noun, …
In spaCy, every token exposes its word class through the `pos_` property.
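The sketch below reads `pos_` from each token. To keep it runnable without a downloaded model, the tags are set by hand when constructing the `Doc`; with a trained pipeline, calling `nlp(text)` would fill them in instead. The sentence and its tags are illustrative assumptions.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-annotated Doc (assumption for illustration); a trained pipeline
# would predict these tags when called as nlp("Anna writes clean code.").
words = ["Anna", "writes", "clean", "code", "."]
pos = ["PROPN", "VERB", "ADJ", "NOUN", "PUNCT"]
doc = Doc(nlp.vocab, words=words, pos=pos)

# pos_ is the coarse-grained Universal POS tag as a string.
tagged = [(token.text, token.pos_) for token in doc]
for text, tag in tagged:
    print(f"{text:<8}{tag}")
```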
…
Exercise: Find the nouns and proper nouns in JobDescriptions
Exercise: Count the different word classes
Exercise: Find the most frequently used word
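One possible way to approach the three exercises, sketched on a tiny hand-annotated text. The text is a hypothetical stand-in for a single job description, and ignoring stopwords and punctuation when counting words is a design choice, not something the exercise prescribes.

```python
from collections import Counter

import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hypothetical stand-in for one job description, tagged by hand.
words = ["Acme", "hires", "a", "Python", "developer", ".",
         "The", "developer", "tests", "code", "."]
pos = ["PROPN", "VERB", "DET", "PROPN", "NOUN", "PUNCT",
       "DET", "NOUN", "VERB", "NOUN", "PUNCT"]
doc = Doc(nlp.vocab, words=words, pos=pos)

# Exercise 1: nouns and proper nouns.
nouns = [t.text for t in doc if t.pos_ in ("NOUN", "PROPN")]

# Exercise 2: how often each word class occurs.
pos_counts = Counter(t.pos_ for t in doc)

# Exercise 3: most frequent word, skipping stopwords and punctuation.
word_counts = Counter(
    t.text.lower() for t in doc if not t.is_stop and not t.is_punct
)
most_common_word, frequency = word_counts.most_common(1)[0]

print(nouns)
print(pos_counts)
print(most_common_word, frequency)
```

For a real collection of job descriptions, the same counters could be updated in a loop over `nlp.pipe(texts)`.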
# Named Entity Recognition - NER
Named entities are available through the `.ents` property of a `Doc`.
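A small sketch of reading `doc.ents`. The entity annotations are supplied by hand as IOB tags so that no trained model is required; with a trained pipeline, `nlp(text)` would predict them. Sentence and labels are illustrative assumptions.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-annotated entities in IOB format (B = begin, I = inside, O = outside).
words = ["Ada", "Lovelace", "worked", "in", "London", "."]
ents = ["B-PERSON", "I-PERSON", "O", "O", "B-GPE", "O"]
doc = Doc(nlp.vocab, words=words, ents=ents)

# Each entity is a Span with its text and a string label.
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```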
Entities can be highlighted with `displacy`, spaCy's built-in visualizer.
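The following sketch renders entity highlights as an HTML string. Passing `jupyter=False` is a choice made here so the markup is returned instead of being displayed inline in a notebook; the hand-annotated sentence is again an assumption for illustration.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

words = ["Ada", "Lovelace", "worked", "in", "London", "."]
ents = ["B-PERSON", "I-PERSON", "O", "O", "B-GPE", "O"]
doc = Doc(nlp.vocab, words=words, ents=ents)

# style="ent" selects the entity visualizer; jupyter=False returns the markup.
html = displacy.render(doc, style="ent", jupyter=False)
print(html[:200])
```

Inside a Jupyter notebook, omitting `jupyter=False` should display the highlighted text directly.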
Exercise: Find all people in the text.
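One way to solve this is to filter `doc.ents` by label. The label `PERSON` is what spaCy's English models use; pipelines for other languages may use a different label such as `PER`, so treat the constant below as an assumption. The text is made up and annotated by hand.

```python
import spacy
from spacy.tokens import Doc

PERSON_LABEL = "PERSON"  # assumption: label scheme of the English models

nlp = spacy.blank("en")

words = ["Ada", "Lovelace", "and", "Charles", "Babbage", "met", "in", "London", "."]
ents = ["B-PERSON", "I-PERSON", "O", "B-PERSON", "I-PERSON", "O", "O", "B-GPE", "O"]
doc = Doc(nlp.vocab, words=words, ents=ents)

# Keep only the entities labelled as people.
people = [ent.text for ent in doc.ents if ent.label_ == PERSON_LABEL]
print(people)
```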
The `Token` class also exposes entity information, for example through `ent_iob_` and `ent_type_`.
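A brief sketch of the token-level view, using a hand-annotated `Doc` as an illustrative assumption. `ent_iob_` says whether a token begins (`B`), continues (`I`), or lies outside (`O`) an entity, and `ent_type_` holds the entity label or an empty string.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

words = ["Ada", "Lovelace", "worked", "in", "London", "."]
ents = ["B-PERSON", "I-PERSON", "O", "O", "B-GPE", "O"]
doc = Doc(nlp.vocab, words=words, ents=ents)

# One row per token: text, IOB position, entity label ("" if none).
token_info = [(t.text, t.ent_iob_, t.ent_type_) for t in doc]
for text, iob, ent_type in token_info:
    print(f"{text:<10}{iob:<3}{ent_type}")
```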