[tutorial status: work in progress: extension - 04.2022]
This small example illustrates how the Stanford Named Entity Recognizer (NER) can be driven from Python 3:
# Stanford NER 3.9.2 stand-alone version
# classifier: english.muc.7class.distsim.crf.ser.gz
import os
from nltk.tokenize import word_tokenize
from nltk.tag.stanford import StanfordNERTagger

# Point NLTK to the local Java installation (NLTK reads the JAVAHOME variable)
java_path = "C:/Program Files/Java/jdk1.8.0_192/bin/java.exe"
os.environ['JAVAHOME'] = java_path

# Paths to the pre-trained 7-class model and the Stanford NER jar
model = "C:/Users/Public/utility/stanford-ner-2018-10-16/classifiers/english.muc.7class.distsim.crf.ser.gz"
jar = "C:/Users/Public/utility/stanford-ner-2018-10-16/stanford-ner-3.9.2.jar"
ner_tagger = StanfordNERTagger(model, jar, encoding="utf-8")

# Read and tokenize the sample text, then tag every token
text = open("C:/Users/Public/projects/python101-2018/data/sample-text.txt", encoding="utf-8").read()
words = word_tokenize(text)
classified_words = ner_tagger.tag(words)
print(classified_words)

# Print one token_LABEL pair per line
for x, y in classified_words:
    print(x + "_" + y)
Note that the final for-loop converts the original list of (token, label) tuples (classified_words) into a vertical list: one token with its NER label per line.
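A natural next step is to merge consecutive tokens that carry the same label into multi-token entities. The sketch below is only illustrative: the classified list stands in for the tagger's output (which has the same list-of-tuples shape), since running the Java tagger itself requires the model files above.

```python
from itertools import groupby

# Stand-in for ner_tagger.tag(words): a list of (token, label) tuples.
classified = [
    ("Barack", "PERSON"), ("Obama", "PERSON"),
    ("visited", "O"),
    ("New", "LOCATION"), ("York", "LOCATION"), (".", "O"),
]

# Group adjacent tokens by label and keep only the named-entity spans
# (the label "O" marks tokens outside any entity).
entities = [
    (" ".join(tok for tok, _ in group), label)
    for label, group in groupby(classified, key=lambda pair: pair[1])
    if label != "O"
]
print(entities)  # [('Barack Obama', 'PERSON'), ('New York', 'LOCATION')]
```

Note that this simple grouping joins two distinct entities of the same type if they happen to be directly adjacent in the token stream; for most running text this is rare enough to be acceptable.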
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International