Differences

This shows you the differences between two versions of the page.

linguisticsweb:tutorials:linguistics_tutorials:automaticannotation:stanford_pos_tagger_python [2019/03/07 18:32] sabinebartsch
linguisticsweb:tutorials:linguistics_tutorials:automaticannotation:stanford_pos_tagger_python [2020/03/01 13:04] (current) sabinebartsch [Running the local Stanford PoS Tagger on a directory of files]
Line 8: Line 8:
 While we will often be running an annotation tool in a stand-alone fashion directly from the command line, there are many scenarios in which we would like to integrate an automatic annotation tool in a larger workflow, for example with the aim of running pre-processing and annotation steps as well as analyses in one go. In this tutorial, we will be running the [[linguisticsweb:tutorials:linguistics_tutorials:automaticannotation:stanford_pos_tagger|Stanford PoS Tagger]] from a Python script.
  
-The Stanford PoS Tagger is itself written in Java, so can be easily integrated in and called from Java programs. However, many linguists will want rather stick with Python as their programming language, especially when they are using other Python packages such as NLTK as part of their workflow. And while the Stanford PoS Tagger is not written in Python, it can nevertheless be more or less seamlessly integrated into Python programs. In this tutorial, we will be looking at two principal ways of driving the Stanford PoS Tagger from Python and show how this can be done with singular and multiple files in a directory.
+The Stanford PoS Tagger is itself written in Java, so can be easily integrated in and called from Java programs. However, many linguists will rather want to stick with Python as their preferred programming language, especially when they are using other Python packages such as NLTK as part of their workflow. And while the Stanford PoS Tagger is not written in Python, it can nevertheless be more or less seamlessly integrated into Python programs. In this tutorial, we will be looking at two principal ways of driving the Stanford PoS Tagger from Python and show how this can be done with single files and with multiple files in a directory.
  
 ===== Running the Stanford PoS Tagger in NLTK =====
Line 174: Line 174:
  
 Please note down the name of the directory to which you have unpacked the Stanford PoS Tagger as well as the subdirectory in which the tagging models are located. Also write down (or copy) the name of the directory in which the file(s) you would like to part of speech tag is located.
-As we will be writing output of the two subprocesses of tokenization and tagging to files in your file system, you have to create these output directories in your file system and again write down or copy the locations for further use. In this example these directories are called:
+As we will be writing output of the two subprocesses of tokenization and tagging to files in your file system, you have to create these output directories in your file system and again write down or copy the locations to your clipboard for further use. In this example these directories are called:
  
 <sxh bash; gutter:​false>​ <sxh bash; gutter:​false>​