Differences

This shows you the differences between two versions of the page.

linguisticsweb:tutorials:linguistics_tutorials:automaticannotation:stanford_word_segmenter [2017/06/09 12:31]
sabinebartsch created
linguisticsweb:tutorials:linguistics_tutorials:automaticannotation:stanford_word_segmenter [2019/05/15 11:24]
sabinebartsch [Download and extraction]
Line 7: Line 7:
 Download the Stanford Word Segmenter from the [[https://nlp.stanford.edu/software/segmenter.shtml|Stanford NLP software website]] and unpack the zip file to a location of your choice.
  
-It does not require any installation proper, but needs Java version 1.6 or upwards to be installed on your machine as a Java JDK (Java Development Kit).
+It does not require any installation proper, but needs Java version 1.6 or upwards to be installed on your machine as a [[linguisticsweb:tutorials:linguistics_tutorials:basics:environment:java|Java JDK (Java Development Kit) or Open JDK]].
  
 ===== Running the software =====
Line 25: Line 25:
 Now open this file in a good UTF-8 aware text editor (Notepad++ or Sublime will do fine).
  
-Chances are the output will still not look right. This has a few reasons.
+===== Shell configuration issues =====
+
+Chances are the output will still not look right. There are a few potential reasons for this and thus a few things worth checking.
 First of all, check the encoding of the input file test.simple.utf8; it will be UTF-8.
-Next, check the encoding of your ouput file; it will be UTF-16.
+Next, check the encoding of your output file; it will be UTF-16.
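
One way to run the two encoding checks described here without relying on an editor is a small script that inspects the file's byte-order mark. This is only a minimal sketch and not part of the original tutorial: the function name `sniff_encoding` is hypothetical, and the approach only recognises files that start with a BOM or that decode cleanly as UTF-8.

```python
import codecs


def sniff_encoding(path):
    """Guess a text file's encoding from its byte-order mark (BOM).

    Hypothetical helper for illustration; it is not part of the
    Stanford Word Segmenter or of the tutorial itself.
    """
    with open(path, "rb") as fh:
        data = fh.read()

    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # UTF-32 must be tested before UTF-16: the UTF-32 LE BOM
    # begins with the same two bytes as the UTF-16 LE BOM.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # No BOM found: accept the file as UTF-8 only if it decodes cleanly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

Calling `sniff_encoding("test.simple.utf8")` on the input file and then on your own output file should confirm (or contradict) the UTF-8 and UTF-16 findings stated above. A UTF-16 file written without a BOM would be reported as `"unknown"` by this sketch.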
  
-So why doesn't it look right. Well, it looks like (I have to verify this) your shell is interfering with the output.
+So why does it still not look right? Well, it looks like (I have to verify this) your shell is interfering with the output.
 It does not look right in the shell window itself, probably because you have no Chinese fonts installed and your fonts cannot display Chinese characters. Next, the shell might simply not 'know' that you are processing UTF-8, so we need to make sure it does.
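
If the output file does turn out to be UTF-16, one possible workaround is to convert it to UTF-8 after the fact so that it opens cleanly in a UTF-8 aware editor. The sketch below is an assumption of this write-up, not the tutorial's own fix: the function name `transcode` and the file names in the usage note are hypothetical. It also prints the encoding Python believes your shell's standard output uses, which can hint at whether the shell is set up for UTF-8.

```python
import sys


def transcode(src_path, dst_path, src_encoding="utf-16", dst_encoding="utf-8"):
    """Re-encode a text file, by default from UTF-16 to UTF-8.

    Hypothetical helper for illustration. Returns the number of
    characters copied.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
    return len(text)


if __name__ == "__main__":
    # What Python reports for the current shell; a value other than
    # a UTF-8 variant suggests the shell is not configured for UTF-8.
    print("stdout encoding reported by Python:", sys.stdout.encoding)
```

For example, `transcode("segmented_output.txt", "segmented_output.utf8.txt")` would write a UTF-8 copy of a (hypothetically named) UTF-16 output file. This does not address the font problem in the shell window itself.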