Differences

This shows you the differences between two versions of the page.

linguisticsweb:tutorials:linguistics_tutorials:semi-automatic_annotation:wordfreak_and_opennlp [2019/04/03 17:24]
sabinebartsch [1 What are WordFreak and the OpenNLP Tools?]
linguisticsweb:tutorials:linguistics_tutorials:semi-automatic_annotation:wordfreak_and_opennlp [2019/05/15 15:39] (current)
sabinebartsch [4.4 The Executable Batch-file]
Line 30: Line 30:
 ====  4.2 Copy & Paste The Tools Into The Corresponding Directories ====
 
-Save both tools into a new folder on your computer (preferably not on the Desktop[[#EndNote1|<sup>1</sup>]]), called **WordFreak and OpenNLP**. Extract the zip-file of the OpenNLP tools to a new folder and rename it to **OpenNLP** [[#EndNote2|<sup>2</sup>]]. Copy the java file of the WordFreak tool into a new folder within **WordFreak and OpenNLP** and rename the folder to **WordFreak**. Create a new folder called ''**plugins**'' in the WordFreak directory. Move all the plug-in files you downloaded from the WordFreak website into this folder.
+Save both tools into a new folder on your computer (preferably not on the Desktop((Many tools in CCL do not respond well to being installed on the Desktop, because the Desktop is only a virtual link and not a 'real location' in the file system, which is not what most tools expect as their location.))), called **WordFreak and OpenNLP**. Extract the zip-file of the OpenNLP tools to a new folder and rename it to **OpenNLP**((Throughout these instructions we use these names as examples. Of course these names can be changed according to your preference, but remember to change the folder names in the executable script file later as well.)). Copy the java file of the WordFreak tool into a new folder within **WordFreak and OpenNLP** and rename the folder to **WordFreak**. Create a new folder called ''**plugins**'' in the WordFreak directory. Move all the plug-in files you downloaded from the WordFreak website into this folder.
 
 By now your folder structure should look like this and contain the following files:
Line 49: Line 49:
 ====  4.4 The Executable Batch-file ====
 
-In order to be able to use the OpenNLP tools with the WordFreak environment you need to create a batch-file. Batch-files are frequently used to execute applications under Windows. First, create an empty file in the main directory <em>&ldquo;</em>WordFreak<em>_</em>OpenNLP<em>&rdquo;</em> and name it<em> &ldquo;</em>WordFreak<em>.bat&rdquo;.</em> Open this empty file with an editor like Notepad++ and copy and paste the content of figure 1 into the file[[#EndNote3][<sup>3</sup>]] and save it:
+In order to be able to use the OpenNLP tools with the WordFreak environment you need to create a batch-file. Batch-files are frequently used to execute applications under Windows. First, create an empty file in the main directory WordFreak_OpenNLP and name it WordFreak.bat. Open this empty file with an editor like Notepad++ and copy and paste the content of figure 1 into the file((Be aware that the classpath must not contain any line breaks. The only line break belongs directly before ''java WordFreak -d …''.)) and save it:
 
-''set CLASSPATH=%~dp0wordfreak\wordfreak-2.2.jar;
+<code java>
+set CLASSPATH=%~dp0wordfreak\wordfreak-2.2.jar;
 %~dp0wordfreak\plugins\opennlp-wordfreak-1.1.jar;
 %~dp0wordfreak\plugins\opennlp-tools-1.4.3.jar;
 %~dp0wordfreak\plugins\maxent-2.5.2.jar;
 %~dp0wordfreak\plugins\trove.jar;
+</code>
 
-java  wordfreak -d %~dp0opennlp\models\english''
+<code batch>
+java  wordfreak -d %~dp0opennlp\models\english
+</code>
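Because the classpath has to sit on a single line, it can be easier to generate the batch-file text than to paste it by hand. The following is only a minimal sketch in Python, not part of the original tutorial: the helper name ''batch_file_text'' is hypothetical, and the jar names are simply copied from the listing above, so adjust them if your downloaded versions differ.

```python
# Hypothetical helper (not part of WordFreak or OpenNLP): builds the text of
# WordFreak.bat with the whole classpath on ONE line, as the tutorial requires.

# Jar paths copied from the listing above, relative to the batch file's folder.
JARS = [
    r"wordfreak\wordfreak-2.2.jar",
    r"wordfreak\plugins\opennlp-wordfreak-1.1.jar",
    r"wordfreak\plugins\opennlp-tools-1.4.3.jar",
    r"wordfreak\plugins\maxent-2.5.2.jar",
    r"wordfreak\plugins\trove.jar",
]


def batch_file_text(jars=JARS, models=r"opennlp\models\english"):
    """Return the two-line batch script: one classpath line, one java line."""
    # %~dp0 is expanded by Windows to the drive and path of the batch file
    # itself, which is why the location of the main folder does not matter.
    classpath = ";".join("%~dp0" + jar for jar in jars)
    return f"set CLASSPATH={classpath}\njava wordfreak -d %~dp0{models}\n"


if __name__ == "__main__":
    # Print the script; redirect the output into WordFreak.bat to save it.
    print(batch_file_text(), end="")
```

The only line break in the generated text is the one before the ''java'' command, which matches the requirement stated in the footnote.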
  
 Be aware that the batch-script will not work if you change the names of the folders inside WordFreak_OpenNLP. You should also check whether the names of the files you downloaded have changed, for example due to updated versions. If they have, either adapt the script to your directories and files or rename the folders and files as stated in the batch-file. The location of the WordFreak_OpenNLP folder itself, however, has no impact on the execution of the batch-script, as the script only looks for folders inside its containing directory.
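Since a renamed folder or a newer jar version is the most likely reason for the batch-file to fail, a quick check of the expected names can save some searching. The sketch below is an illustration only: ''missing_entries'' is a hypothetical helper, and the list of expected paths is an assumption taken from the batch-file shown in this section.

```python
from pathlib import Path

# Paths the batch file refers to, relative to the main folder (assumed to
# match the listing in this section; edit them if your versions differ).
EXPECTED = [
    "wordfreak/wordfreak-2.2.jar",
    "wordfreak/plugins/opennlp-wordfreak-1.1.jar",
    "wordfreak/plugins/opennlp-tools-1.4.3.jar",
    "wordfreak/plugins/maxent-2.5.2.jar",
    "wordfreak/plugins/trove.jar",
    "opennlp/models/english",
]


def missing_entries(base, expected=EXPECTED):
    """Return the expected files or folders that do not exist under base."""
    base = Path(base)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    # Run from inside the main folder; an empty list means all names match.
    for rel in missing_entries(Path.cwd()):
        print("missing:", rel)
```

Every path that is reported as missing either needs to be renamed on disk or changed in the batch-file.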
Line 76: Line 80:
 To start the WordFreak framework together with the implemented OpenNLP tools you need to execute the “WordFreak.bat” file by double clicking on it. The WordFreak main window opens:
 
+''
 <p align="center"><img width="461" alt="WordFreak" src="http://linglit194.linglit.tu-darmstadt.de/linguisticsweb/pub/LinguisticsWeb/WordFreak-OpenNLP/mainWindow-wordfreak.jpg" height="361" border="0" /></p>
+''
 
+''
 <p align="center"> __Screenshot 1: The WordFreak main window__ </p>
+''
 ====  5.3 Text Processing ====
  
-After starting, WordFreak creates an untitled project, which is not saved anywhere by default. Be sure to save your project every now and then to prevent data loss. The descriptions below refer to text files (e.g. “article.txt”). If you use other file types the steps below may not be applicable.
+After starting, WordFreak creates an untitled project, which is not saved anywhere by default. Be sure to save your project every now and then to prevent data loss. The descriptions below refer to text files (e.g. ''“article.txt”''). If you use other file types the steps below may not be applicable.
 
-Before you can process a text, you have to tell WordFreak which raw text you want to work with and load it[[#EndNote4][<sup>4</sup>]].
+Before you can process a text, you have to tell WordFreak which raw text you want to work with and load it.((Created text annotations will not be saved within the raw text file, but in an extra annotation file in XML format. Thus, WordFreak will not make any changes to the original text file.))
  
  
 === 5.3.1 Add & Load A Text File ===
 
-To add and load a text file click on the <span style="background-color:#c1bdd7; border: solid 1px">Add</span>-Button and select the directory in which you have stored your text file. By default the archive-type is set to <span style="background-color:#c1bdd7; border: solid 1px">TreeBank Files</span>. To be able to select other file types, you have to choose the corresponding file type from the drop-down list. In this example you need to select <span style="background-color:#c1bdd7; border: solid 1px">Text Files</span>. Select the file in the corresponding folder and click <span style="background-color:#c1bdd7; border: solid 1px">open</span> to add the file to your untitled project. Respond to the question in the pop-up window that appears with <span style="background-color:#c1bdd7; border: solid 1px">yes</span> if you want to create an annotation file. The annotation file is created in the folder of the corresponding text file.
+To add and load a text file click on the ''<span style="background-color:#c1bdd7; border: solid 1px">Add</span>''-Button and select the directory in which you have stored your text file. By default the archive-type is set to ''<span style="background-color:#c1bdd7; border: solid 1px">TreeBank Files</span>''. To be able to select other file types, you have to choose the corresponding file type from the drop-down list. In this example you need to select ''<span style="background-color:#c1bdd7; border: solid 1px">Text Files</span>''. Select the file in the corresponding folder and click ''<span style="background-color:#c1bdd7; border: solid 1px">open</span>'' to add the file to your untitled project. Respond to the question in the pop-up window that appears with ''<span style="background-color:#c1bdd7; border: solid 1px">yes</span>'' if you want to create an annotation file. The annotation file is created in the folder of the corresponding text file.
 
-You can add other files by repeating the previous steps or by holding the <span style="background-color:#c1bdd7; border: solid 1px">Control</span>-key (<span style="background-color:#c1bdd7; border: solid 1px">STRG</span> on a German keyboard or the Apple-key) while selecting the files you want to add. After adding all the files, the application must load them and their annotation files before it can process them. In the WordFreak main window, select the text file you want to load and click the <span style="background-color:#c1bdd7; border: solid 1px">Load</span>-Button in the right panel. A green mark appears once the text and annotation files are loaded:
+You can add other files by repeating the previous steps or by holding the ''<span style="background-color:#c1bdd7; border: solid 1px">Control</span>''-key (''<span style="background-color:#c1bdd7; border: solid 1px">STRG</span>'' on a German keyboard or the Apple-key) while selecting the files you want to add. After adding all the files, the application must load them and their annotation files before it can process them. In the WordFreak main window, select the text file you want to load and click the ''<span style="background-color:#c1bdd7; border: solid 1px">Load</span>''-Button in the right panel. A green mark appears once the text and annotation files are loaded:
  
+''
 <p align="center"><img width="552" alt="WordFreak" src="http://linglit194.linglit.tu-darmstadt.de/linguisticsweb/pub/LinguisticsWeb/WordFreak-OpenNLP/LoadText-wordfreak.jpg" height="438" border="0" /></p>
+''
+''
 <p align="center"> __Screenshot 2: Load a text into WordFreak__ </p>
+''
 WordFreak cannot be used to build pipeline processes, which means we have to process each step separately to add POS-Tags to a text. Thus, before a text can be annotated, it needs to be preprocessed with a sentence detector and tokenizer.
  
 === 5.3.2 Sentence Detection & Tokenization ===
 
-At the bottom of the main window there are several drop-down menus which display frequently used options for text processing (cf. screenshot 1). As a first step we need to select a sentence detector to tell WordFreak which algorithm we want to use for the sentence detection. You can select the sentence detector <span style="background-color:#c1bdd7; border: solid 1px">Open Sentence</span> either from the Tagger menu of the main window or from the Tagger menu in the main menu bar at the top of the main window. Now start the tagging process by either selecting <span style="background-color:#c1bdd7; border: solid 1px">Tag</span> from the Tagger menu bar or pressing the icon below the main menu bar.
+At the bottom of the main window there are several drop-down menus which display frequently used options for text processing (cf. screenshot 1). As a first step we need to select a sentence detector to tell WordFreak which algorithm we want to use for the sentence detection. You can select the sentence detector ''<span style="background-color:#c1bdd7; border: solid 1px">Open Sentence</span>'' either from the Tagger menu of the main window or from the Tagger menu in the main menu bar at the top of the main window. Now start the tagging process by either selecting ''<span style="background-color:#c1bdd7; border: solid 1px">Tag</span>'' from the Tagger menu bar or pressing the icon below the main menu bar.
 
 In the next step we select the annotation from the Annotation menu (<span style="background-color:#c1bdd7; border: solid 1px">Set annotation</span> from the main menu bar) to open an alteration panel in case the tagged text needs manual correction. For the sentence detection we need the sentence annotation.
 
-This process needs to be repeated for the tokenization of the text. First select <span style="background-color:#c1bdd7; border: solid 1px">Open Token</span> from the Tagger menu and then repeat the steps above.
+This process needs to be repeated for the tokenization of the text. First select ''<span style="background-color:#c1bdd7; border: solid 1px">Open Token</span>'' from the Tagger menu and then repeat the steps above.
  
 === 5.3.3 POS-Tagging ===
 
-As described before, we select from the <span style="background-color:#c1bdd7; border: solid 1px">Tagger</span> menu the necessary attribute for the text to be processed. In this case we select <span style="background-color:#c1bdd7; border: solid 1px">Open POS</span> to apply the OpenNLP tool to our preprocessed text file. From the <span style="background-color:#c1bdd7; border: solid 1px">Annotation</span> menu we again select the OpenNLP tagger accordingly and let WordFreak do the tagging. By default WordFreak uses the Penn Treebank tag-set. If you want to use a different tag-set, you can either install it and select it from the drop-down menu or use one of the tag-sets already installed.
+As described before, we select from the ''<span style="background-color:#c1bdd7; border: solid 1px">Tagger</span>'' menu the necessary attribute for the text to be processed. In this case we select ''<span style="background-color:#c1bdd7; border: solid 1px">Open POS</span>'' to apply the OpenNLP tool to our preprocessed text file. From the ''<span style="background-color:#c1bdd7; border: solid 1px">Annotation</span>'' menu we again select the OpenNLP tagger accordingly and let WordFreak do the tagging. By default WordFreak uses the Penn Treebank tag-set. If you want to use a different tag-set, you can either install it and select it from the drop-down menu or use one of the tag-sets already installed.
 
-In the next step we check the automatically generated tags in our text. When we select <span style="background-color:#c1bdd7; border: solid 1px">TextPOS</span> from the Viewer menu, WordFreak opens a new tab in which we can see our text with the respective tags attached. In the example shown in screenshot 3, an annotation is missing for the word group “according to”. By clicking on “PRP” in the right panel we add the annotation to the previously selected word “according”. The particle “to” will be added automatically as a part of the preposition.
+In the next step we check the automatically generated tags in our text. When we select ''<span style="background-color:#c1bdd7; border: solid 1px">TextPOS</span>'' from the Viewer menu, WordFreak opens a new tab in which we can see our text with the respective tags attached. In the example shown in screenshot 3, an annotation is missing for the word group “according to”. By clicking on “PRP” in the right panel we add the annotation to the previously selected word “according”. The particle “to” will be added automatically as a part of the preposition.
 <p align="center"><img width="706" alt="WordFreak" src="http://linglit194.linglit.tu-darmstadt.de/linguisticsweb/pub/LinguisticsWeb/WordFreak-OpenNLP/Annotation-wordfreak.jpg" height="395" border="0" /></p>
  
Line 125: Line 134:
 %BIBTEX{topic="LinguisticsReferences" select="keywords : 'WordFreak'" sort="author" rev="off" errors="off"}%
  
---- 
- 
-#EndNote1 <sup>1</sup> There are some known problems concerning the execution on Windows XP machines, due to white spaces in the directory path. Be aware that this can cause the program not to run properly.
-
-#EndNote2 <sup>2</sup> Throughout these instructions we use these names as examples. Of course these names can be changed according to your preference, but be aware to change the folder names in the executable script file later too.
-
-#EndNote3 <sup>3</sup> Be aware that the classpath does not contains any linebreaks. Only the line in front of „java WordFreak –d...“ should contain a line break.
-
-#EndNote4 <sup>4</sup> Created text annotations will not be saved within the raw text file, but in an extra annotation file in XML format. Thus, WordFreak will not make any change to this text file.