Note: this batch file was developed in collaboration with Dieter Keßler.
Much of the time, we will want to tag more than a single file. In a real-life corpus linguistic annotation scenario, we are more likely to want to tag a whole directory of files, e.g. a directory of .txt files. The TreeTagger batch files are not designed to do that: they expect a single input file. After some thought, I have opted to run the TreeTagger annotation batch file from another batch file that takes care of processing a directory of data. I will demonstrate this based on the English TreeTagger batch file called tag-english.bat.
We start off by creating a new batch file in a code-aware editor such as VS Code. The batch file should look as follows. Note that you have to adapt the following paths:
INPUT_DIR –> location of the plain text files to be annotated
OUTPUT_DIR –> location to which your output files will be written
TREETAGGER_BAT –> location of the batch file for your language within your TreeTagger\bin directory (e.g. tag-english.bat)
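This script assumes that your input files are plain text .txt files. If they have a different file name extension, adapt “*.txt” in line 16 (the for /f line) to the extension of your files. If you want your output files to have an extension other than .tagged, change this value in line 19 (the call line); the echo line just above it prints the same file name and can be adapted to match. Because dir /s also searches the subdirectories of INPUT_DIR while the output name is built from the file name alone, input files with the same name in different subdirectories will overwrite each other's output.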
@echo off
setlocal
echo starting

rem Adapt these three paths to your own setup
set INPUT_DIR=C:\MY-PROJECT\INPUT-DATA-DIRECTORY
set OUTPUT_DIR=C:\MY-PROJECT\OUTPUT-DIRECTORY
set TREETAGGER_BAT=C:\TreeTagger\bin\tag-english.bat

rem Make sure the output directory exists; create it if it does not
if not exist "%OUTPUT_DIR%" (
    mkdir "%OUTPUT_DIR%"
    echo created output dir
)
rem Tag every .txt file in INPUT_DIR and its subdirectories; "delims=" keeps paths with spaces intact
for /f "delims=" %%G in ('dir /b /s "%INPUT_DIR%\*.txt"') do (
    echo Found %%G
    echo Writing "%OUTPUT_DIR%\%%~nG.tagged"
    call "%TREETAGGER_BAT%" "%%G" "%OUTPUT_DIR%\%%~nG.tagged"
)
endlocal
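If you want to check in advance which output file each input file will be mapped to, or you prefer Python to batch files, the same loop can be sketched in Python. This is a minimal sketch, not part of the original script: the function names plan_jobs, find_collisions and run_jobs are hypothetical, and run_jobs assumes Windows, where subprocess can start a .bat file directly. It mirrors the batch file: rglob() searches subdirectories like dir /b /s, and Path.stem drops the extension like %%~nG.

```python
from collections import Counter
from pathlib import Path
import subprocess


def plan_jobs(input_dir, output_dir, pattern="*.txt", out_ext=".tagged"):
    """Pair every matching input file with the output path the batch loop would use.

    rglob() also descends into subdirectories (like `dir /b /s`), and
    Path.stem is the file name without its extension (like %%~nG).
    """
    output_dir = Path(output_dir)
    return [
        (src, output_dir / (src.stem + out_ext))
        for src in sorted(Path(input_dir).rglob(pattern))
        if src.is_file()
    ]


def find_collisions(jobs):
    """Return output paths that more than one input file would be written to."""
    counts = Counter(dst for _, dst in jobs)
    return sorted(dst for dst, n in counts.items() if n > 1)


def run_jobs(jobs, treetagger_bat):
    """Call the per-language TreeTagger batch file once per input file.

    Assumption: runs on Windows, where a .bat file can be started directly
    by subprocess, with the same <input> <output> arguments as the batch loop.
    """
    for src, dst in jobs:
        dst.parent.mkdir(parents=True, exist_ok=True)  # like "if not exist ... mkdir"
        print(f"Found {src}")
        subprocess.run([str(treetagger_bat), str(src), str(dst)], check=True)
```

Calling plan_jobs with your INPUT_DIR and OUTPUT_DIR and then find_collisions on the result lists any output files that would be overwritten before anything is tagged; run_jobs then takes the same job list plus the path to tag-english.bat (or another language's batch file).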