The concordancing software AntConc can be used on any plain-text corpus. This includes corpora with annotations in standard structured formats, such as:
The_DT old_JJ house_NN ._.
<div level="1"><head> <s n="2"><mw c5="AV0"><w c5="PRP" hw="at" pos="PREP">At </w><w c5="ORD" hw="last" pos="ADJ">last </w></mw><w c5="PNP" hw="it" pos="PRON">it</w><w c5="VBZ" hw="be" pos="VERB">'s </w><w c5="AV0" hw="here" pos="ADV">here</w>
(source: <bncDoc xml:id="C8K">)
To display and query this kind of data, you need to be familiar with its structure. So let's take a look at some data that has been PoS-tagged with the Stanford PoS Tagger. The tagger can automatically annotate any plain-text corpus in a number of languages, including English, German, French and Spanish, with part-of-speech tags in a format that looks like this:
token | delimiter | pos tag |
Linguistics | _ | NNP |
is | _ | VBZ |
the | _ | DT |
scientific | _ | JJ |
study | _ | NN |
of | _ | IN |
natural | _ | JJ |
language | _ | NN |
. | _ | . |
In the output file, this is represented as a so-called 'in-line' annotation, in which each token is joined directly to its tag by the delimiter:
Linguistics_NNP is_VBZ the_DT scientific_JJ study_NN of_IN natural_JJ language_NN ._.
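To make the token, delimiter and tag structure concrete, here is a minimal Python sketch that splits such a line back into its parts. The function name parse_inline is only an illustration and is not part of AntConc or the Stanford PoS Tagger; the sketch assumes that items are separated by whitespace and that the last underscore in each item is the delimiter.

```python
def parse_inline(line, delimiter="_"):
    """Split whitespace-separated 'token_TAG' items into (token, tag) pairs."""
    pairs = []
    for item in line.split():
        # Split on the LAST delimiter so tokens that contain "_" stay intact.
        token, sep, tag = item.rpartition(delimiter)
        if not sep:
            # No delimiter found: keep the item as an untagged token.
            token, tag = item, None
        pairs.append((token, tag))
    return pairs


tagged = ("Linguistics_NNP is_VBZ the_DT scientific_JJ study_NN "
          "of_IN natural_JJ language_NN ._.")
pairs = parse_inline(tagged)
print(pairs[:3])  # [('Linguistics', 'NNP'), ('is', 'VBZ'), ('the', 'DT')]
```

Splitting on the last delimiter rather than the first matters for items such as the final full stop, where the token and the tag are both ".".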
This format can be opened and queried in AntConc (3.5.9), but some settings have to be adjusted first so that the software detects the internal structure of the data.
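Knowing the structure is what makes tag-aware queries possible. As a rough illustration in plain Python (this is not AntConc's own settings or search syntax, which may differ), a regular expression can pick out every adjective that is directly followed by a singular common noun in the example line. The pattern assumes the underscore delimiter and the tags JJ and NN shown above.

```python
import re

tagged = ("Linguistics_NNP is_VBZ the_DT scientific_JJ study_NN "
          "of_IN natural_JJ language_NN ._.")

# (\S+)_JJ  : any token tagged as adjective
# (\S+)_NN\b: any token tagged NN; \b keeps NNP or NNS from matching
adj_noun = re.compile(r"(\S+)_JJ (\S+)_NN\b")

matches = adj_noun.findall(tagged)
print(matches)  # [('scientific', 'study'), ('natural', 'language')]
```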
[IN PROGRESS!]
XML annotated data:
<wtext type="OTHERPUB"><p type="caption"> <s n="1"><w c5="NN1" hw="number" pos="SUBST">Number </w><w c5="CRD" hw="133" pos="ADJ">133</w></s></p><div level="1"><head> <s n="2"><mw c5="AV0"><w c5="PRP" hw="at" pos="PREP">At </w><w c5="ORD" hw="last" pos="ADJ">last </w></mw><w c5="PNP" hw="it" pos="PRON">it</w><w c5="VBZ" hw="be" pos="VERB">'s </w><w c5="AV0" hw="here" pos="ADV">here</w><c c5="PUN">!</c></s></head><p>
(source: <bncDoc xml:id="C8K">)
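To show how this XML annotation is organised, the following sketch reads one complete sentence element from the excerpt with Python's standard xml.etree.ElementTree module and lists each word with its headword (hw), C5 tag (c5) and simplified word class (pos). The excerpt as a whole is only a fragment, so the sketch uses just the <s n="2"> element, written with straight quotation marks, which an XML parser requires. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# One complete <s> element taken from the excerpt above.
sentence = (
    '<s n="2"><mw c5="AV0"><w c5="PRP" hw="at" pos="PREP">At </w>'
    '<w c5="ORD" hw="last" pos="ADJ">last </w></mw>'
    '<w c5="PNP" hw="it" pos="PRON">it</w>'
    '<w c5="VBZ" hw="be" pos="VERB">\'s </w>'
    '<w c5="AV0" hw="here" pos="ADV">here</w><c c5="PUN">!</c></s>'
)

root = ET.fromstring(sentence)

# iter("w") walks all <w> elements in document order, including those
# nested inside the multi-word unit <mw>.
rows = [(w.text.strip(), w.get("hw"), w.get("c5"), w.get("pos"))
        for w in root.iter("w")]

for form, headword, c5, pos in rows:
    print(form, headword, c5, pos)
```

Unlike the in-line format, the annotation here sits in attributes rather than in the running text, so the word form, headword and tags can each be read separately.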