Lemmatization

Closely related to the identification of parts of speech in a corpus is the process of lemmatization. It reduces the inflectional variants of a word to a single base form, the lemma, which stands for the lexeme to which all those variants belong. Lemmatization is commonly added as a layer of corpus annotation in digital lexicography and in vocabulary studies, where forms such as "doing", "done", and "does" are all reduced to their lemma "do".
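Dictionary-based lemmatization can be sketched as a lookup from a word form and its part-of-speech tag to a lemma. The sketch below assumes a tiny hand-built lexicon. The LEXICON table and the lemmatize and lemmatize_tagged functions are hypothetical illustrations and are not taken from any particular tool. The part-of-speech tag matters because one form can belong to different lexemes. For example, "does" is a form of the verb "do", but it is also the plural of the noun "doe".

```python
# A toy, dictionary-based lemmatizer (illustrative sketch only).
# LEXICON is a hypothetical hand-made sample; real systems rely on a full
# morphological lexicon and on part-of-speech tags to resolve ambiguous forms.
from typing import Dict, List, Optional, Tuple

# Maps (lowercased word form, coarse POS tag) to its lemma.
LEXICON: Dict[Tuple[str, str], str] = {
    ("doing", "VERB"): "do",
    ("done", "VERB"): "do",
    ("does", "VERB"): "do",
    ("did", "VERB"): "do",
    ("does", "NOUN"): "doe",  # plural of "doe" (female deer)
    ("is", "AUX"): "be",
    ("was", "AUX"): "be",
    ("mice", "NOUN"): "mouse",
}


def lemmatize(word: str, pos: Optional[str] = None) -> str:
    """Return the lemma of `word`, or its lowercased form if it is unknown."""
    form = word.lower()
    if pos is not None:
        return LEXICON.get((form, pos), form)
    # Without a POS tag, take the first matching entry in insertion order,
    # which can pick the wrong lexeme for ambiguous forms such as "does".
    for (entry_form, _tag), lemma in LEXICON.items():
        if entry_form == form:
            return lemma
    return form


def lemmatize_tagged(tokens: List[Tuple[str, str]]) -> List[str]:
    """Lemmatize a sequence of (word, POS tag) pairs."""
    return [lemmatize(word, pos) for word, pos in tokens]


tagged = [("She", "PRON"), ("does", "VERB"), ("what", "PRON"),
          ("is", "AUX"), ("done", "VERB")]
print(lemmatize_tagged(tagged))  # → ['she', 'do', 'what', 'be', 'do']
```

Passing ("does", "NOUN") instead yields "doe". This is why lemmatization is usually run after part-of-speech tagging rather than on bare word forms.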