You can provide input in one of three ways: type text into the text box, choose a demo text, or upload a file. Supported file formats are plain UTF-8 text (.txt) and, unless their formatting is especially convoluted, .pdf, .doc, .docx, .csv, .epub, .html, .odt, .rtf and .xls files.
The output is presented as a table, which can also be downloaded as a spreadsheet or as a TSV (tab-separated values) file.
The table has five columns. The first shows the token (a word, punctuation mark, URL, or whatever else the tokenizer considers a single token) as it appeared in the original text. The second shows the lemma, or base form, of the token. The third shows the most likely morphological tags for the token. The last two columns contain named-entity tags: the first for persons, organisations and locations, the second for everything else (mostly time expressions and events).
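If you download the TSV file and want to process it further, a minimal Python sketch for reading it is shown below. It assumes each row holds exactly the five tab-separated columns described above. The field names (`token`, `lemma`, `morph_tags`, `ne_per_org_loc`, `ne_other`), the `read_tsv` helper and the sample values are hypothetical and chosen only for illustration. Whether the file begins with a header row is not stated here, so that is left as a parameter.

```python
import csv
import io
from typing import List, NamedTuple


class TokenRow(NamedTuple):
    # Hypothetical names for the five output columns.
    token: str
    lemma: str
    morph_tags: str
    ne_per_org_loc: str  # persons, organisations, locations
    ne_other: str        # mostly time expressions and events


def read_tsv(text: str, has_header: bool = False) -> List[TokenRow]:
    """Parse downloaded TSV text into TokenRow records.

    Assumes five tab-separated columns per row. Blank lines are skipped.
    """
    # QUOTE_NONE keeps quotation marks inside tokens as literal characters.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for i, fields in enumerate(reader):
        if has_header and i == 0:
            continue  # skip the (assumed) header row
        if not fields:
            continue  # blank line
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {fields!r}")
        rows.append(TokenRow(*fields))
    return rows


# Illustrative input with made-up tag values; empty cells mean "no entity".
sample = "Paris\tParis\tTAG1\tLOC\t\n.\t.\tTAG2\t\t\n"
rows = read_tsv(sample)
entities = [r.token for r in rows if r.ne_per_org_loc]
print(entities)  # prints ['Paris']
```

Using `csv.reader` with an explicit tab delimiter, rather than splitting on `\t` by hand, keeps the parsing logic in one standard place. The column-count check makes it obvious if the downloaded file does not match the layout assumed here.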