Porttagger

A Brazilian Portuguese part-of-speech tagger that follows the Universal Dependencies (UD) tagset

Porttagger (Porttinari part-of-speech tagger) was trained on the Porttinari-base corpus, a collection of news texts extracted from the Folha de São Paulo newspaper website. The trained model is a fine-tuned version of BERTimbau that receives a sequence of tokens and outputs a part-of-speech tag for each of them. Because the model expects pre-tokenized input, spaCy's tokenizer is used to split the input text into tokens.
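The tokenize-then-tag flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the fine-tuned checkpoint can be loaded with Hugging Face Transformers as a token-classification model with a fast tokenizer, that a blank Portuguese spaCy pipeline is an acceptable stand-in for the tokenization step, and that each token takes the label predicted for its first subword. The names `first_subword_labels`, `tag_text`, and the `model_name` argument are hypothetical; the caller has to supply the real checkpoint identifier.

```python
from typing import List, Optional, Sequence, Tuple


def first_subword_labels(
    word_ids: Sequence[Optional[int]],
    subword_labels: Sequence[str],
    num_tokens: int,
) -> List[str]:
    """Keep, for each input token, the label predicted for its first subword.

    Assumption: first-subword selection is a common convention for BERT-style
    taggers; the source does not say which strategy Porttagger uses.
    """
    labels: List[Optional[str]] = [None] * num_tokens
    for word_id, label in zip(word_ids, subword_labels):
        # word_id is None for special tokens such as [CLS] and [SEP].
        if word_id is not None and labels[word_id] is None:
            labels[word_id] = label
    # Tokens that received no prediction (e.g. cut off by truncation) get a
    # "_" placeholder instead of a guessed tag.
    return [label if label is not None else "_" for label in labels]


def tag_text(text: str, model_name: str) -> List[Tuple[str, str]]:
    """Tokenize `text` with spaCy and tag the tokens with a fine-tuned model.

    `model_name` is a hypothetical parameter: pass the identifier or local
    path of the fine-tuned checkpoint.
    """
    # Imported lazily so the helper above works without these packages.
    import spacy
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    # A blank pipeline only tokenizes; no trained spaCy model is required.
    nlp = spacy.blank("pt")
    tokens = [token.text for token in nlp(text) if not token.is_space]
    if not tokens:
        return []

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    model.eval()

    # is_split_into_words tells the tokenizer the input is already tokenized.
    encoding = tokenizer(
        tokens, is_split_into_words=True, truncation=True, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**encoding).logits

    predicted_ids = logits.argmax(dim=-1)[0].tolist()
    subword_labels = [model.config.id2label[i] for i in predicted_ids]
    # word_ids() is only available on fast tokenizers.
    word_ids = encoding.word_ids(batch_index=0)
    return list(zip(tokens, first_subword_labels(word_ids, subword_labels, len(tokens))))
```

Calling `tag_text("Ela leu o jornal.", model_name=...)` with the appropriate checkpoint would return a list of `(token, tag)` pairs. The alignment helper is kept separate because BERT-style tokenizers may split one token into several subwords, so the per-subword predictions have to be mapped back onto the original tokens.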