---
pipeline_tag: text-classification
widget:
- text: "Pani Katarzyno z jakiej racji moja paczka przyszła do sąsiada zamiast do mnie? Nie można poprawnie nadać paczki?"
  example_title: "Sentiment"
license: cc-by-4.0
language: 
- pl
---

<img src="https://public.3.basecamp.com/p/rs5XqmAuF1iEuW6U7nMHcZeY/upload/download/VL-NLP-short.png" alt="logo voicelab nlp" style="width:300px;"/>

# Sentiment Classification in Polish

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Mapping from the model's output indices to sentiment labels
id2label = {0: "negative", 1: "neutral", 2: "positive"}

tokenizer = AutoTokenizer.from_pretrained("Voicelab/herbert-base-cased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("Voicelab/herbert-base-cased-sentiment")
model.eval()  # disable dropout for inference

# "How nice, it snowed today! Shall we build a snowman today?"
texts = ["Ale fajnie, spadł dzisiaj śnieg! Ulepimy dziś bałwana?"]

encoding = tokenizer(
    texts,
    add_special_tokens=True,
    return_token_type_ids=True,
    truncation=True,
    padding=True,  # pad to the longest text in the batch
    return_attention_mask=True,
    return_tensors="pt",
)

# Forward pass without gradient tracking
with torch.no_grad():
    logits = model(**encoding).logits

# One label per input text (argmax over the class dimension)
predictions = [id2label[i] for i in logits.argmax(dim=-1).tolist()]
for text, prediction in zip(texts, predictions):
    print(text, "--->", prediction)
```

Predicted output:
```text
Ale fajnie, spadł dzisiaj śnieg! Ulepimy dziś bałwana? ---> positive
```
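The classifier returns one row of three logits per input text, ordered as in `id2label`. If you also want a confidence score, the sketch below converts a batch of logits (for example, the model's logits converted with `.tolist()`) into a label plus softmax probability. The helper name `logits_to_predictions` and the example logits are hypothetical and for illustration only; they are not real model output.

```python
import math

id2label = {0: "negative", 1: "neutral", 2: "positive"}


def logits_to_predictions(logits_batch):
    """Hypothetical helper: map rows of 3 logits to (label, probability) pairs."""
    results = []
    for row in logits_batch:
        # Numerically stable softmax: subtract the row maximum before exp
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Index of the most probable class
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((id2label[best], probs[best]))
    return results


# Made-up logits for two texts (illustrative values, not model output)
example_logits = [[-1.2, 0.3, 2.5], [3.0, 0.1, -2.0]]
for label, prob in logits_to_predictions(example_logits):
    print(f"{label}: {prob:.3f}")
```

Softmax does not change which class has the highest score, so the label matches a plain argmax over the logits; the probability is only an added confidence indicator.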

### Overview
- **Language model:** [allegro/herbert-base-cased](https://huggingface.co/allegro/herbert-base-cased)   
- **Language:** pl
- **Training data:** Reviews plus in-house data
- **Blog post:** [Sentiment analysis - COVID-19 – the source of the heated discussion](https://voicelab.ai/covid-19-the-source-of-the-heated-discussion)