Commit 91d966c by marcuskd
1 Parent(s): 92b1945

Create README.md

  1. README.md +74 -0
README.md ADDED
@@ -0,0 +1,74 @@
+ ---
+ datasets:
+ - marcuskd/reviews_binary_not4_concat
+ language:
+ - 'no'
+ - nb
+ - nn
+ metrics:
+ - accuracy
+ - recall
+ - precision
+ - f1
+ ---
+ # Model Card for Model ID
+
+ Sentiment analysis for Norwegian reviews.
+
+ # Model Description
+
+ This model was trained on a custom dataset built by concatenating the Norwegian Review Corpus (NoReC, https://github.com/ltgoslo/norec) with a Norwegian sentiment dataset from Hugging Face (https://huggingface.co/datasets/sepidmnorozy/Norwegian_sentiment).
+ It is intended for testing purposes only.
+
+
+ - **Developed by:** Simen Aabol and Marcus Dragsten
+ - **Finetuned from model:** norbert2 (ltgoslo/norbert2)
+
+ # Direct Use
+
+ Pass in Norwegian sentences to check their sentiment (negative to positive).
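
A minimal inference sketch, assuming the `transformers` library is installed. The Hub ID of this model is not stated in the card, so `model_id` is a placeholder parameter. The label names returned depend on the model's config (often `LABEL_0`/`LABEL_1`), and which one means negative or positive is an assumption to verify. `describe` and `classify` are illustrative helper names, not part of any library.

```python
def describe(texts, outputs):
    """Pair each input text with its predicted label and rounded score."""
    return [(text, out["label"], round(out["score"], 3))
            for text, out in zip(texts, outputs)]


def classify(texts, model_id):
    """Run sentiment classification; model_id is a placeholder for this model's Hub ID."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "text-classification",
        model=model_id,
        tokenizer="ltgoslo/norbert2",  # tokenizer named in the Preprocessing section
    )
    return describe(texts, classifier(texts))
```

For example, `classify(["Denne filmen var fantastisk!"], model_id)` would return a list of `(text, label, score)` tuples.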
+
+ # Training Details
+
+ ## Training and Testing Data
+
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ https://huggingface.co/datasets/marcuskd/reviews_binary_not4_concat
+
+ ### Preprocessing
+
+ Tokenized using:
+
+ ```python
+ from transformers import AutoTokenizer
+ tokenizer = AutoTokenizer.from_pretrained("ltgoslo/norbert2")
+ ```
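
To apply the tokenizer to a whole dataset, a batched helper along these lines could be used. This is only a sketch: the column name `text`, the 512-token limit, and padding to the maximum length are assumptions, not settings taken from the training run, and `tokenize_batch` is a hypothetical helper.

```python
def tokenize_batch(batch, tokenizer, text_column="text", max_length=512):
    """Tokenize a batch of texts, truncating and padding to max_length.

    text_column and max_length are assumed values, not confirmed settings.
    """
    return tokenizer(
        batch[text_column],
        truncation=True,
        padding="max_length",
        max_length=max_length,
    )
```

With the `datasets` library, this would typically be applied as `dataset.map(lambda b: tokenize_batch(b, tokenizer), batched=True)`.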
+ Training arguments for this model:
+ ```python
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir='./results',          # output directory
+     num_train_epochs=10,             # total number of training epochs
+     per_device_train_batch_size=16,  # batch size per device during training
+     per_device_eval_batch_size=64,   # batch size for evaluation
+     warmup_steps=500,                # number of warmup steps for learning rate scheduler
+     weight_decay=0.01,               # strength of weight decay
+     logging_dir='./logs',            # directory for storing logs
+     logging_steps=10,                # log every 10 steps
+ )
+ ```
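
These arguments would normally be passed to a `Trainer` together with a metrics function. The sketch below shows one way to compute the four metrics listed in this card in plain Python. It assumes binary labels with `1` as the positive class and is not the authors' evaluation code; `binary_metrics` and `compute_metrics` are illustrative names.

```python
def binary_metrics(preds, labels, positive=1):
    """Compute accuracy, recall, precision and F1 for binary predictions."""
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)
    correct = sum(1 for p, y in pairs if p == y)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


def compute_metrics(eval_pred):
    """Adapter for Trainer: eval_pred unpacks into (logits, labels)."""
    logits, labels = eval_pred
    # Predicted class is the index of the largest logit in each row.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    return binary_metrics(preds, list(labels))
```

It could be wired in as `Trainer(model=model, args=training_args, train_dataset=..., eval_dataset=..., compute_metrics=compute_metrics)`, where `model` and the datasets are whatever was loaded beforehand.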
+
+ # Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+ Evaluated on the test split of the dataset:
+ ```python
+ {
+     'accuracy': 0.8357214261912695,
+     'recall': 0.886873508353222,
+     'precision': 0.8789025543992431,
+     'f1': 0.8828700403896412,
+     'total_time_in_seconds': 94.33071640000003,
+     'samples_per_second': 31.81360340013276,
+     'latency_in_seconds': 0.03143309443518828
+ }
+ ```
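
The reported numbers can be cross-checked with some simple arithmetic. The sketch below recomputes F1 as the harmonic mean of precision and recall, recomputes latency as the inverse of throughput, and estimates the number of evaluated samples. The sample count is inferred from the reported figures, not stated by the authors.

```python
results = {
    "accuracy": 0.8357214261912695,
    "recall": 0.886873508353222,
    "precision": 0.8789025543992431,
    "f1": 0.8828700403896412,
    "total_time_in_seconds": 94.33071640000003,
    "samples_per_second": 31.81360340013276,
    "latency_in_seconds": 0.03143309443518828,
}

# F1 is the harmonic mean of precision and recall.
p, r = results["precision"], results["recall"]
f1_check = 2 * p * r / (p + r)

# Throughput times total time approximates the number of evaluated samples.
n_samples = round(results["samples_per_second"] * results["total_time_in_seconds"])

# Latency per sample is the inverse of throughput.
latency_check = 1 / results["samples_per_second"]

print(f"F1 recomputed:          {f1_check:.6f}")
print(f"Estimated test samples: {n_samples}")
print(f"Latency recomputed:     {latency_check:.6f} s")
```

The recomputed F1 and latency agree with the reported values, and the timing figures suggest a test split of roughly 3,000 samples.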