nlztrk committed on
Commit
f8e225f
·
1 Parent(s): a8813e1

Create README.md

Files changed (1)
  1. README.md +53 -0
README.md ADDED
@@ -0,0 +1,53 @@
+ **Train-Test Set:**
+ - https://github.com/L2-Regulasyon/Teknofest2023/blob/main/data/raw/teknofest_train_final.csv
+ - https://github.com/L2-Regulasyon/Teknofest2023/blob/main/data/external/tweetset.csv
+
+ **Model:** "dbmdz/bert-base-turkish-128k-uncased"
+
+ **Preprocessing**
+ - Characters were lowercased
+ - Punctuation marks were removed
+ - Additional non-offensive data was used
+ - Non-offensive sentences were truncated so that their lengths match those of the offensive ones
+
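The first two preprocessing steps can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `preprocess` is hypothetical, the explicit handling of the Turkish dotted and dotless I is an assumption (the README only says the text was lowercased), only ASCII punctuation from `string.punctuation` is removed, and the whitespace cleanup at the end is an added convenience.

```python
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase Turkish text and strip ASCII punctuation (illustrative sketch)."""
    # Assumption: map Turkish capitals explicitly, because str.lower() turns
    # "I" into "i" and "İ" into "i" plus a combining dot.
    text = text.replace("I", "ı").replace("İ", "i").lower()
    # Remove punctuation marks.
    text = text.translate(_PUNCT_TABLE)
    # Assumption: collapse whitespace left behind by removed punctuation.
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("Merhaba, DÜNYA!"))
```

The length-matching step for non-offensive sentences is not sketched here because the README does not describe how the target lengths were chosen.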
+ ## Tokenizer Parameters
+ ```
+ max_length=64
+ padding=True
+ truncation=True
+ ```
+
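To make the effect of these three settings concrete, here is a toy, pure-Python sketch of the behavior they select in Hugging Face tokenizers: `truncation=True` cuts each sequence to `max_length`, and `padding=True` pads every sequence to the longest one in the batch. The helper `pad_and_truncate` and the `pad_id=0` default are hypothetical and only mimic the shape of the output; a real tokenizer also accounts for special tokens such as `[CLS]` and `[SEP]`, which this sketch ignores.

```python
from typing import List, Tuple


def pad_and_truncate(
    batch: List[List[int]], max_length: int = 64, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Toy model of max_length=64, padding=True, truncation=True."""
    # truncation=True: keep at most max_length token ids per sequence.
    truncated = [ids[:max_length] for ids in batch]
    # padding=True: pad to the longest sequence in this batch.
    longest = max((len(ids) for ids in truncated), default=0)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # The attention mask is 1 for real tokens and 0 for padding.
    attention_mask = [
        [1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated
    ]
    return input_ids, attention_mask


if __name__ == "__main__":
    ids, mask = pad_and_truncate([[5, 6, 7], list(range(100))])
    print(len(ids[0]), len(ids[1]), sum(mask[0]))
```

Because padding follows the longest sequence in the batch, a batch of short texts stays shorter than 64 tokens, while any longer text is capped at 64.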
+ ## Training Parameters
+ - **Epoch:** 3
+ - **Learning Rate:** 7e-5
+ - **Batch-Size:** 64
+ - **Tokenizer Length:** 64
+ - **Loss:** BCE
+ - **Online Hard Example Mining:** On
+ - **Class-Weighting:** On (^0.3)
+ - **Early Stopping:** Off
+ - **Stratified Batch Sampling:** On
+ - **Gradient Accumulation:** Off
+ - **LR Scheduler:** Cosine-with-Warmup
+ - **Warmup Ratio:** 0.1
+ - **Weight Decay:** 0.01
+ - **LLRD:** 0.95
+ - **Label Smoothing:** 0.05
+ - **Gradient Clipping:** 1.0
+ - **MLM Pre-Training:** Off
+
+
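Three of the settings above are easier to read as formulas, so the sketch below spells them out in plain Python. Everything here is an interpretation rather than the authors' training code: all function names are hypothetical, `(^0.3)` is assumed to mean inverse-frequency class weights raised to the power 0.3, LLRD is assumed to multiply the learning rate by 0.95 for each encoder layer below the top one (12 layers, as in a BERT-base model), and the scheduler is assumed to be a linear warmup followed by a half-cosine decay to zero.

```python
import math
from typing import Dict, List


def cosine_warmup_factor(step: int, total_steps: int, warmup_ratio: float = 0.1) -> float:
    """Learning-rate multiplier: linear warmup, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def llrd_learning_rates(
    base_lr: float = 7e-5, decay: float = 0.95, num_layers: int = 12
) -> List[float]:
    """Per-layer learning rates; index 0 is the lowest encoder layer."""
    # Assumption: the top layer keeps base_lr and each layer below it
    # is scaled by one more factor of `decay`.
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def class_weights(counts: Dict[str, int], power: float = 0.3) -> Dict[str, float]:
    """Softened inverse-frequency weights, normalized to mean 1."""
    total = sum(counts.values())
    raw = {label: (total / n) ** power for label, n in counts.items()}
    mean = sum(raw.values()) / len(raw)
    return {label: w / mean for label, w in raw.items()}


if __name__ == "__main__":
    # Hypothetical class counts, used only to exercise the function.
    example_counts = {"A": 1000, "B": 3000, "C": 500}
    print(class_weights(example_counts))
    print(llrd_learning_rates()[0], llrd_learning_rates()[-1])
    print([round(cosine_warmup_factor(s, 1000), 3) for s in (0, 50, 100, 550, 1000)])
```

A power below 1 softens the weighting: rare classes still receive larger weights, but less aggressively than with plain inverse frequency.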
+ ## CV10 Results
+ ```
+               precision    recall  f1-score   support
+
+       INSULT    0.8940    0.8918    0.8929      2393
+        OTHER    0.9319    0.9079    0.9197      3528
+    PROFANITY    0.9626    0.9533    0.9579      2376
+       RACIST    0.9317    0.9666    0.9488      2033
+       SEXIST    0.9388    0.9587    0.9486      2081
+
+     accuracy                        0.9316     12411
+    macro avg    0.9318    0.9356    0.9336     12411
+ weighted avg    0.9316    0.9316    0.9315     12411
+ ```
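As a quick consistency check, the two summary rows can be recomputed from the per-class rows: the macro average is the plain mean over the five classes, and the weighted average weights each class by its support. The sketch below uses only the numbers reported in the table; the names `rows`, `macro_avg`, and `weighted_avg` are hypothetical, and small differences in the last digit are expected because the inputs are already rounded to four decimals.

```python
# Per-class rows from the report: (precision, recall, f1-score, support).
rows = {
    "INSULT": (0.8940, 0.8918, 0.8929, 2393),
    "OTHER": (0.9319, 0.9079, 0.9197, 3528),
    "PROFANITY": (0.9626, 0.9533, 0.9579, 2376),
    "RACIST": (0.9317, 0.9666, 0.9488, 2033),
    "SEXIST": (0.9388, 0.9587, 0.9486, 2081),
}


def macro_avg(table, col):
    """Unweighted mean of one metric column over all classes."""
    return sum(r[col] for r in table.values()) / len(table)


def weighted_avg(table, col):
    """Support-weighted mean of one metric column."""
    total = sum(r[3] for r in table.values())
    return sum(r[col] * r[3] for r in table.values()) / total


if __name__ == "__main__":
    for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
        print(name, round(macro_avg(rows, col), 4), round(weighted_avg(rows, col), 4))
```

The support-weighted recall is the same quantity as overall accuracy, which is why both appear as 0.9316 in the report.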