**Train-Test Set:**
- https://github.com/L2-Regulasyon/Teknofest2023/blob/main/data/raw/teknofest_train_final.csv
- https://github.com/L2-Regulasyon/Teknofest2023/blob/main/data/external/tweetset.csv

**Model:** "dbmdz/bert-base-turkish-128k-uncased"

**Preprocessing**
- Characters are lowercased.
- Punctuation marks are removed.
- Additional non-offensive data is used.
- Non-offensive sentences are trimmed to lengths matching those of the offensive ones.
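
The first two preprocessing steps can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names are hypothetical, the Turkish-aware handling of `I`/`İ` is an assumption, and only ASCII punctuation is stripped because the source does not say which characters were removed. The length trimming of non-offensive sentences is not covered here.

```python
import string

# Translation table that deletes ASCII punctuation.
# Assumption: the source does not list the exact characters removed.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def turkish_lower(text: str) -> str:
    """Lowercase text, mapping the Turkish capitals I and İ explicitly.

    Python's str.lower() is not locale-aware, so "I" would become "i"
    instead of the dotless "ı" without this step.
    """
    return text.replace("İ", "i").replace("I", "ı").lower()


def preprocess(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    lowered = turkish_lower(text)
    no_punct = lowered.translate(_PUNCT_TABLE)
    return " ".join(no_punct.split())
```

For example, `preprocess("Merhaba, DÜNYA!")` yields `merhaba dünya`.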

## Tokenizer Parameters
```
max_length=64
padding=True
truncation=True
```
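
To illustrate what these three settings do to a batch, here is a pure-Python sketch that mimics Hugging Face tokenizer semantics on lists of token ids: `truncation=True` cuts each sequence to `max_length`, and `padding=True` pads to the longest sequence in the batch, not to `max_length`. The helper `pad_and_truncate`, the padding id `0`, and the sample ids are assumptions made for illustration. A real tokenizer also keeps the special tokens intact when truncating, which this toy version ignores.

```python
from typing import List, Tuple


def pad_and_truncate(
    batch: List[List[int]], max_length: int = 64, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Mimic truncation=True and padding=True on lists of token ids."""
    # truncation=True: cut every sequence to at most max_length tokens.
    truncated = [ids[:max_length] for ids in batch]
    # padding=True: pad to the longest sequence in the batch.
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # The attention mask marks real tokens with 1 and padding with 0.
    attention_mask = [
        [1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated
    ]
    return input_ids, attention_mask


# Hypothetical token ids: one short sequence and one longer than 64 tokens.
batch = [[101, 7, 8, 102], list(range(1, 81))]
input_ids, attention_mask = pad_and_truncate(batch)
```

Here the 80-token sequence is cut to 64 tokens and the 4-token sequence is padded up to 64, so both rows end up the same length.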

## Training Parameters
- **Epochs:** 3
- **Learning Rate:** 7e-5
- **Batch-Size:** 64
- **Tokenizer Length:** 64
- **Loss:** BCE
- **Online Hard Example Mining:** On
- **Class-Weighting:** On (^0.3)
- **Early Stopping:** Off
- **Stratified Batch Sampling:** On
- **Gradient Accumulation:** Off
- **LR Scheduler:** Cosine-with-Warmup
- **Warmup Ratio:** 0.1
- **Weight Decay:** 0.01
- **LLRD:** 0.95
- **Label Smoothing:** 0.05
- **Gradient Clipping:** 1.0
- **MLM Pre-Training:** Off
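
A few of these settings are easier to read as formulas. The sketch below shows one common interpretation of each and is not taken from the project's training code: layer-wise learning rate decay (LLRD) where the top encoder layer keeps the base rate, a cosine schedule with linear warmup, and inverse-frequency class weights softened by the exponent 0.3. All function names are hypothetical, the reading of `^0.3` is an assumption, and the class counts are borrowed from the support column of the CV10 results purely as sample numbers.

```python
import math
from typing import Dict, List


def llrd_learning_rates(base_lr: float, decay: float, num_layers: int) -> List[float]:
    """Per-layer learning rates; index 0 is the lowest encoder layer.

    The top layer keeps base_lr and each layer below it is scaled by decay.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def cosine_with_warmup(step: int, total_steps: int, warmup_ratio: float) -> float:
    """Multiplier applied to the base learning rate at a given step."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, warmup_steps)
    # Cosine decay from 1 to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def class_weights(counts: Dict[str, int], power: float) -> Dict[str, float]:
    """Inverse-frequency weights raised to power (assumed meaning of '^0.3')."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {
        label: (total / (num_classes * n)) ** power for label, n in counts.items()
    }


# BERT-base has 12 encoder layers; 7e-5 and 0.95 come from the list above.
layer_lrs = llrd_learning_rates(7e-5, 0.95, 12)

# Sample counts only; the real training distribution may differ.
sample_counts = {
    "INSULT": 2393, "OTHER": 3528, "PROFANITY": 2376,
    "RACIST": 2033, "SEXIST": 2081,
}
weights = class_weights(sample_counts, 0.3)
```

With an exponent below 1, rare classes still receive larger weights than frequent ones, but the gap is much smaller than with plain inverse-frequency weighting.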

## CV10 Results
```
              precision    recall  f1-score   support

      INSULT     0.8940    0.8918    0.8929      2393
       OTHER     0.9319    0.9079    0.9197      3528
   PROFANITY     0.9626    0.9533    0.9579      2376
      RACIST     0.9317    0.9666    0.9488      2033
      SEXIST     0.9388    0.9587    0.9486      2081

    accuracy                         0.9316     12411
   macro avg     0.9318    0.9356    0.9336     12411
weighted avg     0.9316    0.9316    0.9315     12411
```
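
As a quick sanity check on how the summary rows relate to the per-class rows, the snippet below recomputes the macro and weighted F1 averages from the table. The macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support. The variable names are illustrative, and small differences in the last digit can occur because the inputs are already rounded to four decimals.

```python
# Per-class (f1-score, support) pairs copied from the report above.
per_class = {
    "INSULT": (0.8929, 2393),
    "OTHER": (0.9197, 3528),
    "PROFANITY": (0.9579, 2376),
    "RACIST": (0.9488, 2033),
    "SEXIST": (0.9486, 2081),
}

# Total number of evaluated samples.
total_support = sum(support for _, support in per_class.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in per_class.values()) / total_support

print(f"support={total_support} macro={macro_f1:.4f} weighted={weighted_f1:.4f}")
```

The recomputed values agree with the `macro avg` and `weighted avg` rows of the report.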