Oriaz committed on
Commit 7cf12d5 · verified · 1 Parent(s): 6122595

Update README.md

Files changed (1):
  1. README.md +33 -24

README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Submission Template
  emoji: 🔥
  colorFrom: yellow
  colorTo: green
@@ -7,27 +7,29 @@ sdk: docker
  pinned: false
  ---

- # Random Baseline Model for Climate Disinformation Classification

- ## Model Description

- This is a random baseline model for the Frugal AI Challenge 2024, specifically for the text classification task of identifying climate disinformation. The model serves as a performance floor, randomly assigning labels to text inputs without any learning.

- ### Intended Use

  - **Primary intended uses**: Baseline comparison for climate disinformation classification models
  - **Primary intended users**: Researchers and developers participating in the Frugal AI Challenge
  - **Out-of-scope use cases**: Not intended for production use or real-world classification tasks

- ## Training Data

  The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
  - Size: ~6000 examples
  - Split: 80% train, 20% test
  - 8 categories of climate disinformation claims

- ### Labels
  0. No relevant claim detected
  1. Global warming is not happening
  2. Not caused by humans
@@ -37,18 +39,28 @@ The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
  6. Proponents are biased
  7. Fossil fuels are needed

- ## Performance

- ### Metrics
- - **Accuracy**: ~12.5% (random chance with 8 classes)
- - **Environmental Impact**:
-   - Emissions tracked in gCO2eq
-   - Energy consumption tracked in Wh

- ### Model Architecture
- The model implements a random choice between the 8 possible labels, serving as the simplest possible baseline.

- ## Environmental Impact

  Environmental impact is tracked using CodeCarbon, measuring:
  - Carbon emissions during inference
@@ -56,16 +68,13 @@ Environmental impact is tracked using CodeCarbon, measuring:

  This tracking helps establish a baseline for the environmental impact of model deployment and inference.

- ## Limitations
- - Makes completely random predictions
- - No learning or pattern recognition
- - No consideration of input text
- - Serves only as a baseline reference
- - Not suitable for any real-world applications

- ## Ethical Considerations

  - Dataset contains sensitive topics related to climate disinformation
- - Model makes random predictions and should not be used for actual classification
  - Environmental impact is tracked to promote awareness of AI's carbon footprint
 
  ---
+ title: Submission Oriaz
  emoji: 🔥
  colorFrom: yellow
  colorTo: green

  pinned: false
  ---

+ # Benchmark using different techniques

+ ## ML model for Climate Disinformation Classification

+ ### Model Description

+ #### Intended Use

  - **Primary intended uses**: Baseline comparison for climate disinformation classification models
  - **Primary intended users**: Researchers and developers participating in the Frugal AI Challenge
  - **Out-of-scope use cases**: Not intended for production use or real-world classification tasks
 
+ ### Training Data

  The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
  - Size: ~6000 examples
  - Split: 80% train, 20% test
  - 8 categories of climate disinformation claims

+ #### Labels
  0. No relevant claim detected
  1. Global warming is not happening
  2. Not caused by humans

  6. Proponents are biased
  7. Fossil fuels are needed
 
+ ### Performance
+
+ #### Metrics (measured on an NVIDIA T4 small GPU)
+ - **Accuracy**: ~69-72%
+ - **Environmental Impact**:
+   - Emissions tracked in gCO2eq (~0.7 g)
+   - Energy consumption tracked in Wh (~1.8 Wh)
+
+ #### Model Architecture
+ ML models prefer numeric inputs, so the quotes first need to be embedded. I used the *MTEB Leaderboard* on Hugging Face to find the embedding model with the best trade-off between performance and parameter count.
+
+ I then chose the "dunzhang/stella_en_400M_v5" model as the embedder. It has the 7th-best performance score with only 400M parameters.
+
+ Once the quotes are embedded, the data is 6091 examples x 1024 features. After that comes a 70% / 30% train-test split.
+
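The embed-then-split step can be sketched as follows. This is a minimal illustration, not the submitted code: synthetic arrays stand in for the real stella embeddings (which would come from something like `SentenceTransformer("dunzhang/stella_en_400M_v5").encode(quotes)`), and the split is done by hand with NumPy.

```python
import numpy as np

# Stand-in for the real embeddings: 6091 quotes x 1024 embedding dimensions.
# In the actual pipeline these come from the stella_en_400M_v5 embedder.
rng = np.random.default_rng(0)
X = rng.normal(size=(6091, 1024))
y = rng.integers(0, 8, size=6091)  # 8 disinformation categories

# 70% / 30% train-test split via a shuffled index
idx = rng.permutation(len(X))
cut = int(0.7 * len(X))
train_idx, test_idx = idx[:cut], idx[cut:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```

With 6091 examples, the 70% cut yields 4263 training rows and 1828 test rows.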
+ Using the TPOT classifier search, I found that the best model for my data was a logistic regression.
+
+ Here is the resulting confusion matrix:
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66169e1ce557753f30eab31b/tfAcfFu3Cnc9XJ00ixrWB.png)
+
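The final classifier that the TPOT search settled on can be reproduced directly with scikit-learn. A hedged sketch, with tiny synthetic data standing in for the real 1024-dim embeddings (hyperparameters here are defaults, not necessarily the ones TPOT exported):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Synthetic stand-ins for the embedded quotes
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 1024))
y_train = rng.integers(0, 8, size=200)  # the 8 claim categories
X_test = rng.normal(size=(50, 1024))
y_test = rng.integers(0, 8, size=50)

# Plain multinomial logistic regression over the embedding features
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

# 8x8 confusion matrix like the one shown above (rows = true label)
cm = confusion_matrix(y_test, preds, labels=list(range(8)))
```

Passing `labels=list(range(8))` keeps the matrix 8x8 even if some class never appears in the test predictions.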
+ ### Environmental Impact

  Environmental impact is tracked using CodeCarbon, measuring:
  - Carbon emissions during inference

  This tracking helps establish a baseline for the environmental impact of model deployment and inference.
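Wrapping inference with CodeCarbon's `EmissionsTracker` looks roughly like this. `classify_all` is a hypothetical placeholder for the real embed-then-predict pipeline, and tracking is skipped gracefully if CodeCarbon is unavailable:

```python
def classify_all(quotes):
    # Placeholder for the real pipeline: embed each quote, then predict
    return [0 for _ in quotes]

try:
    from codecarbon import EmissionsTracker

    tracker = EmissionsTracker(log_level="error")
    tracker.start()
    preds = classify_all(["some quote"] * 100)
    emissions_kg = tracker.stop()  # kg CO2eq for the tracked block
except Exception:
    # codecarbon not installed or tracker unavailable; run untracked
    preds = classify_all(["some quote"] * 100)
    emissions_kg = None
```

`tracker.stop()` returns the measured emissions, which CodeCarbon also logs to an `emissions.csv` file by default.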
 
+ ### Limitations
+ - The embedding phase takes ~30 seconds for 1,800 quotes. It could be optimised, which would have a real influence on carbon emissions.
+ - It is hard to go above 70% accuracy with "simple" ML.
+ - Textual data carries interpretation nuances that small models cannot capture.
+ ### Ethical Considerations

  - Dataset contains sensitive topics related to climate disinformation

  - Environmental impact is tracked to promote awareness of AI's carbon footprint