shainar committed
Commit 04a5470 • 1 parent: c184ea3

Create README.md

Files changed (1):
  1. README.md +69 -0
README.md ADDED

---
license: mit
datasets:
- newsmediabias/Bias-DeBiased
metrics:
- accuracy
---
### Model Card for MBIAS

#### Model Details
- **Model Name:** MBIAS
- **Model Type:** Large Language Model (LLM)
- **Version:** 1.0
- **Developers:** Ananya Raval, Veronica Chatrath, Shaina Raza
- **Model Repository:** [HuggingFace MBIAS](https://huggingface.co/newsmediabias/MBIAS)

#### Model Description
MBIAS is a large language model fine-tuned to enhance safety while retaining contextual accuracy in its outputs. Traditional safety interventions often compromise contextual meaning when mitigating bias and toxicity; MBIAS addresses this trade-off by substantially reducing bias and toxicity in generated text while maintaining high contextual relevance.

#### Intended Use
The model is intended for research and development, particularly for applications where reducing bias and toxicity in generated language is crucial but key information must still be retained.

#### Training Data
The model was fine-tuned on a custom dataset curated for comprehensive safety interventions. The dataset contains diverse text samples covering a wide range of demographic groups, allowing bias and toxicity to be tested and reduced effectively.
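
To explore the data, one can load it from the Hub. Below is a minimal sketch using the `datasets` library; the split and column names are not specified above, so the code simply inspects whatever is there:

```python
# Minimal sketch: inspect the Bias-DeBiased dataset from the Hub.
# Split and column names are assumptions to verify on first load.
from datasets import load_dataset

ds = load_dataset("newsmediabias/Bias-DeBiased")
print(ds)                      # available splits and columns
first_split = next(iter(ds.values()))
print(first_split[0])          # one example record
```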

#### Evaluation
On an out-of-distribution test set, MBIAS reduces bias and toxicity by over 30% overall, and by more than 90% for specific demographic groups. Performance metrics include bias reduction, toxicity reduction, and knowledge retention (KR), i.e. how much of the original key information is preserved.
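
As an illustration of how such metrics can be computed, the snippet below scores generations for toxicity with the Hugging Face `evaluate` toolkit. This is a stand-in for demonstration purposes, not necessarily the evaluation pipeline used by the authors:

```python
# Hedged sketch: score model outputs for toxicity using the `evaluate`
# library's toxicity measurement (backed by a hate-speech classifier).
# Illustrative only; the paper's exact metric pipeline may differ.
import evaluate

toxicity = evaluate.load("toxicity", module_type="measurement")
outputs = [
    "Everyone deserves equal access to education.",
    "People from that group are always causing trouble.",
]
scores = toxicity.compute(predictions=outputs)
print(scores["toxicity"])  # one toxicity score per output, in [0, 1]
```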

#### Performance Metrics

##### Pre-Safety Intervention
| Text | Bias ↓ | Toxicity ↓ | Knowledge Retention ↑ | Faithfulness ↑ | Relevancy ↑ |
|------|--------|------------|-----------------------|----------------|-------------|
| Original sentence | 32.21% | 40.09% | N/A | N/A | N/A |
| Safe sentence (ground truth) | 17.43% | 14.53% | 82.35% | 77.91% | 87.50% |

##### Post-Safety Intervention
| Text | Bias ↓ | Toxicity ↓ | Knowledge Retention ↑ | Faithfulness ↑ | Relevancy ↑ |
|------|--------|------------|-----------------------|----------------|-------------|
| Mistral2-7B (vanilla) | **6.63%** | **4.50%** | 82.32% | 79.62% | **88.34%** |
| Mistral2-7B (prompt-tuning) | 11.4% | 8.00% | 81.45% | 75.93% | 86.64% |
| **MBIAS (ours)** | 9.49% | 8.71% | **88.46%** | **82.54%** | 84.02% |

#### How to Use
The model can be loaded for text generation through the Hugging Face platform; see the model repository linked in the Model Details section and the usage sketch below.
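
A minimal usage sketch with the `transformers` library; the prompt format and generation settings here are illustrative assumptions, not the authors' documented configuration:

```python
# Minimal usage sketch (assumes standard causal-LM loading; the exact
# prompt template MBIAS expects is an assumption, not documented above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "newsmediabias/MBIAS"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = ("Rewrite the following text so it is free of bias and toxicity, "
          "keeping all key information:\nText: <your text here>")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```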

#### Hyperparameters
Key fine-tuning settings (see the configuration sketch after this list):
- **Batch Size per GPU:** 8 (training), 4 (evaluation)
- **Gradient Accumulation Steps:** 1
- **Maximum Gradient Norm:** 0.3
- **Initial Learning Rate:** 2e-05
- **Weight Decay:** 0.001
- **Optimizer:** paged_adamw_8bit
- **Learning Rate Scheduler:** constant
- **Warmup Steps Ratio:** 0.05
- **Maximum Sequence Length:** 2048
- **Training Epochs:** 2
- **LoRA Attention Dimension (r):** 64
- **LoRA Scaling (alpha) / Dropout:** 16 / 0.2
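
A sketch of these settings expressed as a PEFT/LoRA configuration with `transformers` `TrainingArguments`; the target modules, output path, and trainer wiring are assumptions, and only the numeric values come from the list above:

```python
# Sketch: the listed hyperparameters as a LoRA + Trainer configuration.
# Output path is hypothetical; the numbers match the model card.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,              # LoRA attention dimension
    lora_alpha=16,     # LoRA scaling
    lora_dropout=0.2,  # LoRA dropout probability
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="mbias-sft",        # hypothetical output directory
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    max_grad_norm=0.3,
    learning_rate=2e-5,
    weight_decay=0.001,
    optim="paged_adamw_8bit",
    lr_scheduler_type="constant",
    warmup_ratio=0.05,
    num_train_epochs=2,
)
# The 2048-token maximum sequence length is applied at the trainer level,
# e.g. via TRL's SFT trainer/config (max_seq_length=2048).
```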

#### Summary of Results
Metrics are reported for both the pre- and post-safety-intervention phases. Compared with the vanilla and prompt-tuned configurations, MBIAS achieves the best knowledge retention and faithfulness while keeping bias and toxicity far below pre-intervention levels.

#### Citation
Please cite the work as follows:

Ananya Raval, Veronica Chatrath, and Shaina Raza. "MBIAS: Enhancing Safety in Language Models While Retaining Contextual Accuracy."

#### Additional Notes
The dataset and the instruction fine-tuned model are publicly available to the research community, to facilitate further work on safe and unbiased language modeling.