luisMfelipe committed on
Commit 92a1663 · verified · 1 Parent(s): 9663ade

Update README.md

Files changed (1):
  1. README.md +25 -14
README.md CHANGED
@@ -18,7 +18,8 @@ tags:
 
 # ModernBERT Medical Safety Classifier
 
- The ModernBERT Medical Safety Classifier is a transformer-based language model fine-tuned to assess the safety and ethical standards of medical texts, particularly in the oncology domain. Built on top of the ModernBERT architecture, it leverages the powerful evaluations of Llama 3.1 (70B) to distill that model’s safety and ethical insights into a significantly smaller and faster classifier. Specifically, it was trained on approximately 9 billion tokens of cancer-related data from The Blue Scrubs dataset, each of which had been annotated by Llama 3.1 (70B) for safety and ethical adherence. By transferring these large-model evaluations into ModernBERT, the resulting classifier retains robust predictive accuracy while remaining lightweight enough for real-time or resource-constrained inference.
 
 ## Model Details
 
 - **Developed by**: TheBlueScrubs
@@ -68,7 +69,14 @@ print(f"Safety Score: {safety_score}")
 
 ## Training Data
 
- The model was fine-tuned on approximately 9 billion tokens of cancer-specific texts extracted from The Blue Scrubs dataset. Each document was annotated by Llama 3.1 70B Instruct for Safety and Ethical Standards, yielding continuous scores ranging from 1 (least safe) to 5 (most safe). These scores served as regression targets during training.
 
 ## Training Procedure
 
@@ -77,14 +85,16 @@ The model was fine-tuned on approximately 9 billion tokens of cancer-specific te
 Texts were tokenized using the ModernBERT tokenizer with a maximum sequence length of 4,096 tokens. No additional filtering was applied, as the data was considered trustworthy.
 
 ### Training Hyperparameters
-
- - **Learning Rate**: 1e-4
- - **Batch Size**: 20 (per device)
- - **Gradient Accumulation Steps**: 8
- - **Optimizer**: AdamW
- - **Weight Decay**: 0.01
- - **FP16 Training**: Enabled
- - **Total Training Steps**: Calculated to approximate 3 epochs over the dataset
 
 ## Evaluation
 
@@ -100,15 +110,16 @@ The model's performance was evaluated on an out-of-sample test set comprising ca
 
 ### Results
 
- - **MSE**: 0.189
- - **Accuracy**: 0.993
 - **ROC Analysis**: Demonstrated robust classification capability with high True Positive Rates and low False Positive Rates.
 
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66eb0a4e55940cd564ad8e0a/q8_uD5osME7yyGDU2RSIb.png)
 
 ## Bias, Risks, and Limitations
 
- The model's training data is specific to cancer-related medical texts, which may introduce biases toward oncology terminology and contexts. Its performance on other medical domains has not been assessed, and users should be cautious when applying the model outside its trained scope.
 
 ## Recommendations
 
 
 # ModernBERT Medical Safety Classifier
 
+ The ModernBERT Medical Safety Classifier is a transformer-based language model fine-tuned to assess the safety and ethical standards of medical texts across diverse medical domains. Built on the ModernBERT architecture, it distills the safety and ethics evaluations of Llama 3.1 (70B) into a significantly smaller and faster classifier. Specifically, it was trained on a newly curated, balanced subset of The Blue Scrubs dataset (83,636 documents in total), each annotated by Llama 3.1 (70B) for safety and ethical adherence. By transferring these large-model evaluations into ModernBERT, the resulting classifier retains strong predictive accuracy while remaining lightweight enough for real-time or resource-constrained inference.
+
 ## Model Details
 
 - **Developed by**: TheBlueScrubs
 
 
 ## Training Data
 
+ The model was re-trained on a **new, balanced subset** of The Blue Scrubs dataset to address the overrepresentation of high-safety texts. Specifically:
+
+ - We scanned a total of 11,500,608 rows across all files and removed 112,330 rows with parse failures, NaN, zero, or out-of-range scores, leaving 11,388,278 valid rows.
+ - Of these valid rows, 41,818 had a safety score ≤ 2, while 11,346,460 had a safety score > 2.
+ - To balance the dataset, we randomly sampled documents so that unsafe (≤ 2) and safer (> 2) texts were equally represented, yielding a final balanced set of **83,636 rows**.
+
+ Each row retained its original continuous safety score from Llama 3.1 (70B), ranging from 1 (least safe) to 5 (most safe). These scores again served as regression targets during training.
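The filtering and balancing steps above can be sketched in plain Python. The `safety_score` field name and the in-memory list-of-dicts representation are illustrative assumptions; the card does not specify the storage format:

```python
import random

def balance_by_safety(rows, threshold=2.0, seed=0):
    """Drop rows with missing or out-of-range scores, then downsample so
    that unsafe (score <= threshold) and safer (score > threshold) rows
    are equally represented."""
    # Keep only rows whose score parses and falls in the valid 1-5 range.
    valid = [r for r in rows
             if isinstance(r.get("safety_score"), (int, float))
             and 1.0 <= r["safety_score"] <= 5.0]
    unsafe = [r for r in valid if r["safety_score"] <= threshold]
    safer = [r for r in valid if r["safety_score"] > threshold]
    # Match the minority-class count (41,818 in the run described above).
    k = min(len(unsafe), len(safer))
    rng = random.Random(seed)
    return rng.sample(unsafe, k) + rng.sample(safer, k)
```

Sampling both classes down to the minority count is what yields 2 × 41,818 = 83,636 rows.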
 
 ## Training Procedure
 
 Texts were tokenized using the ModernBERT tokenizer with a maximum sequence length of 4,096 tokens. No additional filtering was applied, as the data was considered trustworthy.
 
 ### Training Hyperparameters
+
+ - **Learning Rate**: 2e-5
+ - **Number of Epochs**: 5
+ - **Batch Size**: 20 (per device)
+ - **Gradient Accumulation Steps**: 8
+ - **Optimizer**: AdamW
+ - **Weight Decay**: 0.01
+ - **FP16 Training**: Enabled
+ - **Total Training Steps**: Corresponds to roughly 5 epochs over the final balanced set
+
+ All other hyperparameter settings (e.g., batch size, optimizer choice) match the previous training run; only the learning rate, the number of epochs, and the dataset balancing were changed.
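The step count implied by these settings is simple arithmetic: rows consumed per optimizer step equal the effective batch (per-device batch × gradient accumulation × devices). Single-device training is an assumption here, for illustration:

```python
import math

def total_training_steps(num_rows, per_device_batch, grad_accum,
                         epochs, num_devices=1):
    """Optimizer steps for a run, rounding partial batches up per epoch."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_rows / effective_batch)
    return steps_per_epoch * epochs

# 83,636 balanced rows, batch 20, accumulation 8, 5 epochs
steps = total_training_steps(83_636, 20, 8, 5)
```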
 
 
 ## Evaluation
 
 ### Results
 
+ - **MSE**: 0.489
+ - **RMSE**: 0.699
+ - **Accuracy**: 0.9642
 - **ROC Analysis**: Demonstrated robust classification capability with high True Positive Rates and low False Positive Rates.
 
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66eb0a4e55940cd564ad8e0a/_WNI7uA5ykzb67s1opgJu.png)
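Given raw model predictions, the regression metrics and a thresholded accuracy can be computed along these lines. Binarizing at a score of 2 mirrors the unsafe/safer split used for balancing and is an assumption here, not a stated evaluation detail:

```python
import math

def regression_metrics(preds, targets, threshold=2.0):
    """MSE/RMSE on the continuous 1-5 safety scores, plus accuracy after
    binarizing both predictions and targets at the unsafe/safer cutoff."""
    n = len(preds)
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / n
    rmse = math.sqrt(mse)
    accuracy = sum((p <= threshold) == (t <= threshold)
                   for p, t in zip(preds, targets)) / n
    return {"mse": mse, "rmse": rmse, "accuracy": accuracy}
```

Note that RMSE is just the square root of MSE, consistent with the 0.489 / 0.699 pair reported above.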
 
 
 ## Bias, Risks, and Limitations
 
+ This model was trained on a curated subset of The Blue Scrubs dataset encompassing various medical domains, yet some areas may remain underrepresented. As with any model, there is a risk of bias stemming from the data composition, and users should exercise caution when applying the classifier, especially in highly specialized contexts. Outputs should always be corroborated with expert opinion and current clinical guidelines to ensure safe, accurate medical use.
 
 ## Recommendations