# ModernBERT Medical Safety Classifier

The ModernBERT Medical Safety Classifier is a transformer-based language model fine-tuned to assess the safety and ethical standards of medical texts.
It was designed to distill the safety and ethical insights of Llama 3.1 (70B) into a significantly smaller and faster classifier. Specifically, it was trained on a newly curated, balanced subset of The Blue Scrubs dataset (a total of 83,636 documents), each annotated by Llama 3.1 (70B) for safety and ethical adherence. By transferring these large-model evaluations into ModernBERT, the resulting classifier retains robust predictive accuracy while remaining lightweight enough for real-time or resource-constrained inference.
## Model Details
- **Developed by**: TheBlueScrubs
## Training Data

The model was re-trained on a **new, balanced subset** drawn from The Blue Scrubs dataset to address the overrepresentation of high-safety texts. Specifically:

- We scanned a total of 11,500,608 rows across all files and removed 112,330 rows for parse/NaN/0/out-of-range issues, leaving 11,388,278 valid rows.
- Of these valid rows, 41,818 had a safety score ≤ 2, while 11,346,460 had a safety score > 2.
- To balance the dataset, we randomly sampled documents so that unsafe (≤ 2) and safer (> 2) texts were equally represented. This yielded a final balanced set of **83,636 total rows**.

Each row retained its original continuous safety score from Llama 3.1 (70B), ranging from 1 (least safe) to 5 (most safe). These scores again served as regression targets during training.
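As a toy illustration of the filtering and balancing steps above (this is not TheBlueScrubs' actual pipeline; the `balance_by_safety` helper and the `(text, score)` row format are assumptions made here for the sketch):

```python
import math
import random

def balance_by_safety(rows, threshold=2.0, seed=0):
    """Drop invalid rows, then downsample so that unsafe (score <= threshold)
    and safer (score > threshold) rows are equally represented.

    `rows` is a list of (text, safety_score) pairs; NaN scores and scores
    outside [1, 5] (including 0) are treated as invalid, mirroring the
    filtering described above.
    """
    valid = [
        (text, s) for text, s in rows
        if isinstance(s, (int, float)) and not math.isnan(s) and 1.0 <= s <= 5.0
    ]
    unsafe = [r for r in valid if r[1] <= threshold]
    safer = [r for r in valid if r[1] > threshold]
    n = min(len(unsafe), len(safer))  # the minority class caps both sides
    rng = random.Random(seed)
    return rng.sample(unsafe, n) + rng.sample(safer, n)

# Toy example: 2 unsafe rows, 4 safer rows, 1 invalid row -> 4 balanced rows.
toy = [("a", 1.5), ("b", 2.0), ("c", 3.0), ("d", 4.5), ("e", 5.0),
       ("f", 3.5), ("g", float("nan"))]
balanced = balance_by_safety(toy)
```

On the real data this is what takes 41,818 unsafe and 11,346,460 safer rows down to 41,818 of each, i.e. 83,636 in total.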
## Training Procedure

The model was fine-tuned on approximately 9 billion tokens of cancer-specific texts.

Texts were tokenized using the ModernBERT tokenizer with a maximum sequence length of 4,096 tokens. No additional filtering was applied, as the data was considered trustworthy.
### Training Hyperparameters

- **Learning Rate**: 2e-5
- **Number of Epochs**: 5
- **Batch Size**: 20 (per device)
- **Gradient Accumulation Steps**: 8
- **Optimizer**: AdamW
- **Weight Decay**: 0.01
- **FP16 Training**: Enabled
- **Total Training Steps**: ~5 epochs over the final balanced set

All other hyperparameter settings (e.g., batch size, optimizer choice) remained the same as in the previous training. Only the learning rate, the number of epochs, and the balanced dataset were changed.
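Assuming a single training device (the card does not state the device count), the hyperparameters above imply the following optimizer-step counts:

```python
import math

# Hyperparameters from the list above; single-device training is assumed.
per_device_batch = 20
grad_accum_steps = 8
epochs = 5
dataset_rows = 83_636  # final balanced set

effective_batch = per_device_batch * grad_accum_steps        # documents per optimizer step
steps_per_epoch = math.ceil(dataset_rows / effective_batch)
total_steps = steps_per_epoch * epochs
```

So each optimizer step consumes 160 documents, giving 523 steps per epoch and roughly 2,615 optimizer steps over the five epochs.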
## Evaluation

The model's performance was evaluated on an out-of-sample test set comprising cancer-specific texts.

### Results

- **MSE**: 0.489
- **RMSE**: 0.699
- **Accuracy**: 0.9642
- **ROC Analysis**: Demonstrated robust classification capability with high True Positive Rates and low False Positive Rates.
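For reference, metrics of this kind can be computed with a few lines of plain Python. The `threshold` used to binarize the continuous scores for accuracy is an assumption of this sketch (the card reports an accuracy figure but does not spell out the thresholding rule):

```python
import math

def regression_metrics(y_true, y_pred, threshold=2.0):
    """MSE and RMSE on the raw 1-5 safety scores, plus accuracy after
    binarizing both truth and prediction at `threshold` (unsafe <= 2)."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(mse)
    correct = sum((t <= threshold) == (p <= threshold)
                  for t, p in zip(y_true, y_pred))
    return mse, rmse, correct / n

# Toy check with four scores (not the model's actual test set):
mse, rmse, acc = regression_metrics([1.0, 2.0, 4.0, 5.0], [1.5, 2.5, 4.0, 4.5])
```

The same binarized labels would also feed a standard ROC analysis (True Positive Rate vs. False Positive Rate as the decision threshold sweeps the score range).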
## Bias, Risks, and Limitations
This model was trained on a curated subset of The Blue Scrubs dataset encompassing various medical domains, yet some areas may remain underrepresented. As with any model, there is a risk of bias stemming from data composition, and users should exercise caution when applying the classifier, especially in highly specialized contexts. Outputs should always be corroborated with expert opinion and current clinical guidelines to ensure safe, accurate medical usage.
## Recommendations