---
base_model: unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit
library_name: peft
license: mit
datasets:
- FreedomIntelligence/medical-o1-reasoning-SFT
language:
- en
tags:
- medical
---
# Model Card for DeepSeek-R1-Medical-COT
## Model Details
### Model Description
DeepSeek-R1-Medical-COT is a fine-tuned version of DeepSeek-R1-Distill-Llama-8B (a distilled variant of DeepSeek-R1), optimized for medical chain-of-thought (CoT) reasoning. It is designed to assist with medical tasks such as question answering, reasoning, and decision support, and is particularly useful for applications that require structured reasoning in the medical domain.
- **Developed by:** Mohamed Mahmoud
- **Funded by:** Independent project
- **Shared by:** Mohamed Mahmoud
- **Model type:** Transformer-based Large Language Model (LLM)
- **Language(s) (NLP):** English (en)
- **License:** MIT
- **Finetuned from model:** unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit
### Model Sources
- **Repository:** [Hugging Face Model Repo](https://huggingface.co/thesnak/DeepSeek-R1-Medical-COT)
- **LinkedIn:** [Mohamed Mahmoud](https://www.linkedin.com/in/mohamed-thesnak)
## Uses
### Direct Use
The model can be used directly for medical reasoning tasks, including:
- Answering medical questions
- Assisting in medical decision-making
- Supporting clinical research and literature review
### Downstream Use
- Fine-tuning for specialized medical applications
- Integration into chatbots and virtual assistants for medical advice
- Educational tools for medical students
### Out-of-Scope Use
- This model is not a replacement for professional medical advice.
- It should not be used for clinical decision-making without expert validation.
- It may not perform well in languages other than English.
## Bias, Risks, and Limitations
While fine-tuned for medical reasoning, the model may still have biases due to the limitations of its training data. Users should exercise caution and validate critical outputs with medical professionals.
### Recommendations
Users should verify outputs, particularly in sensitive medical contexts. The model is best used as an assistive tool rather than a primary decision-making system.
## How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "thesnak/DeepSeek-R1-Medical-COT"

# device_map="auto" places the weights on the available GPU(s)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

input_text = "What are the symptoms of pneumonia?"
# Follow the model's device rather than hard-coding "cuda"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
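Since this model was trained with PEFT (note `library_name: peft` in the metadata above), the repository may ship LoRA adapter weights rather than a fully merged model. If the load above fails, the following sketch, which assumes the repo contains adapter files, attaches the adapters to the base model via `peft`:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

# Reads the adapter config, fetches the base model
# (unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit), and attaches the adapters.
model = AutoPeftModelForCausalLM.from_pretrained(
    "thesnak/DeepSeek-R1-Medical-COT",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("thesnak/DeepSeek-R1-Medical-COT")
```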
## Training Details
### Training Data
The model was fine-tuned using the **FreedomIntelligence/medical-o1-reasoning-SFT** dataset, which contains medical question-answer pairs designed to improve reasoning capabilities.
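For reference, a minimal sketch for loading this dataset with the `datasets` library; the `"en"` config name is an assumption based on the dataset's English and Chinese splits:

```python
from datasets import load_dataset

# The dataset ships several language configs; "en" matches this model's language.
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")
print(dataset[0])  # one record: question, chain-of-thought, final answer
```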
### Training Procedure
#### Preprocessing
- Tokenization with the LLaMA tokenizer
- Text cleaning and normalization (a prompt-formatting sketch follows below)
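Continuing from the loading sketch above, the records can be assembled into CoT training strings. The field names `Question`, `Complex_CoT`, and `Response` follow the dataset's published schema; the template text itself is illustrative, not the exact prompt used in training:

```python
PROMPT_TEMPLATE = """Below is a medical question. Reason step by step, then give a final answer.

### Question:
{question}

### Response:
<think>
{cot}
</think>
{answer}"""

def format_example(example):
    # Map one dataset record to a single supervised training string.
    return {
        "text": PROMPT_TEMPLATE.format(
            question=example["Question"],
            cot=example["Complex_CoT"],
            answer=example["Response"],
        )
    }

train_data = dataset.map(format_example)
```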
#### Training Hyperparameters
- **Precision:** fp16 mixed precision (the P100 used for training does not support bf16)
- **Optimizer:** AdamW
- **Batch size:** 16
- **Learning rate:** 2e-5
- **Epochs:** 3
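Expressed as a `transformers.TrainingArguments` object, these settings look roughly like the sketch below; `output_dir` and `logging_steps` are illustrative choices, not values from the original run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deepseek-r1-medical-cot",  # illustrative path
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    fp16=True,                 # mixed precision (fp16 on the Pascal-era P100)
    optim="adamw_torch",       # AdamW optimizer
    logging_steps=10,          # matches the granularity of the loss table below
)
```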
#### Speeds, Sizes, Times
- **Training time:** Approximately 12 hours on a P100 GPU (Kaggle)
- **Model size:** 8B parameters (bnb 4-bit quantized)
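The base checkpoint is a bitsandbytes 4-bit quantization of the 8B model. For reference, a sketch of how such a quantization is configured when loading the full-precision distill; the NF4/double-quant settings are common defaults, not necessarily the exact ones used for the unsloth checkpoint:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # assumed; a common default
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
```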
#### Training Loss
| Step | Training Loss |
| ---- | ------------- |
| 10 | 1.919000 |
| 20 | 1.461800 |
| 30 | 1.402500 |
| 40 | 1.309000 |
| 50 | 1.344400 |
| 60 | 1.314100 |
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
- The model was evaluated on held-out samples from **FreedomIntelligence/medical-o1-reasoning-SFT**.
#### Factors
- Performance was assessed on medical reasoning tasks.
#### Metrics
- **Perplexity:** Measures how well the model fits held-out text (lower is better).
- **Accuracy:** Measured against expert-verified reference responses.
- **BLEU Score:** Measures n-gram overlap between generated and reference answers.
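As an illustration of the first metric, perplexity can be computed directly from the model's loss on held-out text; a minimal sketch:

```python
import torch

def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity = exp(mean token-level cross-entropy) over the text."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()
```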
### Results
Quantitative results (perplexity, accuracy, BLEU) have not yet been reported for this checkpoint.
## Model Examination
Further interpretability analyses can be conducted using tools like Captum and SHAP to analyze how the model derives its medical reasoning responses.
## Environmental Impact
- **Hardware Type:** P100 GPU (Kaggle)
- **Hours used:** ~12 hours (see Training Details)
- **Cloud Provider:** Kaggle
- **Compute Region:** N/A
- **Carbon Emitted:** Estimated at 9.5 kg CO2eq
- **[Kaggle Notebook](https://www.kaggle.com/code/thesnak/fine-tune-deepseek)**
## Technical Specifications
### Compute Infrastructure
#### Hardware
- P100 GPU (16GB VRAM) on Kaggle
## Citation
**BibTeX:**
```bibtex
@misc{mahmoud2025deepseekmedcot,
  title  = {DeepSeek-R1-Medical-COT},
  author = {Mohamed Mahmoud},
  year   = {2025},
  url    = {https://huggingface.co/thesnak/DeepSeek-R1-Medical-COT}
}
```
## Model Card Authors
- Mohamed Mahmoud
## Model Card Contact
- [LinkedIn](https://www.linkedin.com/in/mohamed-thesnak)