---
base_model: unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit
library_name: peft
license: mit
datasets:
- FreedomIntelligence/medical-o1-reasoning-SFT
language:
- en
tags:
- medical
---
# Model Card for DeepSeek-R1-Medical-COT
## Model Details
### Model Description
DeepSeek-R1-Medical-COT is a fine-tuned version of DeepSeek-R1-Distill-Llama-8B (a distilled variant of DeepSeek-R1), optimized for medical chain-of-thought (CoT) reasoning. It is designed to assist with medical tasks such as question answering, reasoning, and decision support, and is particularly useful for applications that require structured reasoning in the medical domain.
- **Developed by:** Mohamed Mahmoud
- **Funded by:** Independent project
- **Shared by:** Mohamed Mahmoud
- **Model type:** Transformer-based Large Language Model (LLM)
- **Language(s) (NLP):** English (en)
- **License:** MIT
- **Finetuned from model:** unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit
### Model Sources
- **Repository:** [Hugging Face Model Repo](https://huggingface.co/thesnak/DeepSeek-R1-Medical-COT)
- **LinkedIn:** [Mohamed Mahmoud](https://www.linkedin.com/in/mohamed-thesnak)
## Uses
### Direct Use
The model can be used directly for medical reasoning tasks, including:
- Answering medical questions
- Assisting in medical decision-making
- Supporting clinical research and literature review
### Downstream Use
- Fine-tuning for specialized medical applications
- Integration into chatbots and virtual assistants for medical advice
- Educational tools for medical students
### Out-of-Scope Use
- This model is not a replacement for professional medical advice.
- It should not be used for clinical decision-making without expert validation.
- It may not perform well in languages other than English.
## Bias, Risks, and Limitations
While fine-tuned for medical reasoning, the model may still have biases due to the limitations of its training data. Users should exercise caution and validate critical outputs with medical professionals.
### Recommendations
Users should verify outputs, particularly in sensitive medical contexts. The model is best used as an assistive tool rather than a primary decision-making system.
## How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "thesnak/DeepSeek-R1-Medical-COT"

# device_map="auto" places the weights on the available GPU(s)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

input_text = "What are the symptoms of pneumonia?"
# Follow the model's device rather than hard-coding "cuda"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
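Since this model was trained with PEFT (note `library_name: peft` in the metadata above), the repository may ship LoRA adapter weights rather than a fully merged model. If the load above fails, the following sketch, which assumes the repo contains adapter files, attaches the adapters to the base model via `peft`:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

# Reads the adapter config, fetches the base model
# (unsloth/deepseek-r1-distill-llama-8b-unsloth-bnb-4bit), and attaches the adapters.
model = AutoPeftModelForCausalLM.from_pretrained(
    "thesnak/DeepSeek-R1-Medical-COT",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("thesnak/DeepSeek-R1-Medical-COT")
```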
## Training Details
### Training Data
The model was fine-tuned using the **FreedomIntelligence/medical-o1-reasoning-SFT** dataset, which contains medical question-answer pairs designed to improve reasoning capabilities.
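For reference, a minimal sketch for loading this dataset with the `datasets` library; the `"en"` config name is an assumption based on the dataset's English and Chinese splits:

```python
from datasets import load_dataset

# The dataset ships several language configs; "en" matches this model's language.
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")
print(dataset[0])  # one record: question, chain-of-thought, final answer
```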
### Training Procedure
#### Preprocessing
- Tokenization with the LLaMA tokenizer
- Text cleaning and normalization (a prompt-formatting sketch follows below)
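Continuing from the loading sketch above, the records can be assembled into CoT training strings. The field names `Question`, `Complex_CoT`, and `Response` follow the dataset's published schema; the template text itself is illustrative, not the exact prompt used in training:

```python
PROMPT_TEMPLATE = """Below is a medical question. Reason step by step, then give a final answer.

### Question:
{question}

### Response:
<think>
{cot}
</think>
{answer}"""

def format_example(example):
    # Map one dataset record to a single supervised training string.
    return {
        "text": PROMPT_TEMPLATE.format(
            question=example["Question"],
            cot=example["Complex_CoT"],
            answer=example["Response"],
        )
    }

train_data = dataset.map(format_example)
```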
#### Training Hyperparameters
- **Precision:** fp16 mixed precision (the P100 used for training does not support bf16)
- **Optimizer:** AdamW
- **Batch size:** 16
- **Learning rate:** 2e-5
- **Epochs:** 3
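Expressed as a `transformers.TrainingArguments` object, these settings look roughly like the sketch below; `output_dir` and `logging_steps` are illustrative choices, not values from the original run:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deepseek-r1-medical-cot",  # illustrative path
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    fp16=True,                 # mixed precision (fp16 on the Pascal-era P100)
    optim="adamw_torch",       # AdamW optimizer
    logging_steps=10,          # matches the granularity of the loss table below
)
```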
#### Speeds, Sizes, Times
- **Training time:** Approximately 12 hours on a P100 GPU (Kaggle)
- **Model size:** 8B parameters (bnb 4-bit quantized)
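The base checkpoint is a bitsandbytes 4-bit quantization of the 8B model. For reference, a sketch of how such a quantization is configured when loading the full-precision distill; the NF4/double-quant settings are common defaults, not necessarily the exact ones used for the unsloth checkpoint:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # assumed; a common default
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
```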
#### Training Loss
| Step | Training Loss |
| ---- | ------------- |
| 10 | 1.919000 |
| 20 | 1.461800 |
| 30 | 1.402500 |
| 40 | 1.309000 |
| 50 | 1.344400 |
| 60 | 1.314100 |
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
- The model was evaluated on held-out samples from **FreedomIntelligence/medical-o1-reasoning-SFT**.
#### Factors
- Performance was assessed on medical reasoning tasks.
#### Metrics
- **Perplexity:** Measures how well the model fits held-out text (lower is better).
- **Accuracy:** Measured against expert-verified reference responses.
- **BLEU Score:** Measures n-gram overlap between generated and reference answers.
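As an illustration of the first metric, perplexity can be computed directly from the model's loss on held-out text; a minimal sketch:

```python
import torch

def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity = exp(mean token-level cross-entropy) over the text."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()
```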
### Results
Quantitative results (perplexity, accuracy, BLEU) have not yet been reported for this checkpoint.
## Model Examination
Further interpretability analyses can be conducted using tools like Captum and SHAP to analyze how the model derives its medical reasoning responses.
## Environmental Impact
- **Hardware Type:** P100 GPU (Kaggle)
- **Hours used:** ~12 hours (see Training Details)
- **Cloud Provider:** Kaggle
- **Compute Region:** N/A
- **Carbon Emitted:** Estimated at 9.5 kg CO2eq
- **[Kaggle Notebook](https://www.kaggle.com/code/thesnak/fine-tune-deepseek)**
## Technical Specifications
### Compute Infrastructure
#### Hardware
- P100 GPU (16GB VRAM) on Kaggle
## Citation
**BibTeX:**
```bibtex
@misc{mahmoud2025deepseekmedcot,
  title  = {DeepSeek-R1-Medical-COT},
  author = {Mohamed Mahmoud},
  year   = {2025},
  url    = {https://huggingface.co/thesnak/DeepSeek-R1-Medical-COT}
}
```
## Model Card Authors
- Mohamed Mahmoud
## Model Card Contact
- [LinkedIn](https://www.linkedin.com/in/mohamed-thesnak)