Update README.md
README.md
CHANGED
@@ -48,7 +48,7 @@ This approach to transfer learning not only showcases the ability to downscale t
 We initially attempted a full fine-tune using DeepSpeed on a 4-GPU A100 instance. However, the combination of dataset size and the scale of the model caused significant overfitting, leading to degraded narrative quality. This highlighted the need for a lighter, more targeted adaptation method.
 
 ### Transition to LoRA Fine-Tuning
 
-To address overfitting, we implemented LoRA fine-tuning (rank 8, DeepSpeed), targeting specific model components (`q_proj`, `k_proj`, `v_proj`, `o_proj`). This method allowed us to retain the base model's linguistic knowledge while specializing it for storytelling. The fine-tuning process lasted **12–18 hours on a 4-GPU A100
+To address overfitting, we implemented LoRA fine-tuning (rank 8, DeepSpeed), targeting specific model components (`q_proj`, `k_proj`, `v_proj`, `o_proj`). This method allowed us to retain the base model's linguistic knowledge while specializing it for storytelling. The fine-tuning process lasted **12–18 hours on a 4-GPU A100 80GB instance** via RunPod, effectively balancing performance and computational efficiency.
 
 ---
 
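For reference, a minimal sketch of the LoRA setup described in the updated paragraph, using Hugging Face `peft` and `transformers`. Only the rank (8) and the target modules (`q_proj`, `k_proj`, `v_proj`, `o_proj`) come from the README; the base model name, alpha, dropout, and the DeepSpeed config path are illustrative assumptions, not taken from this repository.

```python
# Sketch of a LoRA adapter matching the README description:
# rank 8, targeting the attention projections q_proj/k_proj/v_proj/o_proj.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumption: placeholder name; the actual base model is not shown in this diff.
base_model = AutoModelForCausalLM.from_pretrained("your-base-model")

lora_config = LoraConfig(
    r=8,                                   # LoRA rank from the README
    lora_alpha=16,                         # assumed scaling factor
    lora_dropout=0.05,                     # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()         # shows how few weights are trained vs. a full fine-tune

# Training would then run through the usual Trainer loop, with DeepSpeed enabled via
# TrainingArguments(..., deepspeed="ds_config.json")  # config path is an assumption
```

Restricting updates to the attention projections keeps the vast majority of the base weights frozen, which is what lets the adapter specialize for storytelling without the overfitting observed in the earlier full fine-tune.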