magicsquares137 committed on
Commit 124d7af · verified · 1 Parent(s): a0bfc67

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -48,7 +48,7 @@ This approach to transfer learning not only showcases the ability to downscale t
48
  We initially attempted a full fine-tune using DeepSpeed on a 4-GPU A100 instance. However, the combination of dataset size and the scale of the model caused significant overfitting, leading to degraded narrative quality. This highlighted the need for a lighter, more targeted adaptation method.
49
 
50
  ### Transition to LoRA Fine-Tuning
51
- To address overfitting, we implemented LoRA fine-tuning (rank 8, DeepSpeed), targeting specific model components (`q_proj`, `k_proj`, `v_proj`, `o_proj`). This method allowed us to retain the base model's linguistic knowledge while specializing it for storytelling. The fine-tuning process lasted **12–18 hours on a 4-GPU A100 8GB instance**, effectively balancing performance and computational efficiency.
51
+ To address overfitting, we implemented LoRA fine-tuning (rank 8, DeepSpeed), targeting specific model components (`q_proj`, `k_proj`, `v_proj`, `o_proj`). This method allowed us to retain the base model's linguistic knowledge while specializing it for storytelling. The fine-tuning process lasted **12–18 hours on a 4-GPU A100 80GB instance** via RunPod, effectively balancing performance and computational efficiency.
52
 
53
  ---
54
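
For context, the LoRA setup described in the updated line (rank 8, adapters on `q_proj`, `k_proj`, `v_proj`, `o_proj`) could be sketched roughly as follows with Hugging Face PEFT. The rank and target modules come from the README; the base model id, `lora_alpha`, and dropout values are illustrative assumptions, not taken from this repository.

```python
# Minimal sketch of a rank-8 LoRA configuration over the attention projections.
# Only r=8 and target_modules reflect the README; everything else is assumed.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("base-model-name")  # placeholder model id

lora_config = LoraConfig(
    r=8,                                                      # rank 8, as stated in the README
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_alpha=16,                                            # assumed value
    lora_dropout=0.05,                                        # assumed value
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights remain trainable
```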