---
tags:
- text-generation
- storytelling
- transformers
- DeepSeek
---

# Deepseek Uncensored Lore
![ ](./library.png)

## Model Overview

Deepseek Uncensored Lore is a fine-tuned 7B DeepSeek-based language model designed for immersive storytelling and character-driven narrative generation. The model leverages LoRA (Low-Rank Adaptation) fine-tuning techniques to specialize in generating rich, descriptive, and emotionally engaging stories from structured prompts.

- **Base Model**: [DeepSeek 7B](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat)
- **Fine-Tuned Dataset**: [Character Stories](https://huggingface.co/datasets/luvGPT/CharacterStories)
- **Training Framework**: Hugging Face Transformers with LoRA and PEFT
- **Optimized for**: Text generation, storytelling, narrative creation
- **Primary Use Case**: Enhancing creative writing workflows and interactive storytelling experiences.

---

## Transfer Learning and Model Goals

One of the primary goals of **Deepseek Uncensored Lore** was to demonstrate the power of **transfer learning**: leveraging the knowledge encoded in much larger models (400B+ parameters) to enhance the capabilities of a smaller, more efficient 7B model. This approach was driven by a focus on creating a lightweight, highly performant model that retains the storytelling proficiency of much larger LLMs while being computationally accessible.

### Curated Dataset from Large LLM Ensembles
To achieve this, we developed a custom dataset by leveraging an **ensemble of very large LLMs** (400B+ parameter models) for generating high-quality story arcs and narrative content. These models were selected for their advanced storytelling abilities and fine-grained control over tone, pacing, and emotional depth. 

### Role of the Judge Model
A critical component of our pipeline was a **judge model**, tasked with curating and filtering outputs from the ensemble of large LLMs. By selecting only the most coherent, engaging, and contextually relevant content, we created a dataset that distilled the storytelling expertise of these larger models.
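
For illustration, the curation step can be viewed as a simple generate-then-filter loop. The sketch below is purely schematic: `generate_with_ensemble` and `judge_score` are stand-ins for the actual ensemble and judge calls, and the sample count and 0.8 threshold are assumed placeholders rather than the values used in practice.

```python
from typing import Callable, Dict, List

def curate_dataset(
    prompts: List[str],
    generate_with_ensemble: Callable[[str, int], List[str]],  # stand-in for sampling the 400B+ ensemble
    judge_score: Callable[[str, str], float],                  # stand-in for the judge model's quality score
    n_samples: int = 4,            # assumed number of candidates per prompt
    threshold: float = 0.8,        # assumed acceptance cut-off
) -> List[Dict[str, str]]:
    """Schematic generate-then-filter loop: keep only the judge's top-scoring story arcs."""
    curated = []
    for prompt in prompts:
        # Sample several candidate story arcs from the large-model ensemble
        candidates = generate_with_ensemble(prompt, n_samples)
        # Let the judge model pick the most coherent, engaging, on-prompt candidate
        best = max(candidates, key=lambda story: judge_score(prompt, story))
        if judge_score(prompt, best) >= threshold:
            curated.append({"prompt": prompt, "story": best})
    return curated
```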

### Transferring Storytelling Capability
Through this process, we were able to impart the narrative richness of the ensemble into **Deepseek Uncensored Lore**, ensuring:
- **Enhanced Creativity**: The model can craft vivid, immersive story arcs.
- **Consistency**: Outputs remain coherent and aligned with the provided prompt.
- **Efficiency**: The fine-tuned 7B model operates on far less computational power, making it suitable for real-time applications.

This approach to transfer learning not only shows that the capabilities of massive LLMs can be distilled into smaller models, but also highlights the importance of dataset quality and curation in achieving that goal.

---

## Fine-Tuning Journey

### Initial Attempts with Full Fine-Tuning
We initially attempted a full fine-tune using DeepSpeed on a 4-GPU A100 instance. However, the combination of dataset size and the scale of the model caused significant overfitting, leading to degraded narrative quality. This highlighted the need for a lighter, more targeted adaptation method.

### Transition to LoRA Fine-Tuning
To address overfitting, we implemented LoRA fine-tuning (rank 8, DeepSpeed), targeting specific model components (`q_proj`, `k_proj`, `v_proj`, `o_proj`). This method allowed us to retain the base model's linguistic knowledge while specializing it for storytelling. The fine-tuning process lasted **12–18 hours on a 4-GPU A100 80GB instance** via RunPod, effectively balancing performance and computational efficiency.

---

## Training Details

### Training Progress

We used [Weights & Biases (W&B)](https://wandb.ai/) for tracking training metrics such as loss and evaluation performance. Below is the training loss curve, illustrating the model's progression over time:

![Training Loss](./chart.svg)

### Training Parameters
```python
training_args = TrainingArguments(
    output_dir="./lora_finetuned_model",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=6,
    num_train_epochs=5,
    learning_rate=5e-4,
    optim="paged_adamw_32bit",
    fp16=True,
    evaluation_strategy="steps",
    eval_steps=50,
    logging_steps=10,
    max_grad_norm=0.3,
    save_steps=100,
    save_total_limit=2,
    warmup_ratio=0.03,
    report_to="wandb",
    deepspeed="./deepspeed_config.json",
)
```

Our DeepSpeed config was as follows:
```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "stage3_gather_16bit_weights_on_model_save": true,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "none"
    },
    "stage3_param_persistence_threshold": 0
  },
  "gradient_clipping": "auto",
  "activation_checkpointing": {
    "partition_activations": true,
    "contiguous_memory_optimization": true,
    "cpu_checkpointing": false,
    "number_checkpoints": 100,
    "synchronize_checkpoint_boundary": false
  }
}
```


### LoRA Configuration
```python
lora_config = LoraConfig(
    r=8,                      # Rank of LoRA adapters
    lora_alpha=8,             # Scaling factor for LoRA updates
    lora_dropout=0.1,         # Dropout probability
    bias="none",              # No extra bias parameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```
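
For context, here is a minimal sketch of how a configuration like this is typically attached to the base model with PEFT before training; the exact training script (DeepSpeed ZeRO-3 launch, dataset collation, `Trainer` wiring) is not reproduced here.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import get_peft_model

# Load the base model (the actual run sharded this across 4 GPUs with DeepSpeed ZeRO-3)
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-chat",
    torch_dtype=torch.float16,
)

# Attach the LoRA adapters defined above; only the q/k/v/o adapter weights become trainable
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```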

### Hardware Setup
- **Instance**: 4 x A100 80GB GPUs
- **Training Duration**: 12–18 hours
- **Precision**: FP16 for model and LoRA training

---

### **Performance Over Checkpoints**

We evaluated the model at multiple checkpoints during training. Below is the evolution of its storytelling capabilities:

#### **Base Model Output (Pre-Fine-Tuning)**
**Prompt**:
```
Description: Jake Sully, portrayed by Sam Worthington, is a former Marine who becomes part of the Avatar Program in James Cameron's "Avatar."
Scenario: Jake Sully is planning a mission to protect Pandora from an impending human attack.
```
**Generated Story Arc**:
```
The story arc follows three main acts. In Act I (Set-up), we learn about Jake’s involvement in the Avatar program on Earth before being assigned to become one with an alien world inhabited by various species including blue catlike creatures called Navi...
```
_(Base model produced generic, predictable outputs.)_

---

#### **Checkpoint 100**
**Generated Story Arc**:
```
In the pulsating heart of Pandoran wilderness, beneath the starlit canopy, lay the lush sanctuary known as Hometree—a sprawling, verdant oasis hidden away like some ancient secret within its own private realm...
```

---

#### **Checkpoint 200**
**Generated Story Arc**:
```
In the pulsating heart of Pandora, beneath the shimmering canopy of blue-green leaves, whispers of an imminent assault echoed through the night air like distant drums. The scent of ozone mingled with the heady perfume of alien flora...
```

---

#### **Checkpoint 300**
**Generated Story Arc**:
```
In the verdant heart of Pandora, under the watchful gaze of its towering trees and shimmering skies, tensions hung heavy like mist on a morning river. The air was thick with anticipation; whispers carried through the jungle...
```

---

#### **Checkpoint 400**
**Generated Story Arc**:
```
The azure sky above Pandora pulsed vibrantly as it reflected off the bioluminescent foliage below, creating a symphony of colors both mesmerizing and ominous. A group of diverse individuals huddled together on one of the few remaining clearings amidst this vast jungle—some humans, some Na’vi, but all united under the common goal of protecting what remained sacred here. 

Amidst them stood Captain Jake Sully; once a proud member of Earth's military forces now transformed into the avian-like figure known only as...The Avatarian! His cybernetic eyes scanned over each person present before focusing back onto himself - remembering every moment since joining this cause against humanity's greedy expansionism across space & time itself...
```

---

### **Conclusion**
The progression from the **base model** to **Checkpoint 400** demonstrates a **remarkable shift**:
- From **factual summaries** → to **descriptive storytelling**.
- From **generic outputs** → to **rich world-building and immersion**.
- From **basic narrative structures** → to **vivid, emotional storytelling**.

This result highlights the success of LoRA fine-tuning in **adapting the storytelling capabilities of larger models** into a more efficient 7B model.

---

## Usage

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

# Load the merged model and tokenizer
model_name = "luvGPT/deepseek-uncensored-lore" 
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Define the test prompt
prompt = """Description: Jake Sully, portrayed by Sam Worthington, is a former Marine who becomes part of the Avatar Program in James Cameron's "Avatar." 
He is sent to the moon Pandora, where he inhabits an avatar body to interact with the native Na'vi people. 
Jake falls in love with the Na'vi culture and Neytiri, and ultimately leads a fight to protect Pandora from human exploitation.
Scenario: Jake Sully is planning a mission to protect Pandora from an impending human attack.
He needs to coordinate with the Na'vi and his human allies to devise a strategy that will safeguard their home.
Story Arc:"""

# Configure generation settings
generation_config = GenerationConfig(
    temperature=0.7,
    top_p=0.95,
    top_k=50,
    do_sample=True,
    no_repeat_ngram_size=4,
    repetition_penalty=1.2,
)

# Tokenize the input and move it to the model's device
inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to(model.device)

# Generate text with the model
outputs = model.generate(
    **inputs,
    generation_config=generation_config,
    max_new_tokens=150,
    eos_token_id=tokenizer.eos_token_id
)

# Decode and print the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Story Arc:\n")
print(generated_text)

```

---

### **System Requirements**


| Precision  | **Total VRAM Usage** | **VRAM Per GPU (with 2 GPUs)** | **VRAM Per GPU (with 4 GPUs)** |
|------------|----------------------|-------------------------------|-------------------------------|
| **FP32 (Full Precision)** | ~24GB | ~12GB | ~6GB |
| **FP16 (Half Precision)** | **~14GB** | **~7GB** | **~3.5GB** |
| **8-bit Quantization** | ~8GB | ~4GB | ~2GB |
| **4-bit Quantization** | ~4GB | ~2GB | ~1GB |

**Important Notes:**
- **Multi-GPU setups** distribute model memory usage across available GPUs.
- Using **`device_map="auto"`** in `transformers` automatically balances memory across devices.
- **Quantized versions (8-bit, 4-bit)** are planned for lower VRAM requirements.

---

### **Loading the Model in 4-bit and 8-bit Quantization**
To reduce memory usage, you can load the model using **4-bit or 8-bit quantization** via **bitsandbytes**.

#### **Install Required Dependencies**
```bash
pip install transformers accelerate bitsandbytes
```

#### **Load Model in 8-bit Quantization**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "luvGPT/deepseek-uncensored-lore"

# Define quantization config for 8-bit loading
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model in 8-bit mode
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config
)

```
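
#### **Load Model in 4-bit Quantization**
A similar sketch for 4-bit loading with NF4 quantization; the parameter names follow the standard `BitsAndBytesConfig` API, and actual memory usage will vary by setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "luvGPT/deepseek-uncensored-lore"

# Define quantization config for 4-bit (NF4) loading; compute runs in fp16 while weights stay 4-bit
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model in 4-bit mode
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config
)
```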

---

### **Future Work**
- **GGUF Format Support**: We plan to provide a **GGUF-quantized version** of this model, making it compatible with **llama.cpp** and other lightweight inference frameworks.
- **Fine-tuning & Alignment**: Exploring reinforcement learning and user feedback loops to improve storytelling accuracy and coherence.
- **Optimized Inference**: Integrating FlashAttention and Triton optimizations for even faster performance.



## Limitations
- **Bias**: Outputs may reflect biases present in the original DeepSeek model or training dataset.
- **Context Length**: Limited to 1,000 tokens per sequence.
- **Specialization**: The model is optimized for storytelling and may underperform in other tasks.

---

## Acknowledgments
Special thanks to the Hugging Face community, and the creators of the [Character Stories](https://huggingface.co/datasets/luvGPT/CharacterStories) dataset (us <3).

For questions or collaborations, feel free to contact us via the Hugging Face platform or through [our website](https://www.luv-gpt.com).

---