---
tags:
- text-generation
- storytelling
- transformers
- DeepSeek
---

# Deepseek Uncensored Lore
![ ](./library.png)

## Model Overview

Deepseek Uncensored Lore is a fine-tuned 7B DeepSeek-based language model designed for immersive storytelling and character-driven narrative generation. The model leverages LoRA (Low-Rank Adaptation) fine-tuning techniques to specialize in generating rich, descriptive, and emotionally engaging stories from structured prompts.

- **Base Model**: [DeepSeek 7B](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat)
- **Fine-Tuned Dataset**: [Character Stories](https://huggingface.co/datasets/luvGPT/CharacterStories)
- **Training Framework**: Hugging Face Transformers with LoRA and PEFT
- **Optimized for**: Text generation, storytelling, narrative creation
- **Primary Use Case**: Enhancing creative writing workflows and interactive storytelling experiences.

---

## Transfer Learning and Model Goals

One of the primary goals of **Deepseek Uncensored Lore** was to demonstrate the power of **transfer learning**: leveraging the knowledge encoded in much larger models (400B+ parameters) to enhance the capabilities of a smaller, more efficient 7B model. This approach was driven by a focus on creating a lightweight, highly performant model that retains the storytelling proficiency of much larger LLMs while being computationally accessible.

### Curated Dataset from Large LLM Ensembles
To achieve this, we developed a custom dataset by leveraging an **ensemble of very large LLMs** (400B+ parameter models) for generating high-quality story arcs and narrative content. These models were selected for their advanced storytelling abilities and fine-grained control over tone, pacing, and emotional depth. 

### Role of the Judge Model
A critical component of our pipeline was a **judge model**, tasked with curating and filtering outputs from the ensemble of large LLMs. By selecting only the most coherent, engaging, and contextually relevant content, we created a dataset that distilled the storytelling expertise of these larger models.
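
For illustration, the curation step can be viewed as a simple generate-then-filter loop. The sketch below is purely schematic: `generate_with_ensemble` and `judge_score` are stand-ins for the actual ensemble and judge calls, and the sample count and 0.8 threshold are assumed placeholders rather than the values used in practice.

```python
from typing import Callable, Dict, List

def curate_dataset(
    prompts: List[str],
    generate_with_ensemble: Callable[[str, int], List[str]],  # stand-in for sampling the 400B+ ensemble
    judge_score: Callable[[str, str], float],                  # stand-in for the judge model's quality score
    n_samples: int = 4,            # assumed number of candidates per prompt
    threshold: float = 0.8,        # assumed acceptance cut-off
) -> List[Dict[str, str]]:
    """Schematic generate-then-filter loop: keep only the judge's top-scoring story arcs."""
    curated = []
    for prompt in prompts:
        # Sample several candidate story arcs from the large-model ensemble
        candidates = generate_with_ensemble(prompt, n_samples)
        # Let the judge model pick the most coherent, engaging, on-prompt candidate
        best = max(candidates, key=lambda story: judge_score(prompt, story))
        if judge_score(prompt, best) >= threshold:
            curated.append({"prompt": prompt, "story": best})
    return curated
```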

### Transferring Storytelling Capability
Through this process, we were able to impart the narrative richness of the ensemble into **Deepseek Uncensored Lore**, ensuring:
- **Enhanced Creativity**: The model can craft vivid, immersive story arcs.
- **Consistency**: Outputs remain coherent and aligned with the provided prompt.
- **Efficiency**: The fine-tuned 7B model operates on far less computational power, making it suitable for real-time applications.

This approach to transfer learning not only shows that the capabilities of massive LLMs can be distilled into smaller models, but also highlights the importance of dataset quality and curation in achieving that goal.

---

## Fine-Tuning Journey

### Initial Attempts with Full Fine-Tuning
We initially attempted a full fine-tune using DeepSpeed on a 4-GPU A100 instance. However, the combination of dataset size and the scale of the model caused significant overfitting, leading to degraded narrative quality. This highlighted the need for a lighter, more targeted adaptation method.

### Transition to LoRA Fine-Tuning
To address overfitting, we implemented LoRA fine-tuning (rank 8, DeepSpeed), targeting specific model components (`q_proj`, `k_proj`, `v_proj`, `o_proj`). This method allowed us to retain the base model's linguistic knowledge while specializing it for storytelling. The fine-tuning process lasted **12–18 hours on a 4-GPU A100 80GB instance** via RunPod, effectively balancing performance and computational efficiency.

---

## Training Details

### Training Progress

We used [Weights & Biases (W&B)](https://wandb.ai/) for tracking training metrics such as loss and evaluation performance. Below is the training loss curve, illustrating the model's progression over time:

![Training Loss](./chart.svg)

### Training Parameters
```python
training_args = TrainingArguments(
    output_dir="./lora_finetuned_model",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=6,
    num_train_epochs=5,
    learning_rate=5e-4,
    optim="paged_adamw_32bit",
    fp16=True,
    evaluation_strategy="steps",
    eval_steps=50,
    logging_steps=10,
    max_grad_norm=0.3,
    save_steps=100,
    save_total_limit=2,
    warmup_ratio=0.03,
    report_to="wandb",
    deepspeed="./deepspeed_config.json",
)
```

Our DeepSpeed config was as follows:
```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "stage3_gather_16bit_weights_on_model_save": true,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "none"
    },
    "stage3_param_persistence_threshold": 0
  },
  "gradient_clipping": "auto",
  "activation_checkpointing": {
    "partition_activations": true,
    "contiguous_memory_optimization": true,
    "cpu_checkpointing": false,
    "number_checkpoints": 100,
    "synchronize_checkpoint_boundary": false
  }
}
```


### LoRA Configuration
```python
lora_config = LoraConfig(
    r=8,                      # Rank of LoRA adapters
    lora_alpha=8,             # Scaling factor for LoRA updates
    lora_dropout=0.1,         # Dropout probability
    bias="none",              # No extra bias parameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```
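
For context, here is a minimal sketch of how a configuration like this is typically attached to the base model with PEFT before training; the exact training script (DeepSpeed ZeRO-3 launch, dataset collation, `Trainer` wiring) is not reproduced here.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import get_peft_model

# Load the base model (the actual run sharded this across 4 GPUs with DeepSpeed ZeRO-3)
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-chat",
    torch_dtype=torch.float16,
)

# Attach the LoRA adapters defined above; only the q/k/v/o adapter weights become trainable
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```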

### Hardware Setup
- **Instance**: 4 x A100 80GB GPUs
- **Training Duration**: 12–18 hours
- **Precision**: FP16 for model and LoRA training

---

### **Performance Over Checkpoints**

We evaluated the model at multiple checkpoints during training. Below is the evolution of its storytelling capabilities:

#### **Base Model Output (Pre-Fine-Tuning)**
**Prompt**:
```
Description: Jake Sully, portrayed by Sam Worthington, is a former Marine who becomes part of the Avatar Program in James Cameron's "Avatar."
Scenario: Jake Sully is planning a mission to protect Pandora from an impending human attack.
```
**Generated Story Arc**:
```
The story arc follows three main acts. In Act I (Set-up), we learn about Jake’s involvement in the Avatar program on Earth before being assigned to become one with an alien world inhabited by various species including blue catlike creatures called Navi...
```
_(Base model produced generic, predictable outputs.)_

---

#### **Checkpoint 100**
**Generated Story Arc**:
```
In the pulsating heart of Pandoran wilderness, beneath the starlit canopy, lay the lush sanctuary known as Hometree—a sprawling, verdant oasis hidden away like some ancient secret within its own private realm...
```

---

#### **Checkpoint 200**
**Generated Story Arc**:
```
In the pulsating heart of Pandora, beneath the shimmering canopy of blue-green leaves, whispers of an imminent assault echoed through the night air like distant drums. The scent of ozone mingled with the heady perfume of alien flora...
```

---

#### **Checkpoint 300**
**Generated Story Arc**:
```
In the verdant heart of Pandora, under the watchful gaze of its towering trees and shimmering skies, tensions hung heavy like mist on a morning river. The air was thick with anticipation; whispers carried through the jungle...
```

---

#### **Checkpoint 400**
**Generated Story Arc**:
```
The azure sky above Pandora pulsed vibrantly as it reflected off the bioluminescent foliage below, creating a symphony of colors both mesmerizing and ominous. A group of diverse individuals huddled together on one of the few remaining clearings amidst this vast jungle—some humans, some Na’vi, but all united under the common goal of protecting what remained sacred here. 

Amidst them stood Captain Jake Sully; once a proud member of Earth's military forces now transformed into the avian-like figure known only as...The Avatarian! His cybernetic eyes scanned over each person present before focusing back onto himself - remembering every moment since joining this cause against humanity's greedy expansionism across space & time itself...
```

---

### **Conclusion**
The progression from the **base model** to **Checkpoint 400** demonstrates a **remarkable shift**:
- From **factual summaries** → to **descriptive storytelling**.
- From **generic outputs** → to **rich world-building and immersion**.
- From **basic narrative structures** → to **vivid, emotional storytelling**.

This result highlights the success of LoRA fine-tuning in **adapting the storytelling capabilities of larger models** into a more efficient 7B model.

---

## Usage

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

# Load the merged model and tokenizer
model_name = "luvGPT/deepseek-uncensored-lore" 
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Define the test prompt
prompt = """Description: Jake Sully, portrayed by Sam Worthington, is a former Marine who becomes part of the Avatar Program in James Cameron's "Avatar." 
He is sent to the moon Pandora, where he inhabits an avatar body to interact with the native Na'vi people. 
Jake falls in love with the Na'vi culture and Neytiri, and ultimately leads a fight to protect Pandora from human exploitation.
Scenario: Jake Sully is planning a mission to protect Pandora from an impending human attack.
He needs to coordinate with the Na'vi and his human allies to devise a strategy that will safeguard their home.
Story Arc:"""

# Configure generation settings
generation_config = GenerationConfig(
    temperature=0.7,
    top_p=0.95,
    top_k=50,
    do_sample=True,
    no_repeat_ngram_size=4,
    repetition_penalty=1.2,
)

# Tokenize the input and move it to the model's device
inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to(model.device)

# Generate text with the model
outputs = model.generate(
    **inputs,
    generation_config=generation_config,
    max_new_tokens=150,
    eos_token_id=tokenizer.eos_token_id
)

# Decode and print the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Generated Story Arc:\n")
print(generated_text)

```

---

### **System Requirements**


| Precision  | **Total VRAM Usage** | **VRAM Per GPU (with 2 GPUs)** | **VRAM Per GPU (with 4 GPUs)** |
|------------|----------------------|-------------------------------|-------------------------------|
| **FP32 (Full Precision)** | ~24GB | ~12GB | ~6GB |
| **FP16 (Half Precision)** | **~14GB** | **~7GB** | **~3.5GB** |
| **8-bit Quantization** | ~8GB | ~4GB | ~2GB |
| **4-bit Quantization** | ~4GB | ~2GB | ~1GB |

**Important Notes:**
- **Multi-GPU setups** distribute model memory usage across available GPUs.
- Using **`device_map="auto"`** in `transformers` automatically balances memory across devices.
- **Quantized versions (8-bit, 4-bit)** are planned for lower VRAM requirements.

---

### **Loading the Model in 4-bit and 8-bit Quantization**
To reduce memory usage, you can load the model using **4-bit or 8-bit quantization** via **bitsandbytes**.

#### **Install Required Dependencies**
```bash
pip install transformers accelerate bitsandbytes
```

#### **Load Model in 8-bit Quantization**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "luvGPT/deepseek-uncensored-lore"

# Define quantization config for 8-bit loading
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model in 8-bit mode
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config
)

```
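
#### **Load Model in 4-bit Quantization**
A similar sketch for 4-bit loading with NF4 quantization; the parameter names follow the standard `BitsAndBytesConfig` API, and actual memory usage will vary by setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "luvGPT/deepseek-uncensored-lore"

# Define quantization config for 4-bit (NF4) loading; compute runs in fp16 while weights stay 4-bit
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model in 4-bit mode
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config
)
```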

---

### **Future Work**
- **GGUF Format Support**: We plan to provide a **GGUF-quantized version** of this model, making it compatible with **llama.cpp** and other lightweight inference frameworks.
- **Fine-tuning & Alignment**: Exploring reinforcement learning and user feedback loops to improve storytelling accuracy and coherence.
- **Optimized Inference**: Integrating FlashAttention and Triton optimizations for even faster performance.



## Limitations
- **Bias**: Outputs may reflect biases present in the original DeepSeek model or training dataset.
- **Context Length**: Limited to 1,000 tokens per sequence.
- **Specialization**: The model is optimized for storytelling and may underperform in other tasks.

---

## Acknowledgments
Special thanks to the Hugging Face community, and the creators of the [Character Stories](https://huggingface.co/datasets/luvGPT/CharacterStories) dataset (us <3).

For questions or collaborations, feel free to contact us via the Hugging Face platform or through [our website](https://www.luv-gpt.com).

---