Update README.md

---
tags:
- text-generation
- storytelling
- transformers
- DeepSeek
---

# Deepseek Uncensored Lore

## Model Overview

Deepseek Uncensored Lore is a fine-tuned 7B LLaMA-based language model designed for immersive storytelling and character-driven narrative generation. The model uses LoRA (Low-Rank Adaptation) fine-tuning to specialize in generating rich, descriptive, and emotionally engaging stories from structured prompts.

- **Base Model**: [DeepSeek LLM 7B Chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat)
- **Fine-Tuned Dataset**: [Character Stories](https://huggingface.co/datasets/luvGPT/CharacterStories)
- **Training Framework**: Hugging Face Transformers with LoRA and PEFT
- **Optimized for**: Text generation, storytelling, and narrative creation
- **Primary Use Case**: Enhancing creative writing workflows and interactive storytelling experiences
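
To see the structured-prompt format the model was tuned on, you can inspect the fine-tuning data directly. This is a minimal sketch, assuming only that the dataset is publicly loadable with 🤗 `datasets`; the exact split and column names are whatever the dataset card defines.

```python
from datasets import load_dataset

# Inspect the fine-tuning data to see the Description / Scenario -> Story Arc structure.
ds = load_dataset("luvGPT/CharacterStories")
print(ds)                          # available splits and columns

first_split = next(iter(ds.values()))
print(first_split[0])              # one raw example
```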

---

## Fine-Tuning Journey

### Initial Attempts with Full Fine-Tuning
We initially attempted a full fine-tune using DeepSpeed on a 4-GPU A100 instance. However, the combination of the dataset size and the scale of the model caused significant overfitting and degraded narrative quality. This highlighted the need for a lighter, more targeted adaptation method.

### Transition to LoRA Fine-Tuning
To address the overfitting, we switched to LoRA fine-tuning (rank 8, trained with DeepSpeed), targeting the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`). This approach retains the base model's linguistic knowledge while specializing it for storytelling. The fine-tuning process lasted **12–18 hours on a 4-GPU A100 (80GB) instance**, balancing output quality and computational cost.

---

## Training Details

### Training Parameters
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./lora_finetuned_model",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=6,
    num_train_epochs=5,
    learning_rate=5e-4,
    optim="paged_adamw_32bit",
    fp16=True,
    evaluation_strategy="steps",
    eval_steps=50,
    logging_steps=10,
    max_grad_norm=0.3,
    save_steps=100,
    save_total_limit=2,
    warmup_ratio=0.03,
    report_to="wandb",
    deepspeed="./deepspeed_config.json",
)
```
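
For reference, the effective global batch size implied by these settings (assuming the 4-GPU setup described under Hardware Setup below) works out to 24 sequences per optimizer step:

```python
# Effective global batch size implied by the settings above,
# assuming the 4 GPUs described under "Hardware Setup".
per_device_batch = 1
grad_accum_steps = 6
num_gpus = 4
print(per_device_batch * grad_accum_steps * num_gpus)  # 24 sequences per optimizer step
```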

Our DeepSpeed configuration was as follows:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "stage3_gather_16bit_weights_on_model_save": true,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "offload_param": {
      "device": "none"
    },
    "stage3_param_persistence_threshold": 0
  },
  "gradient_clipping": "auto",
  "activation_checkpointing": {
    "partition_activations": true,
    "contiguous_memory_optimization": true,
    "cpu_checkpointing": false,
    "number_checkpoints": 100,
    "synchronize_checkpoint_boundary": false
  }
}
```

### LoRA Configuration
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,              # rank of the LoRA adapters
    lora_alpha=8,     # scaling factor for the LoRA updates
    lora_dropout=0.1, # dropout probability on the adapter layers
    bias="none",      # no extra bias parameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```
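
The adapters are attached to the base model before training. Below is a minimal sketch of how the pieces above fit together with PEFT and the 🤗 `Trainer`; it reuses `lora_config` and `training_args` from the snippets above, and `train_dataset` / `eval_dataset` are placeholders for the tokenized Character Stories splits rather than part of this card.

```python
import torch
from transformers import AutoModelForCausalLM, Trainer
from peft import get_peft_model

# Load the base model and wrap it with the LoRA adapters defined above.
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-chat", torch_dtype=torch.float16
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# train_dataset / eval_dataset stand in for the tokenized Character Stories splits.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```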

### Hardware Setup
- **Instance**: 4 × A100 80GB GPUs
- **Training Duration**: 12–18 hours
- **Precision**: FP16 for the base model and LoRA training

---

## Performance

We evaluated the model at multiple checkpoints during training. Below is the evolution of its storytelling capabilities.

### Base Model Output
**Prompt**:
```
Description: Jake Sully, portrayed by Sam Worthington, is a former Marine who becomes part of the Avatar Program in James Cameron's "Avatar."
Scenario: Jake Sully is planning a mission to protect Pandora from an impending human attack.
```
**Generated Text**:
```
Story Arc: The story arc follows three main acts. In Act I (Set-up), we learn about Jake’s involvement in the Avatar program on Earth before being assigned to become one with an alien world inhabited by various species including blue catlike creatures called Navi...
```
(The base model produced generic, predictable outputs.)

### Checkpoint 100
**Generated Text**:
```
Story Arc: In the pulsating heart of Pandoran wilderness, beneath the starlit canopy, lay the lush sanctuary known as Hometree—a sprawling, verdant oasis hidden away like some ancient secret within its own private realm...
```

### Checkpoint 200
**Generated Text**:
```
Story Arc: In the pulsating heart of Pandora, beneath the shimmering canopy of blue-green leaves, whispers of an imminent assault echoed through the night air like distant drums. The scent of ozone mingled with the heady perfume of alien flora...
```

### Checkpoint 300 (Final Model)
**Generated Text**:
```
Story Arc: In the verdant heart of Pandora, under the watchful gaze of its towering trees and shimmering skies, tensions hung heavy like mist on a morning river. The air was thick with anticipation; whispers carried through the jungle...
```

The progression shows a shift from factual summarization to vivid, immersive storytelling, confirming the effectiveness of the LoRA fine-tuning.

---

## Usage

### Quick Start
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "deepseek-ai/deepseek-uncensored-lore"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

prompt = "Description: A daring explorer ventures into an ancient forest.\nScenario: She discovers a hidden temple and must unlock its secrets.\n\nStory Arc:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=500, temperature=0.7, top_p=0.95, do_sample=True)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Limitations
- **Bias**: Outputs may reflect biases present in the base model or in the training dataset.
- **Context Length**: Limited to 1,000 tokens per sequence (see the truncation sketch below).
- **Specialization**: The model is optimized for storytelling and may underperform on other tasks.
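
Because of the limited context window, long prompts should be truncated and generation capped so that prompt plus continuation stays within roughly 1,000 tokens. A minimal sketch, reusing `tokenizer`, `model`, and `prompt` from the Quick Start above (the 400-token generation budget is an arbitrary choice for illustration):

```python
# Keep prompt + continuation within the ~1,000-token window.
MAX_CONTEXT = 1000
MAX_NEW_TOKENS = 400

inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=MAX_CONTEXT - MAX_NEW_TOKENS,  # leave room for the generated story
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=MAX_NEW_TOKENS, temperature=0.7, top_p=0.95, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```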

---

## Acknowledgments
Special thanks to the Hugging Face community, LLaMA's development team, and the creators of the [Character Stories](https://huggingface.co/datasets/luvGPT/CharacterStories) dataset.

For questions or collaborations, feel free to contact us via the Hugging Face platform or through [our website](https://www.luv-gpt.com).

---