---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
library_name: transformers
license: mit
pipeline_tag: text-generation
---
# TokenButler
The collection of TokenButler models can be found [here](https://huggingface.co/collections/akhauriyash/tokenbutler-67cf181b5762d0d60e5f312b). To run the `DeepSeek-R1-Distill-Llama-8B` model, use the following:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
question = "If millionaires have butlers, why don't million dollar language models have a butler too? I think its because "
model_name = "akhauriyash/DeepSeek-R1-Distill-Llama-8B-Butler"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
response = generator(question, max_new_tokens=200, do_sample=True, top_p=0.95, temperature=0.7)
print(response[0]['generated_text'][len(question):])
```
Note that the default configured sparsity is 50%. In addition, there is a sliding window of 128 and 8 anchor tokens. To change the sparsity, call the following function after loading the model. At the moment `fixed` is the only supported strategy; it fixes the sparsity of each layer (except the first) at the given `pc` (percentage). This function can also be found in `test_hf.py`. The sliding window and anchor tokens can be changed in a similar manner.
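```python
def set_sparsity(model, sparsity):
    # Apply the requested strategy string to every TokenButler attention module.
    for module in model.modules():
        if "AttentionExperimental" in module.__class__.__name__:
            module.token_sparse_method = sparsity
            module.set_token_sparsity()
    return model

# "fixed_60pc" selects the fixed strategy at 60% sparsity.
model = set_sparsity(model, "fixed_60pc")
```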
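To make the three settings above concrete, here is a minimal sketch of how a fixed sparsity, a sliding window and anchor tokens could combine when choosing which past tokens a query attends to. It is an illustration only and not the TokenButler implementation: the function name `kept_token_indices`, the toy `scores`, and the assumptions that anchor tokens are the first tokens of the sequence and that the sparsity applies only to tokens outside the anchors and the window are all hypothetical.

```python
def kept_token_indices(scores, sparsity, sliding_window=128, anchor_tokens=8):
    """Return the sorted indices of past tokens kept for one query position.

    scores: hypothetical per-token importance values (higher = more important).
    sparsity: fraction of the eligible middle tokens to drop (0.5 for 50%).
    """
    n = len(scores)
    # Assumption: the first `anchor_tokens` tokens and the most recent
    # `sliding_window` tokens are always kept.
    always = set(range(min(anchor_tokens, n)))
    always |= set(range(max(0, n - sliding_window), n))
    middle = [i for i in range(n) if i not in always]
    # Keep the highest-scoring (1 - sparsity) share of the remaining tokens.
    n_keep = round(len(middle) * (1.0 - sparsity))
    top = sorted(middle, key=lambda i: scores[i], reverse=True)[:n_keep]
    return sorted(always | set(top))


# Toy example with a short sequence and small window so the effect is visible.
scores = [float(i % 7) for i in range(20)]
kept = kept_token_indices(scores, sparsity=0.5, sliding_window=4, anchor_tokens=2)
print(kept)
```

In this toy setup, 2 anchor tokens and 4 window tokens are always retained, and half of the 14 tokens in between are dropped according to their scores.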
# Predictor Architecture
# Custom Synthetic Task
# File information
The repository contains the following file information:
Filename: tokenizer.json
Content: "Content of the file is larger than 50 KB, too long to display."
Filename: pytorch_model.bin.index.json
Content: "Content of the file is larger than 50 KB, too long to display."
Filename: generation_config.json
Content: {
"_from_model_config": true,
"bos_token_id": 128000,
"eos_token_id": 128001,
"transformers_version": "4.48.3"
}
Filename: tokenizer_config.json
Content: {
"add_bos_token": true,
"add_eos_token": false,
"bos_token": {
"__type": "AddedToken",
"content": "<\uff5cbegin\u2581of\u2581sentence\uff5c>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"clean_up_tokenization_spaces": false,
"eos_token": {
"__type": "AddedToken",
"content": "<\uff5cend\u2581of\u2581sentence\uff5c>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"legacy": true,
"model_max_length": 16384,
"pad_token": {
"__type": "AddedToken",
"content": "<\uff5cend\u2581of\u2581sentence\uff5c>",
"lstrip": false,
"normalized": true,
"rstrip": false,
"single_word": false
},
"sp_model_kwargs": {},
"unk_token": null,
"tokenizer_class": "LlamaTokenizerFast",
"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<\uff5cUser\uff5c>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<\uff5cAssistant\uff5c><\uff5ctool\u2581calls\u2581begin\uff5c><\uff5ctool\u2581call\u2581begin\uff5c>' + tool['type'] + '<\uff5ctool\u2581sep\uff5c>' + tool['function']['name'] + '\
' + '```json' + '\
' + tool['function']['arguments'] + '\
' + '```' + '<\uff5ctool\u2581call\u2581end\uff5c>'}}{%- set ns.is_first = true -%}{%- else %}{{'\
' + '<\uff5ctool\u2581call\u2581begin\uff5c>' + tool['type'] + '<\uff5ctool\u2581sep\uff5c>' + tool['function']['name'] + '\
' + '```json' + '\
' + tool['function']['arguments'] + '\
' + '```' + '<\uff5ctool\u2581call\u2581end\uff5c>'}}{{'<\uff5ctool\u2581calls\u2581end\uff5c><\uff5cend\u2581of\u2581sentence\uff5c>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<\uff5ctool\u2581outputs\u2581end\uff5c>' + message['content'] + '<\uff5cend\u2581of\u2581sentence\uff5c>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '' in content %}{% set content = content.split('')[-1] %}{{'<\uff5cAssistant\uff5c>' + content + '<\uff5cend\u2581of\u2581sentence\uff5c>'}}{%- endif %}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<\uff5ctool\u2581outputs\u2581begin\uff5c><\uff5ctool\u2581output\u2581begin\uff5c>' + message['content'] + '<\uff5ctool\u2581output\u2581end\uff5c>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\
<\uff5ctool\u2581output\u2581begin\uff5c>' + message['content'] + '<\uff5ctool\u2581output\u2581end\uff5c>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<\uff5ctool\u2581outputs\u2581end\uff5c>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<\uff5cAssistant\uff5c>\
'}}{% endif %}"
}
Filename: config.json
Content: {
"architectures": [
"modeling_llama_butler.LlamaButlerForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"attn_reduce_factor": 8,
"auto_map": {
"AutoConfig": "modeling_llama_butler.LlamaButlerConfig",
"AutoModel": "modeling_llama_butler.LlamaButlerForCausalLM",
"AutoModelForCausalLM": "modeling_llama_butler.LlamaButlerForCausalLM"
},
"bos_token_id": 128000,
"dDash": 32,
"eos_token_id": 128001,
"eval_llm_mode": "ExpPred",
"flash_attn": false,
"head_attn_reduce_factor": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intdim": 1024,
"intermediate_size": 14336,
"lookahead": 0,
"max_position_embeddings": 131072,
"min_sparse_index": 8,
"mlp_bias": false,
"model_type": "llama_butler",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"producer_frequency": 32,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"sliding_window": 128,
"token_sparse_method": "fixed_50pc",
"torch_dtype": "float32",
"train_headpredictor": false,
"transformers_version": "4.48.3",
"use_cache": true,
"vocab_size": 128256
}
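The sparsity-related defaults can be read back from the `config.json` shown above. The following is a small sketch, assuming the `token_sparse_method` string always has the form `<strategy>_<percent>pc` and that `min_sparse_index` corresponds to the number of anchor tokens; neither assumption is confirmed here. The helper `parse_sparse_method` is hypothetical and not part of the repository.

```python
import json
import re

# A trimmed copy of the relevant fields from config.json.
config_text = """
{
  "min_sparse_index": 8,
  "sliding_window": 128,
  "token_sparse_method": "fixed_50pc"
}
"""


def parse_sparse_method(method):
    """Split a strategy string such as 'fixed_50pc' into (strategy, fraction)."""
    match = re.fullmatch(r"([a-z]+)_(\d+)pc", method)
    if match is None:
        raise ValueError(f"unrecognized sparsity string: {method!r}")
    return match.group(1), int(match.group(2)) / 100.0


config = json.loads(config_text)
strategy, fraction = parse_sparse_method(config["token_sparse_method"])
print(strategy, fraction, config["sliding_window"], config["min_sparse_index"])
```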