Model Information
This is a vocabulary-pruned variant of Llama-3.2-1B-Instruct. The vocabulary is reduced from 128256 to 32256 tokens, which brings the total parameter count to 1,039,214,756, about 200M fewer than the original model.
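The figure of roughly 200M pruned parameters can be checked with a quick calculation. This is only a sketch: it assumes a hidden size of 2048 and a single embedding matrix shared between input and output, as in the published Llama-3.2-1B configuration. Neither value is stated on this page.

```python
# Assumed, not stated on this page: hidden size 2048, tied input/output embeddings.
HIDDEN_SIZE = 2048
ORIGINAL_VOCAB = 128256
PRUNED_VOCAB = 32256

removed_tokens = ORIGINAL_VOCAB - PRUNED_VOCAB  # embedding rows dropped by pruning
removed_params = removed_tokens * HIDDEN_SIZE   # one hidden-size vector per dropped row

print(f"{removed_tokens=}")
print(f"{removed_params=:,}")
```

Under these assumptions the pruned embedding rows account for about 197M parameters, in line with the ~200M quoted above.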
How to use
The model can be used through the transformers text-generation pipeline:
import torch
from transformers import pipeline
pipe = pipeline(
"text-generation",
model='k-l-lambda/Llama-3.2-1B-vocab32k',
torch_dtype=torch.bfloat16,
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
{"role": "user", "content": "Who are you?"},
]
outputs = pipe(
messages,
max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
Alternatively, load the tokenizer and model directly:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")
model = AutoModelForCausalLM.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")
input_ids = tokenizer.encode("Hello, ", return_tensors="pt")
output = model.generate(input_ids)
print(tokenizer.decode(output[0]))
Token conversion
You can map a token ID in the 32k vocabulary to its ID in the original 128k vocabulary, and back, using the tensors stored in token_indices.pt and inv_token_indices.pt.
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer
tokenizer128k = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')
tokenizer32k = AutoTokenizer.from_pretrained('k-l-lambda/Llama-3.2-1B-vocab32k')
indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='token_indices.pt')
inv_indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='inv_token_indices.pt')
token_indices = torch.load(indices_path)
inv_token_indices = torch.load(inv_indices_path)
# 32k vocab -> 128k vocab
ids_32k = tokenizer32k.encode('This is an example sentence.')
ids_128k = [token_indices[i].item() for i in ids_32k]
print(f'{ids_32k=}')
print(f'{ids_128k=}')
print(tokenizer128k.decode(ids_128k))
# 128k vocab -> 32k vocab
ids_128k = tokenizer128k.encode('This is another example sentence.')
ids_32k = [inv_token_indices[i].item() for i in ids_128k]
print(f'{ids_128k=}')
print(f'{ids_32k=}') # tokens that do not exist in the 32k vocab map to -1
print(tokenizer32k.decode(ids_32k))
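Because tokens missing from the 32k vocabulary map to -1, it can help to handle them explicitly before decoding. The following is a minimal sketch and not part of the repository: `build_inverse` and `map_ids` are hypothetical helper names, and the small `toy_indices` list only stands in for the tensor loaded from token_indices.pt. It also assumes that inv_token_indices.pt is simply the inverse lookup of token_indices.pt, with -1 for pruned tokens.

```python
def build_inverse(indices, original_vocab_size):
    """Invert a pruned-ID -> original-ID table; IDs that were pruned away get -1."""
    inverse = [-1] * original_vocab_size
    for pruned_id, original_id in enumerate(indices):
        inverse[int(original_id)] = pruned_id
    return inverse


def map_ids(ids, table, drop_missing=True):
    """Look up each ID in `table`; optionally drop entries that map to -1."""
    mapped = [int(table[i]) for i in ids]
    if drop_missing:
        return [m for m in mapped if m != -1]
    return mapped


# Toy stand-in: a 4-token pruned vocab drawn from a 10-token original vocab.
toy_indices = [0, 3, 4, 9]
toy_inverse = build_inverse(toy_indices, original_vocab_size=10)

original_ids = [3, 5, 9]  # ID 5 has no counterpart in the pruned vocab
print(map_ids(original_ids, toy_inverse, drop_missing=False))
print(map_ids(original_ids, toy_inverse))
print(map_ids([1, 3], toy_indices))
```

With the real files, passing `token_indices` or `inv_token_indices` as `table` should work the same way, since `int()` accepts single-element tensors. Dropping the -1 entries loses the corresponding text, so whether to drop them, replace them, or raise an error depends on the use case.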