Model Information

This is a vocabulary-pruned variant of Llama-3.2-1B-Instruct. The vocabulary is reduced from 128256 to 32256 tokens. The total parameter count is 1,039,214,756, about 200M fewer than the original model.
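
As a rough check on the "about 200M" figure, the sketch below counts the embedding rows removed by the pruning. The hidden size (2048) and the tied input/output embeddings are assumptions taken from the public Llama-3.2-1B configuration, not from this card, and the helper name is hypothetical.

```python
# Assumptions (from the public Llama-3.2-1B config, not stated in this card):
HIDDEN_SIZE = 2048        # width of each embedding row
TIED_EMBEDDINGS = True    # input and output embeddings share one matrix

ORIGINAL_VOCAB = 128256   # from this card
PRUNED_VOCAB = 32256      # from this card


def pruned_embedding_params(original_vocab, pruned_vocab, hidden_size, tied):
    """Count the embedding parameters removed by dropping vocabulary rows."""
    removed_rows = original_vocab - pruned_vocab
    matrices = 1 if tied else 2  # untied models also lose rows in the LM head
    return removed_rows * hidden_size * matrices


removed = pruned_embedding_params(
    ORIGINAL_VOCAB, PRUNED_VOCAB, HIDDEN_SIZE, TIED_EMBEDDINGS
)
print(f"{removed:,}")  # 196,608,000
```

Under these assumptions the removed rows account for roughly 197M parameters, which is consistent with the rounded figure above.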

How to use

The model can be used with the transformers text-generation pipeline:

import torch
from transformers import pipeline


pipe = pipeline(
    "text-generation",
    model='k-l-lambda/Llama-3.2-1B-vocab32k',
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

Alternatively, load the tokenizer and model directly:

from transformers import AutoModelForCausalLM, AutoTokenizer


tokenizer = AutoTokenizer.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")
model = AutoModelForCausalLM.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")

input_ids = tokenizer.encode("Hello, ", return_tensors="pt")
output = model.generate(input_ids)
print(tokenizer.decode(output[0]))

Token conversion

Token IDs in the 32k vocabulary can be mapped to the corresponding IDs in the original 128k vocabulary using the tensor in token_indices.pt. The tensor in inv_token_indices.pt maps in the opposite direction, from 128k IDs to 32k IDs.

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer


tokenizer128k = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')
tokenizer32k = AutoTokenizer.from_pretrained('k-l-lambda/Llama-3.2-1B-vocab32k')

indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='token_indices.pt')
inv_indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='inv_token_indices.pt')
token_indices = torch.load(indices_path)
inv_token_indices = torch.load(inv_indices_path)

ids_32k = tokenizer32k.encode('This is an example sentence.')
ids_128k = [token_indices[id].item() for id in ids_32k]
print(f'{ids_32k=}')
print(f'{ids_128k=}')

print(tokenizer128k.decode(ids_128k))


ids_128k = tokenizer128k.encode('This is another example sentence.')
ids_32k = [inv_token_indices[id].item() for id in ids_128k]
print(f'{ids_128k=}')
print(f'{ids_32k=}')  # tokens that do not exist in the 32k vocab map to -1

# skip -1 entries, which have no counterpart in the 32k vocab
print(tokenizer32k.decode([id for id in ids_32k if id >= 0]))
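
To illustrate how the two index tensors relate, here is a minimal sketch on a toy vocabulary using plain Python lists. It assumes that token_indices lists the kept original IDs in pruned-ID order and that inv_token_indices is its inverse with -1 for dropped tokens; the card describes this behavior but not how the files were built, so `build_inverse`, `to_original` and `to_pruned` are hypothetical helpers.

```python
def build_inverse(token_indices, original_vocab_size):
    """Invert a pruned->original ID map; dropped original IDs get -1."""
    inv = [-1] * original_vocab_size
    for new_id, old_id in enumerate(token_indices):
        inv[old_id] = new_id
    return inv


# Toy setup: an original vocabulary of 10 tokens, of which 4 are kept.
token_indices = [0, 3, 4, 9]                      # pruned ID -> original ID
inv_token_indices = build_inverse(token_indices, 10)  # original ID -> pruned ID


def to_original(ids_pruned):
    """Map pruned-vocabulary IDs to original-vocabulary IDs."""
    return [token_indices[i] for i in ids_pruned]


def to_pruned(ids_original):
    """Map original-vocabulary IDs to pruned IDs (-1 if the token was dropped)."""
    return [inv_token_indices[i] for i in ids_original]


print(inv_token_indices)      # [0, -1, -1, 1, 2, -1, -1, -1, -1, 3]
print(to_original([1, 3]))    # [3, 9]
print(to_pruned([4, 5, 9]))   # [2, -1, 3]

# Dropped tokens should be filtered (or otherwise handled) before decoding.
kept = [i for i in to_pruned([4, 5, 9]) if i >= 0]
print(kept)                   # [2, 3]
```

Mapping from the pruned vocabulary to the original one is always lossless, while the reverse direction loses any token that was pruned, so a round trip through the 32k vocabulary is only exact for text made of kept tokens.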