---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

## Model Information

This is a vocabulary-pruned variant of [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct). The vocabulary size is pruned from 128,256 to 32,256 tokens, leaving a total of 1,039,214,756 parameters (~200M fewer than the original model).

## How to use

Run the model with the `pipeline` API:

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model='k-l-lambda/Llama-3.2-1B-vocab32k',
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```

Or load the model and tokenizer directly:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")
model = AutoModelForCausalLM.from_pretrained("k-l-lambda/Llama-3.2-1B-vocab32k")

input_ids = tokenizer.encode("Hello, ", return_tensors="pt")
output = model.generate(input_ids)

print(tokenizer.decode(output[0]))
```

## Token conversion

You can map an ID in the 32k vocabulary to the corresponding ID in the original 128k vocabulary (and back) using the tensors stored in `token_indices.pt` and `inv_token_indices.pt`:

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

tokenizer128k = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')
tokenizer32k = AutoTokenizer.from_pretrained('k-l-lambda/Llama-3.2-1B-vocab32k')

indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='token_indices.pt')
inv_indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='inv_token_indices.pt')
token_indices = torch.load(indices_path)
inv_token_indices = torch.load(inv_indices_path)

# 32k -> 128k
ids_32k = tokenizer32k.encode('This is an example sentence.')
ids_128k = [token_indices[id].item() for id in ids_32k]
print(f'{ids_32k=}')
print(f'{ids_128k=}')
print(tokenizer128k.decode(ids_128k))

# 128k -> 32k
ids_128k = tokenizer128k.encode('This is another example sentence.')
ids_32k = [inv_token_indices[id].item() for id in ids_128k]
print(f'{ids_128k=}')
print(f'{ids_32k=}')
# tokens that do not exist in the 32k vocab map to -1
print(tokenizer32k.decode(ids_32k))
```
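Because pruned-away tokens map to -1, decoding a converted sequence directly can misrender or drop those positions. Below is a minimal sketch of a hypothetical helper (`split_mappable_ids` is not part of this repository) that separates mappable IDs from pruned ones before decoding:

```python
import torch
from huggingface_hub import hf_hub_download

inv_indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='inv_token_indices.pt')
inv_token_indices = torch.load(inv_indices_path)

def split_mappable_ids(ids_128k, inv_token_indices):
    """Map 128k-vocab IDs to 32k-vocab IDs, separating out tokens
    that were pruned from the 32k vocabulary (marked as -1)."""
    mapped = [inv_token_indices[i].item() for i in ids_128k]
    kept = [m for m in mapped if m != -1]
    pruned = [i for i, m in zip(ids_128k, mapped) if m == -1]
    return kept, pruned
```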
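If the pruning simply keeps a subset of the original embedding rows, which the index tensors suggest but is an assumption rather than a documented guarantee, you can sanity-check the mapping by comparing embedding rows across the two checkpoints:

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM

model_128k = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')
model_32k = AutoModelForCausalLM.from_pretrained('k-l-lambda/Llama-3.2-1B-vocab32k')

indices_path = hf_hub_download(repo_id='k-l-lambda/Llama-3.2-1B-vocab32k', filename='token_indices.pt')
token_indices = torch.load(indices_path)

emb_128k = model_128k.get_input_embeddings().weight
emb_32k = model_32k.get_input_embeddings().weight

# True if the 32k embedding is exactly the 128k embedding gathered by token_indices
print(torch.equal(emb_32k, emb_128k[token_indices]))
```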