--- license: other language: - en pipeline_tag: text-generation inference: false tags: - transformers - gguf - imatrix - Llama-3.2-1B-Instruct --- Quantizations of https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct ### Inference Clients/UIs * [llama.cpp](https://github.com/ggerganov/llama.cpp) * [KoboldCPP](https://github.com/LostRuins/koboldcpp) * [text-generation-webui](https://github.com/oobabooga/text-generation-webui) * [ollama](https://github.com/ollama/ollama) --- # From original readme The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open source and closed chat models on common industry benchmarks. **Model Developer:** Meta **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. ## How to use This repository contains two versions of Llama-3.2-1B-Instruct, for use with `transformers` and with the original `llama` codebase. ### Use with transformers Starting with `transformers >= 4.43.0` onward, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function. Make sure to update your transformers installation via `pip install --upgrade transformers`. ```python import torch from transformers import pipeline model_id = "meta-llama/Llama-3.2-1B-Instruct" pipe = pipeline( "text-generation", model=model_id, torch_dtype=torch.bfloat16, device_map="auto", ) messages = [ {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"}, {"role": "user", "content": "Who are you?"}, ] outputs = pipe( messages, max_new_tokens=256, ) print(outputs[0]["generated_text"][-1]) ``` Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generations, quantised and more at [`huggingface-llama-recipes`](https://github.com/huggingface/huggingface-llama-recipes) ### Use with `llama` Please, follow the instructions in the [repository](https://github.com/meta-llama/llama) To download Original checkpoints, see the example command below leveraging `huggingface-cli`: ``` huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --include "original/*" --local-dir Llama-3.2-1B-Instruct ```