jartine committed
Commit 99cfecc
1 Parent(s): 49847fe

Update README.md

Files changed (1):
  1. README.md +25 -52

README.md CHANGED
@@ -26,15 +26,16 @@ history_template: |
 {{message}}<|eot_id|>
 ---

-# Meta Llama 3.1 8B Instruct - llamafile
+# Meta Llama 3.1 8B - llamafile

 This is a large language model that was released by Meta on 2024-07-23.
-It was fine-tuned by Meta to follow your instructions. It's big enough
-to be capable of being put to serious use, and it's small enough to be
-capable of running on most personal computers.
+It's big enough to be capable of being put to serious use, and it's
+small enough to be capable of running on most personal computers. This
+repo contains the base model, which has not been fine-tuned to follow
+instructions.

 - Model creator: [Meta](https://huggingface.co/meta-llama/)
-- Original model: [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
+- Original model: [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)

 Mozilla has packaged the LLaMA model into executable weights that we
 call [llamafiles](https://github.com/Mozilla-Ocho/llamafile). This gives
@@ -44,15 +45,20 @@ FreeBSD, OpenBSD and NetBSD systems you control on both AMD64 and ARM64.
 ## Quickstart

 Running the following on a desktop OS will launch a tab in your web
-browser with a chatbot interface.
+browser.

 ```
-wget https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-Instruct-llamafile/resolve/main/Meta-Llama-3.1-8B-Instruct.Q6_K.llamafile
-chmod +x Meta-Llama-3.1-8B-Instruct.Q6_K.llamafile
-./Meta-Llama-3.1-8B-Instruct.Q6_K.llamafile
+wget https://huggingface.co/Mozilla/Meta-Llama-3.1-8B-llamafile/resolve/main/Meta-Llama-3.1-8B.Q6_K.llamafile
+chmod +x Meta-Llama-3.1-8B.Q6_K.llamafile
+./Meta-Llama-3.1-8B.Q6_K.llamafile
 ```

-You then need to fill out the prompt / history template (see below).
+You can then use the completion mode of the GUI to experiment with this
+model. You can prompt the model for completions on the command line too:
+
+```
+./Meta-Llama-3.1-8B.Q6_K.llamafile -p 'four score and seven' --log-disable
+```

 This model has a max context window size of 128k tokens. By default, a
 context window size of 512 tokens is used. You may increase this to the
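
Note: this hunk's trailing context ends mid-sentence; the README goes on to explain how to raise the 512-token default context window. A minimal sketch of doing that at launch, assuming llamafile's `-c`/`--ctx-size` flag (documented in the llamafile README, not part of this commit), where `-c 0` requests the model's maximum:

```
# Sketch: launch with the full 128k context window instead of the 512-token
# default. -c 0 means "use the model's maximum"; expect much higher RAM use.
./Meta-Llama-3.1-8B.Q6_K.llamafile -c 0 -p 'four score and seven'
```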
@@ -73,25 +79,6 @@ Having **trouble?** See the ["Gotchas"
 section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas)
 of the README.

-## Prompting
-
-To have a good working chat experience when using the web GUI, you need
-to fill out the text fields with the following values.
-
-Prompt template:
-
-```
-<|begin_of_text|><|start_header_id|>system<|end_header_id|>
-{{prompt}}<|eot_id|>{{history}}<|start_header_id|>{{char}}<|end_header_id|>
-```
-
-History template:
-
-```
-<|start_header_id|>{{name}}<|end_header_id|>
-{{message}}<|eot_id|>
-```
-
 ## About llamafile

 llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023.
@@ -201,49 +188,35 @@ Where to send questions or comments about the model Instructions on how to provi

 ## How to use

-This repository contains two versions of Meta-Llama-3.1-8B-Instruct, for use with transformers and with the original `llama` codebase.
+This repository contains two versions of Meta-Llama-3.1-8B, for use with transformers and with the original `llama` codebase.

 ### Use with transformers

-Starting with `transformers >= 4.43.0` onward, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.
+Starting with transformers >= 4.43.0 onward, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.

-Make sure to update your transformers installation via `pip install --upgrade transformers`.
+Make sure to update your transformers installation via pip install --upgrade transformers.

 ```python
 import transformers
 import torch

-model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
+model_id = "meta-llama/Meta-Llama-3.1-8B"

 pipeline = transformers.pipeline(
-    "text-generation",
-    model=model_id,
-    model_kwargs={"torch_dtype": torch.bfloat16},
-    device_map="auto",
+    "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
 )

-messages = [
-    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
-    {"role": "user", "content": "Who are you?"},
-]
-
-outputs = pipeline(
-    messages,
-    max_new_tokens=256,
-)
-print(outputs[0]["generated_text"][-1])
+pipeline("Hey how are you doing today?")
 ```

-Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generations, quantised and more at [`huggingface-llama-recipes`](https://github.com/huggingface/huggingface-llama-recipes)
-
 ### Use with `llama`

-Please, follow the instructions in the [repository](https://github.com/meta-llama/llama)
+Please, follow the instructions in the [repository](https://github.com/meta-llama/llama).

 To download Original checkpoints, see the example command below leveraging `huggingface-cli`:

 ```
-huggingface-cli download meta-llama/Meta-Llama-3.1-8B-Instruct --include "original/*" --local-dir Meta-Llama-3.1-8B-Instruct
+huggingface-cli download meta-llama/Meta-Llama-3.1-8B --include "original/*" --local-dir Meta-Llama-3.1-8B
 ```

 ## Hardware and Software
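
Note: the condensed `pipeline(...)` call added in this hunk returns its output rather than printing it. A minimal sketch of reading the result back, assuming only the standard transformers text-generation return format (a list of dicts with a `generated_text` key) and nothing specific to this commit:

```python
import transformers
import torch

# Same setup as the snippet in the hunk above.
model_id = "meta-llama/Meta-Llama-3.1-8B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# A base model does plain text completion, so pass a raw string rather than
# chat messages, then pull the completion out of the returned list of dicts.
outputs = pipeline("Hey how are you doing today?", max_new_tokens=64)
print(outputs[0]["generated_text"])
```

The `huggingface-cli` command at the end of the hunk ships with the `huggingface_hub` package; `pip install -U huggingface_hub` provides it if it's missing.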
@@ -1123,4 +1096,4 @@ Finally, we put in place a set of resources including an [output reporting mecha

 The core values of Llama 3.1 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.1 addresses users and their needs as they are, without insertion unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress.

-But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our [Responsible Use Guide](https://llama.meta.com/responsible-use-guide), [Trust and Safety](https://llama.meta.com/trust-and-safety/) solutions, and other [resources](https://llama.meta.com/docs/get-started/) to learn more about responsible development.
+But Llama 3.1 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.1’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.1 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our [Responsible Use Guide](https://llama.meta.com/responsible-use-guide), [Trust and Safety](https://llama.meta.com/trust-and-safety/) solutions, and other [resources](https://llama.meta.com/docs/get-started/) to learn more about responsible development.
 