weights

Browse files

Files changed (16) hide show

.gitattributes +2 -0
README.md +446 -0
config.json +27 -0
examples/example_fsdp.py +62 -0
examples/example_sft_qlora.py +146 -0
examples/notebook_sft_peft.ipynb +729 -0
generation_config.json +7 -0
model-00001-of-00004.safetensors +3 -0
model-00002-of-00004.safetensors +3 -0
model-00003-of-00004.safetensors +3 -0
model-00004-of-00004.safetensors +3 -0
model.safetensors.index.json +261 -0
special_tokens_map.json +46 -0
tokenizer.json +3 -0
tokenizer.model +3 -0
tokenizer_config.json +70 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+gemma-7b-it.gguf filter=lfs diff=lfs merge=lfs -text

README.md ADDED Viewed

	@@ -0,0 +1,446 @@

+---
+library_name: transformers
+tags: []
+extra_gated_heading: "Access Gemma on Hugging Face"
+extra_gated_prompt: "To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged-in to Hugging Face and click below. Requests are processed immediately."
+extra_gated_button_content: "Acknowledge license"
+license: other
+license_name: gemma-terms-of-use
+license_link: https://ai.google.dev/gemma/terms
+---
+# Gemma Model Card
+**Model Page**: [Gemma](https://ai.google.dev/gemma/docs)
+This model card corresponds to the 7B base version of the Gemma model. You can also visit the model card of the [2B base model](https://huggingface.co/google/gemma-2b), [7B instruct model](https://huggingface.co/google/gemma-7b-it), and [2B instruct model](https://huggingface.co/google/gemma-2b-it).
+**Resources and Technical Documentation**:
+* [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
+* [Gemma on Kaggle](https://www.kaggle.com/models/google/gemma)
+* [Gemma on Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?version=gemma-7b-gg-hf)
+**Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent)
+**Authors**: Google
+## Model Information
+Summary description and brief definition of inputs and outputs.
+### Description
+Gemma is a family of lightweight, state-of-the-art open models from Google,
+built from the same research and technology used to create the Gemini models.
+They are text-to-text, decoder-only large language models, available in English,
+with open weights, pre-trained variants, and instruction-tuned variants. Gemma
+models are well-suited for a variety of text generation tasks, including
+question answering, summarization, and reasoning. Their relatively small size
+makes it possible to deploy them in environments with limited resources such as
+a laptop, desktop or your own cloud infrastructure, democratizing access to
+state of the art AI models and helping foster innovation for everyone.
+### Usage
+Below we share some code snippets on how to get quickly started with running the model. First make sure to `pip install -U transformers`, then copy the snippet from the section that is relevant for your usecase.
+#### Fine-tuning examples
+You can find fine-tuning notebooks under the [`examples/` directory](https://huggingface.co/google/gemma-7b/tree/main/examples). We provide:
+* A script to perform Supervised Fine-Tuning (SFT) on UltraChat dataset using [QLoRA](https://huggingface.co/papers/2305.14314)
+* A script to perform SFT using FSDP on TPU devices
+* A notebook that you can run on a free-tier Google Colab instance to perform SFT on English quotes dataset
+#### Running the model on a CPU
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")
+input_text = "Write me a poem about Machine Learning."
+input_ids = tokenizer(input_text, return_tensors="pt")
+outputs = model.generate(**input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+#### Running the model on a single / multi GPU
+```python
+# pip install accelerate
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto")
+input_text = "Write me a poem about Machine Learning."
+input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+outputs = model.generate(**input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+#### Running the model on a GPU using different precisions
+* _Using `torch.float16`_
+```python
+# pip install accelerate
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto", torch_dtype=torch.float16)
+input_text = "Write me a poem about Machine Learning."
+input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+outputs = model.generate(**input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+* _Using `torch.bfloat16`_
+```python
+# pip install accelerate
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", device_map="auto", torch_dtype=torch.bfloat16)
+input_text = "Write me a poem about Machine Learning."
+input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+outputs = model.generate(**input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+#### Quantized Versions through `bitsandbytes`
+* _Using 8-bit precision (int8)_
+```python
+# pip install bitsandbytes accelerate
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+quantization_config = BitsAndBytesConfig(load_in_8bit=True)
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", quantization_config=quantization_config)
+input_text = "Write me a poem about Machine Learning."
+input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+outputs = model.generate(**input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+* _Using 4-bit precision_
+```python
+# pip install bitsandbytes accelerate
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+quantization_config = BitsAndBytesConfig(load_in_4bit=True)
+tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
+model = AutoModelForCausalLM.from_pretrained("google/gemma-7b", quantization_config=quantization_config)
+input_text = "Write me a poem about Machine Learning."
+input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+outputs = model.generate(**input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+#### Other optimizations
+* _Flash Attention 2_
+First make sure to install `flash-attn` in your environment `pip install flash-attn`
+```diff
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
++   attn_implementation="flash_attention_2"
+).to(0)
+```
+### Inputs and outputs
+*   **Input:** Text string, such as a question, a prompt, or a document to be
+    summarized.
+*   **Output:** Generated English-language text in response to the input, such
+    as an answer to a question, or a summary of a document.
+## Model Data
+Data used for model training and how the data was processed.
+### Training Dataset
+These models were trained on a dataset of text data that includes a wide variety
+of sources, totaling 6 trillion tokens. Here are the key components:
+* Web Documents: A diverse collection of web text ensures the model is exposed
+  to a broad range of linguistic styles, topics, and vocabulary. Primarily
+  English-language content.
+* Code: Exposing the model to code helps it to learn the syntax and patterns of
+  programming languages, which improves its ability to generate code or
+  understand code-related questions.
+* Mathematics: Training on mathematical text helps the model learn logical
+  reasoning, symbolic representation, and to address mathematical queries.
+The combination of these diverse data sources is crucial for training a powerful
+language model that can handle a wide variety of different tasks and text
+formats.
+### Data Preprocessing
+Here are the key data cleaning and filtering methods applied to the training
+data:
+* CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was
+  applied at multiple stages in the data preparation process to ensure the
+  exclusion of harmful and illegal content
+* Sensitive Data Filtering: As part of making Gemma pre-trained models safe and
+  reliable, automated techniques were used to filter out certain personal
+  information and other sensitive data from training sets.
+* Additional methods: Filtering based on content quality and safely in line with
+  [our policies](https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11).
+## Implementation Information
+Details about the model internals.
+### Hardware
+Gemma was trained using the latest generation of
+[Tensor Processing Unit (TPU)](https://cloud.google.com/tpu/docs/intro-to-tpu) hardware (TPUv5e).
+Training large language models requires significant computational power. TPUs,
+designed specifically for matrix operations common in machine learning, offer
+several advantages in this domain:
+* Performance: TPUs are specifically designed to handle the massive computations
+  involved in training LLMs. They can speed up training considerably compared to
+  CPUs.
+* Memory: TPUs often come with large amounts of high-bandwidth memory, allowing
+  for the handling of large models and batch sizes during training. This can
+  lead to better model quality.
+* Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for
+  handling the growing complexity of large foundation models. You can distribute
+  training across multiple TPU devices for faster and more efficient processing.
+* Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective
+  solution for training large models compared to CPU-based infrastructure,
+  especially when considering the time and resources saved due to faster
+  training.
+* These advantages are aligned with
+  [Google's commitments to operate sustainably](https://sustainability.google/operating-sustainably/).
+### Software
+Training was done using [JAX](https://github.com/google/jax) and [ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture).
+JAX allows researchers to take advantage of the latest generation of hardware,
+including TPUs, for faster and more efficient training of large models.
+ML Pathways is Google's latest effort to build artificially intelligent systems
+capable of generalizing across multiple tasks. This is specially suitable for
+[foundation models](https://ai.google/discover/foundation-models/), including large language models like
+these ones.
+Together, JAX and ML Pathways are used as described in the
+[paper about the Gemini family of models](https://arxiv.org/abs/2312.11805); "the 'single
+controller' programming model of Jax and Pathways allows a single Python
+process to orchestrate the entire training run, dramatically simplifying the
+development workflow."
+## Evaluation
+Model evaluation metrics and results.
+### Benchmark Results
+These models were evaluated against a large collection of different datasets and
+metrics to cover different aspects of text generation:
+| Benchmark                      | Metric        | 2B Params | 7B Params |
+| ------------------------------ | ------------- | ----------- | --------- |
+| [MMLU](https://arxiv.org/abs/2009.03300)                   | 5-shot, top-1 | 42.3        | 64.3      |
+| [HellaSwag](https://arxiv.org/abs/1905.07830)         | 0-shot        |71.4        | 81.2      |
+| [PIQA](https://arxiv.org/abs/1911.11641)                   | 0-shot        | 77.3        | 81.2      |
+| [SocialIQA](https://arxiv.org/abs/1904.09728)      | 0-shot        | 59.7        | 51.8      |
+| [BooIQ](https://arxiv.org/abs/1905.10044)                | 0-shot        | 69.4        | 83.2      |
+| [WinoGrande](https://arxiv.org/abs/1907.10641)       | partial score | 65.4        | 72.3      |
+| [CommonsenseQA](https://arxiv.org/abs/1811.00937) | 7-shot        | 65.3        | 71.3      |
+| [OpenBookQA](https://arxiv.org/abs/1809.02789)       |               | 47.8        | 52.8      |
+| [ARC-e](https://arxiv.org/abs/1911.01547)                  |               | 73.2        | 81.5      |
+| [ARC-c](https://arxiv.org/abs/1911.01547)                   |               | 42.1        | 53.2      |
+| [TriviaQA](https://arxiv.org/abs/1705.03551)           | 5-shot        | 53.2        | 63.4      |
+| [Natural Questions](https://github.com/google-research-datasets/natural-questions)  | 5-shot        | -       | 23        |
+| [HumanEval](https://arxiv.org/abs/2107.03374)      | pass@1        | 22.0        | 32.3      |
+| [MBPP](https://arxiv.org/abs/2108.07732)                   | 3-shot        | 29.2        | 44.4      |
+| [GSM8K](https://arxiv.org/abs/2110.14168)                | maj@1         | 17.7        | 46.4      |
+| [MATH](https://arxiv.org/abs/2108.07732)                   | 4-shot        | 11.8          | 24.3      |
+| [AGIEval](https://arxiv.org/abs/2304.06364)           |               | 24.2        | 41.7      |
+| [BIG-Bench](https://arxiv.org/abs/2206.04615)         |               | 35.2        | 55.1      |
+| ------------------------------ | ------------- | ----------- | --------- |
+| **Average**                    |               | **54.0**    | **56.4**  |
+## Ethics and Safety
+Ethics and safety evaluation approach and results.
+### Evaluation Approach
+Our evaluation methods include structured evaluations and internal red-teaming
+testing of relevant content policies. Red-teaming was conducted by a number of
+different teams, each with different goals and human evaluation metrics. These
+models were evaluated against a number of different categories relevant to
+ethics and safety, including:
+* Text-to-Text Content Safety: Human evaluation on prompts covering safety
+  policies including child sexual abuse and exploitation, harassment, violence
+  and gore, and hate speech.
+* Text-to-Text Representational Harms: Benchmark against relevant academic
+  datasets such as [WinoBias](https://arxiv.org/abs/1804.06876) and [BBQ Dataset](https://arxiv.org/abs/2110.08193v2).
+* Memorization: Automated evaluation of memorization of training data, including
+  the risk of personally identifiable information exposure.
+* Large-scale harm: Tests for "dangerous capabilities," such as chemical,
+  biological, radiological, and nuclear (CBRN) risks.
+### Evaluation Results
+The results of ethics and safety evaluations are within acceptable thresholds
+for meeting [internal policies](https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11) for categories such as child
+safety, content safety, representational harms, memorization, large-scale harms.
+On top of robust internal evaluations, the results of well known safety
+benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA
+are shown here.
+| Benchmark                      | Metric        | 2B Params   | 7B Params |
+| ------------------------------ | ------------- | ----------- | --------- |
+| [RealToxicity](https://arxiv.org/abs/2009.11462)        | average       | 6.86        | 7.90      |
+| [BOLD](https://arxiv.org/abs/2101.11718)                   |               | 45.57       | 49.08     |
+| [CrowS-Pairs](https://aclanthology.org/2020.emnlp-main.154/)        | top-1         | 45.82       | 51.33     |
+| [BBQ Ambig](https://arxiv.org/abs/2110.08193v2)               | 1-shot, top-1 | 62.58       | 92.54     |
+| [BBQ Disambig](https://arxiv.org/abs/2110.08193v2)            | top-1         | 54.62       | 71.99     |
+| [Winogender](https://arxiv.org/abs/1804.09301)       | top-1         | 51.25       | 54.17     |
+| [TruthfulQA](https://arxiv.org/abs/2109.07958)       |               | 44.84       | 31.81     |
+| [Winobias 1_2](https://arxiv.org/abs/1804.06876)       |               | 56.12       | 59.09     |
+| [Winobias 2_2](https://arxiv.org/abs/1804.06876)       |               | 91.10       | 92.23     |
+| [Toxigen](https://arxiv.org/abs/2203.09509)             |               | 29.77       | 39.59     |
+| ------------------------------ | ------------- | ----------- | --------- |
+## Usage and Limitations
+These models have certain limitations that users should be aware of.
+### Intended Usage
+Open Large Language Models (LLMs) have a wide range of applications across
+various industries and domains. The following list of potential uses is not
+comprehensive. The purpose of this list is to provide contextual information
+about the possible use-cases that the model creators considered as part of model
+training and development.
+* Content Creation and Communication
+  * Text Generation: These models can be used to generate creative text formats
+    such as poems, scripts, code, marketing copy, and email drafts.
+  * Chatbots and Conversational AI: Power conversational interfaces for customer
+    service, virtual assistants, or interactive applications.
+  * Text Summarization: Generate concise summaries of a text corpus, research
+    papers, or reports.
+* Research and Education
+  * Natural Language Processing (NLP) Research: These models can serve as a
+    foundation for researchers to experiment with NLP techniques, develop
+    algorithms, and contribute to the advancement of the field.
+  * Language Learning Tools: Support interactive language learning experiences,
+    aiding in grammar correction or providing writing practice.
+  * Knowledge Exploration: Assist researchers in exploring large bodies of text
+    by generating summaries or answering questions about specific topics.
+### Limitations
+* Training Data
+  * The quality and diversity of the training data significantly influence the
+    model's capabilities. Biases or gaps in the training data can lead to
+    limitations in the model's responses.
+  * The scope of the training dataset determines the subject areas the model can
+    handle effectively.
+* Context and Task Complexity
+  * LLMs are better at tasks that can be framed with clear prompts and
+    instructions. Open-ended or highly complex tasks might be challenging.
+  * A model's performance can be influenced by the amount of context provided
+    (longer context generally leads to better outputs, up to a certain point).
+* Language Ambiguity and Nuance
+  * Natural language is inherently complex. LLMs might struggle to grasp subtle
+    nuances, sarcasm, or figurative language.
+* Factual Accuracy
+  * LLMs generate responses based on information they learned from their
+    training datasets, but they are not knowledge bases. They may generate
+    incorrect or outdated factual statements.
+* Common Sense
+  * LLMs rely on statistical patterns in language. They might lack the ability
+    to apply common sense reasoning in certain situations.
+### Ethical Considerations and Risks
+The development of large language models (LLMs) raises several ethical concerns.
+In creating an open model, we have carefully considered the following:
+* Bias and Fairness
+  * LLMs trained on large-scale, real-world text data can reflect socio-cultural
+    biases embedded in the training material. These models underwent careful
+    scrutiny, input data pre-processing described and posterior evaluations
+    reported in this card.
+* Misinformation and Misuse
+  * LLMs can be misused to generate text that is false, misleading, or harmful.
+  * Guidelines are provided for responsible use with the model, see the
+    [Responsible Generative AI Toolkit](http://ai.google.dev/gemma/responsible).
+* Transparency and Accountability:
+  * This model card summarizes details on the models' architecture,
+    capabilities, limitations, and evaluation processes.
+  * A responsibly developed open model offers the opportunity to share
+    innovation by making LLM technology accessible to developers and researchers
+    across the AI ecosystem.
+Risks identified and mitigations:
+* Perpetuation of biases: It's encouraged to perform continuous monitoring
+  (using evaluation metrics, human review) and the exploration of de-biasing
+  techniques during model training, fine-tuning, and other use cases.
+* Generation of harmful content: Mechanisms and guidelines for content safety
+  are essential. Developers are encouraged to exercise caution and implement
+  appropriate content safety safeguards based on their specific product policies
+  and application use cases.
+* Misuse for malicious purposes: Technical limitations and developer and
+  end-user education can help mitigate against malicious applications of LLMs.
+  Educational resources and reporting mechanisms for users to flag misuse are
+  provided. Prohibited uses of Gemma models are outlined in the
+  [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).
+* Privacy violations: Models were trained on data filtered for removal of PII
+  (Personally Identifiable Information). Developers are encouraged to adhere to
+  privacy regulations with privacy-preserving techniques.
+### Benefits
+At the time of release, this family of models provides high-performance open
+large language model implementations designed from the ground up for Responsible
+AI development compared to similarly sized models.
+Using the benchmark evaluation metrics described in this document, these models
+have shown to provide superior performance to other, comparably-sized open model
+alternatives.

config.json ADDED Viewed

	@@ -0,0 +1,27 @@

+{
+  "architectures": [
+    "GemmaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 2,
+  "eos_token_id": 1,
+  "head_dim": 256,
+  "hidden_act": "gelu",
+  "hidden_size": 3072,
+  "initializer_range": 0.02,
+  "intermediate_size": 24576,
+  "max_position_embeddings": 8192,
+  "model_type": "gemma",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 16,
+  "pad_token_id": 0,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": null,
+  "rope_theta": 10000.0,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.38.0.dev0",
+  "use_cache": true,
+  "vocab_size": 256000
+}

examples/example_fsdp.py ADDED Viewed

	@@ -0,0 +1,62 @@

+# Make sure to run the script with the following envs:
+#   PJRT_DEVICE=TPU XLA_USE_SPMD=1
+import torch
+import torch_xla
+import torch_xla.core.xla_model as xm
+from datasets import load_dataset
+from peft import LoraConfig, get_peft_model
+from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
+from trl import SFTTrainer
+# Set up TPU device.
+device = xm.xla_device()
+model_id = "google/gemma-7b"
+# Load the pretrained model and tokenizer.
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+# Set up PEFT LoRA for fine-tuning.
+lora_config = LoraConfig(
+    r=8,
+    target_modules=["k_proj", "v_proj"],
+    task_type="CAUSAL_LM",
+)
+# Load the dataset and format it for training.
+data = load_dataset("Abirate/english_quotes", split="train")
+max_seq_length = 1024
+# Set up the FSDP config. To enable FSDP via SPMD, set xla_fsdp_v2 to True.
+fsdp_config = {"fsdp_transformer_layer_cls_to_wrap": [
+        "GemmaDecoderLayer"
+    ],
+    "xla": True,
+    "xla_fsdp_v2": True,
+    "xla_fsdp_grad_ckpt": True}
+# Finally, set up the trainer and train the model.
+trainer = SFTTrainer(
+    model=model,
+    train_dataset=data,
+    args=TrainingArguments(
+        per_device_train_batch_size=64,  # This is actually the global batch size for SPMD.
+        num_train_epochs=100,
+        max_steps=-1,
+        output_dir="./output",
+        optim="adafactor",
+        logging_steps=1,
+        dataloader_drop_last = True,  # Required for SPMD.
+        fsdp="full_shard",
+        fsdp_config=fsdp_config,
+    ),
+    peft_config=lora_config,
+    dataset_text_field="quote",
+    max_seq_length=max_seq_length,
+    packing=True,
+)
+trainer.train()

examples/example_sft_qlora.py ADDED Viewed

	@@ -0,0 +1,146 @@

+from dataclasses import dataclass, field
+from typing import Optional
+import torch
+from transformers import AutoTokenizer, HfArgumentParser, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
+from datasets import load_dataset
+from peft import LoraConfig
+from trl import SFTTrainer
+@dataclass
+class ScriptArguments:
+    """
+    These arguments vary depending on how many GPUs you have, what their capacity and features are, and what size model you want to train.
+    """
+    per_device_train_batch_size: Optional[int] = field(default=4)
+    per_device_eval_batch_size: Optional[int] = field(default=1)
+    gradient_accumulation_steps: Optional[int] = field(default=4)
+    learning_rate: Optional[float] = field(default=2e-4)
+    max_grad_norm: Optional[float] = field(default=0.3)
+    weight_decay: Optional[int] = field(default=0.001)
+    lora_alpha: Optional[int] = field(default=16)
+    lora_dropout: Optional[float] = field(default=0.1)
+    lora_r: Optional[int] = field(default=8)
+    max_seq_length: Optional[int] = field(default=2048)
+    model_name: Optional[str] = field(
+        default=None,
+        metadata={
+            "help": "The model that you want to train from the Hugging Face hub. E.g. gpt2, gpt2-xl, bert, etc."
+        }
+    )
+    dataset_name: Optional[str] = field(
+        default="stingning/ultrachat",
+        metadata={"help": "The preference dataset to use."},
+    )
+    fp16: Optional[bool] = field(
+        default=False,
+        metadata={"help": "Enables fp16 training."},
+    )
+    bf16: Optional[bool] = field(
+        default=False,
+        metadata={"help": "Enables bf16 training."},
+    )
+    packing: Optional[bool] = field(
+        default=True,
+        metadata={"help": "Use packing dataset creating."},
+    )
+    gradient_checkpointing: Optional[bool] = field(
+        default=True,
+        metadata={"help": "Enables gradient checkpointing."},
+    )
+    use_flash_attention_2: Optional[bool] = field(
+        default=False,
+        metadata={"help": "Enables Flash Attention 2."},
+    )
+    optim: Optional[str] = field(
+        default="paged_adamw_32bit",
+        metadata={"help": "The optimizer to use."},
+    )
+    lr_scheduler_type: str = field(
+        default="constant",
+        metadata={"help": "Learning rate schedule. Constant a bit better than cosine, and has advantage for analysis"},
+    )
+    max_steps: int = field(default=1000, metadata={"help": "How many optimizer update steps to take"})
+    warmup_ratio: float = field(default=0.03, metadata={"help": "Fraction of steps to do a warmup for"})
+    save_steps: int = field(default=10, metadata={"help": "Save checkpoint every X updates steps."})
+    logging_steps: int = field(default=10, metadata={"help": "Log every X updates steps."})
+    output_dir: str = field(
+        default="./results",
+        metadata={"help": "The output directory where the model predictions and checkpoints will be written."},
+    )
+parser = HfArgumentParser(ScriptArguments)
+script_args = parser.parse_args_into_dataclasses()[0]
+def formatting_func(example):
+    text = f"### USER: {example['data'][0]}\n### ASSISTANT: {example['data'][1]}"
+    return text
+# Load the GG model - this is the local one, update it to the one on the Hub
+model_id = "google/gemma-7b"
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.float16,
+    bnb_4bit_quant_type="nf4"
+)
+# Load model
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    quantization_config=quantization_config,
+    torch_dtype=torch.float32,
+    attn_implementation="sdpa" if not script_args.use_flash_attention_2 else "flash_attention_2"
+)
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+tokenizer.pad_token_id = tokenizer.eos_token_id
+lora_config = LoraConfig(
+    r=script_args.lora_r,
+    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
+    bias="none",
+    task_type="CAUSAL_LM",
+    lora_alpha=script_args.lora_alpha,
+    lora_dropout=script_args.lora_dropout
+)
+train_dataset = load_dataset(script_args.dataset_name, split="train[:5%]")
+# TODO: make that configurable
+YOUR_HF_USERNAME = xxx
+output_dir = f"{YOUR_HF_USERNAME}/gemma-qlora-ultrachat"
+training_arguments = TrainingArguments(
+    output_dir=output_dir,
+    per_device_train_batch_size=script_args.per_device_train_batch_size,
+    gradient_accumulation_steps=script_args.gradient_accumulation_steps,
+    optim=script_args.optim,
+    save_steps=script_args.save_steps,
+    logging_steps=script_args.logging_steps,
+    learning_rate=script_args.learning_rate,
+    max_grad_norm=script_args.max_grad_norm,
+    max_steps=script_args.max_steps,
+    warmup_ratio=script_args.warmup_ratio,
+    lr_scheduler_type=script_args.lr_scheduler_type,
+    gradient_checkpointing=script_args.gradient_checkpointing,
+    fp16=script_args.fp16,
+    bf16=script_args.bf16,
+)
+trainer = SFTTrainer(
+    model=model,
+    args=training_arguments,
+    train_dataset=train_dataset,
+    peft_config=lora_config,
+    packing=script_args.packing,
+    dataset_text_field="id",
+    tokenizer=tokenizer,
+    max_seq_length=script_args.max_seq_length,
+    formatting_func=formatting_func,
+)
+trainer.train()

examples/notebook_sft_peft.ipynb ADDED Viewed

	@@ -0,0 +1,729 @@

+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "gpuType": "T4"
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    },
+    "accelerator": "GPU",
+    "widgets": {
+      "application/vnd.jupyter.widget-state+json": {
+        "32e7669cd82042cbbb419e25db606c1d": {
+          "model_module": "@jupyter-widgets/controls",
+          "model_name": "HBoxModel",
+          "model_module_version": "1.5.0",
+          "state": {
+            "_dom_classes": [],
+            "_model_module": "@jupyter-widgets/controls",
+            "_model_module_version": "1.5.0",
+            "_model_name": "HBoxModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/controls",
+            "_view_module_version": "1.5.0",
+            "_view_name": "HBoxView",
+            "box_style": "",
+            "children": [
+              "IPY_MODEL_b6698be32bf74c4087e129fab6e13fdd",
+              "IPY_MODEL_ff7333b35c1c472482df6550f6e43be2",
+              "IPY_MODEL_da4df56a1ba440dbb69087d0019cab1d"
+            ],
+            "layout": "IPY_MODEL_ad598693c58549e0a83a1328d77b8f83"
+          }
+        },
+        "b6698be32bf74c4087e129fab6e13fdd": {
+          "model_module": "@jupyter-widgets/controls",
+          "model_name": "HTMLModel",
+          "model_module_version": "1.5.0",
+          "state": {
+            "_dom_classes": [],
+            "_model_module": "@jupyter-widgets/controls",
+            "_model_module_version": "1.5.0",
+            "_model_name": "HTMLModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/controls",
+            "_view_module_version": "1.5.0",
+            "_view_name": "HTMLView",
+            "description": "",
+            "description_tooltip": null,
+            "layout": "IPY_MODEL_de2f7a60851f4681877a4c8dccba29cc",
+            "placeholder": "",
+            "style": "IPY_MODEL_02b296efbff143f4bfbb904cbc7b1109",
+            "value": "Loading checkpoint shards: 100%"
+          }
+        },
+        "ff7333b35c1c472482df6550f6e43be2": {
+          "model_module": "@jupyter-widgets/controls",
+          "model_name": "FloatProgressModel",
+          "model_module_version": "1.5.0",
+          "state": {
+            "_dom_classes": [],
+            "_model_module": "@jupyter-widgets/controls",
+            "_model_module_version": "1.5.0",
+            "_model_name": "FloatProgressModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/controls",
+            "_view_module_version": "1.5.0",
+            "_view_name": "ProgressView",
+            "bar_style": "success",
+            "description": "",
+            "description_tooltip": null,
+            "layout": "IPY_MODEL_72ac83e43e2b4d4498070a5b701a5572",
+            "max": 3,
+            "min": 0,
+            "orientation": "horizontal",
+            "style": "IPY_MODEL_320fa615d4de4652ac34fc2518f7749e",
+            "value": 3
+          }
+        },
+        "da4df56a1ba440dbb69087d0019cab1d": {
+          "model_module": "@jupyter-widgets/controls",
+          "model_name": "HTMLModel",
+          "model_module_version": "1.5.0",
+          "state": {
+            "_dom_classes": [],
+            "_model_module": "@jupyter-widgets/controls",
+            "_model_module_version": "1.5.0",
+            "_model_name": "HTMLModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/controls",
+            "_view_module_version": "1.5.0",
+            "_view_name": "HTMLView",
+            "description": "",
+            "description_tooltip": null,
+            "layout": "IPY_MODEL_75280ef205a245be92da268e0752dc71",
+            "placeholder": "",
+            "style": "IPY_MODEL_3f33eabd6f7f46ef8138abe748d8fbb1",
+            "value": " 3/3 [01:06&lt;00:00, 18.14s/it]"
+          }
+        },
+        "ad598693c58549e0a83a1328d77b8f83": {
+          "model_module": "@jupyter-widgets/base",
+          "model_name": "LayoutModel",
+          "model_module_version": "1.2.0",
+          "state": {
+            "_model_module": "@jupyter-widgets/base",
+            "_model_module_version": "1.2.0",
+            "_model_name": "LayoutModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/base",
+            "_view_module_version": "1.2.0",
+            "_view_name": "LayoutView",
+            "align_content": null,
+            "align_items": null,
+            "align_self": null,
+            "border": null,
+            "bottom": null,
+            "display": null,
+            "flex": null,
+            "flex_flow": null,
+            "grid_area": null,
+            "grid_auto_columns": null,
+            "grid_auto_flow": null,
+            "grid_auto_rows": null,
+            "grid_column": null,
+            "grid_gap": null,
+            "grid_row": null,
+            "grid_template_areas": null,
+            "grid_template_columns": null,
+            "grid_template_rows": null,
+            "height": null,
+            "justify_content": null,
+            "justify_items": null,
+            "left": null,
+            "margin": null,
+            "max_height": null,
+            "max_width": null,
+            "min_height": null,
+            "min_width": null,
+            "object_fit": null,
+            "object_position": null,
+            "order": null,
+            "overflow": null,
+            "overflow_x": null,
+            "overflow_y": null,
+            "padding": null,
+            "right": null,
+            "top": null,
+            "visibility": null,
+            "width": null
+          }
+        },
+        "de2f7a60851f4681877a4c8dccba29cc": {
+          "model_module": "@jupyter-widgets/base",
+          "model_name": "LayoutModel",
+          "model_module_version": "1.2.0",
+          "state": {
+            "_model_module": "@jupyter-widgets/base",
+            "_model_module_version": "1.2.0",
+            "_model_name": "LayoutModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/base",
+            "_view_module_version": "1.2.0",
+            "_view_name": "LayoutView",
+            "align_content": null,
+            "align_items": null,
+            "align_self": null,
+            "border": null,
+            "bottom": null,
+            "display": null,
+            "flex": null,
+            "flex_flow": null,
+            "grid_area": null,
+            "grid_auto_columns": null,
+            "grid_auto_flow": null,
+            "grid_auto_rows": null,
+            "grid_column": null,
+            "grid_gap": null,
+            "grid_row": null,
+            "grid_template_areas": null,
+            "grid_template_columns": null,
+            "grid_template_rows": null,
+            "height": null,
+            "justify_content": null,
+            "justify_items": null,
+            "left": null,
+            "margin": null,
+            "max_height": null,
+            "max_width": null,
+            "min_height": null,
+            "min_width": null,
+            "object_fit": null,
+            "object_position": null,
+            "order": null,
+            "overflow": null,
+            "overflow_x": null,
+            "overflow_y": null,
+            "padding": null,
+            "right": null,
+            "top": null,
+            "visibility": null,
+            "width": null
+          }
+        },
+        "02b296efbff143f4bfbb904cbc7b1109": {
+          "model_module": "@jupyter-widgets/controls",
+          "model_name": "DescriptionStyleModel",
+          "model_module_version": "1.5.0",
+          "state": {
+            "_model_module": "@jupyter-widgets/controls",
+            "_model_module_version": "1.5.0",
+            "_model_name": "DescriptionStyleModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/base",
+            "_view_module_version": "1.2.0",
+            "_view_name": "StyleView",
+            "description_width": ""
+          }
+        },
+        "72ac83e43e2b4d4498070a5b701a5572": {
+          "model_module": "@jupyter-widgets/base",
+          "model_name": "LayoutModel",
+          "model_module_version": "1.2.0",
+          "state": {
+            "_model_module": "@jupyter-widgets/base",
+            "_model_module_version": "1.2.0",
+            "_model_name": "LayoutModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/base",
+            "_view_module_version": "1.2.0",
+            "_view_name": "LayoutView",
+            "align_content": null,
+            "align_items": null,
+            "align_self": null,
+            "border": null,
+            "bottom": null,
+            "display": null,
+            "flex": null,
+            "flex_flow": null,
+            "grid_area": null,
+            "grid_auto_columns": null,
+            "grid_auto_flow": null,
+            "grid_auto_rows": null,
+            "grid_column": null,
+            "grid_gap": null,
+            "grid_row": null,
+            "grid_template_areas": null,
+            "grid_template_columns": null,
+            "grid_template_rows": null,
+            "height": null,
+            "justify_content": null,
+            "justify_items": null,
+            "left": null,
+            "margin": null,
+            "max_height": null,
+            "max_width": null,
+            "min_height": null,
+            "min_width": null,
+            "object_fit": null,
+            "object_position": null,
+            "order": null,
+            "overflow": null,
+            "overflow_x": null,
+            "overflow_y": null,
+            "padding": null,
+            "right": null,
+            "top": null,
+            "visibility": null,
+            "width": null
+          }
+        },
+        "320fa615d4de4652ac34fc2518f7749e": {
+          "model_module": "@jupyter-widgets/controls",
+          "model_name": "ProgressStyleModel",
+          "model_module_version": "1.5.0",
+          "state": {
+            "_model_module": "@jupyter-widgets/controls",
+            "_model_module_version": "1.5.0",
+            "_model_name": "ProgressStyleModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/base",
+            "_view_module_version": "1.2.0",
+            "_view_name": "StyleView",
+            "bar_color": null,
+            "description_width": ""
+          }
+        },
+        "75280ef205a245be92da268e0752dc71": {
+          "model_module": "@jupyter-widgets/base",
+          "model_name": "LayoutModel",
+          "model_module_version": "1.2.0",
+          "state": {
+            "_model_module": "@jupyter-widgets/base",
+            "_model_module_version": "1.2.0",
+            "_model_name": "LayoutModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/base",
+            "_view_module_version": "1.2.0",
+            "_view_name": "LayoutView",
+            "align_content": null,
+            "align_items": null,
+            "align_self": null,
+            "border": null,
+            "bottom": null,
+            "display": null,
+            "flex": null,
+            "flex_flow": null,
+            "grid_area": null,
+            "grid_auto_columns": null,
+            "grid_auto_flow": null,
+            "grid_auto_rows": null,
+            "grid_column": null,
+            "grid_gap": null,
+            "grid_row": null,
+            "grid_template_areas": null,
+            "grid_template_columns": null,
+            "grid_template_rows": null,
+            "height": null,
+            "justify_content": null,
+            "justify_items": null,
+            "left": null,
+            "margin": null,
+            "max_height": null,
+            "max_width": null,
+            "min_height": null,
+            "min_width": null,
+            "object_fit": null,
+            "object_position": null,
+            "order": null,
+            "overflow": null,
+            "overflow_x": null,
+            "overflow_y": null,
+            "padding": null,
+            "right": null,
+            "top": null,
+            "visibility": null,
+            "width": null
+          }
+        },
+        "3f33eabd6f7f46ef8138abe748d8fbb1": {
+          "model_module": "@jupyter-widgets/controls",
+          "model_name": "DescriptionStyleModel",
+          "model_module_version": "1.5.0",
+          "state": {
+            "_model_module": "@jupyter-widgets/controls",
+            "_model_module_version": "1.5.0",
+            "_model_name": "DescriptionStyleModel",
+            "_view_count": null,
+            "_view_module": "@jupyter-widgets/base",
+            "_view_module_version": "1.2.0",
+            "_view_name": "StyleView",
+            "description_width": ""
+          }
+        }
+      }
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "mi50mprVsU_P"
+      },
+      "outputs": [],
+      "source": [
+        "import os\n",
+        "from google.colab import userdata\n",
+        "os.environ[\"HF_TOKEN\"] = userdata.get('HF_TOKEN')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!pip3 install -q -U bitsandbytes==0.42.0\n",
+        "!pip3 install -q -U peft==0.8.2\n",
+        "!pip3 install -q -U trl==0.7.10\n",
+        "!pip3 install -q -U accelerate==0.27.1\n",
+        "!pip3 install -q -U datasets==2.17.0\n",
+        "!pip3 install -q -U transformers==4.38.1"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "-5gJk3W_s0RY",
+        "outputId": "ca3d427e-5bfc-4635-f27a-e49e56718f7e"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Collecting git+https://****@github.com/huggingface/new-model-addition-golden-gate@add-golden-gate\n",
+            "  Cloning https://****@github.com/huggingface/new-model-addition-golden-gate (to revision add-golden-gate) to /tmp/pip-req-build-8jci0sy8\n",
+            "  Running command git clone --filter=blob:none --quiet 'https://****@github.com/huggingface/new-model-addition-golden-gate' /tmp/pip-req-build-8jci0sy8\n",
+            "  Running command git checkout -b add-golden-gate --track origin/add-golden-gate\n",
+            "  Switched to a new branch 'add-golden-gate'\n",
+            "  Branch 'add-golden-gate' set up to track remote branch 'add-golden-gate' from 'origin'.\n",
+            "  Resolved https://****@github.com/huggingface/new-model-addition-golden-gate to commit e9d36beb5fcafeb2ac327a68eee82009d24cb58f\n",
+            "  Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
+            "  Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
+            "  Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
+            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.38.0.dev0) (3.13.1)\n",
+            "Requirement already satisfied: huggingface-hub<1.0,>=0.19.3 in /usr/local/lib/python3.10/dist-packages (from transformers==4.38.0.dev0) (0.20.3)\n",
+            "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.38.0.dev0) (1.25.2)\n",
+            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers==4.38.0.dev0) (23.2)\n",
+            "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.38.0.dev0) (6.0.1)\n",
+            "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.38.0.dev0) (2023.12.25)\n",
+            "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.38.0.dev0) (2.31.0)\n",
+            "Requirement already satisfied: tokenizers<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers==4.38.0.dev0) (0.15.2)\n",
+            "Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.38.0.dev0) (0.4.2)\n",
+            "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.38.0.dev0) (4.66.2)\n",
+            "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.19.3->transformers==4.38.0.dev0) (2023.6.0)\n",
+            "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.19.3->transformers==4.38.0.dev0) (4.9.0)\n",
+            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.38.0.dev0) (3.3.2)\n",
+            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.38.0.dev0) (3.6)\n",
+            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.38.0.dev0) (2.0.7)\n",
+            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.38.0.dev0) (2024.2.2)\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import torch\n",
+        "from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, GemmaTokenizer\n",
+        "\n",
+        "model_id = \"google/gemma-7b\"\n",
+        "bnb_config = BitsAndBytesConfig(\n",
+        "    load_in_4bit=True,\n",
+        "    bnb_4bit_quant_type=\"nf4\",\n",
+        "    bnb_4bit_compute_dtype=torch.bfloat16\n",
+        ")\n",
+        "\n",
+        "tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN'])\n",
+        "model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={\"\":0}, token=os.environ['HF_TOKEN'])"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 49,
+          "referenced_widgets": [
+            "32e7669cd82042cbbb419e25db606c1d",
+            "b6698be32bf74c4087e129fab6e13fdd",
+            "ff7333b35c1c472482df6550f6e43be2",
+            "da4df56a1ba440dbb69087d0019cab1d",
+            "ad598693c58549e0a83a1328d77b8f83",
+            "de2f7a60851f4681877a4c8dccba29cc",
+            "02b296efbff143f4bfbb904cbc7b1109",
+            "72ac83e43e2b4d4498070a5b701a5572",
+            "320fa615d4de4652ac34fc2518f7749e",
+            "75280ef205a245be92da268e0752dc71",
+            "3f33eabd6f7f46ef8138abe748d8fbb1"
+          ]
+        },
+        "id": "EVEotZX8s-v6",
+        "outputId": "e378234f-f56f-483e-c569-f3a196c02370"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]"
+            ],
+            "application/vnd.jupyter.widget-view+json": {
+              "version_major": 2,
+              "version_minor": 0,
+              "model_id": "32e7669cd82042cbbb419e25db606c1d"
+            }
+          },
+          "metadata": {}
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "text = \"Quote: Imagination is more\"\n",
+        "device = \"cuda:0\"\n",
+        "inputs = tokenizer(text, return_tensors=\"pt\").to(device)\n",
+        "\n",
+        "outputs = model.generate(**inputs, max_new_tokens=20)\n",
+        "print(tokenizer.decode(outputs[0], skip_special_tokens=True))"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "7Msk610TVUGW",
+        "outputId": "8c14afe0-dc6e-42b1-d05a-1a7a6c2ace9e"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Quote: Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.\n",
+            "\n",
+            "-Albert Einstein\n",
+            "\n",
+            "I\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "os.environ[\"WANDB_DISABLED\"] = \"true\""
+      ],
+      "metadata": {
+        "id": "Mi2P12KsVbyt"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from peft import LoraConfig\n",
+        "\n",
+        "lora_config = LoraConfig(\n",
+        "    r=8,\n",
+        "    target_modules=[\"q_proj\", \"o_proj\", \"k_proj\", \"v_proj\", \"gate_proj\", \"up_proj\", \"down_proj\"],\n",
+        "    task_type=\"CAUSAL_LM\",\n",
+        ")"
+      ],
+      "metadata": {
+        "id": "7lzjoG3KVRMN"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from datasets import load_dataset\n",
+        "\n",
+        "data = load_dataset(\"Abirate/english_quotes\")\n",
+        "data = data.map(lambda samples: tokenizer(samples[\"quote\"]), batched=True)"
+      ],
+      "metadata": {
+        "id": "HPQSpLNAuubn"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import transformers\n",
+        "from trl import SFTTrainer\n",
+        "\n",
+        "def formatting_func(example):\n",
+        "    text = f\"Quote: {example['quote'][0]}\\nAuthor: {example['author'][0]}\"\n",
+        "    return [text]\n",
+        "\n",
+        "trainer = SFTTrainer(\n",
+        "    model=model,\n",
+        "    train_dataset=data[\"train\"],\n",
+        "    args=transformers.TrainingArguments(\n",
+        "        per_device_train_batch_size=1,\n",
+        "        gradient_accumulation_steps=4,\n",
+        "        warmup_steps=2,\n",
+        "        max_steps=10,\n",
+        "        learning_rate=2e-4,\n",
+        "        fp16=True,\n",
+        "        logging_steps=1,\n",
+        "        output_dir=\"outputs\",\n",
+        "        optim=\"paged_adamw_8bit\"\n",
+        "    ),\n",
+        "    peft_config=lora_config,\n",
+        "    formatting_func=formatting_func,\n",
+        ")\n",
+        "trainer.train()"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 530
+        },
+        "id": "HFbR2FIgVfiT",
+        "outputId": "ba27fbda-54be-415c-ee47-78632e4ad4c6"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stderr",
+          "text": [
+            "Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).\n",
+            "/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:223: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024\n",
+            "  warnings.warn(\n",
+            "/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:290: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.\n",
+            "  warnings.warn(\n"
+          ]
+        },
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "<IPython.core.display.HTML object>"
+            ],
+            "text/html": [
+              "\n",
+              "    <div>\n",
+              "      \n",
+              "      <progress value='10' max='10' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
+              "      [10/10 00:08, Epoch 6/10]\n",
+              "    </div>\n",
+              "    <table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              " <tr style=\"text-align: left;\">\n",
+              "      <th>Step</th>\n",
+              "      <th>Training Loss</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <td>1</td>\n",
+              "      <td>1.700500</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <td>2</td>\n",
+              "      <td>0.641000</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <td>3</td>\n",
+              "      <td>1.031500</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <td>4</td>\n",
+              "      <td>0.945800</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <td>5</td>\n",
+              "      <td>0.516200</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <td>6</td>\n",
+              "      <td>1.278600</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <td>7</td>\n",
+              "      <td>1.187300</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <td>8</td>\n",
+              "      <td>0.339000</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <td>9</td>\n",
+              "      <td>0.724500</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <td>10</td>\n",
+              "      <td>0.647600</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table><p>"
+            ]
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "TrainOutput(global_step=10, training_loss=0.9011982649564743, metrics={'train_runtime': 10.2202, 'train_samples_per_second': 3.914, 'train_steps_per_second': 0.978, 'total_flos': 5520965345280.0, 'train_loss': 0.9011982649564743, 'epoch': 6.67})"
+            ]
+          },
+          "metadata": {},
+          "execution_count": 8
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "text = \"Quote: Imagination is\"\n",
+        "device = \"cuda:0\"\n",
+        "inputs = tokenizer(text, return_tensors=\"pt\").to(device)\n",
+        "\n",
+        "outputs = model.generate(**inputs, max_new_tokens=20)\n",
+        "print(tokenizer.decode(outputs[0], skip_special_tokens=True))"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "f5Mim0lNViwe",
+        "outputId": "4534ee26-63e3-4ced-ee27-673f0b9d7afb"
+      },
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Quote: Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.\n",
+            "\n",
+            "Author: Albert Einstein\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "djg3QAMuVx8R"
+      },
+      "execution_count": null,
+      "outputs": []
+    }
+  ]
+}

generation_config.json ADDED Viewed

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 107,
+  "eos_token_id": 1,
+  "pad_token_id": 0,
+  "transformers_version": "4.38.0.dev0"
+}

model-00001-of-00004.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:632edaf7993dd73b33287cc34e0de0ed48c04a54834198fac5f2f78ff47e62c9
+size 4995496656

model-00002-of-00004.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b9b0278bd4e203c50d4a1b2a29bd6061b19c48abc77f338a4de2f0dd4fba0fac
+size 4982953168

model-00003-of-00004.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:69dcacd38561f42064c81a0d8ebfd97d8a393d22e40fea327fc6c9a14205768c
+size 4982953200

model-00004-of-00004.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1ddee49e12c7ec4ffb8ff6359a727a798ded9d6176da4bad30833f24426cb92f
+size 2113988336

model.safetensors.index.json ADDED Viewed

	@@ -0,0 +1,261 @@

+{
+  "metadata": {
+    "total_size": 17075361792
+  },
+  "weight_map": {
+    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.norm.weight": "model-00004-of-00004.safetensors"
+  }
+}

special_tokens_map.json ADDED Viewed

	@@ -0,0 +1,46 @@

+{
+  "additional_special_tokens": [
+    {
+      "content": "<start_of_turn>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<end_of_turn>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    }
+  ],
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<eos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05e97791a5e007260de1db7e1692e53150e08cea481e2bf25435553380c147ee
+size 17477929

tokenizer.model ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:61a7b147390c64585d6c3543dd6fc636906c9af3865a5548f27f31aee1d4c8e2
+size 4241003

tokenizer_config.json ADDED Viewed

	@@ -0,0 +1,70 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<eos>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<bos>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "106": {
+      "content": "<start_of_turn>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "107": {
+      "content": "<end_of_turn>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<start_of_turn>",
+    "<end_of_turn>"
+  ],
+  "bos_token": "<bos>",
+  "chat_template": "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{ '<start_of_turn>' + role + '\n' + message['content'] | trim + '<end_of_turn>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<eos>",
+  "legacy": null,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "GemmaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}