htahir1 committed on Jan 16

Commit fc4465d • 1 Parent(s): 05a3feb

Upload folder using huggingface_hub


This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

README.md +202 -1
adapter_config.json +29 -0
adapter_model.safetensors +3 -0
checkpoint-100/README.md +204 -0
checkpoint-100/adapter_config.json +29 -0
checkpoint-100/adapter_model.safetensors +3 -0
checkpoint-100/optimizer.pt +3 -0
checkpoint-100/rng_state.pth +3 -0
checkpoint-100/scheduler.pt +3 -0
checkpoint-100/trainer_state.json +53 -0
checkpoint-100/training_args.bin +3 -0
checkpoint-1000/README.md +204 -0
checkpoint-1000/adapter_config.json +29 -0
checkpoint-1000/adapter_model.safetensors +3 -0
checkpoint-1000/optimizer.pt +3 -0
checkpoint-1000/rng_state.pth +3 -0
checkpoint-1000/scheduler.pt +3 -0
checkpoint-1000/trainer_state.json +341 -0
checkpoint-1000/training_args.bin +3 -0
checkpoint-200/README.md +204 -0
checkpoint-200/adapter_config.json +29 -0
checkpoint-200/adapter_model.safetensors +3 -0
checkpoint-200/optimizer.pt +3 -0
checkpoint-200/rng_state.pth +3 -0
checkpoint-200/scheduler.pt +3 -0
checkpoint-200/trainer_state.json +85 -0
checkpoint-200/training_args.bin +3 -0
checkpoint-300/README.md +204 -0
checkpoint-300/adapter_config.json +29 -0
checkpoint-300/adapter_model.safetensors +3 -0
checkpoint-300/optimizer.pt +3 -0
checkpoint-300/rng_state.pth +3 -0
checkpoint-300/scheduler.pt +3 -0
checkpoint-300/trainer_state.json +117 -0
checkpoint-300/training_args.bin +3 -0
checkpoint-400/README.md +204 -0
checkpoint-400/adapter_config.json +29 -0
checkpoint-400/adapter_model.safetensors +3 -0
checkpoint-400/optimizer.pt +3 -0
checkpoint-400/rng_state.pth +3 -0
checkpoint-400/scheduler.pt +3 -0
checkpoint-400/trainer_state.json +149 -0
checkpoint-400/training_args.bin +3 -0
checkpoint-500/README.md +204 -0
checkpoint-500/adapter_config.json +29 -0
checkpoint-500/adapter_model.safetensors +3 -0
checkpoint-500/optimizer.pt +3 -0
checkpoint-500/rng_state.pth +3 -0
checkpoint-500/scheduler.pt +3 -0
checkpoint-500/trainer_state.json +181 -0

README.md CHANGED

@@ -1,3 +1,204 @@
 ---
-license: bigcode-openrail-m
+library_name: peft
+base_model: bigcode/starcoder
 ---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.2.dev0

adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bigcode/starcoder",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_fc",
+    "c_proj",
+    "c_attn",
+    "q_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}
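
The configuration above describes a rank-8 LoRA adapter on four StarCoder projection modules. As a minimal sketch, assuming the standard LoRA formulation in which the low-rank update is scaled by `lora_alpha / r` (or by `lora_alpha / sqrt(r)` when `use_rslora` is true), the effective scale can be derived from the config values. The helper name `lora_scaling` is hypothetical and only a subset of the keys is reproduced.

```python
import json

# Subset of the adapter_config.json values shown in this commit.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "r": 8,
  "target_modules": ["c_fc", "c_proj", "c_attn", "q_attn"],
  "use_rslora": false
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor that multiplies the low-rank update B @ A.

    Hypothetical helper: it assumes the usual LoRA convention
    (alpha / r) and the rank-stabilized variant (alpha / sqrt(r)).
    """
    if config.get("use_rslora", False):
        return config["lora_alpha"] / config["r"] ** 0.5
    return config["lora_alpha"] / config["r"]


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"rank={ADAPTER_CONFIG['r']} scaling={scaling}")
```

With `lora_alpha` 32 and `r` 8, the adapter's update is applied at four times the raw product of the two low-rank matrices.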

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:495e43d13098fd20e268cc0b7e29b7ce202bd78c87b46b411c81898af04d9b90
+size 55255584
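
The three lines above are a Git LFS pointer, not the weights themselves: the diff records only a spec version, a SHA-256 object ID, and a byte size. The sketch below parses such pointers and compares them; `parse_lfs_pointer` and `same_object` are hypothetical helpers, and the digests are copied from the pointers in this commit.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def same_object(a: dict, b: dict) -> bool:
    """Two pointers refer to the same stored object if hash and size match."""
    return (a["hash_algo"], a["digest"], a["size_bytes"]) == (
        b["hash_algo"], b["digest"], b["size_bytes"])


TEMPLATE = "version https://git-lfs.github.com/spec/v1\noid sha256:{}\nsize 55255584\n"

# Digests of adapter_model.safetensors at three locations in this commit.
root = parse_lfs_pointer(TEMPLATE.format(
    "495e43d13098fd20e268cc0b7e29b7ce202bd78c87b46b411c81898af04d9b90"))
ckpt_100 = parse_lfs_pointer(TEMPLATE.format(
    "bb4472b9d8d18b86805c351acc203625198bbf35920baa841afdc6ee12f5240f"))
ckpt_1000 = parse_lfs_pointer(TEMPLATE.format(
    "495e43d13098fd20e268cc0b7e29b7ce202bd78c87b46b411c81898af04d9b90"))

print(same_object(root, ckpt_100), same_object(root, ckpt_1000))
```

The root adapter and `checkpoint-1000/adapter_model.safetensors` share the same object ID, so the top-level adapter is byte-identical to the step-1000 checkpoint.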

checkpoint-100/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: bigcode/starcoder
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.2.dev0

checkpoint-100/adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bigcode/starcoder",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_fc",
+    "c_proj",
+    "c_attn",
+    "q_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}

checkpoint-100/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bb4472b9d8d18b86805c351acc203625198bbf35920baa841afdc6ee12f5240f
+size 55255584

checkpoint-100/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:82c7ac539ee39d3cb5e740b4054e9ef614187aa4557fff8989ac9bcfc8006a47
+size 110696954

checkpoint-100/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f903165bb36368f47dd9f2d97c529373babf7977e621de0dc0c839044562d263
+size 14180

checkpoint-100/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cf139b2cd869934a6a802450d748b65f070f2fe1250b16bce2934376e88f03de
+size 1064

checkpoint-100/trainer_state.json ADDED

	@@ -0,0 +1,53 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.1,
+  "eval_steps": 100,
+  "global_step": 100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03,
+      "learning_rate": 0.0001666666666666667,
+      "loss": 0.8745,
+      "step": 25
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 0.00019979028262377118,
+      "loss": 0.8093,
+      "step": 50
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 0.00019893981312363562,
+      "loss": 0.7357,
+      "step": 75
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 0.00019744105246469263,
+      "loss": 0.7535,
+      "step": 100
+    },
+    {
+      "epoch": 0.1,
+      "eval_loss": 0.4003306031227112,
+      "eval_runtime": 1.7839,
+      "eval_samples_per_second": 2.242,
+      "eval_steps_per_second": 0.561,
+      "step": 100
+    }
+  ],
+  "logging_steps": 25,
+  "max_steps": 1000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 9223372036854775807,
+  "save_steps": 100,
+  "total_flos": 1.493507298557952e+17,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
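
The logged `learning_rate` values are consistent with a linear warmup followed by cosine decay to zero at `max_steps`. The sketch below reproduces them under assumed settings: a peak rate of 2e-4 and 30 warmup steps are inferred from the logged numbers, not read from the commit (the actual arguments live in the binary `training_args.bin`). The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 2e-4      # assumption: inferred from the logged values
WARMUP_STEPS = 30   # assumption: inferred from the logged values
MAX_STEPS = 1000    # "max_steps" in trainer_state.json


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rates logged in checkpoint-100/trainer_state.json.
LOGGED = {
    25: 0.0001666666666666667,
    50: 0.00019979028262377118,
    75: 0.00019893981312363562,
    100: 0.00019744105246469263,
}

for step, logged in LOGGED.items():
    print(step, f"{lr_at(step):.12e}", f"{logged:.12e}")
```

If the reconstructed values diverge from the log for a different run, the warmup length or peak rate assumed here does not apply to it.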

checkpoint-100/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5fcc93ccb3f87a3311bb3baa8290949012608d857aa7e1f8e40c50e3c4f99548
+size 4792

checkpoint-1000/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: bigcode/starcoder
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.2.dev0

checkpoint-1000/adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bigcode/starcoder",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_fc",
+    "c_proj",
+    "c_attn",
+    "q_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}

checkpoint-1000/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:495e43d13098fd20e268cc0b7e29b7ce202bd78c87b46b411c81898af04d9b90
+size 55255584

checkpoint-1000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dbc36e1c878784099b46f5b9a718b335165f7613c64f61d059016e0cedfaa033
+size 110696954

checkpoint-1000/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e1dc61da6425213a4ed0c6718da9534c16f880b198a814db7fecbd699176650b
+size 14244

checkpoint-1000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c7b5bf190dc871967c45091d9f1ab233b2d2ed62baca21fee5dfedb5718ffa5d
+size 1064

checkpoint-1000/trainer_state.json ADDED

	@@ -0,0 +1,341 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 100,
+  "global_step": 1000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03,
+      "learning_rate": 0.0001666666666666667,
+      "loss": 0.8745,
+      "step": 25
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 0.00019979028262377118,
+      "loss": 0.8093,
+      "step": 50
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 0.00019893981312363562,
+      "loss": 0.7357,
+      "step": 75
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 0.00019744105246469263,
+      "loss": 0.7535,
+      "step": 100
+    },
+    {
+      "epoch": 0.1,
+      "eval_loss": 0.4003306031227112,
+      "eval_runtime": 1.7839,
+      "eval_samples_per_second": 2.242,
+      "eval_steps_per_second": 0.561,
+      "step": 100
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 0.0001953038210948861,
+      "loss": 0.7249,
+      "step": 125
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 0.00019254212296427044,
+      "loss": 0.7118,
+      "step": 150
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 0.00018917405376582145,
+      "loss": 0.7467,
+      "step": 175
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 0.00018522168236559695,
+      "loss": 0.6714,
+      "step": 200
+    },
+    {
+      "epoch": 0.2,
+      "eval_loss": 0.3684937059879303,
+      "eval_runtime": 1.7853,
+      "eval_samples_per_second": 2.24,
+      "eval_steps_per_second": 0.56,
+      "step": 200
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 0.00018071090619916093,
+      "loss": 0.654,
+      "step": 225
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 0.00017567128158176953,
+      "loss": 0.6392,
+      "step": 250
+    },
+    {
+      "epoch": 0.28,
+      "learning_rate": 0.00017013583004418993,
+      "loss": 0.5745,
+      "step": 275
+    },
+    {
+      "epoch": 0.3,
+      "learning_rate": 0.000164140821963114,
+      "loss": 0.5364,
+      "step": 300
+    },
+    {
+      "epoch": 0.3,
+      "eval_loss": 0.3665352761745453,
+      "eval_runtime": 1.7852,
+      "eval_samples_per_second": 2.241,
+      "eval_steps_per_second": 0.56,
+      "step": 300
+    },
+    {
+      "epoch": 0.33,
+      "learning_rate": 0.00015772553890390197,
+      "loss": 0.5693,
+      "step": 325
+    },
+    {
+      "epoch": 0.35,
+      "learning_rate": 0.00015093201623287631,
+      "loss": 0.563,
+      "step": 350
+    },
+    {
+      "epoch": 0.38,
+      "learning_rate": 0.00014380476768566824,
+      "loss": 0.5478,
+      "step": 375
+    },
+    {
+      "epoch": 0.4,
+      "learning_rate": 0.00013639049369634876,
+      "loss": 0.5763,
+      "step": 400
+    },
+    {
+      "epoch": 0.4,
+      "eval_loss": 0.3363753855228424,
+      "eval_runtime": 1.7851,
+      "eval_samples_per_second": 2.241,
+      "eval_steps_per_second": 0.56,
+      "step": 400
+    },
+    {
+      "epoch": 0.42,
+      "learning_rate": 0.00012873777539848283,
+      "loss": 0.4891,
+      "step": 425
+    },
+    {
+      "epoch": 0.45,
+      "learning_rate": 0.00012089675630312754,
+      "loss": 0.5331,
+      "step": 450
+    },
+    {
+      "epoch": 0.47,
+      "learning_rate": 0.00011291881373954065,
+      "loss": 0.5679,
+      "step": 475
+    },
+    {
+      "epoch": 0.5,
+      "learning_rate": 0.00010485622221144484,
+      "loss": 0.5982,
+      "step": 500
+    },
+    {
+      "epoch": 0.5,
+      "eval_loss": 0.3186224400997162,
+      "eval_runtime": 1.7828,
+      "eval_samples_per_second": 2.244,
+      "eval_steps_per_second": 0.561,
+      "step": 500
+    },
+    {
+      "epoch": 0.53,
+      "learning_rate": 9.676181087466444e-05,
+      "loss": 0.5467,
+      "step": 525
+    },
+    {
+      "epoch": 0.55,
+      "learning_rate": 8.868861738047158e-05,
+      "loss": 0.5706,
+      "step": 550
+    },
+    {
+      "epoch": 0.57,
+      "learning_rate": 8.068954035279121e-05,
+      "loss": 0.504,
+      "step": 575
+    },
+    {
+      "epoch": 0.6,
+      "learning_rate": 7.281699277636572e-05,
+      "loss": 0.5267,
+      "step": 600
+    },
+    {
+      "epoch": 0.6,
+      "eval_loss": 0.32175499200820923,
+      "eval_runtime": 1.7846,
+      "eval_samples_per_second": 2.241,
+      "eval_steps_per_second": 0.56,
+      "step": 600
+    },
+    {
+      "epoch": 0.62,
+      "learning_rate": 6.512255856701177e-05,
+      "loss": 0.5414,
+      "step": 625
+    },
+    {
+      "epoch": 0.65,
+      "learning_rate": 5.765665457425102e-05,
+      "loss": 0.5412,
+      "step": 650
+    },
+    {
+      "epoch": 0.68,
+      "learning_rate": 5.0468200231001286e-05,
+      "loss": 0.4611,
+      "step": 675
+    },
+    {
+      "epoch": 0.7,
+      "learning_rate": 4.360429701490934e-05,
+      "loss": 0.5073,
+      "step": 700
+    },
+    {
+      "epoch": 0.7,
+      "eval_loss": 0.31275978684425354,
+      "eval_runtime": 1.7833,
+      "eval_samples_per_second": 2.243,
+      "eval_steps_per_second": 0.561,
+      "step": 700
+    },
+    {
+      "epoch": 0.72,
+      "learning_rate": 3.710991982161555e-05,
+      "loss": 0.4778,
+      "step": 725
+    },
+    {
+      "epoch": 0.75,
+      "learning_rate": 3.102762227218957e-05,
+      "loss": 0.5454,
+      "step": 750
+    },
+    {
+      "epoch": 0.78,
+      "learning_rate": 2.5397257885675397e-05,
+      "loss": 0.5612,
+      "step": 775
+    },
+    {
+      "epoch": 0.8,
+      "learning_rate": 2.025571894372794e-05,
+      "loss": 0.4983,
+      "step": 800
+    },
+    {
+      "epoch": 0.8,
+      "eval_loss": 0.31457942724227905,
+      "eval_runtime": 1.7821,
+      "eval_samples_per_second": 2.245,
+      "eval_steps_per_second": 0.561,
+      "step": 800
+    },
+    {
+      "epoch": 0.82,
+      "learning_rate": 1.563669475839956e-05,
+      "loss": 0.4941,
+      "step": 825
+    },
+    {
+      "epoch": 0.85,
+      "learning_rate": 1.1570450926997655e-05,
+      "loss": 0.4926,
+      "step": 850
+    },
+    {
+      "epoch": 0.88,
+      "learning_rate": 8.083631020418791e-06,
+      "loss": 0.5094,
+      "step": 875
+    },
+    {
+      "epoch": 0.9,
+      "learning_rate": 5.199082004372957e-06,
+      "loss": 0.5116,
+      "step": 900
+    },
+    {
+      "epoch": 0.9,
+      "eval_loss": 0.31357938051223755,
+      "eval_runtime": 1.7811,
+      "eval_samples_per_second": 2.246,
+      "eval_steps_per_second": 0.561,
+      "step": 900
+    },
+    {
+      "epoch": 0.93,
+      "learning_rate": 2.9357045374040825e-06,
+      "loss": 0.4244,
+      "step": 925
+    },
+    {
+      "epoch": 0.95,
+      "learning_rate": 1.30832912661093e-06,
+      "loss": 0.4511,
+      "step": 950
+    },
+    {
+      "epoch": 0.97,
+      "learning_rate": 3.2761895254306287e-07,
+      "loss": 0.4179,
+      "step": 975
+    },
+    {
+      "epoch": 1.0,
+      "learning_rate": 0.0,
+      "loss": 0.4662,
+      "step": 1000
+    },
+    {
+      "epoch": 1.0,
+      "eval_loss": 0.3136279284954071,
+      "eval_runtime": 1.7836,
+      "eval_samples_per_second": 2.243,
+      "eval_steps_per_second": 0.561,
+      "step": 1000
+    }
+  ],
+  "logging_steps": 25,
+  "max_steps": 1000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 9223372036854775807,
+  "save_steps": 100,
+  "total_flos": 1.493507298557952e+18,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
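
Because `best_metric` and `best_model_checkpoint` are both `null`, the trainer state does not say which checkpoint evaluated best. A short sketch of how the `log_history` entries could be scanned for the lowest `eval_loss` follows; `best_eval_entry` is a hypothetical helper, and the list below keeps only two training entries plus the ten evaluation entries from the file above, with their other fields dropped.

```python
# Abridged log_history from checkpoint-1000/trainer_state.json.
LOG_HISTORY = [
    {"epoch": 0.03, "loss": 0.8745, "step": 25},
    {"epoch": 1.0, "loss": 0.4662, "step": 1000},
    {"eval_loss": 0.4003306031227112, "step": 100},
    {"eval_loss": 0.3684937059879303, "step": 200},
    {"eval_loss": 0.3665352761745453, "step": 300},
    {"eval_loss": 0.3363753855228424, "step": 400},
    {"eval_loss": 0.3186224400997162, "step": 500},
    {"eval_loss": 0.32175499200820923, "step": 600},
    {"eval_loss": 0.31275978684425354, "step": 700},
    {"eval_loss": 0.31457942724227905, "step": 800},
    {"eval_loss": 0.31357938051223755, "step": 900},
    {"eval_loss": 0.3136279284954071, "step": 1000},
]


def best_eval_entry(log_history: list) -> dict:
    """Return the evaluation entry with the lowest eval_loss.

    Training entries carry "loss" rather than "eval_loss" and are skipped.
    """
    evals = [entry for entry in log_history if "eval_loss" in entry]
    return min(evals, key=lambda entry: entry["eval_loss"])


best = best_eval_entry(LOG_HISTORY)
final = LOG_HISTORY[-1]
print(best["step"], best["eval_loss"], final["eval_loss"] - best["eval_loss"])
```

The lowest logged evaluation loss is at step 700, and the final step is within about 0.001 of it. The logged `eval_runtime` multiplied by `eval_samples_per_second` comes to roughly 4, which suggests a very small evaluation set, so gaps of this size may not be meaningful.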

checkpoint-1000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5fcc93ccb3f87a3311bb3baa8290949012608d857aa7e1f8e40c50e3c4f99548
+size 4792

checkpoint-200/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: bigcode/starcoder
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.2.dev0

checkpoint-200/adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bigcode/starcoder",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_fc",
+    "c_proj",
+    "c_attn",
+    "q_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}
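
Reviewer note: with `use_rslora` set to false, LoRA scales the low-rank update by `lora_alpha / r`; with rank-stabilized LoRA the divisor would be `sqrt(r)` instead. A small sketch of that arithmetic for the values in this config follows. `lora_scaling` is a hypothetical helper for illustration, and only the relevant keys of the config are reproduced.

```python
import json
import math

# Subset of the adapter_config.json shown above (only the keys used here).
config = json.loads("""
{
  "lora_alpha": 32,
  "r": 8,
  "use_rslora": false,
  "target_modules": ["c_fc", "c_proj", "c_attn", "q_attn"]
}
""")


def lora_scaling(cfg: dict) -> float:
    """Return the factor applied to the low-rank update B @ A."""
    if cfg.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r).
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    # Standard LoRA divides by r.
    return cfg["lora_alpha"] / cfg["r"]


scaling = lora_scaling(config)
print(f"scaling = {scaling}, adapted modules = {config['target_modules']}")
```

For this config the factor works out to 32 / 8 = 4.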

checkpoint-200/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7f05f803645127df3c355c2f4280d7691d4ce9deb1fbc56f61247b55c7e5b719
+size 55255584

checkpoint-200/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ee69d8793a2d6932e774ee12256f5321b4b714280b2d4d70141459f210dccc26
+size 110696954

checkpoint-200/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:332b5f59525e9daeab689d4a9878243dd81ca2310bbe6cd23fa9e3060f182362
+size 14244

checkpoint-200/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b0a17e229e361808d0d69c75ecdc1ef9a97dcbcbf9ffe72c26d29d2aceaec1f9
+size 1064

checkpoint-200/trainer_state.json ADDED

	@@ -0,0 +1,85 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.2,
+  "eval_steps": 100,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03,
+      "learning_rate": 0.0001666666666666667,
+      "loss": 0.8745,
+      "step": 25
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 0.00019979028262377118,
+      "loss": 0.8093,
+      "step": 50
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 0.00019893981312363562,
+      "loss": 0.7357,
+      "step": 75
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 0.00019744105246469263,
+      "loss": 0.7535,
+      "step": 100
+    },
+    {
+      "epoch": 0.1,
+      "eval_loss": 0.4003306031227112,
+      "eval_runtime": 1.7839,
+      "eval_samples_per_second": 2.242,
+      "eval_steps_per_second": 0.561,
+      "step": 100
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 0.0001953038210948861,
+      "loss": 0.7249,
+      "step": 125
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 0.00019254212296427044,
+      "loss": 0.7118,
+      "step": 150
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 0.00018917405376582145,
+      "loss": 0.7467,
+      "step": 175
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 0.00018522168236559695,
+      "loss": 0.6714,
+      "step": 200
+    },
+    {
+      "epoch": 0.2,
+      "eval_loss": 0.3684937059879303,
+      "eval_runtime": 1.7853,
+      "eval_samples_per_second": 2.24,
+      "eval_steps_per_second": 0.56,
+      "step": 200
+    }
+  ],
+  "logging_steps": 25,
+  "max_steps": 1000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 9223372036854775807,
+  "save_steps": 100,
+  "total_flos": 2.987014597115904e+17,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
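
Reviewer note: `log_history` interleaves training entries (which carry `loss`) with evaluation entries (which carry `eval_loss`). A minimal sketch of separating the two follows, using the values from the checkpoint-200 hunk above. `split_history` is a hypothetical helper, and the entries are abbreviated to the fields it needs.

```python
# Entries abbreviated from the checkpoint-200 log_history shown above.
log_history = [
    {"step": 25, "loss": 0.8745},
    {"step": 50, "loss": 0.8093},
    {"step": 75, "loss": 0.7357},
    {"step": 100, "loss": 0.7535},
    {"step": 100, "eval_loss": 0.4003306031227112},
    {"step": 125, "loss": 0.7249},
    {"step": 150, "loss": 0.7118},
    {"step": 175, "loss": 0.7467},
    {"step": 200, "loss": 0.6714},
    {"step": 200, "eval_loss": 0.3684937059879303},
]


def split_history(history):
    """Separate training-loss entries from evaluation entries by key."""
    train = [e for e in history if "loss" in e]
    evals = [e for e in history if "eval_loss" in e]
    return train, evals


train_log, eval_log = split_history(log_history)
# Evaluation entry with the lowest loss so far.
best_eval = min(eval_log, key=lambda e: e["eval_loss"])
print(len(train_log), "train entries;", len(eval_log), "eval entries")
print("lowest eval_loss so far at step", best_eval["step"])
```

With the real file, the list would come from `json.load` on `trainer_state.json` and its `"log_history"` key rather than from a literal.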

checkpoint-200/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5fcc93ccb3f87a3311bb3baa8290949012608d857aa7e1f8e40c50e3c4f99548
+size 4792

checkpoint-300/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: bigcode/starcoder
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.2.dev0

checkpoint-300/adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bigcode/starcoder",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_fc",
+    "c_proj",
+    "c_attn",
+    "q_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}

checkpoint-300/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c2a461b4f667ec91e2a4a0f8ebb1a0beac3e7b119c3c7bd46e011e114d66acbe
+size 55255584

checkpoint-300/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:490271b6f994cc5d0c62ef67f2239acfb833aef87a116a9b9be155f14dfd79f2
+size 110696954

checkpoint-300/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2ebe5ecb60a57db217626fc48593a1343b259e1412ab2cc0ce66d958d2f58062
+size 14180

checkpoint-300/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:62db676ea589f2e897f3ed22ee3133a534ed12d0dd978bfaec8bc59572ea976b
+size 1064

checkpoint-300/trainer_state.json ADDED

	@@ -0,0 +1,117 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.3,
+  "eval_steps": 100,
+  "global_step": 300,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03,
+      "learning_rate": 0.0001666666666666667,
+      "loss": 0.8745,
+      "step": 25
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 0.00019979028262377118,
+      "loss": 0.8093,
+      "step": 50
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 0.00019893981312363562,
+      "loss": 0.7357,
+      "step": 75
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 0.00019744105246469263,
+      "loss": 0.7535,
+      "step": 100
+    },
+    {
+      "epoch": 0.1,
+      "eval_loss": 0.4003306031227112,
+      "eval_runtime": 1.7839,
+      "eval_samples_per_second": 2.242,
+      "eval_steps_per_second": 0.561,
+      "step": 100
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 0.0001953038210948861,
+      "loss": 0.7249,
+      "step": 125
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 0.00019254212296427044,
+      "loss": 0.7118,
+      "step": 150
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 0.00018917405376582145,
+      "loss": 0.7467,
+      "step": 175
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 0.00018522168236559695,
+      "loss": 0.6714,
+      "step": 200
+    },
+    {
+      "epoch": 0.2,
+      "eval_loss": 0.3684937059879303,
+      "eval_runtime": 1.7853,
+      "eval_samples_per_second": 2.24,
+      "eval_steps_per_second": 0.56,
+      "step": 200
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 0.00018071090619916093,
+      "loss": 0.654,
+      "step": 225
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 0.00017567128158176953,
+      "loss": 0.6392,
+      "step": 250
+    },
+    {
+      "epoch": 0.28,
+      "learning_rate": 0.00017013583004418993,
+      "loss": 0.5745,
+      "step": 275
+    },
+    {
+      "epoch": 0.3,
+      "learning_rate": 0.000164140821963114,
+      "loss": 0.5364,
+      "step": 300
+    },
+    {
+      "epoch": 0.3,
+      "eval_loss": 0.3665352761745453,
+      "eval_runtime": 1.7852,
+      "eval_samples_per_second": 2.241,
+      "eval_steps_per_second": 0.56,
+      "step": 300
+    }
+  ],
+  "logging_steps": 25,
+  "max_steps": 1000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 9223372036854775807,
+  "save_steps": 100,
+  "total_flos": 4.480521895673856e+17,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
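
Reviewer note: the logged `learning_rate` values look like a linear warmup followed by cosine decay. The sketch below reproduces them under two assumptions inferred from the log, not read from `training_args.bin`: a peak rate of 2e-4 and 30 warmup steps. `max_steps` is 1000 as stated in the file. `cosine_with_warmup` is a hypothetical helper name.

```python
import math


def cosine_with_warmup(step, peak_lr=2e-4, warmup_steps=30, max_steps=1000):
    """Linear warmup to peak_lr, then half-cosine decay to zero at max_steps.

    peak_lr and warmup_steps are assumptions inferred from the logged values.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rates copied from log_history in the hunk above.
logged = {
    25: 0.0001666666666666667,
    50: 0.00019979028262377118,
    100: 0.00019744105246469263,
    200: 0.00018522168236559695,
    300: 0.000164140821963114,
}

for step, lr in logged.items():
    predicted = cosine_with_warmup(step)
    print(f"step {step:4d}  logged {lr:.6e}  predicted {predicted:.6e}")
```

If the assumptions hold, the schedule reaches zero at step 1000, which agrees with the final `learning_rate` of 0.0 logged in checkpoint-1000.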

checkpoint-300/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5fcc93ccb3f87a3311bb3baa8290949012608d857aa7e1f8e40c50e3c4f99548
+size 4792

checkpoint-400/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: bigcode/starcoder
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.2.dev0

checkpoint-400/adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bigcode/starcoder",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_fc",
+    "c_proj",
+    "c_attn",
+    "q_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}

checkpoint-400/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:edaca769743a175eff4e0ef6bc5d4e4446f21774430a20fc43aa3c3670419ba9
+size 55255584

checkpoint-400/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1e62dc266a27e92f541dc5d52ccba7508e44b1ad73a79786b9a027debc324590
+size 110696954

checkpoint-400/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:71832fb3990c1059e14ec9109d9bf125f682c118937e9e5b1a3310b3e8be05ec
+size 14244

checkpoint-400/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:73179501dc4bcc1b0d3ff711880af909a9b84bb7d003a900c122d08331d45bfb
+size 1064

checkpoint-400/trainer_state.json ADDED

	@@ -0,0 +1,149 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.4,
+  "eval_steps": 100,
+  "global_step": 400,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03,
+      "learning_rate": 0.0001666666666666667,
+      "loss": 0.8745,
+      "step": 25
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 0.00019979028262377118,
+      "loss": 0.8093,
+      "step": 50
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 0.00019893981312363562,
+      "loss": 0.7357,
+      "step": 75
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 0.00019744105246469263,
+      "loss": 0.7535,
+      "step": 100
+    },
+    {
+      "epoch": 0.1,
+      "eval_loss": 0.4003306031227112,
+      "eval_runtime": 1.7839,
+      "eval_samples_per_second": 2.242,
+      "eval_steps_per_second": 0.561,
+      "step": 100
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 0.0001953038210948861,
+      "loss": 0.7249,
+      "step": 125
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 0.00019254212296427044,
+      "loss": 0.7118,
+      "step": 150
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 0.00018917405376582145,
+      "loss": 0.7467,
+      "step": 175
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 0.00018522168236559695,
+      "loss": 0.6714,
+      "step": 200
+    },
+    {
+      "epoch": 0.2,
+      "eval_loss": 0.3684937059879303,
+      "eval_runtime": 1.7853,
+      "eval_samples_per_second": 2.24,
+      "eval_steps_per_second": 0.56,
+      "step": 200
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 0.00018071090619916093,
+      "loss": 0.654,
+      "step": 225
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 0.00017567128158176953,
+      "loss": 0.6392,
+      "step": 250
+    },
+    {
+      "epoch": 0.28,
+      "learning_rate": 0.00017013583004418993,
+      "loss": 0.5745,
+      "step": 275
+    },
+    {
+      "epoch": 0.3,
+      "learning_rate": 0.000164140821963114,
+      "loss": 0.5364,
+      "step": 300
+    },
+    {
+      "epoch": 0.3,
+      "eval_loss": 0.3665352761745453,
+      "eval_runtime": 1.7852,
+      "eval_samples_per_second": 2.241,
+      "eval_steps_per_second": 0.56,
+      "step": 300
+    },
+    {
+      "epoch": 0.33,
+      "learning_rate": 0.00015772553890390197,
+      "loss": 0.5693,
+      "step": 325
+    },
+    {
+      "epoch": 0.35,
+      "learning_rate": 0.00015093201623287631,
+      "loss": 0.563,
+      "step": 350
+    },
+    {
+      "epoch": 0.38,
+      "learning_rate": 0.00014380476768566824,
+      "loss": 0.5478,
+      "step": 375
+    },
+    {
+      "epoch": 0.4,
+      "learning_rate": 0.00013639049369634876,
+      "loss": 0.5763,
+      "step": 400
+    },
+    {
+      "epoch": 0.4,
+      "eval_loss": 0.3363753855228424,
+      "eval_runtime": 1.7851,
+      "eval_samples_per_second": 2.241,
+      "eval_steps_per_second": 0.56,
+      "step": 400
+    }
+  ],
+  "logging_steps": 25,
+  "max_steps": 1000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 9223372036854775807,
+  "save_steps": 100,
+  "total_flos": 5.974029194231808e+17,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}

checkpoint-400/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5fcc93ccb3f87a3311bb3baa8290949012608d857aa7e1f8e40c50e3c4f99548
+size 4792

checkpoint-500/README.md ADDED

	@@ -0,0 +1,204 @@

+---
+library_name: peft
+base_model: bigcode/starcoder
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.7.2.dev0

checkpoint-500/adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bigcode/starcoder",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "c_fc",
+    "c_proj",
+    "c_attn",
+    "q_attn"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_rslora": false
+}
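
The `lora_alpha`, `r`, and `use_rslora` fields above together determine how strongly the low-rank update is weighted when it is added to the frozen base weights. The following is a minimal sketch of that arithmetic using only the standard library; the helper name `lora_scaling` is hypothetical and the embedded dictionary is a hand-copied subset of the config, not something loaded from the repository.

```python
import json
import math

# Hand-copied subset of the adapter_config.json fields shown in the diff.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 32,
  "r": 8,
  "use_rslora": false,
  "target_modules": ["c_fc", "c_proj", "c_attn", "q_attn"]
}
""")


def lora_scaling(config: dict) -> float:
    """Return the factor that multiplies the low-rank update B @ A.

    Hypothetical helper for illustration; it mirrors the usual LoRA
    convention rather than calling into the PEFT library.
    """
    alpha = config["lora_alpha"]
    rank = config["r"]
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(rank)
    return alpha / rank


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"LoRA scaling factor: {scaling}")
print(f"Adapted modules: {', '.join(ADAPTER_CONFIG['target_modules'])}")
```

With `lora_alpha` 32 and `r` 8 the update is scaled by 4; raising `r` without raising `lora_alpha` would shrink that factor proportionally.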

checkpoint-500/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3da466246316f1d6c831d2bca7fe024dc139f59f71a0bb8a12a79bf885467db4
+size 55255584
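
The three added lines are a Git LFS pointer, not the weights themselves: a spec version URL, an object ID made of a hash algorithm and digest, and the size of the real file in bytes. As a rough sketch of how such a pointer can be read, the snippet below parses the text shown above; `parse_lfs_pointer` is a hypothetical helper, and the pointer text is copied by hand from the diff.

```python
# Pointer text copied from the diff above (leading "+" diff markers removed).
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3da466246316f1d6c831d2bca7fe024dc139f59f71a0bb8a12a79bf885467db4
size 55255584
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash, and size fields.

    Hypothetical helper for illustration; it assumes each line has the
    form "<key> <value>" and does not validate key ordering.
    """
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["digest"][:12], pointer["size_bytes"])
```

The same parsing applies to the `optimizer.pt`, `rng_state.pth`, and `scheduler.pt` pointers in this checkpoint, which differ only in digest and size.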

checkpoint-500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3d1dd841350fff6cb5403d07db622784417e503895aa7c470dc0bb46de7e37d6
+size 110696954

checkpoint-500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f08f4c8427f0bcf80eabd18dc74ad53a2ae6e85f6226bc2a3da12c0c80968b99
+size 14244

checkpoint-500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0f51b657e3d38a2589f1fc9606eb9bdf1d6b09dd6934a23956cba0003ba32ad8
+size 1064

checkpoint-500/trainer_state.json ADDED

	@@ -0,0 +1,181 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.5,
+  "eval_steps": 100,
+  "global_step": 500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.03,
+      "learning_rate": 0.0001666666666666667,
+      "loss": 0.8745,
+      "step": 25
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 0.00019979028262377118,
+      "loss": 0.8093,
+      "step": 50
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 0.00019893981312363562,
+      "loss": 0.7357,
+      "step": 75
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 0.00019744105246469263,
+      "loss": 0.7535,
+      "step": 100
+    },
+    {
+      "epoch": 0.1,
+      "eval_loss": 0.4003306031227112,
+      "eval_runtime": 1.7839,
+      "eval_samples_per_second": 2.242,
+      "eval_steps_per_second": 0.561,
+      "step": 100
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 0.0001953038210948861,
+      "loss": 0.7249,
+      "step": 125
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 0.00019254212296427044,
+      "loss": 0.7118,
+      "step": 150
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 0.00018917405376582145,
+      "loss": 0.7467,
+      "step": 175
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 0.00018522168236559695,
+      "loss": 0.6714,
+      "step": 200
+    },
+    {
+      "epoch": 0.2,
+      "eval_loss": 0.3684937059879303,
+      "eval_runtime": 1.7853,
+      "eval_samples_per_second": 2.24,
+      "eval_steps_per_second": 0.56,
+      "step": 200
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 0.00018071090619916093,
+      "loss": 0.654,
+      "step": 225
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 0.00017567128158176953,
+      "loss": 0.6392,
+      "step": 250
+    },
+    {
+      "epoch": 0.28,
+      "learning_rate": 0.00017013583004418993,
+      "loss": 0.5745,
+      "step": 275
+    },
+    {
+      "epoch": 0.3,
+      "learning_rate": 0.000164140821963114,
+      "loss": 0.5364,
+      "step": 300
+    },
+    {
+      "epoch": 0.3,
+      "eval_loss": 0.3665352761745453,
+      "eval_runtime": 1.7852,
+      "eval_samples_per_second": 2.241,
+      "eval_steps_per_second": 0.56,
+      "step": 300
+    },
+    {
+      "epoch": 0.33,
+      "learning_rate": 0.00015772553890390197,
+      "loss": 0.5693,
+      "step": 325
+    },
+    {
+      "epoch": 0.35,
+      "learning_rate": 0.00015093201623287631,
+      "loss": 0.563,
+      "step": 350
+    },
+    {
+      "epoch": 0.38,
+      "learning_rate": 0.00014380476768566824,
+      "loss": 0.5478,
+      "step": 375
+    },
+    {
+      "epoch": 0.4,
+      "learning_rate": 0.00013639049369634876,
+      "loss": 0.5763,
+      "step": 400
+    },
+    {
+      "epoch": 0.4,
+      "eval_loss": 0.3363753855228424,
+      "eval_runtime": 1.7851,
+      "eval_samples_per_second": 2.241,
+      "eval_steps_per_second": 0.56,
+      "step": 400
+    },
+    {
+      "epoch": 0.42,
+      "learning_rate": 0.00012873777539848283,
+      "loss": 0.4891,
+      "step": 425
+    },
+    {
+      "epoch": 0.45,
+      "learning_rate": 0.00012089675630312754,
+      "loss": 0.5331,
+      "step": 450
+    },
+    {
+      "epoch": 0.47,
+      "learning_rate": 0.00011291881373954065,
+      "loss": 0.5679,
+      "step": 475
+    },
+    {
+      "epoch": 0.5,
+      "learning_rate": 0.00010485622221144484,
+      "loss": 0.5982,
+      "step": 500
+    },
+    {
+      "epoch": 0.5,
+      "eval_loss": 0.3186224400997162,
+      "eval_runtime": 1.7828,
+      "eval_samples_per_second": 2.244,
+      "eval_steps_per_second": 0.561,
+      "step": 500
+    }
+  ],
+  "logging_steps": 25,
+  "max_steps": 1000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 9223372036854775807,
+  "save_steps": 100,
+  "total_flos": 7.46753649278976e+17,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
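
The `learning_rate` values in `log_history` rise over the first few steps and then fall smoothly, which is consistent with a linear warmup followed by a half-cosine decay over `max_steps`. The sketch below tries to reproduce a few logged values under that reading. The peak rate of 2e-4 and the 30 warmup steps are assumptions inferred by fitting the logged numbers; neither appears in the files shown here, and `lr_at` is a hypothetical helper rather than the trainer's own code.

```python
import math

PEAK_LR = 2e-4      # assumption: inferred from the logged values, not stated in the diff
WARMUP_STEPS = 30   # assumption: inferred from the logged values, not stated in the diff
MAX_STEPS = 1000    # taken from "max_steps" in trainer_state.json

# A few (step, learning_rate) pairs copied from "log_history".
LOGGED = {
    25: 0.0001666666666666667,
    50: 0.00019979028262377118,
    100: 0.00019744105246469263,
    500: 0.00010485622221144484,
}


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step, logged in LOGGED.items():
    predicted = lr_at(step)
    print(f"step {step:4d}  predicted {predicted:.6e}  logged {logged:.6e}")
```

If the assumed constants are right, the predicted and logged columns should agree closely, and the schedule would continue decaying toward zero over the remaining 500 steps.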