First model version


Files changed (11)

README.md +48 -180
config.json +29 -0
generation_config.json +12 -0
model-00001-of-00004.safetensors +3 -0
model-00002-of-00004.safetensors +3 -0
model-00003-of-00004.safetensors +3 -0
model-00004-of-00004.safetensors +3 -0
model.safetensors.index.json +298 -0
special_tokens_map.json +1 -7
trainer_log.jsonl +50 -0
training_args.bin +3 -0

README.md CHANGED

@@ -1,199 +1,67 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+license: llama3
+base_model: meta-llama/Meta-Llama-3-8B-Instruct
+tags:
+- llama-factory
+- generated_from_trainer
+model-index:
+- name: perc_240915
+  results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# perc_240915
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.9368
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 2
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 4
+- total_eval_batch_size: 2
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1.0
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 1.0914        | 0.2222 | 100  | 1.1223          |
+| 1.1722        | 0.4444 | 200  | 1.0550          |
+| 0.9559        | 0.6667 | 300  | 0.9778          |
+| 0.9108        | 0.8889 | 400  | 0.9368          |
+### Framework versions
+- Transformers 4.43.3
+- Pytorch 2.4.0+cu121
+- Datasets 2.20.0
+- Tokenizers 0.19.1
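The batch-size and epoch figures in the new model card can be cross-checked with a little arithmetic. The sketch below recomputes `total_train_batch_size` from the listed hyperparameters and infers the steps per epoch from the first row of the results table. The example count is only an approximation that assumes every optimizer step saw a full batch, and the variable names are illustrative.

```python
# Values copied from the model card's hyperparameter list.
train_batch_size = 1              # per-device batch size
num_devices = 2
gradient_accumulation_steps = 2

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# First row of the results table: step 100 was logged at epoch 0.2222.
steps_per_epoch = round(100 / 0.2222)

# Rough number of training examples, assuming full batches throughout.
approx_train_examples = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, steps_per_epoch, approx_train_examples)
```

The inferred step count agrees with the `total_steps` value of 450 recorded in `trainer_log.jsonl`.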

config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 128000,
+  "eos_token_id": 128009,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 500000.0,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.43.3",
+  "use_cache": false,
+  "vocab_size": 128256
+}
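The dimensions in `config.json` determine the parameter count. The following sketch derives it under the assumption that each decoder layer holds exactly the bias-free `q/k/v/o` projections, the three MLP projections, and two norm vectors (which matches the tensor names in `model.safetensors.index.json` and the `attention_bias: false` / `mlp_bias: false` settings). Variable names are illustrative.

```python
# Values copied from config.json.
hidden_size = 4096
intermediate_size = 14336
num_hidden_layers = 32
num_attention_heads = 32
num_key_value_heads = 8
vocab_size = 128256

# Grouped-query attention: K and V use fewer heads than Q.
head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim

attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # q_proj, o_proj + k_proj, v_proj
mlp = 3 * hidden_size * intermediate_size                             # gate_proj, up_proj, down_proj
norms = 2 * hidden_size                                               # input + post-attention norm
per_layer = attention + mlp + norms

# tie_word_embeddings is false, so embed_tokens and lm_head are separate matrices.
embeddings = 2 * vocab_size * hidden_size
total_params = num_hidden_layers * per_layer + embeddings + hidden_size  # + final model.norm

# torch_dtype is bfloat16: two bytes per parameter.
bytes_bf16 = total_params * 2
print(total_params, bytes_bf16)
```

The byte count computed this way equals the `total_size` of 16060522496 stored in the index metadata.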

generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "bos_token_id": 128000,
+  "do_sample": true,
+  "eos_token_id": [
+    128001,
+    128009
+  ],
+  "max_length": 4096,
+  "temperature": 0.6,
+  "top_p": 0.9,
+  "transformers_version": "4.43.3"
+}
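To illustrate what the `temperature` and `top_p` defaults in `generation_config.json` do when `do_sample` is true, here is a simplified, standard-library sketch of temperature scaling followed by nucleus (top-p) filtering. It is not the Transformers implementation; the function names and the toy logits are assumptions made for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    """Return indices of the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

toy_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical scores for a four-token vocabulary

sharp = softmax_with_temperature(toy_logits, 0.6)  # temperature from the config
flat = softmax_with_temperature(toy_logits, 1.0)   # unscaled, for comparison

kept_sharp = nucleus(sharp, 0.9)  # top_p from the config
kept_flat = nucleus(flat, 0.9)
print(kept_sharp, kept_flat)
```

A temperature below 1 concentrates probability on the leading tokens, so the same `top_p` keeps fewer candidates than it would on the unscaled distribution.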

model-00001-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:95a219b2bc7f89ee379ff0eb7132b8c73c3ea1760891af738591a60b9871328b
+size 4976698672

model-00002-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cca419b99264d13c2995ec6d45b02e29f8dfc3fa4d761af033b186afcd1d1bec
+size 4999802720

model-00003-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64b09c0e5284a5033b06a32fafb1fbe1f5ab62e69a8a3843b96021f6bb525d28
+size 4915916176

model-00004-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5bb97f9443c28a764f498a339f8556a30df3def412df6e96d30eaeb7d09c5587
+size 1168138808
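Each shard is committed as a Git LFS pointer, a three-line text file of `key value` pairs. As a rough sketch, the pointer can be parsed and the four `size` fields compared with the `total_size` recorded in `model.safetensors.index.json`. The helper name `parse_lfs_pointer` is hypothetical, and attributing the leftover bytes to the per-file safetensors headers is an assumption, not something the diff states.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into a {key: value} dictionary."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer contents copied from the fourth shard in this commit.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:5bb97f9443c28a764f498a339f8556a30df3def412df6e96d30eaeb7d09c5587
size 1168138808
"""
pointer = parse_lfs_pointer(pointer_text)

# Sizes of all four shards, copied from their pointers.
shard_sizes = [4976698672, 4999802720, 4915916176, int(pointer["size"])]
index_total_size = 16060522496  # metadata.total_size in the index file

on_disk = sum(shard_sizes)
overhead = on_disk - index_total_size  # presumably file headers, not tensor data
print(on_disk, overhead)
```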

model.safetensors.index.json ADDED

	@@ -0,0 +1,298 @@

+{
+  "metadata": {
+    "total_size": 16060522496
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00004-of-00004.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.norm.weight": "model-00004-of-00004.safetensors"
+  }
+}
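
The index entries above map each tensor name to the shard file that stores it. As a minimal sketch, assuming the file uses the usual top-level `weight_map` key (not visible in this excerpt), the mapping can be inverted to see what each shard holds. The helper name `tensors_by_shard` is hypothetical, and the inline JSON is a three-entry excerpt rather than the full index.

```python
import json
from collections import defaultdict

# Three entries copied from the diff above, wrapped in an assumed
# top-level "weight_map" key. The real index has many more entries.
index_text = """
{
  "weight_map": {
    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.norm.weight": "model-00004-of-00004.safetensors"
  }
}
"""


def tensors_by_shard(index):
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for name, shard in index["weight_map"].items():
        shards[shard].append(name)
    return dict(shards)


index = json.loads(index_text)
grouped = tensors_by_shard(index)
for shard in sorted(grouped):
    # Print each shard file with the number of tensors it holds.
    print(shard, len(grouped[shard]))
```

To run this against the real file, replace `index_text` with the contents of `model.safetensors.index.json`.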

special_tokens_map.json CHANGED

@@ -13,11 +13,5 @@
     "rstrip": false,
     "single_word": false
   },
-  "pad_token": {
-    "content": "<|eot_id|>",
-    "lstrip": false,
-    "normalized": false,
-    "rstrip": false,
-    "single_word": false
-  }
+  "pad_token": "<|eot_id|>"
 }
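
This change replaces the `pad_token` object with a plain string carrying the same content. Code that reads `special_tokens_map.json` directly may therefore meet either form. The following is a minimal sketch of a reader that accepts both; the helper name `pad_token_text` is hypothetical and not part of any library.

```python
import json

# The two forms of "pad_token" seen in the diff: the removed object form
# and the added plain-string form.
old_form = json.loads(
    '{"pad_token": {"content": "<|eot_id|>", "lstrip": false, '
    '"normalized": false, "rstrip": false, "single_word": false}}'
)
new_form = json.loads('{"pad_token": "<|eot_id|>"}')


def pad_token_text(special_tokens_map):
    """Return the pad token string from either the object or the string form."""
    value = special_tokens_map.get("pad_token")
    if isinstance(value, dict):
        return value.get("content")
    return value


print(pad_token_text(old_form), pad_token_text(new_form))
```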

trainer_log.jsonl ADDED

@@ -0,0 +1,50 @@

+{"current_steps": 10, "total_steps": 450, "loss": 1.3973, "learning_rate": 2.222222222222222e-06, "epoch": 0.022222222222222223, "percentage": 2.22, "elapsed_time": "0:00:25", "remaining_time": "0:18:35", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 20, "total_steps": 450, "loss": 1.0847, "learning_rate": 4.444444444444444e-06, "epoch": 0.044444444444444446, "percentage": 4.44, "elapsed_time": "0:00:49", "remaining_time": "0:17:45", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 30, "total_steps": 450, "loss": 1.0589, "learning_rate": 6.666666666666667e-06, "epoch": 0.06666666666666667, "percentage": 6.67, "elapsed_time": "0:01:13", "remaining_time": "0:17:13", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 40, "total_steps": 450, "loss": 1.0995, "learning_rate": 8.888888888888888e-06, "epoch": 0.08888888888888889, "percentage": 8.89, "elapsed_time": "0:01:38", "remaining_time": "0:16:44", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 50, "total_steps": 450, "loss": 1.0963, "learning_rate": 9.996239762521152e-06, "epoch": 0.1111111111111111, "percentage": 11.11, "elapsed_time": "0:02:02", "remaining_time": "0:16:17", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 60, "total_steps": 450, "loss": 1.0545, "learning_rate": 9.966191788709716e-06, "epoch": 0.13333333333333333, "percentage": 13.33, "elapsed_time": "0:02:26", "remaining_time": "0:15:51", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 70, "total_steps": 450, "loss": 1.1422, "learning_rate": 9.906276553136924e-06, "epoch": 0.15555555555555556, "percentage": 15.56, "elapsed_time": "0:02:50", "remaining_time": "0:15:26", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 80, "total_steps": 450, "loss": 1.1892, "learning_rate": 9.816854393079402e-06, "epoch": 0.17777777777777778, "percentage": 17.78, "elapsed_time": "0:03:14", "remaining_time": "0:15:01", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 90, "total_steps": 450, "loss": 1.1044, "learning_rate": 9.698463103929542e-06, "epoch": 0.2, "percentage": 20.0, "elapsed_time": "0:03:39", "remaining_time": "0:14:36", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 100, "total_steps": 450, "loss": 1.0914, "learning_rate": 9.551814704830734e-06, "epoch": 0.2222222222222222, "percentage": 22.22, "elapsed_time": "0:04:03", "remaining_time": "0:14:11", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 100, "total_steps": 450, "eval_loss": 1.1223397254943848, "epoch": 0.2222222222222222, "percentage": 22.22, "elapsed_time": "0:04:40", "remaining_time": "0:16:21", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 110, "total_steps": 450, "loss": 1.1131, "learning_rate": 9.377791156510456e-06, "epoch": 0.24444444444444444, "percentage": 24.44, "elapsed_time": "0:05:04", "remaining_time": "0:15:41", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 120, "total_steps": 450, "loss": 1.0948, "learning_rate": 9.177439057064684e-06, "epoch": 0.26666666666666666, "percentage": 26.67, "elapsed_time": "0:05:28", "remaining_time": "0:15:04", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 130, "total_steps": 450, "loss": 1.1654, "learning_rate": 8.951963347593797e-06, "epoch": 0.28888888888888886, "percentage": 28.89, "elapsed_time": "0:05:53", "remaining_time": "0:14:29", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 140, "total_steps": 450, "loss": 1.0162, "learning_rate": 8.702720065545024e-06, "epoch": 0.3111111111111111, "percentage": 31.11, "elapsed_time": "0:06:17", "remaining_time": "0:13:55", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 150, "total_steps": 450, "loss": 1.1379, "learning_rate": 8.43120818934367e-06, "epoch": 0.3333333333333333, "percentage": 33.33, "elapsed_time": "0:06:41", "remaining_time": "0:13:23", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 160, "total_steps": 450, "loss": 1.1389, "learning_rate": 8.139060623360494e-06, "epoch": 0.35555555555555557, "percentage": 35.56, "elapsed_time": "0:07:05", "remaining_time": "0:12:52", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 170, "total_steps": 450, "loss": 1.066, "learning_rate": 7.828034377432694e-06, "epoch": 0.37777777777777777, "percentage": 37.78, "elapsed_time": "0:07:30", "remaining_time": "0:12:21", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 180, "total_steps": 450, "loss": 1.0525, "learning_rate": 7.500000000000001e-06, "epoch": 0.4, "percentage": 40.0, "elapsed_time": "0:07:54", "remaining_time": "0:11:51", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 190, "total_steps": 450, "loss": 1.0559, "learning_rate": 7.156930328406268e-06, "epoch": 0.4222222222222222, "percentage": 42.22, "elapsed_time": "0:08:18", "remaining_time": "0:11:22", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 200, "total_steps": 450, "loss": 1.1722, "learning_rate": 6.800888624023552e-06, "epoch": 0.4444444444444444, "percentage": 44.44, "elapsed_time": "0:08:42", "remaining_time": "0:10:53", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 200, "total_steps": 450, "eval_loss": 1.0550415515899658, "epoch": 0.4444444444444444, "percentage": 44.44, "elapsed_time": "0:09:19", "remaining_time": "0:11:39", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 210, "total_steps": 450, "loss": 1.1375, "learning_rate": 6.434016163555452e-06, "epoch": 0.4666666666666667, "percentage": 46.67, "elapsed_time": "0:09:43", "remaining_time": "0:11:07", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 220, "total_steps": 450, "loss": 1.1109, "learning_rate": 6.058519361147055e-06, "epoch": 0.4888888888888889, "percentage": 48.89, "elapsed_time": "0:10:08", "remaining_time": "0:10:35", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 230, "total_steps": 450, "loss": 1.0671, "learning_rate": 5.6766564987506564e-06, "epoch": 0.5111111111111111, "percentage": 51.11, "elapsed_time": "0:10:32", "remaining_time": "0:10:04", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 240, "total_steps": 450, "loss": 1.0387, "learning_rate": 5.290724144552379e-06, "epoch": 0.5333333333333333, "percentage": 53.33, "elapsed_time": "0:10:56", "remaining_time": "0:09:34", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 250, "total_steps": 450, "loss": 1.0224, "learning_rate": 4.903043341140879e-06, "epoch": 0.5555555555555556, "percentage": 55.56, "elapsed_time": "0:11:20", "remaining_time": "0:09:04", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 260, "total_steps": 450, "loss": 1.1089, "learning_rate": 4.515945646484105e-06, "epoch": 0.5777777777777777, "percentage": 57.78, "elapsed_time": "0:11:45", "remaining_time": "0:08:35", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 270, "total_steps": 450, "loss": 0.9636, "learning_rate": 4.131759111665349e-06, "epoch": 0.6, "percentage": 60.0, "elapsed_time": "0:12:09", "remaining_time": "0:08:06", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 280, "total_steps": 450, "loss": 1.0456, "learning_rate": 3.752794279710094e-06, "epoch": 0.6222222222222222, "percentage": 62.22, "elapsed_time": "0:12:33", "remaining_time": "0:07:37", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 290, "total_steps": 450, "loss": 0.8803, "learning_rate": 3.3813302897083955e-06, "epoch": 0.6444444444444445, "percentage": 64.44, "elapsed_time": "0:12:57", "remaining_time": "0:07:09", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 300, "total_steps": 450, "loss": 0.9559, "learning_rate": 3.019601169804216e-06, "epoch": 0.6666666666666666, "percentage": 66.67, "elapsed_time": "0:13:22", "remaining_time": "0:06:41", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 300, "total_steps": 450, "eval_loss": 0.9777525067329407, "epoch": 0.6666666666666666, "percentage": 66.67, "elapsed_time": "0:13:58", "remaining_time": "0:06:59", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 310, "total_steps": 450, "loss": 0.9671, "learning_rate": 2.6697824014873076e-06, "epoch": 0.6888888888888889, "percentage": 68.89, "elapsed_time": "0:14:23", "remaining_time": "0:06:29", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 320, "total_steps": 450, "loss": 0.9442, "learning_rate": 2.333977835991545e-06, "epoch": 0.7111111111111111, "percentage": 71.11, "elapsed_time": "0:14:47", "remaining_time": "0:06:00", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 330, "total_steps": 450, "loss": 0.9096, "learning_rate": 2.0142070414860704e-06, "epoch": 0.7333333333333333, "percentage": 73.33, "elapsed_time": "0:15:11", "remaining_time": "0:05:31", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 340, "total_steps": 450, "loss": 0.9629, "learning_rate": 1.7123931571546826e-06, "epoch": 0.7555555555555555, "percentage": 75.56, "elapsed_time": "0:15:35", "remaining_time": "0:05:02", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 350, "total_steps": 450, "loss": 1.0907, "learning_rate": 1.4303513272105057e-06, "epoch": 0.7777777777777778, "percentage": 77.78, "elapsed_time": "0:16:00", "remaining_time": "0:04:34", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 360, "total_steps": 450, "loss": 0.9574, "learning_rate": 1.1697777844051105e-06, "epoch": 0.8, "percentage": 80.0, "elapsed_time": "0:16:24", "remaining_time": "0:04:06", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 370, "total_steps": 450, "loss": 0.9702, "learning_rate": 9.322396486851626e-07, "epoch": 0.8222222222222222, "percentage": 82.22, "elapsed_time": "0:16:48", "remaining_time": "0:03:38", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 380, "total_steps": 450, "loss": 0.9496, "learning_rate": 7.191655023486682e-07, "epoch": 0.8444444444444444, "percentage": 84.44, "elapsed_time": "0:17:12", "remaining_time": "0:03:10", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 390, "total_steps": 450, "loss": 0.8602, "learning_rate": 5.318367983829393e-07, "epoch": 0.8666666666666667, "percentage": 86.67, "elapsed_time": "0:17:36", "remaining_time": "0:02:42", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 400, "total_steps": 450, "loss": 0.9108, "learning_rate": 3.7138015365554834e-07, "epoch": 0.8888888888888888, "percentage": 88.89, "elapsed_time": "0:18:01", "remaining_time": "0:02:15", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 400, "total_steps": 450, "eval_loss": 0.9367556571960449, "epoch": 0.8888888888888888, "percentage": 88.89, "elapsed_time": "0:18:37", "remaining_time": "0:02:19", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 410, "total_steps": 450, "loss": 0.9506, "learning_rate": 2.3876057330792344e-07, "epoch": 0.9111111111111111, "percentage": 91.11, "elapsed_time": "0:19:02", "remaining_time": "0:01:51", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 420, "total_steps": 450, "loss": 0.8663, "learning_rate": 1.3477564710088097e-07, "epoch": 0.9333333333333333, "percentage": 93.33, "elapsed_time": "0:19:26", "remaining_time": "0:01:23", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 430, "total_steps": 450, "loss": 0.9494, "learning_rate": 6.005075261595495e-08, "epoch": 0.9555555555555556, "percentage": 95.56, "elapsed_time": "0:19:50", "remaining_time": "0:00:55", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 440, "total_steps": 450, "loss": 0.9524, "learning_rate": 1.5035294161039882e-08, "epoch": 0.9777777777777777, "percentage": 97.78, "elapsed_time": "0:20:14", "remaining_time": "0:00:27", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 450, "total_steps": 450, "loss": 0.9986, "learning_rate": 0.0, "epoch": 1.0, "percentage": 100.0, "elapsed_time": "0:20:39", "remaining_time": "0:00:00", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 450, "total_steps": 450, "epoch": 1.0, "percentage": 100.0, "elapsed_time": "0:20:39", "remaining_time": "0:00:00", "throughput": "0.00", "total_tokens": 0}
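
The log mixes training records (with `loss` and `learning_rate`) and evaluation records (with `eval_loss`). Its final record carries neither field. The sketch below separates the two kinds and compares the logged learning rates with a linear-warmup plus cosine-decay schedule. The 45 warmup steps and the 1e-5 peak are assumptions fitted to the logged values, not settings stated anywhere in this commit. The function names are hypothetical.

```python
import json
import math

# Three records copied from the log above, trimmed to the fields used here.
log_lines = [
    '{"current_steps": 40, "total_steps": 450, "loss": 1.0995, "learning_rate": 8.888888888888888e-06}',
    '{"current_steps": 100, "total_steps": 450, "eval_loss": 1.1223397254943848}',
    '{"current_steps": 180, "total_steps": 450, "loss": 1.0525, "learning_rate": 7.500000000000001e-06}',
]


def split_records(lines):
    """Separate training records from evaluation records."""
    train, evals = [], []
    for line in lines:
        rec = json.loads(line)
        if "eval_loss" in rec:
            evals.append(rec)
        elif "loss" in rec:
            train.append(rec)
        # Records with neither field (such as the final summary line) are skipped.
    return train, evals


def warmup_cosine_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


train, evals = split_records(log_lines)

# ASSUMPTION: warmup_steps=45 and peak_lr=1e-5 are inferred from the log.
for rec in train:
    predicted = warmup_cosine_lr(rec["current_steps"], rec["total_steps"], 45, 1e-5)
    print(rec["current_steps"], rec["learning_rate"], predicted)
```

If the assumed schedule is right, the logged and predicted values in each printed row should agree to within floating-point rounding.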

training_args.bin ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:97af68fde5e549efc71c6ac8abce937f0c54736390f064509503d67df45fd626
+size 7160