srsawant34 committed on
Commit
07e9dff
1 Parent(s): 782236b

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. README.md +108 -39
  2. checkpoint-6027/model.safetensors +1 -1
  3. checkpoint-6027/optimizer.pt +1 -1
  4. checkpoint-6027/rng_state.pth +1 -1
  5. checkpoint-6027/trainer_state.json +21 -21
  6. checkpoint-6027/training_args.bin +1 -1
  7. checkpoint-6314/model.safetensors +1 -1
  8. checkpoint-6314/optimizer.pt +1 -1
  9. checkpoint-6314/rng_state.pth +1 -1
  10. checkpoint-6314/trainer_state.json +22 -22
  11. checkpoint-6314/training_args.bin +1 -1
  12. checkpoint-6601/model.safetensors +1 -1
  13. checkpoint-6601/optimizer.pt +1 -1
  14. checkpoint-6601/rng_state.pth +1 -1
  15. checkpoint-6601/trainer_state.json +23 -23
  16. checkpoint-6601/training_args.bin +1 -1
  17. checkpoint-6888/model.safetensors +1 -1
  18. checkpoint-6888/optimizer.pt +1 -1
  19. checkpoint-6888/rng_state.pth +1 -1
  20. checkpoint-6888/trainer_state.json +24 -24
  21. checkpoint-6888/training_args.bin +1 -1
  22. checkpoint-7175/model.safetensors +1 -1
  23. checkpoint-7175/optimizer.pt +1 -1
  24. checkpoint-7175/rng_state.pth +1 -1
  25. checkpoint-7175/trainer_state.json +25 -25
  26. checkpoint-7175/training_args.bin +1 -1
  27. checkpoint-7462/model.safetensors +1 -1
  28. checkpoint-7462/optimizer.pt +1 -1
  29. checkpoint-7462/rng_state.pth +1 -1
  30. checkpoint-7462/trainer_state.json +26 -26
  31. checkpoint-7462/training_args.bin +1 -1
  32. checkpoint-7749/model.safetensors +1 -1
  33. checkpoint-7749/optimizer.pt +1 -1
  34. checkpoint-7749/rng_state.pth +1 -1
  35. checkpoint-7749/trainer_state.json +27 -27
  36. checkpoint-7749/training_args.bin +1 -1
  37. checkpoint-8036/model.safetensors +1 -1
  38. checkpoint-8036/optimizer.pt +1 -1
  39. checkpoint-8036/rng_state.pth +1 -1
  40. checkpoint-8036/trainer_state.json +28 -28
  41. checkpoint-8036/training_args.bin +1 -1
  42. checkpoint-8323/model.safetensors +1 -1
  43. checkpoint-8323/optimizer.pt +1 -1
  44. checkpoint-8323/rng_state.pth +1 -1
  45. checkpoint-8323/trainer_state.json +29 -29
  46. checkpoint-8323/training_args.bin +1 -1
  47. checkpoint-8610/model.safetensors +1 -1
  48. checkpoint-8610/optimizer.pt +1 -1
  49. checkpoint-8610/rng_state.pth +1 -1
  50. checkpoint-8610/trainer_state.json +30 -30
README.md CHANGED
@@ -1,21 +1,41 @@
1
  ---
2
  pipeline_tag: sentence-similarity
3
- license: apache-2.0
4
  tags:
5
  - sentence-transformers
6
  - feature-extraction
7
  - sentence-similarity
8
- - transformers
9
  ---
10
 
11
- # sentence-transformers/paraphrase-MiniLM-L6-v2
12
 
 
13
  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.
14
 
15
-
16
-
17
  ## Usage (Sentence-Transformers)
18
-
19
  Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
20
 
21
  ```
@@ -23,25 +43,22 @@ pip install -U sentence-transformers
23
  ```
24
 
25
  Then you can use the model like this:
26
-
27
  ```python
28
  from sentence_transformers import SentenceTransformer
29
  sentences = ["This is an example sentence", "Each sentence is converted"]
30
 
31
- model = SentenceTransformer('sentence-transformers/paraphrase-MiniLM-L6-v2')
32
  embeddings = model.encode(sentences)
33
  print(embeddings)
34
  ```
35
 
36
-
37
-
38
  ## Usage (HuggingFace Transformers)
39
  Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings.
40
 
41
  ```python
42
  from transformers import AutoTokenizer, AutoModel
43
  import torch
44
-
45
 
46
  #Mean Pooling - Take attention mask into account for correct averaging
47
  def mean_pooling(model_output, attention_mask):
@@ -54,8 +71,8 @@ def mean_pooling(model_output, attention_mask):
54
  sentences = ['This is an example sentence', 'Each sentence is converted']
55
 
56
  # Load model from HuggingFace Hub
57
- tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/paraphrase-MiniLM-L6-v2')
58
- model = AutoModel.from_pretrained('sentence-transformers/paraphrase-MiniLM-L6-v2')
59
 
60
  # Tokenize sentences
61
  encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
@@ -64,44 +81,96 @@ encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tenso
64
  with torch.no_grad():
65
      model_output = model(**encoded_input)
66
 
67
- # Perform pooling. In this case, max pooling.
68
  sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
69
 
 
 
 
70
  print("Sentence embeddings:")
71
  print(sentence_embeddings)
72
  ```
73
 
 
74
 
 
75
 
76
- ## Evaluation Results
77
 
 
78
 
 
 
 
79
 
80
- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=sentence-transformers/paraphrase-MiniLM-L6-v2)
 
 
 
81
 
 
82

83
 
84
- ## Full Model Architecture
85
- ```
86
- SentenceTransformer(
87
- (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
88
- (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
89
- )
90
- ```
91
 
92
- ## Citing & Authors
93
-
94
- This model was trained by [sentence-transformers](https://www.sbert.net/).
95
-
96
- If you find this model helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):
97
- ```bibtex
98
- @inproceedings{reimers-2019-sentence-bert,
99
- title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
100
- author = "Reimers, Nils and Gurevych, Iryna",
101
- booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
102
- month = "11",
103
- year = "2019",
104
- publisher = "Association for Computational Linguistics",
105
- url = "http://arxiv.org/abs/1908.10084",
106
- }
107
- ```
1
  ---
2
  pipeline_tag: sentence-similarity
 
3
  tags:
4
  - sentence-transformers
5
  - feature-extraction
6
  - sentence-similarity
7
+ language: en
8
+ license: apache-2.0
9
+ datasets:
10
+ - s2orc
11
+ - flax-sentence-embeddings/stackexchange_xml
12
+ - ms_marco
13
+ - gooaq
14
+ - yahoo_answers_topics
15
+ - code_search_net
16
+ - search_qa
17
+ - eli5
18
+ - snli
19
+ - multi_nli
20
+ - wikihow
21
+ - natural_questions
22
+ - trivia_qa
23
+ - embedding-data/sentence-compression
24
+ - embedding-data/flickr30k-captions
25
+ - embedding-data/altlex
26
+ - embedding-data/simple-wiki
27
+ - embedding-data/QQP
28
+ - embedding-data/SPECTER
29
+ - embedding-data/PAQ_pairs
30
+ - embedding-data/WikiAnswers
31
+
32
  ---
33
 
 
34
 
35
+ # all-MiniLM-L6-v2
36
  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.
37
 
 
 
38
  ## Usage (Sentence-Transformers)
 
39
  Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
40
 
41
  ```
 
43
  ```
44
 
45
  Then you can use the model like this:
 
46
  ```python
47
  from sentence_transformers import SentenceTransformer
48
  sentences = ["This is an example sentence", "Each sentence is converted"]
49
 
50
+ model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
51
  embeddings = model.encode(sentences)
52
  print(embeddings)
53
  ```
54
 
 
 
55
  ## Usage (HuggingFace Transformers)
56
  Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings.
57
 
58
  ```python
59
  from transformers import AutoTokenizer, AutoModel
60
  import torch
61
+ import torch.nn.functional as F
62
 
63
  #Mean Pooling - Take attention mask into account for correct averaging
64
  def mean_pooling(model_output, attention_mask):
 
71
  sentences = ['This is an example sentence', 'Each sentence is converted']
72
 
73
  # Load model from HuggingFace Hub
74
+ tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
75
+ model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
76
 
77
  # Tokenize sentences
78
  encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
 
81
  with torch.no_grad():
82
      model_output = model(**encoded_input)
83
 
84
+ # Perform pooling
85
  sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
86
 
87
+ # Normalize embeddings
88
+ sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
89
+
90
  print("Sentence embeddings:")
91
  print(sentence_embeddings)
92
  ```
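
The diff hunk above omits the body of `mean_pooling`. As a rough, dependency-free sketch of what attention-mask-aware mean pooling followed by L2 normalization computes, the example below uses plain Python lists in place of tensors. The helper names `mean_pool` and `l2_normalize` are illustrative and do not come from the README.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in summed]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(value * value for value in vector))
    return vector if norm == 0 else [value / norm for value in vector]


# Toy input: three token positions, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # the padding row is excluded from the average
embedding = l2_normalize(pooled)      # unit-length sentence vector
print(pooled, embedding)
```

Because the padding row is masked out, its large values have no effect on the pooled vector. After normalization, the dot product of two embeddings equals their cosine similarity.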
93
 
94
+ ## Evaluation Results
95
 
96
+ For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=sentence-transformers/all-MiniLM-L6-v2)
97
 
98
+ ------
99
 
100
+ ## Background
101
 
102
+ The project aims to train sentence embedding models on very large sentence-level datasets using a self-supervised
103
+ contrastive learning objective. We used the pretrained [`nreimers/MiniLM-L6-H384-uncased`](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) model and fine-tuned it on a
104
+ 1B sentence-pair dataset. We use a contrastive learning objective: given one sentence from a pair, the model should predict which sentence, out of a set of randomly sampled other sentences, was actually paired with it in our dataset.
105
 
106
+ We developed this model during the
107
+ [Community week using JAX/Flax for NLP & CV](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104),
108
+ organized by Hugging Face. We developed this model as part of the project:
109
+ [Train the Best Sentence Embedding Model Ever with 1B Training Pairs](https://discuss.huggingface.co/t/train-the-best-sentence-embedding-model-ever-with-1b-training-pairs/7354). We benefited from efficient hardware infrastructure to run the project: 7 TPU v3-8s, as well as help from members of Google's Flax, JAX, and Cloud teams on efficient deep learning frameworks.
110
 
111
+ ## Intended uses
112
 
113
+ Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector that captures
114
+ the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks.
115
+
116
+ By default, input text longer than 256 word pieces is truncated.
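
As an illustration of the retrieval use case, here is a minimal sketch that ranks candidate vectors by cosine similarity to a query vector. The three-dimensional vectors are made-up stand-ins for real 384-dimensional embeddings, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


query = [1.0, 0.0, 1.0]
corpus = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_by_similarity(query, corpus)
print(ranking)
```

Cosine similarity ignores vector length, so the second candidate ranks first even though it is twice as long as the query.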
117
+
118
+
119
+ ## Training procedure
120
+
121
+ ### Pre-training
122
+
123
+ We use the pretrained [`nreimers/MiniLM-L6-H384-uncased`](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) model. Please refer to the model card for more detailed information about the pre-training procedure.
124
+
125
+ ### Fine-tuning
126
+
127
+ We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity for each possible sentence pair in the batch.
128
+ We then apply the cross-entropy loss by comparing with the true pairs.
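
A dependency-free sketch of this objective, under stated assumptions: `anchors[i]` and `positives[i]` form the true pair, every other `positives[j]` in the batch acts as a negative, and similarities are multiplied by a `scale` factor before the softmax. The value `scale=20.0` and the function name `in_batch_contrastive_loss` are assumptions for illustration and are not stated in the README.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_contrastive_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # One logit per candidate in the batch (assumed scale factor).
        logits = [scale * cosine_similarity(anchor, pos) for pos in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
        # Negative log-probability assigned to the true pair.
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives sit next to the wrong anchor

loss_aligned = in_batch_contrastive_loss(anchors, aligned)
loss_swapped = in_batch_contrastive_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

The loss is near zero when each sentence is most similar to its true partner and grows when another sentence in the batch scores higher.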
129
+
130
+ #### Hyperparameters
131
+
132
+ We trained our model on a TPU v3-8 for 100k steps using a batch size of 1024 (128 per TPU core).
133
+ We use a learning rate warm-up of 500 steps. The sequence length was limited to 128 tokens. We used the AdamW optimizer with
134
+ a 2e-5 learning rate. The full training script is available in this repository: `train_script.py`.
135
+
136
+ #### Training data
137
+
138
+ We use the concatenation of multiple datasets to fine-tune our model. The total number of sentence pairs is above 1 billion.
139
+ We sampled each dataset with a weighted probability; the configuration is detailed in the `data_config.json` file.
140

141
 
142
+ | Dataset | Paper | Number of training tuples |
143
+ |--------------------------------------------------------|:----------------------------------------:|:--------------------------:|
144
+ | [Reddit comments (2015-2018)](https://github.com/PolyAI-LDN/conversational-datasets/tree/master/reddit) | [paper](https://arxiv.org/abs/1904.06472) | 726,484,430 |
145
+ | [S2ORC](https://github.com/allenai/s2orc) Citation pairs (Abstracts) | [paper](https://aclanthology.org/2020.acl-main.447/) | 116,288,806 |
146
+ | [WikiAnswers](https://github.com/afader/oqa#wikianswers-corpus) Duplicate question pairs | [paper](https://doi.org/10.1145/2623330.2623677) | 77,427,422 |
147
+ | [PAQ](https://github.com/facebookresearch/PAQ) (Question, Answer) pairs | [paper](https://arxiv.org/abs/2102.07033) | 64,371,441 |
148
+ | [S2ORC](https://github.com/allenai/s2orc) Citation pairs (Titles) | [paper](https://aclanthology.org/2020.acl-main.447/) | 52,603,982 |
149
+ | [S2ORC](https://github.com/allenai/s2orc) (Title, Abstract) | [paper](https://aclanthology.org/2020.acl-main.447/) | 41,769,185 |
150
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title, Body) pairs | - | 25,316,456 |
151
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title+Body, Answer) pairs | - | 21,396,559 |
152
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) (Title, Answer) pairs | - | 21,396,559 |
153
+ | [MS MARCO](https://microsoft.github.io/msmarco/) triplets | [paper](https://doi.org/10.1145/3404835.3462804) | 9,144,553 |
154
+ | [GOOAQ: Open Question Answering with Diverse Answer Types](https://github.com/allenai/gooaq) | [paper](https://arxiv.org/pdf/2104.08727.pdf) | 3,012,496 |
155
+ | [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Title, Answer) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 1,198,260 |
156
+ | [Code Search](https://huggingface.co/datasets/code_search_net) | - | 1,151,414 |
157
+ | [COCO](https://cocodataset.org/#home) Image captions | [paper](https://link.springer.com/chapter/10.1007%2F978-3-319-10602-1_48) | 828,395 |
158
+ | [SPECTER](https://github.com/allenai/specter) citation triplets | [paper](https://doi.org/10.18653/v1/2020.acl-main.207) | 684,100 |
159
+ | [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Question, Answer) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 681,164 |
160
+ | [Yahoo Answers](https://www.kaggle.com/soumikrakshit/yahoo-answers-dataset) (Title, Question) | [paper](https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html) | 659,896 |
161
+ | [SearchQA](https://huggingface.co/datasets/search_qa) | [paper](https://arxiv.org/abs/1704.05179) | 582,261 |
162
+ | [Eli5](https://huggingface.co/datasets/eli5) | [paper](https://doi.org/10.18653/v1/p19-1346) | 325,475 |
163
+ | [Flickr 30k](https://shannon.cs.illinois.edu/DenotationGraph/) | [paper](https://transacl.org/ojs/index.php/tacl/article/view/229/33) | 317,695 |
164
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (titles) | - | 304,525 |
165
+ | AllNLI ([SNLI](https://nlp.stanford.edu/projects/snli/) and [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/)) | [paper SNLI](https://doi.org/10.18653/v1/d15-1075), [paper MultiNLI](https://doi.org/10.18653/v1/n18-1101) | 277,230 |
166
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (bodies) | - | 250,519 |
167
+ | [Stack Exchange](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_xml) Duplicate questions (titles+bodies) | - | 250,460 |
168
+ | [Sentence Compression](https://github.com/google-research-datasets/sentence-compression) | [paper](https://www.aclweb.org/anthology/D13-1155/) | 180,000 |
169
+ | [Wikihow](https://github.com/pvl/wikihow_pairs_dataset) | [paper](https://arxiv.org/abs/1810.09305) | 128,542 |
170
+ | [Altlex](https://github.com/chridey/altlex/) | [paper](https://aclanthology.org/P16-1135.pdf) | 112,696 |
171
+ | [Quora Question Triplets](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs) | - | 103,663 |
172
+ | [Simple Wikipedia](https://cs.pomona.edu/~dkauchak/simplification/) | [paper](https://www.aclweb.org/anthology/P11-2117/) | 102,225 |
173
+ | [Natural Questions (NQ)](https://ai.google.com/research/NaturalQuestions) | [paper](https://transacl.org/ojs/index.php/tacl/article/view/1455) | 100,231 |
174
+ | [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/) | [paper](https://aclanthology.org/P18-2124.pdf) | 87,599 |
175
+ | [TriviaQA](https://huggingface.co/datasets/trivia_qa) | - | 73,346 |
176
+ | **Total** | | **1,170,060,424** |
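
Weighted sampling across datasets can be sketched as follows. The schema of `data_config.json` is not shown in this diff, so the `weights` mapping is hypothetical: it uses a few tuple counts from the table above as sampling weights, which may differ from the weights actually used.

```python
import random

# Hypothetical weights: tuple counts for three rows of the table above.
weights = {
    "reddit_comments": 726_484_430,
    "s2orc_citation_abstracts": 116_288_806,
    "wikianswers_duplicates": 77_427_422,
}


def sample_dataset_names(weights, num_draws, seed=0):
    """Draw dataset names with probability proportional to their weight."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same draws
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=num_draws)


draws = sample_dataset_names(weights, num_draws=10_000)
for name in weights:
    # Empirical share of draws per dataset.
    print(name, draws.count(name) / len(draws))
```

Each training batch would then be taken from the dataset whose name was drawn, so larger datasets contribute proportionally more batches.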
checkpoint-6027/model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4733fd7d4f13b14f19647c36ba4fa4454d7f3de192d83b1afb82511a84823e23
3
  size 90866120
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:20f54df4e8c9bd1866a8ac7adca8e778928f22e625414141a00dd280c85a20bf
3
  size 90866120
checkpoint-6027/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:c886a67f3900eef2655cca574218b61d5a1cd40487a321e085fec5a326ae7540
3
  size 180607738
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5f79ccb443804e2bba2882215d428bb3b09c351e80f0ff1716e374562ceb1a13
3
  size 180607738
checkpoint-6027/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:8c8132ebefa53f250b44c875ba8e0ff411e4d800e0e9e5925eb8ae5a49fcd489
3
  size 14244
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2940e9c82b14997acafbac7306669587243bb5d36e9e8963c06bf27bc987b68a
3
  size 14244
checkpoint-6027/trainer_state.json CHANGED
@@ -11,127 +11,127 @@
11
  {
12
  "epoch": 1.0,
13
  "learning_rate": 0.000484375,
14
- "loss": 3.0823,
15
  "step": 287
16
  },
17
  {
18
  "epoch": 2.0,
19
  "learning_rate": 0.00046875,
20
- "loss": 2.7242,
21
  "step": 574
22
  },
23
  {
24
  "epoch": 3.0,
25
  "learning_rate": 0.000453125,
26
- "loss": 2.5348,
27
  "step": 861
28
  },
29
  {
30
  "epoch": 4.0,
31
  "learning_rate": 0.0004375,
32
- "loss": 2.4455,
33
  "step": 1148
34
  },
35
  {
36
  "epoch": 5.0,
37
  "learning_rate": 0.000421875,
38
- "loss": 2.3794,
39
  "step": 1435
40
  },
41
  {
42
  "epoch": 6.0,
43
  "learning_rate": 0.00040625000000000004,
44
- "loss": 2.3375,
45
  "step": 1722
46
  },
47
  {
48
  "epoch": 7.0,
49
  "learning_rate": 0.000390625,
50
- "loss": 2.3262,
51
  "step": 2009
52
  },
53
  {
54
  "epoch": 8.0,
55
  "learning_rate": 0.000375,
56
- "loss": 2.3114,
57
  "step": 2296
58
  },
59
  {
60
  "epoch": 9.0,
61
  "learning_rate": 0.000359375,
62
- "loss": 2.2921,
63
  "step": 2583
64
  },
65
  {
66
  "epoch": 10.0,
67
  "learning_rate": 0.00034375,
68
- "loss": 2.2918,
69
  "step": 2870
70
  },
71
  {
72
  "epoch": 11.0,
73
  "learning_rate": 0.000328125,
74
- "loss": 2.2578,
75
  "step": 3157
76
  },
77
  {
78
  "epoch": 12.0,
79
  "learning_rate": 0.0003125,
80
- "loss": 2.2693,
81
  "step": 3444
82
  },
83
  {
84
  "epoch": 13.0,
85
  "learning_rate": 0.000296875,
86
- "loss": 2.2594,
87
  "step": 3731
88
  },
89
  {
90
  "epoch": 14.0,
91
  "learning_rate": 0.00028125000000000003,
92
- "loss": 2.2555,
93
  "step": 4018
94
  },
95
  {
96
  "epoch": 15.0,
97
  "learning_rate": 0.000265625,
98
- "loss": 2.2481,
99
  "step": 4305
100
  },
101
  {
102
  "epoch": 16.0,
103
  "learning_rate": 0.00025,
104
- "loss": 2.2468,
105
  "step": 4592
106
  },
107
  {
108
  "epoch": 17.0,
109
  "learning_rate": 0.000234375,
110
- "loss": 2.248,
111
  "step": 4879
112
  },
113
  {
114
  "epoch": 18.0,
115
  "learning_rate": 0.00021875,
116
- "loss": 2.2435,
117
  "step": 5166
118
  },
119
  {
120
  "epoch": 19.0,
121
  "learning_rate": 0.00020312500000000002,
122
- "loss": 2.2319,
123
  "step": 5453
124
  },
125
  {
126
  "epoch": 20.0,
127
  "learning_rate": 0.0001875,
128
- "loss": 2.2303,
129
  "step": 5740
130
  },
131
  {
132
  "epoch": 21.0,
133
  "learning_rate": 0.000171875,
134
- "loss": 2.2215,
135
  "step": 6027
136
  }
137
  ],
 
11
  {
12
  "epoch": 1.0,
13
  "learning_rate": 0.000484375,
14
+ "loss": 2.8375,
15
  "step": 287
16
  },
17
  {
18
  "epoch": 2.0,
19
  "learning_rate": 0.00046875,
20
+ "loss": 2.4263,
21
  "step": 574
22
  },
23
  {
24
  "epoch": 3.0,
25
  "learning_rate": 0.000453125,
26
+ "loss": 2.2043,
27
  "step": 861
28
  },
29
  {
30
  "epoch": 4.0,
31
  "learning_rate": 0.0004375,
32
+ "loss": 2.0835,
33
  "step": 1148
34
  },
35
  {
36
  "epoch": 5.0,
37
  "learning_rate": 0.000421875,
38
+ "loss": 2.0225,
39
  "step": 1435
40
  },
41
  {
42
  "epoch": 6.0,
43
  "learning_rate": 0.00040625000000000004,
44
+ "loss": 1.9901,
45
  "step": 1722
46
  },
47
  {
48
  "epoch": 7.0,
49
  "learning_rate": 0.000390625,
50
+ "loss": 1.9992,
51
  "step": 2009
52
  },
53
  {
54
  "epoch": 8.0,
55
  "learning_rate": 0.000375,
56
+ "loss": 1.9665,
57
  "step": 2296
58
  },
59
  {
60
  "epoch": 9.0,
61
  "learning_rate": 0.000359375,
62
+ "loss": 1.943,
63
  "step": 2583
64
  },
65
  {
66
  "epoch": 10.0,
67
  "learning_rate": 0.00034375,
68
+ "loss": 1.9327,
69
  "step": 2870
70
  },
71
  {
72
  "epoch": 11.0,
73
  "learning_rate": 0.000328125,
74
+ "loss": 1.9184,
75
  "step": 3157
76
  },
77
  {
78
  "epoch": 12.0,
79
  "learning_rate": 0.0003125,
80
+ "loss": 1.9191,
81
  "step": 3444
82
  },
83
  {
84
  "epoch": 13.0,
85
  "learning_rate": 0.000296875,
86
+ "loss": 1.9074,
87
  "step": 3731
88
  },
89
  {
90
  "epoch": 14.0,
91
  "learning_rate": 0.00028125000000000003,
92
+ "loss": 1.9066,
93
  "step": 4018
94
  },
95
  {
96
  "epoch": 15.0,
97
  "learning_rate": 0.000265625,
98
+ "loss": 1.9053,
99
  "step": 4305
100
  },
101
  {
102
  "epoch": 16.0,
103
  "learning_rate": 0.00025,
104
+ "loss": 1.8906,
105
  "step": 4592
106
  },
107
  {
108
  "epoch": 17.0,
109
  "learning_rate": 0.000234375,
110
+ "loss": 1.8876,
111
  "step": 4879
112
  },
113
  {
114
  "epoch": 18.0,
115
  "learning_rate": 0.00021875,
116
+ "loss": 1.8837,
117
  "step": 5166
118
  },
119
  {
120
  "epoch": 19.0,
121
  "learning_rate": 0.00020312500000000002,
122
+ "loss": 1.8766,
123
  "step": 5453
124
  },
125
  {
126
  "epoch": 20.0,
127
  "learning_rate": 0.0001875,
128
+ "loss": 1.8701,
129
  "step": 5740
130
  },
131
  {
132
  "epoch": 21.0,
133
  "learning_rate": 0.000171875,
134
+ "loss": 1.8698,
135
  "step": 6027
136
  }
137
  ],
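
The `learning_rate` values logged above are consistent with a linear decay from 5e-4 to zero over 32 epochs of 287 steps each, with no warm-up. That schedule is inferred from the numbers and is not stated in the commit, so the constants in this sketch are assumptions.

```python
BASE_LR = 5e-4         # assumed initial learning rate
STEPS_PER_EPOCH = 287  # taken from the logged "step" values
TOTAL_EPOCHS = 32      # assumed; inferred from the per-epoch decrement


def linear_decay_lr(step):
    """Learning rate after `step` optimizer steps under linear decay to zero."""
    total_steps = STEPS_PER_EPOCH * TOTAL_EPOCHS
    return BASE_LR * (total_steps - step) / total_steps


# Compare against three values logged in trainer_state.json.
logged = {287: 0.000484375, 1722: 0.00040625, 6027: 0.000171875}
for step, expected in logged.items():
    print(step, linear_decay_lr(step), expected)
```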
checkpoint-6027/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1e68121a9357c4f016eb6bc0f031c8d8d3f664e26a8b5ed965be82c62d99c0bf
3
  size 4792
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4f41e362c3bb6d45be0b656b1cdad4a1214468db81442967fe04c0d32b3ce8ef
3
  size 4792
checkpoint-6314/model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ffb9d234ce16eb62bfaa4221636dafbdd629621a3d5d6c5fd9ab84b3b0b1b1a6
3
  size 90866120
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7603e4168414f34ed2739a8584bcc72a715b6bfe1ba92ddf4b2b0c93c307d714
3
  size 90866120
checkpoint-6314/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9a87cfb151fe8553d7bf1bc304afb20d87b6535e8731fa1a235ed08ffc7a2fe6
3
  size 180607738
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ce526795569446586023b13e812d7b1123d70054718bfb5f33ac2c43f985ac89
3
  size 180607738
checkpoint-6314/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:19e881a8dd55719ee21e204acb99a2911d67016f769fb6b3b7466fafbaf0f9cd
3
  size 14244
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0f0b7f5add7ad3f61f7f1a7b12ca6fe6f0f06d7c6730356f74df585b3dfe72cc
3
  size 14244
checkpoint-6314/trainer_state.json CHANGED
@@ -11,133 +11,133 @@
11
  {
12
  "epoch": 1.0,
13
  "learning_rate": 0.000484375,
14
- "loss": 3.0823,
15
  "step": 287
16
  },
17
  {
18
  "epoch": 2.0,
19
  "learning_rate": 0.00046875,
20
- "loss": 2.7242,
21
  "step": 574
22
  },
23
  {
24
  "epoch": 3.0,
25
  "learning_rate": 0.000453125,
26
- "loss": 2.5348,
27
  "step": 861
28
  },
29
  {
30
  "epoch": 4.0,
31
  "learning_rate": 0.0004375,
32
- "loss": 2.4455,
33
  "step": 1148
34
  },
35
  {
36
  "epoch": 5.0,
37
  "learning_rate": 0.000421875,
38
- "loss": 2.3794,
39
  "step": 1435
40
  },
41
  {
42
  "epoch": 6.0,
43
  "learning_rate": 0.00040625000000000004,
44
- "loss": 2.3375,
45
  "step": 1722
46
  },
47
  {
48
  "epoch": 7.0,
49
  "learning_rate": 0.000390625,
50
- "loss": 2.3262,
51
  "step": 2009
52
  },
53
  {
54
  "epoch": 8.0,
55
  "learning_rate": 0.000375,
56
- "loss": 2.3114,
57
  "step": 2296
58
  },
59
  {
60
  "epoch": 9.0,
61
  "learning_rate": 0.000359375,
62
- "loss": 2.2921,
63
  "step": 2583
64
  },
65
  {
66
  "epoch": 10.0,
67
  "learning_rate": 0.00034375,
68
- "loss": 2.2918,
69
  "step": 2870
70
  },
71
  {
72
  "epoch": 11.0,
73
  "learning_rate": 0.000328125,
74
- "loss": 2.2578,
75
  "step": 3157
76
  },
77
  {
78
  "epoch": 12.0,
79
  "learning_rate": 0.0003125,
80
- "loss": 2.2693,
81
  "step": 3444
82
  },
83
  {
84
  "epoch": 13.0,
85
  "learning_rate": 0.000296875,
86
- "loss": 2.2594,
87
  "step": 3731
88
  },
89
  {
90
  "epoch": 14.0,
91
  "learning_rate": 0.00028125000000000003,
92
- "loss": 2.2555,
93
  "step": 4018
94
  },
95
  {
96
  "epoch": 15.0,
97
  "learning_rate": 0.000265625,
98
- "loss": 2.2481,
99
  "step": 4305
100
  },
101
  {
102
  "epoch": 16.0,
103
  "learning_rate": 0.00025,
104
- "loss": 2.2468,
105
  "step": 4592
106
  },
107
  {
108
  "epoch": 17.0,
109
  "learning_rate": 0.000234375,
110
- "loss": 2.248,
111
  "step": 4879
112
  },
113
  {
114
  "epoch": 18.0,
115
  "learning_rate": 0.00021875,
116
- "loss": 2.2435,
117
  "step": 5166
118
  },
119
  {
120
  "epoch": 19.0,
121
  "learning_rate": 0.00020312500000000002,
122
- "loss": 2.2319,
123
  "step": 5453
124
  },
125
  {
126
  "epoch": 20.0,
127
  "learning_rate": 0.0001875,
128
- "loss": 2.2303,
129
  "step": 5740
130
  },
131
  {
132
  "epoch": 21.0,
133
  "learning_rate": 0.000171875,
134
- "loss": 2.2215,
135
  "step": 6027
136
  },
137
  {
138
  "epoch": 22.0,
139
  "learning_rate": 0.00015625,
140
- "loss": 2.2256,
141
  "step": 6314
142
  }
143
  ],
 
11
  {
12
  "epoch": 1.0,
13
  "learning_rate": 0.000484375,
14
+ "loss": 2.8375,
15
  "step": 287
16
  },
17
  {
18
  "epoch": 2.0,
19
  "learning_rate": 0.00046875,
20
+ "loss": 2.4263,
21
  "step": 574
22
  },
23
  {
24
  "epoch": 3.0,
25
  "learning_rate": 0.000453125,
26
+ "loss": 2.2043,
27
  "step": 861
28
  },
29
  {
30
  "epoch": 4.0,
31
  "learning_rate": 0.0004375,
32
+ "loss": 2.0835,
33
  "step": 1148
34
  },
35
  {
36
  "epoch": 5.0,
37
  "learning_rate": 0.000421875,
38
+ "loss": 2.0225,
39
  "step": 1435
40
  },
41
  {
42
  "epoch": 6.0,
43
  "learning_rate": 0.00040625000000000004,
44
+ "loss": 1.9901,
45
  "step": 1722
46
  },
47
  {
48
  "epoch": 7.0,
49
  "learning_rate": 0.000390625,
50
+ "loss": 1.9992,
51
  "step": 2009
52
  },
53
  {
54
  "epoch": 8.0,
55
  "learning_rate": 0.000375,
56
+ "loss": 1.9665,
57
  "step": 2296
58
  },
59
  {
60
  "epoch": 9.0,
61
  "learning_rate": 0.000359375,
62
+ "loss": 1.943,
63
  "step": 2583
64
  },
65
  {
66
  "epoch": 10.0,
67
  "learning_rate": 0.00034375,
68
+ "loss": 1.9327,
69
  "step": 2870
70
  },
71
  {
72
  "epoch": 11.0,
73
  "learning_rate": 0.000328125,
74
+ "loss": 1.9184,
75
  "step": 3157
76
  },
77
  {
78
  "epoch": 12.0,
79
  "learning_rate": 0.0003125,
80
+ "loss": 1.9191,
81
  "step": 3444
82
  },
83
  {
84
  "epoch": 13.0,
85
  "learning_rate": 0.000296875,
86
+ "loss": 1.9074,
87
  "step": 3731
88
  },
89
  {
90
  "epoch": 14.0,
91
  "learning_rate": 0.00028125000000000003,
92
+ "loss": 1.9066,
93
  "step": 4018
94
  },
95
  {
96
  "epoch": 15.0,
97
  "learning_rate": 0.000265625,
98
+ "loss": 1.9053,
99
  "step": 4305
100
  },
101
  {
102
  "epoch": 16.0,
103
  "learning_rate": 0.00025,
104
+ "loss": 1.8906,
105
  "step": 4592
106
  },
107
  {
108
  "epoch": 17.0,
109
  "learning_rate": 0.000234375,
110
+ "loss": 1.8876,
111
  "step": 4879
112
  },
113
  {
114
  "epoch": 18.0,
115
  "learning_rate": 0.00021875,
116
+ "loss": 1.8837,
117
  "step": 5166
118
  },
119
  {
120
  "epoch": 19.0,
121
  "learning_rate": 0.00020312500000000002,
122
+ "loss": 1.8766,
123
  "step": 5453
124
  },
125
  {
126
  "epoch": 20.0,
127
  "learning_rate": 0.0001875,
128
+ "loss": 1.8701,
129
  "step": 5740
130
  },
131
  {
132
  "epoch": 21.0,
133
  "learning_rate": 0.000171875,
134
+ "loss": 1.8698,
135
  "step": 6027
136
  },
137
  {
138
  "epoch": 22.0,
139
  "learning_rate": 0.00015625,
140
+ "loss": 1.8713,
141
  "step": 6314
142
  }
143
  ],
checkpoint-6314/training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1e68121a9357c4f016eb6bc0f031c8d8d3f664e26a8b5ed965be82c62d99c0bf
3
  size 4792
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4f41e362c3bb6d45be0b656b1cdad4a1214468db81442967fe04c0d32b3ce8ef
3
  size 4792
checkpoint-6601/model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f34e4e0514126e422d0077461ee3fdf8da8668c43e97aaf363dc673364948c6f
3
  size 90866120
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a88576850a220374dc780a89afe685efa8336d62628e1c336b8039e2f50ae536
3
  size 90866120
checkpoint-6601/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:63e359bde6424914d379e217b4ba17a6f2d39e97e20cd4836b7d77b4ca84d62a
3
  size 180607738
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:10134f6ecfcfe9238c08f272279b4c12c95c3a6e63bee9d0231bb1adf6d7fbc5
3
  size 180607738
checkpoint-6601/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d798f414d39bf019c39269ef4f56532a529aab04afb3a417a5ff16b7ed0ad786
3
  size 14244
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fed85929eff002edec8edb87bbeb1690f47728b450557a98d09554980c227b55
3
  size 14244
checkpoint-6601/trainer_state.json CHANGED
@@ -11,139 +11,139 @@
  {
  "epoch": 1.0,
  "learning_rate": 0.000484375,
- "loss": 3.0823,
+ "loss": 2.8375,
  "step": 287
  },
  {
  "epoch": 2.0,
  "learning_rate": 0.00046875,
- "loss": 2.7242,
+ "loss": 2.4263,
  "step": 574
  },
  {
  "epoch": 3.0,
  "learning_rate": 0.000453125,
- "loss": 2.5348,
+ "loss": 2.2043,
  "step": 861
  },
  {
  "epoch": 4.0,
  "learning_rate": 0.0004375,
- "loss": 2.4455,
+ "loss": 2.0835,
  "step": 1148
  },
  {
  "epoch": 5.0,
  "learning_rate": 0.000421875,
- "loss": 2.3794,
+ "loss": 2.0225,
  "step": 1435
  },
  {
  "epoch": 6.0,
  "learning_rate": 0.00040625000000000004,
- "loss": 2.3375,
+ "loss": 1.9901,
  "step": 1722
  },
  {
  "epoch": 7.0,
  "learning_rate": 0.000390625,
- "loss": 2.3262,
+ "loss": 1.9992,
  "step": 2009
  },
  {
  "epoch": 8.0,
  "learning_rate": 0.000375,
- "loss": 2.3114,
+ "loss": 1.9665,
  "step": 2296
  },
  {
  "epoch": 9.0,
  "learning_rate": 0.000359375,
- "loss": 2.2921,
+ "loss": 1.943,
  "step": 2583
  },
  {
  "epoch": 10.0,
  "learning_rate": 0.00034375,
- "loss": 2.2918,
+ "loss": 1.9327,
  "step": 2870
  },
  {
  "epoch": 11.0,
  "learning_rate": 0.000328125,
- "loss": 2.2578,
+ "loss": 1.9184,
  "step": 3157
  },
  {
  "epoch": 12.0,
  "learning_rate": 0.0003125,
- "loss": 2.2693,
+ "loss": 1.9191,
  "step": 3444
  },
  {
  "epoch": 13.0,
  "learning_rate": 0.000296875,
- "loss": 2.2594,
+ "loss": 1.9074,
  "step": 3731
  },
  {
  "epoch": 14.0,
  "learning_rate": 0.00028125000000000003,
- "loss": 2.2555,
+ "loss": 1.9066,
  "step": 4018
  },
  {
  "epoch": 15.0,
  "learning_rate": 0.000265625,
- "loss": 2.2481,
+ "loss": 1.9053,
  "step": 4305
  },
  {
  "epoch": 16.0,
  "learning_rate": 0.00025,
- "loss": 2.2468,
+ "loss": 1.8906,
  "step": 4592
  },
  {
  "epoch": 17.0,
  "learning_rate": 0.000234375,
- "loss": 2.248,
+ "loss": 1.8876,
  "step": 4879
  },
  {
  "epoch": 18.0,
  "learning_rate": 0.00021875,
- "loss": 2.2435,
+ "loss": 1.8837,
  "step": 5166
  },
  {
  "epoch": 19.0,
  "learning_rate": 0.00020312500000000002,
- "loss": 2.2319,
+ "loss": 1.8766,
  "step": 5453
  },
  {
  "epoch": 20.0,
  "learning_rate": 0.0001875,
- "loss": 2.2303,
+ "loss": 1.8701,
  "step": 5740
  },
  {
  "epoch": 21.0,
  "learning_rate": 0.000171875,
- "loss": 2.2215,
+ "loss": 1.8698,
  "step": 6027
  },
  {
  "epoch": 22.0,
  "learning_rate": 0.00015625,
- "loss": 2.2256,
+ "loss": 1.8713,
  "step": 6314
  },
  {
  "epoch": 23.0,
  "learning_rate": 0.00014062500000000002,
- "loss": 2.2257,
+ "loss": 1.8756,
  "step": 6601
  }
  ],
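The logged learning rates fall by the same amount every epoch, which is consistent with a linear decay to zero. The sketch below checks that reading against a few logged values. The peak rate of 5e-4, the 32-epoch horizon, and the absence of warmup are assumptions inferred from the numbers, not settings stated anywhere in this diff; `linear_decay_lr` and `schedule_matches` are illustrative names.

```python
import math

STEPS_PER_EPOCH = 287   # read off the log: step 287 is logged at epoch 1.0
PEAK_LR = 5e-4          # assumption: inferred from the logged values
TOTAL_EPOCHS = 32       # assumption: inferred from the logged values


def linear_decay_lr(step: int) -> float:
    """Learning rate under a linear decay from PEAK_LR to 0 with no warmup."""
    total_steps = STEPS_PER_EPOCH * TOTAL_EPOCHS
    return PEAK_LR * (1 - step / total_steps)


# A few (step, learning_rate) pairs copied from the hunk above.
LOGGED = {
    287: 0.000484375,
    1722: 0.00040625000000000004,
    4592: 0.00025,
    6601: 0.00014062500000000002,
}


def schedule_matches(logged: dict) -> bool:
    """True if every logged rate agrees with the assumed schedule up to float error."""
    return all(
        math.isclose(linear_decay_lr(step), lr, rel_tol=1e-9)
        for step, lr in logged.items()
    )
```

Under these assumptions the schedule would reach zero at step 287 × 32 = 9184.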
checkpoint-6601/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1e68121a9357c4f016eb6bc0f031c8d8d3f664e26a8b5ed965be82c62d99c0bf
+ oid sha256:4f41e362c3bb6d45be0b656b1cdad4a1214468db81442967fe04c0d32b3ce8ef
  size 4792
checkpoint-6888/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4ed9000929251e2f6f2419aac4b88167e4e0264b576d3bc6a2d1bb026eac5bd0
+ oid sha256:078d04cb03d68f9e5dff1130f70639157bcb8034213f6aa18462db597752e46a
  size 90866120
checkpoint-6888/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:49e950c4c4252d5d35e86d5eb4237b5d9fe84e76c44f29148682318bcc0019b4
+ oid sha256:cb257272b2289651d45c98c388902a212eec396152c408d9bccb997a4e5562ec
  size 180607738
checkpoint-6888/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:02dd0d13161623f5c7e3f1f8f092881967f367c62ffd2bdebb58fb8c30515275
+ oid sha256:911230d9c38583f3cb3f09b5e1e903f4caa43946dd9edc12868a8d0c7278e233
  size 14244
checkpoint-6888/trainer_state.json CHANGED
@@ -11,145 +11,145 @@
  {
  "epoch": 1.0,
  "learning_rate": 0.000484375,
- "loss": 3.0823,
+ "loss": 2.8375,
  "step": 287
  },
  {
  "epoch": 2.0,
  "learning_rate": 0.00046875,
- "loss": 2.7242,
+ "loss": 2.4263,
  "step": 574
  },
  {
  "epoch": 3.0,
  "learning_rate": 0.000453125,
- "loss": 2.5348,
+ "loss": 2.2043,
  "step": 861
  },
  {
  "epoch": 4.0,
  "learning_rate": 0.0004375,
- "loss": 2.4455,
+ "loss": 2.0835,
  "step": 1148
  },
  {
  "epoch": 5.0,
  "learning_rate": 0.000421875,
- "loss": 2.3794,
+ "loss": 2.0225,
  "step": 1435
  },
  {
  "epoch": 6.0,
  "learning_rate": 0.00040625000000000004,
- "loss": 2.3375,
+ "loss": 1.9901,
  "step": 1722
  },
  {
  "epoch": 7.0,
  "learning_rate": 0.000390625,
- "loss": 2.3262,
+ "loss": 1.9992,
  "step": 2009
  },
  {
  "epoch": 8.0,
  "learning_rate": 0.000375,
- "loss": 2.3114,
+ "loss": 1.9665,
  "step": 2296
  },
  {
  "epoch": 9.0,
  "learning_rate": 0.000359375,
- "loss": 2.2921,
+ "loss": 1.943,
  "step": 2583
  },
  {
  "epoch": 10.0,
  "learning_rate": 0.00034375,
- "loss": 2.2918,
+ "loss": 1.9327,
  "step": 2870
  },
  {
  "epoch": 11.0,
  "learning_rate": 0.000328125,
- "loss": 2.2578,
+ "loss": 1.9184,
  "step": 3157
  },
  {
  "epoch": 12.0,
  "learning_rate": 0.0003125,
- "loss": 2.2693,
+ "loss": 1.9191,
  "step": 3444
  },
  {
  "epoch": 13.0,
  "learning_rate": 0.000296875,
- "loss": 2.2594,
+ "loss": 1.9074,
  "step": 3731
  },
  {
  "epoch": 14.0,
  "learning_rate": 0.00028125000000000003,
- "loss": 2.2555,
+ "loss": 1.9066,
  "step": 4018
  },
  {
  "epoch": 15.0,
  "learning_rate": 0.000265625,
- "loss": 2.2481,
+ "loss": 1.9053,
  "step": 4305
  },
  {
  "epoch": 16.0,
  "learning_rate": 0.00025,
- "loss": 2.2468,
+ "loss": 1.8906,
  "step": 4592
  },
  {
  "epoch": 17.0,
  "learning_rate": 0.000234375,
- "loss": 2.248,
+ "loss": 1.8876,
  "step": 4879
  },
  {
  "epoch": 18.0,
  "learning_rate": 0.00021875,
- "loss": 2.2435,
+ "loss": 1.8837,
  "step": 5166
  },
  {
  "epoch": 19.0,
  "learning_rate": 0.00020312500000000002,
- "loss": 2.2319,
+ "loss": 1.8766,
  "step": 5453
  },
  {
  "epoch": 20.0,
  "learning_rate": 0.0001875,
- "loss": 2.2303,
+ "loss": 1.8701,
  "step": 5740
  },
  {
  "epoch": 21.0,
  "learning_rate": 0.000171875,
- "loss": 2.2215,
+ "loss": 1.8698,
  "step": 6027
  },
  {
  "epoch": 22.0,
  "learning_rate": 0.00015625,
- "loss": 2.2256,
+ "loss": 1.8713,
  "step": 6314
  },
  {
  "epoch": 23.0,
  "learning_rate": 0.00014062500000000002,
- "loss": 2.2257,
+ "loss": 1.8756,
  "step": 6601
  },
  {
  "epoch": 24.0,
  "learning_rate": 0.000125,
- "loss": 2.2275,
+ "loss": 1.8628,
  "step": 6888
  }
  ],
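Every `loss` entry in this hunk is replaced by a lower value, while the epochs, steps and learning rates are unchanged. A small sketch of quantifying the gap, using three of the (epoch, old loss, new loss) triples copied from the hunk above; the names `LOSSES` and `loss_gaps` are illustrative.

```python
# (epoch, old loss, new loss) triples copied from the trainer_state.json hunk above.
LOSSES = [
    (1, 3.0823, 2.8375),
    (12, 2.2693, 1.9191),
    (24, 2.2275, 1.8628),
]


def loss_gaps(rows):
    """Map each epoch to how much lower the new logged loss is than the old one."""
    return {epoch: round(old - new, 4) for epoch, old, new in rows}


gaps = loss_gaps(LOSSES)
for epoch, gap in gaps.items():
    print(f"epoch {epoch:>2}: new loss lower by {gap:.4f}")
```

The gap widens from the first epoch to the later ones in this sample; the diff itself does not say what changed between the two runs.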
checkpoint-6888/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1e68121a9357c4f016eb6bc0f031c8d8d3f664e26a8b5ed965be82c62d99c0bf
+ oid sha256:4f41e362c3bb6d45be0b656b1cdad4a1214468db81442967fe04c0d32b3ce8ef
  size 4792
checkpoint-7175/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:70399412ee3a628d0f3b4610a75eb6d34bbbfd5af078a54428cf923428ede87c
+ oid sha256:b6a6c4bf0efc02a0db5d1472f33a7aaa0f2ff829f9971cf066d3376e2476cb5c
  size 90866120
checkpoint-7175/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:35df210873e2c02a975d32ef197ce30b942c815e6ae93698e6ed9111688677e9
+ oid sha256:4e6220a11df2196cd429e35fe93dd80caa3cedff8fb116bd2880af16b36decaa
  size 180607738
checkpoint-7175/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:bfad4aa84c0d92f9d5ccb73464e7c346e61b36684ece175fbbc2d6232d4d2ec1
+ oid sha256:9a7f17d63e38b858a3d37dac0036826bba2af636fb4fa93fc576c379cfd6fffe
  size 14244
checkpoint-7175/trainer_state.json CHANGED
@@ -11,151 +11,151 @@
  {
  "epoch": 1.0,
  "learning_rate": 0.000484375,
- "loss": 3.0823,
+ "loss": 2.8375,
  "step": 287
  },
  {
  "epoch": 2.0,
  "learning_rate": 0.00046875,
- "loss": 2.7242,
+ "loss": 2.4263,
  "step": 574
  },
  {
  "epoch": 3.0,
  "learning_rate": 0.000453125,
- "loss": 2.5348,
+ "loss": 2.2043,
  "step": 861
  },
  {
  "epoch": 4.0,
  "learning_rate": 0.0004375,
- "loss": 2.4455,
+ "loss": 2.0835,
  "step": 1148
  },
  {
  "epoch": 5.0,
  "learning_rate": 0.000421875,
- "loss": 2.3794,
+ "loss": 2.0225,
  "step": 1435
  },
  {
  "epoch": 6.0,
  "learning_rate": 0.00040625000000000004,
- "loss": 2.3375,
+ "loss": 1.9901,
  "step": 1722
  },
  {
  "epoch": 7.0,
  "learning_rate": 0.000390625,
- "loss": 2.3262,
+ "loss": 1.9992,
  "step": 2009
  },
  {
  "epoch": 8.0,
  "learning_rate": 0.000375,
- "loss": 2.3114,
+ "loss": 1.9665,
  "step": 2296
  },
  {
  "epoch": 9.0,
  "learning_rate": 0.000359375,
- "loss": 2.2921,
+ "loss": 1.943,
  "step": 2583
  },
  {
  "epoch": 10.0,
  "learning_rate": 0.00034375,
- "loss": 2.2918,
+ "loss": 1.9327,
  "step": 2870
  },
  {
  "epoch": 11.0,
  "learning_rate": 0.000328125,
- "loss": 2.2578,
+ "loss": 1.9184,
  "step": 3157
  },
  {
  "epoch": 12.0,
  "learning_rate": 0.0003125,
- "loss": 2.2693,
+ "loss": 1.9191,
  "step": 3444
  },
  {
  "epoch": 13.0,
  "learning_rate": 0.000296875,
- "loss": 2.2594,
+ "loss": 1.9074,
  "step": 3731
  },
  {
  "epoch": 14.0,
  "learning_rate": 0.00028125000000000003,
- "loss": 2.2555,
+ "loss": 1.9066,
  "step": 4018
  },
  {
  "epoch": 15.0,
  "learning_rate": 0.000265625,
- "loss": 2.2481,
+ "loss": 1.9053,
  "step": 4305
  },
  {
  "epoch": 16.0,
  "learning_rate": 0.00025,
- "loss": 2.2468,
+ "loss": 1.8906,
  "step": 4592
  },
  {
  "epoch": 17.0,
  "learning_rate": 0.000234375,
- "loss": 2.248,
+ "loss": 1.8876,
  "step": 4879
  },
  {
  "epoch": 18.0,
  "learning_rate": 0.00021875,
- "loss": 2.2435,
+ "loss": 1.8837,
  "step": 5166
  },
  {
  "epoch": 19.0,
  "learning_rate": 0.00020312500000000002,
- "loss": 2.2319,
+ "loss": 1.8766,
  "step": 5453
  },
  {
  "epoch": 20.0,
  "learning_rate": 0.0001875,
- "loss": 2.2303,
+ "loss": 1.8701,
  "step": 5740
  },
  {
  "epoch": 21.0,
  "learning_rate": 0.000171875,
- "loss": 2.2215,
+ "loss": 1.8698,
  "step": 6027
  },
  {
  "epoch": 22.0,
  "learning_rate": 0.00015625,
- "loss": 2.2256,
+ "loss": 1.8713,
  "step": 6314
  },
  {
  "epoch": 23.0,
  "learning_rate": 0.00014062500000000002,
- "loss": 2.2257,
+ "loss": 1.8756,
  "step": 6601
  },
  {
  "epoch": 24.0,
  "learning_rate": 0.000125,
- "loss": 2.2275,
+ "loss": 1.8628,
  "step": 6888
  },
  {
  "epoch": 25.0,
  "learning_rate": 0.000109375,
- "loss": 2.2225,
+ "loss": 1.8646,
  "step": 7175
  }
  ],
checkpoint-7175/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1e68121a9357c4f016eb6bc0f031c8d8d3f664e26a8b5ed965be82c62d99c0bf
+ oid sha256:4f41e362c3bb6d45be0b656b1cdad4a1214468db81442967fe04c0d32b3ce8ef
  size 4792
checkpoint-7462/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:93de1f250f3d2b213443d49f9a6bc58b9bc6136b5360cd3981314985ac99869e
+ oid sha256:b31b5739395450e8d78ba38415ebd32a48cded12a0eab4ef4b1ee8ca80de599d
  size 90866120
checkpoint-7462/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:95db1db9cba8df01031b08118df90149273beaa66c8a2f4420b573054b006036
+ oid sha256:e6dccce14eb6c0f7e10922e01b64e0fb4825d650f88326d7d075eabb9f174700
  size 180607738
checkpoint-7462/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:989c0e225ca35a25d7e972b25c493513a6c249b1701d9a4935578c8df15b4a84
+ oid sha256:cb1f2da8cac05205dd34f3f7b221d677f0c357fd01483cd650cd765a772b2153
  size 14244
checkpoint-7462/trainer_state.json CHANGED
@@ -11,157 +11,157 @@
  {
  "epoch": 1.0,
  "learning_rate": 0.000484375,
- "loss": 3.0823,
+ "loss": 2.8375,
  "step": 287
  },
  {
  "epoch": 2.0,
  "learning_rate": 0.00046875,
- "loss": 2.7242,
+ "loss": 2.4263,
  "step": 574
  },
  {
  "epoch": 3.0,
  "learning_rate": 0.000453125,
- "loss": 2.5348,
+ "loss": 2.2043,
  "step": 861
  },
  {
  "epoch": 4.0,
  "learning_rate": 0.0004375,
- "loss": 2.4455,
+ "loss": 2.0835,
  "step": 1148
  },
  {
  "epoch": 5.0,
  "learning_rate": 0.000421875,
- "loss": 2.3794,
+ "loss": 2.0225,
  "step": 1435
  },
  {
  "epoch": 6.0,
  "learning_rate": 0.00040625000000000004,
- "loss": 2.3375,
+ "loss": 1.9901,
  "step": 1722
  },
  {
  "epoch": 7.0,
  "learning_rate": 0.000390625,
- "loss": 2.3262,
+ "loss": 1.9992,
  "step": 2009
  },
  {
  "epoch": 8.0,
  "learning_rate": 0.000375,
- "loss": 2.3114,
+ "loss": 1.9665,
  "step": 2296
  },
  {
  "epoch": 9.0,
  "learning_rate": 0.000359375,
- "loss": 2.2921,
+ "loss": 1.943,
  "step": 2583
  },
  {
  "epoch": 10.0,
  "learning_rate": 0.00034375,
- "loss": 2.2918,
+ "loss": 1.9327,
  "step": 2870
  },
  {
  "epoch": 11.0,
  "learning_rate": 0.000328125,
- "loss": 2.2578,
+ "loss": 1.9184,
  "step": 3157
  },
  {
  "epoch": 12.0,
  "learning_rate": 0.0003125,
- "loss": 2.2693,
+ "loss": 1.9191,
  "step": 3444
  },
  {
  "epoch": 13.0,
  "learning_rate": 0.000296875,
- "loss": 2.2594,
+ "loss": 1.9074,
  "step": 3731
  },
  {
  "epoch": 14.0,
  "learning_rate": 0.00028125000000000003,
- "loss": 2.2555,
+ "loss": 1.9066,
  "step": 4018
  },
  {
  "epoch": 15.0,
  "learning_rate": 0.000265625,
- "loss": 2.2481,
+ "loss": 1.9053,
  "step": 4305
  },
  {
  "epoch": 16.0,
  "learning_rate": 0.00025,
- "loss": 2.2468,
+ "loss": 1.8906,
  "step": 4592
  },
  {
  "epoch": 17.0,
  "learning_rate": 0.000234375,
- "loss": 2.248,
+ "loss": 1.8876,
  "step": 4879
  },
  {
  "epoch": 18.0,
  "learning_rate": 0.00021875,
- "loss": 2.2435,
+ "loss": 1.8837,
  "step": 5166
  },
  {
  "epoch": 19.0,
  "learning_rate": 0.00020312500000000002,
- "loss": 2.2319,
+ "loss": 1.8766,
  "step": 5453
  },
  {
  "epoch": 20.0,
  "learning_rate": 0.0001875,
- "loss": 2.2303,
+ "loss": 1.8701,
  "step": 5740
  },
  {
  "epoch": 21.0,
  "learning_rate": 0.000171875,
- "loss": 2.2215,
+ "loss": 1.8698,
  "step": 6027
  },
  {
  "epoch": 22.0,
  "learning_rate": 0.00015625,
- "loss": 2.2256,
+ "loss": 1.8713,
  "step": 6314
  },
  {
  "epoch": 23.0,
  "learning_rate": 0.00014062500000000002,
- "loss": 2.2257,
+ "loss": 1.8756,
  "step": 6601
  },
  {
  "epoch": 24.0,
  "learning_rate": 0.000125,
- "loss": 2.2275,
+ "loss": 1.8628,
  "step": 6888
  },
  {
  "epoch": 25.0,
  "learning_rate": 0.000109375,
- "loss": 2.2225,
+ "loss": 1.8646,
  "step": 7175
  },
  {
  "epoch": 26.0,
  "learning_rate": 9.375e-05,
- "loss": 2.2166,
+ "loss": 1.8658,
  "step": 7462
  }
  ],
checkpoint-7462/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1e68121a9357c4f016eb6bc0f031c8d8d3f664e26a8b5ed965be82c62d99c0bf
+ oid sha256:4f41e362c3bb6d45be0b656b1cdad4a1214468db81442967fe04c0d32b3ce8ef
  size 4792
checkpoint-7749/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:21ce4eca180873fb16cdb6d9bff483078203550177dabd4ed16bf462c7493af4
+ oid sha256:b444106f50fcd3652e3461264e24871a21b434c69b6dd7394f4608a08ece9150
  size 90866120
checkpoint-7749/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:254e2ba1d029861e2f7e1b7d917e6ab876636c3c0af898b8979e9336459deaeb
+ oid sha256:9571e9f629af0baf9a47642bd01642d67b1feedf519f46e4793c0dbe811ec42e
  size 180607738
checkpoint-7749/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:eeca4e1dc911e6617de8c0b5c3b6ee333073453ada941b2b075b2c148e12374d
+ oid sha256:3c72df822af79394695d06839f663699941513a397043882e7a621ffa80367cf
  size 14244
checkpoint-7749/trainer_state.json CHANGED
@@ -11,163 +11,163 @@
  {
  "epoch": 1.0,
  "learning_rate": 0.000484375,
- "loss": 3.0823,
+ "loss": 2.8375,
  "step": 287
  },
  {
  "epoch": 2.0,
  "learning_rate": 0.00046875,
- "loss": 2.7242,
+ "loss": 2.4263,
  "step": 574
  },
  {
  "epoch": 3.0,
  "learning_rate": 0.000453125,
- "loss": 2.5348,
+ "loss": 2.2043,
  "step": 861
  },
  {
  "epoch": 4.0,
  "learning_rate": 0.0004375,
- "loss": 2.4455,
+ "loss": 2.0835,
  "step": 1148
  },
  {
  "epoch": 5.0,
  "learning_rate": 0.000421875,
- "loss": 2.3794,
+ "loss": 2.0225,
  "step": 1435
  },
  {
  "epoch": 6.0,
  "learning_rate": 0.00040625000000000004,
- "loss": 2.3375,
+ "loss": 1.9901,
  "step": 1722
  },
  {
  "epoch": 7.0,
  "learning_rate": 0.000390625,
- "loss": 2.3262,
+ "loss": 1.9992,
  "step": 2009
  },
  {
  "epoch": 8.0,
  "learning_rate": 0.000375,
- "loss": 2.3114,
+ "loss": 1.9665,
  "step": 2296
  },
  {
  "epoch": 9.0,
  "learning_rate": 0.000359375,
- "loss": 2.2921,
+ "loss": 1.943,
  "step": 2583
  },
  {
  "epoch": 10.0,
  "learning_rate": 0.00034375,
- "loss": 2.2918,
+ "loss": 1.9327,
  "step": 2870
  },
  {
  "epoch": 11.0,
  "learning_rate": 0.000328125,
- "loss": 2.2578,
+ "loss": 1.9184,
  "step": 3157
  },
  {
  "epoch": 12.0,
  "learning_rate": 0.0003125,
- "loss": 2.2693,
+ "loss": 1.9191,
  "step": 3444
  },
  {
  "epoch": 13.0,
  "learning_rate": 0.000296875,
- "loss": 2.2594,
+ "loss": 1.9074,
  "step": 3731
  },
  {
  "epoch": 14.0,
  "learning_rate": 0.00028125000000000003,
- "loss": 2.2555,
+ "loss": 1.9066,
  "step": 4018
  },
  {
  "epoch": 15.0,
  "learning_rate": 0.000265625,
- "loss": 2.2481,
+ "loss": 1.9053,
  "step": 4305
  },
  {
  "epoch": 16.0,
  "learning_rate": 0.00025,
- "loss": 2.2468,
+ "loss": 1.8906,
  "step": 4592
  },
  {
  "epoch": 17.0,
  "learning_rate": 0.000234375,
- "loss": 2.248,
+ "loss": 1.8876,
  "step": 4879
  },
  {
  "epoch": 18.0,
  "learning_rate": 0.00021875,
- "loss": 2.2435,
+ "loss": 1.8837,
  "step": 5166
  },
  {
  "epoch": 19.0,
  "learning_rate": 0.00020312500000000002,
- "loss": 2.2319,
+ "loss": 1.8766,
  "step": 5453
  },
  {
  "epoch": 20.0,
  "learning_rate": 0.0001875,
- "loss": 2.2303,
+ "loss": 1.8701,
  "step": 5740
  },
  {
  "epoch": 21.0,
  "learning_rate": 0.000171875,
- "loss": 2.2215,
+ "loss": 1.8698,
  "step": 6027
  },
  {
  "epoch": 22.0,
  "learning_rate": 0.00015625,
- "loss": 2.2256,
+ "loss": 1.8713,
  "step": 6314
  },
  {
  "epoch": 23.0,
  "learning_rate": 0.00014062500000000002,
- "loss": 2.2257,
+ "loss": 1.8756,
  "step": 6601
  },
  {
  "epoch": 24.0,
  "learning_rate": 0.000125,
- "loss": 2.2275,
+ "loss": 1.8628,
  "step": 6888
  },
  {
  "epoch": 25.0,
  "learning_rate": 0.000109375,
- "loss": 2.2225,
+ "loss": 1.8646,
  "step": 7175
  },
  {
  "epoch": 26.0,
  "learning_rate": 9.375e-05,
- "loss": 2.2166,
+ "loss": 1.8658,
  "step": 7462
  },
  {
  "epoch": 27.0,
  "learning_rate": 7.8125e-05,
- "loss": 2.2174,
+ "loss": 1.8627,
  "step": 7749
  }
  ],
checkpoint-7749/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1e68121a9357c4f016eb6bc0f031c8d8d3f664e26a8b5ed965be82c62d99c0bf
+ oid sha256:4f41e362c3bb6d45be0b656b1cdad4a1214468db81442967fe04c0d32b3ce8ef
  size 4792
checkpoint-8036/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:094692f052bc3c1da865dd330ea4d8868877c2c651aa4728271543cff9649ebf
+ oid sha256:c7fc802cbb799cd15bee19b8be5701dab50a2c838bc5accb64015c9f0975cdee
  size 90866120
checkpoint-8036/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:efd5c0d40f596b5cc141af43a5a42a15ea6711ede357930782af5e5cdd103831
+ oid sha256:ec2e64fb9805dc42cdc934799745fa25bd6d13eea4522ef51e97a47689b1371b
  size 180607738
checkpoint-8036/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:55ad740d18cd84d3d6ab714ade3167df9b6d81cc8473d8c6af14b20d61391900
+ oid sha256:987e73ef1fd240b0dcfe43b5b9c90ad71172ec8d30a7204e9e74bdf5384f5ce3
  size 14244
checkpoint-8036/trainer_state.json CHANGED
@@ -11,169 +11,169 @@
  {
  "epoch": 1.0,
  "learning_rate": 0.000484375,
- "loss": 3.0823,
+ "loss": 2.8375,
  "step": 287
  },
  {
  "epoch": 2.0,
  "learning_rate": 0.00046875,
- "loss": 2.7242,
+ "loss": 2.4263,
  "step": 574
  },
  {
  "epoch": 3.0,
  "learning_rate": 0.000453125,
- "loss": 2.5348,
+ "loss": 2.2043,
  "step": 861
  },
  {
  "epoch": 4.0,
  "learning_rate": 0.0004375,
- "loss": 2.4455,
+ "loss": 2.0835,
  "step": 1148
  },
  {
  "epoch": 5.0,
  "learning_rate": 0.000421875,
- "loss": 2.3794,
+ "loss": 2.0225,
  "step": 1435
  },
  {
  "epoch": 6.0,
  "learning_rate": 0.00040625000000000004,
- "loss": 2.3375,
+ "loss": 1.9901,
  "step": 1722
  },
  {
  "epoch": 7.0,
  "learning_rate": 0.000390625,
- "loss": 2.3262,
+ "loss": 1.9992,
  "step": 2009
  },
  {
  "epoch": 8.0,
  "learning_rate": 0.000375,
- "loss": 2.3114,
+ "loss": 1.9665,
  "step": 2296
  },
  {
  "epoch": 9.0,
  "learning_rate": 0.000359375,
- "loss": 2.2921,
+ "loss": 1.943,
  "step": 2583
  },
  {
  "epoch": 10.0,
  "learning_rate": 0.00034375,
- "loss": 2.2918,
+ "loss": 1.9327,
  "step": 2870
  },
  {
  "epoch": 11.0,
  "learning_rate": 0.000328125,
- "loss": 2.2578,
+ "loss": 1.9184,
  "step": 3157
  },
  {
  "epoch": 12.0,
  "learning_rate": 0.0003125,
- "loss": 2.2693,
+ "loss": 1.9191,
  "step": 3444
  },
  {
  "epoch": 13.0,
  "learning_rate": 0.000296875,
- "loss": 2.2594,
+ "loss": 1.9074,
  "step": 3731
  },
  {
  "epoch": 14.0,
  "learning_rate": 0.00028125000000000003,
- "loss": 2.2555,
+ "loss": 1.9066,
  "step": 4018
  },
  {
  "epoch": 15.0,
  "learning_rate": 0.000265625,
- "loss": 2.2481,
+ "loss": 1.9053,
  "step": 4305
  },
  {
  "epoch": 16.0,
  "learning_rate": 0.00025,
- "loss": 2.2468,
+ "loss": 1.8906,
  "step": 4592
  },
  {
  "epoch": 17.0,
  "learning_rate": 0.000234375,
- "loss": 2.248,
+ "loss": 1.8876,
  "step": 4879
  },
  {
  "epoch": 18.0,
  "learning_rate": 0.00021875,
- "loss": 2.2435,
+ "loss": 1.8837,
  "step": 5166
  },
  {
  "epoch": 19.0,
  "learning_rate": 0.00020312500000000002,
- "loss": 2.2319,
+ "loss": 1.8766,
  "step": 5453
  },
  {
  "epoch": 20.0,
  "learning_rate": 0.0001875,
- "loss": 2.2303,
+ "loss": 1.8701,
  "step": 5740
  },
  {
  "epoch": 21.0,
  "learning_rate": 0.000171875,
- "loss": 2.2215,
+ "loss": 1.8698,
  "step": 6027
  },
  {
  "epoch": 22.0,
  "learning_rate": 0.00015625,
- "loss": 2.2256,
+ "loss": 1.8713,
  "step": 6314
  },
  {
  "epoch": 23.0,
  "learning_rate": 0.00014062500000000002,
- "loss": 2.2257,
+ "loss": 1.8756,
  "step": 6601
  },
  {
  "epoch": 24.0,
  "learning_rate": 0.000125,
- "loss": 2.2275,
+ "loss": 1.8628,
  "step": 6888
  },
  {
  "epoch": 25.0,
  "learning_rate": 0.000109375,
- "loss": 2.2225,
+ "loss": 1.8646,
  "step": 7175
  },
  {
  "epoch": 26.0,
  "learning_rate": 9.375e-05,
- "loss": 2.2166,
+ "loss": 1.8658,
  "step": 7462
  },
  {
  "epoch": 27.0,
  "learning_rate": 7.8125e-05,
- "loss": 2.2174,
+ "loss": 1.8627,
  "step": 7749
  },
  {
  "epoch": 28.0,
  "learning_rate": 6.25e-05,
- "loss": 2.2188,
+ "loss": 1.8646,
  "step": 8036
  }
  ],
checkpoint-8036/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1e68121a9357c4f016eb6bc0f031c8d8d3f664e26a8b5ed965be82c62d99c0bf
+ oid sha256:4f41e362c3bb6d45be0b656b1cdad4a1214468db81442967fe04c0d32b3ce8ef
  size 4792
checkpoint-8323/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8eb2a5db42fe1825e43b69b178c42992e7d87576776e33758f75592acf8c1f89
+ oid sha256:c860bc7aae3f2ed0dde8b2dcad573bb2eb2269628e5ced858bd4df1c975c97df
  size 90866120
checkpoint-8323/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2b94a49c3e5c9b3b80c1f296a41d62252434fdc86628b024267ec35270194497
+ oid sha256:5a0aef967bfddf2936f6a01852f6c6c190dea969640cf4cb109b075be822c59b
  size 180607738
checkpoint-8323/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:113c7031f546d1e57f4645de606e8624d51751acbde70de8fdcf580b016726fa
+ oid sha256:6ccd00e8e6c6f674e45a1098da877938742ac9ae3af263c76c7361a9fda370c0
  size 14244
checkpoint-8323/trainer_state.json CHANGED
@@ -11,175 +11,175 @@
  {
  "epoch": 1.0,
  "learning_rate": 0.000484375,
- "loss": 3.0823,
+ "loss": 2.8375,
  "step": 287
  },
  {
  "epoch": 2.0,
  "learning_rate": 0.00046875,
- "loss": 2.7242,
+ "loss": 2.4263,
  "step": 574
  },
  {
  "epoch": 3.0,
  "learning_rate": 0.000453125,
- "loss": 2.5348,
+ "loss": 2.2043,
  "step": 861
  },
  {
  "epoch": 4.0,
  "learning_rate": 0.0004375,
- "loss": 2.4455,
+ "loss": 2.0835,
  "step": 1148
  },
  {
  "epoch": 5.0,
  "learning_rate": 0.000421875,
- "loss": 2.3794,
+ "loss": 2.0225,
  "step": 1435
  },
  {
  "epoch": 6.0,
  "learning_rate": 0.00040625000000000004,
- "loss": 2.3375,
+ "loss": 1.9901,
  "step": 1722
  },
  {
  "epoch": 7.0,
  "learning_rate": 0.000390625,
- "loss": 2.3262,
+ "loss": 1.9992,
  "step": 2009
  },
  {
  "epoch": 8.0,
  "learning_rate": 0.000375,
- "loss": 2.3114,
+ "loss": 1.9665,
  "step": 2296
  },
  {
  "epoch": 9.0,
  "learning_rate": 0.000359375,
- "loss": 2.2921,
+ "loss": 1.943,
  "step": 2583
  },
  {
  "epoch": 10.0,
  "learning_rate": 0.00034375,
- "loss": 2.2918,
+ "loss": 1.9327,
  "step": 2870
  },
  {
  "epoch": 11.0,
  "learning_rate": 0.000328125,
- "loss": 2.2578,
+ "loss": 1.9184,
  "step": 3157
  },
  {
  "epoch": 12.0,
  "learning_rate": 0.0003125,
- "loss": 2.2693,
+ "loss": 1.9191,
  "step": 3444
  },
  {
  "epoch": 13.0,
  "learning_rate": 0.000296875,
- "loss": 2.2594,
+ "loss": 1.9074,
  "step": 3731
  },
  {
  "epoch": 14.0,
  "learning_rate": 0.00028125000000000003,
- "loss": 2.2555,
+ "loss": 1.9066,
  "step": 4018
  },
  {
  "epoch": 15.0,
  "learning_rate": 0.000265625,
- "loss": 2.2481,
+ "loss": 1.9053,
  "step": 4305
  },
  {
  "epoch": 16.0,
  "learning_rate": 0.00025,
- "loss": 2.2468,
+ "loss": 1.8906,
  "step": 4592
  },
  {
  "epoch": 17.0,
  "learning_rate": 0.000234375,
- "loss": 2.248,
+ "loss": 1.8876,
  "step": 4879
  },
  {
  "epoch": 18.0,
  "learning_rate": 0.00021875,
- "loss": 2.2435,
+ "loss": 1.8837,
  "step": 5166
  },
  {
  "epoch": 19.0,
  "learning_rate": 0.00020312500000000002,
- "loss": 2.2319,
+ "loss": 1.8766,
  "step": 5453
  },
  {
  "epoch": 20.0,
  "learning_rate": 0.0001875,
- "loss": 2.2303,
+ "loss": 1.8701,
  "step": 5740
  },
  {
  "epoch": 21.0,
  "learning_rate": 0.000171875,
- "loss": 2.2215,
+ "loss": 1.8698,
  "step": 6027
  },
  {
  "epoch": 22.0,
  "learning_rate": 0.00015625,
- "loss": 2.2256,
+ "loss": 1.8713,
  "step": 6314
  },
  {
  "epoch": 23.0,
  "learning_rate": 0.00014062500000000002,
- "loss": 2.2257,
+ "loss": 1.8756,
  "step": 6601
  },
  {
  "epoch": 24.0,
  "learning_rate": 0.000125,
- "loss": 2.2275,
+ "loss": 1.8628,
  "step": 6888
  },
  {
  "epoch": 25.0,
  "learning_rate": 0.000109375,
- "loss": 2.2225,
+ "loss": 1.8646,
  "step": 7175
  },
  {
  "epoch": 26.0,
  "learning_rate": 9.375e-05,
- "loss": 2.2166,
+ "loss": 1.8658,
  "step": 7462
  },
  {
  "epoch": 27.0,
  "learning_rate": 7.8125e-05,
- "loss": 2.2174,
+ "loss": 1.8627,
  "step": 7749
  },
  {
  "epoch": 28.0,
  "learning_rate": 6.25e-05,
- "loss": 2.2188,
+ "loss": 1.8646,
  "step": 8036
  },
  {
  "epoch": 29.0,
  "learning_rate": 4.6875e-05,
- "loss": 2.2143,
+ "loss": 1.8585,
  "step": 8323
  }
  ],
checkpoint-8323/training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1e68121a9357c4f016eb6bc0f031c8d8d3f664e26a8b5ed965be82c62d99c0bf
+ oid sha256:4f41e362c3bb6d45be0b656b1cdad4a1214468db81442967fe04c0d32b3ce8ef
  size 4792
checkpoint-8610/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3e6985a08e870d847d77de4b58e47d5ecb6c35c42629de89616cdee08f45f8aa
+ oid sha256:8b3290fc504eb0df1daeebf81102783b6385fbc82288d4cabea6e9f5df6ce08e
  size 90866120
checkpoint-8610/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ddd44f10f5bd38f3e0badf87997f22a43865a97b23e0e42d244561c701ea961b
+ oid sha256:181111d5511842da80729ecc04ad61fe0f751974e95deac2a3fef849bc6d51c7
  size 180607738
checkpoint-8610/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0e62c9ee3ff1a49669f8d1d2b974abe793294c16ef64241ea6fcf2d811ad551f
+ oid sha256:b318c456e5e209219c261615e08e186ab019d856e4080479990d7c8b68b49e3d
  size 14244
checkpoint-8610/trainer_state.json CHANGED
@@ -11,181 +11,181 @@
  {
  "epoch": 1.0,
  "learning_rate": 0.000484375,
- "loss": 3.0823,
+ "loss": 2.8375,
  "step": 287
  },
  {
  "epoch": 2.0,
  "learning_rate": 0.00046875,
- "loss": 2.7242,
+ "loss": 2.4263,
  "step": 574
  },
  {
  "epoch": 3.0,
  "learning_rate": 0.000453125,
- "loss": 2.5348,
+ "loss": 2.2043,
  "step": 861
  },
  {
  "epoch": 4.0,
  "learning_rate": 0.0004375,
- "loss": 2.4455,
+ "loss": 2.0835,
  "step": 1148
  },
  {
  "epoch": 5.0,
  "learning_rate": 0.000421875,
- "loss": 2.3794,
+ "loss": 2.0225,
  "step": 1435
  },
  {
  "epoch": 6.0,
  "learning_rate": 0.00040625000000000004,
- "loss": 2.3375,
+ "loss": 1.9901,
  "step": 1722
  },
  {
  "epoch": 7.0,
  "learning_rate": 0.000390625,
- "loss": 2.3262,
+ "loss": 1.9992,
  "step": 2009
  },
  {
  "epoch": 8.0,
  "learning_rate": 0.000375,
- "loss": 2.3114,
+ "loss": 1.9665,
  "step": 2296
  },
  {
  "epoch": 9.0,
  "learning_rate": 0.000359375,
- "loss": 2.2921,
+ "loss": 1.943,
  "step": 2583
  },
  {
  "epoch": 10.0,
  "learning_rate": 0.00034375,
- "loss": 2.2918,
+ "loss": 1.9327,
  "step": 2870
  },
  {
  "epoch": 11.0,
  "learning_rate": 0.000328125,
- "loss": 2.2578,
+ "loss": 1.9184,
  "step": 3157
  },
  {
  "epoch": 12.0,
  "learning_rate": 0.0003125,
- "loss": 2.2693,
+ "loss": 1.9191,
  "step": 3444
  },
  {
  "epoch": 13.0,
  "learning_rate": 0.000296875,
- "loss": 2.2594,
+ "loss": 1.9074,
  "step": 3731
  },
  {
  "epoch": 14.0,
  "learning_rate": 0.00028125000000000003,
- "loss": 2.2555,
+ "loss": 1.9066,
  "step": 4018
  },
  {
  "epoch": 15.0,
  "learning_rate": 0.000265625,
- "loss": 2.2481,
+ "loss": 1.9053,
  "step": 4305
  },
  {
  "epoch": 16.0,
  "learning_rate": 0.00025,
- "loss": 2.2468,
+ "loss": 1.8906,
  "step": 4592
  },
  {
  "epoch": 17.0,
  "learning_rate": 0.000234375,
- "loss": 2.248,
+ "loss": 1.8876,
  "step": 4879
  },
  {
  "epoch": 18.0,
  "learning_rate": 0.00021875,
- "loss": 2.2435,
+ "loss": 1.8837,
  "step": 5166
  },
  {
  "epoch": 19.0,
  "learning_rate": 0.00020312500000000002,
- "loss": 2.2319,
+ "loss": 1.8766,
  "step": 5453
  },
  {
  "epoch": 20.0,
  "learning_rate": 0.0001875,
- "loss": 2.2303,
+ "loss": 1.8701,
  "step": 5740
  },
  {
  "epoch": 21.0,
  "learning_rate": 0.000171875,
- "loss": 2.2215,
+ "loss": 1.8698,
  "step": 6027
  },
  {
  "epoch": 22.0,
  "learning_rate": 0.00015625,
- "loss": 2.2256,
+ "loss": 1.8713,
  "step": 6314
  },
  {
  "epoch": 23.0,
  "learning_rate": 0.00014062500000000002,
- "loss": 2.2257,
+ "loss": 1.8756,
  "step": 6601
  },
  {
  "epoch": 24.0,
  "learning_rate": 0.000125,
- "loss": 2.2275,
+ "loss": 1.8628,
  "step": 6888
  },
  {
  "epoch": 25.0,
  "learning_rate": 0.000109375,
- "loss": 2.2225,
+ "loss": 1.8646,
  "step": 7175
  },
  {
  "epoch": 26.0,
  "learning_rate": 9.375e-05,
- "loss": 2.2166,
+ "loss": 1.8658,
  "step": 7462
  },
  {
  "epoch": 27.0,
  "learning_rate": 7.8125e-05,
- "loss": 2.2174,
+ "loss": 1.8627,
  "step": 7749
  },
  {
  "epoch": 28.0,
  "learning_rate": 6.25e-05,
- "loss": 2.2188,
+ "loss": 1.8646,
  "step": 8036
  },
  {
  "epoch": 29.0,
  "learning_rate": 4.6875e-05,
- "loss": 2.2143,
+ "loss": 1.8585,
  "step": 8323
  },
  {
  "epoch": 30.0,
  "learning_rate": 3.125e-05,
- "loss": 2.2171,
+ "loss": 1.8598,
  "step": 8610
  }
  ],