iambestfeed committed
Commit 1ad5880
1 Parent(s): 18e83e9

Upload 17 files

README.md ADDED
---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
library_name: generic
language:
- vi
widget:
- source_sentence: Làm thế nào Đại học Bách khoa Hà Nội thu hút sinh viên quốc tế?
  sentences:
  - >-
    Đại học Bách khoa Hà Nội đã phát triển các chương trình đào tạo bằng tiếng
    Anh để làm cho việc học tại đây dễ dàng hơn cho sinh viên quốc tế.
  - >-
    Môi trường học tập đa dạng và sự hỗ trợ đầy đủ cho sinh viên quốc tế tại Đại
    học Bách khoa Hà Nội giúp họ thích nghi nhanh chóng.
  - Hà Nội có khí hậu mát mẻ vào mùa thu.
  - Các món ăn ở Hà Nội rất ngon và đa dạng.
license: apache-2.0
---

# bkai-foundation-models/vietnamese-bi-encoder

This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.

We train the model on a merged training dataset that consists of:
- MS MARCO (translated into Vietnamese)
- SQuAD v2 (translated into Vietnamese)
- 80% of the training set from the Legal Text Retrieval Zalo 2021 challenge

We use [phobert-base-v2](https://github.com/VinAIResearch/PhoBERT) as the pre-trained backbone.

Here are the results on the remaining 20% of the training set from the Legal Text Retrieval Zalo 2021 challenge:

| Pretrained Model | Training Datasets | Acc@1 | Acc@10 | Acc@100 | Pre@10 | MRR@10 |
|------------------|-------------------|:-----:|:------:|:-------:|:------:|:------:|
| [Vietnamese-SBERT](https://huggingface.co/keepitreal/vietnamese-sbert) | - | 32.34 | 52.97 | 89.84 | 7.05 | 45.30 |
| PhoBERT-base-v2 | MS MARCO | 47.81 | 77.19 | 92.34 | 7.72 | 58.37 |
| PhoBERT-base-v2 | MS MARCO + SQuAD v2.0 + 80% Zalo | 73.28 | 93.59 | 98.85 | 9.36 | 80.73 |

## Usage (Sentence-Transformers)

Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

# INPUT TEXT MUST BE ALREADY WORD-SEGMENTED!
sentences = ["Cô ấy là một người vui_tính .", "Cô ấy cười nói suốt cả ngày ."]

model = SentenceTransformer('bkai-foundation-models/vietnamese-bi-encoder')
embeddings = model.encode(sentences)
print(embeddings)
```
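
If you need similarity scores rather than raw vectors, the `util.cos_sim` helper that ships with sentence-transformers can be applied to the encoded batch. A minimal sketch (an illustration, not part of the original training or evaluation code):

```python
from sentence_transformers import SentenceTransformer, util

# Same word-segmented sentences as above.
sentences = ["Cô ấy là một người vui_tính .", "Cô ấy cười nói suốt cả ngày ."]

model = SentenceTransformer('bkai-foundation-models/vietnamese-bi-encoder')
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarities; scores[0][1] compares the two sentences.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```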

## Usage (Widget HuggingFace)

The widget uses a custom pipeline on top of the default one: an additional word segmenter runs before PhobertTokenizer, so you do not need to segment words yourself before using the API.

An example can be seen in the Hosted inference API.
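
Outside the widget, inputs must already be word-segmented. A minimal sketch of that preprocessing step, assuming the third-party `pyvi` package (`pip install pyvi`); `underthesea` or RDRSegmenter, mentioned below, work just as well:

```python
from pyvi import ViTokenizer
from sentence_transformers import SentenceTransformer

raw_sentences = [
    "Cô ấy là một người vui tính.",
    "Cô ấy cười nói suốt cả ngày.",
]

# ViTokenizer joins the syllables of multi-syllable Vietnamese words with
# underscores, which is the segmentation format the PhoBERT backbone expects.
segmented = [ViTokenizer.tokenize(s) for s in raw_sentences]

model = SentenceTransformer('bkai-foundation-models/vietnamese-bi-encoder')
embeddings = model.encode(segmented)
print(embeddings.shape)
```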

## Usage (HuggingFace Transformers)

Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.

```python
from transformers import AutoTokenizer, AutoModel
import torch


# Mean pooling - take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want embeddings for; word-segment them first (e.g. with pyvi, underthesea, or RDRSegmenter)
sentences = ['Cô ấy là một người vui_tính .', 'Cô ấy cười nói suốt cả ngày .']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('bkai-foundation-models/vietnamese-bi-encoder')
model = AutoModel.from_pretrained('bkai-foundation-models/vietnamese-bi-encoder')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```
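
To compare those pooled vectors directly, one minimal follow-up sketch (continuing from the snippet above) is to L2-normalize them and take dot products, which yields cosine similarities:

```python
import torch.nn.functional as F

# Normalize each embedding to unit length; dot products then equal cosine similarities.
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
cosine_scores = normalized @ normalized.T
print(cosine_scores)
```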

## Training

The model was trained with the following parameters:

**DataLoader**:

`torch.utils.data.dataloader.DataLoader` of length 17584 with parameters:

```
{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
```

**Loss**:

`sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss` with parameters:

```
{'scale': 20.0, 'similarity_fct': 'cos_sim'}
```

Parameters of the `fit()` method:

```
{
    "epochs": 15,
    "evaluation_steps": 0,
    "evaluator": "NoneType",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 1000,
    "weight_decay": 0.01
}
```
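
For reference, a rough sketch of how this configuration maps onto the sentence-transformers 2.x training API. The training pairs below are hypothetical placeholders (the real run used the merged MS MARCO / SQuAD v2 / Zalo data described above), and the original training started from the phobert-base-v2 backbone rather than from this published checkpoint:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical (query, relevant passage) pairs, already word-segmented.
train_examples = [
    InputExample(texts=["<segmented query>", "<segmented relevant passage>"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)

model = SentenceTransformer("bkai-foundation-models/vietnamese-bi-encoder")
train_loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=15,
    scheduler="WarmupLinear",
    warmup_steps=1000,
    optimizer_params={"lr": 2e-05},
    weight_decay=0.01,
    max_grad_norm=1,
)
```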

## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: RobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
)
```
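
Loading the checkpoint with `SentenceTransformer('bkai-foundation-models/vietnamese-bi-encoder')` assembles these two modules automatically. A minimal sketch of building the same Transformer + mean-pooling stack by hand with the `models` API (an illustration under current sentence-transformers conventions, not the authors' setup script):

```python
from sentence_transformers import SentenceTransformer, models

# Module (0): the PhoBERT-based transformer encoder, truncating inputs at 256 tokens.
word_embedding_model = models.Transformer('bkai-foundation-models/vietnamese-bi-encoder', max_seq_length=256)

# Module (1): mean pooling over the 768-dimensional token embeddings.
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode='mean')

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```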

### Please cite our manuscript if this dataset is used for your work

```
@article{duc2024towards,
  title={Towards Comprehensive Vietnamese Retrieval-Augmented Generation and Large Language Models},
  author={Nguyen Quang Duc and Le Hai Son and Nguyen Duc Nhan and Nguyen Dich Nhat Minh and Le Thanh Huong and Dinh Viet Sang},
  journal={arXiv preprint arXiv:2403.01616},
  year={2024}
}
```
added_tokens.json ADDED
{
  "<mask>": 64000
}
bpe.codes ADDED
The diff for this file is too large to render.
config.json ADDED
{
  "_name_or_path": "bkai-foundation-models/vietnamese-bi-encoder",
  "architectures": [
    "RobertaModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 258,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "tokenizer_class": "PhobertTokenizer",
  "torch_dtype": "float32",
  "transformers_version": "4.39.3",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 64001
}
config_sentence_transformers.json ADDED
{
  "__version__": {
    "sentence_transformers": "2.2.2",
    "transformers": "4.32.0",
    "pytorch": "2.0.0+cu117"
  },
  "prompts": {},
  "default_prompt_name": null
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:48b54fa517ddb1b719a3a3503553941b7ac3f5e461710f5409e596d1af9fe567
size 540015464
modules.json ADDED
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
optimizer.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:9f8fc64d9afbb9f3b6cc460bb0ae7edbe3d4d95beeef191e310675ff2df1450f
size 1075428986
rng_state_0.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:ecd28a54b12bde3537c1ae118d1c686cba3ea30605cf89bfdac2f1c889d3b513
size 14512
rng_state_1.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:698dd5ce82f28d4ada87e4512bb18826853b15318abbd51003665abb635e7dab
size 14512
scheduler.pt ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:3f60d1d2d190161d57aafae98632eb27f6d5335b31b913c32c549cfeb405fc0a
size 1064
sentence_bert_config.json ADDED
{
  "max_seq_length": 256,
  "do_lower_case": false
}
special_tokens_map.json ADDED
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer_config.json ADDED
{
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "64000": {
      "content": "<mask>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "mask_token": "<mask>",
  "model_max_length": 256,
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "tokenizer_class": "PhobertTokenizer",
  "unk_token": "<unk>"
}
trainer_state.json ADDED
{
  "best_metric": 0.7469254634972615,
  "best_model_checkpoint": "checkpoints/bkai-foundation-models-vietnamese-bi-encoder-sts_example-Apr-11_09-40/checkpoint-9",
  "epoch": 0.9230769230769231,
  "eval_steps": 1,
  "global_step": 9,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.1,
      "grad_norm": 1.4329559803009033,
      "learning_rate": 2e-05,
      "loss": 0.2895,
      "step": 1
    },
    {
      "epoch": 0.1,
      "eval_loss": 0.17446795105934143,
      "eval_pearson_cosine": 0.7352753127420032,
      "eval_pearson_dot": 0.686758677756194,
      "eval_pearson_euclidean": 0.7173498836125568,
      "eval_pearson_manhattan": 0.7148706172408587,
      "eval_pearson_max": 0.7352753127420032,
      "eval_runtime": 4.9797,
      "eval_samples_per_second": 20.082,
      "eval_spearman_cosine": 0.7210935594862063,
      "eval_spearman_dot": 0.6923873928081955,
      "eval_spearman_euclidean": 0.7268846214769229,
      "eval_spearman_manhattan": 0.7230502456992293,
      "eval_spearman_max": 0.7268846214769229,
      "eval_steps_per_second": 0.201,
      "step": 1
    },
    {
      "epoch": 0.21,
      "grad_norm": 1.5664677619934082,
      "learning_rate": 1.7500000000000002e-05,
      "loss": 0.3072,
      "step": 2
    },
    {
      "epoch": 0.21,
      "eval_loss": 0.14536823332309723,
      "eval_pearson_cosine": 0.7443285684058452,
      "eval_pearson_dot": 0.7004870662431258,
      "eval_pearson_euclidean": 0.7311397302983574,
      "eval_pearson_manhattan": 0.7298089636403461,
      "eval_pearson_max": 0.7443285684058452,
      "eval_runtime": 4.9136,
      "eval_samples_per_second": 20.352,
      "eval_spearman_cosine": 0.7313692004372047,
      "eval_spearman_dot": 0.6972061883576584,
      "eval_spearman_euclidean": 0.7333137333197118,
      "eval_spearman_manhattan": 0.7343163830872546,
      "eval_spearman_max": 0.7343163830872546,
      "eval_steps_per_second": 0.204,
      "step": 2
    },
    {
      "epoch": 0.31,
      "grad_norm": 1.1925618648529053,
      "learning_rate": 1.5000000000000002e-05,
      "loss": 0.2494,
      "step": 3
    },
    {
      "epoch": 0.31,
      "eval_loss": 0.13241076469421387,
      "eval_pearson_cosine": 0.7475582818227843,
      "eval_pearson_dot": 0.7072187737528184,
      "eval_pearson_euclidean": 0.7395368453244974,
      "eval_pearson_manhattan": 0.7390320701180779,
      "eval_pearson_max": 0.7475582818227843,
      "eval_runtime": 4.9299,
      "eval_samples_per_second": 20.284,
      "eval_spearman_cosine": 0.734443993057669,
      "eval_spearman_dot": 0.7085877823855826,
      "eval_spearman_euclidean": 0.7370326524575067,
      "eval_spearman_manhattan": 0.7358051660754241,
      "eval_spearman_max": 0.7370326524575067,
      "eval_steps_per_second": 0.203,
      "step": 3
    },
    {
      "epoch": 0.41,
      "grad_norm": 0.7575017809867859,
      "learning_rate": 1.25e-05,
      "loss": 0.2055,
      "step": 4
    },
    {
      "epoch": 0.41,
      "eval_loss": 0.12977080047130585,
      "eval_pearson_cosine": 0.7480467529027109,
      "eval_pearson_dot": 0.7098231030406952,
      "eval_pearson_euclidean": 0.744921218850896,
      "eval_pearson_manhattan": 0.7451527031212235,
      "eval_pearson_max": 0.7480467529027109,
      "eval_runtime": 4.8972,
      "eval_samples_per_second": 20.42,
      "eval_spearman_cosine": 0.7384545921278399,
      "eval_spearman_dot": 0.7140993177744387,
      "eval_spearman_euclidean": 0.7418271413459383,
      "eval_spearman_manhattan": 0.7415597680745935,
      "eval_spearman_max": 0.7418271413459383,
      "eval_steps_per_second": 0.204,
      "step": 4
    },
    {
      "epoch": 0.51,
      "grad_norm": 0.5988340973854065,
      "learning_rate": 1e-05,
      "loss": 0.1953,
      "step": 5
    },
    {
      "epoch": 0.51,
      "eval_loss": 0.13172687590122223,
      "eval_pearson_cosine": 0.7481749312063645,
      "eval_pearson_dot": 0.7114101061443809,
      "eval_pearson_euclidean": 0.7487853408979873,
      "eval_pearson_manhattan": 0.7493155754097085,
      "eval_pearson_max": 0.7493155754097085,
      "eval_runtime": 4.9354,
      "eval_samples_per_second": 20.262,
      "eval_spearman_cosine": 0.7410797115192246,
      "eval_spearman_dot": 0.715740017394054,
      "eval_spearman_euclidean": 0.7440876608218527,
      "eval_spearman_manhattan": 0.7416873780450081,
      "eval_spearman_max": 0.7440876608218527,
      "eval_steps_per_second": 0.203,
      "step": 5
    },
    {
      "epoch": 0.62,
      "grad_norm": 0.45404252409935,
      "learning_rate": 7.500000000000001e-06,
      "loss": 0.1955,
      "step": 6
    },
    {
      "epoch": 0.62,
      "eval_loss": 0.13533657789230347,
      "eval_pearson_cosine": 0.7481572181840599,
      "eval_pearson_dot": 0.7119585324914705,
      "eval_pearson_euclidean": 0.7513750385090986,
      "eval_pearson_manhattan": 0.7520214602427692,
      "eval_pearson_max": 0.7520214602427692,
      "eval_runtime": 4.9045,
      "eval_samples_per_second": 20.389,
      "eval_spearman_cosine": 0.7413227781295381,
      "eval_spearman_dot": 0.7178243135774913,
      "eval_spearman_euclidean": 0.7454609871701232,
      "eval_spearman_manhattan": 0.7439235908598911,
      "eval_spearman_max": 0.7454609871701232,
      "eval_steps_per_second": 0.204,
      "step": 6
    },
    {
      "epoch": 0.72,
      "grad_norm": 0.29167646169662476,
      "learning_rate": 5e-06,
      "loss": 0.1817,
      "step": 7
    },
    {
      "epoch": 0.72,
      "eval_loss": 0.1384459137916565,
      "eval_pearson_cosine": 0.7486794628287365,
      "eval_pearson_dot": 0.7128799880836781,
      "eval_pearson_euclidean": 0.7534668223526225,
      "eval_pearson_manhattan": 0.7542705307695821,
      "eval_pearson_max": 0.7542705307695821,
      "eval_runtime": 4.8885,
      "eval_samples_per_second": 20.456,
      "eval_spearman_cosine": 0.7447864773265037,
      "eval_spearman_dot": 0.7191307966079258,
      "eval_spearman_euclidean": 0.7470348434719025,
      "eval_spearman_manhattan": 0.7471259934507699,
      "eval_spearman_max": 0.7471259934507699,
      "eval_steps_per_second": 0.205,
      "step": 7
    },
    {
      "epoch": 0.82,
      "grad_norm": 0.2743995785713196,
      "learning_rate": 2.5e-06,
      "loss": 0.1779,
      "step": 8
    },
    {
      "epoch": 0.82,
      "eval_loss": 0.14035974442958832,
      "eval_pearson_cosine": 0.7493716294557077,
      "eval_pearson_dot": 0.7139035538213145,
      "eval_pearson_euclidean": 0.7549035222973965,
      "eval_pearson_manhattan": 0.755825827892859,
      "eval_pearson_max": 0.755825827892859,
      "eval_runtime": 4.9234,
      "eval_samples_per_second": 20.311,
      "eval_spearman_cosine": 0.7468890035057144,
      "eval_spearman_dot": 0.7218835259697248,
      "eval_spearman_euclidean": 0.7492406729604966,
      "eval_spearman_manhattan": 0.7483170198413056,
      "eval_spearman_max": 0.7492406729604966,
      "eval_steps_per_second": 0.203,
      "step": 8
    },
    {
      "epoch": 0.92,
      "grad_norm": 0.2744670808315277,
      "learning_rate": 0.0,
      "loss": 0.1778,
      "step": 9
    },
    {
      "epoch": 0.92,
      "eval_loss": 0.14107824862003326,
      "eval_pearson_cosine": 0.7498873452997028,
      "eval_pearson_dot": 0.7146359714365732,
      "eval_pearson_euclidean": 0.7556639064226516,
      "eval_pearson_manhattan": 0.7566449947956903,
      "eval_pearson_max": 0.7566449947956903,
      "eval_runtime": 4.89,
      "eval_samples_per_second": 20.45,
      "eval_spearman_cosine": 0.7469254634972615,
      "eval_spearman_dot": 0.7228071790889158,
      "eval_spearman_euclidean": 0.7491738296426602,
      "eval_spearman_manhattan": 0.7486816197567756,
      "eval_spearman_max": 0.7491738296426602,
      "eval_steps_per_second": 0.205,
      "step": 9
    }
  ],
  "logging_steps": 1,
  "max_steps": 9,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
  "save_steps": 1,
  "total_flos": 0.0,
  "train_batch_size": 64,
  "trial_name": null,
  "trial_params": null
}
training_args.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:668b6b369a532d848cfe9611573fbe1fe3274136bc4c59d99fb8e9e3c1a4388b
size 5176
vocab.txt ADDED
The diff for this file is too large to render.