Netta1994 committed
Commit 96f2176 · verified · 1 Parent(s): 243725d

Add SetFit model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
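This pooling config enables only `pooling_mode_cls_token`: the sentence embedding is the hidden state of the first (`[CLS]`) token, rather than a mean or max over all tokens. A minimal, library-free sketch of that operation (the tiny 4-dimensional vectors are stand-ins for the real 768-dimensional hidden states):

```python
def cls_pooling(token_embeddings):
    # CLS pooling: the sentence embedding is the first token's hidden state.
    return token_embeddings[0]

# Stand-in encoder output: 3 tokens, 4-dim states (real dim is 768).
token_embeddings = [
    [0.1, 0.2, 0.3, 0.4],  # [CLS]
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
sentence_embedding = cls_pooling(token_embeddings)
assert sentence_embedding == [0.1, 0.2, 0.3, 0.4]
```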
README.md ADDED
@@ -0,0 +1,291 @@
---
base_model: BAAI/bge-base-en-v1.5
library_name: setfit
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget:
- text: 'Reasoning:

    The answer provides a solid overview of identifying a funnel spider, including
    its dark brown or black body, shiny carapace, and large fangs. These points align
    well with the details in the provided document. However, while the answer includes
    the key features described in the document, it misses a few additional characteristics
    such as the spinnerets, size variations, and geographical habitat that are valuable
    in identifying funnel spiders more comprehensively. Nonetheless, the answer remains
    relevant and concise based on the essential points covered.

    Evaluation: Good'
- text: 'Reasoning:

    The answer provides a comprehensive and accurate description of how to write a
    paper in MLA format. It mentions key points such as setting up 1-inch margins,
    using 12-point font, and double-spacing the text, which are directly aligned with
    the instructions in the provided document. It also addresses creating a running
    header, typing the heading information in the upper left corner, and centering
    the paper''s title, all of which are specified in the document. The answer is
    well-supported by the document, relevant to the question asked, and presented
    concisely.


    Final Evaluation: Good

    Evaluation: Good'
- text: 'Reasoning:

    The provided answer offers relevant and practical advice for getting into medical
    school, including focusing on core science subjects, gaining clinical experience,
    and preparing for the MCAT. These points align well with the suggestions in the
    document. However, the answer overlooks several important details such as engaging
    in extracurricular activities, seeking leadership opportunities, and preparing
    a comprehensive application, which the document emphasizes. Including these aspects
    would have made the response more comprehensive and better aligned with the provided
    document.


    Evaluation: Good'
- text: 'Reasoning:

    The provided answer offers several strategic tips for becoming adept at hide and
    seek. The suggestions include creative hiding strategies like staying in the room
    where the seeker started, using camouflage, and hiding in plain sight, which align
    well with the document''s advice. The document recommends looking for long edges
    to hide behind, using dense curtains, hiding in laundry baskets, and looking for
    multi-colored areas, all of which are echoed in the answer. The advice given is
    concise, relevant, and matches the tips and guidelines from the document.


    Final Evaluation: good

    Evaluation: Good


    '
- text: 'Reasoning:

    1. **Context Grounding**: The answer refers to making a saline solution for treating
    a baby''s cough, using a method that significantly deviates from the details provided
    in the document. The document provides specific instructions for a saline solution
    and its administration, but the quantities and steps in the answer do not align
    with the document''s instructions. Additionally, the document does not suggest
    using 2 cups of water and the method of inserting the suction bulb into the baby''s
    nostril about an inch is incorrect and potentially harmful, deviating significantly
    from the recommended depth.

    2. **Relevance**: While the answer aims to address how to treat a baby''s cough,
    it includes incorrect measurements and method details, which makes it less reliable
    and potentially unsafe.

    3. **Conciseness**: The answer is fairly concise but includes inaccuracies that
    make it untrustworthy.


    Due to these points, the answer does not meet the necessary criteria of being
    well-grounded in the document, relevant, or correctly concise.


    Final Evaluation: Bad'
inference: true
model-index:
- name: SetFit with BAAI/bge-base-en-v1.5
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
      split: test
    metrics:
    - type: accuracy
      value: 0.8666666666666667
      name: Accuracy
---

# SetFit with BAAI/bge-base-en-v1.5

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.

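The contrastive fine-tuning in step 1 trains on *pairs* of labeled texts rather than single examples: two texts from the same class form a positive pair, two from different classes a negative pair, which is how SetFit extracts many training signals from few labeled examples. A minimal, library-free sketch of that pair generation (the texts and labels below are made-up stand-ins, not the training data):

```python
from itertools import combinations

def contrastive_pairs(texts, labels):
    # Label a pair 1.0 if both texts share a class, else 0.0.
    pairs = []
    for (t1, y1), (t2, y2) in combinations(zip(texts, labels), 2):
        pairs.append((t1, t2, 1.0 if y1 == y2 else 0.0))
    return pairs

texts = ["answer is well grounded", "answer matches the document",
         "answer contradicts the document", "answer is off-topic"]
labels = [1, 1, 0, 0]
pairs = contrastive_pairs(texts, labels)
assert len(pairs) == 6                  # C(4, 2) candidate pairs
assert sum(p[2] for p in pairs) == 2.0  # two positive pairs
```

Four labeled examples already yield six contrastive pairs, which is the core of SetFit's sample efficiency.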
## Model Details

### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
- **Maximum Sequence Length:** 512 tokens
- **Number of Classes:** 2 classes
<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

### Model Labels
| Label | Examples |
|:------|:---------|
| 0 | <ul><li>"Reasoning:\nThe answer provided closely aligns with the specific instructions given in the document about petting a bearded dragon. It correctly mentions using 1 or 2 fingers to gently stroke the dragon's head, lowering your hand slowly to avoid startling it, and washing hands before and after petting to reduce the risk of bacteria transfer. However, the part about using a specific perfume or scent to help the dragon recognize you is not supported by the text and is, in fact, incorrect.\n\nFinal Evaluation: Bad\nResult: Bad\n"</li><li>'Reasoning:\nThe answer provided addresses the physical characteristics of a funnel spider but includes several inaccuracies and deviations from the information in the provided document. Key errors include describing the funnel spider as light brown or gray with a soft, dull carapace, which contradicts the document’s description of a dark brown or black body and a hard, shiny carapace. Additionally, the claim that funnel spiders have 3 non-poisonous fangs pointing sideways is incorrect based on the document, which states that the funnel spider has two large, downward-pointing fangs that are poisonous. The document provides clear and detailed descriptions that should form the basis for an accurate answer.\n\nFinal Evaluation: Bad'</li><li>'Reasoning:\nThe answer provided, "Luis Figo left Barcelona to join Real Madrid," while factually correct according to the provided document, is entirely unrelated to the question "How to Calculate Real Estate Commissions." The document and the answer focus on a historical event in soccer rather than providing any information or calculations related to real estate commissions. \n\nFinal Evaluation: Bad'</li></ul> |
| 1 | <ul><li>'Reasoning:\nThe answer is well-supported by the document and directly relates to the question of how to hold a note while singing. It addresses key aspects such as breathing techniques, posture, and controlled release of air, all of which are mentioned in the provided document. The answer stays concise and clear, without deviating into unrelated topics, effectively summarizing the necessary steps for holding a note.\n\nFinal result: Good'</li><li>'Reasoning:\nThe answer is well-founded in the provided document and directly relates to the question of how to stop feeling empty. It suggests practical actions like keeping a journal, trying new activities, and making new friends, all of which are discussed in the document. The recommendations in the answer are summarized clearly and are appropriate responses to the question without providing extraneous information.\n\nFinal Evaluation:\nGood'</li><li>'Reasoning:\nThe answer aligns well with the instructions provided in the document and effectively addresses the question of how to dry curly hair. It begins by recommending gently squeezing out excess water, followed by the application of a leave-in conditioner and the use of a wide-tooth comb for detangling, which are all steps mentioned in the document. The answer then advises adding styling products and parting the hair to lift the roots, which helps expedite the air-drying process. The key points from the document are reflected in the answer, ensuring it is contextually grounded and relevant.\n\nEvaluation: Good'</li></ul> |

## Evaluation

### Metrics
| Label   | Accuracy |
|:--------|:---------|
| **all** | 0.8667   |

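The reported metric is plain classification accuracy: the fraction of test predictions that exactly match the gold labels (the stored value 0.8666… corresponds to a fraction such as 13 of 15 correct). A minimal sketch with hypothetical predictions, not the model's actual test split:

```python
def accuracy(predictions, references):
    # Fraction of predictions that exactly match the gold labels.
    assert len(predictions) == len(references)
    return sum(p == r for p, r in zip(predictions, references)) / len(references)

# Hypothetical labels for illustration only.
assert accuracy([1, 1, 0, 1, 0], [1, 1, 0, 0, 0]) == 0.8
```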
## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("Netta1994/setfit_baai_wikisum_gpt-4o_improved-cot-instructions_chat_few_shot_generated_only_rea")
# Run inference
preds = model("""Reasoning:
The answer provides a solid overview of identifying a funnel spider, including its dark brown or black body, shiny carapace, and large fangs. These points align well with the details in the provided document. However, while the answer includes the key features described in the document, it misses a few additional characteristics such as the spinnerets, size variations, and geographical habitat that are valuable in identifying funnel spiders more comprehensively. Nonetheless, the answer remains relevant and concise based on the essential points covered.
Evaluation: Good""")
```

<!--
### Downstream Use

*List how someone could finetune this model on their own dataset.*
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Set Metrics
| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 58  | 93.6479 | 177 |

| Label | Training Sample Count |
|:------|:----------------------|
| 0     | 34                    |
| 1     | 37                    |

### Training Hyperparameters
- batch_size: (16, 16)
- num_epochs: (5, 5)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 20
- body_learning_rate: (2e-05, 2e-05)
- head_learning_rate: 2e-05
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- l2_weight: 0.01
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False

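The `CosineSimilarityLoss` named above trains the embedding body so that the cosine similarity of a pair's two embeddings approaches the pair's label (1.0 for a same-class pair, 0.0 for a different-class pair), via a squared error. A library-free sketch of that objective (the exact batching and reduction used by sentence-transformers may differ):

```python
import math

def cosine_similarity(u, v):
    # Dot product of the two vectors divided by the product of their norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_pair_loss(u, v, pair_label):
    # Squared error between the pair's cosine similarity and its label.
    return (cosine_similarity(u, v) - pair_label) ** 2

# Identical embeddings in a positive pair incur zero loss...
assert cosine_similarity_pair_loss([1.0, 0.0], [1.0, 0.0], 1.0) == 0.0
# ...while orthogonal embeddings in a positive pair incur loss 1.
assert cosine_similarity_pair_loss([1.0, 0.0], [0.0, 1.0], 1.0) == 1.0
```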
### Training Results
| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0056 | 1    | 0.2213        | -               |
| 0.2809 | 50   | 0.2497        | -               |
| 0.5618 | 100  | 0.1439        | -               |
| 0.8427 | 150  | 0.0107        | -               |
| 1.1236 | 200  | 0.003         | -               |
| 1.4045 | 250  | 0.0022        | -               |
| 1.6854 | 300  | 0.002         | -               |
| 1.9663 | 350  | 0.0018        | -               |
| 2.2472 | 400  | 0.0017        | -               |
| 2.5281 | 450  | 0.0015        | -               |
| 2.8090 | 500  | 0.0015        | -               |
| 3.0899 | 550  | 0.0014        | -               |
| 3.3708 | 600  | 0.0014        | -               |
| 3.6517 | 650  | 0.0012        | -               |
| 3.9326 | 700  | 0.0013        | -               |
| 4.2135 | 750  | 0.0012        | -               |
| 4.4944 | 800  | 0.0013        | -               |
| 4.7753 | 850  | 0.0012        | -               |

### Framework Versions
- Python: 3.10.14
- SetFit: 1.1.0
- Sentence Transformers: 3.1.0
- Transformers: 4.44.0
- PyTorch: 2.4.1+cu121
- Datasets: 2.19.2
- Tokenizers: 0.19.1

## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,32 @@
{
  "_name_or_path": "BAAI/bge-base-en-v1.5",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.44.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.1.0",
    "transformers": "4.44.0",
    "pytorch": "2.4.1+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
config_setfit.json ADDED
@@ -0,0 +1,4 @@
{
  "labels": null,
  "normalize_embeddings": false
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec36f1eb276381073abc50769ce1e93527a7091881f91823529b80b8d1859e95
size 437951328
model_head.pkl ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bd3129265930c3a3d60a1a0ee774ec4d411c45201a554c39a600a7fc3a8ae160
size 7007
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
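`modules.json` chains three modules: the Transformer encoder, the pooling layer configured in `1_Pooling`, and a final `Normalize` step that rescales each sentence embedding to unit L2 norm, so that dot products between embeddings equal cosine similarities. A minimal sketch of that last module:

```python
import math

def l2_normalize(embedding):
    # Scale the vector so its L2 norm is 1.
    norm = math.sqrt(sum(x * x for x in embedding))
    return [x / norm for x in embedding]

v = l2_normalize([3.0, 4.0])          # norm of [3, 4] is 5
assert abs(math.sqrt(sum(x * x for x in v)) - 1.0) < 1e-12
```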
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": true
}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff