Mihaiii committed on
Commit 9daeb12
1 Parent(s): fd5b7be

Update README.md

Files changed (1)
  1. README.md +14 -64
README.md CHANGED
@@ -1,21 +1,23 @@
  1   ---
  2   library_name: sentence-transformers
  3   pipeline_tag: sentence-similarity
  4   tags:
  5   - sentence-transformers
  6   - feature-extraction
  7   - sentence-similarity
  8 - - transformers
  9 -
 10   ---
 11
 12 - # {MODEL_NAME}
 13
 14 - This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 15
 16 - <!--- Describe your model here -->
 17
 18 - ## Usage (Sentence-Transformers)
 19
 20   Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
 21
@@ -29,14 +31,14 @@ Then you can use the model like this:
 29   from sentence_transformers import SentenceTransformer
 30   sentences = ["This is an example sentence", "Each sentence is converted"]
 31
 32 - model = SentenceTransformer('{MODEL_NAME}')
 33   embeddings = model.encode(sentences)
 34   print(embeddings)
 35   ```
 36
 37
 38
 39 - ## Usage (HuggingFace Transformers)
 40   Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings.
 41
 42   ```python
@@ -55,8 +57,8 @@ def mean_pooling(model_output, attention_mask):
 55   sentences = ['This is an example sentence', 'Each sentence is converted']
 56
 57   # Load model from HuggingFace Hub
 58 - tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
 59 - model = AutoModel.from_pretrained('{MODEL_NAME}')
 60
 61   # Tokenize sentences
 62   encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
@@ -72,57 +74,5 @@ print("Sentence embeddings:")
 72   print(sentence_embeddings)
 73   ```
 74
 75 -
 76 -
 77 - ## Evaluation Results
 78 -
 79 - <!--- Describe how your model was evaluated -->
 80 -
 81 - For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
 82 -
 83 -
 84 - ## Training
 85 - The model was trained with the parameters:
 86 -
 87 - **DataLoader**:
 88 -
 89 - `torch.utils.data.dataloader.DataLoader` of length 137553 with parameters:
 90 - ```
 91 - {'batch_size': 64, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
 92 - ```
 93 -
 94 - **Loss**:
 95 -
 96 - `sentence_transformers.losses.MSELoss.MSELoss`
 97 -
 98 - Parameters of the fit()-Method:
 99 - ```
100 - {
101 - "epochs": 1,
102 - "evaluation_steps": 5000,
103 - "evaluator": "sentence_transformers.evaluation.SequentialEvaluator.SequentialEvaluator",
104 - "max_grad_norm": 1,
105 - "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
106 - "optimizer_params": {
107 - "eps": 1e-06,
108 - "lr": 0.0001
109 - },
110 - "scheduler": "WarmupLinear",
111 - "steps_per_epoch": null,
112 - "warmup_steps": 1000,
113 - "weight_decay": 0.01
114 - }
115 - ```
116 -
117 -
118 - ## Full Model Architecture
119 - ```
120 - SentenceTransformer(
121 - (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
122 - (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
123 - )
124 - ```
125 -
126 - ## Citing & Authors
127 -
128 - <!--- Describe where people can find more information -->
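The Training section removed in this commit records `sentence_transformers.losses.MSELoss` as the objective, and the new card describes the model as a distill of gte-tiny. A minimal sketch of that kind of objective follows. It assumes the usual setup for MSE-based embedding distillation: each training sentence has a fixed target embedding produced by the teacher, and the student's embedding of the same sentence is pulled toward it by the mean squared error taken over every element of the batch. `mse_distillation_loss` is a hypothetical helper, not part of the sentence-transformers API, and the vectors are made-up toy values.

```python
from typing import List

Vector = List[float]


def mse_distillation_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Mean squared error between student and teacher embeddings.

    Hypothetical helper: averages the squared difference over every element
    of the batch (a 'mean' reduction). Teacher vectors are fixed targets.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher batches must have the same size")
    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student, teacher):
        if len(s_vec) != len(t_vec):
            raise ValueError("student and teacher embedding dimensions must match")
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


# Toy 3-dimensional embeddings for a batch of two sentences (made-up values).
teacher_batch = [[0.5, -0.25, 1.0], [0.0, 0.75, -0.5]]
student_batch = [[0.5, 0.0, 1.0], [0.25, 0.75, -0.5]]
print(mse_distillation_loss(student_batch, teacher_batch))
```

According to the removed fit() parameters, the recorded run optimized this loss with AdamW (`lr` 0.0001, `weight_decay` 0.01) over batches of 64 for one epoch.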
 
  1   ---
  2 + license: mit
  3   library_name: sentence-transformers
  4   pipeline_tag: sentence-similarity
  5   tags:
  6   - sentence-transformers
  7   - feature-extraction
  8   - sentence-similarity
  9 + - gte
 10 + - mteb
 11   ---
 12 + # gte-micro-v4
 13
 14 + This is a distill of [gte-tiny](https://huggingface.co/TaylorAI/gte-tiny).
 15
 16 + ## Intended purpose
 17
 18 + <span style="color:blue">This model is designed for use in semantic-autocomplete ([click here for demo](https://mihaiii.github.io/semantic-autocomplete/)).</span>
 19
 20 + ## Usage (Sentence-Transformers) (same as [gte-tiny](https://huggingface.co/TaylorAI/gte-tiny))
 21
 22   Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
 23
 
 31   from sentence_transformers import SentenceTransformer
 32   sentences = ["This is an example sentence", "Each sentence is converted"]
 33
 34 + model = SentenceTransformer('Mihaiii/gte-micro-v4')
 35   embeddings = model.encode(sentences)
 36   print(embeddings)
 37   ```
 38
 39
 40
 41 + ## Usage (HuggingFace Transformers) (same as [gte-tiny](https://huggingface.co/TaylorAI/gte-tiny))
 42   Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings.
 43
 44   ```python
 
 57   sentences = ['This is an example sentence', 'Each sentence is converted']
 58
 59   # Load model from HuggingFace Hub
 60 + tokenizer = AutoTokenizer.from_pretrained('Mihaiii/gte-micro-v4')
 61 + model = AutoModel.from_pretrained('Mihaiii/gte-micro-v4')
 62
 63   # Tokenize sentences
 64   encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
 
 74   print(sentence_embeddings)
 75   ```
 76
 77 + ### Limitation (same as [gte-small](https://huggingface.co/thenlper/gte-small))
 78 + This model exclusively caters to English texts, and any lengthy texts will be truncated to a maximum of 512 tokens.
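The Transformers usage snippet pools token embeddings with a `mean_pooling` helper (its body falls outside the diff hunks), the removed architecture listing shows `'pooling_mode_mean_tokens': True`, and the card names semantic autocomplete as the intended use. The sketch below illustrates those two steps in plain Python under stated assumptions: the token vectors are toy values standing in for model output, padding positions are excluded through the attention mask, and suggestions are ranked by cosine similarity to the query. The ranking rule is an assumption; this commit does not describe how the linked demo scores candidates. `mean_pool`, `cosine`, and `rank_suggestions` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(token_embeddings: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    kept = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                summed[i] += value
            kept += 1
    # Avoid dividing by zero when every position is padding.
    kept = max(kept, 1)
    return [value / kept for value in summed]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_suggestions(query: Vector, candidates: Sequence[Tuple[str, Vector]]) -> List[str]:
    """Return candidate labels sorted from most to least similar to the query (assumed rule)."""
    scored = sorted(candidates, key=lambda item: cosine(query, item[1]), reverse=True)
    return [label for label, _ in scored]


# Toy 2-dim token embeddings for a query padded to 3 tokens; the last token is padding.
query_tokens = [[1.0, 0.0], [0.8, 0.2], [9.0, 9.0]]
query_mask = [1, 1, 0]
query_vec = mean_pool(query_tokens, query_mask)

# Toy, already-pooled candidate embeddings.
suggestions = [
    ("cooking recipes", [0.1, 0.9]),
    ("car repair", [0.95, 0.05]),
]
print(rank_suggestions(query_vec, suggestions))
```

With these toy vectors the pooled query is roughly `[0.9, 0.1]`, so "car repair" ranks first. Excluding the padded third token matters: its large values would otherwise dominate the average.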