Commit c5890d6
1 Parent(s): cb9be94

Update README.md (#2)

- Update README.md (058979f6127b341ff455d429cbe9c590be829e44)

Co-authored-by: Louis Brulé Naudet <louisbrulenaudet@users.noreply.huggingface.co>

README.md CHANGED
````diff
@@ -5,14 +5,22 @@ tags:
 - feature-extraction
 - sentence-similarity
 - transformers
-
+- doping
+- anti-doping
+pretty_name: Domain-adapted GTE for anti-doping practice
+license: apache-2.0
+language:
+- en
+library_name: sentence-transformers
 ---
 
-# {MODEL_NAME}
+# Domain-adapted GTE for anti-doping practice
 
 This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
 
-<!--- Describe your model here -->
+A transformer model pretrained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios, which enables the GTE models to be applied to various downstream text-embedding tasks, including information retrieval, semantic textual similarity, and text reranking. It was then fitted using a Transformer-based Sequential Denoising Auto-Encoder (TSDAE) for unsupervised sentence embedding learning, with one objective: anti-doping domain adaptation.
+
+This way, the model learns an inner representation of the anti-doping language in the training set that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the model as inputs.
 
 ## Usage (Sentence-Transformers)
 
````
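The description added in this hunk says the model was fitted with TSDAE, but the card does not include the training script. The following is only a minimal sketch of what TSDAE-style domain adaptation looks like in sentence-transformers; the base checkpoint `thenlper/gte-base`, the two in-line corpus sentences, and the output path are assumptions for illustration, not details from the card.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, datasets, losses, models

# Assumed base checkpoint; the card only says "GTE".
BASE = "thenlper/gte-base"

# Stand-in corpus; in practice this would be a large unlabeled
# collection of anti-doping sentences.
train_sentences = [
    "The athlete committed an anti-doping rule violation under Article 2.1.",
    "The B sample confirmed the presence of a prohibited substance.",
]

# Encoder with CLS pooling, matching the cls_pooling usage shown in this card.
word_embedding_model = models.Transformer(BASE)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# TSDAE: pair each sentence with a noisy copy (the default deletion noise
# relies on nltk) and train an encoder-decoder to reconstruct the original.
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=BASE, tie_encoder_decoder=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    show_progress_bar=True,
)
model.save("anti-doping-gte-base")  # hypothetical output path
```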
````diff
@@ -28,7 +36,7 @@ Then you can use the model like this:
 from sentence_transformers import SentenceTransformer
 sentences = ["This is an example sentence", "Each sentence is converted"]
 
-model = SentenceTransformer('{MODEL_NAME}')
+model = SentenceTransformer("timotheeplanes/anti-doping-gte-base")
 embeddings = model.encode(sentences)
 print(embeddings)
 ```
````
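The card's description notes that the embeddings can serve as features for a standard classifier. A minimal sketch of that workflow, assuming scikit-learn is available; the example texts and labels are invented placeholders, not data from the card.

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

model = SentenceTransformer("timotheeplanes/anti-doping-gte-base")

# Hypothetical labeled sentences.
texts = [
    "The athlete's B sample confirmed the adverse analytical finding.",
    "The tribunal found no fault or negligence on the athlete's part.",
]
labels = [1, 0]  # hypothetical: 1 = violation upheld, 0 = not

# Frozen 768-dimensional embeddings as the feature matrix.
X = model.encode(texts)
clf = LogisticRegression().fit(X, labels)
print(clf.predict(model.encode(["A provisional suspension was imposed."])))
```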
````diff
@@ -51,8 +59,8 @@ def cls_pooling(model_output, attention_mask):
 sentences = ['This is an example sentence', 'Each sentence is converted']
 
 # Load model from HuggingFace Hub
-tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
-model = AutoModel.from_pretrained('{MODEL_NAME}')
+tokenizer = AutoTokenizer.from_pretrained("timotheeplanes/anti-doping-gte-base")
+model = AutoModel.from_pretrained("timotheeplanes/anti-doping-gte-base")
 
 # Tokenize sentences
 encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
````
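This hunk shows only the changed lines of the plain-Transformers usage example. For reference, here it is reconstructed in full, assuming the elided lines follow the standard sentence-transformers card template whose fragments are visible around the hunk (`def cls_pooling(model_output, attention_mask):` and the final prints); the one-line `cls_pooling` body is the usual implementation, inferred from the signature rather than shown in the diff.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# CLS pooling: take the hidden state of the first ([CLS]) token.
# Body assumed from the signature shown in the hunk header above.
def cls_pooling(model_output, attention_mask):
    return model_output[0][:, 0]

sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained("timotheeplanes/anti-doping-gte-base")
model = AutoModel.from_pretrained("timotheeplanes/anti-doping-gte-base")

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings without gradients
with torch.no_grad():
    model_output = model(**encoded_input)

# Pool to one 768-dimensional vector per sentence
sentence_embeddings = cls_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```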
````diff
@@ -68,15 +76,6 @@ print("Sentence embeddings:")
 print(sentence_embeddings)
 ```
 
-
-
-## Evaluation Results
-
-<!--- Describe how your model was evaluated -->
-
-For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
-
-
 ## Training
 The model was trained with the parameters:
 
````
````diff
@@ -96,7 +95,6 @@ Parameters of the fit()-Method:
 {
     "epochs": 1,
     "evaluation_steps": 0,
-    "evaluator": "NoneType",
     "max_grad_norm": 1,
     "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
     "optimizer_params": {
````
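The JSON in this hunk is the auto-generated dump of the arguments passed to `fit()`. As orientation only, a hedged reconstruction of the call it implies: the dataloader and loss are hypothetical stand-ins (the card does not show them), and `optimizer_params` is omitted because the visible hunk truncates before its contents.

```python
import torch
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, datasets, losses

# Hypothetical stand-ins; the dump records the parameters, not the data or loss.
model = SentenceTransformer("timotheeplanes/anti-doping-gte-base")
train_sentences = ["An example anti-doping sentence.", "Another example sentence."]
train_dataloader = DataLoader(
    datasets.DenoisingAutoEncoderDataset(train_sentences), batch_size=2, shuffle=True
)
train_loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

# Keyword arguments mirror the dump above; the removed "evaluator": "NoneType"
# line corresponds to fit()'s default evaluator=None.
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    evaluation_steps=0,
    max_grad_norm=1,
    optimizer_class=torch.optim.AdamW,
)
```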
````diff
@@ -120,4 +118,13 @@ SentenceTransformer(
 
 ## Citing & Authors
 
-<!--- Describe where people can find more information -->
+If you use this code in your research, please use the following BibTeX entry.
+
+```BibTeX
+@misc{louisbrulenaudet2023,
+  author = {Brulé Naudet (L.), Planes (T.).},
+  title = {Domain-adapted GTE for anti-doping practice},
+  year = {2023},
+  howpublished = {\url{https://huggingface.co/timotheeplanes/anti-doping-gte-base}},
+}
+```
````