diwank committed on
Commit dfe1e58 · 1 Parent(s): 09dc26f

Update README.md

Files changed (1)
  1. README.md +50 -10
README.md CHANGED
@@ -39,11 +39,11 @@ Unfortunately, this does not work for dialog because conversational statements a
 5. Diwank loves to eat Ishita's head.
 
 **Dialog**:
- > Diwank: Hey, what are we eating for dinner today?
- > Ishita: Already? I thought we just ate lol
- > Diwank: Yeah, some of us work hard and get hungry
- > Ishita: Okay, what do you want to eat then?
- > Diwank: I want to eat out but I am thinking of something light.
+ > Diwank: Hey, what are we eating for dinner today?
+ > Ishita: Already? I thought we just ate lol
+ > Diwank: Yeah, some of us work hard and get hungry
+ > Ishita: Okay, what do you want to eat then?
+ > Diwank: I want to eat out but I am thinking of something light.
 
 Now, a text/vector/hybrid search would probably match all 5 facts to this conversation but, as you can see, only facts 1 and 2 are relevant. The only way to get the correct fact, right now, is to ask an LLM like gpt-3.5 to "generate a query" for querying the database and then using that for similarity. Unfortunately, there are three big problems with that:
 - It adds latency and cost.
@@ -60,6 +60,22 @@ This solves all of the three problems from the "query generation" method from ea
 
 The "query generation" method is still far superior in quality but is too prohibitive (costly + slow) in normal circumstances and DFE solves that. :)
 
+ ## Technical details
+
+ DFE inherits the base BERT model and pooling layer from BGE to generate 768-dimensional embeddings for input text.
+
+ DFE then adds an asymmetric projection layer with separate dense layers for "dialog" and "fact" inputs:
+
+ - Dialog inputs pass through two 1536-dimensional tanh layers, a dropout layer, and another 1536-dimensional tanh layer before being projected back to 768 dimensions.
+ - Fact inputs pass through a similar stack of 1536-dimensional tanh layers with dropout before being projected back to 768 dimensions.
+
+ This asymmetric architecture allows the embeddings to specialize for relevance matching between dialogs and facts.
+
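+ As a rough sketch (not the actual training script), such a stack can be assembled with sentence-transformers' `Asym` module; the BGE checkpoint name, the dropout rate, and the exact layer wiring are assumptions inferred from the description above:
+
+ ```python
+ from torch import nn
+ from sentence_transformers import SentenceTransformer, models
+
+ # Base encoder and pooling inherited from BGE (checkpoint name is an assumption)
+ word_embedding_model = models.Transformer("BAAI/bge-base-en")
+ pooling = models.Pooling(word_embedding_model.get_word_embedding_dimension())
+
+ def projection():
+     # 768 -> 1536 -> 1536 -> dropout -> 1536 -> 768, tanh activations throughout
+     return [
+         models.Dense(768, 1536, activation_function=nn.Tanh()),
+         models.Dense(1536, 1536, activation_function=nn.Tanh()),
+         models.Dropout(0.1),  # dropout rate is an assumption
+         models.Dense(1536, 1536, activation_function=nn.Tanh()),
+         models.Dense(1536, 768, activation_function=nn.Tanh()),
+     ]
+
+ # Separate projection stacks ("asymmetric") for dialog and fact inputs
+ asym = models.Asym({"dialog": projection(), "fact": projection()})
+ model = SentenceTransformer(modules=[word_embedding_model, pooling, asym])
+ ```
+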
+ DFE is trained with a triplet loss using the TripletDistanceMetric.EUCLIDEAN distance function and a margin of 5. It pulls dialog embeddings closer to positively matched fact embeddings, while pushing non-relevant pairs beyond the margin.
+
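+ For illustration, the loss can be set up as below, assuming the `model` from the architecture sketch above; the example triplet is made up:
+
+ ```python
+ from sentence_transformers import InputExample, losses
+
+ # One training triplet: an anchor dialog, a matching fact, a non-matching fact.
+ # Dict inputs let the Asym layer route each text through its projection stack.
+ example = InputExample(texts=[
+     {"dialog": "Diwank: What should we eat tonight?"},   # anchor
+     {"fact": "Diwank likes Sushi."},                     # positive
+     {"fact": "Ishita is terrible at cooking."},          # negative
+ ])
+
+ train_loss = losses.TripletLoss(
+     model=model,
+     distance_metric=losses.TripletDistanceMetric.EUCLIDEAN,
+     triplet_margin=5,
+ )
+ ```
+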
+ The model was trained for 12 epochs using the Lion optimizer with 100 warmup steps and a learning rate of 0.0001. No evaluation steps were used during training.
+
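+ A matching `fit()` call might look like the following; the batch size and the `lion-pytorch` package are assumptions, and `example` and `train_loss` come from the sketch above:
+
+ ```python
+ from lion_pytorch import Lion  # pip install lion-pytorch (assumed implementation)
+ from torch.utils.data import DataLoader
+
+ train_dataloader = DataLoader([example], shuffle=True, batch_size=16)
+
+ model.fit(
+     train_objectives=[(train_dataloader, train_loss)],
+     epochs=12,
+     warmup_steps=100,
+     optimizer_class=Lion,
+     optimizer_params={"lr": 1e-4},
+ )
+ ```
+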
+ This approach teaches DFE to transform dialog and fact embeddings into a joint relevance space optimized for low-latency semantic matching. The specialized projections allow fast approximation of relevant facts for conversational dialog turns.
+
 ## Usage (Sentence-Transformers)
 
 Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
@@ -72,13 +88,34 @@ Then you can use the model like this:
 
 ```python
 from sentence_transformers import SentenceTransformer
- sentences = ["This is an example sentence", "Each sentence is converted"]
 
- model = SentenceTransformer('{MODEL_NAME}')
- embeddings = model.encode(sentences)
- print(embeddings)
+ dialog = """
+ Diwank: Hey, what are we eating for dinner today?
+ Ishita: Already? I thought we just ate lol
+ Diwank: Yeah, some of us work hard and get hungry
+ Ishita: Okay, what do you want to eat then?
+ Diwank: I want to eat out but I am thinking of something light.
+ """.strip()
+
+ facts = [
+     "Diwank likes Sushi.",
+     "Ishita does not like unnecessarily-pricey restaurants.",
+     "Diwank likes cooking.",
+     "Ishita is terrible at cooking.",
+     "Diwank loves to eat Ishita's head.",
+ ]
+
+ model = SentenceTransformer("julep-ai/dfe-base-en")
+ dialog_embeddings = model.encode([{"dialog": dialog}])
+ fact_embeddings = model.encode([{"fact": fact} for fact in facts])
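+
+ # One way to rank the facts for this dialog: the model is trained with a
+ # Euclidean-margin triplet loss, so smaller L2 distance = more relevant.
+ import numpy as np
+
+ distances = np.linalg.norm(fact_embeddings - dialog_embeddings, axis=1)
+ for fact, dist in sorted(zip(facts, distances), key=lambda pair: pair[1]):
+     print(f"{dist:.3f}  {fact}")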
  ```
 
+ ## Dataset
+
+ The model was trained on a custom dataset, [julep-ai/dfe-stacked_samsum](https://huggingface.co/datasets/julep-ai/dfe-stacked_samsum), that we created from [stacked-summaries/stacked-samsum-1024](https://huggingface.co/datasets/stacked-summaries/stacked-samsum-1024) by:
+ 1. extracting the summaries of the corresponding dialogs to emulate "facts",
+ 2. truncating the dialogs to emulate "missing information", and
+ 3. augmenting the dialogs using LLMs to emulate "additional information" (a rough sketch of these steps follows).
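+
+ A toy sketch of this pipeline; the field names, the truncation rule, and the augmentation step are hypothetical, and the actual preprocessing behind julep-ai/dfe-stacked_samsum may differ:
+
+ ```python
+ from datasets import load_dataset
+
+ samsum = load_dataset("stacked-summaries/stacked-samsum-1024", split="train")
+
+ def make_example(row):
+     fact = row["summary"]  # 1. the summary stands in for a "fact"
+     turns = row["dialogue"].splitlines()
+     dialog = "\n".join(turns[: len(turns) // 2])  # 2. truncate to emulate missing information
+     # 3. an LLM augmentation pass (paraphrases, distractor turns) would go here
+     return {"dialog": dialog, "fact": fact}
+
+ dfe_samples = samsum.map(make_example)
+ ```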
 
 ## Training
  The model was trained with the parameters:
 
@@ -113,6 +150,7 @@ Parameters of the fit()-Method:
 
 <!--- Describe how your model was evaluated -->
 
+ TBD
 
 ## Full Model Architecture
  ```
 
@@ -140,4 +178,6 @@ SentenceTransformer(
 
 ## Citing & Authors
 
- <!--- Describe where people can find more information -->
+ ```
+ Diwank Singh Tomer, Julep AI Inc. Dialog Fact Encoder (DFE). https://julep.ai (2023).
+ ```