julep-ai/dfe-base-en

@@ -39,11 +39,11 @@ Unfortunately, this does not work for dialog because conversational statements a
 5. Diwank loves to eat Ishita's head.
 **Dialog**:
-> Diwank: Hey, what are we eating for dinner today?
-> Ishita: Already? I thought we just ate lol
-> Diwank: Yeah, some of us work hard and get hungry
-> Ishita: Okay, what do you want to eat then?
-> Diwank: I want to eat out but I am thinking of something light.
 Now, a text/vector/hybrid search would probably match all 5 facts to this conversation but, as you can see, only facts 1 and 2 are relevant. Right now, the only way to retrieve the correct facts is to ask an LLM like gpt-3.5 to "generate a query" for the database and then use that query for the similarity search. Unfortunately, there are three big problems with that:
 - It adds latency and cost.
@@ -60,6 +60,22 @@ This solves all of the three problems from the "query generation" method from ea
 The "query generation" method is still far superior in quality, but it is prohibitively costly and slow in normal circumstances, and DFE solves that. :)
 ## Usage (Sentence-Transformers)
 Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
@@ -72,13 +88,34 @@ Then you can use the model like this:
 ```python
 from sentence_transformers import SentenceTransformer
-sentences = ["This is an example sentence", "Each sentence is converted"]
-model = SentenceTransformer('{MODEL_NAME}')
-embeddings = model.encode(sentences)
-print(embeddings)
 ```
 ## Training
 The model was trained with the parameters:
@@ -113,6 +150,7 @@ Parameters of the fit()-Method:
 <!--- Describe how your model was evaluated -->
 ## Full Model Architecture
 ```
@@ -140,4 +178,6 @@ SentenceTransformer(
 ## Citing & Authors
-<!--- Describe where people can find more information -->

 5. Diwank loves to eat Ishita's head.
 **Dialog**:
+> Diwank: Hey, what are we eating for dinner today?
+> Ishita: Already? I thought we just ate lol
+> Diwank: Yeah, some of us work hard and get hungry
+> Ishita: Okay, what do you want to eat then?
+> Diwank: I want to eat out but I am thinking of something light.
 Now, a text/vector/hybrid search would probably match all 5 facts to this conversation but, as you can see, only facts 1 and 2 are relevant. Right now, the only way to retrieve the correct facts is to ask an LLM like gpt-3.5 to "generate a query" for the database and then use that query for the similarity search. Unfortunately, there are three big problems with that:
 - It adds latency and cost.
 The "query generation" method is still far superior in quality, but it is prohibitively costly and slow in normal circumstances, and DFE solves that. :)
+## Technical details
+DFE inherits the base BERT model and pooling layer from BGE, which together produce 768-dimensional embeddings for input text.
+DFE then adds an asymmetric projection layer with separate dense layers for "dialog" and "fact" inputs:
+- Dialog inputs pass through two 1536-dimensional tanh layers, a dropout layer, and another 1536-dimensional tanh layer before being projected back to 768 dimensions.
+- Fact inputs pass through similar 1536-dimensional tanh layers with dropout before being projected back to 768 dimensions.
+This asymmetric architecture lets the embeddings specialize for relevance matching between dialogs and facts.
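As a rough illustration of the asymmetric routing described above, the following pure-Python sketch sends "dialog" and "fact" inputs through separate stacks of dense layers. It is not the model's actual implementation. The dimensions are scaled down (8 and 16 instead of 768 and 1536), the weights are random, dropout is omitted because it is inactive at inference time, the final projection is assumed to be linear, the depth of the fact branch is assumed, and the names `make_dense`, `make_branch`, `BRANCHES` and `project` are hypothetical.

```python
import math
import random

EMBED_DIM = 8    # stands in for the 768-dimensional base embedding
HIDDEN_DIM = 16  # stands in for the 1536-dimensional hidden layers


def make_dense(in_dim, out_dim, activation, rng):
    """Build a dense layer with small random weights (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim

    def layer(x):
        out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
        return [math.tanh(v) for v in out] if activation == "tanh" else out

    return layer


def make_branch(n_hidden, rng):
    """Stack n_hidden tanh layers, then project linearly back to EMBED_DIM."""
    dims = [EMBED_DIM] + [HIDDEN_DIM] * n_hidden
    layers = [make_dense(a, b, "tanh", rng) for a, b in zip(dims, dims[1:])]
    layers.append(make_dense(HIDDEN_DIM, EMBED_DIM, "linear", rng))
    return layers


rng = random.Random(0)
# Separate parameters per input type: this is what makes the head asymmetric.
BRANCHES = {
    "dialog": make_branch(3, rng),  # three tanh layers, as described for dialogs
    "fact": make_branch(2, rng),    # assumed depth; the text only says "similar"
}


def project(kind, embedding):
    """Route a base embedding through the branch that matches its input type."""
    x = embedding
    for layer in BRANCHES[kind]:
        x = layer(x)
    return x


base = [0.5] * EMBED_DIM
dialog_vec = project("dialog", base)
fact_vec = project("fact", base)
```

Because each branch has its own weights, the same base embedding maps to different points depending on whether it is treated as a dialog or as a fact.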
+DFE is trained with a triplet loss using the `TripletDistanceMetric.EUCLIDEAN` distance function and a margin of 5. The loss pulls each dialog embedding toward the embeddings of its matching facts and pushes non-relevant facts away until they are at least the margin farther from the dialog than the matching facts are.
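To make the objective concrete, here is a minimal pure-Python sketch of the standard triplet loss with Euclidean distance and a margin of 5. The vectors are toy values chosen for easy arithmetic, not model outputs, and the function names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge-style triplet loss: max(d(a, p) - d(a, n) + margin, 0)."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


dialog = [0.0, 0.0]          # anchor: a dialog embedding (toy values)
relevant_fact = [3.0, 4.0]   # positive: distance 5 from the anchor
far_fact = [0.0, 12.0]       # negative: distance 12 from the anchor
near_fact = [0.0, 6.0]       # negative: distance 6 from the anchor

loss_satisfied = triplet_loss(dialog, relevant_fact, far_fact)  # 5 - 12 + 5 < 0, clipped to 0
loss_violated = triplet_loss(dialog, relevant_fact, near_fact)  # 5 - 6 + 5 = 4
```

The loss is zero once the negative is at least 5 units farther from the dialog than the positive, so only triplets that violate the margin contribute to training.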
+The model was trained for 12 epochs using the Lion optimizer with 100 warmup steps and a learning rate of 0.0001. No evaluation steps were used during training.
+This approach teaches DFE to transform dialog and fact embeddings into a joint relevance space optimized for low-latency semantic matching. The specialized projections allow fast approximation of relevant facts for conversational dialog turns.
 ## Usage (Sentence-Transformers)
 Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
 ```python
 from sentence_transformers import SentenceTransformer
+dialog = """
+Diwank: Hey, what are we eating for dinner today?
+Ishita: Already? I thought we just ate lol
+Diwank: Yeah, some of us work hard and get hungry
+Ishita: Okay, what do you want to eat then?
+Diwank: I want to eat out but I am thinking of something light.
+""".strip()
+facts = [
+  "Diwank likes Sushi.",
+  "Ishita does not like unnecessarily pricey restaurants.",
+  "Diwank likes cooking.",
+  "Ishita is terrible at cooking.",
+  "Diwank loves to eat Ishita's head.",
+]
+model = SentenceTransformer("julep-ai/dfe-base-en")
+# encode() expects a list of inputs, so wrap the single dialog in a one-item list
+dialog_embeddings = model.encode([{"dialog": dialog}])
+fact_embeddings = model.encode([{"fact": fact} for fact in facts])
 ```
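Once dialog and fact embeddings are available, relevant facts can be found by ranking the fact embeddings by their distance to the dialog embedding. The sketch below assumes Euclidean distance at query time (the model was trained with it, but this README does not prescribe a retrieval metric). The helper `rank_facts` is hypothetical, and toy 2-D vectors stand in for the real 768-dimensional outputs of `model.encode`.

```python
import math


def rank_facts(dialog_embedding, fact_embeddings, facts):
    """Return (distance, fact) pairs sorted from closest to farthest."""
    scored = [
        (math.dist(dialog_embedding, emb), fact)
        for emb, fact in zip(fact_embeddings, facts)
    ]
    return sorted(scored, key=lambda pair: pair[0])


# Toy 2-D vectors; in practice these would come from model.encode(...)
toy_dialog = [0.0, 0.0]
toy_facts = ["Diwank likes Sushi.", "Diwank likes cooking."]
toy_fact_embeddings = [[1.0, 0.0], [3.0, 4.0]]

ranked = rank_facts(toy_dialog, toy_fact_embeddings, toy_facts)
```

In a real application you would keep only the closest few facts, or those below a distance threshold tuned on your own data.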
+## Dataset
+The model was trained on a custom dataset [julep-ai/dfe-stacked_samsum](https://huggingface.co/datasets/julep-ai/dfe-stacked_samsum) that we created from [stacked-summaries/stacked-samsum-1024](https://huggingface.co/datasets/stacked-summaries/stacked-samsum-1024) by:
+1. Extracting summaries for corresponding dialogs to emulate "facts"
+2. Then truncating the dialogs to emulate "missing information"
+3. And then augmenting the dialogs using LLMs to emulate "additional information"
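As an illustration of steps 1 and 2 only, the sketch below pairs a truncated dialog with a summary used as its "fact". It is not the actual dataset pipeline: the helper `truncate_dialog`, the choice to keep the first turns, and the example summary are all assumptions made for this example.

```python
def truncate_dialog(dialog, keep_turns):
    """Keep only the first keep_turns non-empty lines (turns) of a dialog."""
    turns = [line for line in dialog.splitlines() if line.strip()]
    return "\n".join(turns[:keep_turns])


full_dialog = "\n".join([
    "Diwank: Hey, what are we eating for dinner today?",
    "Ishita: Already? I thought we just ate lol",
    "Ishita: Okay, what do you want to eat then?",
])
summary = "Diwank wants to have dinner."  # hypothetical stand-in for a dataset summary

# The truncated dialog no longer contains everything the summary was written from,
# which is how truncation emulates "missing information".
sample = {"dialog": truncate_dialog(full_dialog, 2), "fact": summary}
```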
 ## Training
 The model was trained with the parameters:
 <!--- Describe how your model was evaluated -->
+TBD
 ## Full Model Architecture
 ```
 ## Citing & Authors
+```
+Diwank Singh Tomer, Julep AI Inc. Dialog Fact Encoder (DFE). https://julep.ai (2023).
+```