# **Model Overview**

As demand for large language models grows, a common limitation surfaces: they cannot search the internet directly. Providers like Google (with Bard), Bing, and Perplexity are addressing this challenge, but their proprietary services raise data-logging concerns.

**Introducing Open LLM Search**: a specialized adaptation of Together AI's `llama-2-7b-32k` model, purpose-built for extracting information from web pages. Although the model has only 7 billion parameters, its fine-tuning and expanded 32k-token context window enable it to excel at search tasks.

**License:** This model is released under Meta's Llama 2 license.

# **Fine-Tuning Process**

The model was fine-tuned on synthetic data generated with a combination of GPT-4 and GPT-4-32k. The training workflow, sketched in code after the list, was as follows:
1. Use GPT-4 to generate a large set of search queries.
2. For each query, retrieve the top five website results from Google.
3. Extract the content from each website and summarize it with GPT-4-32k.
4. Record the extracted text and the GPT-4-32k summaries for fine-tuning.
5. Feed the summaries from all five sources to GPT-4 to craft a cohesive response.
6. Document both the input and output from GPT-4 for fine-tuning.

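For concreteness, here is a minimal sketch of this pipeline, assuming the OpenAI Python SDK; `fetch_top_results` and `extract_text` are hypothetical stand-ins for a search API client and an HTML scraper, and the prompts are illustrative rather than the ones actually used:

```python
from openai import OpenAI

client = OpenAI()

def fetch_top_results(query: str, n: int = 5) -> list[str]:
    """Hypothetical stand-in for a Google search API; returns result URLs."""
    raise NotImplementedError

def extract_text(url: str) -> str:
    """Hypothetical stand-in for an HTML-to-text scraper."""
    raise NotImplementedError

def chat(model: str, prompt: str) -> str:
    """Single-turn helper around the chat completions endpoint."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def build_example(query: str) -> dict:
    # Steps 2-3: fetch the top five results and summarize each with GPT-4-32k.
    pages = [extract_text(url) for url in fetch_top_results(query, n=5)]
    summaries = [chat("gpt-4-32k", f"Summarize this page:\n\n{p}") for p in pages]
    # Step 5: GPT-4 fuses the five summaries into one cohesive answer.
    answer = chat(
        "gpt-4",
        f"Using these summaries, answer the query '{query}':\n\n"
        + "\n---\n".join(summaries),
    )
    # Steps 4 and 6: record both the summaries and the final answer.
    return {"query": query, "summaries": summaries, "answer": answer}

# Step 1: GPT-4 generates the queries themselves.
queries = chat("gpt-4", "Generate 50 diverse search queries, one per line.").splitlines()
dataset = [build_example(q) for q in queries]
```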
The fine-tuning data followed the format `<instructions>:`, `<user>:`, and `<assistant>:`.

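As a concrete illustration, a single training record under this layout might be serialized as below; the field contents and exact spacing are assumptions, not samples from the actual dataset:

```python
# Hypothetical serializer for the <instructions>:/<user>:/<assistant>: layout.
def format_record(instructions: str, user: str, assistant: str) -> str:
    return (
        f"<instructions>: {instructions}\n"
        f"<user>: {user}\n"
        f"<assistant>: {assistant}"
    )

print(format_record(
    "Answer the user's query using the web page summaries provided.",
    "What is the capital of France?",
    "The capital of France is Paris.",
))
```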
# **Getting Started**

- Experience it firsthand! Check out the live demo [here](https://google.com).
- For DIY enthusiasts, explore or self-deploy this solution using our [GitHub repository](https://google.com); a minimal local-inference sketch follows.
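For a rough idea of local usage, the model can be loaded with Hugging Face `transformers`. In the sketch below, the repository id, prompt contents, and generation settings are all assumptions, so substitute the real ones from the repositories linked above:

```python
# Minimal local-inference sketch; "masonbarnes/open-llm-search" is an
# assumed repository id, and the prompt below is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "masonbarnes/open-llm-search"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = (
    "<instructions>: Answer the query using the web page content below.\n"
    "<user>: What is the capital of France?\n"
    "<assistant>:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```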