Commit 3017f76
Parent(s): 4f73b0f
Create README.md

README.md ADDED
# **Model Overview**

As demand for large language models grows, a common limitation surfaces: they cannot search the internet directly. Services such as Google's Bard, Microsoft's Bing, and Perplexity are addressing this challenge, but their proprietary approaches raise data-logging concerns.

**Introducing Open LLM Search**: a specialized adaptation of Together AI's `llama-2-7b-32k` model, purpose-built for extracting information from web pages. Although the model has only 7 billion parameters, its fine-tuning and expanded context window enable it to excel at search tasks.

**License:** This model is released under Meta's Llama 2 license.

# **Fine-Tuning Process**

The model was fine-tuned on synthetic data generated with a combination of GPT-4 and GPT-4-32k. The training workflow was as follows (a rough code sketch follows the list):

1. Use GPT-4 to generate a multitude of queries.
2. For each query, identify the top five website results from Google.
3. Extract the content from these websites and summarize it with GPT-4-32k.
4. Record the extracted text and the GPT-4-32k summaries for fine-tuning.
5. Feed the summaries from all five sources to GPT-4 to craft a cohesive response.
6. Document both the input and output from GPT-4 for fine-tuning.
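
A rough Python sketch of this pipeline is shown below. It is illustrative only: the OpenAI client calls are one possible way to drive GPT-4 / GPT-4-32k, `top_google_results` is a stand-in for whichever search API was actually used, and the prompts are paraphrased rather than the originals.

```python
"""Illustrative sketch of the synthetic-data pipeline described above."""
import json
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def top_google_results(query: str, k: int = 5) -> list[str]:
    """Placeholder for step 2: return the top-k result URLs for a query.

    The real pipeline needs a search API; this stub only marks where
    that call would go."""
    raise NotImplementedError("plug in a search API here")


def page_text(url: str) -> str:
    """Step 3 (first half): fetch a result page and strip it to plain text."""
    html = requests.get(url, timeout=30).text
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)


def summarize(text: str) -> str:
    """Step 3 (second half): summarize one page with GPT-4-32k."""
    resp = client.chat.completions.create(
        model="gpt-4-32k",
        messages=[{"role": "user",
                   "content": f"Summarize the following web page:\n\n{text}"}],
    )
    return resp.choices[0].message.content


def synthesize(query: str, summaries: list[str]) -> str:
    """Step 5: combine the five summaries into one cohesive answer with GPT-4."""
    joined = "\n\n".join(summaries)
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Question: {query}\n\nSummaries:\n{joined}\n\n"
                              "Write a single cohesive answer."}],
    )
    return resp.choices[0].message.content


def build_example(query: str) -> dict:
    """Steps 2-6 for a single GPT-4-generated query."""
    urls = top_google_results(query)            # step 2
    texts = [page_text(u) for u in urls]        # step 3
    summaries = [summarize(t) for t in texts]   # steps 3-4
    answer = synthesize(query, summaries)       # steps 5-6
    return {"query": query, "pages": texts,
            "summaries": summaries, "answer": answer}


if __name__ == "__main__":
    # Step 1 (generating queries with GPT-4) is omitted; a hard-coded
    # query stands in for it here.
    record = build_example("What is retrieval-augmented generation?")
    print(json.dumps(record, indent=2)[:500])
```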

The fine-tuning adopted the format `<instructions>:`, `<user>:`, and `<assistant>:`, illustrated below.
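
For illustration only, a single training example in this format might look like the following; the bracketed pieces are placeholders, not actual data from the training set:

```
<instructions>: Answer the user's query using the supplied web-page summaries.
<user>: [search query]
<assistant>: [synthesized answer drawn from the summaries]
```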

# **Getting Started**

- Experience it firsthand! Check out the live demo [here](https://google.com).
- For DIY enthusiasts, explore or self-deploy this solution using our [GitHub repository](https://google.com).
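
If you prefer to load the model yourself, a minimal sketch with Hugging Face `transformers` is shown below. The model identifier is a placeholder (substitute the actual ID from the repository above), and the prompt simply follows the `<instructions>:`/`<user>:`/`<assistant>:` format described earlier.

```python
# Minimal local-inference sketch using Hugging Face transformers.
# NOTE: "your-org/open-llm-search" is a placeholder model ID, not the real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/open-llm-search"  # placeholder: substitute the real model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = (
    "<instructions>: Answer the question using the provided web-page summaries.\n"
    "<user>: What is the tallest mountain in the world?\n"
    "<assistant>:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```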