forrest-gradient committed on
Commit
c155d16
1 Parent(s): 5ebdef2

Update README.md (#3)


- Update README.md (5e0dac3a0f38a8d3408b4b6f6c582e461561ab16)

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -10,7 +10,9 @@ license: llama3
 <a href="https://www.gradient.ai" target="_blank"><img src="https://cdn-uploads.huggingface.co/production/uploads/655bb613e8a8971e89944f3e/TSa3V8YpoVagnTYgxiLaO.png" width="200"/></a>
 
 # Llama-3 8B Gradient Instruct 1048k
-Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. To learn more or collaborate on a custom model, drop us a message at contact@gradient.ai.
+Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. If you're looking to build custom AI models or agents, email us a message contact@gradient.ai.
+
+For more info see our [End-to-end development service for custom LLMs and AI systems](https://gradient.ai/development-lab)
 
 This model extends LLama-3 8B's context length from 8k to > 1040K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 320M total tokens, which is < 0.002% of Lamma-3's original pre-training data.
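The README text in this diff attributes the extended context window to an adjusted RoPE theta. As a minimal sketch (not part of this commit), the snippet below shows how one might inspect that setting and load the model with the standard Hugging Face transformers API; the repo id used is an assumption, not something stated in this diff.

```python
# Minimal sketch (not part of this commit): inspect the RoPE theta and the
# extended context window described in the README, then load the model.
# The repo id below is an assumption; substitute the actual model id.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "gradientai/Llama-3-8B-Instruct-Gradient-1048k"  # assumed repo id

config = AutoConfig.from_pretrained(model_id)
print(config.rope_theta)               # RoPE base frequency, raised for long context
print(config.max_position_embeddings)  # extended context length (> 1M tokens)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```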