---
license: apache-2.0
---

# Model Card for Soniox-7B-v1.0

Soniox 7B is a powerful large language model. It supports English and code with an 8K context window, and approaches GPT-4 performance on many benchmarks. It is built on top of Mistral 7B and enhanced with additional pre-training and fine-tuning for strong problem-solving capabilities. The model is released under the Apache 2.0 License. For more details, please read our [blog post](https://soniox.com/news/soniox-7b).

## Usage in Transformers

The model is available in `transformers` and can be used as follows:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model in half precision along with its tokenizer.
model_path = "soniox/Soniox-7B-v1.0"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_path)

device = "cuda"
model.to(device)

# Build a prompt from a chat history using the model's chat template.
messages = [
    {"role": "user", "content": "12 plus 21?"},
    {"role": "assistant", "content": "33."},
    {"role": "user", "content": "Five minus one?"},
]
tok_prompt = tokenizer.apply_chat_template(messages, return_tensors="pt")

# Generate a sampled response and decode it back to text.
model_inputs = tok_prompt.to(device)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```

## Inference deployment

Refer to our [documentation](https://docs.soniox.com) for inference with vLLM and other deployment options.
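
As one possible path, a vLLM deployment can be sketched with vLLM's OpenAI-compatible server. The commands below are a generic illustration, not Soniox-specific instructions; they assume vLLM is installed, the model architecture is supported, and a GPU with enough memory is available. Consult the documentation linked above for the supported options.

```shell
# Start an OpenAI-compatible server backed by vLLM (assumption: vLLM is
# installed and can load this model on the local GPU).
python -m vllm.entrypoints.openai.api_server --model soniox/Soniox-7B-v1.0

# From another shell, send a chat completion request to the local endpoint.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "soniox/Soniox-7B-v1.0",
        "messages": [{"role": "user", "content": "Five minus one?"}],
        "max_tokens": 100
      }'
```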