---
license: apache-2.0
tags:
- snowflake
- arctic
- moe
---

## Model Details

Arctic is a Dense-MoE Hybrid transformer architecture pre-trained from scratch by the Snowflake AI
Research Team. We are releasing model checkpoints for both the base and instruct-tuned versions of
Arctic under an Apache-2.0 license, so you can use them freely in your own research, prototypes,
and products. Please see our blog [Snowflake Arctic: Efficient Intelligence, Truly Open]() for more
information on Arctic and links to other relevant resources, such as our series of cookbooks on
training your own custom MoE models, producing high-quality training data, and much more.

* [Arctic-Base](link-here)
* [Arctic-Instruct](link-to-instruct)

**Model developers** Snowflake

**License** Apache-2.0

**Input** Models accept text input only.

**Output** Models generate text and code only.

**Model Release Date** April 24, 2024.

## Model Architecture

Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B
total parameters and 17B active parameters, with the active experts chosen using top-2 gating. For
more details about Arctic's model architecture, please see our cookbook.

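The parameter counts above follow from simple arithmetic, and top-2 gating can be illustrated in a few lines. The following is a minimal sketch, not Arctic's actual implementation: the constants come from the figures quoted in this section, while the function name `top2_gate` and the choice to renormalize the two selected weights so they sum to 1 are assumptions made for illustration.

```python
import math

# Figures quoted in this section, in billions of parameters.
DENSE_PARAMS_B = 10.0
NUM_EXPERTS = 128
EXPERT_PARAMS_B = 3.66
TOP_K = 2

# Total: the dense model plus every expert (10 + 128 * 3.66 = 478.48, roughly 480B).
total_params_b = DENSE_PARAMS_B + NUM_EXPERTS * EXPERT_PARAMS_B
# Active per token: the dense model plus the two selected experts
# (10 + 2 * 3.66 = 17.32, roughly 17B).
active_params_b = DENSE_PARAMS_B + TOP_K * EXPERT_PARAMS_B


def top2_gate(router_logits):
    """Hypothetical helper: pick the two highest-scoring experts for one token.

    Returns a list of (expert_index, weight) pairs. Renormalizing the two
    weights to sum to 1 is an assumption, not a confirmed detail of Arctic.
    """
    # Numerically stable softmax over the router logits.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    denom = sum(exps)
    probs = [e / denom for e in exps]
    # Keep the two most probable experts and renormalize their weights.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:TOP_K]
    kept = sum(probs[i] for i in top2)
    return [(i, probs[i] / kept) for i in top2]


if __name__ == "__main__":
    print(f"total ~ {total_params_b:.2f}B, active ~ {active_params_b:.2f}B")
    print(top2_gate([0.1, 2.0, -1.0, 1.0]))
```

Only the two selected experts run for a given token, which is why the active parameter count stays near 17B even though the model stores roughly 480B parameters.
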
## Usage

As of 4/24/2024 we are actively working with the maintainers of `transformers` to include the Arctic
model implementation. Until this support is released, please follow these instructions to get the
required dependencies for using Arctic:

```shell
pip install git+https://github.com/Snowflake-Labs/transformers.git
```

Arctic leverages several features from [DeepSpeed](https://github.com/microsoft/DeepSpeed). You will need to
install the latest version of DeepSpeed to get all of these required features:

```shell
pip install "deepspeed>=0.15.0"
```

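Before loading the model, it can help to confirm that the installed DeepSpeed satisfies the minimum version given above. The following is a minimal sketch using only the standard library. The helper names `parse_version` and `meets_minimum` are hypothetical, and the parser is deliberately simple: it reads the leading numeric components and ignores any suffix, so a pre-release such as `0.15.0rc1` is treated as `0.15.0`.

```python
import re
from importlib import metadata

MIN_DEEPSPEED = "0.15.0"  # minimum version from the install command above


def parse_version(version):
    """Hypothetical helper: turn '0.15.1' into (0, 15, 1), ignoring any suffix."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # non-numeric component such as "dev0"
        parts.append(int(match.group()))
        if match.group() != piece:
            break  # stop after a suffix such as "0+cu118" or "1rc2"
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_DEEPSPEED):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        installed = metadata.version("deepspeed")
    except metadata.PackageNotFoundError:
        print("deepspeed is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old"
        print(f"deepspeed {installed}: {status}")
```

Comparing integer tuples rather than raw strings avoids the pitfall where `"0.9.5"` sorts after `"0.15.0"` lexicographically.
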
### Inference

To get the best performance with Arctic, we highly recommend using TRT-LLM or vLLM for inference. However, you
can also use `transformers` to load the model for text generation. Due to the model size, we recommend using a
single 8xH100 instance from your favorite cloud provider, such as AWS [p5.48xlarge](https://aws.amazon.com/ec2/instance-types/p5/)
or Azure [ND96isr_H100_v5](https://learn.microsoft.com/en-us/azure/virtual-machines/nd-h100-v5-series).

In addition, if you would like to access Arctic via API, we have collaborated with several inference API
providers to host Arctic, such as AWS, Microsoft Azure, NVIDIA Foundry, Lamini, Perplexity, Replicate, and Together.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model; device_map="auto" spreads the weights across the available devices.
tokenizer = AutoTokenizer.from_pretrained("snowflake/arctic")
model = AutoModelForCausalLM.from_pretrained("snowflake/arctic", device_map="auto", torch_dtype=torch.bfloat16)

# Tokenize the prompt and move the resulting tensors to the GPU.
input_text = "Hello my name is "
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate up to 20 new tokens and decode the full sequence back to text.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```

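As a rough sanity check on the hardware recommendation, the weight footprint can be estimated from the parameter count. This is a back-of-the-envelope sketch, not a measured result: it assumes 480B parameters and 80 GB of memory per H100, and it counts weights only (no activations, KV cache, or framework overhead). The helper name `weight_memory_gb` is hypothetical.

```python
TOTAL_PARAMS = 480e9   # total parameter count quoted in this model card
GPUS = 8               # a single 8xH100 instance
GPU_MEMORY_GB = 80     # assumption: the 80 GB H100 variant

# Storage cost per parameter for a few common data types.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Hypothetical helper: gigabytes (10^9 bytes) needed to store the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    budget = GPUS * GPU_MEMORY_GB
    for dtype in BYTES_PER_PARAM:
        need = weight_memory_gb(TOTAL_PARAMS, dtype)
        verdict = "fits within" if need <= budget else "exceeds"
        print(f"{dtype}: {need:.0f} GB of weights {verdict} {budget} GB of GPU memory")
```

Under these assumptions, bfloat16 weights (about 960 GB) exceed the 640 GB of GPU memory on one node, so with `device_map="auto"` part of the model would likely be placed in CPU memory or on disk, which slows generation. An 8-bit representation (about 480 GB) would fit on the GPUs.
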
### Fine-Tuning

TODO: add link and extra details about fine-tuning scripts

## Metrics

TODO: add summary of metrics here, we don't necessarily need to compare to others but we can if we want

## Training Data

TODO: add short description and links to training data related cookbook(s)