	---
	license: apache-2.0
	tags:
	- snowflake
	- arctic
	- moe
	---

	## Model Details

	Arctic is a Dense-MoE Hybrid transformer architecture pre-trained from scratch by the Snowflake AI
	Research Team. We are releasing model checkpoints for both the base and instruct-tuned versions of
	Arctic under an Apache-2.0 license, which means you can use them freely in your own research,
	prototypes, and products. Please see our blog [Snowflake Arctic: Efficient Intelligence, Truly Open]()
	for more information on Arctic and links to other relevant resources, such as our series of cookbooks
	covering topics like training your own custom MoE models, producing high-quality training data,
	and much more.

	* [Arctic-Base](link-here)
	* [Arctic-Instruct](link-to-instruct)

	Model developers: Snowflake

	License: Apache-2.0

	Input: The models accept text input only.

	Output: The models generate text and code only.

	Model release date: April 24, 2024.

	## Model Architecture

	Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B
	total parameters, of which 17B are active per token because each token is routed to its top-2 experts
	(top-2 gating). For more details about Arctic's model architecture, please see our cookbook.
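The parameter figures above can be checked with a little arithmetic, and top-2 routing can be illustrated with a toy layer. This is an illustrative sketch only: it assumes "residual" means the MoE output is added to the dense path's output, uses plain Python lists as stand-ins for tensors, and the helper names (`total_params_b`, `active_params_b`, `residual_moe`) are hypothetical rather than taken from any Arctic codebase.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]

# Figures quoted in the Model Architecture section (in billions of parameters).
DENSE_PARAMS_B = 10.0    # dense transformer
NUM_EXPERTS = 128        # experts in the residual MoE MLP
EXPERT_PARAMS_B = 3.66   # parameters per expert
TOP_K = 2                # experts selected per token (top-2 gating)


def total_params_b() -> float:
    """All experts are stored, so every one counts toward the total."""
    return DENSE_PARAMS_B + NUM_EXPERTS * EXPERT_PARAMS_B


def active_params_b() -> float:
    """Per token, only the dense part and the TOP_K chosen experts run."""
    return DENSE_PARAMS_B + TOP_K * EXPERT_PARAMS_B


def residual_moe(
    x: Vector,
    dense: Callable[[Vector], Vector],
    experts: Sequence[Callable[[Vector], Vector]],
    router: Callable[[Vector], Vector],
    k: int = TOP_K,
) -> Vector:
    """Toy residual MoE (assumed form): dense(x) plus a gated sum of the top-k experts' outputs."""
    logits = router(x)
    # Indices of the k highest router scores.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the chosen logits only, shifted by the max for numerical stability.
    m = max(logits[i] for i in chosen)
    weights = [math.exp(logits[i] - m) for i in chosen]
    z = sum(weights)
    out = list(dense(x))
    for i, w in zip(chosen, weights):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + (w / z) * yi for o, yi in zip(out, y)]
    return out


print(f"total  ~ {total_params_b():.2f}B")   # 478.48B, quoted as 480B
print(f"active ~ {active_params_b():.2f}B")  # 17.32B, quoted as 17B
```

Because all 128 experts are stored but only two run per token, the total parameter count drives memory requirements while the active count drives per-token compute.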


	## Usage

	As of 4/24/2024, we are actively working with the maintainers of `transformers` to include the Arctic
	model implementation. Until this support is released, please install Snowflake's fork of `transformers`
	to get the dependencies required for using Arctic:

	```bash
	pip install git+https://github.com/Snowflake-Labs/transformers.git
	```

	Arctic relies on several features from [DeepSpeed](https://github.com/microsoft/DeepSpeed), so you will need to
	install DeepSpeed 0.15.0 or later to get all of these required features:

	```bash
	pip install "deepspeed>=0.15.0"
	```
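To confirm that the installed DeepSpeed meets this minimum, a small check like the one below could be run. It is a sketch: it uses the standard-library `importlib.metadata.version` and a simplified comparison that only looks at the leading numeric components of the version string (so a pre-release such as `0.15.0rc1` is treated as `0.15.0`). The helper names `numeric_parts` and `meets_minimum` are hypothetical; `packaging.version.Version` is the more robust choice if that package is available.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MIN_DEEPSPEED = "0.15.0"  # minimum version from the pip command above


def numeric_parts(v: str) -> Tuple[int, ...]:
    """Leading numeric components of a version string, e.g. '0.15.1+cu121' -> (0, 15, 1)."""
    parts = []
    for piece in v.split("+")[0].split("."):  # drop any local suffix like '+cu121'
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # e.g. '0rc1': keep the numeric prefix, then stop
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_DEEPSPEED) -> bool:
    """Compare numeric parts, padding the shorter tuple with zeros."""
    a, b = numeric_parts(installed), numeric_parts(minimum)
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


if __name__ == "__main__":
    try:
        installed = version("deepspeed")
        status = "OK" if meets_minimum(installed) else f"too old, need >= {MIN_DEEPSPEED}"
        print(f"deepspeed {installed}: {status}")
    except PackageNotFoundError:
        print("deepspeed is not installed")
```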

	### Inference

	For the best inference performance with Arctic, we highly recommend using TRT-LLM or vLLM. However, you
	can also use `transformers` to load the model for text generation. Due to the model size, we recommend
	using a single 8xH100 instance from your favorite cloud provider, such as AWS
	[p5.48xlarge](https://aws.amazon.com/ec2/instance-types/p5/) or
	Azure [ND96isr_H100_v5](https://learn.microsoft.com/en-us/azure/virtual-machines/nd-h100-v5-series).
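A back-of-the-envelope estimate of weight memory shows why a full 8xH100 node is suggested. This sketch assumes 80 GB of memory per H100 (the configuration of both instance types named above) and counts weights only, ignoring the KV cache, activations, and runtime overhead. By this estimate, bf16 weights (about 960 GB) exceed the node's 640 GB of GPU memory, so loading in bf16 with `device_map="auto"` would likely offload part of the model to CPU memory, whereas 8-bit weights (about 480 GB) would fit. The names below are hypothetical.

```python
# Weights-only memory estimate; ignores KV cache, activations, and runtime overhead.
TOTAL_PARAMS = 480e9     # total parameters quoted in Model Architecture
GPUS_PER_NODE = 8        # a single 8xH100 instance
HBM_PER_GPU_GB = 80      # assumed: 80 GB H100s, as on p5.48xlarge / ND96isr_H100_v5


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


node_gb = GPUS_PER_NODE * HBM_PER_GPU_GB  # total GPU memory on the node

for name, nbytes in [("bf16", 2), ("8-bit", 1)]:
    need = weight_gb(TOTAL_PARAMS, nbytes)
    print(f"{name}: {need:.0f} GB of weights vs {node_gb} GB on the node, fits: {need < node_gb}")
```

A finer estimate would also budget for the KV cache, which grows with batch size and sequence length.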

	In addition, if you would like to access Arctic via an API, we have collaborated with several inference API
	providers to host Arctic, including AWS, Microsoft Azure, NVIDIA Foundry, Lamini, Perplexity, Replicate, and Together.

	The following example loads the model with `transformers` and generates a short completion:

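	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "Snowflake/snowflake-arctic-instruct"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	# device_map="auto" spreads the weights across the available GPUs
	# (spilling to CPU memory if they do not fit); bfloat16 halves memory vs. float32.
	model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)

	input_text = "Hello my name is"
	# Tokenize and move the tensors to the device holding the model's first layers.
	inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

	outputs = model.generate(**inputs, max_new_tokens=20)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```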

	### Fine-Tuning

	TODO: add link and extra details about fine-tuning scripts

	## Metrics

	TODO: add summary of metrics here, we don't necessarily need to compare to others but we can if we want

	## Training Data

	TODO: add short description and links to training data related cookbook(s)