---
license: mit
datasets:
- databricks/databricks-dolly-15k
language:
- en
pipeline_tag: text-generation
tags:
- dolly
- dolly-v2
- instruct
- sharded
inference: False
---

# dolly-v2-12b: sharded checkpoint

<a href="https://colab.research.google.com/gist/pszemraj/6eb7ccce28ea6aa07b8ec86388ac010e/sharded-instruction-model-playground.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This is a sharded checkpoint (with ~4GB shards) of the `databricks/dolly-v2-12b` model. Refer to the [original model](https://huggingface.co/databricks/dolly-v2-12b) for all details.

- sharding enables low-RAM loading, e.g. on Colab :)

## Basic Usage

Install `transformers`, `accelerate`, and `bitsandbytes`:

```bash
pip install -U -q transformers bitsandbytes accelerate
```

Load the model in 8-bit, then [run inference](https://huggingface.co/docs/transformers/generation_strategies#contrastive-search):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ethzanalytics/dolly-v2-12b-sharded"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 8-bit loading relies on bitsandbytes; device_map="auto" lets accelerate place the weights
model = AutoModelForCausalLM.from_pretrained(
    model_name, load_in_8bit=True, device_map="auto",
)
```
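
The inference step is not spelled out above, so here is a minimal sketch of what it could look like with contrastive search. The helper names `build_prompt` and `generate_response` are hypothetical, the prompt template is assumed to follow the instruction format of the original `databricks/dolly-v2-12b` repository (check that model card before relying on it), and the `penalty_alpha` / `top_k` values are illustrative rather than tuned. Depending on your `transformers` version, contrastive search may be configured differently.

```python
# Prompt template assumed from the original databricks/dolly-v2-12b repository;
# verify it against that model card before relying on it.
INTRO = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the (assumed) Dolly instruction format."""
    return f"{INTRO}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


def generate_response(model, tokenizer, instruction: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: generate a reply with contrastive search."""
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        penalty_alpha=0.6,  # illustrative contrastive-search setting, not tuned
        top_k=4,            # illustrative contrastive-search setting, not tuned
    )
    # Drop the prompt tokens so only the newly generated text is decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

With `model` and `tokenizer` loaded as in the block above, a call might look like `print(generate_response(model, tokenizer, "Explain what a sharded checkpoint is."))`.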