---

license: other
license_name: yi-license
license_link: LICENSE
base_model: 01-ai/Yi-34B-200K
inference: false
model_creator: 01-ai
model_name: Yi-34B-200K
model_type: yi
prompt_template: '{system}\n\nHuman:\n{prompt}\n\nAssistant:\n'
quantized_by: Sombressoul
---

<div align="center">
<img src="./Yi.svg" width="200px">
</div>

# Yi-34B-200K - AWQ
- Model creator: [01-ai](https://huggingface.co/01-ai)
- Original model: [`Yi-34B-200K`](https://huggingface.co/01-ai/Yi-34B-200K)

This is a quantized (AWQ) version of [`Yi-34B-200K`](https://huggingface.co/01-ai/Yi-34B-200K).

For more information about the model, see the original page.

## Quantization
Quantization was performed using [casper-hansen/AutoAWQ](https://github.com/casper-hansen/AutoAWQ).

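A random sample of the `Open-Orca/OpenOrca` dataset was used as calibration data. AWQ runs these texts through the model to collect the activation statistics it uses to scale the most important weight channels before quantization.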
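To make the calibration step more concrete, here is a simplified NumPy sketch of the activation-aware idea on random data, using the same settings as `quant_config` in the quantization code (4-bit weights, groups of 128, zero point). For each input channel it tries scales `s = mean|x| ** alpha` over a small grid of `alpha` values. The weight columns are multiplied by `s` before quantization and the activations are divided by `s`, and the `alpha` with the lowest output error is kept. This is an illustration under stated assumptions, not AutoAWQ's code: the real implementation works on blocks of layers that share an input, folds the scales into the preceding operation and also clips weights. The function names and the demo data are hypothetical.

```python
import numpy as np


def quantize_groupwise(w, w_bit=4, group_size=128):
    """Asymmetric round-to-nearest quantization with a zero point, applied to
    groups of `group_size` consecutive input channels in every row of `w`.
    Returns the de-quantized weights so the error can be measured directly."""
    rows, cols = w.shape
    assert cols % group_size == 0, "in_features must be a multiple of group_size"
    g = w.reshape(rows, cols // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    q_max = 2 ** w_bit - 1                                  # 15 for 4-bit
    scale = np.maximum(w_max - w_min, 1e-5) / q_max
    zero = np.clip(np.round(-w_min / scale), 0, q_max)
    q = np.clip(np.round(g / scale) + zero, 0, q_max)       # integer levels 0..q_max
    return ((q - zero) * scale).reshape(rows, cols)


def awq_search_scale(w, x, w_bit=4, group_size=128, n_grid=20):
    """Grid-search per-input-channel scales s = mean|x| ** alpha that minimise
    the output error of the quantized layer on calibration activations
    x (n_tokens x in_features)."""
    x_mean = np.abs(x).mean(axis=0)                # average activation size per channel
    y_ref = x @ w.T                                # full-precision layer output
    best_err, best_alpha, best_s = np.inf, 0.0, np.ones_like(x_mean)
    for i in range(n_grid):
        alpha = i / n_grid                         # alpha = 0 is plain round-to-nearest
        s = np.maximum(x_mean ** alpha, 1e-4)
        s = s / np.sqrt(s.max() * s.min())         # normalise so max(s) * min(s) == 1
        w_q = quantize_groupwise(w * s, w_bit, group_size)  # scale weight columns up
        y = (x / s) @ w_q.T                        # fold 1/s into the activations
        err = float(np.mean((y - y_ref) ** 2))
        if err < best_err:
            best_err, best_alpha, best_s = err, alpha, s
    return best_err, best_alpha, best_s


# Demo on random data with a few "salient" channels that have large activations.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 256))
x = rng.normal(size=(512, 256))
x[:, :4] *= 50.0
rtn_err = float(np.mean((x @ quantize_groupwise(w).T - x @ w.T) ** 2))
awq_err, alpha, _ = awq_search_scale(w, x)
print(f"round-to-nearest MSE: {rtn_err:.4f}")
print(f"AWQ-style MSE:        {awq_err:.4f} (alpha = {alpha:.2f})")
```

Because `alpha = 0` reproduces plain round-to-nearest quantization, the search can never do worse than that baseline on the calibration activations. This is why representative calibration texts matter.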

**Quantization code**
```python
from awq            import AutoAWQForCausalLM
from datasets       import load_dataset
from random         import randint
from transformers   import AutoTokenizer

dataset_size    = 20_000    # number of calibration texts to sample
model_path      = '/PATH/TO/ORIGINAL/YI-34B-200K'
quant_path      = 'quant'
quant_config    = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

model           = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer       = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

def load_orca():
    data = load_dataset('Open-Orca/OpenOrca', split='train')

    def concatenate_data(x):
        # Format a record with the same template that is used for inference;
        # anything after an 'Answer:' marker in the question is dropped
        s   = str(x['system_prompt']).strip()
        q   = str(x['question']).strip().split('Answer:')[0].strip()
        a   = str(x['response']).strip()
        return '{}\n\nHuman:\n{}\n\nAssistant:\n{}\n'.format(s, q, a)

    # randint() includes its upper bound, so the last valid index is len(data) - 1
    return [concatenate_data(data[randint(0, len(data) - 1)]) for _ in range(dataset_size)]

model.quantize(tokenizer, quant_config=quant_config, calib_data=load_orca())

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The model has been successfully tested.

## Inferencing
The model runs well on an RTX 4090 (using more than 95% of its VRAM), but be prepared to wait when the context is relatively large.
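A rough back-of-envelope estimate helps explain why VRAM fills up and why long contexts are slow. The sketch below counts the 4-bit linear weights with one fp16 scale and one 4-bit zero point per group of 128, and assumes embeddings and `lm_head` stay in fp16. It also computes the fp16 KV cache per token. The architecture numbers are assumptions taken from the upstream `config.json` as we understand it. Activations, the CUDA context and framework buffers are ignored.

```python
# Back-of-envelope VRAM estimate for Yi-34B-200K with the quant_config used here
# (4-bit, groups of 128, zero point).
# ASSUMPTION: architecture numbers as we understand the upstream config.json;
# verify them before relying on the result.
HIDDEN, INTERMEDIATE, N_LAYERS = 7168, 20480, 60
N_HEADS, N_KV_HEADS, VOCAB = 56, 8, 64000
HEAD_DIM = HIDDEN // N_HEADS  # 128

GiB = 2 ** 30


def linear_params_per_layer() -> int:
    """Weights of the quantized linear layers in one decoder layer."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * N_KV_HEADS * HEAD_DIM  # q, o + k, v
    mlp = 3 * HIDDEN * INTERMEDIATE                                  # gate, up, down
    return attn + mlp


def weight_bytes(w_bit: int = 4, group_size: int = 128) -> float:
    """w_bit linear weights plus one fp16 scale and one w_bit zero point per
    group; embeddings and lm_head are assumed to stay in fp16."""
    bits_per_weight = w_bit + (16 + w_bit) / group_size
    quantized = N_LAYERS * linear_params_per_layer() * bits_per_weight / 8
    fp16_embeddings = 2 * VOCAB * HIDDEN * 2    # input embeddings + lm_head, 2 bytes each
    return quantized + fp16_embeddings


def kv_cache_bytes(n_tokens: int, bytes_per_value: int = 2) -> int:
    """fp16 K and V for every layer, KV head and head dimension."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value * n_tokens


print(f"weights: ~{weight_bytes() / GiB:.1f} GiB")
for n in (2_048, 32_768, 200_000):
    print(f"KV cache for {n:>7} tokens: ~{kv_cache_bytes(n) / GiB:.1f} GiB")
```

Under these assumptions the weights take roughly 18 GiB and the KV cache about 240 KiB per token. A full 200K-token context (around 46 GiB of cache) would therefore not fit on a 24 GB card. How much context fits in practice depends on the inference framework.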

During testing, the following prompt template was used:
```python
# ...
template    = f'{system_prompt}\n\nHuman:\n{user_prompt}\n\nAssistant:\n'
# ...
```
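The template can be wrapped in a small helper so that prompts always match the layout used for the calibration texts (minus the answer). This is a minimal sketch. The `cut_at_next_turn` helper is a hypothetical addition and was not part of the tested setup. It assumes a base model may continue with a new `Human:` turn after its answer.

```python
def build_prompt(system_prompt: str, user_prompt: str) -> str:
    """Same layout as the template used during testing and as `prompt_template`
    in this card's metadata."""
    return f"{system_prompt}\n\nHuman:\n{user_prompt}\n\nAssistant:\n"


def cut_at_next_turn(generated: str, stop: str = "\n\nHuman:") -> str:
    """HYPOTHETICAL post-processing helper: keep only the text generated
    before the model starts writing a new 'Human:' turn."""
    return generated.split(stop, 1)[0].strip()


prompt = build_prompt("You are a helpful assistant.",
                      "Prove that there is a connection between nails and hunger.")
print(prompt)
print(cut_at_next_turn("There is no direct connection.\n\nHuman:\nThanks!"))
```

Many inference frameworks also accept stop strings directly, which avoids generating the extra turn in the first place.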

**Sample output**

An example of the model output. The prompt submitted to the model consisted of instructions, examples and the user's query, about 2k tokens in total, hence the long processing time:
```plaintext
User's request:
Prove that there is a connection between nails and hunger.


Model's response:
Task Statement: Argumentatively prove or refute the following logical statement: "There is a connection between nails and hunger."

Logical Reasoning Solution:
1. Analyze the Task Statement: The task statement asks to argueatively prove or refute the statement "There is a connection between nails and hunger."
2. Define the Terms: Nails refer to the keratin-based protective structures on the tips of fingers and toes. Hunger, on the other hand, is a physiological sensation of lacking food and a drive to consume it.
3. Analyze the Relationship: There is no direct physiological or biological connection between nails and hunger. Nails are not directly related to the sensation of hunger or the regulation of food intake.
4. Exploring Potential Connections: It is important to consider indirect or metaphorical connections. However, there is no established indirect connection between nails and hunger.
5. Analyze the Task: The task is to argueatively prove or refute the statement. Since there is no direct or indirect connection between nails and hunger, the statement is refutable.

Therefore, the statement "There is a connection between nails and hunger" is refutable, as there is no direct or indirect relationship between nails and hunger.
(-------------------------------------------------------------------------------------)
(----> Generated in 684.5489482879639 seconds
(-------------------------------------------------------------------------------------)
```

## Notes
Even though it is a base model, with the right context it does a surprisingly good job of following instructions. However, it still needs some polishing, i.e. fine-tuning for downstream tasks.

I hope that the authors of the original model will release fine-tuned versions in the near future.