yi-01-ai committed on
Commit
0a0cf9f
1 Parent(s): 1aac23e

Auto Sync from git://github.com/01-ai/Yi.git/commit/d3236fb36b17896131cf4ca544b9aa266d9a4141

Files changed (1)
  1. README.md +259 -244
README.md CHANGED
@@ -78,6 +78,14 @@ pipeline_tag: text-generation
  - [Base models](#base-models)
  - [Other info](#other-info)
  - [🎉 News](#-news)
  - [🟢 Why Yi?](#-why-yi)
  - [🌎 Ecosystem](#-ecosystem)
  - [💦 Upstream](#-upstream)
@@ -85,16 +93,12 @@ pipeline_tag: text-generation
  - [🔗 Serving](#-serving)
  - [⚙️ Quantitation](#️-quantitation)
  - [🛠️ Fine-tuning](#️-fine-tuning)
  - [📌 Benchmarks](#-benchmarks)
  - [📊 Base model performance](#-base-model-performance)
  - [📊 Chat model performance](#-chat-model-performance)
  - [📊 Quantized chat model performance](#-quantized-chat-model-performance)
- - [⛔️ Limitations of chat model](#️-limitations-of-chat-model)
  - [🟢 Who can use Yi?](#-who-can-use-yi)
- - [🟢 How to use Yi?](#-how-to-use-yi)
- - [Quick start](#quick-start)
- - [Deployment](https://github.com/01-ai/Yi/blob/main/docs/deployment.md)
- - [Learning hub](https://github.com/01-ai/Yi/blob/main/docs/learning_hub.md)
  - [🟢 Misc.](#-misc)
  - [Acknowledgments](#acknowledgments)
  - [📡 Disclaimer](#-disclaimer)
@@ -108,7 +112,7 @@ pipeline_tag: text-generation
 
  ## 📌 Introduction
 
- - 🤖 The Yi series models are the next generation of open source large language models trained from scratch by [01.AI](https://01.ai/).
 
  - 🙌 Targeted as a bilingual language model and trained on a 3T multilingual corpus, the Yi series models have become one of the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example,
 
@@ -124,6 +128,8 @@ pipeline_tag: text-generation
 
  Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements.
 
  ### Chat models
 
  | Model | Download
@@ -135,7 +141,7 @@ Yi-34B-Chat | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat)
  Yi-34B-Chat-4bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat-4bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-Chat-4bits/summary)
  Yi-34B-Chat-8bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat-8bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-Chat-8bits/summary)
 
- <sub><sup> - 4-bit series models are quantized by AWQ. <br> - 8-bit series models are quantized by GPTQ <br> - All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., 3090, 4090).</sup></sub>
 
  ### Base models
 
@@ -150,17 +156,21 @@ Yi-34B-200K|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-200K)
 
  ### Other info
 
- For chat models and base models:
 
- - 6B series models are suitable for personal and academic use.
 
- - 34B series models suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. It's a cost-effective solution that's affordable and equipped with emergent ability.
 
- - The **default context window** is **4k tokens**.
-
- - The pretrained tokens are 3T.
 
- - The training data are up to June 2023.
 
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
@@ -217,8 +227,236 @@ sequence length and can be extended to 32K during inference time.
 
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
 
  # 🟢 Why Yi?
 
  ## 🌎 Ecosystem
 
@@ -257,7 +495,9 @@ model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34b", device_map="auto")
 
  If you want to get up and running with Yi in a few minutes, you can use the following services built upon Yi.
 
- - [Yi-34B-Chat](https://platform.lingyiwanwu.com/) (Yi official beta): you can chat with it. **Note** that currently it's available through a whitelist. Welcome to apply (fill out a form in [English](https://cn.mikecrm.com/l91ODJf) or [Chinese](https://cn.mikecrm.com/gnEZjiQ)) and experience it firsthand!
 
  - [Yi-6B-Chat (Replicate)](https://replicate.com/01-ai): you can use this model with more options by setting additional parameters and calling APIs.
 
@@ -267,7 +507,7 @@ If you want to get up with Yi in a few minutes, you can use the following servic
 
  If you have limited computational resources, you can use Yi's quantized models as follows.
 
- These quantized models have reduced precision and but offer increased efficiency, such as faster inference speed and smaller RAM usage.
 
  - [TheBloke/Yi-34B-GPTQ](https://huggingface.co/TheBloke/Yi-34B-GPTQ)
  - [TheBloke/Yi-34B-GGUF](https://huggingface.co/TheBloke/Yi-34B-GGUF)
@@ -302,7 +542,6 @@ If you're seeking to explore the diverse capabilities within Yi's thriving famil
  - [📊 Base model performance](#-base-model-performance)
  - [📊 Chat model performance](#-chat-model-performance)
  - [📊 Quantized chat model performance](#-quantized-chat-model-performance)
- - [⛔️ Limitations of chat model](#️-limitations-of-chat-model)
 
  ### 📊 Base model performance
 
@@ -363,39 +602,13 @@ Falcon-180B's performance was not underestimated.
  | Yi-34B-Chat-8bits(GPTQ) | 66.24 | **73.69** | 79.05 | 81.23 | 76.82 | 78.97 | 61.84 | **52.08** | 70.97 | 70.74 | 75.74 |
  | Yi-34B-Chat-4bits(AWQ) | 65.77 | 72.42 | 78.21 | 80.50 | 75.71 | 77.27 | 61.84 | 48.30 | 69.39 | 70.51 | 74.00 |
 
- We evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA. Generally, the zero-shot approach is more common in chat models. Our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text. Some models are not well-suited to produce output in the specific format required by instructions in few datasets, which leads to suboptimal results.
 
  <strong>*</strong>: C-Eval results are evaluated on the validation datasets.
 
  ### 📊 Quantized chat model performance
 
- We also provide both 4-bit (AWQ) and 8-bit (GPTQ) quantized Yi chat models. Evaluation results on various benchmarks have shown that the quantized models have negligible losses. Additionally, they reduce the memory footprint size. After testing different configurations of prompts and generation lengths, we highly recommend following the guidelines in the memory footprint table below when selecting a device to run our models.
-
- | | batch=1 | batch=4 | batch=16 | batch=32 |
- | ----------------------- | ------- | ------- | -------- | -------- |
- | Yi-34B-Chat | 65GiB | 68GiB | 76GiB | >80GiB |
- | Yi-34B-Chat-8bits(GPTQ) | 35GiB | 37GiB | 46GiB | 58GiB |
- | Yi-34B-Chat-4bits(AWQ) | 19GiB | 20GiB | 30GiB | 40GiB |
- | Yi-6B-Chat | 12GiB | 13GiB | 15GiB | 18GiB |
- | Yi-6B-Chat-8bits(GPTQ) | 7GiB | 8GiB | 10GiB | 14GiB |
- | Yi-6B-Chat-4bits(AWQ) | 4GiB | 5GiB | 7GiB | 10GiB |
-
- Note: All the numbers in the table represent the minimum recommended memory for running models of the corresponding size.
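As a rough illustration of how the memory footprint table above can guide device selection, the sketch below transcribes its minimum-memory figures and returns the first chat model that fits a given GPU budget. The `pick_model` helper is hypothetical and not part of the Yi repo. The ordering assumption (34B variants first, so a 4-bit 34B model is preferred over a full-precision 6B one) is a choice made for this example. The `>80GiB` entry is treated as "does not fit".

```python
# Minimum recommended memory in GiB, keyed by batch size, transcribed from the
# table above. None marks the ">80GiB" entry (more than an 80 GiB card offers).
MIN_MEMORY_GIB = {
    "Yi-34B-Chat": {1: 65, 4: 68, 16: 76, 32: None},
    "Yi-34B-Chat-8bits(GPTQ)": {1: 35, 4: 37, 16: 46, 32: 58},
    "Yi-34B-Chat-4bits(AWQ)": {1: 19, 4: 20, 16: 30, 32: 40},
    "Yi-6B-Chat": {1: 12, 4: 13, 16: 15, 32: 18},
    "Yi-6B-Chat-8bits(GPTQ)": {1: 7, 4: 8, 16: 10, 32: 14},
    "Yi-6B-Chat-4bits(AWQ)": {1: 4, 4: 5, 16: 7, 32: 10},
}


def pick_model(gpu_memory_gib, batch_size=1):
    """Hypothetical helper: return the first model, in table order, whose
    minimum recommended memory fits the budget, or None if none fits."""
    for name, by_batch in MIN_MEMORY_GIB.items():
        required = by_batch.get(batch_size)
        if required is not None and required <= gpu_memory_gib:
            return name
    return None


print(pick_model(24))                 # -> Yi-34B-Chat-4bits(AWQ)
print(pick_model(80, batch_size=32))  # -> Yi-34B-Chat-8bits(GPTQ)
```

Reordering `MIN_MEMORY_GIB` changes the preference. For example, you could list the full-precision models first if output quality per token matters more to you than model size.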
-
- ### ⛔️ Limitations of chat model
-
- The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training.
-
- However, this higher diversity might amplify certain existing issues, including:
-
- - **Hallucination**: This refers to the model generating factually incorrect or nonsensical information. With the model's responses being more varied, there's a higher chance of hallucination that are not based on accurate data or logical reasoning.
- - **Non-determinism in re-generation**: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.
- - **Cumulative Error**: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.
-
- To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as`temperature`,`top_p`, or`top_k`. These adjustments can help in the balance between creativity and coherence in the model's outputs.
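The paragraph above recommends tuning `temperature`, `top_p`, and `top_k`. The pure-Python sketch below shows what those three knobs do to a next-token distribution, using the standard temperature scaling, top-k truncation, and nucleus (top-p) filtering. It illustrates the general technique only and is not the code path inside `transformers` `generate`. The toy logits and the `filter_distribution` name are made up for the example.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p filtering.

    Lower temperature sharpens the distribution, top_k keeps only the k most
    likely tokens (0 disables it), and top_p keeps the smallest set of top
    tokens whose cumulative probability reaches top_p.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    ranked = sorted(
        ((i, e / total) for i, e in enumerate(exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # Top-k: keep the k highest-probability tokens.
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep tokens until the cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, prob in ranked:
        kept.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so their probabilities sum to 1.
    mass = sum(prob for _, prob in kept)
    return {token_id: prob / mass for token_id, prob in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]
print(filter_distribution(toy_logits))                              # all 4 tokens survive
print(filter_distribution(toy_logits, top_k=2))                     # only tokens 0 and 1
print(filter_distribution(toy_logits, temperature=0.5, top_p=0.8))  # {0: 1.0}
```

The last call shows why lowering `temperature` and `top_p` together makes output more consistent. The sharpened distribution puts more than 80% of the mass on a single token, so nucleus filtering leaves only that token to sample.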
-
- <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
-
 
  # 🟢 Who can use Yi?
 
@@ -407,203 +620,6 @@ Everyone! 🙌 ✅
 
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
- # 🟢 How to use Yi?
-
- - [Quick start](#quick-start)
-
- - [Deployment](https://github.com/01-ai/Yi/blob/main/docs/deployment.md)
-
- - [Learning hub](https://github.com/01-ai/Yi/blob/main/docs/learning_hub.md)
-
- ## Quick start
-
- [1. Prepare development environment](#1-prepare-development-environment)
- <br>[2. Download the model](#2-download-the-model-optional)
- <br>[3. Examples](#3-examples)
-
- ### 1. Prepare development environment
-
- #### 1.1 Docker
- The best approach to try the **Yi** series models is through Docker with GPUs. We
- provide the following docker images to help you get started.
-
- - `registry.lingyiwanwu.com/ci/01-ai/yi:latest`
- - `ghcr.io/01-ai/yi:latest`
-
- Note that the `latest` tag always points to the latest code in the `main`
- branch. To test a stable version, please replace it with a specific
- [tag](https://github.com/01-ai/Yi/tags).
-
- #### 1.2 Local development environment
- We use [`conda-lock`](https://github.com/conda/conda-lock) to generate fully reproducible lock files for conda environments. You can refer to [conda-lock.yml](./conda-lock.yml) for the exact versions of the dependencies. Additionally, we utilize [`micromamba`](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) for installing these dependencies.
-
- To install the dependencies, please follow these steps:
- 1. Install `micromamba` by following the instructions available [here](https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html).
- 2. Execute `micromamba install -y -n yi -f conda-lock.yml` to create a conda environment named `yi` and install the necessary dependencies.
-
- ### 2. Download the model (optional)
-
- By default, the model weights and tokenizer will be downloaded from
- [Hugging Face](https://huggingface.co/01-ai) automatically in the next step. You
- can also download them manually from the following places:
-
- - [ModelScope](https://www.modelscope.cn/organization/01ai/)
- - [WiseModel](https://wisemodel.cn/organization/01.AI)
-
- ### 3. Examples
-
- #### 3.1 Use the chat model
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_path = '01-ai/Yi-34b-Chat'
-
- tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
-
- # Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
- model = AutoModelForCausalLM.from_pretrained(
-     model_path,
-     device_map="auto",
-     torch_dtype='auto'
- ).eval()
-
- # Prompt content: "hi"
- messages = [
-     {"role": "user", "content": "hi"}
- ]
-
- input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
- output_ids = model.generate(input_ids.to('cuda'))
- response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
-
- # Model response: "Hello! How can I assist you today?"
- print(response)
- ```
-
-
- To construct the prompt template manually, you can refer the `chat_template` field in the `tokenizer_config.json` [file](https://huggingface.co/01-ai/Yi-34B-Chat/blob/main/tokenizer_config.json#L60).
-
- ```
- <|im_start|>system
- {system_message}<|im_end|>
- <|im_start|>user
- {prompt}<|im_end|>
- <|im_start|>assistant
- ```
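To make the template above concrete, the sketch below fills it in for a list of role/content messages, the same shape passed to `apply_chat_template` in the chat example above. The `build_chat_prompt` name is hypothetical. The sketch assumes the layout shown here: one `<|im_start|>{role}\n{content}<|im_end|>` block per message, ending with an open assistant turn. The trailing newline after `assistant` is also an assumption. The authoritative definition is the `chat_template` field in `tokenizer_config.json`, which may differ in such details.

```python
def build_chat_prompt(messages, add_generation_prompt=True):
    """Render role/content messages in the <|im_start|> ... <|im_end|> layout
    shown above. Hypothetical helper; tokenizer_config.json is authoritative."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chat_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hi"},
])
print(prompt)
```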
-
- #### 3.2 Use the base model
-
- ```bash
- python demo/text_generation.py
- ```
-
- To reuse the downloaded models in the previous step, you can provide the extra
- `--model` argument:
-
- ```bash
- python demo/text_generation.py --model /path/to/model
- ```
-
- Or if you'd like to get your hands dirty:
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34B", device_map="auto", torch_dtype="auto")
- tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34B")
- inputs = tokenizer("There's a place where time stands still. A place of breath taking wonder, but also", return_tensors="pt")
- max_length = 256
-
- outputs = model.generate(
-     inputs.input_ids.cuda(),
-     max_length=max_length,
-     eos_token_id=tokenizer.eos_token_id,
-     do_sample=True,
-     repetition_penalty=1.3,
-     no_repeat_ngram_size=5,
-     temperature=0.7,
-     top_k=40,
-     top_p=0.8,
- )
- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- ```
-
- <details>
-
- <summary>Output</summary>
-
- **Prompt**: There's a place where time stands still. A place of breath taking wonder, but also
-
- **Generation**: There's a place where time stands still. A place of breath taking wonder, but also of great danger. A place where the very air you breathe could kill you. A place where the only way to survive is to be prepared.
- The place is called the Arctic.
- The Arctic is a vast, frozen wilderness. It is a place of extremes. The temperatures can drop to -40 degrees Celsius. The winds can reach speeds of 100 kilometers per hour. The sun can shine for 24 hours a day, or not at all for weeks on end.
- The Arctic is also a place of great beauty. The ice and snow are a pristine white. The sky is a deep blue. The sunsets are spectacular.
- But the Arctic is also a place of great danger. The ice can be treacherous. The winds can be deadly. The sun can be blinding.
- The Arctic is a place where the only way to survive is to be prepared.
- The Arctic is a place of extremes. The temperatures can drop to -40 degrees Celsius. The winds can reach speeds of 100 kilometers per hour. The sun can shine for 24 hours a day, or not at all for weeks on end.
- The Arctic is a place of great beauty. The ice and snow are a
-
- </details>
-
- For more advanced usage, please refer to the
- [doc](https://github.com/01-ai/Yi/tree/main/demo).
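The base-model `generate` call above passes `repetition_penalty=1.3` and `no_repeat_ngram_size=5`. The sketch below approximates what those two options do to the next-token scores. It uses the commonly used CTRL-style penalty: positive scores of already-generated tokens are divided by the penalty, and negative ones are multiplied by it. It also shows n-gram blocking, which bans any token that would repeat an n-gram already in the output. This is a simplified illustration with toy token IDs, not the `transformers` source, and the function names are hypothetical.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """CTRL-style penalty: make every already-generated token less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def banned_by_ngram(generated_ids, ngram_size):
    """Tokens that would complete an n-gram already present in generated_ids."""
    if len(generated_ids) < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the n-gram being built.
    prefix = tuple(generated_ids[len(generated_ids) - ngram_size + 1:])
    banned = set()
    for start in range(len(generated_ids) - ngram_size + 1):
        ngram = generated_ids[start:start + ngram_size]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


logits = [2.0, -1.0, 0.5, 3.0]  # toy scores for a 4-token vocabulary
history = [0, 1, 3, 0, 1]       # toy token IDs generated so far

scores = apply_repetition_penalty(logits, history, penalty=1.3)
for token_id in banned_by_ngram(history, ngram_size=3):
    scores[token_id] = float("-inf")  # token 3 would repeat the 3-gram (0, 1, 3)
print(scores)
```

Token 2 has not been generated yet, so its score is untouched. Tokens 0 and 1 are pushed down by the penalty, and token 3 is removed entirely by the n-gram rule.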
-
- #### 3.3 Finetune from the base model
-
- ```bash
- bash finetune/scripts/run_sft_Yi_6b.sh
- ```
-
- Once finished, you can compare the finetuned model and the base model with the following command:
-
- ```bash
- bash finetune/scripts/run_eval.sh
- ```
-
- For more advanced usage like fine-tuning based on your custom data, please refer
- the [doc](https://github.com/01-ai/Yi/tree/main/finetune).
-
- #### 3.4 Quantization
-
- ##### GPT-Q
- ```bash
- python quantization/gptq/quant_autogptq.py \
-   --model /base_model \
-   --output_dir /quantized_model \
-   --trust_remote_code
- ```
-
- Once finished, you can then evaluate the resulting model as follows:
-
- ```bash
- python quantization/gptq/eval_quantized_model.py \
-   --model /quantized_model \
-   --trust_remote_code
- ```
-
- For a more detailed explanation, please read the [doc](https://github.com/01-ai/Yi/tree/main/quantization/gptq)
-
- ##### AWQ
- ```bash
- python quantization/awq/quant_autoawq.py \
-   --model /base_model \
-   --output_dir /quantized_model \
-   --trust_remote_code
- ```
-
- Once finished, you can then evaluate the resulting model as follows:
-
- ```bash
- python quantization/awq/eval_quantized_model.py \
-   --model /quantized_model \
-   --trust_remote_code
- ```
-
- For more detailed explanation, please read the [doc](https://github.com/01-ai/Yi/tree/main/quantization/awq)
-
- <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
-
  # 🟢 Misc.
 
  ### Acknowledgments
@@ -661,7 +677,6 @@ as well as any associated data security concerns.
 
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
-
  ### 🪪 License
 
  The source code in this repo is licensed under the [Apache 2.0
@@ -670,4 +685,4 @@ are fully open for academic research and free commercial usage with permission
  via applications. All usage must adhere to the [Yi Series Models Community License Agreement 2.1](https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt).
  For free commercial use, you only need to send an email to [get official commercial permission](https://www.lingyiwanwu.com/yi-license).
 
- <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
  - [Base models](#base-models)
  - [Other info](#other-info)
  - [🎉 News](#-news)
+ - [🟢 How to use Yi?](#-how-to-use-yi)
+ - [Quick start](#quick-start)
+ - [Choose your path](#choose-your-path)
+ - [Tutorial](#tutorial)
+ - [Fine-tune](#fine-tune)
+ - [Quantization](#quantization)
+ - [Deployment](https://github.com/01-ai/Yi/blob/main/docs/deployment.md)
+ - [Learning hub](https://github.com/01-ai/Yi/blob/main/docs/learning_hub.md)
  - [🟢 Why Yi?](#-why-yi)
  - [🌎 Ecosystem](#-ecosystem)
  - [💦 Upstream](#-upstream)
 
  - [🔗 Serving](#-serving)
  - [⚙️ Quantitation](#️-quantitation)
  - [🛠️ Fine-tuning](#️-fine-tuning)
+ - [API](#api)
  - [📌 Benchmarks](#-benchmarks)
  - [📊 Base model performance](#-base-model-performance)
  - [📊 Chat model performance](#-chat-model-performance)
  - [📊 Quantized chat model performance](#-quantized-chat-model-performance)
  - [🟢 Who can use Yi?](#-who-can-use-yi)
  - [🟢 Misc.](#-misc)
  - [Acknowledgments](#acknowledgments)
  - [📡 Disclaimer](#-disclaimer)
 
 
  ## 📌 Introduction
 
+ - 🤖 The Yi series models are the next generation of open-source large language models trained from scratch by [01.AI](https://01.ai/).
 
  - 🙌 Targeted as a bilingual language model and trained on a 3T multilingual corpus, the Yi series models have become one of the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example,
 
 
 
  Yi models come in multiple sizes and cater to different use cases. You can also fine-tune Yi models to meet your specific requirements.
 
+ For detailed deployment requirements, see [hardware requirements](https://github.com/01-ai/Yi/blob/main/docs/deployment.md#hardware-requirements).
+
  ### Chat models
 
  | Model | Download
 
  Yi-34B-Chat-4bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat-4bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-Chat-4bits/summary)
  Yi-34B-Chat-8bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-Chat-8bits) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-Chat-8bits/summary)
 
+ <sub><sup> - 4-bit series models are quantized by AWQ. <br> - 8-bit series models are quantized by GPTQ. <br> - All quantized models have a low barrier to use since they can be deployed on consumer-grade GPUs (e.g., 3090, 4090). </sup></sub>
 
  ### Base models
 
 
 
  ### Other info
 
+ - For chat and base models:
 
+   - 6B series models are suitable for personal and academic use.
 
+   - 34B series models are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. They are a cost-effective option that is affordable and equipped with emergent abilities.
 
+   - The **default context window** is **4k tokens**.
+
+   - The models are pretrained on 3T tokens.
+
+   - The training data extend up to June 2023.
+
+ - For chat models:
 
+   - For detailed chat model limitations, see [limitations of chat model](https://github.com/01-ai/Yi/blob/main/docs/README_legacy.md#limitations-of-chat-model).
 
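The default context window listed above is 4k tokens, so long conversations eventually have to be trimmed before they are sent to a chat model. One common approach is to keep the system message and drop the oldest turns until the prompt fits. The sketch below implements that approach under several assumptions. "4k" is taken as 4096. The 512-token reply reserve is arbitrary. `count_tokens` is a crude word counter used as a stand-in; in practice you would count with the model's own tokenizer. All names are hypothetical.

```python
CONTEXT_WINDOW = 4096     # assumption: "4k tokens" taken as 4096
RESERVED_FOR_REPLY = 512  # assumption: room left for the generated answer


def count_tokens(text):
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_history(messages, budget=CONTEXT_WINDOW - RESERVED_FOR_REPLY):
    """Drop the oldest non-system messages until the estimated total fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    while turns and total(system + turns) > budget:
        turns.pop(0)  # discard the oldest user/assistant message first
    return system + turns


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "word " * 3000},
    {"role": "assistant", "content": "word " * 1000},
    {"role": "user", "content": "Summarize our chat."},
]
trimmed = trim_history(history)
print([m["role"] for m in trimmed])  # -> ['system', 'assistant', 'user']
```

This sketch drops single messages, so the trimmed history can start with an assistant turn. A stricter variant would drop whole user/assistant pairs instead.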
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
 
 
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
+ # 🟢 How to use Yi?
+
+ - [Quick start](#quick-start)
+ - [Choose your path](#choose-your-path)
+ - [Tutorial](#tutorial)
+ - [Fine-tune](#fine-tune)
+ - [Quantization](#quantization)
+ - [Deployment](https://github.com/01-ai/Yi/blob/main/docs/deployment.md)
+ - [Learning hub](https://github.com/01-ai/Yi/blob/main/docs/learning_hub.md)
+
+ ## Quick start
+
+ Getting up and running with Yi models is simple, and there are multiple ways to do it.
+
+ ### Choose your path
+
+ Select one of the following paths to begin your journey with Yi!
+
+ ![Quick start - Choose your path](./assets/img/quick_start_path.png)
+
+ #### 🎯 Deploy Yi locally
+
+ If you prefer to deploy Yi models locally,
+
+ - 🙋‍♀️ and you have **sufficient** resources (for example, NVIDIA A800 80GB), you can choose one of the following methods:
+   - [pip](#tutorial)
+   - [Docker](https://github.com/01-ai/Yi/blob/main/docs/README_legacy.md#11-docker)
+   - [conda-lock](https://github.com/01-ai/Yi/blob/main/docs/README_legacy.md#12-local-development-environment)
+
+ - 🙋‍♀️ and you have **limited** resources (for example, a MacBook Pro), you can use [llama.cpp](https://github.com/01-ai/Yi/blob/main/docs/yi_llama.cpp.md).
+
+ #### 🎯 Not to deploy Yi locally
+
+ If you prefer not to deploy Yi models locally, you can explore Yi's capabilities using any of the following options.
+
+ ##### 🙋‍♀️ Run Yi with APIs
+
+ If you want to explore more features of Yi, you can use one of these methods:
+
+ - Yi APIs (Yi official)
+   - [Early access has been granted](https://x.com/01AI_Yi/status/1735728934560600536?s=20) to some applicants. Stay tuned for the next round of access!
+
+ - [Yi APIs](https://replicate.com/01-ai/yi-34b-chat/api?tab=nodejs) (Replicate)
+
+ ##### 🙋‍♀️ Run Yi in playground
+
+ If you want to chat with Yi with more customizable options (e.g., system prompt, temperature, repetition penalty, etc.), you can try one of the following options:
+
+ - [Yi-34B-Chat-Playground](https://platform.lingyiwanwu.com/prompt/playground) (Yi official)
+   - Access is available through a whitelist. You can apply by filling out a form in [English](https://cn.mikecrm.com/l91ODJf) or [Chinese](https://cn.mikecrm.com/gnEZjiQ).
+
+ - [Yi-34B-Chat-Playground](https://replicate.com/01-ai/yi-34b-chat) (Replicate)
+
+ ##### 🙋‍♀️ Chat with Yi
+
+ If you want to chat with Yi, you can use one of these online services, which offer a similar user experience:
+
+ - [Yi-34B-Chat](https://huggingface.co/spaces/01-ai/Yi-34B-Chat) (Yi official on Hugging Face)
+   - No registration is required.
+
+ - [Yi-34B-Chat](https://platform.lingyiwanwu.com/) (Yi official beta)
+   - Access is available through a whitelist. You can apply by filling out a form in [English](https://cn.mikecrm.com/l91ODJf) or [Chinese](https://cn.mikecrm.com/gnEZjiQ).
+
+ ## Tutorial
+
+ This tutorial guides you through every step of running Yi (Yi-34B-Chat) locally and then performing inference.
+
+ ### Step 0: Prerequisites
+
+ - This tutorial assumes you are running **Yi-34B-Chat** on an **A800 (80GB)** GPU.
+ - For detailed deployment requirements to run Yi models, see [hardware requirements](https://github.com/01-ai/Yi/blob/main/docs/deployment.md).
+
+ - Make sure Python 3.10 or later is installed.
+
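Step 0 above requires Python 3.10 or later. The sketch below is a quick way to confirm this from the interpreter you plan to use for the tutorial. The `check_python` helper is illustrative and not part of the Yi repo.

```python
import sys

MINIMUM_PYTHON = (3, 10)  # from the Step 0 prerequisite above


def check_python(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the given interpreter version meets the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if check_python():
        print(f"OK: Python {sys.version.split()[0]}")
    else:
        print(f"Python {'.'.join(map(str, MINIMUM_PYTHON))} or later is required")
```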
+ ### Step 1: Prepare environment
+
+ To set up the environment and install the required packages, execute the following commands.
+
+ ```bash
+ git clone https://github.com/01-ai/Yi.git
+ cd Yi
+ pip install -r requirements.txt
+ ```
+
+ ### Step 2: Download Yi model
+
+ You can download the weights and tokenizer of Yi models from the following sources:
+
+ - [Hugging Face](https://huggingface.co/01-ai)
+ - [ModelScope](https://www.modelscope.cn/organization/01ai/)
+ - [WiseModel](https://wisemodel.cn/organization/01.AI)
+
+ ### Step 3: Perform inference
+
+ You can perform inference with Yi chat or base models as follows.
+
+ #### Perform inference with Yi chat model
+
+ 1. Create a file named `quick_start.py` and copy the following content into it, replacing `<your-model-path>` with the local path of the downloaded model (or a Hugging Face model ID such as `01-ai/Yi-34B-Chat`).
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_path = '<your-model-path>'
+
+ tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
+
+ # Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path,
+     device_map="auto",
+     torch_dtype='auto'
+ ).eval()
+
+ # Prompt content: "hi"
+ messages = [
+     {"role": "user", "content": "hi"}
+ ]
+
+ input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
+ # Move the prompt to the model's device instead of hard-coding 'cuda'.
+ output_ids = model.generate(input_ids.to(model.device))
+ response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
+
+ # Model response: "Hello! How can I assist you today?"
+ print(response)
+ ```
+
+ 2. Run `quick_start.py`.
+
+ ```bash
+ python quick_start.py
+ ```
+
+ You should then see output similar to the following. 🥳
+
+ ```
+ Hello! How can I assist you today?
+ ```
+
+ #### Perform inference with Yi base model
+
+ The steps are similar to [Perform inference with Yi chat model](#perform-inference-with-yi-chat-model).
+
+ You can use the existing file [`text_generation.py`](https://github.com/01-ai/Yi/tree/main/demo).
+
+ ```bash
+ python demo/text_generation.py --model <your-model-path>
+ ```
+
+ You should then see output similar to the following. 🥳
+
+ <details>
+
+ <summary>Output</summary>
+
+ <br>
+
+ **Prompt**: Let me tell you an interesting story about cat Tom and mouse Jerry,
+
+ **Generation**: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn’t get up because there were too many people around him! He kept trying for several minutes before finally giving up...
+
+ </details>
+
+ ### Fine-tune
+
+ ```bash
+ bash finetune/scripts/run_sft_Yi_6b.sh
+ ```
+
+ Once finished, you can compare the fine-tuned model and the base model with the following command:
+
+ ```bash
+ bash finetune/scripts/run_eval.sh
+ ```
+
+ For advanced usage (like fine-tuning based on your custom data), see [fine-tune code for Yi 6B and 34B](https://github.com/01-ai/Yi/tree/main/finetune).
+
+ ### Quantization
+
+ #### GPT-Q
+ ```bash
+ python quantization/gptq/quant_autogptq.py \
+   --model /base_model \
+   --output_dir /quantized_model \
+   --trust_remote_code
+ ```
+
+ Once finished, you can then evaluate the resulting model as follows:
+
+ ```bash
+ python quantization/gptq/eval_quantized_model.py \
+   --model /quantized_model \
+   --trust_remote_code
+ ```
+
+ For detailed explanations, see [GPT-Q quantization](https://github.com/01-ai/Yi/tree/main/quantization/gptq).
+
+ #### AWQ
+ ```bash
+ python quantization/awq/quant_autoawq.py \
+   --model /base_model \
+   --output_dir /quantized_model \
+   --trust_remote_code
+ ```
+
+ Once finished, you can then evaluate the resulting model as follows:
+
+ ```bash
+ python quantization/awq/eval_quantized_model.py \
+   --model /quantized_model \
+   --trust_remote_code
+ ```
+
+ For detailed explanations, see [AWQ quantization](https://github.com/01-ai/Yi/tree/main/quantization/awq).
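GPT-Q and AWQ both store weights as low-bit integer codes. Each code comes with a scale (and typically a zero point) shared by a group of weights. The sketch below shows only that shared storage idea: plain round-to-nearest (RTN) asymmetric 4-bit quantization with one scale and zero point per group. It is deliberately *not* GPT-Q, which compensates rounding error using second-order information. It is also not AWQ, which rescales salient channels based on activation statistics. The group size of 4 and the sample weights are arbitrary choices for the demo (real configurations commonly use groups of 128).

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of floats."""
    levels = 2 ** bits - 1                   # 15, so 4-bit codes run from 0 to 15
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0  # avoid dividing by zero for flat groups
    zero_point = round(-w_min / scale)
    codes = [min(levels, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate float weights."""
    return [(q - zero_point) * scale for q in codes]


def quantize(weights, group_size=4, bits=4):
    """Split weights into groups, each with its own scale and zero point."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


weights = [0.12, -0.50, 0.33, 0.05, 1.10, 0.90, -0.20, 0.40]
groups = quantize(weights)
restored = [w for codes, s, z in groups for w in dequantize_group(codes, s, z)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(groups[0][0])  # -> [11, 0, 15, 10]
print(max_error)
```

Per-group scales keep the rounding error bounded by half of each group's own step size. A single outlier weight therefore only coarsens its own group instead of the whole tensor.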
+
+ <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
  # 🟢 Why Yi?
+
+ - [🌎 Ecosystem](#-ecosystem)
+ - [💦 Upstream](#-upstream)
+ - [🌊 Downstream](#-downstream)
+ - [🔗 Serving](#-serving)
+ - [⚙️ Quantitation](#️-quantitation)
+ - [🛠️ Fine-tuning](#️-fine-tuning)
+ - [API](#api)
+ - [📌 Benchmarks](#-benchmarks)
+ - [📊 Base model performance](#-base-model-performance)
+ - [📊 Chat model performance](#-chat-model-performance)
+ - [📊 Quantized chat model performance](#-quantized-chat-model-performance)
 
  ## 🌎 Ecosystem
 
 
 
  If you want to get up and running with Yi in a few minutes, you can use the following services built upon Yi.
 
+ - Yi-34B-Chat: you can chat with Yi using one of the following platforms:
+   - [Yi-34B-Chat | Hugging Face](https://huggingface.co/spaces/01-ai/Yi-34B-Chat)
+   - [Yi-34B-Chat | Yi Platform](https://platform.lingyiwanwu.com/): **Note** that it is currently available through a whitelist. You can apply by filling out a form in [English](https://cn.mikecrm.com/l91ODJf) or [Chinese](https://cn.mikecrm.com/gnEZjiQ) and experience it firsthand!
 
  - [Yi-6B-Chat (Replicate)](https://replicate.com/01-ai): you can use this model with more options by setting additional parameters and calling APIs.
 
 
 
  If you have limited computational resources, you can use Yi's quantized models as follows.
 
+ These quantized models have reduced precision but offer increased efficiency, such as faster inference and lower memory usage.
 
  - [TheBloke/Yi-34B-GPTQ](https://huggingface.co/TheBloke/Yi-34B-GPTQ)
  - [TheBloke/Yi-34B-GGUF](https://huggingface.co/TheBloke/Yi-34B-GGUF)
 
  - [📊 Base model performance](#-base-model-performance)
  - [📊 Chat model performance](#-chat-model-performance)
  - [📊 Quantized chat model performance](#-quantized-chat-model-performance)
 
  ### 📊 Base model performance
 
 
  | Yi-34B-Chat-8bits(GPTQ) | 66.24 | **73.69** | 79.05 | 81.23 | 76.82 | 78.97 | 61.84 | **52.08** | 70.97 | 70.74 | 75.74 |
  | Yi-34B-Chat-4bits(AWQ) | 65.77 | 72.42 | 78.21 | 80.50 | 75.71 | 77.27 | 61.84 | 48.30 | 69.39 | 70.51 | 74.00 |
 
+ We evaluated the models on various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA. Generally, the zero-shot approach is more common for chat models. Our evaluation strategy involves generating responses while following instructions explicitly or implicitly (such as using few-shot examples). We then isolate relevant answers from the generated text. Some models are not well suited to producing output in the specific format required by the instructions of a few datasets, which leads to suboptimal results.
 
  <strong>*</strong>: C-Eval results are evaluated on the validation datasets.
 
  ### 📊 Quantized chat model performance
 
+ We also provide both 4-bit (AWQ) and 8-bit (GPTQ) quantized Yi chat models. Evaluation results on various benchmarks have shown that the quantized models have **negligible** losses. Additionally, they reduce the memory footprint.
 
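To see roughly where the memory savings come from, the back-of-envelope sketch below multiplies parameter count by bits per weight. It counts weights only. KV cache, activations, framework overhead, and the per-group scales that quantized formats also store are all left out. It uses nominal sizes of 34 and 6 billion parameters as assumptions. The results are therefore lower bounds, not the requirements given in the deployment docs.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / GIB


# Nominal parameter counts (assumption; the real models are slightly larger).
for name, params in [("34B", 34e9), ("6B", 6e9)]:
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{name} {label}: ~{weight_memory_gib(params, bits):.1f} GiB")
```

Each halving of bits per weight halves the weight memory. This is the main reason the 4-bit chat models fit on consumer-grade GPUs while the 16-bit 34B model does not.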
  # 🟢 Who can use Yi?
 
 
 
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
  # 🟢 Misc.
 
  ### Acknowledgments
 
 
  <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>
 
  ### 🪪 License
 
  The source code in this repo is licensed under the [Apache 2.0
 
  via applications. All usage must adhere to the [Yi Series Models Community License Agreement 2.1](https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMENT.txt).
  For free commercial use, you only need to send an email to [get official commercial permission](https://www.lingyiwanwu.com/yi-license).
 
+ <div align="right"> [ <a href="#building-the-next-generation-of-open-source-and-bilingual-llms">Back to top ⬆️ </a> ] </div>