update Readme
README.md
CHANGED
@@ -15,24 +15,37 @@ metrics:
- accuracy
---

# Model Card for UltraLink-LM

## Model Summary
> The UltraLink-LM is a massively multilingual generative language model that follows instructions in 5 languages: English, French, Russian, Spanish, and Chinese.
> UltraLink-LM outperforms [PolyLM-Chat-13b](https://huggingface.co/DAMO-NLP-MT/polylm-chat-13b), [Guanaco](https://huggingface.co/JosephusCheung/Guanaco), and [Bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt) in code, math and chat abilities in four languages, and has a high-quality and diverse text generation performance in all languages.
> The UltraLink-LM is trained using [UltraLink](https://huggingface.co/datasets/R0k1e/UltraLink), [UltraChat](https://huggingface.co/datasets/stingning/ultrachat), [Magicoder-Evol](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K), [Magicoder-OSS](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K), and ShareGPT.
> We release the checkpoints under an MIT license to further our mission of multilingual technologies empowering a multilingual world.

- **Developed by:** [
- **Model type:** a Transformer-style autoregressive massively multilingual language model.
- **Paper**: [UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset](https://arxiv.org/abs/2402.04588)
- **Languages**: Refer to the list of languages in the `language` section of this model card.
- **License**: MIT
- **Model**: [UltraLink-LM](https://huggingface.co/R0k1e/UltraLink-LM)
- **Model Size**: 13 billion parameters
- **Datasets**: [UltraLink](https://huggingface.co/datasets/R0k1e/UltraLink), [UltraChat](https://huggingface.co/datasets/stingning/ultrachat), [Magicoder-Evol](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K), [Magicoder-OSS](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K), and ShareGPT.

## Use
@@ -45,111 +58,128 @@ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
ultralink_lm = AutoModelForCausalLM.from_pretrained(checkpoint)

# Chat abilities in Chinese
chat_outputs = ultralink_lm.generate(chat_inputs, max_new_tokens=512)
# Expected output:
"""
"""
# Translations in English:
"""
"""

# Code abilities in Russian
# Please implement a bubble sort algorithm in Python.
code_inputs = tokenizer.encode("Реализуйте алгоритм пузырьковой сортировки на Python.", return_tensors="pt")
code_outputs = ultralink_lm.generate(code_inputs, max_new_tokens=512)
print(tokenizer.decode(code_outputs[0]))
# Expected output:
"""
```python
def bubbleSort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]

arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Отсортированный массив:", arr)
\```

"""
# Translations in English:
"""
```python
def bubbleSort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]

arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Sorted array:", arr)
\```

Note
"""

# Math abilities in French
# When the length of a rectangle is twice its width, calculate the area of the rectangle if its perimeter is 18 units?
math_inputs = tokenizer.encode("Lorsque la longueur d'un rectangle est le double de sa largeur, calculer l'aire du rectangle si son périmètre est de 18 unités?", return_tensors="pt")
math_outputs = ultralink_lm.generate(math_inputs, max_new_tokens=512)
print(tokenizer.decode(math_outputs[0]))
# Expected output:
"""
Le périmètre
En divisant les deux côtés par 6, nous obtenons w = 3.
Par conséquent, la longueur du rectangle est de 2w = 2(3) = 6.
L'aire d'un rectangle est le produit de sa longueur et de sa largeur, donc l'aire est de 6 * 3 = 18.
La réponse est : 18
"""
# Translations in English:
"""
Simplifying
So the length of the rectangle is 2w = 2(3) = 6.
The area of a rectangle is the product of its length and width, so the area is 6 * 3 = 18.
The answer is: 18
"""
```
@@ -161,7 +191,7 @@ The answer is: 18
- Number of Samples seen during Finetuning: 1023K
- Batch size: 128
- Hardware: NVIDIA A100 80GB PCIe
- Software: BMTrain

### Data Sources

@@ -171,12 +201,17 @@ The UltraLink-LM is trained on the following datasets:
- [UltraChat](https://huggingface.co/datasets/stingning/ultrachat)
- [Magicoder-Evol](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K)
- [Magicoder-OSS](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K)

All the datasets are integrated into the UltraLink dataset.

## Evaluation

### Multilingual HumanEval

[HumanEval](https://github.com/openai/human-eval) is a well-known benchmark for evaluating the code ability of LLMs. It executes the code snippets generated by the model and evaluates their correctness. Since there is no existing multilingual test set for code generation, we use GPT-3.5 with carefully designed prompts to translate HumanEval into other languages.
@@ -191,7 +226,7 @@ All the datasets are integrated into the UltraLink dataset.
|Okapi-7b | 12.2 | 11.0 | 8.5 | 8.5 | 8.5 | 9.8 |
|Guanaco-7b | 9.2 | 6.7 | 11.0 | 9.8 | 12.8 | 9.9 |
|Guanaco-13b| 18.3 | 15.9 | 9.8 | 8.5 | 14.6 | 12.2 |
|UltraLink-LM |

### MGSM
@@ -207,7 +242,7 @@ We employ [MGSM](https://github.com/google-research/url-nlp/tree/main/mgsm) to e
|Okapi-7b | 4.0 | 2.4 | 3.6 | 4.4 | 4.8 | 3.8 |
|Guanaco-7b | 4.0 | 1.6 | 3.2 | 2.8 | 4.4 | 3.0 |
|Guanaco-13b | 13.6 | 10.8 | 11.2 | 6.4 | 5.2 | 8.4 |
|UltraLink-LM|

### OMGEval
We use [OMGEval](https://github.com/blcuicall/OMGEval) to evaluate chat ability; it is a multilingual version of the widely used English benchmark AlpacaEval.
@@ -221,11 +256,13 @@ We use the [OMGEval](https://github.com/blcuicall/OMGEval) to evaluate the chat
|Chimera-inst-chat-13b | 15.5 | 9.7 | 11.8 | 13.7 | 13.8 | 12.9 |
|Okapi-7b | 8.8 | 6.2 | 5.0 | 12.1 | 8.7 | 8.2 |
|Guanaco-7b | 4.6 | 3.8 | 0.4 | 1.8 | 1.2 | 2.4 |
|Guanaco-13b |
|UltraLink-LM | 28.8 |

## Citation

```bibtex
@misc{wang2024ultralink,
title={UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset},
- accuracy
---

<div align="center">

<img src="title.png" alt="UltraLink" width="200">

**multi-lingual, knowledge-grounded, multi-round dialogue dataset and model**

<p align="center">
<a href="#Introduction"> Introduction </a> •
<a href="#Construction-of-UltraLink">Construction Process</a> •
<a href="https://arxiv.org/abs/2402.04588">Paper</a> •
<a href="https://huggingface.co/datasets/R0k1e/UltraLink"> UltraLink</a> •
<a href="https://github.com/OpenBMB/UltraLink"> Github</a>
</p>
</div>

# Model Card for UltraLink-LM

## Model Summary
> The UltraLink-LM is a massively multilingual generative language model that follows instructions in 5 languages: English, French, Russian, Spanish, and Chinese. It generates high-quality, diverse text in all 5 languages.
> UltraLink-LM outperforms [PolyLM-Chat-13b](https://huggingface.co/DAMO-NLP-MT/polylm-chat-13b), [Guanaco](https://huggingface.co/JosephusCheung/Guanaco), and [Bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt) in code, math and chat abilities in four languages, and has a high-quality and diverse text generation performance in all languages.
> The UltraLink-LM is trained using [UltraLink](https://huggingface.co/datasets/R0k1e/UltraLink), [UltraChat](https://huggingface.co/datasets/stingning/ultrachat), [Magicoder-Evol](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K), [Magicoder-OSS](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K), [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA), and [ShareGPT](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/).
> We release the checkpoints under an MIT license to further our mission of multilingual technologies empowering a multilingual world.

- **Developed by:** [OpenBMB](https://www.openbmb.cn/home)
- **Model type:** a Transformer-style autoregressive massively multilingual language model.
- **Paper**: [UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset](https://arxiv.org/abs/2402.04588)
- **Languages**: Refer to the list of languages in the `language` section of this model card.
- **License**: MIT
- **Model**: [UltraLink-LM](https://huggingface.co/R0k1e/UltraLink-LM)
- **Model Size**: 13 billion parameters
- **Datasets**: [UltraLink](https://huggingface.co/datasets/R0k1e/UltraLink), [UltraChat](https://huggingface.co/datasets/stingning/ultrachat) (10k randomly sampled examples), [Magicoder-Evol](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K), [Magicoder-OSS](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K), [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA), and [ShareGPT](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/) (the English portion, keeping only samples longer than 4k).

## Use
ultralink_lm = AutoModelForCausalLM.from_pretrained(checkpoint)

# Chat abilities in Chinese
# What is heavy cavalry?
first_question = "<s>[INST] 什么是重骑兵? [/INST]"
chat_inputs = tokenizer.encode(first_question, add_special_tokens=False, return_tensors="pt")
chat_outputs = ultralink_lm.generate(chat_inputs, max_new_tokens=512)
first_answer = tokenizer.decode(chat_outputs[0])
print(first_answer)
# Expected output:
"""
<s> [INST] 什么是重骑兵? [/INST] 重骑兵是一种历史上的战斗单位,通常由骑兵组成,他们在战斗中使用重型装甲和长矛。他们以在战场上的强大攻击能力而闻名,并且通常被用于突破敌军阵线或攻击敌方骑兵。重骑兵通常被认为是中世纪战争中最强大和最具威慑力的单位之一。</s>
"""
# Translations in English:
"""
<s> [INST] What is heavy cavalry? [/INST] The Heavy Cavalry is a historical fighting unit, usually consisting of cavalrymen who use heavy armour and spears in battle. They were known for their powerful attacks on the battlefield and were often used to break through enemy lines or attack enemy cavalry. Heavy cavalry is often considered one of the most powerful and intimidating units in medieval warfare.</s>
"""

# Second turn:
second_question = "<s>[INST] 重骑兵对中世纪的战场有哪些影响? [/INST]"
second_question = first_answer + second_question
chat_inputs = tokenizer.encode(second_question, add_special_tokens=False, return_tensors="pt")
chat_outputs = ultralink_lm.generate(chat_inputs, max_new_tokens=512)
second_answer = tokenizer.decode(chat_outputs[0])
print(second_answer)
# Expected output:
"""
<s> [INST] 什么是重骑兵? [/INST] 重骑兵是一种历史上的战斗单位,通常由骑兵组成,他们在战斗中使用重型装甲和长矛。他们以在战场上的强大攻击能力而闻名,并且通常被用于突破敌军阵线或攻击敌方骑兵。重骑兵通常被认为是中世纪战争中最强大和最具威慑力的单位之一。</s><s> [INST] 重骑兵对中世纪的战场有哪些影响? [/INST] 重骑兵在中世纪的战场上起到了重要的作用。他们的强大攻击能力使他们成为战斗中的强大力量,并且他们的存在常常能够改变战斗的结果。重骑兵通常被用于突破敌军阵线,并在战斗中创造突破口,这使得其他部队能够进入敌方阵地。他们还被用于攻击敌方骑兵,并且他们的重型装甲和长矛使他们在这种情况下具有优势。总的来说,重骑兵的存在使得中世纪的战场更加复杂和不可预测,他们的存在对战斗的结果产生了重大影响。</s>
"""
# Translations in English:
"""
<s> [INST] What is heavy cavalry? [/INST] The Heavy Cavalry is a historical fighting unit, usually consisting of cavalrymen who use heavy armour and spears in battle. They were known for their powerful attacks on the battlefield and were often used to break through enemy lines or attack enemy cavalry. Heavy cavalry is often considered one of the most powerful and intimidating units in medieval warfare.</s><s>[INST] What impact did heavy cavalry have on the medieval battlefield? [/INST] Heavy cavalry played an important role on the battlefields of the Middle Ages. Their powerful attacks made them a formidable force in battle and their presence could often change the outcome of a battle. Heavy cavalry were often used to break through enemy lines and create breaches in battle, which allowed other troops to move into enemy positions. They were also used to attack enemy cavalry, and their heavy armour and spears gave them an advantage in such situations. Overall, the presence of heavy cavalry made the medieval battlefield more complex and unpredictable, and their presence had a major impact on the outcome of battles.</s>
"""

# Code abilities in Russian
# Please implement a bubble sort algorithm in Python.
code_inputs = tokenizer.encode("<s>[INST] Реализуйте алгоритм пузырьковой сортировки на Python. [/INST]", add_special_tokens=False, return_tensors="pt")
code_outputs = ultralink_lm.generate(code_inputs, max_new_tokens=512)
print(tokenizer.decode(code_outputs[0]))
# Expected output:
"""
<s> [INST] Реализуйте алгоритм пузырьковой сортировки на Python. [/INST] Конечно, вот простая реализация алгоритма пузырьковой сортировки на Python:

```python
def bubbleSort(arr):
    n = len(arr)

    # Проходимся по всем элементам массива
    for i in range(n):
        # Последние i элементов уже отсортированы
        for j in range(0, n-i-1):
            # Проходимся по массиву, пока не найдем элемент, больший, чем следующий
            if arr[j] > arr[j+1]:
                # Обмениваем элементы местами
                arr[j], arr[j+1] = arr[j+1], arr[j]

# Проверяем функцию
arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Отсортированный массив:", arr)
\```

Эта реализация работает путем повторного прохождения по массиву и сравнения соседних элементов. Если элемент сравнивается больше, чем следующий элемент, они меняются местами. Этот процесс повторяется до тех пор, пока массив не будет отсортирован.

Вы можете изменить размер массива и его начальное состояние, чтобы увидеть, как функция работает с разными входными данными.

Примечание: Алгоритм пузырьковой сортировки не является эффективным для больших наборов данных, так как его средняя и худшая временная сложность составляют O(n^2), где n - количество элементов в массиве.</s>
"""
# Translations in English:
"""
<s> [INST] Implement the bubble sorting algorithm in Python. [/INST] Sure enough, here's a simple implementation of the bubble sort algorithm in Python:

```python
def bubbleSort(arr):
    n = len(arr)

    # Go through all elements of the array
    for i in range(n):
        # The last i elements are already sorted
        for j in range(0, n-i-1):
            # We traverse the array until we find an element greater than the next one
            if arr[j] > arr[j+1]:
                # Swapping elements
                arr[j], arr[j+1] = arr[j+1], arr[j]

# Check the function
arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Sorted array:", arr)
\```

This implementation works by repeatedly traversing the array and comparing neighbouring elements. If an element is compared more than the next element, they are swapped. This process is repeated until the array is sorted.

You can change the array size and initial state to see how the function works with different input data.

Note: The bubble sort algorithm is not efficient for large datasets because its average and worst-case time complexity are O(n^2), where n is the number of elements in the array.</s>
"""

# Math abilities in French
# When the length of a rectangle is twice its width, calculate the area of the rectangle if its perimeter is 18 units?
math_inputs = tokenizer.encode("<s>[INST] Lorsque la longueur d'un rectangle est le double de sa largeur, calculer l'aire du rectangle si son périmètre est de 18 unités? [/INST]", add_special_tokens=False, return_tensors="pt")
math_outputs = ultralink_lm.generate(math_inputs, max_new_tokens=512)
print(tokenizer.decode(math_outputs[0]))
# Expected output:
"""
<s> [INST] Lorsque la longueur d'un rectangle est le double de sa largeur, calculer l'aire du rectangle si son périmètre est de 18 unités? [/INST]
Soit la largeur du rectangle $w$. Alors la longueur du rectangle est $2w$.
Le périmètre du rectangle est $2(w+2w)=18$.
En simplifiant, nous avons $6w=18$, donc $w=3$.
L'aire du rectangle est $w \cdot (2w) = 3 \cdot 6 = \boxed{18}$ unités carrées.
La réponse est : 18</s>
"""
# Translations in English:
"""
<s> [INST] When the length of a rectangle is twice its width, calculate the area of the rectangle if its perimeter is 18 units? [/INST]
Let $w$ be the width of the rectangle. Then the length of the rectangle is $2w$.
The perimeter of the rectangle is $2(w+2w)=18$.
Simplifying, we have $6w=18$, so $w=3$.
The area of the rectangle is $w \cdot (2w) = 3 \cdot 6 = \boxed{18}$ square units.
The answer is: 18</s>
"""
```
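The chat example above builds the `<s>[INST] ... [/INST]` prompt by hand and re-feeds the decoded answer for the second turn. As a convenience, here is a minimal sketch of a helper that assembles such a multi-turn prompt; the helper name and the exact spacing around the tags are assumptions inferred from the expected outputs above, not an official chat template:

```python
# Hypothetical helper mirroring the prompt format shown in the examples above.
def build_prompt(turns):
    """turns: list of (user_message, assistant_reply_or_None) pairs, oldest first."""
    prompt = ""
    for user_msg, assistant_msg in turns:
        prompt += f"<s>[INST] {user_msg} [/INST]"
        if assistant_msg is not None:
            prompt += f" {assistant_msg}</s>"
    return prompt

# Single turn:
prompt = build_prompt([("什么是重骑兵?", None)])
# Multi-turn, where first_reply_text is only the assistant's reply text (an assumption):
# prompt = build_prompt([("什么是重骑兵?", first_reply_text), ("重骑兵对中世纪的战场有哪些影响?", None)])
```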
- Number of Samples seen during Finetuning: 1023K
- Batch size: 128
- Hardware: NVIDIA A100 80GB PCIe
- Software: [BMTrain](https://github.com/OpenBMB/BMTrain)

### Data Sources

- [UltraChat](https://huggingface.co/datasets/stingning/ultrachat)
- [Magicoder-Evol](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K)
- [Magicoder-OSS](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K)
- [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)
- [ShareGPT](https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/)

We randomly select 10k samples from the UltraChat dataset for training, and ShareGPT is filtered to keep only its English portion, retaining samples longer than 4k. The other datasets are used as auxiliary training data.
All the datasets are integrated into the UltraLink dataset.

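For illustration only, a rough sketch of that subsampling and filtering with the `datasets` library; the split names, field access, and the interpretation of "longer than 4k" (characters rather than tokens) are assumptions, not the project's actual preprocessing script:

```python
from datasets import load_dataset

# Keep 10k random UltraChat dialogues (split name assumed for this sketch).
ultrachat = load_dataset("stingning/ultrachat", split="train")
ultrachat_10k = ultrachat.shuffle(seed=42).select(range(10_000))

# Keep only long ShareGPT samples; "4k" is treated as characters here (assumption).
# A real English-only filter would additionally inspect a language field, whose name
# depends on the dataset schema and is therefore omitted.
sharegpt = load_dataset("openchat/openchat_sharegpt4_dataset", split="train")
sharegpt_long = sharegpt.filter(lambda ex: len(str(ex)) > 4000)
```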
## Evaluation

We report three evaluations in this section: multilingual HumanEval, MGSM, and OMGEval.
Evaluations of modern LLMs may be biased and affected by many factors; we are actively working on more comprehensive evaluation methods.

### Multilingual HumanEval

[HumanEval](https://github.com/openai/human-eval) is a well-known benchmark for evaluating the code ability of LLMs. It executes the code snippets generated by the model and evaluates their correctness. Since there is no existing multilingual test set for code generation, we use GPT-3.5 with carefully designed prompts to translate HumanEval into other languages.
|Okapi-7b | 12.2 | 11.0 | 8.5 | 8.5 | 8.5 | 9.8 |
|Guanaco-7b | 9.2 | 6.7 | 11.0 | 9.8 | 12.8 | 9.9 |
|Guanaco-13b| 18.3 | 15.9 | 9.8 | 8.5 | 14.6 | 12.2 |
|UltraLink-LM | __60.4__ | __43.9__ | __40.9__ | __49.4__ | __39.6__ | __46.8__ |

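For context, HumanEval-style scoring generates one completion per problem and then runs the benchmark's unit tests against it; the pass@1 numbers above come from that execution step. Below is a minimal, hedged sketch of the generation half with this checkpoint; the model id, prompt wording, and decoding settings are illustrative assumptions rather than the exact evaluation harness:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "R0k1e/UltraLink-LM"  # assumed model id, matching the model link above
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# One hypothetical translated HumanEval-style prompt (Russian), wrapped in the card's chat format.
problem = '<s>[INST] Завершите функцию Python:\ndef add(a, b):\n    """Верните сумму a и b."""\n [/INST]'
inputs = tokenizer.encode(problem, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
completion = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(completion)
# The evaluation harness would extract the code block from `completion`
# and execute it against HumanEval's hidden unit tests to compute pass@1.
```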
### MGSM
|Okapi-7b | 4.0 | 2.4 | 3.6 | 4.4 | 4.8 | 3.8 |
|Guanaco-7b | 4.0 | 1.6 | 3.2 | 2.8 | 4.4 | 3.0 |
|Guanaco-13b | 13.6 | 10.8 | 11.2 | 6.4 | 5.2 | 8.4 |
|UltraLink-LM| __70.4__ | __56.0__ | __70.4__ | __64.8__ | __63.6__ | __63.7__ |

### OMGEval
We use [OMGEval](https://github.com/blcuicall/OMGEval) to evaluate chat ability; it is a multilingual version of the widely used English benchmark AlpacaEval.
|Chimera-inst-chat-13b | 15.5 | 9.7 | 11.8 | 13.7 | 13.8 | 12.9 |
|Okapi-7b | 8.8 | 6.2 | 5.0 | 12.1 | 8.7 | 8.2 |
|Guanaco-7b | 4.6 | 3.8 | 0.4 | 1.8 | 1.2 | 2.4 |
|Guanaco-13b | __29.0__ | 8.6 | 16.9 | 15.4 | 17.3 | 17.5 |
|UltraLink-LM | 28.8 | __21.9__ | __23.5__ | __37.6__ | __29.0__ | __28.2__ |

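Assuming OMGEval follows the AlpacaEval recipe, the numbers above are win rates: a judge model compares each system response with a reference response, and the table reports the percentage of comparisons the system wins. A toy sketch of that final aggregation step, taking the per-prompt judge verdicts as given (the judging prompt and reference model are defined by OMGEval and are not reproduced here):

```python
# One boolean per evaluation prompt: True if the judge preferred UltraLink-LM's answer.
# The verdicts themselves would come from OMGEval's judging step (assumed, not shown).
verdicts = [True, False, True, True, False]

win_rate = 100.0 * sum(verdicts) / len(verdicts)
print(f"win rate: {win_rate:.1f}%")  # 60.0% for this toy list
```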
## Citation

Feel free to cite the repo if you think UltraLink is useful.

```bibtex
@misc{wang2024ultralink,
title={UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset},
infer.py
CHANGED
@@ -6,109 +6,118 @@ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
ultralink_lm = AutoModelForCausalLM.from_pretrained(checkpoint)

# Chat abilities in Chinese
chat_outputs = ultralink_lm.generate(chat_inputs, max_new_tokens=512)
# Expected output:
"""
"""
# Translations in English:
"""
"""

# Code abilities in Russian
# Please implement a bubble sort algorithm in Python.
code_inputs = tokenizer.encode("Реализуйте алгоритм пузырьковой сортировки на Python.", return_tensors="pt")
code_outputs = ultralink_lm.generate(code_inputs, max_new_tokens=512)
print(tokenizer.decode(code_outputs[0]))
# Expected output:
"""
```python
def bubbleSort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]

arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Отсортированный массив:", arr)
```

"""
# Translations in English:
"""
```python
def bubbleSort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]

arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Sorted array:", arr)
```

Note
"""

# Math abilities in French
# When the length of a rectangle is twice its width, calculate the area of the rectangle if its perimeter is 18 units?
math_inputs = tokenizer.encode("Lorsque la longueur d'un rectangle est le double de sa largeur, calculer l'aire du rectangle si son périmètre est de 18 unités?", return_tensors="pt")
math_outputs = ultralink_lm.generate(math_inputs, max_new_tokens=512)
print(tokenizer.decode(math_outputs[0]))
# Expected output:
"""
Le périmètre
En divisant les deux côtés par 6, nous obtenons w = 3.
Par conséquent, la longueur du rectangle est de 2w = 2(3) = 6.
L'aire d'un rectangle est le produit de sa longueur et de sa largeur, donc l'aire est de 6 * 3 = 18.
La réponse est : 18
"""
# Translations in English:
"""
Simplifying
So the length of the rectangle is 2w = 2(3) = 6.
The area of a rectangle is the product of its length and width, so the area is 6 * 3 = 18.
The answer is: 18
"""
ultralink_lm = AutoModelForCausalLM.from_pretrained(checkpoint)

# Chat abilities in Chinese
# What is heavy cavalry?
first_question = "<s>[INST] 什么是重骑兵? [/INST]"
chat_inputs = tokenizer.encode(first_question, add_special_tokens=False, return_tensors="pt")
chat_outputs = ultralink_lm.generate(chat_inputs, max_new_tokens=512)
first_answer = tokenizer.decode(chat_outputs[0])
print(first_answer)
# Expected output:
"""
<s> [INST] 什么是重骑兵? [/INST] 重骑兵是一种历史上的战斗单位,通常由骑兵组成,他们在战斗中使用重型装甲和长矛。他们以在战场上的强大攻击能力而闻名,并且通常被用于突破敌军阵线或攻击敌方骑兵。重骑兵通常被认为是中世纪战争中最强大和最具威慑力的单位之一。</s>
"""
# Translations in English:
"""
<s> [INST] What is heavy cavalry? [/INST] The Heavy Cavalry is a historical fighting unit, usually consisting of cavalrymen who use heavy armour and spears in battle. They were known for their powerful attacks on the battlefield and were often used to break through enemy lines or attack enemy cavalry. Heavy cavalry is often considered one of the most powerful and intimidating units in medieval warfare.</s>
"""

# Second turn:
second_question = "<s>[INST] What impact did heavy cavalry have on the medieval battlefield? [/INST]"
second_question = first_answer + second_question
chat_inputs = tokenizer.encode(second_question, add_special_tokens=False, return_tensors="pt")
chat_outputs = ultralink_lm.generate(chat_inputs, max_new_tokens=512)
second_answer = tokenizer.decode(chat_outputs[0])
print(second_answer)
# Expected output:
"""
<s> [INST] What is heavy cavalry? [/INST] The Heavy Cavalry is a historical fighting unit, usually consisting of cavalrymen who use heavy armour and spears in battle. They were known for their powerful attacks on the battlefield and were often used to break through enemy lines or attack enemy cavalry. Heavy cavalry is often considered one of the most powerful and intimidating units in medieval warfare.</s><s>[INST] What impact did heavy cavalry have on the medieval battlefield? [/INST] Heavy cavalry played an important role on the battlefields of the Middle Ages. Their powerful attacks made them a formidable force in battle and their presence could often change the outcome of a battle. Heavy cavalry were often used to break through enemy lines and create breaches in battle, which allowed other troops to move into enemy positions. They were also used to attack enemy cavalry, and their heavy armour and spears gave them an advantage in such situations. Overall, the presence of heavy cavalry made the medieval battlefield more complex and unpredictable, and their presence had a major impact on the outcome of battles.</s>
"""

# Code abilities in Russian
# Please implement a bubble sort algorithm in Python.
code_inputs = tokenizer.encode("<s>[INST] Реализуйте алгоритм пузырьковой сортировки на Python. [/INST]", add_special_tokens=False, return_tensors="pt")
code_outputs = ultralink_lm.generate(code_inputs, max_new_tokens=512)
print(tokenizer.decode(code_outputs[0]))
# Expected output:
"""
<s> [INST] Реализуйте алгоритм пузырьковой сортировки на Python. [/INST] Конечно, вот простая реализация алгоритма пузырьковой сортировки на Python:

```python
def bubbleSort(arr):
    n = len(arr)

    # Проходимся по всем элементам массива
    for i in range(n):
        # Последние i элементов уже отсортированы
        for j in range(0, n-i-1):
            # Проходимся по массиву, пока не найдем элемент, больший, чем следующий
            if arr[j] > arr[j+1]:
                # Обмениваем элементы местами
                arr[j], arr[j+1] = arr[j+1], arr[j]

# Проверяем функцию
arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Отсортированный массив:", arr)
```

Эта реализация работает путем повторного прохождения по массиву и сравнения соседних элементов. Если элемент сравнивается больше, чем следующий элемент, они меняются местами. Этот процесс повторяется до тех пор, пока массив не будет отсортирован.

Вы можете изменить размер массива и его начальное состояние, чтобы увидеть, как функция работает с разными входными данными.

Примечание: Алгоритм пузырьковой сортировки не является эффективным для больших наборов данных, так как его средняя и худшая временная сложность составляют O(n^2), где n - количество элементов в массиве.</s>
"""
# Translations in English:
"""
<s> [INST] Implement the bubble sorting algorithm in Python. [/INST] Sure enough, here's a simple implementation of the bubble sort algorithm in Python:

```python
def bubbleSort(arr):
    n = len(arr)

    # Go through all elements of the array
    for i in range(n):
        # The last i elements are already sorted
        for j in range(0, n-i-1):
            # We traverse the array until we find an element greater than the next one
            if arr[j] > arr[j+1]:
                # Swapping elements
                arr[j], arr[j+1] = arr[j+1], arr[j]

# Check the function
arr = [64, 34, 25, 12, 22, 11, 90]
bubbleSort(arr)
print("Sorted array:", arr)
```

This implementation works by repeatedly traversing the array and comparing neighbouring elements. If an element is compared more than the next element, they are swapped. This process is repeated until the array is sorted.

You can change the array size and initial state to see how the function works with different input data.

Note: The bubble sort algorithm is not efficient for large datasets because its average and worst-case time complexity are O(n^2), where n is the number of elements in the array.</s>
"""

# Math abilities in French
# When the length of a rectangle is twice its width, calculate the area of the rectangle if its perimeter is 18 units?
math_inputs = tokenizer.encode("<s>[INST] Lorsque la longueur d'un rectangle est le double de sa largeur, calculer l'aire du rectangle si son périmètre est de 18 unités? [/INST]", add_special_tokens=False, return_tensors="pt")
math_outputs = ultralink_lm.generate(math_inputs, max_new_tokens=512)
print(tokenizer.decode(math_outputs[0]))
# Expected output:
"""
<s> [INST] Lorsque la longueur d'un rectangle est le double de sa largeur, calculer l'aire du rectangle si son périmètre est de 18 unités? [/INST]
Soit la largeur du rectangle $w$. Alors la longueur du rectangle est $2w$.
Le périmètre du rectangle est $2(w+2w)=18$.
En simplifiant, nous avons $6w=18$, donc $w=3$.
L'aire du rectangle est $w \cdot (2w) = 3 \cdot 6 = \boxed{18}$ unités carrées.
La réponse est : 18</s>
"""
# Translations in English:
"""
<s> [INST] When the length of a rectangle is twice its width, calculate the area of the rectangle if its perimeter is 18 units? [/INST]
Let $w$ be the width of the rectangle. Then the length of the rectangle is $2w$.
The perimeter of the rectangle is $2(w+2w)=18$.
Simplifying, we have $6w=18$, so $w=3$.
The area of the rectangle is $w \cdot (2w) = 3 \cdot 6 = \boxed{18}$ square units.
The answer is: 18</s>
"""