javier-ab-bsc committed
Commit 4fb7f42 · Parent(s): 6a1dda0
Update README.md

README.md CHANGED
@@ -610,6 +610,7 @@ This instruction-tuned variant has been trained with a mixture of 276k English,
 | tower-blocks | - | 19,895 | 2,000 |
 | **Total** | **36,456** | **196,426** | **43,665** |
 
+---
 
 ## Evaluation
 
@@ -904,7 +905,6 @@ An instruction (might include an Input inside it), a response to evaluate, and a
 ###Feedback:"
 ```
 
-
 As an example, prompts for the Math task in English are based on instances from [MGSM](https://huggingface.co/datasets/juletxara/mgsm), and each instance is presented within these prompts:
 
 ```python
@@ -937,7 +937,6 @@ Score 1: The answer is mathematically correct, with accurate calculations and ap
 }
 ```
 
-
 #### Multilingual results
 
 Here, we present results for seven categories of tasks in Spanish, Catalan, Basque, Galician, and English. Results are presented for each task, criterion and language. Criteria with a `(B)` after their name are binary criteria (i.e., numbers go from 0 to 1, where 1 is best). The rest of the criteria are measured using a 5-point Likert scale, where 5 is best. The first number of the pair of numbers separated by `/` shows the average score for the criterion (and language). The second number of each pair is the robustness score, where numbers closer to 0 mean that the model generates similar responses when comparing the three prompt varieties for a single instance.
@@ -946,6 +945,8 @@ Further details on all tasks and criteria, a full list of results compared to ot
 
 ![](./images/results_eval_7b_judge.png)
 
+---
+
 ## Ethical Considerations and Limitations
 
 We examine the presence of undesired societal and cognitive biases present in this model using different benchmarks. For societal biases,
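For context on the results paragraph carried in the diff above: each criterion is reported as an average score paired with a robustness score. The exact robustness formula is not given in the README, so the sketch below is only one plausible reading, assuming robustness is the mean absolute deviation of an instance's scores across its three prompt varieties (closer to 0 means more consistent responses); `aggregate` is a hypothetical helper, not part of the repository.

```python
# Minimal sketch (not from the README): aggregating per-criterion judge
# scores across the three prompt varieties mentioned in the results text.
from statistics import mean

def aggregate(scores_per_instance: list[list[float]]) -> tuple[float, float]:
    """Each inner list holds one instance's 3 scores, one per prompt
    variety (Likert 1-5, or 0/1 for binary `(B)` criteria)."""
    # Average score: mean over all instances and prompt varieties.
    avg_score = mean(s for instance in scores_per_instance for s in instance)
    # Assumed robustness: mean absolute deviation from each instance's own
    # mean, averaged over instances (0 = identical scores across varieties).
    robustness = mean(
        mean(abs(s - mean(instance)) for s in instance)
        for instance in scores_per_instance
    )
    return avg_score, robustness

# Example: two instances, three prompt varieties each.
print(aggregate([[4, 4, 5], [3, 4, 3]]))  # -> (3.833..., 0.444...)
```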