divineRatio committed on
Commit
244e12b
1 Parent(s): 85a120d

Added benchmark results

Files changed (1): README.md +15 -0
README.md CHANGED
@@ -31,5 +31,20 @@ Dynamo 8B has not been instruction fine-tuned and has not undergone alignment us
 
 *Pretraining on the multilingual dataset was done with a sequence length of 4096 tokens
 
+# Evaluation Results
+
+In our recent evaluation, we used several multilingual benchmarks to assess our model's capabilities: PAWS, XCOPA, and xstorycloze, all part of EleutherAI's evaluation harness. All runs were done in 32-bit precision. The results for each benchmark are summarized below:
+
+| Multilingual Benchmark | Language | Dynamo 8B | Mistral 7B | Llama2 13B | Bloom 7B  | PolyLM 13B |
+|------------------------|----------|-----------|------------|------------|-----------|------------|
+| PAWS                   | German   | **0.516** | 0.363      | 0.377      | 0.502     | 0.390      |
+|                        | English  | **0.497** | 0.311      | 0.336      | 0.422     | 0.413      |
+|                        | Spanish  | **0.515** | 0.339      | 0.422      | 0.424     | 0.452      |
+|                        | Korean   | **0.552** | 0.422      | 0.534      | **0.551** | 0.544      |
+| XCOPA                  | Italian  | **0.710** | 0.630      | 0.692      | 0.516     | 0.644      |
+|                        | Turkish  | **0.672** | 0.562      | 0.550      | 0.520     | 0.574      |
+| xstorycloze            | Spanish  | **0.645** | 0.632      | 0.622      | 0.639     | **0.642**  |
+
+
 # Notice
 Dynamo 8B is a pre-trained model that can be adapted and fine-tuned for a variety of tasks. However, it is new technology that carries risks. In some scenarios, it may generate inaccurate, unverified, or biased output despite efforts we have made to maximize model safety. As with all LLMs, we recommend users exercise critical thinking, validate outputs, and perform the requisite safety evaluations for specific downstream applications of the Dynamo model.
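
The evaluation setup described in this commit (EleutherAI's evaluation harness, 32-bit precision) could be reproduced with an invocation along these lines. This is a sketch, not the authors' actual command: the model path is a placeholder, and the exact task names and flags vary between versions of lm-evaluation-harness, so check your installed version's task list before running:

```shell
# Hypothetical reproduction of the PAWS-X / XCOPA / xstorycloze runs above
# using EleutherAI's lm-evaluation-harness. Task names and the model path
# are assumptions and may need adjusting for your harness version.
pip install lm-eval

lm_eval \
  --model hf \
  --model_args pretrained=<path-to-Dynamo-8B>,dtype=float32 \
  --tasks xcopa_it,xcopa_tr,xstorycloze_es \
  --batch_size 8
```

Passing `dtype=float32` mirrors the 32-bit precision noted above; scores reported at lower precision (e.g. float16) can differ slightly.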