mrm8488 committed on
Commit 902fd69 · 1 Parent(s): edc2efd

Update README.md

Files changed (1)
  1. README.md +6 -1
README.md CHANGED
@@ -46,7 +46,12 @@ Instruction Tuned version of BigScience Large Open-science Open-access Multiling
 
  ## Training data
 
- TBA
+ This collection of datasets is a set of machine-translated (and soon curated) versions of the `databricks-dolly-15k` [dataset](https://github.com/databrickslabs/dolly/tree/master/data) originally created by Databricks, Inc. in 2023.
+
+ The goal is to give practitioners a starting point for training open-source instruction-following models beyond English. However, as the translation quality will not be perfect, we highly recommend dedicating time to curate and fix translation issues. Below we explain how to load the datasets into [Argilla for data curation and fixing](https://github.com/argilla-io/argilla). Additionally, we'll be improving the datasets made available here, with the help of different communities.
+
+ **We highly recommend dataset curation beyond proof-of-concept experiments.**
+
 
  ### Supported Tasks and Leaderboards
 