Commit f7fcbfb
Parent(s): 220be08
Update README.md
README.md
CHANGED
@@ -14,7 +14,7 @@ datasets:
 > [Announcement tweet](https://twitter.com/dvilasuero/status/1643234487386374148?s=20)
 
 A cross-lingual SetFit model to **detect bad instructions from Alpaca Datasets** and other instruction-following datasets.
-`GarbageCollector` can greatly speed up the validation of
+`GarbageCollector` can greatly speed up the validation of instruction-datasets across many languages, flagging examples that need to be fixed or simply discarded.
 
 Data quality is key for LLMs, but open-source LLMs are being built with data of "unknown" quality. This model can help practitioners to find and fix frequent issues (e.g., the model hallucinating stock prices, describing non-existing images, etc.)
 