Can you create domain-specific synthetic datasets in under 20 minutes?
@burtenshaw recently launched the Domain Specific Dataset Project as part of Data is Better Together. For this project, Ben created a Space where you can define the key perspectives and concepts of a domain. The resulting seed dataset can then be used to generate a synthetic dataset for that domain.
In less than 30 minutes this afternoon, I used these tools to create a domain-specific dataset focused on data-centric machine learning: davanstrien/data-centric-ml-sft.
You can create your own domain-specific datasets using this approach. The steps to follow are here: https://github.com/huggingface/data-is-better-together/blob/main/domain-specific-datasets/README.md
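To give a rough feel for the seed-data idea, here is a minimal sketch of how perspectives and concepts from a domain might be crossed into seed prompts. This is not the Space's actual implementation: the function name `build_seed_prompts`, the prompt template, and the example perspectives and topics are all hypothetical and only there for illustration.

```python
from itertools import product


def build_seed_prompts(domain, perspectives, topics):
    """Cross every perspective with every topic to get one seed prompt each.

    Hypothetical helper for illustration; the real project may structure
    its seed data differently.
    """
    return [
        {
            "perspective": perspective,
            "topic": topic,
            # Assumed prompt template, not taken from the project.
            "prompt": (
                f"From the perspective of {perspective}, "
                f"ask a question about {topic} in {domain}."
            ),
        }
        for perspective, topic in product(perspectives, topics)
    ]


# Example values are made up for this sketch.
seeds = build_seed_prompts(
    domain="data-centric machine learning",
    perspectives=["a data annotator", "an ML engineer"],
    topics=["label noise", "dataset documentation", "data deduplication"],
)

for seed in seeds:
    print(seed["prompt"])
```

Each seed prompt could then be sent to a language model to generate instruction and response pairs, which is the part the project's tooling handles for you.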