synthetic-data-workshop / notebooks /README.md

davanstrien HF staff

Update README.md with notebooks for creating synthetic data for training sentence similarity models

d287b55 4 months ago

# Table of Contents

## Creating data for training sentence similarity models

These notebooks demonstrate how to create synthetic data for training sentence similarity models.

- [01_dataset_preparation.ipynb](notebooks/01_dataset_preperation.ipynb): covers the initial processing steps that prepare a dataset for synthetic data creation. The notebook uses [LlamaIndex](https://docs.llamaindex.ai/en/stable/) to chunk texts into sections that serve as inputs for creating a synthetic dataset.
- [02_synthetic_data_creation.ipynb](notebooks/02_synthetic_data_creation.ipynb): covers synthetic data creation for training sentence similarity models. The notebook uses `Outlines` to generate structured data and `vLLM` to run the LLM.
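
To make the two steps concrete, here is a minimal, dependency-free sketch of the overall flow: split a text into chunks, then validate a structured model response for each chunk. It is an illustration only and does not use the LlamaIndex, `Outlines`, or `vLLM` APIs that the notebooks rely on. The names `chunk_text`, `SimilarityPair`, and `parse_pair`, the `query` field, and the character-based chunk limit are all assumptions made for this example, and the model output is a hard-coded string rather than a real LLM call.

```python
import json
import re
from dataclasses import dataclass


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    This is a simple stand-in for a library text splitter, not LlamaIndex itself.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


@dataclass
class SimilarityPair:
    """Hypothetical schema for one synthetic training example."""

    anchor: str  # the source chunk
    query: str   # a generated query that the chunk should match


def parse_pair(chunk: str, raw_json: str) -> SimilarityPair:
    """Validate a JSON model response and pair it with its source chunk."""
    data = json.loads(raw_json)
    query = data.get("query") if isinstance(data, dict) else None
    if not isinstance(query, str) or not query.strip():
        raise ValueError("model output must contain a non-empty 'query' string")
    return SimilarityPair(anchor=chunk, query=query.strip())


# Example usage with a hard-coded response standing in for the LLM.
text = (
    "Synthetic data can bootstrap training. "
    "Chunks become model inputs. "
    "Each chunk yields a query."
)
chunks = chunk_text(text, max_chars=60)
fake_output = '{"query": "How can synthetic data help training?"}'
pair = parse_pair(chunks[0], fake_output)
print(pair)
```

In the notebooks, a structured-generation library constrains the model so that its output already matches the schema. The explicit validation in `parse_pair` plays that role here only because the response is a plain string.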