davanstrien
HF staff
Update README.md with notebooks for creating synthetic data for training sentence similarity models
d287b55
Creating data for training sentence similarity models
These notebooks demonstrate how to create synthetic data for training sentence similarity models.
- 01_dataset_preparation covers the initial processing steps that prepare a dataset for synthetic data creation. This notebook uses LlamaIndex to chunk texts into sections that serve as inputs for creating the synthetic dataset.
- 02_synthetic_data_creation.ipynb covers synthetic data creation for training sentence similarity models. The notebook uses Outlines to generate structured data and `vLLM` to run the LLM.
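
The two stages described in the list can be illustrated with a minimal, standard-library-only sketch. It does not use LlamaIndex, Outlines, or vLLM, and it is not the notebooks' code: `chunk_text`, `SimilarityExample`, `parse_example`, and the anchor/positive/negative schema are all hypothetical names chosen for illustration. The first function stands in for the chunking step, and the second shows the kind of validation that structured generation makes largely unnecessary, since the LLM output is constrained to a schema up front.

```python
import json
import re
from dataclasses import dataclass


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly `max_words` words.

    A single sentence longer than `max_words` is kept whole in its own chunk.
    This is a simplified stand-in for a library text splitter.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


@dataclass
class SimilarityExample:
    """Hypothetical record shape for one synthetic training example."""

    anchor: str
    positive: str
    negative: str


def parse_example(raw: str) -> SimilarityExample:
    """Parse one JSON string from an LLM and check it matches the assumed schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    fields = {}
    for key in ("anchor", "positive", "negative"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
        fields[key] = value.strip()
    return SimilarityExample(**fields)
```

In this sketch, each chunk returned by `chunk_text` would be placed in a prompt, and each JSON string returned by the LLM would be passed through `parse_example` before being added to the dataset.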