davanstrien HF staff
Update README.md with notebooks for creating synthetic data for training sentence similarity models
d287b55

Creating data for training sentence similarity models

These notebooks demonstrate how to create synthetic data for training sentence similarity models.

  • 01_dataset_preparation: covers the initial processing steps that prepare a dataset for synthetic data creation. This notebook uses LlamaIndex to chunk texts into sections that serve as inputs for generating the synthetic dataset.
  • 02_synthetic_data_creation.ipynb: covers synthetic data creation for training sentence similarity models. This notebook uses Outlines to generate structured data and `vLLM` to run the LLM.
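
As a rough illustration of the two stages, here is a minimal standard-library sketch. It is not the notebooks' code and does not call LlamaIndex, Outlines, or vLLM. `chunk_text` is a hypothetical stand-in for a sentence-aware text splitter, and `parse_triplet` checks a raw LLM response against an assumed anchor/positive/negative record shape. The function names, the `max_words` parameter, and the triplet schema are all assumptions made for this example.

```python
import json
import re
from typing import Dict, List, Optional

# Hypothetical record shape for sentence similarity training data.
# The notebooks may use a different schema.
REQUIRED_KEYS = ("anchor", "positive", "negative")


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept as its own chunk.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate_words = sum(len(s.split()) for s in current + [sentence])
        if current and candidate_words > max_words:
            # Adding this sentence would overflow the chunk, so close it first.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def parse_triplet(raw: str) -> Optional[Dict[str, str]]:
    """Return a cleaned triplet dict, or None if the response is unusable."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    # Every required field must be a non-empty string.
    if all(isinstance(record.get(k), str) and record[k].strip() for k in REQUIRED_KEYS):
        return {k: record[k].strip() for k in REQUIRED_KEYS}
    return None


if __name__ == "__main__":
    sections = chunk_text("One two three. Four five six. Seven eight nine.", max_words=6)
    print(sections)
    print(parse_triplet('{"anchor": "a", "positive": "b", "negative": "c"}'))
```

In the notebooks themselves, LlamaIndex handles the chunking, and Outlines constrains the model output to a structure at generation time, so a post-hoc check like `parse_triplet` would act only as a safety net.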