{% extends "page.html" %} {% block stylesheet %} {% endblock %} {% block site %}

This Space gives you an easy way to get started generating synthetic datasets, using Spaces compute to host open LLMs. It comes with a ready-to-use environment and a series of notebooks showing examples of synthetic dataset generation.

What's covered?

Currently this Space has notebooks covering the following topics:

Creating synthetic text similarity datasets

A set of notebooks covering the steps for creating a synthetic dataset for fine-tuning a sentence similarity model. These notebooks cover:

Using the Space

To use this Space, click the duplicate button to create your own copy. Enable persistent storage so that your work is saved. To start, you may want to use a smaller GPU such as the T4, then switch to a larger GPU when you want to generate data with bigger models. You can also preview the notebooks in the Space without running them; they are in the `notebooks` folder.
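If you would rather script the setup than use the duplicate button, the same steps can be sketched with the `huggingface_hub` client. This is a minimal sketch, not the Space's own tooling: the `SpacePlan` and `apply_plan` names are hypothetical helpers, the Space IDs in the usage note are placeholders, and the `"t4-small"` hardware and `"small"` storage tiers are assumptions that depend on what your account has available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpacePlan:
    """Hypothetical helper describing how to duplicate and configure a Space."""

    source_id: str               # Space to copy, e.g. "owner/space-name" (placeholder)
    target_id: str               # Your copy, e.g. "your-username/space-name" (placeholder)
    hardware: str = "t4-small"   # Assumed starting tier; switch to a larger GPU later
    storage: str = "small"       # Assumed persistent storage tier
    private: bool = True


def apply_plan(plan: SpacePlan, token: Optional[str] = None) -> str:
    """Duplicate the Space, then request GPU hardware and persistent storage.

    Makes network calls and needs a Hugging Face token with write access.
    """
    # Imported lazily so the plan can be built without the library installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.duplicate_space(
        from_id=plan.source_id,
        to_id=plan.target_id,
        private=plan.private,
        exist_ok=True,
    )
    api.request_space_hardware(repo_id=plan.target_id, hardware=plan.hardware)
    api.request_space_storage(repo_id=plan.target_id, storage=plan.storage)
    return plan.target_id
```

To use it, build a plan with your own IDs, for example `SpacePlan("owner/space-name", "your-username/space-name")`, and pass it to `apply_plan`. Moving to a larger GPU later only needs another `request_space_hardware` call with a different tier.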

Duplicate the Space to run your own instance



The default token is `huggingface`.

{% if login_available %}
{{ xsrf_form_html() | safe }} {% if token_available %} {% else %} {% endif %}
{% else %}

{% trans %}No login available, you shouldn't be seeing this page.{% endtrans %}

{% endif %}

This template was created by camenduru and nateraw, with contributions from osanseviero and azzr.

{% if message %}
{% for key in message %}
{{message[key]}}
{% endfor %}
{% endif %} {% if token_available %} {% block token_message %} {% endblock token_message %} {% endif %}
{% endblock %} {% block script %} {% endblock %}