{% extends "page.html" %} {% block stylesheet %} {% endblock %} {% block site %}

This Space gives you an easy way to start generating synthetic datasets, using Spaces compute to host open LLMs. It comes with a ready-to-go environment and a series of notebooks that walk through examples of generating synthetic datasets.

What's covered?

Currently this Space has notebooks covering the following topics:

Creating synthetic text similarity datasets

A set of notebooks covering the steps for creating a synthetic dataset for fine-tuning a sentence similarity model.

Using the Space

To use this Space, duplicate it with the Duplicate button. Enable persistent storage so you can save your work. To start, you may want to use a smaller GPU such as the T4, then switch to a larger GPU when you want to generate data with bigger models.

Duplicate the Space to run your own instance

The default token is huggingface

{% if login_available %}
{{ xsrf_form_html() | safe }} {% if token_available %} {% else %} {% endif %}
{% else %}

{% trans %}No login available, you shouldn't be seeing this page.{% endtrans %}

{% endif %}

This template was created by camenduru and nateraw, with contributions from osanseviero and azzr.

{% if message %}
{% for key in message %}
{{message[key]}}
{% endfor %}
{% endif %} {% if token_available %} {% block token_message %} {% endblock token_message %} {% endif %}
{% endblock %} {% block script %} {% endblock %}