Regarding working with my own dataset

#3
by prascoder - opened

Hello,
I am trying to use Bonito. I have created my own CSV data and want to generate instruction pairs from it, but it won't let me. The error I get says I need access to my own dataset before I can work on it. Could you please help me? I am a rookie at this.

Hi @prascoder.

Based on the authors' tutorial file, I think you should modify the unannotated_text part.
This documentation might be helpful in your situation.

I worked with a JSON file in the [{"input": "something", "output": ""}] format.
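If your data starts out as a CSV, here is a minimal sketch of converting it into that JSON layout using only the Python standard library. The function name csv_to_input_records and the assumption that your text lives in a column called "text" are hypothetical; change them to match your file.

```python
import csv
import io
import json


def csv_to_input_records(csv_text: str, text_col: str = "text") -> list:
    """Turn CSV rows into [{"input": ..., "output": ""}] records.

    text_col is a hypothetical column name; use whatever header your CSV has.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_col not in reader.fieldnames:
        raise KeyError(f"column {text_col!r} not found in CSV header {reader.fieldnames}")
    records = []
    for row in reader:
        # Short rows get None for missing fields, so fall back to "".
        text = (row[text_col] or "").strip()
        if text:  # skip rows with no text so no blank contexts are produced
            records.append({"input": text, "output": ""})
    return records


sample = "text,source\nThe tenant must give 30 days notice.,lease\n,empty\n"
records = csv_to_input_records(sample)
print(json.dumps(records))
```

For a real file, read it with open("my_file.csv", newline="") and write the result out with json.dump so you get a single JSON list of records like the one above.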

Bats Research org

@seungwoos is right! You will need to modify the unannotated_text object. You should load the dataset as follows:

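from datasets import load_dataset
# Without split=, a CSV loads as a DatasetDict; split="train" returns the Dataset itself
unannotated_dataset = load_dataset("csv", data_files="my_file.csv", split="train")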

Once the dataset is loaded, pass it along with the name of the column containing the unannotated text (context_col):

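# bonito and sampling_params are set up as in the authors' tutorial
# context_col must match a column header in your CSV
synthetic_dataset = bonito.generate_tasks(
    unannotated_dataset,
    context_col="input",
    task_type="nli",
    sampling_params=sampling_params
)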
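Because context_col has to name an actual column header in your file, a quick check before starting generation can save a failed run. The sketch below is a hypothetical helper, not part of bonito; it uses only the standard library to read the CSV header and count the rows that have non-empty text in that column.

```python
import csv
import io


def check_context_column(csv_text: str, context_col: str) -> int:
    """Return how many rows have non-empty text in context_col.

    Raises ValueError if the column is missing, e.g. when the CSV header
    says "text" but generate_tasks is called with context_col="input".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    if context_col not in header:
        raise ValueError(f"context_col={context_col!r} is not one of the CSV columns {header}")
    # Whitespace-only cells count as empty.
    return sum(1 for row in reader if (row[context_col] or "").strip())


sample = "input,label\nFirst passage.,a\nSecond passage.,b\n  ,c\n"
print(check_context_column(sample, "input"))  # 2
```

To check your own file, open it with open("my_file.csv", newline="") and pass f.read() as csv_text.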

Let me know if you run into any more issues.

nihalnayak changed discussion status to closed
