Regarding working on my own dataset
Hello,
I am trying to use bonito and have created my own CSV data to generate instruction pairs, but it is not letting me do so. The error I am getting says I need access to my own dataset before I can work on it. Could you please help me? I am new to this.
Hi @prascoder.
Based on the authors' tutorial file, I think you should modify the unannotated_text part.
This documentation might be helpful for your situation.
For me, I worked with a JSON file in the [{"input": "something", "output": ""}] format.
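If your data is currently in a CSV file, a small script can convert it to that same JSON layout. This is only a sketch using the Python standard library: the helper names `csv_to_records` and `write_records` are made up for illustration, and you would pass whichever column in your CSV holds the unannotated text.

```python
import csv
import json


def csv_to_records(csv_path, text_col):
    """Read a CSV file and return a list of {"input": ..., "output": ""} records."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if the requested column is not in the header row.
        if text_col not in (reader.fieldnames or []):
            raise KeyError(f"column {text_col!r} not found in {csv_path}")
        return [{"input": row[text_col], "output": ""} for row in reader]


def write_records(records, json_path):
    """Write the records to disk as a single JSON array."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

For example, `write_records(csv_to_records("my_file.csv", "text"), "my_file.json")` would produce the JSON file, assuming the text column is named `text`.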
@seungwoos is right! You will need to modify the unannotated_text object. You should load the dataset as follows:
from datasets import load_dataset
# split="train" returns a single Dataset instead of a DatasetDict
unannotated_dataset = load_dataset("csv", data_files="my_file.csv", split="train")
Once you load the dataset, pass the object along with the name of the column containing the unannotated text (context_col):
synthetic_dataset = bonito.generate_tasks(
    unannotated_dataset,
    context_col="input",
    task_type="nli",
    sampling_params=sampling_params
)
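One thing worth checking before the call: the value you pass as context_col should match a column header in your CSV. A quick standard-library check could look like the sketch below (`csv_has_column` is a hypothetical helper, not part of bonito or datasets):

```python
import csv


def csv_has_column(csv_path, column):
    """Return True if `column` appears in the first (header) row of the CSV."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # An empty file yields an empty header list instead of raising.
        header = next(csv.reader(f), [])
    return column in header
```

With the file above, `csv_has_column("my_file.csv", "input")` should return True if the context_col="input" argument is going to line up with your data.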
Let me know if you run into any more issues.