License

#2
by ryota39 - opened

Hi,
Thank you for releasing such compact yet powerful models!

I would like to ask about your interpretation of the license of "augmxnt/ultra-orca-boros-en-ja-v1", the dataset you used to train the instruction model.
As you can see in the "source" column of the dataset, it contains a portion of data from airoboros.

The developer of airoboros clearly states the following:
Also note that the data was generated primarily with gpt-4, and therefore may have some strings attached to the OpenAI terms of service.

So I think the "augmxnt/ultra-orca-boros-en-ja-v1" dataset cannot be used to develop any large language models, even though it is released under apache-2.0, because it contains outputs from gpt-4.

Nowadays, with LLM development having advanced so far, I understand that it is becoming more and more difficult to strictly adhere to these terms of use. This is just my opinion.

If you have any insights on how to deal with this kind of license problem, I would be glad if you could share them with this community.
