attribution of training data

by HunleyExpress - opened

I am interested in using the models derived from the MPT base model. I see a fairly deep dive into the sources used to train the model, but after digging a couple of layers down it is hard to be certain that the source data is free of copyright or licensing issues if I were to use this model. In other words, what prevents someone from suing for infringement? Is there a clear attribution statement, bubbled up from all source repos, confirming that all of the training data was permissively licensed for commercial use and related purposes? Thanks!

Good luck getting anyone on HF or at the companies to support this. In their view (though I cannot know it for sure), this discussion seems to be too harmful to their business. My guess is that not a single dataset or model here would fulfill this requirement.
