Model's Data Sources 🌟

#31
by StephanePop - opened

Hi Mistral Team,

First off, I want to express my admiration for your amazing work with the models. They're truly impressive! I'm considering using one of your models for an association I'm involved with. They have specific data sensitivity requirements, so I have a couple of inquiries regarding the training set.

I've gone through your papers but couldn't find specific details about this. Could you kindly confirm whether the datasets used are rights-free, such as RedPajama or Common Crawl, and that no GPT-4-generated data was included? This information would greatly help us make an informed decision.

Thanks a lot for your help!
