RAG dataset & method release?

by pszemraj - opened 9 days ago

9 days ago

Hi, great work on this model, and a really interesting research direction too!

"a new dataset of 45,088,768,000 tokens modeling common retrieval tasks."

I wanted to ask whether the new RAG dataset mentioned in the README will be released, and/or the methodology or code used to create it from a generic large corpus. I understand it's derived from the Common Corpus, but it would be great to know how.

MaziyarPanahi

7 days ago

I have a similar question: since this model went through an SFT stage, if not more (RLHF), where are the instruct datasets?
