Nous-Capybara-7B-V1 / README.md
metadata
language:
  - eng
tags:
  - llama-2
  - sft
license:
  - mit
datasets:
  - LDJnr/Puffin

Nous-Capybara-7B

Nous-Capybara-7B aims to combine techniques used for SOTA models, such as Evol-Instruct, Orca, Platypus, Lamini, FLASK and others, into one lean, holistically formed dataset and model. The example seeds are largely based on highly rated datasets like Airoboros, EverythingLM and GPTeacher, supplemented with multi-turn datasets like Dove (a successor to Puffin).

Entirely contained within 20K training examples!

Thank you to dataset creators!

While most of the tokens within Capybara are newly synthesized and part of datasets like Puffin/Dove, we would like to credit the single-turn datasets we leveraged as seeds to initiate the beginning of many of the multi-turn conversations.

This model was fine-tuned by Nous Research, with LDJ leading the training and dataset curation, along with significant dataset formation contributions by J-Supha. Thank you also to Emozilla for helping to expedite the training experimentation process.

Special thanks to A16Z for sponsoring our training, and to Yield Protocol for supporting R&D outside of training, such as dataset development and synthesis.

Model Training

Nous-Capybara 7B is a new model trained for multiple epochs on a dataset of 3,000 carefully curated GPT-4 examples, most of which are long-context conversations between a real human and GPT-4.

Additional data came from carefully curated subsections of datasets such as CamelAI's Physics, Chemistry, Biology and Math.

Prompt Format

The recommended prompt format is:

USER:

ASSISTANT:
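
As a rough illustration of the format above, here is a minimal sketch of assembling a multi-turn prompt. The helper name `build_prompt` is hypothetical, and the single space after each colon and the newline between turns are assumptions; the model card only specifies the two role markers.

```python
from typing import List, Tuple


def build_prompt(turns: List[Tuple[str, str]], next_user_message: str) -> str:
    """Format a conversation with USER:/ASSISTANT: markers.

    `turns` holds earlier (user, assistant) exchanges. The spacing and
    newline separators are assumptions, not taken from the model card.
    """
    parts = []
    for user_msg, assistant_msg in turns:
        parts.append(f"USER: {user_msg}")
        parts.append(f"ASSISTANT: {assistant_msg}")
    parts.append(f"USER: {next_user_message}")
    # End with the bare assistant marker so the model writes the next reply.
    parts.append("ASSISTANT:")
    return "\n".join(parts)


prompt = build_prompt(
    [("What is a capybara?", "A large semi-aquatic rodent native to South America.")],
    "How much do they weigh?",
)
print(prompt)
```

If generations look off, the separators are the first thing to experiment with, since they are guessed here.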

When should I use Capybara or Hermes 2?

  • Hermes 2 is trained purely on single-turn instruction examples.

  • Capybara is trained mostly on multi-turn, long-context, highly curated and cleaned GPT-4 conversations with real humans.

For these reasons, it's recommended to give Capybara a try if you want to have multi-turn conversations and/or long-context communication.

Example Outputs!:

Coming soon!

Notable Features:

  • The first Nous model trained on over 15,000 multi-turn conversations.

  • Over 1,000 tokens average per conversation example during training!

  • Able to produce complex summaries of advanced studies.

  • Able to recall information up to 2023 without internet access (ChatGPT's cutoff date is in 2021).

  • Context length of 4096 tokens, and fine-tuned on a significant amount of multi-turn conversations reaching that full token limit.

  • Includes a portion of conversational data synthesized from LessWrong posts, speaking in-depth about the nature of rationality, reasoning and self-improvement.
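
Because the context length is 4096 tokens, long conversations may need older turns dropped before each request. The following is a minimal sketch of one way to do that, not part of the model's tooling. The names `trim_history` and `approx_token_count` and the `reserve_for_reply` default are hypothetical, and the word-count tokenizer is only a crude stand-in; a real tokenizer would give different counts.

```python
from typing import Callable, List

MAX_CONTEXT_TOKENS = 4096  # context length stated in the model card


def approx_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(
    messages: List[str],
    reserve_for_reply: int = 512,  # assumed headroom for the model's answer
    count_tokens: Callable[[str], int] = approx_token_count,
    max_tokens: int = MAX_CONTEXT_TOKENS,
) -> List[str]:
    """Keep the most recent messages whose total token count fits the budget."""
    budget = max_tokens - reserve_for_reply
    kept: List[str] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Passing the model's actual tokenizer as `count_tokens` would make the budget accurate; the default is only meant to keep the sketch self-contained.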

Limitations

The current version of Capybara still censors itself in some situations and does not act as expected in certain edge cases. We plan to have this largely resolved in the near future with Capybara 1.1.

Future Changes

This is a relatively early build amongst the grand plans for the future of Capybara!

Current limitations: We are still running experiments and tests to refine the training pipeline and dataset cleaning process. We plan to release a Capybara 1.1 with these improvements.

Future model sizes

We plan on releasing a 3B, 13B and 70B version, as well as a potential 1B version based on phi-1.5 or similar architectures.

How you can help!

In the near future we plan on leveraging the help of domain-specific expert volunteers to eliminate any mathematically or verifiably incorrect answers from our training curations.

If you have at least a bachelor's degree in mathematics, physics, biology or chemistry and would like to volunteer even just 30 minutes of your expertise, please contact LDJ on Discord!

Dataset contamination

We checked for 100%, 99%, 98% and 97% similarity matches between our data and many popular benchmarks, and found no exact matches!

We checked for contamination against the following benchmarks:

  • HumanEval

  • AGIEval

  • TruthfulQA

  • MMLU

  • GPT4All
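
The model card does not say which similarity metric was used for the threshold checks above. As a hedged sketch of how such a check could look, the following counts training examples whose closest benchmark item meets each threshold. It assumes character-level similarity from Python's standard `difflib.SequenceMatcher`; the function names and the lowercase/whitespace normalization are hypothetical choices, not the authors' method.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List

# Thresholds mirror the 100%, 99%, 98% and 97% levels mentioned above.
THRESHOLDS = (1.00, 0.99, 0.98, 0.97)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def best_similarity(example: str, benchmark_items: Iterable[str]) -> float:
    """Return the highest similarity ratio between one example and any benchmark item."""
    example = normalize(example)
    best = 0.0
    for item in benchmark_items:
        matcher = SequenceMatcher(None, example, normalize(item), autojunk=False)
        best = max(best, matcher.ratio())
    return best


def contamination_report(
    training_examples: Iterable[str], benchmark_items: Iterable[str]
) -> Dict[float, int]:
    """Count training examples whose best match meets each threshold."""
    items: List[str] = list(benchmark_items)
    counts = {threshold: 0 for threshold in THRESHOLDS}
    for example in training_examples:
        score = best_similarity(example, items)
        for threshold in THRESHOLDS:
            if score >= threshold:
                counts[threshold] += 1
    return counts
```

This pairwise comparison scales with the product of the two dataset sizes, so a check over full benchmarks would likely need hashing or n-gram indexing instead.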

Benchmarks! COMING SOON