---
license: mit
base_model: rhysjones/gpt2-774M-fineweb-150B
datasets:
- HuggingFaceFW/fineweb
widget:
- text: >-
    <|user|><|content|>Hello! Who are you?<|end_turn|>
    <|assistant|><|content|>Hello! I am ChatBot, a large language model. I am
    here to help you with information, answer questions, provide
    recommendations, and assist with a variety of tasks. How can I assist you
    today?<|end_turn|>
    <|user|><|content|>Write a message asking a friend to come to my wedding
    next month. I want it to be very short and casual, but with an option to
    opt out.<|end_turn|> <|assistant|><|content|>
---

This is karpathy's model from the llm.c project, converted to Hugging Face format to investigate bfloat16 performance.
The training run covered 150B tokens: 1.5 epochs over the 100B-token FineWeb sample dataset.
There's active work underway at https://github.com/karpathy/llm.c, so I'd suggest following the developments there!
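
As a quick-start, here is a minimal loading sketch using the transformers library. It's an illustration under assumptions, not part of the original conversion: the repo id is taken from the `base_model` field above (substitute this model's own id if it differs), and the prompt mirrors the chat format shown in the widget example.

```python
# Minimal sketch: load the converted checkpoint in bfloat16 with transformers.
# Assumption: the repo id comes from the base_model field in the metadata above;
# swap in this model's own repo id if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rhysjones/gpt2-774M-fineweb-150B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # load weights in bfloat16 to exercise the bf16 path
)

# Chat-style prompt mirroring the widget example above.
prompt = "<|user|><|content|>Hello! Who are you?<|end_turn|><|assistant|><|content|>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that bfloat16 mainly pays off on recent GPUs with native bf16 support; on CPU, the default float32 load may run faster.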