GritLM-8x7B
---
pipeline_tag: text-generation
inference: true
license: apache-2.0
---

Table of Contents

  1. Model Summary
  2. Use
  3. Training
  4. Citation

Model Summary

GritLM is a generative-representational instruction-tuned language model. It performs well at both text representation and text generation.

Use

The model's usage is documented here. It supports inference via GritLM, Transformers, and Sentence Transformers.
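As a minimal sketch of how the model is prompted for embedding, the helper below mirrors the instruction-formatting function shown in the gritlm repository README; the commented usage lines are an assumption about the `gritlm` package API and are not run here (they require downloading the 8x7B weights).

```python
# Sketch of GritLM's embedding chat template (assumption: follows the
# helper published in the gritlm repository README).
def gritlm_instruction(instruction: str) -> str:
    """Wrap an optional task instruction in GritLM's embedding template."""
    if instruction:
        return "<|user|>\n" + instruction + "\n<|embed|>\n"
    # With no instruction, documents are embedded with the bare embed tag.
    return "<|embed|>\n"


# Hypothetical usage with the gritlm package (not executed here;
# loading GritLM/GritLM-8x7B requires the model weights):
#
# from gritlm import GritLM
# model = GritLM("GritLM/GritLM-8x7B", torch_dtype="auto")
# d_rep = model.encode(documents, instruction=gritlm_instruction(""))
# q_rep = model.encode(queries,
#                      instruction=gritlm_instruction("Retrieve relevant passages."))

print(repr(gritlm_instruction("")))
print(repr(gritlm_instruction("Retrieve relevant passages.")))
```

The same checkpoint handles generation through its standard chat template, so one model serves both the representation and generation roles described above.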

Training

Model

  • Architecture: Mixtral-8x7B
  • Steps: 250k (pretraining) & 30 (instruction tuning)
  • Tokens: ? (pretraining) & 2M (instruction tuning)
  • Precision: bfloat16

Hardware

  • Pretraining:
    • GPUs: 512 Tesla A100
    • Training time: 1 day
  • Instruction tuning:
    • GPUs: 8 Tesla A100
    • Training time: 4 hours

Software

https://github.com/ContextualAI/gritlm

Citation

TODO