GritLM-8x7B
---
pipeline_tag: text-generation
inference: true
license: apache-2.0
---

Table of Contents

  1. Model Summary
  2. Use
  3. Training
  4. Citation

Model Summary

GritLM is a generative-representational instruction-tuned language model. It performs well at both text representation and text generation.

Use

The model's usage is documented here. It supports inference via GritLM, Transformers, and Sentence Transformers.
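As a minimal sketch of how the model is prompted for embedding, the helper below mirrors the instruction-formatting function shown in the gritlm repository README; the commented usage lines are an assumption about the `gritlm` package API and are not run here (they require downloading the 8x7B weights).

```python
# Sketch of GritLM's embedding chat template (assumption: follows the
# helper published in the gritlm repository README).
def gritlm_instruction(instruction: str) -> str:
    """Wrap an optional task instruction in GritLM's embedding template."""
    if instruction:
        return "<|user|>\n" + instruction + "\n<|embed|>\n"
    # With no instruction, documents are embedded with the bare embed tag.
    return "<|embed|>\n"


# Hypothetical usage with the gritlm package (not executed here;
# loading GritLM/GritLM-8x7B requires the model weights):
#
# from gritlm import GritLM
# model = GritLM("GritLM/GritLM-8x7B", torch_dtype="auto")
# d_rep = model.encode(documents, instruction=gritlm_instruction(""))
# q_rep = model.encode(queries,
#                      instruction=gritlm_instruction("Retrieve relevant passages."))

print(repr(gritlm_instruction("")))
print(repr(gritlm_instruction("Retrieve relevant passages.")))
```

The same checkpoint handles generation through its standard chat template, so one model serves both the representation and generation roles described above.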

Training

Model

  • Architecture: Mixtral-8x7B
  • Steps: 250k (pretraining) & 30 (instruction tuning)
  • Tokens: ? (pretraining) & 2M (instruction tuning)
  • Precision: bfloat16

Hardware

  • Pretraining:
    • GPUs: 512 Tesla A100
    • Training time: 1 day
  • Instruction tuning:
    • GPUs: 8 Tesla A100
    • Training time: 4 hours

Software

https://github.com/ContextualAI/gritlm

Citation

TODO