BabyMistral Model Card

Model Overview

BabyMistral is a compact language model for efficient text generation. It is built on the Mistral architecture and, at 1.5 billion parameters, aims to deliver strong performance for its size.

Key Specifications

  • Parameters: 1.5 billion
  • Training Data: 1.5 trillion tokens
  • Architecture: Based on Mistral
  • Training Duration: 70 days
  • Hardware: 4x NVIDIA A100 GPUs

Model Details

Architecture

BabyMistral uses the Mistral architecture, the decoder-only transformer design introduced by Mistral AI. The model scales this architecture down to 1.5 billion parameters, trading some capability for lower compute and memory cost.
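This card does not list BabyMistral's layer sizes, so the sketch below only shows how a parameter count like "1.5 billion" follows from a Mistral-style configuration. It assumes the layer layout of the Hugging Face `MistralForCausalLM` implementation: bias-free q/k/v/o attention projections with grouped-query attention, a gated MLP with three projections, two RMSNorms per layer plus a final norm, and an optional untied output head. The function name `mistral_param_count` is hypothetical. To apply it to BabyMistral, read the real values from the model's `config.json`.

```python
def mistral_param_count(vocab_size, hidden_size, num_layers, num_heads,
                        num_kv_heads, intermediate_size, tie_embeddings=False):
    """Hypothetical helper: count weights in a Mistral-style decoder (no biases)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim  # grouped-query attention shrinks K/V
    attention = (
        hidden_size * hidden_size      # q_proj
        + 2 * hidden_size * kv_dim     # k_proj and v_proj
        + hidden_size * hidden_size    # o_proj
    )
    mlp = 3 * hidden_size * intermediate_size  # gate_proj, up_proj, down_proj
    norms = 2 * hidden_size                    # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embeddings + num_layers * per_layer + final_norm + lm_head


# Sanity check with the published Mistral 7B configuration.
print(mistral_param_count(32000, 4096, 32, 32, 8, 14336))
```

With the Mistral 7B settings, the formula gives 7,241,732,096 weights. The same function can be used to check a smaller configuration against the 1.55B figure reported for this checkpoint.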

Training

  • Dataset Size: 1.5 trillion tokens
  • Training Approach: Trained from scratch
  • Hardware: 4x NVIDIA A100 GPUs
  • Duration: 70 days of continuous training
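As a rough check on the training figures above, the token count and duration imply an average data throughput. The sketch assumes all 70 days were spent processing tokens at a constant rate on all four GPUs. It therefore gives an average only: time spent on evaluation, checkpointing, or restarts would push the real peak rate higher. The helper name `implied_throughput` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_throughput(total_tokens, days, num_gpus):
    """Average tokens/s for the whole run and per GPU, assuming a constant rate."""
    seconds = days * SECONDS_PER_DAY
    cluster_rate = total_tokens / seconds
    return cluster_rate, cluster_rate / num_gpus


# Figures from this card: 1.5 trillion tokens, 70 days, 4x A100.
cluster, per_gpu = implied_throughput(1.5e12, 70, 4)
print(f"{cluster:,.0f} tokens/s total, {per_gpu:,.0f} tokens/s per GPU")
```

This prints an average of about 248,016 tokens/s across the run, or about 62,004 tokens/s per GPU.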

Capabilities

BabyMistral is designed for a wide range of natural language processing tasks, including:

  • Text completion and generation
  • Creative writing assistance
  • Dialogue systems
  • Question answering
  • Language understanding tasks

Usage

Getting Started

To use BabyMistral with the Hugging Face Transformers library:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model (in bfloat16, matching the checkpoint's BF16 weights) and its tokenizer
model = AutoModelForCausalLM.from_pretrained("OEvortex/BabyMistral", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("OEvortex/BabyMistral")

# Define the chat input
chat = [
    # {"role": "system", "content": "You are BabyMistral"},  # optional, if the chat template accepts a system role
    {"role": "user", "content": "Hey there! How are you? 😊"},
]

# Format the conversation with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate a sampled reply
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)

# Drop the prompt tokens and decode only the newly generated reply
response = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

# Example output (sampling is enabled, so replies vary between runs):
# I am doing well! How can I assist you today? 😊
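The `generate` call above samples with `temperature=0.6` and `top_p=0.9`. The sketch below illustrates what those two settings do to a single step's next-token logits. It is a simplified, pure-Python illustration of temperature scaling followed by nucleus (top-p) sampling, not the Transformers library's internal code. The function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_candidates(probs, top_p):
    """Keep the most likely token ids until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(logits, temperature=0.6, top_p=0.9, rng=random):
    """Sample one token id from the renormalized top-p set."""
    probs = softmax_with_temperature(logits, temperature)
    kept = top_p_candidates(probs, top_p)
    r = rng.random() * sum(probs[i] for i in kept)  # renormalize over the kept set
    for i in kept:
        r -= probs[i]
        if r < 0:
            return i
    return kept[-1]  # guard against floating-point round-off


# Toy logits for a 4-token vocabulary: only tokens 0 and 1 survive top_p=0.9.
logits = [2.0, 1.0, 0.1, -5.0]
print(top_p_candidates(softmax_with_temperature(logits, 0.6), 0.9))
print(sample_next_token(logits, rng=random.Random(0)))
```

Lowering the temperature concentrates probability on the top tokens, so fewer tokens are needed to reach the `top_p` mass. In this toy case, the unlikely tokens 2 and 3 can never be sampled.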

Ethical Considerations

Like other language models, BabyMistral has limitations and may reflect biases. Users should keep the following in mind:

  • The model may reproduce biases present in its training data
  • It should not be used as a sole source of factual information
  • Generated content should be reviewed for accuracy and appropriateness

Limitations

  • May struggle with very specialized or technical domains
  • Lacks real-time knowledge beyond its training data
  • Potential for generating plausible-sounding but incorrect information

Checkpoint Details

  • Format: Safetensors
  • Model size: 1.55B params
  • Tensor type: BF16
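The checkpoint's size and tensor type give a quick estimate of the memory needed just to hold the weights. The sketch assumes 2 bytes per BF16 parameter and 4 bytes per float32 parameter. It ignores activations, the KV cache, and framework overhead, so real usage during generation will be higher. Because "1.55B" is a rounded figure, the result is approximate. The helper name `weight_memory_gib` is hypothetical.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(num_params, dtype):
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("bfloat16", "float32"):
    print(f"{dtype}: {weight_memory_gib(1_550_000_000, dtype):.2f} GiB")
```

This gives roughly 2.89 GiB for the BF16 weights, and about twice that (5.77 GiB) if the weights are upcast to float32.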