Thought-Ranked Llama 3.2 3B

Model Description

This model is a fine-tuned version of Meta's Llama 3.2 3B (Base), trained to generate a thought process before producing an answer. It went through 4 rounds of fine-tuning using a thought-chain ranking approach. It was a weekend project with only a few hundred training steps, so treat it as an experiment rather than a polished release.

Training Process

  1. Initial Generation: For each training sample, the model generates multiple thought chains by prefixing different thought tokens: <thought>{char}</thought> for each character in [a-zA-Z0-9]. Each thought chain is allowed up to 128 tokens.

  2. Answer Generation: Following each thought chain, the model generates a complete answer with up to 2048 tokens.

  3. Ranking & Selection: An external LLM ranking system evaluates the quality of answers without seeing the thought processes, creating a ranking of the most effective thought patterns.

  4. Final Training: The model is then trained on the highest-ranked thought-answer pairs, learning to generate the most effective thought patterns autonomously.
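
The seeding step in item 1 can be sketched as follows. This is a minimal illustration, not the author's training code: the card does not spell out the exact prompt layout, so the sketch assumes each seed character opens a thought that the model then continues. The function name `build_thought_prefixes` and the newline between question and thought are hypothetical.

```python
import string
from typing import List

# The 62 seed characters named in the card: a-z, A-Z, 0-9.
SEED_CHARS = string.ascii_lowercase + string.ascii_uppercase + string.digits


def build_thought_prefixes(question: str) -> List[str]:
    """Return one seeded prompt per character.

    Assumption: the seed character is placed right after an opening
    <thought> tag so that each prompt steers generation differently.
    """
    return [f"{question}\n<thought>{char}" for char in SEED_CHARS]


prefixes = build_thought_prefixes("Solve this math problem: 2x + 3 = 7")
print(len(prefixes))  # one prompt per seed character
```

Each of these prompts would then be continued by the model for up to 128 thought tokens and up to 2048 answer tokens, as described above.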

Key Features

  • Thought Chain Generation: The model has learned to generate explicit thought processes before providing answers
  • Greedy Decoding: Uses greedy decoding (temperature 0.0) for both thought generation and final answers
  • Length Parameters:
    • Thought chains: Up to 128 tokens
    • Final answers: Up to 2048 tokens

Model Architecture

  • Base model: Llama 3.2 3B (Base)
  • Architecture: Transformer-based language model
  • Parameters: ~3.2 billion
  • Training Strategy: Supervised Fine-Tuning (SFT) with thought-chain ranking

Intended Use

This model is designed for tasks that benefit from explicit reasoning chains, including but not limited to:

  • Problem-solving
  • Mathematical reasoning
  • Logical deduction
  • Step-by-step explanations
  • Complex decision making

Out-of-Scope Uses

  • Direct deployment without safety measures
  • Applications requiring guaranteed accuracy
  • Critical decision-making without human oversight
  • Tasks requiring capabilities beyond the base Llama 3.2 3B model

Training Details

Training Data

The model was trained using:

  • Sample questions paired with multiple thought variations
  • Thought chains generated using systematic character prefixes
  • Rankings derived from LLM evaluation of answer quality

Training Procedure

  1. Thought Generation Phase
    • Generated 62 variations of thoughts per sample (a-z, A-Z, 0-9)
    • Sampled with temperature=0.0
    • Maximum thought length: 128 tokens

  2. Answer Generation Phase
    • Generated completions following each thought chain
    • Maximum answer length: 2048 tokens
    • Sampled with temperature=0.0

  3. Ranking Phase
    • External LLM evaluated answer quality
    • Ranking performed without access to thought chains
    • Selected highest-performing thought-answer pairs

  4. Final Training Phase
    • Fine-tuned on best-performing thought-answer combinations
    • 4 complete rounds of training
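
The ranking and selection phases can be sketched as below. This is an illustration under stated assumptions, not the actual pipeline: `Candidate`, `Ranker`, `select_best`, and `length_ranker` are hypothetical names, and the toy ranker only stands in for the external LLM, whose prompt and scoring are not described in this card. The point it shows is that the ranker receives only the answers, while the selected output keeps each thought paired with its answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Candidate:
    thought: str  # thought chain that preceded the answer
    answer: str   # answer generated after that thought


# Hypothetical ranker signature: given the question and the bare answers,
# return candidate indices ordered best-first.
Ranker = Callable[[str, Sequence[str]], List[int]]


def select_best(
    question: str,
    candidates: Sequence[Candidate],
    rank_answers: Ranker,
    top_k: int = 1,
) -> List[Candidate]:
    """Rank on answers only, then return the top thought-answer pairs."""
    answers = [c.answer for c in candidates]  # thoughts are withheld
    order = rank_answers(question, answers)
    return [candidates[i] for i in order[:top_k]]


def length_ranker(question: str, answers: Sequence[str]) -> List[int]:
    """Toy stand-in for the external LLM: prefers longer answers."""
    return sorted(range(len(answers)), key=lambda i: len(answers[i]), reverse=True)


candidates = [
    Candidate(thought="a", answer="x = 2"),
    Candidate(thought="b", answer="2x = 4, so x = 2"),
    Candidate(thought="c", answer="?"),
]
best = select_best("Solve 2x + 3 = 7", candidates, length_ranker)
```

The pairs returned by `select_best` correspond to the data used in the final training phase; how many pairs were kept per sample is not stated here, so `top_k` is left as a parameter.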

Usage

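from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ericflo/Llama-3.2-3B-COT"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Example usage
prompt = "Solve this math problem: 2x + 3 = 7"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,  # end the prompt where the assistant turn begins
    return_tensors="pt",
)

# Generate a response with a thought chain.
# Greedy decoding matches the temperature=0.0 setting used during training;
# the token budget covers a 128-token thought plus a 2048-token answer.
output = model.generate(
    input_ids,
    do_sample=False,
    max_new_tokens=128 + 2048,
)

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(output[0][input_ids.shape[-1]:])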
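
To separate the thought chain from the final answer in a decoded response, a small helper like the one below may be useful. It is a sketch that assumes the output wraps the thought in `<thought>...</thought>` tags followed by the answer, which matches the format described in the training process but is not guaranteed for every generation. The name `split_thought` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Non-greedy match so only the first thought block is captured.
THOUGHT_RE = re.compile(r"<thought>(.*?)</thought>", re.DOTALL)


def split_thought(text: str) -> Tuple[Optional[str], str]:
    """Return (thought, answer); thought is None when no tags are found."""
    match = THOUGHT_RE.search(text)
    if match is None:
        return None, text.strip()
    return match.group(1).strip(), text[match.end():].strip()


thought, answer = split_thought("<thought>isolate x</thought>x = 2")
```

If the tags are missing or the closing tag was cut off, the helper returns the whole text as the answer, so callers should handle a `None` thought.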

Limitations

  • Limited to the capabilities of the base Llama 3.2 3B model
  • May generate thought chains that are not always optimal
  • Performance depends on the quality of the LLM ranking system used during training
  • Training process may not capture all possible effective thought patterns
  • Limited by the context window of the base model

Ethical Considerations

  • The model inherits biases from the base Llama 3.2 3B model
  • Generated thought chains should be reviewed for accuracy and appropriateness
  • The model's reasoning process should not be relied upon for critical decisions without human verification
  • Users should implement appropriate content filtering and safety measures

Citation

If you use this model in your research, please cite:

@misc{thought-ranked-llama,
  title={Thought-Ranked Llama 3.2: Fine-tuning Language Models with Ranked Thought Chains},
  author={Eric Florenzano},
  year={2024},
  howpublished={\url{https://huggingface.co/ericflo/Llama-3.2-3B-COT}}
}