AstroSage-8B / README.md
Tijmen2's picture
Update README.md
011b5fb verified
|
raw
history blame
3.56 kB
metadata
language:
  - en
tags:
  - physics
  - astronomy
  - astrophysics
  - cosmology
license:
  - llama3.1
base_model:
  - meta-llama/Meta-Llama-3.1-8B

AstroSage-Llama-3.1-8B

AstroSage-Llama-3.1-8B is a domain-specialized natural-language AI assistant tailored for research in astronomy, astrophysics, and cosmology. Trained on the complete collection of astronomy-related arXiv papers from 2007-2024 along with millions of synthetically-generated question-answer pairs and other astronomical literature, AstroSage-Llama-3.1-8B demonstrates remarkable proficiency on a wide range of questions. AstroSage-Llama-3.1-8B scores 80.9% on the AstroMLab-1 benchmark, greatly outperforming all models---proprietary and open-weight---in the 8-billion parameter class, and performing on par with GPT-4o. This achievement demonstrates the potential of domain specialization in AI, suggesting that focused training can yield capabilities exceeding those of much larger, general-purpose models. AstroSage-Llama-3.1-8B is freely available, enabling widespread access to advanced AI capabilities for astronomical education and research.

Model Details

  • Model Type: Astronomy-specialized LLM
  • Base Model: Meta-Llama-3.1-8B
  • Parameters: 8 billion
  • Training Focus: Astronomy, Astrophysics, Cosmology, and Astronomical Instrumentation
  • License: Llama 3.1 Community License
  • Development Process:
    1. Continued Pre-training (CPT) on astronomical literature
    2. Supervised Fine-tuning (SFT) on QA pairs and instruction sets
    3. Model merging with Meta-Llama-3.1-8B-Instruct (75% CPT+SFT / 25% Meta-Instruct)

Performance

  • AstroMLab-1 Benchmark: 80.9% accuracy
    • Outperforms all 8B parameter models
    • Comparable to GPT-4o (80.4%)
    • ~1000x more cost-effective than proprietary models
    • 8 percentage-point improvement over base model
  • General Capabilities: Maintains strong performance on standard benchmarks
    • IF-EVAL: 41.4%
    • BBH: 52.9%
    • MATH: 8.4%
    • GPQA: 31.2%
    • MUSR: 38.9%
    • MMLU-PRO: 34.6%

Training Data

  • Continued Pre-training:
    • ~250,000 arXiv preprints (2007-2024) from astro-ph and gr-qc
    • Astronomy-related Wikipedia articles
    • Selected astronomy textbooks
    • Total: 3.3 billion tokens, 19.9 GB plaintext
  • Supervised Fine-tuning:
    • 8.8 million curated QA pairs
    • Filtered Infinity-Instruct-7M dataset
    • Paper summaries and metadata
    • Total: 2.0 billion tokens, 9.8 GB plaintext

Intended Use

  • Curiosity-driven question answering
  • Brainstorming new ideas
  • Astronomical research assistance
  • Educational support in astronomy
  • Literature review and summarization
  • Domain-specific question answering
  • Scientific explanation of concepts

Limitations

  • As with all LLMs, hallucinations are possible
  • Limited by 8B parameter size for complex reasoning
  • Paper metadata not perfectly memorized
  • Performance primarily validated on multiple-choice questions
  • Training data cutoff: January 2024
  • English-only capabilities

Ethical Considerations

  • Should not be used as sole source for critical research decisions
  • Output should be verified against primary sources
  • May reflect biases present in astronomical literature

Technical Specifications

  • Architecture: Based on Meta-Llama 3.1
  • Training Infrastructure: ORNL OLCF Frontier
  • Hosting: Hugging Face Hub (AstroMLab/AstroSage-8B)

Citation and Contact

  • Corresponding author: Tijmen de Haan <tijmen.dehaan at gmail dot com>
  • Please cite the AstroMLab 3 paper when using this model.