AstroSage-Llama-3.1-8B-GGUF
https://arxiv.org/abs/2411.09012
AstroSage-Llama-3.1-8B-GGUF is the quantized version of AstroSage-Llama-3.1-8B, packaged for efficient deployment while retaining the model's specialized capabilities in astronomy, astrophysics, and cosmology. It offers a more accessible way to run the model on modest hardware.
Model Details
- Base Architecture: Meta-Llama-3.1-8B
- Base Model: AstroSage-Llama-3.1-8B
- Parameters: 8 billion
- Quantization: GGUF format with two precision options
- Training Focus: Astronomy, Astrophysics, Cosmology, and Astronomical Instrumentation
- License: Llama 3.1 Community License
- Development Process:
  - Based on the fully trained AstroSage-Llama-3.1-8B model
  - Quantized to GGUF format in two versions
  - Optimized for efficient inference
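To see which GGUF files are currently published in the repository before downloading, here is a minimal sketch (assuming the huggingface_hub package is installed, as in the example below):

```python
# Minimal sketch: list the GGUF files available in the AstroMLab/AstroSage-8B-GGUF repo.
from huggingface_hub import list_repo_files

gguf_files = [f for f in list_repo_files("AstroMLab/AstroSage-8B-GGUF") if f.endswith(".gguf")]
print(gguf_files)  # expected to include the BF16 and Q8_0 variants described below
```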
Using the Model
Python Implementation
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
import os
import sys
import contextlib

# Suppress warnings
@contextlib.contextmanager
def suppress_stderr():
    stderr = sys.stderr
    with open(os.devnull, 'w') as devnull:
        sys.stderr = devnull
        try:
            yield
        finally:
            sys.stderr = stderr

# or change the filename to AstroSage-8B-BF16.gguf for BF16 quantization
def download_model(repo_id="AstroMLab/AstroSage-8B-GGUF", filename="AstroSage-8B-Q8_0.gguf"):
    try:
        os.makedirs("models", exist_ok=True)
        local_path = os.path.join("models", filename)
        if not os.path.exists(local_path):
            print(f"Downloading {filename}...")
            with suppress_stderr():
                local_path = hf_hub_download(
                    repo_id=repo_id,
                    filename=filename,
                    local_dir="models",
                    local_dir_use_symlinks=False
                )
            print("Download complete!")
        return local_path
    except Exception as e:
        print(f"Error downloading model: {e}")
        raise

def initialize_llm():
    model_path = download_model()
    with suppress_stderr():
        return Llama(
            model_path=model_path,
            n_ctx=2048,
            n_threads=4
        )

def get_response(llm, prompt, max_tokens=128):
    response = llm(
        prompt,
        max_tokens=max_tokens,
        temperature=0.7,
        top_p=0.9,
        top_k=40,
        repeat_penalty=1.1,
        stop=["User:", "\n\n"]
    )
    return response['choices'][0]['text']

def main():
    llm = initialize_llm()
    # Example question about galaxy formation
    first_question = "How does a galaxy form?"
    print("\nQuestion:", first_question)
    print("\nAI:", get_response(llm, first_question).strip(), "\n")
    print("\nYou can now ask more questions! Type 'quit' or 'exit' to end the conversation.\n")
    while True:
        try:
            user_input = input("You: ")
            if user_input.lower() in ['quit', 'exit']:
                print("\nGoodbye!")
                break
            print("\nAI:", get_response(llm, user_input).strip(), "\n")
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break
        except Exception as e:
            print(f"Error: {e}")

if __name__ == "__main__":
    main()
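The script above builds plain-text prompts and stops on "User:". llama-cpp-python also exposes create_chat_completion, which applies the chat template embedded in the GGUF file. A minimal sketch, reusing initialize_llm() from the script above (the system prompt is illustrative):

```python
# Optional chat-style interface; reuses initialize_llm() defined above.
llm = initialize_llm()

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert assistant for astronomy and astrophysics."},
        {"role": "user", "content": "What sets a Type Ia supernova apart from a core-collapse supernova?"},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(result["choices"][0]["message"]["content"])
```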
Installation Requirements
pip install llama-cpp-python huggingface_hub
On Macs with Apple Silicon, install llama-cpp-python with Metal support using the following command instead:
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DLLAMA_METAL=on" pip install llama-cpp-python
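An optional, illustrative way to confirm the installation (and, on recent llama-cpp-python versions, whether a GPU backend such as Metal was compiled in):

```python
# Optional post-install sanity check (illustrative; not part of the model release).
import llama_cpp

print("llama-cpp-python version:", llama_cpp.__version__)
# Recent versions expose this helper; it reports whether a GPU backend (e.g. Metal) is built in.
if hasattr(llama_cpp, "llama_supports_gpu_offload"):
    print("GPU offload available:", llama_cpp.llama_supports_gpu_offload())
```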
Key Parameters
- n_ctx: Context window size (default: 2048)
- n_threads: Number of CPU threads to use (adjust based on your hardware)
- temperature: Controls randomness
- top_p: Nucleus sampling parameter
- top_k: Limits vocabulary choices
- repeat_penalty: Prevents repetition
- max_tokens: Maximum length of response (128 default, increase for longer answers)
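For reference, a minimal sketch showing where each of these parameters is passed in llama-cpp-python; the specific values are illustrative, not recommendations:

```python
from llama_cpp import Llama

# Illustrative values only; tune for your hardware and use case.
llm = Llama(
    model_path="models/AstroSage-8B-Q8_0.gguf",  # path produced by download_model() above
    n_ctx=4096,       # context window size
    n_threads=8,      # CPU threads; match your core count
)

response = llm(
    "What is the cosmic microwave background?",
    max_tokens=256,        # maximum response length
    temperature=0.3,       # lower = more deterministic
    top_p=0.9,             # nucleus sampling
    top_k=40,              # vocabulary restriction
    repeat_penalty=1.1,    # discourage repetition
)
print(response["choices"][0]["text"])
```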
Example Usage
The example script will automatically:
- Download the quantized model from Hugging Face
- Initialize it with recommended parameters
- Start with an example question about galaxy formation
- Allow for interactive conversation
- Support easy exit with 'quit' or 'exit' commands
For different use cases, you can (see the example after this list):
- Use the BF16 version for maximum accuracy
- Adjust context window size for longer conversations
- Modify temperature for more/less deterministic responses
- Change max_tokens for longer/shorter responses
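As one example, switching to the BF16 file with a larger context window and longer responses might look like this (a sketch reusing download_model() from the script above; the values are illustrative):

```python
from llama_cpp import Llama

# Download and load the BF16 variant instead of the default Q8_0 file.
model_path = download_model(filename="AstroSage-8B-BF16.gguf")
llm = Llama(model_path=model_path, n_ctx=4096, n_threads=8)

out = llm("Explain dark energy in two sentences.", max_tokens=256, temperature=0.3)
print(out["choices"][0]["text"])
```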
Model Improvements and Performance
The quantized model offers several advantages:
- Reduced memory requirements
- CPU inference capability
- Faster inference speed
- Broader hardware compatibility
Note: Formal benchmarking of the quantized model is pending. Performance metrics will be updated once comprehensive testing is completed.
Quantization Details
- Format: GGUF
- Available Versions:
  - AstroSage-8B-BF16.gguf: bfloat16, the model's original precision
  - AstroSage-8B-Q8_0.gguf: 8-bit quantized, negligible loss in perplexity, smaller size
- Compatibility: Works with llama.cpp and derived projects
- Trade-offs:
  - BF16:
    - Best quality, closest to original model behavior
    - Larger file size and memory requirements
    - Recommended for accuracy-critical applications
  - Q8_0:
    - Reduced memory footprint
    - Good balance of performance and size
    - Suitable for most general applications
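When weighing these trade-offs, the on-disk size of the downloaded file is a reasonable first estimate of the memory needed to load it; a small illustrative check:

```python
import glob
import os

# List any GGUF files downloaded by the script above and report their sizes.
for path in sorted(glob.glob("models/*.gguf")):
    print(f"{path}: {os.path.getsize(path) / 1e9:.1f} GB")
```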
Intended Use
- Curiosity-driven question answering
- Brainstorming new ideas
- Astronomical research assistance
- Educational support in astronomy
- Literature review and summarization
- Scientific explanation of concepts
- Low-resource deployment scenarios
- Edge device implementation
- CPU-only environments
- Applications requiring reduced memory footprint
Limitations
- All limitations of the original model apply
- Additional considerations:
  - Potential reduction in prediction accuracy due to quantization
  - May show increased variance in numeric calculations
  - Reduced precision in edge cases
  - Performance may vary based on hardware configuration
Technical Specifications
- Architecture: Meta-Llama 3.1
- Deployment: CPU-friendly, reduced memory footprint
- Format: GGUF (compatible with llama.cpp)
Ethical Considerations
While this model is designed for scientific use:
- It should not be used as the sole source for critical research decisions
- Output should be verified against primary sources
- It may reflect biases present in the astronomical literature
Citation and Contact
- Corresponding author: Tijmen de Haan (tijmen dot dehaan at gmail dot com)
- AstroMLab: astromachinelearninglab at gmail dot com
- Please cite the AstroMLab 3 paper when referencing this model:
@misc{dehaan2024astromlab3,
      title={AstroMLab 3: Achieving GPT-4o Level Performance in Astronomy with a Specialized 8B-Parameter Large Language Model},
      author={Tijmen de Haan and Yuan-Sen Ting and Tirthankar Ghosal and Tuan Dung Nguyen and Alberto Accomazzi and Azton Wells and Nesar Ramachandra and Rui Pan and Zechang Sun},
      year={2024},
      eprint={2411.09012},
      archivePrefix={arXiv},
      primaryClass={astro-ph.IM},
      url={https://arxiv.org/abs/2411.09012},
}
Additional note: When citing this quantized version, please reference both the original AstroMLab 3 paper above and specify the use of the GGUF quantized variant.