AstroSage-Llama-3.1-8B-GGUF
https://arxiv.org/abs/2411.09012
AstroSage-Llama-3.1-8B-GGUF is the quantized version of AstroSage-Llama-3.1-8B, packaged for efficient deployment while retaining the model's specialized capabilities in astronomy, astrophysics, and cosmology. It offers a more accessible way to run the model on modest hardware.
Model Details
- Base Architecture: Meta-Llama-3.1-8B
- Base Model: AstroSage-Llama-3.1-8B
- Parameters: 8 billion
- Quantization: GGUF format with two precision options
- Training Focus: Astronomy, Astrophysics, Cosmology, and Astronomical Instrumentation
- License: Llama 3.1 Community License
- Development Process:
  - Based on the fully trained AstroSage-Llama-3.1-8B model
  - Quantized to GGUF format in two versions
  - Optimized for efficient inference
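To see which GGUF files are currently published in the repository before downloading, here is a minimal sketch (assuming the huggingface_hub package is installed, as in the example below):

```python
# Minimal sketch: list the GGUF files available in the AstroMLab/AstroSage-8B-GGUF repo.
from huggingface_hub import list_repo_files

gguf_files = [f for f in list_repo_files("AstroMLab/AstroSage-8B-GGUF") if f.endswith(".gguf")]
print(gguf_files)  # expected to include the BF16 and Q8_0 variants described below
```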
Using the Model
Python Implementation
from llama_cpp import Llama
from huggingface_hub import hf_hub_download
import os
import sys
import contextlib

# Suppress warnings
@contextlib.contextmanager
def suppress_stderr():
    stderr = sys.stderr
    with open(os.devnull, 'w') as devnull:
        sys.stderr = devnull
        try:
            yield
        finally:
            sys.stderr = stderr

# or change the filename to AstroSage-8B-BF16.gguf for BF16 quantization
def download_model(repo_id="AstroMLab/AstroSage-8B-GGUF", filename="AstroSage-8B-Q8_0.gguf"):
    try:
        os.makedirs("models", exist_ok=True)
        local_path = os.path.join("models", filename)
        if not os.path.exists(local_path):
            print(f"Downloading {filename}...")
            with suppress_stderr():
                local_path = hf_hub_download(
                    repo_id=repo_id,
                    filename=filename,
                    local_dir="models",
                    local_dir_use_symlinks=False
                )
            print("Download complete!")
        return local_path
    except Exception as e:
        print(f"Error downloading model: {e}")
        raise

def initialize_llm():
    model_path = download_model()
    with suppress_stderr():
        return Llama(
            model_path=model_path,
            n_ctx=2048,
            n_threads=4
        )

def get_response(llm, prompt, max_tokens=128):
    response = llm(
        prompt,
        max_tokens=max_tokens,
        temperature=0.7,
        top_p=0.9,
        top_k=40,
        repeat_penalty=1.1,
        stop=["User:", "\n\n"]
    )
    return response['choices'][0]['text']

def main():
    llm = initialize_llm()
    # Example question about galaxy formation
    first_question = "How does a galaxy form?"
    print("\nQuestion:", first_question)
    print("\nAI:", get_response(llm, first_question).strip(), "\n")
    print("\nYou can now ask more questions! Type 'quit' or 'exit' to end the conversation.\n")
    while True:
        try:
            user_input = input("You: ")
            if user_input.lower() in ['quit', 'exit']:
                print("\nGoodbye!")
                break
            print("\nAI:", get_response(llm, user_input).strip(), "\n")
        except KeyboardInterrupt:
            print("\nGoodbye!")
            break
        except Exception as e:
            print(f"Error: {e}")

if __name__ == "__main__":
    main()
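The script above builds plain-text prompts and stops on "User:". llama-cpp-python also exposes create_chat_completion, which applies the chat template embedded in the GGUF file. A minimal sketch, reusing initialize_llm() from the script above (the system prompt is illustrative):

```python
# Optional chat-style interface; reuses initialize_llm() defined above.
llm = initialize_llm()

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an expert assistant for astronomy and astrophysics."},
        {"role": "user", "content": "What sets a Type Ia supernova apart from a core-collapse supernova?"},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(result["choices"][0]["message"]["content"])
```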
Installation Requirements
pip install llama-cpp-python huggingface_hub
On Macs with Apple Silicon, install llama-cpp-python with Metal support using the following command instead:
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DLLAMA_METAL=on" pip install llama-cpp-python
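An optional, illustrative way to confirm the installation (and, on recent llama-cpp-python versions, whether a GPU backend such as Metal was compiled in):

```python
# Optional post-install sanity check (illustrative; not part of the model release).
import llama_cpp

print("llama-cpp-python version:", llama_cpp.__version__)
# Recent versions expose this helper; it reports whether a GPU backend (e.g. Metal) is built in.
if hasattr(llama_cpp, "llama_supports_gpu_offload"):
    print("GPU offload available:", llama_cpp.llama_supports_gpu_offload())
```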
Key Parameters
- n_ctx: Context window size (default: 2048)
- n_threads: Number of CPU threads to use (adjust based on your hardware)
- temperature: Controls randomness
- top_p: Nucleus sampling parameter
- top_k: Limits vocabulary choices
- repeat_penalty: Prevents repetition
- max_tokens: Maximum length of response (128 default, increase for longer answers)
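For reference, a minimal sketch showing where each of these parameters is passed in llama-cpp-python; the specific values are illustrative, not recommendations:

```python
from llama_cpp import Llama

# Illustrative values only; tune for your hardware and use case.
llm = Llama(
    model_path="models/AstroSage-8B-Q8_0.gguf",  # path produced by download_model() above
    n_ctx=4096,       # context window size
    n_threads=8,      # CPU threads; match your core count
)

response = llm(
    "What is the cosmic microwave background?",
    max_tokens=256,        # maximum response length
    temperature=0.3,       # lower = more deterministic
    top_p=0.9,             # nucleus sampling
    top_k=40,              # vocabulary restriction
    repeat_penalty=1.1,    # discourage repetition
)
print(response["choices"][0]["text"])
```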
Example Usage
The example script will automatically:
- Download the quantized model from Hugging Face
- Initialize it with recommended parameters
- Start with an example question about galaxy formation
- Allow for interactive conversation
- Support easy exit with 'quit' or 'exit' commands
For different use cases, you can (see the example after this list):
- Use the BF16 version for maximum accuracy
- Adjust context window size for longer conversations
- Modify temperature for more/less deterministic responses
- Change max_tokens for longer/shorter responses
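As one example, switching to the BF16 file with a larger context window and longer responses might look like this (a sketch reusing download_model() from the script above; the values are illustrative):

```python
from llama_cpp import Llama

# Download and load the BF16 variant instead of the default Q8_0 file.
model_path = download_model(filename="AstroSage-8B-BF16.gguf")
llm = Llama(model_path=model_path, n_ctx=4096, n_threads=8)

out = llm("Explain dark energy in two sentences.", max_tokens=256, temperature=0.3)
print(out["choices"][0]["text"])
```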
Model Improvements and Performance
The quantized model offers several advantages:
- Reduced memory requirements
- CPU inference capability
- Faster inference speed
- Broader hardware compatibility
Note: Formal benchmarking of the quantized model is pending. Performance metrics will be updated once comprehensive testing is completed.
Quantization Details
- Format: GGUF
- Available Versions:
  - AstroSage-8B-BF16.gguf: bfloat16, the model's original precision
  - AstroSage-8B-Q8_0.gguf: 8-bit quantized, negligible loss in perplexity, smaller size
- Compatibility: Works with llama.cpp and derived projects
- Trade-offs:
  - BF16:
    - Best quality, closest to original model behavior
    - Larger file size and memory requirements
    - Recommended for accuracy-critical applications
  - Q8_0:
    - Reduced memory footprint
    - Good balance of performance and size
    - Suitable for most general applications
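When weighing these trade-offs, the on-disk size of the downloaded file is a reasonable first estimate of the memory needed to load it; a small illustrative check:

```python
import glob
import os

# List any GGUF files downloaded by the script above and report their sizes.
for path in sorted(glob.glob("models/*.gguf")):
    print(f"{path}: {os.path.getsize(path) / 1e9:.1f} GB")
```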
Intended Use
- Curiosity-driven question answering
- Brainstorming new ideas
- Astronomical research assistance
- Educational support in astronomy
- Literature review and summarization
- Scientific explanation of concepts
- Low-resource deployment scenarios
- Edge device implementation
- CPU-only environments
- Applications requiring reduced memory footprint
Limitations
- All limitations of the original model apply
- Additional considerations:
  - Potential reduction in prediction accuracy due to quantization
  - May show increased variance in numeric calculations
  - Reduced precision in edge cases
  - Performance may vary based on hardware configuration
Technical Specifications
- Architecture: Meta-Llama 3.1
- Deployment: CPU-friendly, reduced memory footprint
- Format: GGUF (compatible with llama.cpp)
Ethical Considerations
While this model is designed for scientific use:
- It should not be used as the sole source for critical research decisions
- Output should be verified against primary sources
- It may reflect biases present in the astronomical literature
Citation and Contact
- Corresponding author: Tijmen de Haan (tijmen dot dehaan at gmail dot com)
- AstroMLab: astromachinelearninglab at gmail dot com
- Please cite the AstroMLab 3 paper when referencing this model:
@misc{dehaan2024astromlab3,
      title={AstroMLab 3: Achieving GPT-4o Level Performance in Astronomy with a Specialized 8B-Parameter Large Language Model},
      author={Tijmen de Haan and Yuan-Sen Ting and Tirthankar Ghosal and Tuan Dung Nguyen and Alberto Accomazzi and Azton Wells and Nesar Ramachandra and Rui Pan and Zechang Sun},
      year={2024},
      eprint={2411.09012},
      archivePrefix={arXiv},
      primaryClass={astro-ph.IM},
      url={https://arxiv.org/abs/2411.09012},
}
Additional note: When citing this quantized version, please reference both the original AstroMLab 3 paper above and specify the use of the GGUF quantized variant.