--- |
|
license: mit |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
tags: |
|
- llama-2 |
|
- astronomy |
|
- astrophysics |
|
- arxiv |
|
inference: false |
|
base_model: |
|
- meta-llama/Llama-2-70b-hf |
|
--- |
|
|
|
# AstroLLaMA-2-70B-Chat_AIC |
|
|
|
AstroLLaMA-2-70B-Chat_AIC is a specialized chat model for astronomy, created by fine-tuning the AstroLLaMA-2-70B-Base_AIC model. Developed by the AstroMLab team, it is, to the best of our knowledge, one of the first specialized 70B-scale LLMs in astronomy designed for instruction-following and chat-based interactions.
|
|
|
## Model Details |
|
|
|
- **Base Architecture**: LLaMA-2-70B

- **Base Model**: AstroLLaMA-2-70B-Base_AIC (trained on the Abstract, Introduction, and Conclusion sections of papers in arXiv's astro-ph category)

- **Fine-tuning Method**: Supervised Fine-Tuning (SFT)

- **SFT Dataset**:
  - 10,356 astronomy-centered conversations generated from arXiv abstracts by GPT-4
  - Full content of the LIMA dataset
  - 10,000 samples from the Open Orca dataset
  - 10,000 samples from the UltraChat dataset

- **Training Details** (see the configuration sketch after this list):
  - Learning rate: 3 × 10⁻⁷
  - Training epochs: 1
  - Total batch size: 48
  - Maximum token length: 2048
  - Warmup ratio: 0.03
  - Learning rate schedule: cosine decay

- **Primary Use**: Instruction-following and chat-based interactions for astronomy-related queries

- **Reference**: [Pan et al. 2024](https://arxiv.org/abs/2409.19750)
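
The sketch below illustrates how the SFT data mixture and the hyperparameters listed above could be expressed with the Hugging Face `datasets` and `transformers` libraries. It is an illustrative reconstruction, not the AstroMLab training code: the public dataset identifiers, the local JSONL file for the GPT-4-generated astronomy conversations, the per-device batch split, and the use of `TrainingArguments` are assumptions.

```python
from datasets import load_dataset
from transformers import TrainingArguments

# SFT data mixture (sizes follow the model card; identifiers are assumptions)
astro = load_dataset("json", data_files="astro_gpt4_conversations.jsonl", split="train")  # 10,356 GPT-4 conversations (hypothetical local file)
lima = load_dataset("GAIR/lima", split="train")                                           # full LIMA dataset
orca = load_dataset("Open-Orca/OpenOrca", split="train").shuffle(seed=42).select(range(10_000))
ultra = load_dataset("stingning/ultrachat", split="train").shuffle(seed=42).select(range(10_000))
# In practice, each source would be mapped to a common
# "###Human: ... ###Assistant: ..." text field before being concatenated.

# Hyperparameters mirroring the Training Details list
training_args = TrainingArguments(
    output_dir="astrollama-2-70b-chat-sft",
    learning_rate=3e-7,
    num_train_epochs=1,
    per_device_train_batch_size=6,  # e.g. 6 per device x 8 GPUs = 48 total (assumed split)
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    bf16=True,                      # precision choice is an assumption
)
# The 2048-token maximum length is enforced when the conversations are tokenized.
```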
|
|
|
## Using the Model for Chat
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("AstroMLab/astrollama-2-70b-chat_aic")
model = AutoModelForCausalLM.from_pretrained("AstroMLab/astrollama-2-70b-chat_aic", device_map="auto")

# Function to generate a response
def generate_response(prompt, max_new_tokens=512):
    # Prompt format used by this chat model: "###Human:" / "###Assistant:" turns
    full_prompt = f"###Human: {prompt}\n\n###Assistant:"
    # Truncate to the model's 2048-token training context
    inputs = tokenizer(full_prompt, return_tensors="pt", truncation=True, max_length=2048)
    inputs = inputs.to(model.device)

    # Generate a response
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_return_sequences=1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            # Crude stopping heuristic: halt when the first token of "###Human:"
            # appears, i.e. before the model begins a new user turn
            eos_token_id=tokenizer.encode("###Human:", add_special_tokens=False)[0],
        )

    # Decode the full sequence (prompt + completion)
    response = tokenizer.decode(outputs[0], skip_special_tokens=False)

    # Extract only the Assistant's response
    assistant_response = response.split("###Assistant:")[-1].strip()
    return assistant_response

# Example usage
user_input = "What are the main components of a galaxy?"
response = generate_response(user_input)
print(f"Human: {user_input}")
print(f"Assistant: {response}")
|
``` |
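
A 70B-parameter model in 16-bit precision needs roughly 140 GB of accelerator memory, so the `device_map="auto"` call above typically requires multiple GPUs. As a minimal sketch, assuming the `bitsandbytes` package is installed, the model can instead be loaded in 4-bit NF4 precision to fit on less hardware; quantization is optional, is not part of the released model, and may slightly degrade output quality.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Optional 4-bit quantized loading (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("AstroMLab/astrollama-2-70b-chat_aic")
model = AutoModelForCausalLM.from_pretrained(
    "AstroMLab/astrollama-2-70b-chat_aic",
    quantization_config=bnb_config,
    device_map="auto",
)
# The generate_response() helper defined above works unchanged with this model object.
```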
|
|
|
## Model Performance and Limitations |
|
|
|
While the AstroLLaMA-2-70B-Base_AIC model demonstrates a significant improvement over the baseline LLaMA-2-70B model, the chat version (AstroLLaMA-2-70B-Chat_AIC) suffers a performance regression caused by limitations in the SFT process. Here is a performance comparison:
|
|
|
| Model | Score (%) |
|-------|-----------|
| **AstroSage-LLaMA-3.1-8B (AstroMLab)** | **80.9** |
| **<span style="color:green">AstroLLaMA-2-70B-Base (AstroMLab)</span>** | **<span style="color:green">76.0</span>** |
| LLaMA-3.1-8B | 73.7 |
| Gemma-2-9B | 71.5 |
| LLaMA-2-70B | 70.7 |
| Qwen-2.5-7B | 70.4 |
| Yi-1.5-9B | 68.4 |
| **<span style="color:green">AstroLLaMA-2-70B-Chat (AstroMLab)</span>** | **<span style="color:green">64.7</span>** |
| InternLM-2.5-7B | 64.5 |
| Mistral-7B-v0.3 | 63.9 |
| ChatGLM3-6B | 50.4 |
|
|
|
Key limitations: |
|
|
|
1. **SFT Dataset Limitations**: The current SFT dataset of roughly 30,000 conversations, many of which are not astronomy-focused, proved inadequate for preserving the base model's domain performance.

2. **Performance Degradation**: The chat model scores 64.7%, well below the base model's 76.0%, an 11.3-point drop attributable to the SFT process.

3. **General vs. Specialized Knowledge**: The current SFT process appears to steer the model toward generic answers, potentially at the cost of specialized astronomical knowledge.
|
|
|
These limitations underscore the challenges in developing specialized chat models and the critical importance of both the quantity and quality of training data, especially for the SFT process. |
|
|
|
This model is released primarily for reproducibility purposes, allowing researchers to track the development process and compare different iterations of AstroLLaMA models. |
|
|
|
For optimal performance and the most up-to-date capabilities in astronomy-related tasks, we recommend using AstroSage-LLaMA-3.1-8B, where these limitations have been addressed through expanded training data and refined fine-tuning processes. |
|
|
|
## Ethical Considerations |
|
|
|
While this model is designed for scientific use, users should be mindful of potential misuse, such as generating misleading scientific content. Always verify model outputs against peer-reviewed sources for critical applications. |
|
|
|
## Citation |
|
|
|
If you use this model in your research, please cite: |
|
|
|
``` |
|
@ARTICLE{2024arXiv240919750P, |
|
author = {{Pan}, Rui and {Dung Nguyen}, Tuan and {Arora}, Hardik and {Accomazzi}, Alberto and {Ghosal}, Tirthankar and {Ting}, Yuan-Sen}, |
|
title = "{AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy}", |
|
journal = {arXiv e-prints}, |
|
keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Computation and Language}, |
|
year = 2024, |
|
month = sep, |
|
eid = {arXiv:2409.19750}, |
|
pages = {arXiv:2409.19750}, |
|
doi = {10.48550/arXiv.2409.19750}, |
|
archivePrefix = {arXiv}, |
|
eprint = {2409.19750}, |
|
primaryClass = {astro-ph.IM}, |
|
adsurl = {https://ui.adsabs.harvard.edu/abs/2024arXiv240919750P}, |
|
adsnote = {Provided by the SAO/NASA Astrophysics Data System} |
|
} |
|
|
|
``` |