tingyuansen committed
Commit ed8cab4
1 parent: 177cd69

Update README.md

Files changed (1)
README.md +3 -2
README.md CHANGED
@@ -35,7 +35,7 @@ AstroLLaMA-3-8B-Base_Summary is a specialized base language model for astronomy,
 - Cosine decay schedule for learning rate reduction
 - Training duration: 1 epoch
 - **Primary Use**: Next token prediction for astronomy-related text generation and analysis
-- **Reference**: Pan et al. 2024 [Link to be added]
+- **Reference**: [Pan et al. 2024](https://arxiv.org/abs/2409.19750)
 
 ## Generating text from a prompt
 
@@ -79,6 +79,7 @@ Here's a performance comparison chart based upon the astronomical benchmarking Q
 
 | Model | Score (%) |
 |-------|-----------|
+| **AstroSage-LLaMA-3.1-8B (AstroMLab)** | **80.9** |
 | LLaMA-3.1-8B | 73.7 |
 | LLaMA-3-8B | 72.9 |
 | **<span style="color:green">AstroLLaMA-3-8B-Base_Summary (AstroMLab)</span>** | **<span style="color:green">72.3</span>** |
@@ -92,7 +93,7 @@ Here's a performance comparison chart based upon the astronomical benchmarking Q
 
 As shown, AstroLLaMA-3-8B-Base_Summary performs competitively, nearly matching the performance of the base LLaMA-3.1-8B model and outperforming the AIC version. This improvement demonstrates the importance of information density in the training data.
 
-Notably, the instruct version of this model shows even more significant improvements, highlighting the effectiveness of the summarization approach in capturing and retaining key astronomical concepts. For detailed performance analysis of the instruct version, please refer to Pan et al. 2024.
+Notably, the instruct version of this model shows even more significant improvements, highlighting the effectiveness of the summarization approach in capturing and retaining key astronomical concepts. For detailed performance analysis of the instruct version, please refer to [Pan et al. 2024](https://arxiv.org/abs/2409.19750).
 
 While AstroLLaMA-3-8B performs competitively among models in its class, it does not surpass the performance of the base LLaMA-3-8B model. This underscores the challenges in developing specialized models and the need for more diverse and comprehensive training data.
 
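The training notes in the first hunk mention a cosine learning-rate decay over a single epoch. A minimal sketch of such a schedule, assuming the Hugging Face `get_cosine_schedule_with_warmup` helper; the optimizer, peak LR, warmup fraction, and step count below are illustrative assumptions, not values from this diff:

```python
# A rough illustration of the cosine learning-rate decay mentioned in the
# training notes. NOT the model's actual training code: the optimizer, peak
# LR, warmup fraction, and step count below are all illustrative assumptions.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # tiny stand-in for the real 8B-parameter LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

num_training_steps = 10_000  # assumed optimizer steps for the single epoch
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.03 * num_training_steps),  # assumed 3% warmup
    num_training_steps=num_training_steps,
)

for step in range(num_training_steps):
    optimizer.step()  # placeholder for the real forward/backward pass
    scheduler.step()  # LR ramps up through warmup, then decays on a cosine
    if step % 2_000 == 0:
        print(f"step {step}: lr = {scheduler.get_last_lr()[0]:.2e}")
```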
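The first hunk also touches the context around the "## Generating text from a prompt" section. A minimal sketch of prompting this base model with `transformers`, assuming the repo id `AstroMLab/astrollama-3-8b-base_summary` (an assumption based on the model name on this page; check the model card for the actual hub path and recommended settings):

```python
# A minimal sketch of prompting this base model via Hugging Face transformers.
# The repo id is an assumption based on the model name on this page; check
# the model card for the actual hub path and recommended generation settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "AstroMLab/astrollama-3-8b-base_summary"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 keeps an 8B model near ~16 GB
    device_map="auto",
)

prompt = "The dark matter halo of the Milky Way"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Base (non-instruct) model: plain next-token continuation of the prompt.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # LLaMA tokenizers ship no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a base model, it continues the prompt as free text rather than answering questions; the instruct variant discussed in the last hunk is the one tuned for that.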