tingyuansen committed on
Commit f24d417
1 Parent(s): 1ecb130

Update README.md

Files changed (1)
  1. README.md +53 -39
README.md CHANGED
@@ -1,59 +1,73 @@
  ---
- license: llama3
- base_model: meta-llama/Meta-Llama-3-8B
  tags:
- - generated_from_trainer
- datasets:
- - customized
- model-index:
- - name: astrollama-3-8b_summary_lmflow
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # astrollama-3-8b_summary_lmflow

- This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the customized dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 24
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 7
- - total_train_batch_size: 168
- - total_eval_batch_size: 56
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.03
- - num_epochs: 1.0

- ### Training results

- ### Framework versions

- - Transformers 4.41.2
- - Pytorch 2.3.1+cu121
- - Datasets 2.14.6
- - Tokenizers 0.19.1
 
  ---
+ license: mit
+ language:
+ - en
+ pipeline_tag: text-generation
  tags:
+ - llama-3
+ - astronomy
+ - astrophysics
+ - arxiv
+ inference: false
+ base_model:
+ - meta-llama/Llama-3-8b-hf
  ---

+ # AstroLLaMA-3-8B-Base_AIC

+ AstroLLaMA-3-8B is a specialized base language model for astronomy, developed by the AstroMLab team by fine-tuning Meta's LLaMA-3-8b on astronomical literature. It is designed for next-token prediction and is not an instruct/chat model.

+ ## Model Details

+ - **Base Architecture**: LLaMA-3-8b
+ - **Training Data**: Abstract, Introduction, and Conclusion (AIC) sections of papers from arXiv's astro-ph category (from arXiv's inception through January 2024)
+ - **Data Processing**: Optical character recognition (OCR) on the PDF files using the Nougat tool, followed by summarization using Qwen-2-8B and LLaMA-3.1-8B
+ - **Fine-tuning Method**: Continual Pre-Training (CPT) using the LMFlow framework
+ - **Training Details** (see the illustrative configuration sketch after this list):
+   - Learning rate: 2 × 10⁻⁵
+   - Total batch size: 96
+   - Maximum token length: 512
+   - Warmup ratio: 0.03
+   - No gradient accumulation
+   - BF16 format
+   - Cosine decay schedule for learning rate reduction
+   - Training duration: 1 epoch (approximately 32 A100 GPU hours)
+ - **Primary Use**: Next-token prediction for astronomy-related text generation and analysis
+ - **Reference**: Pan et al. 2024 [Link to be added]
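
The settings above map naturally onto a Hugging Face `TrainingArguments` object. The sketch below is illustrative only: the actual run used the LMFlow framework, the per-device batch size and GPU count (12 × 8) are assumptions chosen to match the stated total batch size of 96, and the 512-token maximum length is applied at tokenization time rather than here.

```python
# Illustrative only: the card states training used LMFlow; this maps the
# listed hyperparameters onto Hugging Face TrainingArguments for reference.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="astrollama-3-8b-base-aic",  # hypothetical output path
    learning_rate=2e-5,                     # 2 x 10^-5, as listed above
    per_device_train_batch_size=12,         # assumed: 12 per device x 8 GPUs = 96 total
    gradient_accumulation_steps=1,          # no gradient accumulation
    num_train_epochs=1,                     # single epoch over the AIC corpus
    lr_scheduler_type="cosine",             # cosine decay of the learning rate
    warmup_ratio=0.03,                      # warmup ratio from the card
    bf16=True,                              # BF16 precision
)
```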
 
+ ## Generating text from a prompt

+ [The code example remains the same as in the previous version]
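
Since that snippet is not reproduced in this diff, here is a minimal sketch of the standard Hugging Face Transformers generation pattern for a base (non-chat) model. The repository id `AstroMLab/astrollama-3-8b-base_aic` and the sampling settings are assumptions, not taken from the original card.

```python
# Minimal sketch: load the model and continue an astronomy-flavored prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AstroMLab/astrollama-3-8b-base_aic"  # hypothetical repo id; check the model page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training format
    device_map="auto",           # place weights on available GPU(s)
)

# A base model continues text rather than answering chat-style instructions.
prompt = "The dark matter halo mass function"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```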
 
+ ## Model Limitations and Biases

+ A key limitation identified during the development of this model is that training solely on astro-ph data may not be sufficient to significantly improve performance over the base model, especially for the already highly performant LLaMA-3 series. This suggests that, to achieve substantial gains, future iterations may need to incorporate a broader range of high-quality astronomical data beyond arXiv, such as textbooks, Wikipedia, and curated summaries.

+ Here is a performance comparison based on the astronomical benchmarking Q&A described in [Ting et al. 2024](https://arxiv.org/abs/2407.11194) and Pan et al. 2024:

+ | Model | Score (%) |
+ |-------|-----------|
+ | **AstroLLaMA-3-8B (AstroMLab)** | **72.3** |
+ | LLaMA-3-8B | 72.0 |
+ | Gemma-2-9B | 71.5 |
+ | Qwen-2.5-7B | 70.4 |
+ | Yi-1.5-9B | 68.4 |
+ | InternLM-2.5-7B | 64.0 |
+ | Mistral-7B-v0.3 | 63.9 |
+ | ChatGLM3-6B | 50.4 |

+ As shown, while AstroLLaMA-3-8B performs competitively among models of its class, it does not meaningfully surpass the base LLaMA-3-8B model. This underscores the challenges of developing specialized models and the need for more diverse and comprehensive training data.

+ AstroLLaMA-3-8B-Plus, which will follow in our next model release, addresses these limitations by expanding the training data beyond astro-ph.

+ ## Ethical Considerations

+ While this model is designed for scientific use, users should be mindful of potential misuse, such as generating misleading scientific content. Always verify model outputs against peer-reviewed sources for critical applications.

+ ## Citation

+ If you use this model in your research, please cite:

+ ```
+ [Citation for Pan et al. 2024 to be added]
+ ```