tingyuansen committed · verified · Commit 050a4a8 · 1 Parent(s): 7d2f249

Update README.md

Files changed (1): README.md (+104 −54)
README.md CHANGED
@@ -1,58 +1,108 @@
  ---
- base_model: output_models/abot-70b_epoch-1
  tags:
- - generated_from_trainer
- datasets:
- - customized
- model-index:
- - name: abot-70b_chat
-   results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # abot-70b_chat
-
- This model is a fine-tuned version of [output_models/abot-70b_epoch-1](https://huggingface.co/output_models/abot-70b_epoch-1) on the customized dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 6
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - total_train_batch_size: 48
- - total_eval_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.03
- - num_epochs: 1.0
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.33.3
- - Pytorch 2.1.2+cu121
- - Datasets 2.14.6
- - Tokenizers 0.13.3
  ---
+ license: mit
+ language:
+ - en
+ pipeline_tag: text-generation
  tags:
+ - llama-2
+ - astronomy
+ - astrophysics
+ - arxiv
+ inference: false
+ base_model:
+ - meta-llama/Llama-2-70b-hf
  ---
 
+ # AstroLLaMA-2-70B-Chat_AIC
+
+ AstroLLaMA-2-70B-Chat_AIC is a specialized chat model for astronomy, created by fine-tuning the AstroLLaMA-2-70B-Base_AIC model. Developed by the AstroMLab team, it is, to the best of our knowledge, among the first specialized 70B-parameter LLMs in astronomy designed for instruction-following and chat-based interactions.
+
+ ## Model Details
+
+ - **Base Architecture**: LLaMA-2-70b
+ - **Base Model**: AstroLLaMA-2-70B-Base_AIC (trained on the Abstract, Introduction, and Conclusion sections of papers from arXiv's astro-ph category)
+ - **Fine-tuning Method**: Supervised Fine-Tuning (SFT)
+ - **SFT Dataset**:
+   - 10,356 astronomy-centered conversations generated from arXiv abstracts by GPT-4
+   - The full LIMA dataset
+   - 10,000 samples from the Open Orca dataset
+   - 10,000 samples from the UltraChat dataset
+ - **Training Details** (see the configuration sketch after this list):
+   - Learning rate: 3 × 10⁻⁷
+   - Training epochs: 1
+   - Total batch size: 48
+   - Maximum token length: 2048
+   - Warmup ratio: 0.03
+   - Cosine decay schedule for learning-rate reduction
+ - **Primary Use**: Instruction-following and chat-based interactions for astronomy-related queries
+ - **Reference**: Pan et al. 2024 [Link to be added]
+
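+ For orientation, the reported setup can be expressed as a Hugging Face `TrainingArguments` configuration. This is a minimal, hypothetical sketch, not the team's actual training script; the per-device batch size of 6 across 8 GPUs (6 × 8 = 48) is taken from the earlier auto-generated card, and the output path is a placeholder.
+
+ ```python
+ from transformers import TrainingArguments
+
+ # Hypothetical sketch: maps the hyperparameters reported above onto
+ # transformers.TrainingArguments. Not the actual training script.
+ training_args = TrainingArguments(
+     output_dir="astrollama-2-70b-chat_aic-sft",  # placeholder path
+     learning_rate=3e-7,             # reported learning rate (3 x 10^-7)
+     num_train_epochs=1,             # reported number of epochs
+     per_device_train_batch_size=6,  # 6 per device x 8 GPUs = total batch size 48
+     warmup_ratio=0.03,              # reported warmup ratio
+     lr_scheduler_type="cosine",     # reported cosine decay schedule
+     bf16=True,                      # assumption: mixed precision for a 70B model
+ )
+ # The 2048 maximum token length is enforced at tokenization time,
+ # not through TrainingArguments.
+ ```
+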
+ ## Using the model for chat
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ # Load the model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("AstroMLab/astrollama-2-70b-chat_aic")
+ model = AutoModelForCausalLM.from_pretrained(
+     "AstroMLab/astrollama-2-70b-chat_aic",
+     device_map="auto",
+     torch_dtype=torch.bfloat16,  # halves memory use relative to fp32
+ )
+
+ # Function to generate a response
+ def generate_response(prompt, max_new_tokens=512):
+     # The chat model expects the "###Human: ... ###Assistant:" template
+     full_prompt = f"###Human: {prompt}\n\n###Assistant:"
+     inputs = tokenizer(full_prompt, return_tensors="pt", truncation=True, max_length=2048)
+     inputs = inputs.to(model.device)
+
+     # Generate a response
+     with torch.no_grad():
+         outputs = model.generate(
+             **inputs,
+             max_new_tokens=max_new_tokens,  # reply budget, independent of prompt length
+             num_return_sequences=1,
+             do_sample=True,
+             pad_token_id=tokenizer.eos_token_id,
+             # Stop when the model begins a new "###Human:" turn
+             eos_token_id=tokenizer.encode("###Human:", add_special_tokens=False)[0]
+         )
+
+     # Decode and return the response
+     response = tokenizer.decode(outputs[0], skip_special_tokens=False)
+
+     # Extract only the Assistant's response
+     assistant_response = response.split("###Assistant:")[-1].strip()
+     return assistant_response
+
+ # Example usage
+ user_input = "What are the main components of a galaxy?"
+ response = generate_response(user_input)
+ print(f"Human: {user_input}")
+ print(f"Assistant: {response}")
+ ```
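+
+ Note that loading a 70B model at bfloat16 precision requires roughly 140 GB of GPU memory. If that exceeds the available hardware, 4-bit quantization through the `bitsandbytes` integration in `transformers` is one option; the snippet below is a sketch under that assumption, not an officially tested configuration, and quantization may affect answer quality.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+ # Assumption: bitsandbytes is installed. Quantized loading is not part of
+ # the official model card; treat this as a memory-saving sketch.
+ quant_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+ model = AutoModelForCausalLM.from_pretrained(
+     "AstroMLab/astrollama-2-70b-chat_aic",
+     device_map="auto",
+     quantization_config=quant_config,
+ )
+ ```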
+
+ ## Model Performance and Limitations
+
+ While the AstroLLaMA-2-70B-Base_AIC model demonstrated significant improvements over its LLaMA-2-70B baseline, the chat version (AstroLLaMA-2-70B-Chat_AIC) suffers a performance degradation caused by limitations in the SFT process. Here's a performance comparison:
+
+ | Model | Score (%) |
+ | --- | --- |
+ | AstroLLaMA-2-70B-Base_AIC | 76.0 |
+ | AstroLLaMA-2-70B-Chat_AIC | 64.7 |
+
+ Key limitations:
+
+ 1. **SFT Dataset Limitations**: The current SFT dataset, with only about 30,000 Q&As (many not astronomy-focused), has proven inadequate for maintaining the base model's performance.
+ 2. **Performance Degradation**: The chat model's score (64.7%) is significantly lower than the base model's (76.0%), an 11.3-point drop attributable to the SFT process.
+ 3. **General vs. Specialized Knowledge**: The current SFT process appears to steer the model toward generic answers, potentially at the cost of specialized astronomical knowledge.
+
+ These limitations underscore the challenges in developing specialized chat models and the critical importance of both the quantity and quality of training data, especially for the SFT process.
+
+ This model is released primarily for reproducibility purposes, allowing researchers to track the development process and compare different iterations of AstroLLaMA models.
+
+ For optimal performance and the most up-to-date capabilities in astronomy-related tasks, we recommend using AstroSage-8B, where these limitations have been addressed through expanded training data and refined fine-tuning processes.
+
+ ## Ethical Considerations
+
+ While this model is designed for scientific use, users should be mindful of potential misuse, such as generating misleading scientific content. Always verify model outputs against peer-reviewed sources for critical applications.
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```
+ [Citation for Pan et al. 2024 to be added]
+ ```