rs545837 committed on
Commit
7ba8f76
1 Parent(s): 8dd5b4e

create readme.md and model card

Files changed (1)
  1. README.md +85 -0
README.md ADDED
@@ -0,0 +1,85 @@
---
datasets:
- Trelis/smollm-corpus-2percent
language:
- en
base_model:
- HuggingFaceTB/SmolLM-360M
tags:
- language_model
- pruned
- distilled
---

# Model Card for TrelisLM-80M

This model is a pruned and distilled version of SmolLM-360M, created out of scientific curiosity.

## Model Details

### Model Description

- **Developed by:** Trelis Team
- **Model type:** Language Model
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
- **Finetuned from model:** HuggingFaceTB/SmolLM-360M

TrelisLM-80M is an 80 million parameter language model derived from SmolLM-360M. It was created by layer and width pruning, followed by distillation from SmolLM-360M-Instruct using a forward KL loss.

## Uses

### Direct Use

This model is primarily intended for scientific curiosity and research purposes. It can be used to explore the effects of model pruning and distillation on language model performance.

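To illustrate direct use, here is a minimal sketch of loading the model with the `transformers` library and generating a short completion. The repository id `Trelis/TrelisLM-80M` is an assumption for illustration; substitute the actual Hugging Face repo id of this model.

```python
# Minimal usage sketch. The repo id below is a hypothetical placeholder,
# not a confirmed repository name; replace it with the model's actual repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Trelis/TrelisLM-80M"  # hypothetical

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The chemical symbol for gold is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
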
### Out-of-Scope Use

Because this model is not yet fully trained, it should not be used for any production or real-world application at this stage.

## Bias, Risks, and Limitations

The model is still being trained and may exhibit unpredictable behavior or biases. It should be used with caution and only for research purposes.

### Recommendations

Users should be aware that this model is a work in progress and that its outputs should not be relied upon for any critical or sensitive task.

## Training Details

### Training Data

The model was distilled using the Trelis/smollm-corpus-2percent dataset.

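For reference, here is a minimal sketch of inspecting this corpus with the `datasets` library; the `train` split name is an assumption, so check the dataset card for the actual splits.

```python
# Sketch: stream a few documents from the distillation corpus.
# NOTE: the "train" split is assumed; check the dataset card for actual split names.
from itertools import islice

from datasets import load_dataset

corpus = load_dataset("Trelis/smollm-corpus-2percent", split="train", streaming=True)

for example in islice(corpus, 3):
    print(example)
```
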
### Training Procedure

The training procedure involved the following steps (a sketch of the distillation loss follows the list):
1. Layer pruning of SmolLM-360M
2. Width pruning of SmolLM-360M
3. Distillation from SmolLM-360M-Instruct using a forward KL loss

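The actual distillation code is not published in this card. As a rough illustration only, the snippet below sketches a forward KL loss, i.e. KL(teacher || student), in PyTorch; the function name, temperature, and reduction are assumptions rather than details of the real training run.

```python
# Rough sketch of a forward KL distillation loss (teacher -> student).
# NOT the actual training code; temperature and reduction are assumptions.
import torch
import torch.nn.functional as F


def forward_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student), averaged over all token positions."""
    t = temperature
    vocab = student_logits.size(-1)
    # Flatten (batch, seq, vocab) -> (batch*seq, vocab) so "batchmean"
    # averages over token positions.
    student_log_probs = F.log_softmax(student_logits.reshape(-1, vocab) / t, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits.reshape(-1, vocab) / t, dim=-1)
    # With log_target=True, kl_div(input, target) computes
    # exp(target) * (target - input), i.e. KL(teacher || student).
    return F.kl_div(student_log_probs, teacher_log_probs,
                    reduction="batchmean", log_target=True) * (t * t)
```

In practice the student (TrelisLM-80M) and teacher (SmolLM-360M-Instruct) logits would be computed over the same token positions, and such a term may be combined with a standard language-modeling cross-entropy loss.
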
## Evaluation

Evaluation results are not yet available for this model.

## Model Examination

Further examination and interpretation of the model's behavior are needed.

## Environmental Impact

[More Information Needed]

## Technical Specifications

### Model Architecture and Objective

TrelisLM-80M is an 80 million parameter language model derived from SmolLM-360M through pruning and distillation.

### Compute Infrastructure

[More Information Needed]

## Model Card Contact

[More Information Needed]