
	---
	datasets:
	- Trelis/smollm-corpus-2percent
	language:
	- en
	base_model:
	- HuggingFaceTB/SmolLM-360M
	tags:
	- language_model
	- pruned
	- distilled
	---

	# Model Card for TrelisSmolLM-base

	This model is a pruned and distilled version of SmolLM-360M, created out of scientific curiosity.

	To purchase the training scripts used for this model, visit: https://trelis.com/advanced-fine-tuning-scripts/

	## Model Details

	### Model Description

	- Developed by: Trelis Team
	- Model type: Language Model
	- Language(s) (NLP): English
	- License: [More Information Needed]
	- Finetuned from model: HuggingFaceTB/SmolLM-360M

	TrelisLM-80M is an 80-million-parameter language model derived from SmolLM-360M. It was created by layer and width pruning, followed by distillation from SmolLM-360M-Instruct using a forward KL loss.
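
For readers unfamiliar with the term, forward KL in distillation conventionally means KL(teacher || student): the divergence is weighted by the teacher's token probabilities. The sketch below computes it from raw logits in plain Python. It is illustrative only; the function names and the `temperature` parameter are assumptions, and the actual training scripts for this model are not shown in this card.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def forward_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for a single token position."""
    t_logp = log_softmax(teacher_logits, temperature)
    s_logp = log_softmax(student_logits, temperature)
    # Expectation under the teacher distribution of the log-probability gap.
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_logp, s_logp))

def distillation_loss(teacher_batch, student_batch, temperature=1.0):
    """Mean forward KL over a list of token positions."""
    losses = [forward_kl(t, s, temperature)
              for t, s in zip(teacher_batch, student_batch)]
    return sum(losses) / len(losses)
```

The loss is zero when the student reproduces the teacher's distribution exactly, and it is not symmetric: swapping the two arguments gives the reverse KL, which is a different objective.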

	## Uses

	### Direct Use

	This model is intended primarily for research and experimentation. It can be used to explore how pruning and distillation affect language model performance.
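
A minimal sketch of loading the model for such experiments with the Hugging Face `transformers` library. The repository ID `Trelis/TrelisSmolLM-base` is an assumption pieced together from the organization and model name in this card, and the sketch assumes the repository ships tokenizer files and needs no custom code. The `generate` helper is hypothetical.

```python
MODEL_ID = "Trelis/TrelisSmolLM-base"  # assumed Hub repository ID

def generate(prompt, model_id=MODEL_ID, max_new_tokens=50):
    """Greedy-decode a continuation of `prompt` with a causal language model."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Once upon a time")` downloads the weights on first use. Because this is a base model that is not fully trained, expect plain text continuation of uneven quality rather than instruction following.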

	### Out-of-Scope Use

	Because this model is not yet fully trained, it should not be used in production or other real-world applications at this stage.

	## Bias, Risks, and Limitations

	The model is still being trained and may exhibit unpredictable behavior or biases. Use it with caution and only for research purposes.

	### Recommendations

	Users should be aware that this model is a work in progress. Its outputs should not be relied upon for any critical or sensitive task.

	## Training Details

	### Training Data

	The model was distilled using the Trelis/smollm-corpus-2percent dataset.

	### Training Procedure

	The training procedure involved the following steps:
	1. Layer pruning of SmolLM-360M
	2. Width pruning of SmolLM-360M
	3. Distillation from SmolLM-360M-Instruct using Forward KL loss
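
The two pruning steps above can be sketched on toy data as follows. This is a hedged illustration, not the procedure used for this model: the card does not say which layers or channels were kept, and the L2-norm importance score here is a hypothetical criterion chosen for simplicity.

```python
def prune_layers(layers, keep_indices):
    """Layer pruning: keep only the transformer blocks at `keep_indices`."""
    return [layers[i] for i in sorted(keep_indices)]

def channel_importance(weight):
    """Score each output channel (row) by its L2 norm (hypothetical criterion)."""
    return [sum(w * w for w in row) ** 0.5 for row in weight]

def prune_width(weight, keep):
    """Width pruning: keep the `keep` highest-scoring rows, in original order."""
    scores = channel_importance(weight)
    ranked = sorted(range(len(weight)), key=lambda i: scores[i], reverse=True)
    kept_rows = sorted(ranked[:keep])
    return [weight[i] for i in kept_rows], kept_rows

# Toy stand-ins: eight named blocks and one 4x2 weight matrix.
toy_layers = [f"block_{i}" for i in range(8)]
kept_layers = prune_layers(toy_layers, keep_indices=[0, 2, 4, 6])

toy_weight = [[3.0, 4.0], [0.1, 0.0], [1.0, 1.0], [0.0, 0.2]]
pruned_weight, kept_rows = prune_width(toy_weight, keep=2)
```

In a real transformer, width pruning has to remove the same hidden channels from every matrix that shares that dimension (embeddings, attention projections, MLP projections, and norms), so the kept indices would be computed once and applied everywhere.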

	## Evaluation

	Evaluation results are not yet available for this model.

	## Model Examination

	Further examination and interpretation of the model's behavior are needed.

	## Environmental Impact

	[More Information Needed]

	## Technical Specifications

	### Model Architecture and Objective

	TrelisLM-80M is an 80-million-parameter language model derived from SmolLM-360M through pruning, followed by distillation from SmolLM-360M-Instruct.
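
To see how layer and width pruning reduce parameter count, the sketch below counts parameters for a Llama-style decoder with tied input and output embeddings. Every configuration value is an assumption: the `teacher` values are assumed figures for SmolLM-360M and should be checked against that model's `config.json`, and the `pruned` values are purely hypothetical and are not this model's actual configuration.

```python
def llama_style_param_count(vocab_size, hidden, layers, heads, kv_heads,
                            intermediate, tie_embeddings=True):
    """Parameter count for a bias-free, Llama-style decoder-only transformer."""
    head_dim = hidden // heads
    embed = vocab_size * hidden
    # Query and output projections are hidden x hidden; key and value
    # projections are narrower under grouped-query attention.
    attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
    mlp = 3 * hidden * intermediate   # gate, up, and down projections
    norms = 2 * hidden                # two RMSNorm weight vectors per block
    total = embed + layers * (attn + mlp + norms) + hidden  # plus final norm
    if not tie_embeddings:
        total += vocab_size * hidden  # separate output projection
    return total

# Assumed SmolLM-360M values; verify against the published config.
teacher = llama_style_param_count(
    vocab_size=49152, hidden=960, layers=32, heads=15, kv_heads=5,
    intermediate=2560)

# Hypothetical pruned shape: fewer layers (layer pruning) and smaller
# hidden and intermediate sizes (width pruning).
pruned = llama_style_param_count(
    vocab_size=49152, hidden=576, layers=12, heads=9, kv_heads=3,
    intermediate=1536)

print(f"teacher: {teacher / 1e6:.1f}M, pruned: {pruned / 1e6:.1f}M")
```

Note that the embedding table scales with the hidden size but not with the number of layers, so in small models it makes up a growing share of the total as layers are removed.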

	### Compute Infrastructure

	[More Information Needed]

	## Model Card Contact

	[More Information Needed]