metadata
datasets:
  - Trelis/smollm-corpus-2percent
language:
  - en
base_model:
  - HuggingFaceTB/SmolLM-360M
tags:
  - language_model
  - pruned
  - distilled

Model Card for TrelisSmolLM-base

This model is a pruned and distilled version of SmolLM-360M, created out of scientific curiosity.

To purchase the training scripts used for this model, visit: https://trelis.com/advanced-fine-tuning-scripts/

Model Details

Model Description

  • Developed by: Trelis Team
  • Model type: Language Model
  • Language(s) (NLP): English
  • License: [More Information Needed]
  • Finetuned from model: HuggingFaceTB/SmolLM-360M

TrelisLM-80M is an 80-million-parameter language model derived from SmolLM-360M. It was created by layer and width pruning, followed by distillation from SmolLM-360M-Instruct using Forward KL loss.

Uses

Direct Use

This model is primarily intended for scientific curiosity and research purposes. It can be used to explore the effects of model pruning and distillation on language model performance.

Out-of-Scope Use

Because this model is not yet fully trained, it should not be used in production or any other real-world application at this stage.

Bias, Risks, and Limitations

The model is still being trained and may show unpredictable behavior or biases. It should be used with caution and only for research purposes.

Recommendations

Users should be aware that this model is a work in progress and its outputs should not be relied upon for any critical or sensitive tasks.

Training Details

Training Data

The model was distilled using the Trelis/smollm-corpus-2percent dataset.

Training Procedure

The training procedure involved the following steps:

  1. Layer pruning of SmolLM-360M
  2. Width pruning of SmolLM-360M
  3. Distillation from SmolLM-360M-Instruct using Forward KL loss
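The two pruning steps above can be illustrated with a toy sketch. This card does not say which layers or hidden units were removed, or by what criterion, so everything here is an assumption made for illustration: the helper names (`prune_layers`, `prune_width`, `count_params`), the toy matrix sizes, and the choice to keep every other layer and the first two hidden units. It is not the training script used for this model.

```python
def prune_layers(layers, keep_indices):
    """Layer pruning: keep only the layers at the given indices, in order."""
    return [layers[i] for i in keep_indices]


def prune_width(weight, keep_rows, keep_cols):
    """Width pruning: keep selected output rows and input columns of one matrix."""
    return [[weight[r][c] for c in keep_cols] for r in keep_rows]


def count_params(layers):
    """Total number of weights across all (rectangular) layer matrices."""
    return sum(len(w) * len(w[0]) for w in layers)


# Toy "model" (hypothetical sizes): 6 layers, each a 4x4 weight matrix.
toy_layers = [
    [[float(layer_idx + r + c) for c in range(4)] for r in range(4)]
    for layer_idx in range(6)
]

# Assumption: keep every other layer, then keep the first 2 hidden units.
kept_layers = prune_layers(toy_layers, keep_indices=[0, 2, 4])
keep_units = [0, 1]
pruned = [prune_width(w, keep_units, keep_units) for w in kept_layers]

print(count_params(toy_layers), "->", count_params(pruned))  # prints "96 -> 12"
```

Layer pruning reduces depth and width pruning reduces the size of each remaining matrix, so the two together shrink the parameter count multiplicatively. The pruned model then serves as the student in the distillation step.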

Evaluation

Evaluation results are not yet available for this model.

Model Examination

Further examination and interpretation of the model's behavior are needed.

Environmental Impact

[More Information Needed]

Technical Specifications

Model Architecture and Objective

TrelisLM-80M is an 80-million-parameter language model derived from SmolLM-360M by layer and width pruning. Its training objective is distillation from SmolLM-360M-Instruct using Forward KL loss.
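As a rough illustration of the Forward KL objective named in this card, the sketch below computes KL(teacher || student) from raw logits using only the Python standard library. The function names (`log_softmax`, `forward_kl`, `distillation_loss`) and the averaging over token positions are assumptions for illustration. The actual training code is not shown in this card and may differ, for example by using a temperature or by combining this term with other losses.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def forward_kl(teacher_logits, student_logits):
    """Forward KL for one token position: sum_i p_t(i) * (log p_t(i) - log p_s(i))."""
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_logp, s_logp))


def distillation_loss(teacher_batch, student_batch):
    """Mean forward KL over token positions (assumed reduction)."""
    losses = [forward_kl(t, s) for t, s in zip(teacher_batch, student_batch)]
    return sum(losses) / len(losses)


# Toy vocabulary of 3 tokens, 2 token positions (made-up logits).
teacher = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
student = [[1.5, 0.7, -0.5], [0.3, -0.2, 0.1]]
print(distillation_loss(teacher, student))
```

Forward KL weights the log-probability gap by the teacher's distribution, so the student is penalized most where the teacher assigns high probability and the student does not. The loss is zero only when the two distributions match.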

Compute Infrastructure

[More Information Needed]

Model Card Contact

[More Information Needed]