| Model Name | Parameters | Class | Tokens/Params Ratio | Training Tokens | Batch Size (Tokens) | Training Loss |
|---|---|---|---|---|---|---|
| GerbilLab/GerbilBlender-A-6.7m | 6.7M | A-Class | 20 | 134M | 131k | 6.0908 |
"Blender" models, inspired by UL2 pretraining, are trained equally in fill-in-the-middle, causal modelling, and masked language modelling tasks. Special tokens for these models include:
'<fitm_start>', '<multiple_tok_mask>', '<fitm_result>', '<causal>', '<mlm_start>', '<single_tok_mask>', '<mlm_end>'
```
# Example fill-in-the-middle
'<fitm_start> this is an <multiple_tok_mask> for fill-in-the-middle <fitm_result> example text <|endoftext|>'

# Example causal language modelling
'<causal> this is an example text for causal language modelling <|endoftext|>'

# Example masked language modelling
'<mlm_start> this is an <single_tok_mask> text for masked language modelling <mlm_end> example <|endoftext|>'
```
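Each training string follows one fixed template per task. As a minimal sketch, the snippet below builds strings in each of the three formats from plain text. The helper names (`to_fitm`, `to_causal`, `to_mlm`) are hypothetical, and it masks whitespace-separated words for readability, whereas actual pretraining would operate on tokenizer tokens; only the special tokens and the overall layout come from the examples above.

```python
# Illustrative sketch of the three Blender pretraining formats.
# Helper names and masking positions are assumptions; the special
# tokens and layout follow the examples in this card.
EOT = "<|endoftext|>"

def to_fitm(words, start, end):
    """Replace words[start:end] with a mask and move the span after <fitm_result>."""
    prefix = words[:start] + ["<multiple_tok_mask>"] + words[end:]
    return f"<fitm_start> {' '.join(prefix)} <fitm_result> {' '.join(words[start:end])} {EOT}"

def to_causal(words):
    """Plain left-to-right causal language modelling."""
    return f"<causal> {' '.join(words)} {EOT}"

def to_mlm(words, idx):
    """Mask a single word and append the answer after <mlm_end>."""
    masked = words[:idx] + ["<single_tok_mask>"] + words[idx + 1:]
    return f"<mlm_start> {' '.join(masked)} <mlm_end> {words[idx]} {EOT}"

words = "this is an example text for masked language modelling".split()
print(to_fitm(words, 3, 5))
print(to_causal(words))
print(to_mlm(words, 3))
```

Running this produces strings in the same shape as the examples above; the `to_mlm` call reproduces the masked-language-modelling example verbatim.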