
capybara_finetuned_results3

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.6542

Video demo: (it's pretty bad)

Model description

More information needed

Intended uses & limitations

More information needed
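
Pending details from the author, here is a minimal sketch of loading the checkpoint with transformers for text generation. The prompt and generation settings are illustrative assumptions, not recommendations from the model author; the bfloat16 dtype matches how the repository stores the weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "archit11/qwen_worldmodel"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The repo stores BF16 weights, so load in bfloat16.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative prompt; the intended input format is not documented.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```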

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 5
  • training_steps: 800
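
These values map onto a transformers TrainingArguments configuration roughly as follows. This is a sketch: output_dir is a placeholder, and the Adam betas and epsilon are left at the transformers defaults, which match the values listed above.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="capybara_finetuned_results3",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 1 * 4 = 4
    lr_scheduler_type="cosine",
    warmup_steps=5,
    max_steps=800,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the transformers defaults.
)
```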

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 15.5311       | 0.0230 | 50   | 14.5422         |
| 8.7477        | 0.0460 | 100  | 9.2952          |
| 7.3554        | 0.0690 | 150  | 7.1992          |
| 6.828         | 0.0920 | 200  | 6.7258          |
| 6.4694        | 0.1150 | 250  | 6.3597          |
| 6.3401        | 0.1381 | 300  | 6.1703          |
| 6.1256        | 0.1611 | 350  | 6.0395          |
| 6.0372        | 0.1841 | 400  | 5.9271          |
| 6.0221        | 0.2071 | 450  | 5.8464          |
| 5.8783        | 0.2301 | 500  | 5.7810          |
| 5.8339        | 0.2531 | 550  | 5.7335          |
| 5.8546        | 0.2761 | 600  | 5.6904          |
| 5.9169        | 0.2991 | 650  | 5.6690          |
| 5.7959        | 0.3221 | 700  | 5.6565          |
| 5.7271        | 0.3451 | 750  | 5.6543          |
| 5.8734        | 0.3682 | 800  | 5.6542          |
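
Assuming the reported losses are mean token-level cross-entropy in nats (the usual Trainer convention), the final validation loss corresponds to a perplexity of roughly exp(5.6542) ≈ 285:

```python
import math

final_eval_loss = 5.6542
print(math.exp(final_eval_loss))  # ~285.5, assuming loss is in nats
```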

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.0
  • Datasets 3.0.0
  • Tokenizers 0.19.1
