---
license: other
library_name: peft
tags:
- generated_from_trainer
base_model: beomi/gemma-ko-7b
model-index:
- name: gemma7_on_korean_events
  results: []
---
# gemma7_on_korean_events

This model is a PEFT fine-tuned version of [beomi/gemma-ko-7b](https://huggingface.co/beomi/gemma-ko-7b); the training dataset is not documented.
It achieves the following results on the evaluation set:
- Loss: 0.5534
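
Since the card is tagged `library_name: peft`, this repository presumably contains adapter weights that are loaded on top of the base model. Below is a minimal, untested loading sketch; the adapter repo id `your-username/gemma7_on_korean_events` and the example prompt are placeholders, as the expected prompt format is not documented.

```python
# A minimal loading sketch, assuming this repository hosts PEFT adapter weights
# for beomi/gemma-ko-7b. "your-username/gemma7_on_korean_events" is a placeholder
# repo id; substitute the actual Hub path of this adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "beomi/gemma-ko-7b",
    torch_dtype=torch.float16,  # half precision; adjust dtype/device to your hardware
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("beomi/gemma-ko-7b")

# Attach the fine-tuned adapter on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, "your-username/gemma7_on_korean_events")

# The expected prompt format is not documented; this input is illustrative only.
prompt = "다음 기사에서 주요 사건을 추출하세요:"  # "Extract the main events from the following article:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```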

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 5
- total_train_batch_size: 10
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 760
- mixed_precision_training: Native AMP
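
For reference, these settings can be written out as a `transformers.TrainingArguments` configuration. This is a reconstruction, not the authors' actual training script; `output_dir` and the 20-step evaluation/logging cadence are assumptions (the cadence is inferred from the results table below).

```python
# Sketch mapping the hyperparameters above onto transformers.TrainingArguments
# (Transformers 4.41.1). output_dir and the eval/logging cadence are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma7_on_korean_events",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=5,  # total train batch size: 2 * 5 = 10
    lr_scheduler_type="linear",
    warmup_steps=50,
    max_steps=760,
    fp16=True,                      # "Native AMP" mixed precision
    # optimizer defaults (AdamW, betas=(0.9, 0.999), eps=1e-08) match the card
    eval_strategy="steps",
    eval_steps=20,                  # inferred from the 20-step cadence in the table
    logging_steps=20,
)
```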

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.0063 | 0.2632 | 20 | 0.6860 |
| 0.4681 | 0.5263 | 40 | 0.3754 |
| 0.3565 | 0.7895 | 60 | 0.3030 |
| 0.2872 | 1.0526 | 80 | 0.2880 |
| 0.2557 | 1.3158 | 100 | 0.2723 |
| 0.2213 | 1.5789 | 120 | 0.2674 |
| 0.2386 | 1.8421 | 140 | 0.2537 |
| 0.1885 | 2.1053 | 160 | 0.3032 |
| 0.1362 | 2.3684 | 180 | 0.2710 |
| 0.131 | 2.6316 | 200 | 0.2844 |
| 0.1366 | 2.8947 | 220 | 0.2626 |
| 0.0935 | 3.1579 | 240 | 0.3632 |
| 0.0618 | 3.4211 | 260 | 0.3315 |
| 0.0672 | 3.6842 | 280 | 0.3255 |
| 0.0684 | 3.9474 | 300 | 0.3238 |
| 0.0393 | 4.2105 | 320 | 0.4230 |
| 0.0311 | 4.4737 | 340 | 0.4180 |
| 0.0337 | 4.7368 | 360 | 0.3933 |
| 0.0402 | 5.0 | 380 | 0.3846 |
| 0.0179 | 5.2632 | 400 | 0.4478 |
| 0.0176 | 5.5263 | 420 | 0.4464 |
| 0.0274 | 5.7895 | 440 | 0.4003 |
| 0.017 | 6.0526 | 460 | 0.4284 |
| 0.0112 | 6.3158 | 480 | 0.4675 |
| 0.0107 | 6.5789 | 500 | 0.4715 |
| 0.0115 | 6.8421 | 520 | 0.4911 |
| 0.0107 | 7.1053 | 540 | 0.4776 |
| 0.0053 | 7.3684 | 560 | 0.4829 |
| 0.0049 | 7.6316 | 580 | 0.4962 |
| 0.0046 | 7.8947 | 600 | 0.5087 |
| 0.0039 | 8.1579 | 620 | 0.5240 |
| 0.0028 | 8.4211 | 640 | 0.5317 |
| 0.0035 | 8.6842 | 660 | 0.5351 |
| 0.0034 | 8.9474 | 680 | 0.5393 |
| 0.0013 | 9.2105 | 700 | 0.5445 |
| 0.0031 | 9.4737 | 720 | 0.5502 |
| 0.0018 | 9.7368 | 740 | 0.5523 |
| 0.002 | 10.0 | 760 | 0.5534 |
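
Note that validation loss reaches its minimum of 0.2537 at step 140 (epoch ≈ 1.8) and climbs steadily afterwards while training loss falls toward zero, a pattern consistent with overfitting; the 0.5534 reported above is the final-checkpoint loss, not the best observed.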

### Framework versions

- PEFT 0.11.2.dev0
- Transformers 4.41.1
- Pytorch 2.3.0+cu121
- Datasets 2.15.0
- Tokenizers 0.19.1