
Training Details

Training Data

[gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) is a rich dataset of high-quality synthetic text-to-SQL samples. It contains 105,851 records, partitioned into 100,000 training and 5,851 test records. Only 50,000 of the training records were used for this fine-tune.
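Each record pairs a natural-language question with schema context and a target SQL query. A minimal sketch of how one record could be rendered into an instruction-style training prompt (the field names `sql_prompt`, `sql_context`, and `sql` follow the dataset's schema; the template and the sample record are illustrative, not the exact format used in training):

```python
def build_prompt(record: dict) -> str:
    """Format one text-to-SQL record into an instruction-style prompt.

    Field names follow the gretelai/synthetic_text_to_sql schema
    (sql_prompt, sql_context, sql); the template itself is an assumption.
    """
    return (
        "### Instruction:\n"
        f"{record['sql_prompt']}\n\n"
        "### Context:\n"
        f"{record['sql_context']}\n\n"
        "### Response:\n"
        f"{record['sql']}"
    )

# Hypothetical record in the dataset's shape.
example = {
    "sql_prompt": "List all customers from Nepal.",
    "sql_context": "CREATE TABLE customers (id INT, name TEXT, country TEXT);",
    "sql": "SELECT * FROM customers WHERE country = 'Nepal';",
}
print(build_prompt(example))
```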

Training Results

| Step | Training Loss |
|------|---------------|
| 10 | 1.296000 |
| 20 | 1.331600 |
| 30 | 1.279400 |
| 40 | 1.312900 |
| 50 | 1.274100 |
| 60 | 1.271700 |
| 70 | 1.209100 |
| 80 | 1.192600 |
| 90 | 1.176700 |
| 100 | 1.118300 |
| 110 | 1.086800 |
| 120 | 1.048000 |
| 130 | 1.019500 |
| 140 | 1.001400 |
| 150 | 0.994300 |
| 160 | 0.934900 |
| 170 | 0.904500 |
| 180 | 0.879900 |
| 190 | 0.850400 |
| 200 | 0.828000 |
| 210 | 0.811400 |
| 220 | 0.846000 |
| 230 | 0.791100 |
| 240 | 0.766900 |
| 250 | 0.782000 |
| 260 | 0.718300 |
| 270 | 0.701800 |
| 280 | 0.720000 |
| 290 | 0.693600 |
| 300 | 0.676500 |
| 310 | 0.679900 |
| 320 | 0.673200 |
| 330 | 0.669500 |
| 340 | 0.692800 |
| 350 | 0.662200 |
| 360 | 0.761200 |
| 370 | 0.659600 |
| 380 | 0.683700 |
| 390 | 0.681200 |
| 400 | 0.674000 |
| 410 | 0.651800 |
| 420 | 0.641800 |
| 430 | 0.646500 |
| 440 | 0.664200 |
| 450 | 0.633600 |
| 460 | 0.646900 |
| 470 | 0.643400 |
| 480 | 0.658800 |
| 490 | 0.631500 |
| 500 | 0.678200 |
| 510 | 0.633400 |
| 520 | 0.623300 |
| 530 | 0.655700 |
| 540 | 0.631500 |
| 550 | 0.617700 |
| 560 | 0.644000 |
| 570 | 0.650200 |
| 580 | 0.618500 |
| 590 | 0.615400 |
| 600 | 0.614000 |
| 610 | 0.612800 |
| 620 | 0.616900 |
| 630 | 0.640200 |
| 640 | 0.613000 |
| 650 | 0.611400 |
| 660 | 0.617000 |
| 670 | 0.629800 |
| 680 | 0.648800 |
| 690 | 0.608800 |
| 700 | 0.603200 |
| 710 | 0.628200 |
| 720 | 0.629700 |
| 730 | 0.604400 |
| 740 | 0.610700 |
| 750 | 0.621300 |
| 760 | 0.617900 |
| 770 | 0.596500 |
| 780 | 0.612800 |
| 790 | 0.611700 |
| 800 | 0.618600 |
| 810 | 0.590900 |
| 820 | 0.590300 |
| 830 | 0.592900 |
| 840 | 0.611700 |
| 850 | 0.628300 |
| 860 | 0.590100 |
| 870 | 0.584800 |
| 880 | 0.591200 |
| 890 | 0.585900 |
| 900 | 0.607000 |
| 910 | 0.578800 |
| 920 | 0.576600 |
| 930 | 0.597600 |
| 940 | 0.602100 |
| 950 | 0.579000 |
| 960 | 0.597900 |
| 970 | 0.590600 |
| 980 | 0.606100 |
| 990 | 0.577600 |
| 1000 | 0.584000 |
| 1010 | 0.569300 |
| 1020 | 0.594000 |
| 1030 | 0.596100 |
| 1040 | 0.590600 |
| 1050 | 0.570300 |
| 1060 | 0.572800 |
| 1070 | 0.572200 |
| 1080 | 0.569900 |
| 1090 | 0.587200 |
| 1100 | 0.572200 |
| 1110 | 0.569700 |
| 1120 | 0.612500 |
| 1130 | 0.587800 |
| 1140 | 0.568100 |
| 1150 | 0.573100 |
| 1160 | 0.568300 |
| 1170 | 0.620800 |
| 1180 | 0.570600 |
| 1190 | 0.561500 |
| 1200 | 0.560200 |
| 1210 | 0.592400 |
| 1220 | 0.580500 |
| 1230 | 0.578300 |
| 1240 | 0.573400 |
| 1250 | 0.568800 |
| 1260 | 0.600500 |
| 1270 | 0.578800 |
| 1280 | 0.561300 |
| 1290 | 0.570900 |
| 1300 | 0.567700 |
| 1310 | 0.589800 |
| 1320 | 0.598200 |
| 1330 | 0.564900 |
| 1340 | 0.577500 |
| 1350 | 0.565700 |
| 1360 | 0.581400 |
| 1370 | 0.562000 |
| 1380 | 0.588200 |
| 1390 | 0.603800 |
| 1400 | 0.560300 |
| 1410 | 0.559600 |
| 1420 | 0.567000 |
| 1430 | 0.562700 |
| 1440 | 0.564200 |
| 1450 | 0.563700 |
| 1460 | 0.561100 |
| 1470 | 0.561100 |
| 1480 | 0.561600 |
| 1490 | 0.564800 |
| 1500 | 0.579100 |
| 1510 | 0.564100 |
| 1520 | 0.562900 |
| 1530 | 0.569800 |
| 1540 | 0.566200 |
| 1550 | 0.599100 |
| 1560 | 0.562000 |
| 1570 | 0.580600 |
| 1580 | 0.564900 |
| 1590 | 0.571900 |
| 1600 | 0.580000 |
| 1610 | 0.559200 |
| 1620 | 0.566900 |
| 1630 | 0.556100 |

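The log shows the training loss falling from about 1.30 at step 10 to about 0.56 by step 1630. A minimal sketch of how that trend can be summarized, using a small sample of (step, loss) pairs taken from the table above:

```python
# A few (step, loss) pairs copied from the training log above.
logged = {10: 1.2960, 500: 0.6782, 1000: 0.5840, 1630: 0.5561}

first, last = logged[10], logged[1630]
drop_pct = (first - last) / first * 100  # relative reduction over training

print(f"loss fell from {first:.4f} to {last:.4f} ({drop_pct:.1f}% lower)")
```

Most of the reduction happens in the first ~500 steps; the curve flattens out near 0.56 toward the end of training.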

Training Hyperparameters

The following hyperparameters were used during training:

num_train_epochs=3
per_device_train_batch_size=2
gradient_accumulation_steps=4
optim="adamw_torch_fused"
learning_rate=2e-4
max_grad_norm=0.3
weight_decay=0.01
lr_scheduler_type="cosine"
warmup_steps=50
bf16=True
tf32=True
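With gradient accumulation, each optimizer update aggregates several micro-batches, so the effective batch size is larger than the per-device value. A quick sketch of the arithmetic (the single-GPU assumption is mine; the card does not state the device count):

```python
# Values from the hyperparameter list above; num_gpus is an assumption.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1

# Gradients from 4 micro-batches of 2 examples are accumulated per update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 8
```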


Model tree for SujanKarki/Meta_Llama3.1_8b_instruct_text_to_sql_vera
