Model	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum
T5-GenQ-T-v1	75.2151	54.8735	74.5142	74.5262
T5-GenQ-TD-v1	78.2570	58.9586	77.5308	77.5466
T5-GenQ-TDE-v1	76.9075	57.0980	76.1464	76.1502
T5-GenQ-TDC-v1 (best)	80.0754	61.5974	79.3557	79.3427

Model	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum
T5-GenQ-TD-v1	76.15	56.23	75.49	75.49
query-gen-msmarco-t5-base-v1	34.92	15.28	34.17	34.17

Input Text	Target Query	Before Fine-tuning	After Fine-tuning
Dr. Scholl's Women's Trance Slip Resistant Clog Our trance work shoe combines exceptional style and performance. An oil and slip-resistant outsole combined with a molded EVA construction will add layers of safety and comfort.	Dr. Scholl's Women's Trance Clog	dr scholl trance shoes	Dr. Scholl's Trance Clog
Girls Birthday Tutu Skirts Dress with Mermaid Birthday Girl Tshirt, Headband, Satin Sash Girls Mermaid Dress Set with Tshirt, Dress, Headband and sash.	girls mermaid dress set	what to wear for a mermaid birthday	Girls Mermaid Birthday Dress Set
Saucony Women's Omni 15 Running Shoe If we could design shoelaces for pronators, we’d do that too. The omni 15 delivers everything a moderate to severe pronator could need, including enhanced cushioning, exceptional support, flexibility and a smooth, fluid ride.	Saucony Omni 15 Women's Running Shoe	what shoes are good for pronators	Saucony Omni 15 Running Shoe

Epoch	Step	Loss	Grad Norm	Learning Rate	Eval Loss	ROUGE-1	ROUGE-2	ROUGE-L	ROUGE-Lsum
1.0	4285	0.2515	1.890405	0.000049	0.165247	76.4578	56.4813	75.7754	75.7835
2.0	8570	0.1744	1.433518	0.000042	0.157739	77.2138	57.4609	76.5478	76.5589
3.0	12855	0.1595	1.340541	0.000035	0.154977	77.5761	57.9620	76.8824	76.8854
4.0	17140	0.1488	1.370982	0.000028	0.153134	77.9366	58.5720	77.2561	77.2692
5.0	21425	0.1407	1.549360	0.000021	0.153177	78.1102	58.7207	77.4106	77.4241
6.0	25710	0.1344	1.258538	0.000014	0.152852	78.1691	58.8640	77.4554	77.4651
7.0	29995	0.1299	1.200458	0.000007	0.153884	78.2001	58.8603	77.4833	77.4984
8.0	34280	0.1267	1.079393	0.000000	0.154507	78.2570	58.9586	77.5308	77.5466

### Model Analysis

Average scores by model

```checkpoint-34280``` (T5-GenQ-TD-v1) outperforms ```query-gen-msmarco-t5-base-v1``` by a wide margin on every ROUGE metric. In relative terms the gap is largest on ROUGE-2, where ```checkpoint-34280``` scores 56.23 against 15.28 for the baseline model, roughly 3.7 times higher. These results suggest that ```checkpoint-34280``` generates queries that overlap far more closely with the target queries.

Density comparison

```checkpoint-34280``` (T5-GenQ-TD-v1) has strong peaks near 100%, indicating high overlap with reference texts. ```query-gen-msmarco-t5-base-v1``` shows a broader distribution, with peaks at low to mid-range scores (10-40%), suggesting greater variability but lower precision. ROUGE-2 has a high density at 0% for the baseline model, implying many instances with no bigram overlap.
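To make "no bigram overlap" concrete, the sketch below computes a simplified ROUGE-N F1 for the mermaid dress example from the before/after table. It is only an illustration: it assumes lowercasing and whitespace tokenization, whereas the ROUGE implementation used for the reported scores may tokenize or stem differently, and the helper names `ngrams` and `rouge_n_f1` are hypothetical.

```python
from collections import Counter


def ngrams(text, n):
    """Lowercase, split on whitespace, and count the n-grams."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(target, generated, n):
    """F1 over clipped n-gram matches (simplified ROUGE-N, no stemming)."""
    ref, hyp = ngrams(target, n), ngrams(generated, n)
    overlap = sum((ref & hyp).values())  # Counter & keeps the minimum counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Strings taken from the example table in this model card.
target = "girls mermaid dress set"
before = "what to wear for a mermaid birthday"
after = "Girls Mermaid Birthday Dress Set"

for name, generated in [("before", before), ("after", after)]:
    r1 = rouge_n_f1(target, generated, 1)
    r2 = rouge_n_f1(target, generated, 2)
    print(f"{name}: ROUGE-1 F1 = {r1:.4f}, ROUGE-2 F1 = {r2:.4f}")
```

Under these assumptions the "before" query shares only the word "mermaid" with the target, so its ROUGE-2 is exactly 0, while the "after" query shares two of the target's three bigrams.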

Histogram comparison

```checkpoint-34280``` (T5-GenQ-TD-v1, blue) shows a steady increase toward high ROUGE scores, peaking at 100%. ```query-gen-msmarco-t5-base-v1``` (orange) has multiple low-score peaks, particularly in ROUGE-2, reinforcing its lower text overlap performance. These histograms confirm that ```checkpoint-34280``` consistently generates more accurate outputs.

Scores by generated query length

This visualization compares average ROUGE scores, and the differences between those scores, across generated query lengths measured in words. For lengths 2 to 8, ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum remain high and stable, and the score differences stay near zero, so performance is similar for shorter queries. At length 9 the scores drop sharply and the score differences turn negative, suggesting that longer generated queries align less well with the reference queries.
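A per-length breakdown of this kind can be sketched by bucketing each generated query by its word count and averaging a score within each bucket. The example below is a minimal illustration, not the code behind the plot: it uses a simplified unigram F1 (lowercased, whitespace-tokenized) in place of the real ROUGE implementation, the function names are hypothetical, and the three pairs are the target and fine-tuned outputs from the example table.

```python
from collections import Counter, defaultdict


def unigram_f1(target, generated):
    """Simplified ROUGE-1 F1: lowercase, whitespace tokens, no stemming."""
    ref = Counter(target.lower().split())
    hyp = Counter(generated.lower().split())
    overlap = sum((ref & hyp).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mean_score_by_length(pairs):
    """Average the score of each (target, generated) pair per generated length."""
    buckets = defaultdict(list)
    for target, generated in pairs:
        buckets[len(generated.split())].append(unigram_f1(target, generated))
    return {length: sum(s) / len(s) for length, s in sorted(buckets.items())}


# (target query, query generated after fine-tuning) from the example table.
pairs = [
    ("Dr. Scholl's Women's Trance Clog", "Dr. Scholl's Trance Clog"),
    ("girls mermaid dress set", "Girls Mermaid Birthday Dress Set"),
    ("Saucony Omni 15 Women's Running Shoe", "Saucony Omni 15 Running Shoe"),
]

by_length = mean_score_by_length(pairs)
for length, score in by_length.items():
    print(f"{length} words: mean unigram F1 = {score:.4f}")
```

With only three pairs the averages say nothing about the model; the point is the grouping step, which scales unchanged to a full evaluation set.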

Semantic similarity distribution

This histogram shows the distribution of cosine similarity scores, which measure the semantic similarity between each generated query and its target query. The strong peak near 1.0 indicates that most pairs are semantically very close. Low similarity scores (0.0–0.4) are rare, suggesting that the generated queries seldom drift far from the meaning of their targets.
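For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below shows the computation on toy vectors; the embedding model used to encode the queries is not stated here, so the inputs are placeholders rather than real embeddings, and `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(u, v):
    """Return dot(u, v) / (|u| * |v|) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for query embeddings (assumption, not real data).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

print(f"same direction: {same_direction:.2f}")
print(f"orthogonal:     {orthogonal:.2f}")
```

Vectors pointing in the same direction score close to 1.0 regardless of magnitude, which is why a peak near 1.0 in the histogram is read as near-identical meaning.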

Semantic similarity score against ROUGE scores

This scatter plot matrix shows the relationship between semantic similarity (cosine similarity) and ROUGE scores. The two are positively correlated: pairs with higher similarity tend to have higher ROUGE scores. ROUGE-1 and ROUGE-L show the strongest alignment with similarity, while ROUGE-2 has greater variance. Some low-similarity outliers still achieve moderate ROUGE scores, suggesting surface-level word overlap without deep semantic alignment. Taken together, the plots show how closely semantic similarity tracks n-gram overlap metrics when evaluating generated queries.