license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - winograd_wsc
metrics:
  - rouge
model-index:
  - name: flan-t5-large-coref
    results:
      - task:
          name: Sequence-to-sequence Language Modeling
          type: text2text-generation
        dataset:
          name: winograd_wsc
          type: winograd_wsc
          config: wsc285
          split: test
          args: wsc285
        metrics:
          - name: Rouge1
            type: rouge
            value: 0.9495
widget:
  - text: Sam has a Parker pen. He loves writing with it.
    example_title: Example 1
  - text: >-
      Coronavirus quickly spread worldwide in 2020. The virus mostly affects
      elderly people. They can easily catch it.
    example_title: Example 2
  - text: >-
      First, the manager evaluates the candidates. Afterwards, he notifies the
      candidates regarding the evaluation.
    example_title: Example 3

flan-t5-large-coref

This model is a fine-tuned version of google/flan-t5-large on the winograd_wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2404
  • Rouge1: 0.9495
  • Rouge2: 0.9107
  • Rougel: 0.9494
  • Rougelsum: 0.9494
  • Gen Len: 23.4828
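
Rouge1 measures unigram overlap between a generated sequence and its reference. The sketch below is a simplified illustration only: it assumes lowercased whitespace tokenization and reports the F1 form of the score, whereas the `rouge` metric used for evaluation applies its own tokenization and aggregation. The function name `rouge1_f1` and the example strings are made up for illustration, and the sketch is not expected to reproduce the figures above exactly.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap with whitespace tokenization."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative strings, not model output.
score = rouge1_f1("the cat sat", "the cat")
print(round(score, 4))
```

A score near 1.0 means the generated text shares almost all of its words with the reference, which is why the metric rewards outputs that copy most of the input sentence.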

Model description

More information needed

Intended uses & limitations

More information needed
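
Until this section is filled in, the widget examples in the metadata suggest that the model takes plain English sentences as input. The following is a minimal sketch of running it with the Transformers library, under two assumptions that this card does not confirm: that the checkpoint is published on the Hub as `jtlicardo/flan-t5-large-coref`, and that no task prefix is required. The helper names `load_model` and `generate_text` are hypothetical, and the output format is not documented here.

```python
MODEL_ID = "jtlicardo/flan-t5-large-coref"  # assumed Hub repository id


def load_model(model_id: str = MODEL_ID):
    """Download the tokenizer and seq2seq model (requires network access)."""
    # Imported lazily so the helpers can be defined without loading anything.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def generate_text(text: str, tokenizer, model, max_new_tokens: int = 64) -> str:
    """Encode the input, generate one output sequence, and decode it."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def demo() -> None:
    """Run the first widget example through the model."""
    tokenizer, model = load_model()
    print(generate_text("Sam has a Parker pen. He loves writing with it.",
                        tokenizer, model))
```

Calling `demo()` downloads the checkpoint on first use. The `max_new_tokens=64` limit is an arbitrary choice that leaves headroom above the average generation length of about 23 tokens reported on the evaluation set.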

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
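
To make the `linear` scheduler setting concrete, here is a small sketch of how the learning rate would decay over training. It assumes no warmup (none is listed above) and 320 total optimizer steps, the final step count in the training results. The function `linear_lr` is a hypothetical helper written for illustration, not the Trainer's own code.

```python
def linear_lr(step: int,
              base_lr: float = 2e-05,
              total_steps: int = 320,
              warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr (unused when warmup_steps == 0).
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, the midpoint, and the end of training.
for step in (0, 160, 320):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 2e-05, is halved by step 160 (epoch 10), and reaches zero at step 320, which is consistent with the validation loss flattening out over the last few epochs.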

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 1.0169 | 1.0 | 16 | 0.6742 | 0.7918 | 0.6875 | 0.7836 | 0.7847 | 18.2414 |
| 0.6275 | 2.0 | 32 | 0.5093 | 0.8776 | 0.7947 | 0.8734 | 0.8732 | 21.5517 |
| 0.596 | 3.0 | 48 | 0.4246 | 0.9104 | 0.8486 | 0.9085 | 0.9091 | 22.5172 |
| 0.743 | 4.0 | 64 | 0.3632 | 0.9247 | 0.8661 | 0.9235 | 0.9231 | 22.8621 |
| 0.5007 | 5.0 | 80 | 0.3301 | 0.9353 | 0.8845 | 0.9357 | 0.9353 | 22.8621 |
| 0.2567 | 6.0 | 96 | 0.3093 | 0.9388 | 0.8962 | 0.9392 | 0.9388 | 22.9655 |
| 0.4146 | 7.0 | 112 | 0.2978 | 0.9449 | 0.907 | 0.9455 | 0.9458 | 23.1034 |
| 0.1991 | 8.0 | 128 | 0.2853 | 0.9454 | 0.9064 | 0.946 | 0.9462 | 23.069 |
| 0.1786 | 9.0 | 144 | 0.2794 | 0.9475 | 0.9097 | 0.9475 | 0.9477 | 23.069 |
| 0.3559 | 10.0 | 160 | 0.2701 | 0.9424 | 0.9013 | 0.9428 | 0.9426 | 23.0345 |
| 0.2059 | 11.0 | 176 | 0.2636 | 0.9472 | 0.9069 | 0.9472 | 0.9472 | 23.0345 |
| 0.199 | 12.0 | 192 | 0.2592 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.1634 | 13.0 | 208 | 0.2553 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.2006 | 14.0 | 224 | 0.2518 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.1419 | 15.0 | 240 | 0.2487 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.2089 | 16.0 | 256 | 0.2456 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.1007 | 17.0 | 272 | 0.2431 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.1598 | 18.0 | 288 | 0.2415 | 0.9495 | 0.9107 | 0.9494 | 0.9494 | 23.4828 |
| 0.3088 | 19.0 | 304 | 0.2407 | 0.9495 | 0.9107 | 0.9494 | 0.9494 | 23.4828 |
| 0.2003 | 20.0 | 320 | 0.2404 | 0.9495 | 0.9107 | 0.9494 | 0.9494 | 23.4828 |
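
The card does not describe how the data was split, but the step counts and Gen Len values above allow a rough consistency check. The sketch below is an inference, not a statement from the author. It assumes single-device training without gradient accumulation, that Gen Len is a per-example mean rounded to four decimals, and that the wsc285 configuration holds 285 examples, as its name suggests. The helper `infer_eval_size` is hypothetical.

```python
TRAIN_BATCH_SIZE = 16
STEPS_PER_EPOCH = 16   # the Step column advances by 16 per epoch
NUM_EPOCHS = 20
WSC285_SIZE = 285      # assumed from the configuration name

# With 16 steps per epoch, the last batch may be partial, so the
# training set has between 241 and 256 examples.
max_train = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE
min_train = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS


def infer_eval_size(mean_lengths, max_size=WSC285_SIZE):
    """Smallest n for which every rounded mean times n is close to an integer."""
    for n in range(1, max_size + 1):
        tolerance = 5e-05 * n + 1e-09  # rounding error of a 4-decimal mean
        if all(abs(m * n - round(m * n)) <= tolerance for m in mean_lengths):
            return n
    return None


# A few Gen Len values copied from the table above.
eval_size = infer_eval_size([18.2414, 21.5517, 23.1034, 23.4828])
print(min_train, max_train, total_steps, eval_size)
```

If the evaluation set does hold 29 examples, the remaining 285 - 29 = 256 would fit exactly into 16 batches of 16, which matches the step counts in the table.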

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.0+cu116
  • Datasets 2.7.1
  • Tokenizers 0.13.2