---
language:
- en
license: mit
tags:
- generated_from_trainer
datasets:
- glue
metrics:
- matthews_correlation
model-index:
- name: xtremedistil-l12-h384-uncased-CoLA
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: GLUE COLA
      type: glue
      config: cola
      split: validation
      args: cola
    metrics:
    - name: Matthews Correlation
      type: matthews_correlation
      value: 0.5395539646127814

widget:
  - text: 'The cat sat on the mat.'
    example_title: Correct grammatical sentence
  - text: 'Me and my friend going to the store.'
    example_title: Missing auxiliary verb and incorrect pronoun case
  - text: 'I ain''t got no money.'
    example_title: Incorrect verb conjugation and double negative
  - text: 'She don''t like pizza no more.'
    example_title: Incorrect verb conjugation and double negative
  - text: 'They is arriving tomorrow.'
    example_title: Incorrect verb conjugation

---


# xtremedistil-l12-h384-uncased-CoLA

This model is a fine-tuned version of [microsoft/xtremedistil-l12-h384-uncased](https://huggingface.co/microsoft/xtremedistil-l12-h384-uncased) on the GLUE CoLA dataset.
It achieves the following results on the evaluation set, which correspond to epoch 4 (step 268) in the training results below:
- Loss: 0.4974
- Matthews Correlation: 0.5396
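
For readers unfamiliar with the metric, the following is a minimal sketch of how the Matthews correlation coefficient is computed for binary labels. It is not the evaluation code used for this model; the function name and the toy labels are illustrative assumptions.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (not taken from CoLA): 1 = acceptable, 0 = unacceptable.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(round(matthews_corrcoef(y_true, y_pred), 4))  # 0.4667
```

The score ranges from -1 to 1, with 0 meaning no better than chance, which makes it more informative than accuracy on a class-imbalanced dataset such as CoLA.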

## Model description

More information needed

## Intended uses & limitations

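The model classifies English sentences as linguistically acceptable or unacceptable, following the GLUE CoLA task. The only evaluation reported here is on the CoLA validation split.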
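
A minimal inference sketch using the Transformers `pipeline` API is shown below. The repository id in `MODEL_ID` is a placeholder, since this card does not state the namespace under which the model is published. The label mapping assumes the checkpoint keeps the default `LABEL_0`/`LABEL_1` names with the GLUE CoLA convention (1 = acceptable); check the model's `config.json` before relying on it.

```python
# Placeholder repository id: replace "your-namespace" with the real owner.
MODEL_ID = "your-namespace/xtremedistil-l12-h384-uncased-CoLA"

# Assumption: default label names, with GLUE CoLA's 1 = acceptable convention.
LABEL_NAMES = {"LABEL_0": "unacceptable", "LABEL_1": "acceptable"}


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily to keep this sketch light

    return pipeline("text-classification", model=model_id)


def judge(sentences, classifier):
    """Return (sentence, readable_label, score) for each input sentence."""
    sentences = list(sentences)
    results = classifier(sentences)
    return [
        (text, LABEL_NAMES.get(res["label"], res["label"]), res["score"])
        for text, res in zip(sentences, results)
    ]


# Example usage (requires network access and the real repository id):
# classifier = load_classifier()
# for row in judge(["The cat sat on the mat.", "They is arriving tomorrow."], classifier):
#     print(row)
```

Passing the classifier in as an argument keeps `judge` independent of how the model is loaded, so the same helper works with a locally saved checkpoint.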

## Training and evaluation data

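The model was fine-tuned on the `cola` configuration of the `glue` dataset. The reported metrics were computed on its `validation` split.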

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 128
- eval_batch_size: 16
- seed: 5559
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 16.0
- mixed_precision_training: Native AMP
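
To make the scheduler settings concrete, here is a sketch of a cosine schedule with linear warmup using the values listed above. It is a plain-Python approximation, not the training code: the total of 1072 steps is taken from the results table, and rounding the warmup step count up with `ceil` is an assumption about how the ratio is converted to steps.

```python
import math

BASE_LR = 1e-4        # learning_rate
TOTAL_STEPS = 1072    # 16 epochs x 67 steps per epoch, from the results table
WARMUP_RATIO = 0.03   # lr_scheduler_warmup_ratio

# Assumption: the warmup step count is the ratio times total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at a given optimizer step (linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, 268, 536, TOTAL_STEPS):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at 1e-4 after roughly the first half epoch and decays smoothly to zero by the final step.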

### Training results

| Training Loss | Epoch | Step | Validation Loss | Matthews Correlation |
|:-------------:|:-----:|:----:|:---------------:|:--------------------:|
| 0.4822        | 1.0   | 67   | 0.5893          | 0.2621               |
| 0.4669        | 2.0   | 134  | 0.5811          | 0.3722               |
| 0.3077        | 3.0   | 201  | 0.6150          | 0.4383               |
| 0.2594        | 4.0   | 268  | 0.4974          | 0.5396               |
| 0.2100        | 5.0   | 335  | 0.5594          | 0.5182               |
| 0.1526        | 6.0   | 402  | 0.5715          | 0.5150               |
| 0.1775        | 7.0   | 469  | 0.6637          | 0.5020               |
| 0.1681        | 8.0   | 536  | 0.6958          | 0.5131               |
| 0.1240        | 9.0   | 603  | 0.7057          | 0.5154               |
| 0.1111        | 10.0  | 670  | 0.8173          | 0.5074               |
| 0.1332        | 11.0  | 737  | 0.8253          | 0.5260               |
| 0.0673        | 12.0  | 804  | 0.8086          | 0.5180               |
| 0.0512        | 13.0  | 871  | 0.8409          | 0.5128               |
| 0.0457        | 14.0  | 938  | 0.8760          | 0.4947               |
| 0.0400        | 15.0  | 1005 | 0.8522          | 0.5103               |
| 0.0485        | 16.0  | 1072 | 0.8556          | 0.5076               |
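
The table can be read programmatically to see where validation performance peaks. The sketch below copies the rows above into a list and picks the epoch with the highest Matthews correlation and the one with the lowest validation loss; the variable names are illustrative.

```python
# (epoch, training_loss, validation_loss, matthews_correlation), copied from the table.
ROWS = [
    (1, 0.4822, 0.5893, 0.2621),
    (2, 0.4669, 0.5811, 0.3722),
    (3, 0.3077, 0.6150, 0.4383),
    (4, 0.2594, 0.4974, 0.5396),
    (5, 0.2100, 0.5594, 0.5182),
    (6, 0.1526, 0.5715, 0.5150),
    (7, 0.1775, 0.6637, 0.5020),
    (8, 0.1681, 0.6958, 0.5131),
    (9, 0.1240, 0.7057, 0.5154),
    (10, 0.1111, 0.8173, 0.5074),
    (11, 0.1332, 0.8253, 0.5260),
    (12, 0.0673, 0.8086, 0.5180),
    (13, 0.0512, 0.8409, 0.5128),
    (14, 0.0457, 0.8760, 0.4947),
    (15, 0.0400, 0.8522, 0.5103),
    (16, 0.0485, 0.8556, 0.5076),
]

best_mcc = max(ROWS, key=lambda row: row[3])   # highest Matthews correlation
best_loss = min(ROWS, key=lambda row: row[2])  # lowest validation loss

print(f"Best MCC: {best_mcc[3]} at epoch {best_mcc[0]}")
print(f"Lowest validation loss: {best_loss[2]} at epoch {best_loss[0]}")
```

Both criteria select epoch 4. After that point the training loss keeps falling while the validation loss rises, a pattern consistent with overfitting.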


### Framework versions

- Transformers 4.27.0.dev0
- Pytorch 1.13.1+cu117
- Datasets 2.8.0
- Tokenizers 0.13.1