deberta-v3-xsmall-zyda-2-transformed-readability-new

Model Overview

This model is a fine-tuned version of agentlans/deberta-v3-xsmall-zyda-2 designed to predict text readability. It achieves the following results on the evaluation set:

  • Loss: 0.0273
  • MSE: 0.0273

Dataset Description

The training dataset comprises approximately 800,000 paragraphs, each with corresponding readability metrics, drawn from four diverse sources:

  1. HuggingFace's Fineweb-Edu
  2. Ronen Eldan's TinyStories
  3. Wikipedia-2023-11-embed-multilingual-v3 (English only)
  4. ArXiv Abstracts-2021

  • Text length: 50 to 2,000 characters per paragraph
  • Readability grade: median of six readability metrics (Flesch-Kincaid, Gunning Fog, SMOG, Automated Readability Index, Coleman-Liau, Linsear Write)
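
The length filter and the median-of-six grade described above can be sketched as follows. This is a minimal illustration, assuming the six metric values have already been computed by an external tool. The function names `keep_paragraph` and `median_grade`, the inclusive length bounds, and the example numbers are hypothetical and not taken from the original data pipeline.

```python
from statistics import median

# Length bounds stated above; treating them as inclusive is an assumption.
MIN_CHARS, MAX_CHARS = 50, 2000

def keep_paragraph(text: str) -> bool:
    """Return True if the paragraph length falls within the stated bounds."""
    return MIN_CHARS <= len(text) <= MAX_CHARS

def median_grade(metric_grades: dict) -> float:
    """Median of the six per-metric U.S. grade estimates."""
    if len(metric_grades) != 6:
        raise ValueError("expected six readability metrics")
    return median(metric_grades.values())

# Hypothetical metric values for one paragraph (illustration only).
example = {
    "flesch_kincaid": 8.2,
    "gunning_fog": 10.1,
    "smog": 9.4,
    "automated_readability_index": 7.9,
    "coleman_liau": 8.8,
    "linsear_write": 9.0,
}
print(median_grade(example))
```

With an even number of metrics, the median is the average of the two middle values, so a single outlying metric has little effect on the final grade.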

Data Transformation

  • U.S. reading grade levels were transformed using the Box-Cox method (λ = 0.8766912)
  • The transformed values were standardized (mean subtracted, divided by the standard deviation) and then negated to produce 'readability' scores
  • Higher scores indicate easier readability

Transformation Statistics

  • λ (lambda) = 0.8766912
  • Mean of the Box-Cox-transformed grades (before standardization) = 7.908629
  • Standard deviation of the Box-Cox-transformed grades (before standardization) = 3.339119
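
The forward mapping from grade level to readability score can be sketched from these statistics. This is a minimal sketch in pure Python, assuming the mean and standard deviation apply to the Box-Cox-transformed grades, as the inverse function in the usage example implies. The names `readability_from_grade` and `grade_from_readability` are illustrative and not part of the released code.

```python
# Constants from the Transformation Statistics above.
LAMBDA = 0.8766912
MEAN = 7.908629   # mean of the Box-Cox-transformed grades
SD = 3.339119     # standard deviation of the Box-Cox-transformed grades

def readability_from_grade(grade: float) -> float:
    """Grade level -> Box-Cox -> standardize -> negate (grade must be positive)."""
    boxcox = (grade ** LAMBDA - 1.0) / LAMBDA
    return -(boxcox - MEAN) / SD

def grade_from_readability(score: float) -> float:
    """Inverse mapping: undo the negation and standardization, then invert Box-Cox."""
    boxcox = -score * SD + MEAN
    return (boxcox * LAMBDA + 1.0) ** (1.0 / LAMBDA)

print(round(readability_from_grade(19.5), 2))
```

A grade of 19.5 maps to a score of about -1.91, which agrees with the last row of the Sample Outputs.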

Usage Example

import torch
import numpy as np
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model and tokenizer
model_name = "agentlans/deberta-v3-xsmall-zyda-2-readability"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prediction function
def predict_score(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.item()

# Grade level conversion function
def grade_level(y):
    lambda_, mean, sd = 0.8766912, 7.908629, 3.339119
    y_unstd = (-y) * sd + mean
    return np.power((y_unstd * lambda_ + 1), (1 / lambda_))

# Example
input_text = "The mitochondria is the powerhouse of the cell."
readability = predict_score(input_text)
grade = grade_level(readability)
print(f"Predicted score: {readability:.2f}\nGrade: {grade:.1f}")

Sample Outputs

| Text | Readability | Grade |
|---|---|---|
| I like to eat apples. | 2.21 | 1.6 |
| The cat is on the mat. | 2.17 | 1.7 |
| Birds are singing in the trees. | 2.05 | 2.1 |
| The sun is shining brightly today. | 1.95 | 2.5 |
| She enjoys reading books in her free time. | 1.84 | 2.9 |
| The quick brown fox jumps over the lazy dog. | 1.75 | 3.2 |
| After a long day at work, he finally relaxed with a cup of tea. | 1.21 | 5.4 |
| As the storm approached, the sky turned a deep shade of gray, casting an eerie shadow over the landscape. | 0.54 | 8.2 |
| Despite the challenges they faced, the team remained resolute in their pursuit of excellence and innovation. | -0.52 | 13.0 |
| In a world increasingly dominated by technology, the delicate balance between human connection and digital interaction has become a focal point of contemporary discourse. | -1.91 | 19.5 |

Training Procedure

Hyperparameters

  • Learning rate: 5e-05
  • Train batch size: 64
  • Eval batch size: 8
  • Seed: 42
  • Optimizer: AdamW (betas=(0.9,0.999), epsilon=1e-08)
  • LR scheduler: Linear
  • Number of epochs: 3.0
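
For readers who want to reproduce a comparable run, the hyperparameters above map onto Hugging Face `TrainingArguments` keyword arguments roughly as follows. This is a sketch under assumptions: that the run used the `Trainer` API, that the batch sizes are per-device values, and that `"readability-model"` is a placeholder output path. None of these are confirmed by the card.

```python
# Keyword arguments mirroring the hyperparameters listed above.
# Assumptions: Hugging Face Trainer was used, batch sizes are per-device,
# and "readability-model" is a placeholder output directory.
training_kwargs = {
    "output_dir": "readability-model",
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 64,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
}

# With transformers installed, these can be passed straight through:
# from transformers import TrainingArguments
# args = TrainingArguments(**training_kwargs)
```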

Training Results

| Training Loss | Epoch | Step | Validation Loss | MSE |
|---|---|---|---|---|
| 0.0297 | 1.0 | 13589 | 0.0302 | 0.0302 |
| 0.0249 | 2.0 | 27178 | 0.0279 | 0.0279 |
| 0.0218 | 3.0 | 40767 | 0.0273 | 0.0273 |
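
To get a feel for what the final MSE means in grade levels, the error can be converted back through the inverse transform. This is a rough sketch, assuming the MSE is measured on the standardized readability scores. Score 0 is used as an arbitrary reference point, and the names below are illustrative.

```python
import math

# Constants from the Transformation Statistics section.
LAMBDA, MEAN, SD = 0.8766912, 7.908629, 3.339119

def grade_level(score: float) -> float:
    """Same inverse transform as in the usage example, in pure Python."""
    boxcox = -score * SD + MEAN
    return (boxcox * LAMBDA + 1.0) ** (1.0 / LAMBDA)

final_mse = 0.0273
rmse = math.sqrt(final_mse)  # typical error in standardized score units

# Grade-level spread around a text whose true score is 0 (arbitrary reference).
center = grade_level(0.0)
low, high = grade_level(rmse), grade_level(-rmse)
print(f"RMSE: {rmse:.3f} score units")
print(f"Grade at score 0: {center:.1f} (about {low:.1f} to {high:.1f} within one RMSE)")
```

Because the Box-Cox transform is nonlinear, the same score error corresponds to a different grade-level spread at other points on the scale.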

Framework Versions

  • Transformers: 4.46.3
  • PyTorch: 2.5.1+cu124
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3