deberta-v3-xsmall-zyda-2-transformed-readability-new
Model Overview
This model is a fine-tuned version of agentlans/deberta-v3-xsmall-zyda-2 designed to predict text readability. It achieves the following results on the evaluation set:
- Loss: 0.0273
- MSE: 0.0273
Dataset Description
The dataset used for training comprises approximately 800 000 paragraphs with corresponding readability metrics from four diverse sources:
- HuggingFace's Fineweb-Edu
- Ronen Eldan's TinyStories
- Wikipedia-2023-11-embed-multilingual-v3 (English only)
- ArXiv Abstracts-2021
- Text Length: 50 to 2000 characters per paragraph
- Readability Grade: Median of six readability metrics (Flesch-Kincaid, Gunning Fog, SMOG, Automated Readability Index, Coleman-Liau, Linsear Write)
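The median step can be sketched as follows. This is only an illustration: the per-metric values are hypothetical numbers, not taken from the dataset, and the card does not say which library computed the six metrics.

```python
from statistics import median

# Hypothetical grade levels for one paragraph, one per metric (illustrative values only).
metric_grades = {
    "flesch_kincaid": 8.1,
    "gunning_fog": 10.4,
    "smog": 9.7,
    "automated_readability_index": 7.9,
    "coleman_liau": 9.0,
    "linsear_write": 8.6,
}

def median_grade(grades):
    """Return the median of the per-metric grade levels."""
    return median(grades.values())

readability_grade = median_grade(metric_grades)
print(f"Median grade: {readability_grade:.2f}")
```

With six metrics the median is the average of the two middle values, so a single outlying metric has little effect on the label.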
Data Transformation
- U.S. reading grade levels were transformed using the Box-Cox method (λ = 0.8766912)
- The transformed values were standardized (mean subtracted, divided by the standard deviation) and their sign was flipped to produce the 'readability' scores
- Higher scores indicate easier readability
Transformation Statistics
- λ (lambda) = 0.8766912
- Mean (before standardization) = 7.908629
- Standard deviation (before standardization) = 3.339119
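A minimal sketch of the forward transformation, assuming the standard Box-Cox form (x^λ - 1) / λ and that the mean and standard deviation above refer to the Box-Cox-transformed grades. The function names are illustrative and not part of the model's code.

```python
import math

LAMBDA = 0.8766912  # Box-Cox lambda listed above
MEAN = 7.908629     # mean before standardization
SD = 3.339119       # standard deviation before standardization

def grade_to_readability(grade):
    """Box-Cox transform a positive grade level, standardize it, then flip the sign."""
    transformed = (math.pow(grade, LAMBDA) - 1) / LAMBDA
    return -(transformed - MEAN) / SD

def readability_to_grade(score):
    """Undo the steps above to recover the grade level."""
    transformed = -score * SD + MEAN
    return math.pow(transformed * LAMBDA + 1, 1 / LAMBDA)

score = grade_to_readability(8.0)
print(f"Grade 8.0 -> readability {score:.2f}")
print(f"Round trip -> grade {readability_to_grade(score):.2f}")
```

Because of the sign flip, a higher grade level always maps to a lower readability score.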
Usage Example
import torch
import numpy as np
from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load model and tokenizer
model_name = "agentlans/deberta-v3-xsmall-zyda-2-readability"
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Prediction function
def predict_score(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.item()
# Grade level conversion function
def grade_level(y):
    lambda_, mean, sd = 0.8766912, 7.908629, 3.339119
    # Undo the sign flip and the standardization
    y_unstd = (-y) * sd + mean
    # Invert the Box-Cox transformation
    return np.power((y_unstd * lambda_ + 1), (1 / lambda_))
# Example
input_text = "The mitochondria is the powerhouse of the cell."
readability = predict_score(input_text)
grade = grade_level(readability)
print(f"Predicted score: {readability:.2f}\nGrade: {grade:.1f}")
Sample Outputs
Text | Readability | Grade |
---|---|---|
I like to eat apples. | 2.21 | 1.6 |
The cat is on the mat. | 2.17 | 1.7 |
Birds are singing in the trees. | 2.05 | 2.1 |
The sun is shining brightly today. | 1.95 | 2.5 |
She enjoys reading books in her free time. | 1.84 | 2.9 |
The quick brown fox jumps over the lazy dog. | 1.75 | 3.2 |
After a long day at work, he finally relaxed with a cup of tea. | 1.21 | 5.4 |
As the storm approached, the sky turned a deep shade of gray, casting an eerie shadow over the landscape. | 0.54 | 8.2 |
Despite the challenges they faced, the team remained resolute in their pursuit of excellence and innovation. | -0.52 | 13.0 |
In a world increasingly dominated by technology, the delicate balance between human connection and digital interaction has become a focal point of contemporary discourse. | -1.91 | 19.5 |
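The grades in this table can be reproduced from the readability scores with the constants from the Transformation Statistics section. The sketch below uses only the standard library. The guard for a negative base is an addition for illustration, not something the model card specifies: very high scores (above roughly 2.7) fall outside the range where the inverse Box-Cox transform is defined.

```python
import math

LAMBDA, MEAN, SD = 0.8766912, 7.908629, 3.339119

def grade_level(score):
    """Convert a readability score to a U.S. grade level (NaN if undefined)."""
    unstandardized = -score * SD + MEAN
    base = unstandardized * LAMBDA + 1
    if base < 0:
        # A negative base has no real-valued fractional power.
        return float("nan")
    return math.pow(base, 1 / LAMBDA)

# (readability, grade) pairs copied from the table above
samples = [(2.21, 1.6), (1.21, 5.4), (0.54, 8.2), (-0.52, 13.0), (-1.91, 19.5)]
for score, listed in samples:
    print(f"score {score:+.2f}: computed {grade_level(score):.2f}, listed {listed}")
```

Small differences from the listed grades are expected because the readability scores in the table are rounded to two decimals.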
Training Procedure
Hyperparameters
- Learning rate: 5e-05
- Train batch size: 64
- Eval batch size: 8
- Seed: 42
- Optimizer: AdamW (betas=(0.9,0.999), epsilon=1e-08)
- LR scheduler: Linear
- Number of epochs: 3.0
Training Results
Training Loss | Epoch | Step | Validation Loss | MSE |
---|---|---|---|---|
0.0297 | 1.0 | 13589 | 0.0302 | 0.0302 |
0.0249 | 2.0 | 27178 | 0.0279 | 0.0279 |
0.0218 | 3.0 | 40767 | 0.0273 | 0.0273 |
Framework Versions
- Transformers: 4.46.3
- PyTorch: 2.5.1+cu124
- Datasets: 3.1.0
- Tokenizers: 0.20.3
Model tree for agentlans/deberta-v3-xsmall-zyda-2-readability
Base model
microsoft/deberta-v3-xsmall
Finetuned
agentlans/deberta-v3-xsmall-zyda-2