README.md · Phando/chemberta-v2-finetuned-uspto-50k-classification at 7208295c79306e3968b839bc61ce966730eb54d6

datasets:
  - Phando/uspto-50k
metrics:
  - accuracy
pipeline_tag: text-classification
tags:
  - chemistry

This ChemBERTa-v2 checkpoint was fine-tuned on the USPTO-50k dataset for sequence classification.

Specifically, the model predicts the reaction class label from a single input string: either all reactant SMILES or all product SMILES of a reaction, canonicalized and joined with ".".
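
As a rough illustration of this input format, the sketch below builds the two candidate input strings from a reaction SMILES and shows how one of them might be passed to the model. It assumes the reaction is written as `reactants>agents>products`, that the SMILES are already canonicalized (for example with RDKit; nothing is canonicalized here), and that the checkpoint loads through the generic `transformers` text-classification pipeline under the repository ID `Phando/chemberta-v2-finetuned-uspto-50k-classification`. The helper names `split_reaction` and `classify` are hypothetical.

```python
def split_reaction(reaction_smiles: str) -> tuple[str, str]:
    """Return (reactant_side, product_side) of a 'reactants>agents>products' string.

    Each side keeps its molecules joined by ".", the separator named above.
    Canonicalization is assumed to have happened upstream.
    """
    parts = reaction_smiles.strip().split(">")
    if len(parts) != 3:
        raise ValueError("expected a reaction SMILES of the form 'reactants>agents>products'")
    reactants, _agents, products = parts
    return reactants, products


def classify(smiles: str, model_id: str = "Phando/chemberta-v2-finetuned-uspto-50k-classification"):
    """Predict a reaction class for one "."-joined SMILES string.

    Assumption: the repository ships a tokenizer and a sequence-classification
    head that the generic pipeline can load. Needs `transformers` and network access.
    """
    from transformers import pipeline  # lazy import so split_reaction works without it

    classifier = pipeline("text-classification", model=model_id)
    return classifier(smiles)  # list of {"label": ..., "score": ...} dicts


# Illustrative reaction (acetic acid + ethanol -> ethyl acetate), no agents listed.
reactant_side, product_side = split_reaction("CC(=O)O.CCO>>CCOC(C)=O")
```

Calling `classify(reactant_side)` or `classify(product_side)` would then return the predicted label and its score; the label names the checkpoint emits are not documented here.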

Train/test split: 0.99/0.01

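
A 0.99/0.01 split holds out one example in a hundred for testing. The sketch below shows one way such a split could be drawn on row indices using only the standard library. The function name `split_indices` is hypothetical, and seeding the shuffle with 233 is an assumption, since the card does not say how the split was produced.

```python
import random


def split_indices(n_rows: int, test_fraction: float = 0.01, seed: int = 233):
    """Shuffle row indices deterministically and cut off a test portion."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG, global random state untouched
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


# Toy size for illustration; not the real dataset size.
train_idx, test_idx = split_indices(1000)
```
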
Evaluation results:
- Accuracy: 87.11%
- Loss: 0.4272

Fine-tuning hyperparameters:
- seed = 233
- batch-size = 128
- num_epochs = 5 (early stopping ended training at epoch 4)
- learning_rate = 5e-4
- warmup_steps = 64
- weight_decay = 0.01
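
For readers who want to set up a comparable run, the hyperparameters above could be mapped onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch, not the author's training script: the card does not say which trainer was used, treating 128 as the per-device batch size is an assumption, and `output_dir` and the helper name `build_training_args` are placeholders.

```python
# Hyperparameters from the list above, keyed by the corresponding
# `transformers.TrainingArguments` parameter names.
TRAINING_KWARGS = {
    "seed": 233,
    "per_device_train_batch_size": 128,  # assumption: 128 is per device, not a global batch
    "num_train_epochs": 5,               # the reported run stopped early at epoch 4
    "learning_rate": 5e-4,
    "warmup_steps": 64,
    "weight_decay": 0.01,
}


def build_training_args(output_dir: str = "chemberta-uspto50k-run"):
    """Create TrainingArguments; needs `transformers` with a PyTorch backend installed."""
    from transformers import TrainingArguments  # lazy import keeps the dict usable on its own

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Early stopping is not configured in this sketch. With the `Trainer` API it would typically be added through `EarlyStoppingCallback`, whose patience and monitored metric the card does not specify.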