Phando/chemberta-v2-finetuned-uspto-50k-classification



	---
	datasets:
	- Phando/uspto-50k
	metrics:
	- accuracy
	pipeline_tag: text-classification
	tags:
	- chemistry
	license: mit
	---

	This [ChemBERTa-v2](https://huggingface.co/seyonec/ChemBERTa_zinc250k_v2_40k) checkpoint was fine-tuned on the [USPTO-50k](https://huggingface.co/datasets/Phando/uspto-50k) dataset for sequence classification.

	Specifically, the objective is to predict the reaction class label. The input is the canonicalized SMILES of either all reactants or all products, with individual molecules separated by ".".
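The input format described above can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: the helper names `split_reaction`, `build_input`, and `classify` are hypothetical, the `reactants>>products` reaction SMILES layout is assumed, and canonicalization is left as a pluggable callable (RDKit is a common choice, but the card does not say which tool was used). The `classify` helper assumes the repository ships a tokenizer and config that the `transformers` text-classification pipeline can load.

```python
from typing import Callable, Iterable, Optional, Tuple

# Model id as shown on the model page.
MODEL_ID = "Phando/chemberta-v2-finetuned-uspto-50k-classification"


def split_reaction(reaction_smiles: str) -> Tuple[str, str]:
    """Split 'reactants>>products' (or 'reactants>agents>products') into both sides."""
    parts = reaction_smiles.strip().split(">")
    if len(parts) != 3:
        raise ValueError(f"not a reaction SMILES: {reaction_smiles!r}")
    return parts[0], parts[2]


def build_input(
    molecules: Iterable[str],
    canonicalize: Optional[Callable[[str], str]] = None,
) -> str:
    """Join one side of a reaction into a single '.'-separated string.

    `canonicalize` is a hypothetical hook; by default molecules pass through unchanged.
    """
    canon = canonicalize or (lambda smiles: smiles)
    return ".".join(canon(m) for m in molecules if m)


def classify(text: str):
    """Run the fine-tuned classifier on one input string (downloads the model)."""
    from transformers import pipeline  # imported lazily; only needed for inference

    classifier = pipeline("text-classification", model=MODEL_ID)
    return classifier(text)


if __name__ == "__main__":
    reactants, products = split_reaction("CC(=O)O.OCC>>CC(=O)OCC")
    print(build_input(reactants.split(".")))  # reactant-side input
    print(build_input(products.split(".")))   # product-side input
```

Either printed string could then be passed to `classify`; which side works better is not stated in the card.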

	- Train/Test split: 0.99/0.01

	- Evaluation results:
	  - Accuracy: 87.11%
	  - Loss: 0.4272

	- Fine-tuning hyperparameters:
	  - seed = 233
	  - batch-size = 128
	  - num_epochs = 5 (training stopped early at epoch 4)
	  - learning_rate = 5e-4
	  - warmup_steps = 64
	  - weight_decay = 0.01
	  - lr_scheduler_type = "cosine"
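To make the hyperparameters above concrete, the sketch below restates them as keyword arguments in the style of `transformers.TrainingArguments` and computes the learning rate that a linear-warmup, cosine-decay schedule would produce. Several things here are assumptions rather than facts from the card: that training used the Hugging Face `Trainer`, that "batch-size" corresponds to `per_device_train_batch_size`, and the value of `total_steps`, which the card does not report. The function `lr_at` is a hypothetical helper written to mirror the usual half-cosine schedule.

```python
import math

# Hyperparameters from the list above, using TrainingArguments-style names.
# Mapping "batch-size" to per_device_train_batch_size is an assumption.
TRAINING_KWARGS = {
    "seed": 233,
    "per_device_train_batch_size": 128,
    "num_train_epochs": 5,
    "learning_rate": 5e-4,
    "warmup_steps": 64,
    "weight_decay": 0.01,
    "lr_scheduler_type": "cosine",
}


def lr_at(step: int, total_steps: int,
          base_lr: float = TRAINING_KWARGS["learning_rate"],
          warmup_steps: int = TRAINING_KWARGS["warmup_steps"]) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the real step count is not given in the card
    for step in (0, 32, 64, 532, 1000):
        print(step, f"{lr_at(step, total_steps):.6f}")
```

With these settings the rate climbs linearly to 5e-4 over the first 64 steps and then decays along a cosine curve; because training stopped early, the final part of the decay would not have been reached.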