---
datasets:
- EvaKlimentova/knots_AF
license: apache-2.0
---

# M2 - small CNN trained on embeddings

The model is trained on [ProtBert-BFD](https://huggingface.co/Rostlab/prot_bert_bfd) embeddings of the [knots_AF dataset](https://huggingface.co/datasets/EvaKlimentova/knots_AF) to distinguish between knotted and unknotted proteins based on their amino acid sequence.

Performance on the test set (TPR = true positive rate, i.e. recall on knotted proteins; TNR = true negative rate, i.e. recall on unknotted proteins):

| Subset | Dataset size | Unknotted set size | Accuracy | TPR | TNR |
|:----------------------------:|:------------:|:------------------:|:--------:|:------:|:------:|
| All | 39412 | 19718 | 0.9690 | 0.9569 | 0.9811 |
| SPOUT | 7371 | 550 | 0.9712 | 0.9815 | 0.8436 |
| TDD | 612 | 24 | 0.9673 | 0.9796 | 0.6667 |
| DUF | 716 | 429 | 0.9413 | 0.8955 | 0.9720 |
| AdoMet synthase | 1794 | 240 | 0.9727 | 0.9755 | 0.9542 |
| Carbonic anhydrase | 1531 | 539 | 0.8870 | 0.8619 | 0.9332 |
| UCH | 477 | 125 | 0.8700 | 0.8892 | 0.8160 |
| ATCase/OTCase | 3799 | 3352 | 0.9932 | 0.9418 | 1.0000 |
| ribosomal-mitochondrial | 147 | 41 | 0.8163 | 0.8319 | 0.7805 |
| membrane | 8309 | 1577 | 0.9740 | 0.9857 | 0.9239 |
| VIT | 14347 | 12639 | 0.9742 | 0.8214 | 0.9948 |
| biosynthesis of lantibiotics | 392 | 286 | 0.9388 | 0.8019 | 0.9895 |
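The three metrics in the table can be recomputed from confusion-matrix counts. The sketch below is illustrative only and is not the evaluation code used for this model; it assumes knotted proteins are the positive class (label `1`) and unknotted proteins the negative class (label `0`). The names `Confusion`, `confusion_from_labels` and `metrics` are hypothetical helpers introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    """Confusion-matrix counts, assuming knotted = positive (1), unknotted = negative (0)."""
    tp: int  # knotted proteins predicted as knotted
    fn: int  # knotted proteins predicted as unknotted
    tn: int  # unknotted proteins predicted as unknotted
    fp: int  # unknotted proteins predicted as knotted


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Count the four outcomes from parallel lists of 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
    )


def metrics(c: Confusion) -> Dict[str, float]:
    """Accuracy, TPR (recall on knotted) and TNR (recall on unknotted)."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "tpr": c.tp / (c.tp + c.fn),
        "tnr": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts chosen to match the ATCase/OTCase row sizes
# (447 knotted, 3352 unknotted); they are reconstructed from the reported
# rates, not taken from the original evaluation.
m = metrics(Confusion(tp=421, fn=26, tn=3352, fp=0))
print({k: round(v, 4) for k, v in m.items()})
```

Because accuracy is the count-weighted mix of TPR and TNR, subsets dominated by one class (e.g. ATCase/OTCase, mostly unknotted) can show high accuracy even when the rate on the minority class is noticeably lower.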