---
library_name: transformers
tags:
- CodonTransformer
- Computational Biology
- Machine Learning
- Bioinformatics
- Synthetic Biology
license: apache-2.0
pipeline_tag: token-classification
---

**CodonTransformer** is a multispecies codon optimization tool that transforms protein sequences into DNA sequences optimized for expression in your target organism. Whether you are a researcher or a practitioner in genetic engineering, CodonTransformer provides a comprehensive suite of features to facilitate your work. By combining the Transformer architecture with a user-friendly Jupyter notebook, it reduces the complexity of codon optimization, saving you time and effort.
<br>
**This is the pretrained model; for best results, please use the [finetuned model](https://huggingface.co/adibvafa/CodonTransformer).**
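A minimal sketch of loading the finetuned checkpoint instead of this base one, assuming the weights in the linked repository use the same `BigBirdForMaskedLM` architecture as this checkpoint:
```python
# Sketch (assumption): the finetuned weights at adibvafa/CodonTransformer
# load with the same BigBirdForMaskedLM class as this base checkpoint.
from transformers import AutoTokenizer, BigBirdForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer")
```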
## Authors
Adibvafa Fallahpour<sup>1,2</sup>\*, Vincent Gureghian<sup>3</sup>\*, Guillaume J. Filion<sup>2</sup>‡, Ariel B. Lindner<sup>3</sup>‡, Amir Pandi<sup>3</sup>‡ <br>
<sup>1</sup> Vector Institute for Artificial Intelligence, Toronto ON, Canada <br>
<sup>2</sup> University of Toronto Scarborough; Department of Biological Science; Scarborough ON, Canada <br>
<sup>3</sup> Université Paris Cité, INSERM U1284, Center for Research and Interdisciplinarity, F-75006 Paris, France <br>
\* These authors contributed equally to this work. <br>
‡ To whom correspondence should be addressed: <br>
guillaume.filion@utoronto.ca, ariel.lindner@inserm.fr, amir.pandi@cri-paris.org
<br>
## Use Case
**For a guide on finetuning CodonTransformer, check out our [GitHub.](https://github.com/Adibvafa/CodonTransformer/tree/main?tab=readme-ov-file#finetuning-codontransformer)**
<br>**For an interactive demo, check out our [Google Colab Notebook.](https://adibvafa.github.io/CodonTransformer/GoogleColab)**
<br>
After installing CodonTransformer, you can use:
```python
import torch
from transformers import AutoTokenizer, BigBirdForMaskedLM
from CodonTransformer.CodonPrediction import predict_dna_sequence
from CodonTransformer.CodonJupyter import format_model_output

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer-base").to(device)

# Set your input data
protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
organism = "Escherichia coli general"

# Predict with CodonTransformer
output = predict_dna_sequence(
    protein=protein,
    organism=organism,
    device=device,
    tokenizer=tokenizer,
    model=model,
    attention_type="original_full",
    deterministic=True,
)
print(format_model_output(output))
```
The output is:
<br>
```text
-----------------------------
| Organism |
-----------------------------
Escherichia coli general
-----------------------------
| Input Protein |
-----------------------------
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG
-----------------------------
| Processed Input |
-----------------------------
M_UNK A_UNK L_UNK W_UNK M_UNK R_UNK L_UNK L_UNK P_UNK L_UNK L_UNK A_UNK L_UNK L_UNK A_UNK L_UNK W_UNK G_UNK P_UNK D_UNK P_UNK A_UNK A_UNK A_UNK F_UNK V_UNK N_UNK Q_UNK H_UNK L_UNK C_UNK G_UNK S_UNK H_UNK L_UNK V_UNK E_UNK A_UNK L_UNK Y_UNK L_UNK V_UNK C_UNK G_UNK E_UNK R_UNK G_UNK F_UNK F_UNK Y_UNK T_UNK P_UNK K_UNK T_UNK R_UNK R_UNK E_UNK A_UNK E_UNK D_UNK L_UNK Q_UNK V_UNK G_UNK Q_UNK V_UNK E_UNK L_UNK G_UNK G_UNK __UNK
-----------------------------
| Predicted DNA |
-----------------------------
ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA
```
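The example above uses `deterministic=True`, so `predict_dna_sequence` returns a single, repeatable sequence. A minimal sketch of generating a few alternative candidates instead, assuming `deterministic=False` switches the prediction to stochastic codon sampling (check the function's docstring in your installed version):
```python
# Sketch: sample several candidate DNA sequences for the same protein.
# Assumption: deterministic=False makes predict_dna_sequence sample codons
# stochastically, so repeated calls can return different sequences.
candidates = [
    predict_dna_sequence(
        protein=protein,
        organism=organism,
        device=device,
        tokenizer=tokenizer,
        model=model,
        attention_type="original_full",
        deterministic=False,
    )
    for _ in range(3)
]
for i, candidate in enumerate(candidates, start=1):
    print(f"--- Candidate {i} ---")
    print(format_model_output(candidate))
```
If you need the raw sequence rather than the formatted report, a sketch of writing it to FASTA, assuming the returned prediction object exposes a `predicted_dna` attribute (the exact field name may differ in your installed version):
```python
# Sketch: write the predicted DNA to a FASTA file for downstream tools.
# Assumption: the prediction object exposes a .predicted_dna attribute.
with open("optimized_sequence.fasta", "w") as handle:
    handle.write(f">{organism} | CodonTransformer prediction\n")
    handle.write(output.predicted_dna + "\n")
```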
## Additional Resources
- **Project Website** <br>
https://adibvafa.github.io/CodonTransformer/
- **GitHub Repository** <br>
https://github.com/Adibvafa/CodonTransformer
- **Google Colab Demo** <br>
https://adibvafa.github.io/CodonTransformer/GoogleColab
- **PyPI Package** <br>
https://pypi.org/project/CodonTransformer/
- **Paper** <br>
https://www.nature.com/articles/s41467-025-58588-7
## Citation
```
@article{Fallahpour_Gureghian_Filion_Lindner_Pandi_2025,
  title={CodonTransformer: a multispecies codon optimizer using context-aware neural networks},
  volume={16},
  ISSN={2041-1723},
  url={https://www.nature.com/articles/s41467-025-58588-7},
  DOI={10.1038/s41467-025-58588-7},
  number={1},
  journal={Nature Communications},
  author={Fallahpour, Adibvafa and Gureghian, Vincent and Filion, Guillaume J. and Lindner, Ariel B. and Pandi, Amir},
  year={2025},
  month=apr,
  pages={3205},
  language={en}
}
```