Commit 3d6919f, committed by NaturalGradient
1 Parent(s): 8b0ebf4

Update README.md with appropriate references

Files changed (1)
  1. README.md +36 -17
README.md CHANGED
@@ -1,17 +1,36 @@
- ---
- license: cc-by-4.0
- ---
-
- # Protein Sequence Modelling with Bayesian Flow Networks
-
- Welcome to the model weights for the paper "Protein Sequence Modelling with Bayesian Flow Networks". Using the [code on our GitHub page](https://github.com/instadeepai/protein-sequence-bfn), you can sample from our trained models ProtBFN, for general proteins, and AbBFN, for antibody VH chains.
-
- [Bayesian Flow Networks](https://arxiv.org/abs/2308.07037) are a new approach to generative modelling, and can be viewed as an extension of diffusion models to the parameter space of probability distributions. They define a continuous-time process that maps between a naive prior distribution and a pseudo-deterministic posterior distribution for each variable independently. By training our neural network to 'denoise' the current posterior, taking into account mutual information between variables, we implicitly minimise a variational lower bound. We can then use our trained neural network to generate samples from the learned distribution.
-
- One of the benefits of defining such a process in probability parameter space is that it can be applied to *any* family of distributions with continuous-valued parameters. This means that BFNs can be directly applied to discrete data, allowing for diffusion-like generative modelling for sequences without restrictive left-to-right inductive biases or relying on discrete-time stochastic processes. The main focus of our work is to investigate the application of BFNs to *protein sequences*, as represented by a sequence of amino acids. The ProtBFN methodology is broadly summarised below:
-
- ![An overview of ProtBFN.](BFN_overview.png)
-
- Having trained ProtBFN, we find that it is exceptionally performant at unconditional generation of de novo protein sequences. For example, we find that we are able to rediscover a variety of structural motifs, according to structures predicted by ESMFold, with high sequence novelty:
-
- ![Cath hits for ProtBFN.](cath_s40_proteins.png)
+ ---
+ license: cc-by-4.0
+ ---
+
+ # Protein Sequence Modelling with Bayesian Flow Networks
+
+ Welcome to the model weights for the paper ["Protein Sequence Modelling with Bayesian Flow Networks"](https://www.biorxiv.org/content/10.1101/2024.09.24.614734v1). Using the [code on our GitHub page](https://github.com/instadeepai/protein-sequence-bfn), you can sample from our trained models ProtBFN, for general proteins, and AbBFN, for antibody VH chains.
+
+ [Bayesian Flow Networks](https://arxiv.org/abs/2308.07037) are a new approach to generative modelling, and can be viewed as an extension of diffusion models to the parameter space of probability distributions. They define a continuous-time process that maps between a naive prior distribution and a pseudo-deterministic posterior distribution for each variable independently. By training our neural network to 'denoise' the current posterior, taking into account mutual information between variables, we implicitly minimise a variational lower bound. We can then use our trained neural network to generate samples from the learned distribution.
+
+ One of the benefits of defining such a process in probability parameter space is that it can be applied to *any* family of distributions with continuous-valued parameters. This means that BFNs can be directly applied to discrete data, allowing for diffusion-like generative modelling for sequences without restrictive left-to-right inductive biases or relying on discrete-time stochastic processes. The main focus of our work is to investigate the application of BFNs to *protein sequences*, as represented by a sequence of amino acids. The ProtBFN methodology is broadly summarised below:
+
+ ![An overview of ProtBFN.](BFN_overview.png)
+
+ Having trained ProtBFN, we find that it is exceptionally performant at unconditional generation of de novo protein sequences. For example, we find that we are able to rediscover a variety of structural motifs, according to structures predicted by ESMFold, with high sequence novelty:
+
+ ![Cath hits for ProtBFN.](cath_s40_proteins.png)
+
+
+ ## Cite our work
+
+ If you have used ProtBFN or AbBFN in your work, you can cite us using the following BibTeX entry:
+
+ ```text
+ @article {Atkinson2024.09.24.614734,
+ author = {Atkinson, Timothy and Barrett, Thomas D. and Cameron, Scott and Guloglu, Bora and Greenig, Matthew and Robinson, Louis and Graves, Alex and Copoiu, Liviu and Laterre, Alexandre},
+ title = {Protein Sequence Modelling with Bayesian Flow Networks},
+ elocation-id = {2024.09.24.614734},
+ year = {2024},
+ doi = {10.1101/2024.09.24.614734},
+ publisher = {Cold Spring Harbor Laboratory},
+ URL = {https://www.biorxiv.org/content/early/2024/09/26/2024.09.24.614734},
+ eprint = {https://www.biorxiv.org/content/early/2024/09/26/2024.09.24.614734.full.pdf},
+ journal = {bioRxiv}
+ }
+ ```
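
The README text in this diff describes a process that moves each variable from a naive prior to a near-deterministic posterior in the parameter space of a categorical distribution. As a rough illustration only, the sketch below applies the discrete-data Bayesian update from the Bayesian Flow Networks paper to a single sequence position. It is not the ProtBFN implementation: the 20-letter alphabet, the accuracy schedule, and all function names (`sender_sample`, `bayesian_update`, `run_flow`) are assumptions made for this example, and the neural network that couples positions is omitted entirely.

```python
import math
import random

# Assumption: a plain 20-letter amino acid alphabet; the released models may
# use a different vocabulary (e.g. with special tokens).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)


def sender_sample(x_index, alpha, rng):
    """Draw a noisy observation y of class x_index at accuracy alpha.

    Follows the discrete-data sender distribution of the BFN paper:
    y ~ N(alpha * (K * e_x - 1), alpha * K * I).
    """
    std = math.sqrt(alpha * K)
    return [
        rng.gauss(alpha * (K * (1.0 if k == x_index else 0.0) - 1.0), std)
        for k in range(K)
    ]


def bayesian_update(log_theta, y):
    """Categorical Bayesian update, theta' proportional to exp(y) * theta.

    Kept in unnormalised log space so repeated updates cannot underflow.
    """
    return [lt + y_k for lt, y_k in zip(log_theta, y)]


def normalise(log_theta):
    """Convert unnormalised log-parameters into probabilities (softmax)."""
    top = max(log_theta)
    weights = [math.exp(lt - top) for lt in log_theta]
    total = sum(weights)
    return [w / total for w in weights]


def run_flow(x_index, alphas, seed=0):
    """Start from the uniform prior and apply one update per accuracy value."""
    rng = random.Random(seed)
    log_theta = [math.log(1.0 / K)] * K
    for alpha in alphas:
        log_theta = bayesian_update(log_theta, sender_sample(x_index, alpha, rng))
    return normalise(log_theta)


if __name__ == "__main__":
    # Hypothetical, hand-picked accuracy schedule for illustration only.
    schedule = [0.05 * (i + 1) for i in range(40)]
    theta = run_flow(AMINO_ACIDS.index("W"), schedule)
    best = max(range(K), key=lambda k: theta[k])
    print(AMINO_ACIDS[best], theta[best])
```

With an empty schedule the parameters stay at the uniform prior; as the accumulated accuracy grows, the probability mass concentrates on the observed residue. In a trained model, the network would read the parameters of all positions at once and predict the clean sequence, which is where the mutual information between variables enters.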