hdallatorre committed on
Commit 941645b · verified · 1 Parent(s): aa82c0b

Update README.md

  1. README.md +2 -1
README.md CHANGED
@@ -35,11 +35,12 @@ pip install --upgrade git+https://github.com/huggingface/transformers.git
 
 A small snippet of code is given here in order to retrieve both logits and embeddings from a dummy DNA sequence.
 
+```
 ⚠️ The maximum sequence length is set by default at the training length of 30,000 nucleotides, or 5001 tokens (accounting for the CLS token). However, Segment-NT has been shown to
 generalize up to sequences of 50,000 bp. In case you need to infer on sequences between 30kbp and 50kbp, make sure to change the `rescaling_factor` argument in the config
 to `num_dna_tokens_inference / max_num_tokens_nt` where `num_dna_tokens_inference` is the number of tokens at inference (i.e 6669 for a sequence of 40008 base pairs) and
 `max_num_tokens_nt` is the max number of tokens on which the backbone nucleotide-transformer was trained on, i.e `2048`.
-
+```
 ```python
 # Load model and tokenizer
 from transformers import AutoTokenizer, AutoModel