hdallatorre committed 870bbf9 (parent: 4260bc0): Update README.md

README.md CHANGED
@@ -68,6 +68,8 @@ probabilities = torch.nn.functional.softmax(logits, dim=-1)
 ## Training data
 
 The **segment-nt-30kb** model was trained on all human chromosomes except for chromosomes 20 and 21, kept as a test set, and chromosome 22, used as a validation set.
+During training, sequences are randomly sampled from the genome with their associated annotations. However, we keep the sequences in the validation and test sets fixed by
+using a sliding window of length 30,000 over chromosomes 20 and 21. The validation set was used to monitor training and for early stopping.
 
 ## Training procedure
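The split described in this hunk can be made concrete with a short sketch. The code below is illustrative only and is not from the SegmentNT repository: it assumes hg38-style chromosome names, assumes the sex chromosomes count among "all human chromosomes", and assumes non-overlapping windows, since the card does not state the stride of the sliding window.

```python
# Illustrative sketch of the chromosome split and fixed evaluation windows
# described above. All names and the window stride are assumptions.
TRAIN_CHROMS = [f"chr{i}" for i in range(1, 20)] + ["chrX", "chrY"]  # chr1-chr19 (+X/Y, assumed)
TEST_CHROMS = ["chr20", "chr21"]  # held out as the test set
VAL_CHROMS = ["chr22"]            # used to monitor training and for early stopping
WINDOW = 30_000                   # fixed 30kb windows for validation/test

def fixed_windows(chrom_seq: str, window: int = WINDOW, stride: int = WINDOW):
    """Yield (start, subsequence) windows over a chromosome; a stride equal
    to the window (non-overlapping) is an assumption, not stated in the card."""
    for start in range(0, len(chrom_seq) - window + 1, stride):
        yield start, chrom_seq[start:start + window]
```

Training examples, by contrast, would be drawn from random positions on the training chromosomes rather than from this fixed grid.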
@@ -81,7 +83,7 @@ The DNA sequences are tokenized using the Nucleotide Transformer Tokenizer, whic
 
 ### Training
 
-The model was trained on a DGXH100 on a total of 23B tokens. The model was trained on 3kb, 10kb, 20kb and finally 30kb sequences, at each time with an effective batch size of 256 sequences.
+The model was trained on a DGX H100 node with 8 GPUs on a total of 23B tokens for 3 days. It was trained on 3kb, 10kb, 20kb and finally 30kb sequences, each time with an effective batch size of 256 sequences.
 
 ### Architecture
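To make the curriculum figures concrete, the sketch below estimates tokens per optimizer step at each stage. It assumes the 6-mer tokenization of the Nucleotide Transformer tokenizer referenced in the hunk header; how many of the 23B tokens each stage consumed is not given in the card.

```python
# Hypothetical back-of-the-envelope for the length curriculum described above.
# Only the stage order, the 256-sequence effective batch size and the 23B-token
# total are stated in the card; the ~6 bp-per-token estimate is an assumption.
EFFECTIVE_BATCH_SIZE = 256
CURRICULUM_BP = [3_000, 10_000, 20_000, 30_000]  # stage sequence lengths in base pairs

for seq_bp in CURRICULUM_BP:
    tokens_per_seq = seq_bp // 6                        # ~6 bp per token (6-mers)
    tokens_per_step = EFFECTIVE_BATCH_SIZE * tokens_per_seq
    print(f"{seq_bp // 1000}kb stage: ~{tokens_per_step:,} tokens per optimizer step")
```

At the final 30kb stage, for example, each optimizer step would consume roughly 1.28M tokens under these assumptions.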