dotan1111 committed 8716dc4 (parent: 628cace)

Upload 2 files

Files changed (2):
  1. README.md +52 -0
  2. tokenizer.json +830 -0

README.md ADDED
---
tags:
- biology
- bioinformatics
- tokenizers
---
# Effect of Tokenization on Transformers for Biological Sequences
## Abstract:
Deep learning models are transforming biological research. Many bioinformatics and comparative genomics algorithms analyze genomic data, either DNA or protein sequences. Examples include sequence alignments, phylogenetic tree inference and automatic classification of protein functions. Among these deep learning algorithms, models for processing natural languages, developed in the natural language processing (NLP) community, were recently applied to biological sequences. However, biological sequences are different from natural languages, such as English and French, in which segmentation of the text into separate words is relatively straightforward. Moreover, biological sequences are characterized by extremely long sentences, which hamper their processing by current machine-learning models, notably the transformer architecture. In NLP, one of the first processing steps is to transform the raw text into a list of tokens. Deep-learning applications to biological sequence data mostly segment proteins and DNA into single characters. In this work, we study the effect of alternative tokenization algorithms on eight different tasks in biology, from predicting the function of proteins and their stability, through nucleotide sequence alignment, to classifying proteins into specific families. We demonstrate that applying alternative tokenization algorithms can increase accuracy and, at the same time, substantially reduce the input length compared to the trivial tokenizer in which each character is a token. Furthermore, applying these tokenization algorithms allows interpreting trained models while taking into account dependencies among positions. Finally, we trained these tokenizers on a large dataset of protein sequences containing more than 400 billion amino acids, which resulted in over a three-fold decrease in the number of tokens. We then tested these tokenizers trained on large-scale data on the above specific tasks and showed that for some tasks it is highly beneficial to train database-specific tokenizers. Our study suggests that tokenizers are likely to be a critical component in future deep-network analyses of biological sequence data.

![image](https://github.com/idotan286/BiologicalTokenizers/assets/58917533/d69893e2-7114-41a8-8d46-9b025b2d2840)

Different tokenization algorithms can be applied to biological sequences, as exemplified for the sequence “AAGTCAAGGATC”. (a) The baseline “words” tokenizer assumes a dictionary consisting of the nucleotides “A”, “C”, “G” and “T”; the length of the encoded sequence is 12, i.e., the number of nucleotides. (b) The “pairs” tokenizer assumes a dictionary consisting of all possible nucleotide pairs; the length of the encoded sequence is typically halved. (c) A sophisticated dictionary consisting of only three tokens: “AAG”, “TC” and “GA”; the encoded sequence under this dictionary contains only five tokens.
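The three dictionaries in the figure can be reproduced with a short longest-match-first sketch. This is an illustration only, not the paper's code: trained BPE/WordPiece/Unigram tokenizers apply their own segmentation rules, and the `greedy_tokenize` helper below is hypothetical.

```python
def greedy_tokenize(seq, vocab):
    """Longest-match-first segmentation of `seq` against a fixed token dictionary."""
    tokens, i = [], 0
    max_len = max(len(t) for t in vocab)
    while i < len(seq):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token covers position {i} of {seq!r}")
    return tokens

seq = "AAGTCAAGGATC"
print(greedy_tokenize(seq, {"A", "C", "G", "T"}))                  # (a) 12 single-nucleotide tokens
print(greedy_tokenize(seq, {"AA", "GT", "CA", "AG", "GA", "TC"}))  # (b) 6 pair tokens
print(greedy_tokenize(seq, {"AAG", "TC", "GA"}))                   # (c) ['AAG', 'TC', 'AAG', 'GA', 'TC']
```

The same 12-character sequence shrinks from 12 tokens to 5 as the dictionary grows more expressive, which is the compression effect the figure illustrates.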

## Data:
The "data" folder contains the train, validation and test splits of seven of the eight datasets used in the paper.
17
+
18
+ ## BFD Tokenizers:
19
+
20
+ We trained BPE, WordPiece and Unigram tokenizers on samples of proteins from the 2.2 billion protein sequences of the BFD dataset (Steinegger and Söding 2018). We evaluate the average sequences length as a function of the vocabulary size and number of sequences in the training data.
21
+
22
+ ![BFD_BPE_table](https://github.com/idotan286/BiologicalTokenizers/assets/58917533/710b7aa7-0dde-46bb-9ddf-39a84b579d71)
23
+ ![BFD_WPC_table](https://github.com/idotan286/BiologicalTokenizers/assets/58917533/8adfe5a7-25f5-4723-a87a-8598c6a76ff6)
24
+ ![BFD_UNI_table](https://github.com/idotan286/BiologicalTokenizers/assets/58917533/4462e782-0b21-4377-a5fe-309685141538)
25
+
26
+ Effect of vocabulary size and number of training samples on the three tokenizers: BPE, WordPiece and Unigram. The darker the color the higher the average number of tokens per protein. Increasing the vocabulary and the training size reduces the number of tokens per protein for all of the tested tokenizers.
27
+
28
+ We uploaded the "BFD_Tokenizers" which been trained on 10,000,000 sequences randomly sampled from the BFD datasset.
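The trend in the tables above, where a larger vocabulary yields fewer tokens per protein, can be sketched with a toy byte-pair-encoding trainer. This is a from-scratch illustration, not the implementation used to train the BFD tokenizers; the mini-corpus of protein fragments is made up.

```python
from collections import Counter

def bpe_train(sequences, num_merges):
    """Learn `num_merges` BPE merges; return the merges and the re-tokenized corpus."""
    corpus = [list(s) for s in sequences]  # start from single-character tokens
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for toks in corpus:
            pairs.update(zip(toks, toks[1:]))  # count adjacent token pairs
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]    # most frequent adjacent pair
        merges.append(a + b)
        new_corpus = []
        for toks in corpus:
            out, i = [], 0
            while i < len(toks):
                if i + 1 < len(toks) and (toks[i], toks[i + 1]) == (a, b):
                    out.append(a + b)          # apply the merge greedily left to right
                    i += 2
                else:
                    out.append(toks[i])
                    i += 1
            new_corpus.append(out)
        corpus = new_corpus
    return merges, corpus

# Hypothetical fragments sharing an "MKV" prefix (not real BFD data).
merges, corpus = bpe_train(["MKVLA", "MKVIG", "MKVAA"], num_merges=2)
avg_tokens = sum(len(t) for t in corpus) / len(corpus)
print(merges, avg_tokens)  # ['MK', 'MKV'] 3.0
```

Each merge added to the vocabulary shortens the encoded sequences (here from 5 tokens per sequence to 3), mirroring how the darker cells in the tables lighten as vocabulary size grows.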
29
+
30
+ ## Github
31
+
32
+ The code, datasets and trained tokenizers are available on https://github.com/idotan286/BiologicalTokenizers/.
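The uploaded tokenizer.json stores a Unigram model: a list of (token, log-probability) pairs over lowercased amino-acid strings. A minimal sketch of how such a model segments a sequence, via Viterbi search for the highest-scoring tokenization, is shown below; the toy vocabulary and log-probabilities are illustrative, not the trained values.

```python
import math

def unigram_segment(seq, vocab):
    """Viterbi search: the segmentation maximizing the sum of token log-probabilities."""
    n = len(seq)
    max_len = max(map(len, vocab))
    best = [-math.inf] * (n + 1)   # best[i] = best score of seq[:i]
    best[0] = 0.0
    back = [None] * (n + 1)        # back[i] = (start, token) of the last token used
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            tok = seq[j:i]
            if tok in vocab and best[j] + vocab[tok] > best[i]:
                best[i] = best[j] + vocab[tok]
                back[i] = (j, tok)
    if back[n] is None:
        raise ValueError("sequence cannot be segmented with this vocabulary")
    tokens, i = [], n
    while i > 0:                   # follow back-pointers to recover the path
        j, tok = back[i]
        tokens.append(tok)
        i = j
    return tokens[::-1], best[n]

# Toy vocabulary in the spirit of the figure's dictionary, lowercased as the
# tokenizer's Lowercase normalizer would produce; the scores are made up.
vocab = {"aag": -1.0, "tc": -1.5, "ga": -1.6,
         "a": -3.0, "c": -3.0, "g": -3.0, "t": -3.0}
tokens, score = unigram_segment("aagtcaaggatc", vocab)
print(tokens)  # ['aag', 'tc', 'aag', 'ga', 'tc']
```

Multi-character tokens carry higher log-probabilities than runs of single characters, so the search prefers them; this is why a trained Unigram vocabulary shortens encoded sequences.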

## APA

```
Dotan, E., Jaschek, G., Pupko, T., & Belinkov, Y. (2023). Effect of Tokenization on Transformers for Biological Sequences. bioRxiv. https://doi.org/10.1101/2023.08.15.553415
```

## BibTeX
```
@article{Dotan_Effect_of_Tokenization_2023,
  author = {Dotan, Edo and Jaschek, Gal and Pupko, Tal and Belinkov, Yonatan},
  doi = {10.1101/2023.08.15.553415},
  journal = {bioRxiv},
  month = aug,
  title = {{Effect of Tokenization on Transformers for Biological Sequences}},
  year = {2023}
}
```
tokenizer.json ADDED
{
  "version": "1.0",
  "truncation": null,
  "padding": null,
  "added_tokens": [
    {
      "id": 0,
      "content": "<UNK>",
      "single_word": false,
      "lstrip": false,
      "rstrip": false,
      "normalized": false,
      "special": true
    }
  ],
  "normalizer": {
    "type": "Lowercase"
  },
  "pre_tokenizer": {
    "type": "Whitespace"
  },
  "post_processor": null,
  "decoder": null,
  "model": {
    "type": "Unigram",
    "unk_id": 0,
    "vocab": [
      ["<UNK>", 0.0],
      ["y", -3.4247079552329573],
      ["h", -3.4912839749476063],
      ["a", -3.4928119486871907],
      ["q", -3.54294177673021],
      ["m", -3.5727061235339868],
      ["n", -3.6861399265992674],
      ["l", -3.6902237446793507],
      ["r", -3.709010589966521],
      ["f", -3.764879814809312],
      ["v", -3.814862110364423],
      ["g", -3.8544927179040798],
      ["p", -3.879554414552608],
      ["d", -3.9147195920056497],
      ["t", -3.9218713965741117],
      ["w", -3.9222956068813346],
      ["s", -3.9578207791967905],
      ["e", -3.9672879466924833],
      ["c", -3.9846198307959853],
      ["k", -3.9851137367724085],
      ["i", -4.193013703452998],
      ["aa", -4.436907022724377],
      ["la", -4.763564440735472],
      ["al", -4.858898617206826],
      ["ag", -4.871741733505836],
      ["ll", -4.899529457955385],
      ["rr", -4.9773160337094495],
      ["gg", -5.014989277014378],
      ["va", -5.109713869003169],
      ["ar", -5.138860598222161],
      ["av", -5.139874127757343],
      ["rl", -5.176546293029192],
      ["ga", -5.180699752082347],
      ["ra", -5.2000038343828034],
      ["lg", -5.2027465134290285],
      ["lv", -5.256125973713518],
      ["vl", -5.271594558844297],
      ["gl", -5.30076007905488],
      ["vv", -5.317697085475906],
      ["lr", -5.322075731145462],
      ["pa", -5.355656845053176],
      ["gr", -5.375270652971334],
      ["el", -5.418090097414771],
      ["sg", -5.430395962674904],
      ["gv", -5.436901106950042],
      ["ls", -5.4458744055610495],
      ["ss", -5.448111897232318],
      ["ae", -5.461435663548462],
      ["lp", -5.475994138455475],
      ["as", -5.476134811820398],
      ["ea", -5.485108044375101],
      ["sa", -5.500928466436452],
      ["ld", -5.508944898925408],
      ["sl", -5.514725532043448],
      ["vg", -5.518212225765366],
      ["dl", -5.523790559991012],
      ["ad", -5.529423509798146],
      ["rg", -5.535085863481557],
      ["da", -5.536839902357814],
      ["er", -5.53952607664727],
      ["at", -5.5447440770682555],
      ["ia", -5.562486066774282],
      ["ta", -5.562710078562095],
      ["ap", -5.576109295337275],
      ["pg", -5.5889310432053705],
      ["gs", -5.613078970833071],
      ["lt", -5.6256524683818725],
      ["tl", -5.625679592438182],
      ["dg", -5.6321727208112105],
      ["tg", -5.637149672051338],
      ["rs", -5.642993846504739],
      ["le", -5.646009973570775],
      ["rv", -5.65593574252734],
      ["rp", -5.690244716736588],
      ["pp", -5.699015023501083],
      ["vr", -5.713330922855734],
      ["gt", -5.724597713519122],
      ["x", -5.749277420153369],
      ["vt", -5.773882873295735],
      ["gd", -5.777222073452883],
      ["ve", -5.786282370291476],
      ["ai", -5.792878037369006],
      ["vd", -5.794675365344499],
      ["re", -5.79541316342204],
      ["ee", -5.795741716638281],
      ["vs", -5.7985576242235215],
      ["dv", -5.803070571126753],
      ["tv", -5.8066339742199276],
      ["ge", -5.816848413000429],
      ["pl", -5.818667195662142],
      ["ig", -5.8351234290286556],
      ["pv", -5.85521794340395],
      ["sp", -5.856237812350054],
      ["sr", -5.863093598657972],
      ["dr", -5.865626231550317],
      ["ev", -5.867222604064999],
      ["sv", -5.8821181233543385],
      ["tt", -5.888419823057225],
      ["vp", -5.893708098390535],
      ["rd", -5.902907686889787],
      ["st", -5.913895223643719],
      ["tp", -5.925474310054826],
      ["ts", -5.935259750541029],
      ["iv", -5.936481760515951],
      ["ps", -5.957155519695867],
      ["dp", -5.959226050226507],
      ["gi", -5.959289362288203],
      ["de", -5.97130066143788],
      ["pr", -5.979541473966986],
      ["li", -5.995759213159367],
      ["il", -6.010252230503628],
      ["kl", -6.022115205105928],
      ["rt", -6.032960790021843],
      ["eg", -6.039348346316448],
      ["ei", -6.040448664946604],
      ["ql", -6.041916049879731],
      ["id", -6.042274769013373],
      ["pd", -6.045243783203977],
      ["gp", -6.049733643791777],
      ["qa", -6.050023019599072],
      ["dd", -6.055774660433624],
      ["pe", -6.084564384204274],
      ["ri", -6.088601223547604],
      ["is", -6.100326314999879],
      ["ie", -6.109111961781512],
      ["ka", -6.121224426031684],
      ["lf", -6.131569198939003],
      ["kk", -6.135328802275913],
      ["vi", -6.145707458064743],
      ["fg", -6.147380503177732],
      ["fl", -6.152282406338413],
      ["fa", -6.182936168452729],
      ["pt", -6.183826012732906],
      ["qr", -6.195343435788958],
      ["sd", -6.207739044792362],
      ["af", -6.233639395340617],
      ["et", -6.235456716778566],
      ["gf", -6.243328452420121],
      ["si", -6.250022710702352],
      ["ek", -6.2547582409569],
      ["tr", -6.258601043179839],
      ["it", -6.265827079759829],
      ["ti", -6.2667690455532075],
      ["lk", -6.266837815225498],
      ["gk", -6.2791883478354045],
      ["aq", -6.285028901209582],
      ["ep", -6.294333027943258],
      ["ed", -6.295524393472979],
      ["ke", -6.328320481376119],
      ["nl", -6.333964002663926],
      ["es", -6.335631844356625],
      ["se", -6.345943093920235],
      ["ak", -6.3579406834980965],
      ["td", -6.376912272540318],
      ["kt", -6.381749192026097],
      ["kv", -6.383192653367846],
      ["lq", -6.386738517935324],
      ["fv", -6.39360783803135],
      ["qv", -6.396066490337569],
      ["sf", -6.39910526635248],
      ["fs", -6.40212185272803],
      ["rf", -6.412047799425322],
      ["ks", -6.4128110020819555],
      ["ng", -6.421427777161963],
      ["ki", -6.422202910211933],
      ["ip", -6.430876724365621],
      ["rq", -6.435834150272331],
      ["fd", -6.446507064108266],
      ["ir", -6.449152590874585],
      ["ln", -6.453108969541907],
      ["kr", -6.457387963642308],
      ["vf", -6.463731630061847],
      ["eq", -6.46524591662215],
      ["qp", -6.480794924169258],
      ["di", -6.509616735402286],
      ["kp", -6.510059533681824],
      ["na", -6.51587869796394],
      ["ds", -6.523174036007418],
      ["kg", -6.534881877135531],
      ["np", -6.550937681060169],
      ["gn", -6.566725084409789],
      ["gq", -6.567082940116801],
      ["kd", -6.568698331293467],
      ["qg", -6.569785852284269],
      ["te", -6.579979481653046],
      ["dt", -6.582071618443225],
      ["rk", -6.594085374063781],
      ["qs", -6.606515808555127],
      ["hl", -6.612070459132525],
      ["ii", -6.622224788525211],
      ["ft", -6.6381275899184775],
      ["nv", -6.661901139647316],
      ["df", -6.667420037978074],
      ["tf", -6.676255399402093],
      ["pi", -6.717126026714817],
      ["sn", -6.723550604685874],
      ["u", -17.611299956393843],
      ["b", -17.811943459429884],
      ["z", -19.052756709660283],
      ["o", -20.50275670966017]
    ]
  }
}