davidmezzetti committed
Commit d5d82df · 1 Parent(s): d4d4888

Initial version

Files changed (4)
  1. README.md +237 -0
  2. config.json +1 -0
  3. model.safetensors +3 -0
  4. tokenizer.json +1715 -0
README.md ADDED
@@ -0,0 +1,237 @@
1
+ ---
2
+ pipeline_tag: sentence-similarity
3
+ tags:
4
+ - sentence-transformers
5
+ - feature-extraction
6
+ - sentence-similarity
7
+ - transformers
8
+ - embeddings
9
+ - static-embeddings
10
+ language: en
11
+ license: apache-2.0
12
+ ---
13
+
14
+ # PubMedBERT Embeddings 100K
15
+
16
+ This is a pruned version of [PubMedBERT Embeddings 2M](https://huggingface.co/NeuML/pubmedbert-base-embeddings-2M). The vocabulary is pruned to keep only the top 5% most frequently used tokens.
17
+
18
+ See [Extremely Small BERT Models from Mixed-Vocabulary Training](https://arxiv.org/abs/1909.11687) for background on pruning vocabularies to build smaller models.
19
+
20
+ ## Usage (txtai)
21
+
22
+ This model can be used to build embeddings databases with [txtai](https://github.com/neuml/txtai) for semantic search and/or as a knowledge source for retrieval augmented generation (RAG).
23
+
24
+ ```python
25
+ import txtai
26
+
27
+ # Create embeddings
28
+ embeddings = txtai.Embeddings(
29
+     path="neuml/pubmedbert-base-embeddings-100K",
30
+     content=True,
31
+ )
32
+ embeddings.index(documents())
33
+
34
+ # Run a query
35
+ embeddings.search("query to run")
36
+ ```
37
+
38
+ ## Usage (Sentence-Transformers)
39
+
40
+ Alternatively, the model can be loaded with [sentence-transformers](https://www.SBERT.net).
41
+
42
+ ```python
43
+ from sentence_transformers import SentenceTransformer
44
+ from sentence_transformers.models import StaticEmbedding
45
+
46
+ # Initialize a StaticEmbedding module
47
+ static = StaticEmbedding.from_model2vec("neuml/pubmedbert-base-embeddings-100K")
48
+ model = SentenceTransformer(modules=[static])
49
+
50
+ sentences = ["This is an example sentence", "Each sentence is converted"]
51
+ embeddings = model.encode(sentences)
52
+ print(embeddings)
53
+ ```
54
+
55
+ ## Usage (Model2Vec)
56
+
57
+ The model can also be used directly with Model2Vec.
58
+
59
+ ```python
60
+ from model2vec import StaticModel
61
+
62
+ # Load a pretrained Model2Vec model
63
+ model = StaticModel.from_pretrained("neuml/pubmedbert-base-embeddings-100K")
64
+
65
+ # Compute text embeddings
66
+ sentences = ["This is an example sentence", "Each sentence is converted"]
67
+ embeddings = model.encode(sentences)
68
+ print(embeddings)
69
+ ```
70
+
71
+ ## Evaluation Results
72
+
73
+ The following compares this model against the models previously evaluated with [PubMedBERT Embeddings](https://huggingface.co/NeuML/pubmedbert-base-embeddings#evaluation-results). The following datasets were used to evaluate model performance.
74
+
75
+ - [PubMed QA](https://huggingface.co/datasets/pubmed_qa)
76
+ - Subset: pqa_labeled, Split: train, Pair: (question, long_answer)
77
+ - [PubMed Subset](https://huggingface.co/datasets/awinml/pubmed_abstract_3_1k)
78
+ - Split: test, Pair: (title, text)
79
+ - _Note: The previously used [PubMed Subset](https://huggingface.co/datasets/zxvix/pubmed_subset_new) dataset is no longer available but a similar dataset is used here_
80
+ - [PubMed Summary](https://huggingface.co/datasets/scientific_papers)
81
+ - Subset: pubmed, Split: validation, Pair: (article, abstract)
82
+
83
+ The [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) is used as the evaluation metric.
84
+
85
+ | Model | PubMed QA | PubMed Subset | PubMed Summary | Average |
86
+ | -------------------------------------------------------------------------------------- | --------- | ------------- | -------------- | --------- |
87
+ | pubmedbert-base-embeddings-8M-M2V (No training) | 69.84 | 70.77 | 71.30 | 70.64 |
88
+ | [**pubmedbert-base-embeddings-100K**](https://hf.co/neuml/pubmedbert-base-embeddings-100K) | **74.56** | **84.65** | **81.84** | **80.35** |
89
+ | [pubmedbert-base-embeddings-500K](https://hf.co/neuml/pubmedbert-base-embeddings-500K) | 86.03 | 91.71 | 91.25 | 89.66 |
90
+ | [pubmedbert-base-embeddings-1M](https://hf.co/neuml/pubmedbert-base-embeddings-1M) | 87.87 | 92.80 | 92.87 | 91.18 |
91
+ | [pubmedbert-base-embeddings-2M](https://hf.co/neuml/pubmedbert-base-embeddings-2M) | 88.62 | 93.08 | 93.24 | 91.65 |
92
+
93
+ There is a steep drop in accuracy compared to the original unpruned model, although this model still scores higher than the naive distilled version without training.
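
The evaluation code is not part of this commit. As a minimal sketch of the metric only, assuming model similarity scores are compared against reference scores and that the table values are the coefficient scaled by 100, the Pearson correlation can be computed as follows. The function name `pearson` and the sample values are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # Center both series, then divide the covariance term by the product of norms
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical similarity scores from a model and matching reference labels
predicted = [0.91, 0.35, 0.78, 0.12]
reference = [1.0, 0.0, 1.0, 0.0]

# Scaled by 100 (assumed to match how the table reports scores)
print(round(pearson(predicted, reference) * 100, 2))
```
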
94
+
95
+ ## Runtime performance
96
+
97
+ As another test, let's see how long each model takes to index 120K article abstracts using the following code. All indexing is done with an RTX 3090 GPU.
98
+
99
+ ```python
100
+ from datasets import load_dataset
101
+ from tqdm import tqdm
102
+ from txtai import Embeddings
103
+
104
+ ds = load_dataset("ccdv/pubmed-summarization", split="train")
105
+
106
+ embeddings = Embeddings(path="path to model", content=True, backend="numpy")
107
+ embeddings.index(tqdm(ds["abstract"]))
108
+ ```
109
+
110
+ | Model | Model Size (MB) | Index time (s) |
111
+ | -------------------------------------------------------------------------------------- | ---------- | -------------- |
112
+ | [**pubmedbert-base-embeddings-100K**](https://hf.co/neuml/pubmedbert-base-embeddings-100K) | **0.2** | **19** |
113
+ | [pubmedbert-base-embeddings-500K](https://hf.co/neuml/pubmedbert-base-embeddings-500K) | 1.0 | 17 |
114
+ | [pubmedbert-base-embeddings-1M](https://hf.co/neuml/pubmedbert-base-embeddings-1M) | 2.0 | 17 |
115
+ | [pubmedbert-base-embeddings-2M](https://hf.co/neuml/pubmedbert-base-embeddings-2M) | 7.5 | 17 |
116
+
117
+ Vocabulary pruning leads to a slightly higher index time (19s vs. 17s), which is attributed to more tokens being needed to represent the same text. In exchange, the model is much smaller: vectors are stored at `int16` precision and the model is only 0.2 MB. This can benefit smaller, lower-powered embedded devices, where it could lead to faster vectorization times.
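
As a rough check on the 0.2 MB figure, the sketch below reproduces the arithmetic from the parameter budget (`params = 100000`) and dimension count (`dims = 64`) set in the training script, assuming 2 bytes per stored value. The helper name `embedding_budget` is hypothetical, and the estimate ignores file headers and any special tokens kept beyond the budget.

```python
def embedding_budget(params: int, dims: int, bytes_per_value: int = 2):
    """Return (token_count, matrix_bytes) for a static embedding matrix."""
    # Same integer division the pruning script uses to pick the token count
    tokens = params // dims
    # One row of `dims` values per token, each stored in `bytes_per_value` bytes
    return tokens, tokens * dims * bytes_per_value

tokens, size = embedding_budget(100_000, 64)
print(tokens, size)  # 1562 199936
```

Roughly 1,562 token vectors of 64 values each come to just under 200 KB, which is in line with the reported model size.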
118
+
119
+ ## Training
120
+
121
+ This model was vocabulary pruned using the following script.
122
+
123
+ ```python
124
+ import json
125
+ import os
126
+
127
+ from collections import Counter
128
+ from pathlib import Path
129
+
130
+ import numpy as np
131
+
132
+ from model2vec import StaticModel
133
+ from more_itertools import batched
134
+ from sklearn.decomposition import PCA
135
+ from tokenlearn.train import collect_means_and_texts
136
+ from tokenizers import Tokenizer
137
+ from tqdm import tqdm
138
+ from txtai.scoring import ScoringFactory
139
+
140
+ def tokenize(tokenizer):
141
+     # Tokenize into dataset
142
+     dataset = []
143
+     for t in tqdm(batched(texts, 1024)):
144
+         encodings = tokenizer.encode_batch_fast(t, add_special_tokens=False)
145
+         for e in encodings:
146
+             dataset.append((None, e.ids, None))
147
+
148
+     return dataset
149
+
150
+ def tokenweights(tokenizer):
151
+     dataset = tokenize(tokenizer)
152
+
153
+     # Build scoring index
154
+     scoring = ScoringFactory.create({"method": "bm25", "terms": True})
155
+     scoring.index(dataset)
156
+
157
+     # Calculate mean value of weights array per token
158
+     tokens = np.zeros(tokenizer.get_vocab_size())
159
+     for x in scoring.idf:
160
+         tokens[x] = np.mean(scoring.terms.weights(x)[1])
161
+
162
+     return tokens
163
+
164
+ # See PubMedBERT Embeddings 2M model for details on this data
165
+ features = "features"
166
+ paths = sorted(Path(features).glob("*.json"))
167
+ texts, _ = collect_means_and_texts(paths)
168
+
169
+ # Output model parameters
170
+ output = "output path"
171
+ params, dims = 100000, 64
172
+
173
+ path = "pubmedbert-base-embeddings-2M_unweighted"
174
+ model = StaticModel.from_pretrained(path)
175
+
176
+ os.makedirs(output, exist_ok=True)
177
+
178
+ with open(f"{path}/tokenizer.json", "r", encoding="utf-8") as f:
179
+     config = json.load(f)
180
+
181
+ # Calculate number of tokens to keep
182
+ tokencount = params // model.dim
183
+
184
+ # Calculate term frequency
185
+ freqs = Counter()
186
+ for _, ids, _ in tokenize(model.tokenizer):
187
+     freqs.update(ids)
188
+
189
+ # Select top N most common tokens
190
+ uids = set(x for x, _ in freqs.most_common(tokencount))
191
+ uids = [uid for token, uid in config["model"]["vocab"].items() if uid in uids or token.startswith("[")]
192
+
193
+ # Get embeddings for uids
194
+ model.embedding = model.embedding[uids]
195
+
196
+ # Select pruned tokens
197
+ pairs, index = [], 0
198
+ for token, uid in config["model"]["vocab"].items():
199
+     if uid in uids:
200
+         pairs.append((token, index))
201
+         index += 1
202
+
203
+ config["model"]["vocab"] = dict(pairs)
204
+
205
+ # Write new tokenizer
206
+ with open(f"{output}/tokenizer.json", "w", encoding="utf-8") as f:
207
+     json.dump(config, f, indent=2)
208
+
209
+ model.tokenizer = Tokenizer.from_file(f"{output}/tokenizer.json")
210
+
211
+ # Re-weight tokens
212
+ weights = tokenweights(model.tokenizer)
213
+
214
+ # Remove NaNs from embedding, if any
215
+ embedding = np.nan_to_num(model.embedding)
216
+
217
+ # Apply PCA
218
+ embedding = PCA(n_components=dims).fit_transform(embedding)
219
+
220
+ # Apply weights
221
+ embedding *= weights[:, None]
222
+
223
+ # Update model embedding and normalize
224
+ model.embedding, model.normalize = embedding.astype(np.int16), True
225
+
226
+ model.save_pretrained(output)
227
+ ```
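
The selection and renumbering steps in the script above can be tried on a toy vocabulary without any model files. This is a simplified sketch, not the script itself: `prune_vocab` and the sample data are hypothetical, and it assumes the vocabulary dictionary is ordered by token id, as it is in `tokenizer.json`.

```python
from collections import Counter

def prune_vocab(vocab, token_ids, keep):
    """Keep the `keep` most frequent ids plus bracketed special tokens, renumbered from 0."""
    freqs = Counter(token_ids)
    top = {uid for uid, _ in freqs.most_common(keep)}

    # Old ids to retain, in original vocabulary order (usable to slice embedding rows)
    kept = [uid for token, uid in vocab.items() if uid in top or token.startswith("[")]
    keptset = set(kept)

    # New vocabulary with contiguous ids
    pruned = {}
    for token, uid in vocab.items():
        if uid in keptset:
            pruned[token] = len(pruned)

    return pruned, kept

vocab = {"[PAD]": 0, "[UNK]": 1, "cell": 2, "gene": 3, "rare": 4, "the": 5}
ids = [5, 5, 5, 2, 2, 3]

pruned, kept = prune_vocab(vocab, ids, keep=2)
print(pruned)  # {'[PAD]': 0, '[UNK]': 1, 'cell': 2, 'the': 3}
print(kept)    # [0, 1, 2, 5]
```

Because `kept` preserves the original order, indexing an embedding matrix with it yields rows that line up with the new ids in `pruned`.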
228
+
229
+ ## Acknowledgement
230
+
231
+ This model is built on the great work from the [Minish Lab](https://github.com/MinishLab) team consisting of [Stephan Tulkens](https://github.com/stephantul) and [Thomas van Dongen](https://github.com/Pringled).
232
+
233
+ Read more at the following links.
234
+
235
+ - [Model2Vec](https://github.com/MinishLab/model2vec)
236
+ - [Tokenlearn](https://github.com/MinishLab/tokenlearn)
237
+ - [Minish Lab Blog](https://minishlab.github.io/)
config.json ADDED
@@ -0,0 +1 @@
1
+ {"model_type": "model2vec", "architectures": ["StaticModel"], "tokenizer_name": "neuml/pubmedbert-base-embeddings", "apply_pca": 64, "apply_zipf": true, "hidden_dim": 64, "seq_length": 1000000, "normalize": true}
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:940c40860999cc9e6349b73dfacc26be7f30d34e582679d1945e06ea0d9b8122
3
+ size 200408
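
The three lines above are a Git LFS pointer rather than the weights themselves. As a small sketch, assuming the simple `key value` line layout shown here, the pointer can be parsed to read the stored file size in bytes. The helper `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text):
    """Parse `key value` lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Split each line at the first space into key and value
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:940c40860999cc9e6349b73dfacc26be7f30d34e582679d1945e06ea0d9b8122
size 200408
"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 200408
```

A size of 200,408 bytes is consistent with the 0.2 MB listed in the README's runtime table.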
tokenizer.json ADDED
@@ -0,0 +1,1715 @@
1
+ {
2
+ "version": "1.0",
3
+ "truncation": null,
4
+ "padding": null,
5
+ "added_tokens": [
6
+ {
7
+ "id": 0,
8
+ "content": "[PAD]",
9
+ "single_word": false,
10
+ "lstrip": false,
11
+ "rstrip": false,
12
+ "normalized": false,
13
+ "special": true
14
+ },
15
+ {
16
+ "id": 1,
17
+ "content": "[UNK]",
18
+ "single_word": false,
19
+ "lstrip": false,
20
+ "rstrip": false,
21
+ "normalized": false,
22
+ "special": true
23
+ },
24
+ {
25
+ "id": 2,
26
+ "content": "[CLS]",
27
+ "single_word": false,
28
+ "lstrip": false,
29
+ "rstrip": false,
30
+ "normalized": false,
31
+ "special": true
32
+ },
33
+ {
34
+ "id": 3,
35
+ "content": "[SEP]",
36
+ "single_word": false,
37
+ "lstrip": false,
38
+ "rstrip": false,
39
+ "normalized": false,
40
+ "special": true
41
+ },
42
+ {
43
+ "id": 4,
44
+ "content": "[MASK]",
45
+ "single_word": false,
46
+ "lstrip": false,
47
+ "rstrip": false,
48
+ "normalized": false,
49
+ "special": true
50
+ }
51
+ ],
52
+ "normalizer": {
53
+ "type": "BertNormalizer",
54
+ "clean_text": true,
55
+ "handle_chinese_chars": true,
56
+ "strip_accents": null,
57
+ "lowercase": true
58
+ },
59
+ "pre_tokenizer": {
60
+ "type": "BertPreTokenizer"
61
+ },
62
+ "post_processor": {
63
+ "type": "TemplateProcessing",
64
+ "single": [
65
+ {
66
+ "SpecialToken": {
67
+ "id": "[CLS]",
68
+ "type_id": 0
69
+ }
70
+ },
71
+ {
72
+ "Sequence": {
73
+ "id": "A",
74
+ "type_id": 0
75
+ }
76
+ },
77
+ {
78
+ "SpecialToken": {
79
+ "id": "[SEP]",
80
+ "type_id": 0
81
+ }
82
+ }
83
+ ],
84
+ "pair": [
85
+ {
86
+ "SpecialToken": {
87
+ "id": "[CLS]",
88
+ "type_id": 0
89
+ }
90
+ },
91
+ {
92
+ "Sequence": {
93
+ "id": "A",
94
+ "type_id": 0
95
+ }
96
+ },
97
+ {
98
+ "SpecialToken": {
99
+ "id": "[SEP]",
100
+ "type_id": 0
101
+ }
102
+ },
103
+ {
104
+ "Sequence": {
105
+ "id": "B",
106
+ "type_id": 1
107
+ }
108
+ },
109
+ {
110
+ "SpecialToken": {
111
+ "id": "[SEP]",
112
+ "type_id": 1
113
+ }
114
+ }
115
+ ],
116
+ "special_tokens": {
117
+ "[CLS]": {
118
+ "id": "[CLS]",
119
+ "ids": [
120
+ 2
121
+ ],
122
+ "tokens": [
123
+ "[CLS]"
124
+ ]
125
+ },
126
+ "[SEP]": {
127
+ "id": "[SEP]",
128
+ "ids": [
129
+ 3
130
+ ],
131
+ "tokens": [
132
+ "[SEP]"
133
+ ]
134
+ }
135
+ }
136
+ },
137
+ "decoder": {
138
+ "type": "WordPiece",
139
+ "prefix": "##",
140
+ "cleanup": true
141
+ },
142
+ "model": {
143
+ "type": "WordPiece",
144
+ "unk_token": "[UNK]",
145
+ "continuing_subword_prefix": "##",
146
+ "max_input_chars_per_word": 100,
147
+ "vocab": {
148
+ "[PAD]": 0,
149
+ "[UNK]": 1,
150
+ "[CLS]": 2,
151
+ "[SEP]": 3,
152
+ "[MASK]": 4,
153
+ "\"": 5,
154
+ "%": 6,
155
+ "'": 7,
156
+ "(": 8,
157
+ ")": 9,
158
+ "+": 10,
159
+ ",": 11,
160
+ "-": 12,
161
+ ".": 13,
162
+ "/": 14,
163
+ "0": 15,
164
+ "1": 16,
165
+ "2": 17,
166
+ "3": 18,
167
+ "4": 19,
168
+ "5": 20,
169
+ "6": 21,
170
+ "7": 22,
171
+ "8": 23,
172
+ "9": 24,
173
+ ":": 25,
174
+ ";": 26,
175
+ "<": 27,
176
+ "=": 28,
177
+ ">": 29,
178
+ "?": 30,
179
+ "[": 31,
180
+ "]": 32,
181
+ "a": 33,
182
+ "b": 34,
183
+ "c": 35,
184
+ "d": 36,
185
+ "e": 37,
186
+ "f": 38,
187
+ "g": 39,
188
+ "h": 40,
189
+ "i": 41,
190
+ "k": 42,
191
+ "l": 43,
192
+ "m": 44,
193
+ "n": 45,
194
+ "o": 46,
195
+ "p": 47,
196
+ "r": 48,
197
+ "s": 49,
198
+ "t": 50,
199
+ "u": 51,
200
+ "v": 52,
201
+ "x": 53,
202
+ "y": 54,
203
+ "±": 55,
204
+ "·": 56,
205
+ "α": 57,
206
+ "β": 58,
207
+ "##r": 59,
208
+ "##4": 60,
209
+ "##d": 61,
210
+ "##0": 62,
211
+ "##1": 63,
212
+ "##3": 64,
213
+ "##z": 65,
214
+ "##y": 66,
215
+ "##b": 67,
216
+ "##5": 68,
217
+ "##6": 69,
218
+ "##w": 70,
219
+ "##a": 71,
220
+ "##v": 72,
221
+ "##e": 73,
222
+ "##n": 74,
223
+ "##h": 75,
224
+ "##l": 76,
225
+ "##8": 77,
226
+ "##9": 78,
227
+ "##7": 79,
228
+ "##2": 80,
229
+ "##g": 81,
230
+ "##c": 82,
231
+ "##t": 83,
232
+ "##u": 84,
233
+ "##i": 85,
234
+ "##p": 86,
235
+ "##m": 87,
236
+ "##s": 88,
237
+ "##o": 89,
238
+ "##f": 90,
239
+ "##x": 91,
240
+ "##k": 92,
241
+ "##q": 93,
242
+ "##er": 94,
243
+ "##on": 95,
244
+ "##en": 96,
245
+ "##es": 97,
246
+ "##ed": 98,
247
+ "the": 99,
248
+ "##in": 100,
249
+ "in": 101,
250
+ "##al": 102,
251
+ "##or": 103,
252
+ "an": 104,
253
+ "of": 105,
254
+ "##an": 106,
255
+ "##tion": 107,
256
+ "and": 108,
257
+ "##ar": 109,
258
+ "##as": 110,
259
+ "##ic": 111,
260
+ "##re": 112,
261
+ "##is": 113,
262
+ "##el": 114,
263
+ "##ent": 115,
264
+ "##ing": 116,
265
+ "to": 117,
266
+ "##ation": 118,
267
+ "##ol": 119,
268
+ "##os": 120,
269
+ "##le": 121,
270
+ "##ly": 122,
271
+ "st": 123,
272
+ "with": 124,
273
+ "##us": 125,
274
+ "for": 126,
275
+ "##id": 127,
276
+ "##th": 128,
277
+ "re": 129,
278
+ "pro": 130,
279
+ "as": 131,
280
+ "al": 132,
281
+ "##ce": 133,
282
+ "##ts": 134,
283
+ "is": 135,
284
+ "##ated": 136,
285
+ "was": 137,
286
+ "were": 138,
287
+ "that": 139,
288
+ "##um": 140,
289
+ "on": 141,
290
+ "##tr": 142,
291
+ "be": 143,
292
+ "##ity": 144,
293
+ "##ion": 145,
294
+ "##tic": 146,
295
+ "by": 147,
296
+ "##ate": 148,
297
+ "or": 149,
298
+ "##ment": 150,
299
+ "at": 151,
300
+ "us": 152,
301
+ "cell": 153,
302
+ "de": 154,
303
+ "##ant": 155,
304
+ "are": 156,
305
+ "20": 157,
306
+ "from": 158,
307
+ "we": 159,
308
+ "ad": 160,
309
+ "##ers": 161,
310
+ "##ase": 162,
311
+ "this": 163,
312
+ "##ine": 164,
313
+ "sp": 165,
314
+ "##te": 166,
315
+ "##ies": 167,
316
+ "per": 168,
317
+ "##ous": 169,
318
+ "co": 170,
319
+ "not": 171,
320
+ "##ial": 172,
321
+ "ph": 173,
322
+ "cells": 174,
323
+ "pre": 175,
324
+ "inter": 176,
325
+ "can": 177,
326
+ "##able": 178,
327
+ "10": 179,
328
+ "##ia": 180,
329
+ "trans": 181,
330
+ "patients": 182,
331
+ "all": 183,
332
+ "et": 184,
333
+ "these": 185,
334
+ "show": 186,
335
+ "high": 187,
336
+ "which": 188,
337
+ "##st": 189,
338
+ "study": 190,
339
+ "have": 191,
340
+ "##ally": 192,
341
+ "it": 193,
342
+ "between": 194,
343
+ "using": 195,
344
+ "##yl": 196,
345
+ "group": 197,
346
+ "protein": 198,
347
+ "also": 199,
348
+ "significant": 200,
349
+ "data": 201,
350
+ "he": 202,
351
+ "level": 203,
352
+ "no": 204,
353
+ "anti": 205,
354
+ "used": 206,
355
+ "been": 207,
356
+ "more": 208,
357
+ "than": 209,
358
+ "after": 210,
359
+ "has": 211,
360
+ "##ide": 212,
361
+ "results": 213,
362
+ "there": 214,
363
+ "may": 215,
364
+ "19": 216,
365
+ "control": 217,
366
+ "##ating": 218,
367
+ "two": 219,
368
+ "##one": 220,
369
+ "expression": 221,
370
+ "##ized": 222,
371
+ "other": 223,
372
+ "how": 224,
373
+ "but": 225,
374
+ "their": 226,
375
+ "treatment": 227,
376
+ "red": 228,
377
+ "sub": 229,
378
+ "both": 230,
379
+ "model": 231,
380
+ "low": 232,
381
+ "ca": 233,
382
+ "analysis": 234,
383
+ "over": 235,
384
+ "one": 236,
385
+ "##ct": 237,
386
+ "our": 238,
387
+ "function": 239,
388
+ "studies": 240,
389
+ "up": 241,
390
+ "gene": 242,
391
+ "min": 243,
392
+ "each": 244,
393
+ "time": 245,
394
+ "12": 246,
395
+ "form": 247,
396
+ "develop": 248,
397
+ "follow": 249,
398
+ "out": 250,
399
+ "however": 251,
400
+ "during": 252,
401
+ "different": 253,
402
+ "method": 254,
403
+ "such": 255,
404
+ "levels": 256,
405
+ "had": 257,
406
+ "system": 258,
407
+ "found": 259,
408
+ "health": 260,
409
+ "under": 261,
410
+ "only": 262,
411
+ "non": 263,
412
+ "compared": 264,
413
+ "based": 265,
414
+ "activity": 266,
415
+ "##ium": 267,
416
+ "associated": 268,
417
+ "addition": 269,
418
+ "into": 270,
419
+ "15": 271,
420
+ "cd": 272,
421
+ "##ization": 273,
422
+ "when": 274,
423
+ "##osis": 275,
424
+ "year": 276,
425
+ "present": 277,
426
+ "end": 278,
427
+ "use": 279,
428
+ "well": 280,
429
+ "specific": 281,
430
+ "mm": 282,
431
+ "effect": 283,
432
+ "its": 284,
433
+ "most": 285,
434
+ "increased": 286,
435
+ "11": 287,
436
+ "##ane": 288,
437
+ "mice": 289,
438
+ "further": 290,
439
+ "shown": 291,
440
+ "##ps": 292,
441
+ "observed": 293,
442
+ "number": 294,
443
+ "clinical": 295,
444
+ "cancer": 296,
445
+ "similar": 297,
446
+ "previous": 298,
447
+ "three": 299,
448
+ "first": 300,
449
+ "effects": 301,
450
+ "risk": 302,
451
+ "significantly": 303,
452
+ "reported": 304,
453
+ "disease": 305,
454
+ "term": 306,
455
+ "so": 307,
456
+ "could": 308,
457
+ "higher": 309,
458
+ "process": 310,
459
+ "13": 311,
460
+ "performed": 312,
461
+ "showed": 313,
462
+ "through": 314,
463
+ "where": 315,
464
+ "type": 316,
465
+ "14": 317,
466
+ "do": 318,
467
+ "they": 319,
468
+ "il": 320,
469
+ "human": 321,
470
+ "condition": 322,
471
+ "30": 323,
472
+ "genes": 324,
473
+ "age": 325,
474
+ "assess": 326,
475
+ "25": 327,
476
+ "16": 328,
477
+ "##oid": 329,
478
+ "response": 330,
479
+ "ml": 331,
480
+ "if": 332,
481
+ "test": 333,
482
+ "within": 334,
483
+ "18": 335,
484
+ "research": 336,
485
+ "mean": 337,
486
+ "groups": 338,
487
+ "some": 339,
488
+ "dna": 340,
489
+ "while": 341,
490
+ "24": 342,
491
+ "acid": 343,
492
+ "those": 344,
493
+ "proteins": 345,
494
+ "total": 346,
495
+ "micro": 347,
496
+ "among": 348,
497
+ "including": 349,
498
+ "related": 350,
499
+ "important": 351,
500
+ "induced": 352,
501
+ "individual": 353,
502
+ "main": 354,
503
+ "signal": 355,
504
+ "set": 356,
505
+ "years": 357,
506
+ "development": 358,
507
+ "binding": 359,
508
+ "who": 360,
509
+ "child": 361,
510
+ "17": 362,
511
+ "post": 363,
512
+ "target": 364,
513
+ "then": 365,
514
+ "increase": 366,
515
+ "50": 367,
516
+ "respectively": 368,
517
+ "rate": 369,
518
+ "role": 370,
519
+ "patient": 371,
520
+ "work": 372,
521
+ "multi": 373,
522
+ "samples": 374,
523
+ "mg": 375,
524
+ "factors": 376,
525
+ "normal": 377,
526
+ "complex": 378,
527
+ "tumor": 379,
528
+ "due": 380,
529
+ "did": 381,
530
+ "second": 382,
531
+ "100": 383,
532
+ "new": 384,
533
+ "long": 385,
534
+ "line": 386,
535
+ "lower": 387,
536
+ "will": 388,
537
+ "interaction": 389,
538
+ "drug": 390,
539
+ "changes": 391,
540
+ "same": 392,
541
+ "thus": 393,
542
+ "positive": 394,
543
+ "values": 395,
544
+ "mechanism": 396,
545
+ "small": 397,
546
+ "care": 398,
547
+ "methods": 399,
548
+ "treated": 400,
549
+ "although": 401,
550
+ "blood": 402,
551
+ "because": 403,
552
+ "differences": 404,
553
+ "potential": 405,
554
+ "ii": 406,
555
+ "following": 407,
556
+ "identified": 408,
557
+ "21": 409,
558
+ "growth": 410,
559
+ "infection": 411,
560
+ "described": 412,
561
+ "##ne": 413,
562
+ "05": 414,
563
+ "cases": 415,
564
+ "class": 416,
565
+ "obtained": 417,
566
+ "measure": 418,
567
+ "month": 419,
568
+ "activation": 420,
569
+ "less": 421,
570
+ "need": 422,
571
+ "nm": 423,
572
+ "lead": 424,
573
+ "22": 425,
574
+ "##oma": 426,
575
+ "therefore": 427,
576
+ "single": 428,
577
+ "tissue": 429,
578
+ "conditions": 430,
579
+ "would": 431,
580
+ "any": 432,
581
+ "standard": 433,
582
+ "##ness": 434,
583
+ "population": 435,
584
+ "without": 436,
585
+ "several": 437,
586
+ "week": 438,
587
+ "sample": 439,
588
+ "information": 440,
589
+ "presence": 441,
590
+ "factor": 442,
591
+ "approach": 443,
592
+ "days": 444,
593
+ "about": 445,
594
+ "95": 446,
595
+ "participants": 447,
596
+ "four": 448,
597
+ "day": 449,
598
+ "ms": 450,
599
+ "23": 451,
600
+ "previously": 452,
601
+ "##ri": 453,
602
+ "reduced": 454,
603
+ "region": 455,
604
+ "design": 456,
605
+ "species": 457,
606
+ "40": 458,
607
+ "review": 459,
608
+ "major": 460,
609
+ "measured": 461,
610
+ "trial": 462,
611
+ "period": 463,
612
+ "whether": 464,
613
+ "findings": 465,
614
+ "common": 466,
615
+ "included": 467,
616
+ "primary": 468,
617
+ "should": 469,
618
+ "rat": 470,
619
+ "brain": 471,
620
+ "children": 472,
621
+ "stress": 473,
622
+ "local": 474,
623
+ "size": 475,
624
+ "direct": 476,
625
+ "many": 477,
626
+ "case": 478,
627
+ "women": 479,
628
+ "organ": 480,
629
+ "evidence": 481,
630
+ "dependent": 482,
631
+ "large": 483,
632
+ "either": 484,
633
+ "range": 485,
634
+ "concentration": 486,
635
+ "value": 487,
636
+ "pattern": 488,
637
+ "before": 489,
638
+ "early": 490,
639
+ "os": 491,
640
+ "models": 492,
641
+ "part": 493,
642
+ "negative": 494,
643
+ "multiple": 495,
644
+ "mir": 496,
645
+ "experiments": 497,
646
+ "rna": 498,
647
+ "according": 499,
648
+ "receptor": 500,
649
+ "surface": 501,
650
+ "28": 502,
651
+ "demonstrated": 503,
652
+ "containing": 504,
653
+ "ir": 505,
654
+ "structure": 506,
655
+ "known": 507,
656
+ "difference": 508,
657
+ "26": 509,
658
+ "determined": 510,
659
+ "therapy": 511,
660
+ "functional": 512,
661
+ "change": 513,
662
+ "like": 514,
663
+ "37": 515,
664
+ "phase": 516,
665
+ "cm": 517,
666
+ "signaling": 518,
667
+ "expressed": 519,
668
+ "site": 520,
669
+ "vs": 521,
670
+ "pd": 522,
671
+ "likely": 523,
672
+ "strong": 524,
673
+ "result": 525,
674
+ "transcription": 526,
675
+ "current": 527,
676
+ "possible": 528,
677
+ "suggest": 529,
678
+ "months": 530,
679
+ "water": 531,
680
+ "27": 532,
681
+ "reaction": 533,
682
+ "29": 534,
683
+ "order": 535,
684
+ "000": 536,
685
+ "sequence": 537,
686
+ "analyses": 538,
687
+ "dose": 539,
688
+ "serum": 540,
689
+ "loss": 541,
690
+ "given": 542,
691
+ "35": 543,
692
+ "membrane": 544,
693
+ "decreased": 545,
694
+ "analyzed": 546,
695
+ "molecular": 547,
696
+ "60": 548,
697
+ "particular": 549,
698
+ "association": 550,
699
+ "intervention": 551,
700
+ "additional": 552,
701
+ "distribution": 553,
702
+ "might": 554,
703
+ "involved": 555,
704
+ "01": 556,
705
+ "down": 557,
706
+ "free": 558,
707
+ "concentrations": 559,
708
+ "subjects": 560,
709
+ "survival": 561,
710
+ "genetic": 562,
711
+ "being": 563,
712
+ "relationship": 564,
713
+ "life": 565,
714
+ "considered": 566,
715
+ "against": 567,
716
+ "contrast": 568,
717
+ "mouse": 569,
718
+ "available": 570,
719
+ "32": 571,
720
+ "improve": 572,
721
+ "since": 573,
722
+ "detected": 574,
723
+ "solution": 575,
724
+ "general": 576,
725
+ "##ae": 577,
726
+ "formation": 578,
727
+ "relative": 579,
728
+ "pcr": 580,
729
+ "very": 581,
730
+ "ed": 582,
731
+ "##ns": 583,
732
+ "state": 584,
733
+ "average": 585,
734
+ "body": 586,
735
+ "##ma": 587,
736
+ "ct": 588,
737
+ "ci": 589,
738
+ "production": 590,
739
+ "support": 591,
740
+ "pathway": 592,
741
+ "reduction": 593,
742
+ "behavior": 594,
743
+ "assay": 595,
744
+ "even": 596,
745
+ "responses": 597,
746
+ "weight": 598,
747
+ "rates": 599,
748
+ "indicated": 600,
749
+ "provide": 601,
750
+ "exposure": 602,
751
+ "overall": 603,
752
+ "ratio": 604,
753
+ "33": 605,
754
+ "sites": 606,
755
+ "area": 607,
756
+ "revealed": 608,
757
+ "here": 609,
758
+ "mass": 610,
759
+ "virus": 611,
760
+ "required": 612,
761
+ "determine": 613,
762
+ "family": 614,
763
+ "whereas": 615,
764
+ "domain": 616,
765
+ "medical": 617,
766
+ "various": 618,
767
+ "flow": 619,
768
+ "48": 620,
769
+ "greater": 621,
770
+ "prior": 622,
771
+ "mediated": 623,
772
+ "across": 624,
773
+ "individuals": 625,
774
+ "temperature": 626,
775
+ "regions": 627,
776
+ "via": 628,
777
+ "quality": 629,
778
+ "hospital": 630,
779
+ "least": 631,
780
+ "31": 632,
781
+ "length": 633,
782
+ "001": 634,
783
+ "36": 635,
784
+ "independent": 636,
785
+ "material": 637,
786
+ "sex": 638,
787
+ "followed": 639,
788
+ "consistent": 640,
789
+ "active": 641,
790
+ "network": 642,
791
+ "recent": 643,
792
+ "left": 644,
793
+ "position": 645,
794
+ "identify": 646,
795
+ "environment": 647,
796
+ "inhibition": 648,
797
+ "hiv": 649,
798
+ "mutations": 650,
799
+ "frequency": 651,
800
+ "developed": 652,
801
+ "mechanisms": 653,
802
+ "antibody": 654,
803
+ "scale": 655,
804
+ "step": 656,
805
+ "above": 657,
806
+ "death": 658,
807
+ "34": 659,
808
+ "lymph": 660,
809
+ "self": 661,
810
+ "compound": 662,
811
+ "presented": 663,
812
+ "resistance": 664,
813
+ "animals": 665,
814
+ "lines": 666,
815
+ "neurons": 667,
816
+ "field": 668,
+ "calculated": 669,
+ "controls": 670,
+ "cross": 671,
+ "liver": 672,
+ "examined": 673,
+ "diagnosis": 674,
+ "point": 675,
+ "wild": 676,
+ "indicate": 677,
+ "them": 678,
+ "plasma": 679,
+ "types": 680,
+ "times": 681,
+ "stage": 682,
+ "score": 683,
+ "pain": 684,
+ "derived": 685,
+ "vitro": 686,
+ "vivo": 687,
+ "experimental": 688,
+ "symptoms": 689,
+ "performance": 690,
+ "increasing": 691,
+ "mutant": 692,
+ "45": 693,
+ "defined": 694,
+ "conducted": 695,
+ "lung": 696,
+ "rats": 697,
+ "mrna": 698,
+ "tested": 699,
+ "kg": 700,
+ "bone": 701,
+ "weeks": 702,
+ "highly": 703,
+ "five": 704,
+ "lack": 705,
+ "regulation": 706,
+ "status": 707,
+ "medium": 708,
+ "collected": 709,
+ "effective": 710,
+ "right": 711,
+ "shows": 712,
+ "short": 713,
+ "hr": 714,
+ "correlation": 715,
+ "parameters": 716,
+ "provided": 717,
+ "furthermore": 718,
+ "##itis": 719,
+ "ability": 720,
+ "assessed": 721,
+ "outcomes": 722,
+ "social": 723,
+ "program": 724,
+ "cellular": 725,
+ "later": 726,
+ "focus": 727,
+ "##de": 728,
+ "light": 729,
+ "baseline": 730,
+ "sequences": 731,
+ "decrease": 732,
+ "bio": 733,
+ "interactions": 734,
+ "comparison": 735,
+ "chronic": 736,
+ "fold": 737,
+ "80": 738,
+ "applied": 739,
+ "does": 740,
+ "culture": 741,
+ "visual": 742,
+ "38": 743,
+ "breast": 744,
+ "immune": 745,
+ "evaluated": 746,
+ "initial": 747,
+ "imaging": 748,
+ "muscle": 749,
+ "inflammatory": 750,
+ "systems": 751,
+ "pressure": 752,
+ "aim": 753,
+ "volume": 754,
+ "fat": 755,
+ "characteristics": 756,
+ "surgery": 757,
+ "suggesting": 758,
+ "antibodies": 759,
+ "selected": 760,
+ "six": 761,
+ "images": 762,
+ "sensitivity": 763,
+ "detection": 764,
+ "better": 765,
+ "report": 766,
+ "glucose": 767,
+ "density": 768,
+ "##da": 769,
+ "impact": 770,
+ "variables": 771,
+ "peak": 772,
+ "key": 773,
+ "amino": 774,
+ "physical": 775,
+ "genome": 776,
+ "energy": 777,
+ "investigated": 778,
+ "trials": 779,
+ "statistical": 780,
+ "national": 781,
+ "few": 782,
+ "transfer": 783,
+ "##ta": 784,
+ "42": 785,
+ "experiment": 786,
+ "tumors": 787,
+ "resulting": 788,
+ "90": 789,
+ "index": 790,
+ "often": 791,
+ "pathways": 792,
+ "authors": 793,
+ "tissues": 794,
+ "properties": 795,
+ "mortality": 796,
+ "39": 797,
+ "scores": 798,
+ "limited": 799,
+ "70": 800,
+ "hand": 801,
+ "diseases": 802,
+ "absence": 803,
+ "fraction": 804,
+ "task": 805,
+ "made": 806,
+ "events": 807,
+ "play": 808,
+ "content": 809,
+ "combination": 810,
+ "outcome": 811,
+ "men": 812,
+ "poor": 813,
+ "biological": 814,
+ "cause": 815,
+ "isolated": 816,
+ "acute": 817,
+ "43": 818,
+ "studied": 819,
+ "strain": 820,
+ "seen": 821,
+ "moreover": 822,
+ "include": 823,
+ "areas": 824,
+ "food": 825,
+ "hours": 826,
+ "still": 827,
+ "measures": 828,
+ "suggested": 829,
+ "mutation": 830,
+ "corresponding": 831,
+ "confirmed": 832,
+ "diet": 833,
+ "approximately": 834,
+ "recently": 835,
+ "generated": 836,
+ "heart": 837,
+ "received": 838,
+ "male": 839,
+ "novel": 840,
+ "severe": 841,
+ "release": 842,
+ "infected": 843,
+ "suggests": 844,
+ "kinase": 845,
+ "tests": 846,
+ "animal": 847,
+ "processes": 848,
+ "image": 849,
+ "proliferation": 850,
+ "diabetes": 851,
+ "sd": 852,
+ "influence": 853,
+ "increases": 854,
+ "44": 855,
+ "strains": 856,
+ "injury": 857,
+ "cycle": 858,
+ "na": 859,
+ "phosphorylation": 860,
+ "rapid": 861,
+ "structures": 862,
+ "access": 863,
+ "secondary": 864,
+ "versus": 865,
+ "upon": 866,
+ "features": 867,
+ "assessment": 868,
+ "prevalence": 869,
+ "complete": 870,
+ "affect": 871,
+ "together": 872,
+ "receptors": 873,
+ "help": 874,
+ "functions": 875,
+ "75": 876,
+ "apoptosis": 877,
+ "training": 878,
+ "healthy": 879,
+ "populations": 880,
+ "central": 881,
+ "center": 882,
+ "directly": 883,
+ "real": 884,
+ "inhibitor": 885,
+ "injection": 886,
+ "enzyme": 887,
+ "future": 888,
+ "insulin": 889,
+ "full": 890,
+ "##se": 891,
+ "activated": 892,
+ "critical": 893,
+ "measurements": 894,
+ "patterns": 895,
+ "bp": 896,
+ "produced": 897,
+ "differentiation": 898,
+ "old": 899,
+ "side": 900,
+ "adult": 901,
+ "knowledge": 902,
+ "criteria": 903,
+ "regression": 904,
+ "staining": 905,
+ "indicating": 906,
+ "stimulation": 907,
+ "components": 908,
+ "conclusion": 909,
+ "median": 910,
+ "46": 911,
+ "administration": 912,
+ "estimated": 913,
+ "much": 914,
+ "caused": 915,
+ "hyper": 916,
+ "drugs": 917,
+ "young": 918,
+ "toward": 919,
+ "molecules": 920,
+ "peptide": 921,
+ "taken": 922,
+ "activities": 923,
+ "proposed": 924,
+ "testing": 925,
+ "power": 926,
+ "affected": 927,
+ "way": 928,
+ "good": 929,
+ "along": 930,
+ "female": 931,
+ "able": 932,
+ "proportion": 933,
+ "combined": 934,
+ "nuclear": 935,
+ "enhanced": 936,
+ "linear": 937,
+ "47": 938,
+ "terminal": 939,
+ "intensity": 940,
+ "synthesis": 941,
+ "therapeutic": 942,
+ "points": 943,
+ "detect": 944,
+ "management": 945,
+ "despite": 946,
+ "alcohol": 947,
+ "poly": 948,
+ "55": 949,
+ "whole": 950,
+ "damage": 951,
+ "alone": 952,
+ "compounds": 953,
+ "improved": 954,
+ "resulted": 955,
+ "setting": 956,
+ "relatively": 957,
+ "evaluate": 958,
+ "viral": 959,
+ "larger": 960,
+ "cognitive": 961,
+ "people": 962,
+ "incidence": 963,
+ "regulated": 964,
+ "iii": 965,
+ "understanding": 966,
+ "rt": 967,
+ "finally": 968,
+ "49": 969,
+ "structural": 970,
+ "efficacy": 971,
+ "amount": 972,
+ "41": 973,
+ "rather": 974,
+ "procedure": 975,
+ "fluorescence": 976,
+ "rs": 977,
+ "selection": 978,
+ "especially": 979,
+ "iv": 980,
+ "carried": 981,
+ "correlated": 982,
+ "ng": 983,
+ "states": 984,
+ "characterized": 985,
+ "established": 986,
+ "progression": 987,
+ "cohort": 988,
+ "reference": 989,
+ "statistically": 990,
+ "evaluation": 991,
+ "duration": 992,
+ "72": 993,
+ "predicted": 994,
+ "chemical": 995,
+ "stem": 996,
+ "history": 997,
+ "skin": 998,
+ "induction": 999,
+ "reduce": 1000,
+ "community": 1001,
+ "disorders": 1002,
+ "essential": 1003,
+ "maximum": 1004,
+ "gas": 1005,
+ "host": 1006,
+ "terms": 1007,
+ "literature": 1008,
+ "markers": 1009,
+ "air": 1010,
+ "cardiac": 1011,
+ "open": 1012,
+ "adults": 1013,
+ "particularly": 1014,
+ "screening": 1015,
+ "cost": 1016,
+ "means": 1017,
+ "65": 1018,
+ "products": 1019,
+ "public": 1020,
+ "percentage": 1021,
+ "head": 1022,
+ "objective": 1023,
+ "inhibitors": 1024,
+ "specifically": 1025,
+ "protocol": 1026,
+ "investigate": 1027,
+ "lipid": 1028,
+ "metabolic": 1029,
+ "56": 1030,
+ "context": 1031,
+ "distinct": 1032,
+ "relevant": 1033,
+ "internal": 1034,
+ "plant": 1035,
+ "remains": 1036,
+ "needed": 1037,
+ "interest": 1038,
+ "home": 1039,
+ "demonstrate": 1040,
+ "substrate": 1041,
+ "white": 1042,
+ "variation": 1043,
+ "lesions": 1044,
+ "subsequent": 1045,
+ "53": 1046,
+ "leading": 1047,
+ "memory": 1048,
+ "matrix": 1049,
+ "inflammation": 1050,
+ "sensitive": 1051,
+ "best": 1052,
+ "having": 1053,
+ "western": 1054,
+ "vascular": 1055,
+ "laboratory": 1056,
+ "bacterial": 1057,
+ "surgical": 1058,
+ "significance": 1059,
+ "abnormal": 1060,
+ "highest": 1061,
+ "provides": 1062,
+ "recorded": 1063,
+ "third": 1064,
+ "strength": 1065,
+ "failure": 1066,
+ "##to": 1067,
+ "acids": 1068,
+ "54": 1069,
+ "source": 1070,
+ "published": 1071,
+ "processing": 1072,
+ "component": 1073,
+ "alpha": 1074,
+ "mitochondrial": 1075,
+ "search": 1076,
+ "renal": 1077,
+ "designed": 1078,
+ "oxygen": 1079,
+ "natural": 1080,
+ "assays": 1081,
+ "action": 1082,
+ "exposed": 1083,
+ "depression": 1084,
+ "linked": 1085,
+ "prevent": 1086,
+ "phenotype": 1087,
+ "controlled": 1088,
+ "degree": 1089,
+ "paper": 1090,
+ "prepared": 1091,
+ "nf": 1092,
+ "successful": 1093,
+ "chain": 1094,
+ "what": 1095,
+ "elevated": 1096,
+ "beta": 1097,
+ "regarding": 1098,
+ "gel": 1099,
+ "contribute": 1100,
+ "hypothesis": 1101,
+ "necessary": 1102,
+ "last": 1103,
+ "daily": 1104,
+ "practice": 1105,
+ "series": 1106,
+ "stable": 1107,
+ "appropriate": 1108,
+ "occur": 1109,
+ "tnf": 1110,
+ "older": 1111,
+ "useful": 1112,
+ "reports": 1113,
+ "survey": 1114,
+ "metabolism": 1115,
+ "application": 1116,
+ "64": 1117,
+ "oral": 1118,
+ "onset": 1119,
+ "core": 1120,
+ "antigen": 1121,
+ "importance": 1122,
+ "syndrome": 1123,
+ "wide": 1124,
+ "96": 1125,
+ "materials": 1126,
+ "sequencing": 1127,
+ "ion": 1128,
+ "layer": 1129,
+ "around": 1130,
+ "strategies": 1131,
+ "52": 1132,
+ "education": 1133,
+ "half": 1134,
+ "induce": 1135,
+ "disorder": 1136,
+ "intracellular": 1137,
+ "59": 1138,
+ "technique": 1139,
+ "measurement": 1140,
+ "diagnostic": 1141,
+ "57": 1142,
+ "plants": 1143,
+ "quantitative": 1144,
+ "modified": 1145,
+ "capacity": 1146,
+ "continuous": 1147,
+ "bacteria": 1148,
+ "58": 1149,
+ "occurred": 1150,
+ "learning": 1151,
+ "environmental": 1152,
+ "agents": 1153,
+ "wall": 1154,
+ "identification": 1155,
+ "double": 1156,
+ "majority": 1157,
+ "transition": 1158,
+ "67": 1159,
+ "generation": 1160,
+ "motor": 1161,
+ "targets": 1162,
+ "experience": 1163,
+ "evolution": 1164,
+ "accumulation": 1165,
+ "basis": 1166,
+ "transport": 1167,
+ "product": 1168,
+ "led": 1169,
+ "world": 1170,
+ "delivery": 1171,
+ "procedures": 1172,
+ "longer": 1173,
+ "degradation": 1174,
+ "##co": 1175,
+ "strategy": 1176,
+ "resolution": 1177,
+ "51": 1178,
+ "approaches": 1179,
+ "specificity": 1180,
+ "yet": 1181,
+ "nature": 1182,
+ "must": 1183,
+ "##nt": 1184,
+ "global": 1185,
+ "resistant": 1186,
+ "course": 1187,
+ "problem": 1188,
+ "99": 1189,
+ "accuracy": 1190,
+ "grade": 1191,
+ "treatments": 1192,
+ "near": 1193,
+ "variants": 1194,
+ "strongly": 1195,
+ "techniques": 1196,
+ "epithelial": 1197,
+ "regulatory": 1198,
+ "problems": 1199,
+ "forms": 1200,
+ "methyl": 1201,
+ "marker": 1202,
+ "adjusted": 1203,
+ "63": 1204,
+ "66": 1205,
+ "exercise": 1206,
+ "attention": 1207,
+ "interval": 1208,
+ "examination": 1209,
+ "interventions": 1210,
+ "mixed": 1211,
+ "improvement": 1212,
+ "mainly": 1213,
+ "electron": 1214,
+ "media": 1215,
+ "cortex": 1216,
+ "channel": 1217,
+ "endothelial": 1218,
+ "physiological": 1219,
+ "sleep": 1220,
+ "unique": 1221,
+ "peripheral": 1222,
+ "alternative": 1223,
+ "calcium": 1224,
+ "mental": 1225,
+ "examine": 1226,
+ "complications": 1227,
+ "infections": 1228,
+ "females": 1229,
+ "leads": 1230,
+ "underlying": 1231,
+ "achieved": 1232,
+ "efficiency": 1233,
+ "enzymes": 1234,
+ "database": 1235,
+ "birth": 1236,
+ "uptake": 1237,
+ "complexes": 1238,
+ "migration": 1239,
+ "3d": 1240,
+ "cultures": 1241,
+ "carbon": 1242,
+ "stability": 1243,
+ "late": 1244,
+ "repair": 1245,
+ "responsible": 1246,
+ "potentially": 1247,
+ "extent": 1248,
+ "exhibited": 1249,
+ "eight": 1250,
+ "spatial": 1251,
+ "inhibited": 1252,
+ "produce": 1253,
+ "smoking": 1254,
+ "currently": 1255,
+ "little": 1256,
+ "countries": 1257,
+ "moderate": 1258,
+ "causes": 1259,
+ "school": 1260,
+ "appears": 1261,
+ "68": 1262,
+ "services": 1263,
+ "purpose": 1264,
+ "neuronal": 1265,
+ "gender": 1266,
+ "composition": 1267,
+ "males": 1268,
+ "developing": 1269,
+ "targeting": 1270,
+ "85": 1271,
+ "stages": 1272,
+ "##la": 1273,
+ "intra": 1274,
+ "targeted": 1275,
+ "severity": 1276,
+ "oxidative": 1277,
+ "affinity": 1278,
+ "profile": 1279,
+ "compare": 1280,
+ "aged": 1281,
+ "coli": 1282,
+ "neural": 1283,
+ "students": 1284,
+ "upper": 1285,
+ "confidence": 1286,
+ "ligand": 1287,
+ "humans": 1288,
+ "contact": 1289,
+ "manner": 1290,
+ "sodium": 1291,
+ "tool": 1292,
+ "space": 1293,
+ "stroke": 1294,
+ "intake": 1295,
+ "recovery": 1296,
+ "kidney": 1297,
+ "adverse": 1298,
+ "associations": 1299,
+ "administered": 1300,
+ "bmi": 1301,
+ "seven": 1302,
+ "dysfunction": 1303,
+ "pregnancy": 1304,
+ "simple": 1305,
+ "fusion": 1306,
+ "reactions": 1307,
+ "transmission": 1308,
+ "doses": 1309,
+ "chemotherapy": 1310,
+ "cardiovascular": 1311,
+ "selective": 1312,
+ "extracellular": 1313,
+ "temporal": 1314,
+ "behavioral": 1315,
+ "unit": 1316,
+ "lateral": 1317,
+ "69": 1318,
+ "pulmonary": 1319,
+ "73": 1320,
+ "solid": 1321,
+ "elements": 1322,
+ "mri": 1323,
+ "carcinoma": 1324,
+ "62": 1325,
+ "past": 1326,
+ "diagnosed": 1327,
+ "monitoring": 1328,
+ "obesity": 1329,
+ "medicine": 1330,
+ "commonly": 1331,
+ "variety": 1332,
+ "molecule": 1333,
+ "technology": 1334,
+ "making": 1335,
+ "safety": 1336,
+ "remained": 1337,
+ "optimal": 1338,
+ "relationships": 1339,
+ "china": 1340,
+ "inhibitory": 1341,
+ "become": 1342,
+ "consumption": 1343,
+ "unknown": 1344,
+ "76": 1345,
+ "78": 1346,
+ "microscopy": 1347,
+ "radiation": 1348,
+ "74": 1349,
+ "stimulated": 1350,
+ "prevention": 1351,
+ "roles": 1352,
+ "cholesterol": 1353,
+ "fluid": 1354,
+ "frequently": 1355,
+ "underwent": 1356,
+ "61": 1357,
+ "anterior": 1358,
+ "respiratory": 1359,
+ "reviewed": 1360,
+ "classification": 1361,
+ "force": 1362,
+ "ca2": 1363,
+ "nerve": 1364,
+ "maternal": 1365,
+ "systemic": 1366,
+ "artery": 1367,
+ "secretion": 1368,
+ "77": 1369,
+ "altered": 1370,
+ "guidelines": 1371,
+ "##ra": 1372,
+ "reducing": 1373,
+ "toxicity": 1374,
+ "decision": 1375,
+ "randomized": 1376,
+ "posterior": 1377,
+ "conventional": 1378,
+ "profiles": 1379,
+ "remain": 1380,
+ "questionnaire": 1381,
+ "dynamics": 1382,
+ "advanced": 1383,
+ "hypertension": 1384,
+ "benefit": 1385,
+ "anxiety": 1386,
+ "discussed": 1387,
+ "differential": 1388,
+ "98": 1389,
+ "systematic": 1390,
+ "collagen": 1391,
+ "investigation": 1392,
+ "heat": 1393,
+ "isolates": 1394,
+ "spectrum": 1395,
+ "ten": 1396,
+ "predict": 1397,
+ "involvement": 1398,
+ "widely": 1399,
+ "##rs": 1400,
+ "iron": 1401,
+ "towards": 1402,
+ "prostate": 1403,
+ "84": 1404,
+ "variability": 1405,
+ "magnetic": 1406,
+ "sexual": 1407,
+ "plays": 1408,
+ "recognition": 1409,
+ "lesion": 1410,
+ "86": 1411,
+ "completed": 1412,
+ "eye": 1413,
+ "02": 1414,
+ "understand": 1415,
+ "partial": 1416,
+ "88": 1417,
+ "97": 1418,
+ "dynamic": 1419,
+ "impaired": 1420,
+ "external": 1421,
+ "87": 1422,
+ "efficient": 1423,
+ "mild": 1424,
+ "living": 1425,
+ "mechanical": 1426,
+ "divided": 1427,
+ "healthcare": 1428,
+ "rare": 1429,
+ "coronary": 1430,
+ "71": 1431,
+ "specimens": 1432,
+ "article": 1433,
+ "liquid": 1434,
+ "meta": 1435,
+ "prediction": 1436,
+ "involving": 1437,
+ "83": 1438,
+ "agent": 1439,
+ "root": 1440,
+ "clinically": 1441,
+ "postoperative": 1442,
+ "fatty": 1443,
+ "placebo": 1444,
+ "fetal": 1445,
+ "vaccine": 1446,
+ "receiving": 1447,
+ "metastasis": 1448,
+ "infants": 1449,
+ "applications": 1450,
+ "matched": 1451,
+ "describe": 1452,
+ "alterations": 1453,
+ "effectiveness": 1454,
+ "##lo": 1455,
+ "challenge": 1456,
+ "removal": 1457,
+ "laser": 1458,
+ "invasive": 1459,
+ "joint": 1460,
+ "metal": 1461,
+ "0001": 1462,
+ "spinal": 1463,
+ "soil": 1464,
+ "reactive": 1465,
+ "aspects": 1466,
+ "cov": 1467,
+ "hepatic": 1468,
+ "dietary": 1469,
+ "vitamin": 1470,
+ "intestinal": 1471,
+ "relation": 1472,
+ "diabetic": 1473,
+ "93": 1474,
+ "impairment": 1475,
+ "device": 1476,
+ "82": 1477,
+ "deficiency": 1478,
+ "hydrogen": 1479,
+ "81": 1480,
+ "predictive": 1481,
+ "92": 1482,
+ "computed": 1483,
+ "accurate": 1484,
+ "prospective": 1485,
+ "gastric": 1486,
+ "optical": 1487,
+ "cerebral": 1488,
+ "recurrence": 1489,
+ "pancreatic": 1490,
+ "nine": 1491,
+ "hormone": 1492,
+ "traditional": 1493,
+ "odds": 1494,
+ "explore": 1495,
+ "degrees": 1496,
+ "frequent": 1497,
+ "ray": 1498,
+ "theory": 1499,
+ "adolescents": 1500,
+ "dimensional": 1501,
+ "tract": 1502,
+ "ventricular": 1503,
+ "myocardial": 1504,
+ "implementation": 1505,
+ "deep": 1506,
+ "transplantation": 1507,
+ "malignant": 1508,
+ "pathogenesis": 1509,
+ "occurrence": 1510,
+ "prognosis": 1511,
+ "arterial": 1512,
+ "successfully": 1513,
+ "psychological": 1514,
+ "prognostic": 1515,
+ "fibrosis": 1516,
+ "cervical": 1517,
+ "organic": 1518,
+ "aimed": 1519,
+ "biopsy": 1520,
+ "promising": 1521,
+ "multivariate": 1522,
+ "nanoparticles": 1523,
+ "abdominal": 1524,
+ "##ry": 1525,
+ "resonance": 1526,
+ "thyroid": 1527,
+ "logistic": 1528,
+ "resection": 1529,
+ "retrospective": 1530,
+ "implications": 1531,
+ "gamma": 1532,
+ "determination": 1533,
+ "urinary": 1534,
+ "chromatography": 1535,
+ "trauma": 1536,
+ "chinese": 1537,
+ "morbidity": 1538,
+ "undergoing": 1539,
+ "operation": 1540,
+ "characterization": 1541,
+ "challenges": 1542,
+ "elderly": 1543,
+ "sectional": 1544,
+ "##di": 1545,
+ "ultrasound": 1546,
+ "recurrent": 1547,
+ "pediatric": 1548,
+ "oxide": 1549,
+ "twenty": 1550,
+ "safe": 1551,
+ "aortic": 1552,
+ "tomography": 1553,
+ "emergency": 1554,
+ "##ba": 1555,
+ "preoperative": 1556,
+ "spectroscopy": 1557,
+ "photo": 1558,
+ "##ho": 1559,
+ "##ll": 1560,
+ "##no": 1561,
+ "##li": 1562,
+ "##mo": 1563,
+ "##si": 1564
+ }
+ }
+ }