|
## Pretrained Models |
|
|**Sentence Length**|**Trained Tokens**|**Link**|
|----------|----------|----------|
|128|~11B|[BiGS-11B-128](https://drive.google.com/drive/folders/1-nhzeWVgpXwMyNEQ5j-MwJxSzwKyT2an?usp=sharing)|
|128|~29B|[BiGS-29B-128](https://drive.google.com/drive/folders/10Mtl8_XUJb2mmHLyRC9x1wltdIWy6aaP?usp=sharing)|
|128|~97B|[BiGS-97B-128](https://huggingface.co/JunxiongWang/BiGS_128)|
|512|~108B|[BiGS-108B-512](https://huggingface.co/JunxiongWang/BiGS_512)|
|1024|~110B|[BiGS-110B-1024](https://huggingface.co/JunxiongWang/BiGS_1024)|
|4096|~110B|[BiGS-110B-4096](https://huggingface.co/JunxiongWang/BiGS_4096)|
|
### MNLI Checkpoints

|**Sentence Length**|**Trained Tokens**|**Model**|
|----------|----------|----------|
|128|~11B|[BiGS-11B-128MNLI](https://drive.google.com/drive/folders/1-tn5ar_tRi9DnK_bNMZtPpappUdNnVET?usp=sharing)|
|128|~29B|[BiGS-29B-128MNLI](https://drive.google.com/drive/folders/116JwMbChYp9tBuPTz5jbiaulhXrXt1P2?usp=sharing)|
|128|~97B|[BiGS-97B-128MNLI](https://huggingface.co/JunxiongWang/BiGS_128_MNLI)|
|512|~108B|[BiGS-108B-512MNLI](https://huggingface.co/JunxiongWang/BiGS_512_MNLI)|
|
## Example Usage

### Load Masked Language Model

```python
import jax
from jax import numpy as jnp
from transformers import BertTokenizer
from BiGS.modeling_flax_bigs import FlaxBiGSForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = FlaxBiGSForMaskedLM.from_pretrained('JunxiongWang/BiGS_128')

# 103 is the [MASK] token id of the bert-large-uncased tokenizer
text = "The goal of life is [MASK]."
encoded_input = tokenizer(text, return_tensors='np', padding='max_length', max_length=128)
output = model(**encoded_input)

# Top-10 predicted tokens at the [MASK] position
tokenizer.convert_ids_to_tokens(jnp.flip(jnp.argsort(jax.nn.softmax(output.logits[encoded_input['input_ids']==103]))[0])[:10])
# output: ['happiness', 'love', 'peace', 'perfection', 'life', 'enlightenment', 'god', 'survival', 'freedom', 'good']
jnp.flip(jnp.sort(jax.nn.softmax(output.logits[encoded_input['input_ids']==103]))[0])[:10]
# probability: [0.16052087, 0.04306792, 0.03651363, 0.03468223, 0.02927081, 0.02549769, 0.02385132, 0.02261189, 0.01672831, 0.01619471]

text = "Paris is the [MASK] of France."
encoded_input = tokenizer(text, return_tensors='np', padding='max_length', max_length=128)
output = model(**encoded_input)
tokenizer.convert_ids_to_tokens(jnp.flip(jnp.argsort(jax.nn.softmax(output.logits[encoded_input['input_ids']==103]))[0])[:10])
# output: ['capital', 'centre', 'center', 'city', 'capitol', 'prefecture', 'headquarters', 'president', 'metropolis', 'heart']
jnp.flip(jnp.sort(jax.nn.softmax(output.logits[encoded_input['input_ids']==103]))[0])[:10]
# probability: [0.9981787 , 0.00034076, 0.00026992, 0.00026926, 0.00017787, 0.00004816, 0.00004256, 0.00003716, 0.00003634, 0.00002893]
```
|
### Load Sequence Classification Model

```python
from BiGS.modeling_flax_bigs import FlaxBiGSForSequenceClassification

model = FlaxBiGSForSequenceClassification.from_pretrained('JunxiongWang/BiGS_512')
```
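
The classification head is only meaningful after task fine-tuning, so the minimal inference sketch below loads the MNLI checkpoint from the table above instead of the raw pretrained model. The premise/hypothesis sentences are illustrative, and the label order is assumed to follow the fine-tuning setup.

```python
import jax
from transformers import BertTokenizer
from BiGS.modeling_flax_bigs import FlaxBiGSForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = FlaxBiGSForSequenceClassification.from_pretrained('JunxiongWang/BiGS_512_MNLI')

# Encode a premise/hypothesis pair, padded to the checkpoint's training length
encoded_input = tokenizer(
    "A soccer game with multiple males playing.",
    "Some men are playing a sport.",
    return_tensors='np', padding='max_length', max_length=512,
)
output = model(**encoded_input)
probs = jax.nn.softmax(output.logits, axis=-1)  # shape (1, num_labels)
probs.argmax(axis=-1)  # index of the predicted label
```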
|
### Load Question Answering Model

```python
from BiGS.modeling_flax_bigs import FlaxBiGSForQuestionAnswering

model = FlaxBiGSForQuestionAnswering.from_pretrained('JunxiongWang/BiGS_512')
```
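
The question-answering head follows the usual extractive pattern: start and end logits over the input tokens, from which the highest-scoring span is decoded. A minimal sketch, assuming the standard Flax output fields (`start_logits`/`end_logits`); the base `BiGS_512` head is not fine-tuned for QA, so the question, context, and extracted span here are purely illustrative.

```python
from transformers import BertTokenizer
from BiGS.modeling_flax_bigs import FlaxBiGSForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = FlaxBiGSForQuestionAnswering.from_pretrained('JunxiongWang/BiGS_512')

question = "What is the capital of France?"
context = "Paris is the capital and most populous city of France."
encoded_input = tokenizer(question, context, return_tensors='np',
                          padding='max_length', max_length=512)
output = model(**encoded_input)

# Decode the span between the most likely start and end positions
start = int(output.start_logits[0].argmax())
end = int(output.end_logits[0].argmax())
print(tokenizer.decode(encoded_input['input_ids'][0][start:end + 1]))
```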
|
### Load Multiple Choice Classification Model

```python
from BiGS.modeling_flax_bigs import FlaxBiGSForMultipleChoice

model = FlaxBiGSForMultipleChoice.from_pretrained('JunxiongWang/BiGS_512')
```
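
Multiple-choice heads score each candidate separately, so every input tensor carries an extra choices dimension, (batch_size, num_choices, seq_len), and the model returns one logit per choice. A minimal sketch under that conventional interface; the prompt and candidate endings are made up for illustration, and the head needs task fine-tuning (e.g. SWAG-style) before the scores are meaningful.

```python
from transformers import BertTokenizer
from BiGS.modeling_flax_bigs import FlaxBiGSForMultipleChoice

tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = FlaxBiGSForMultipleChoice.from_pretrained('JunxiongWang/BiGS_512')

prompt = "The man poured water into the glass."
endings = ["Then he drank it.", "Then he folded it."]

# Pair the prompt with every ending, then add a leading batch axis so each
# tensor has shape (batch_size=1, num_choices, seq_len)
encoded_input = tokenizer([prompt] * len(endings), endings,
                          return_tensors='np', padding='max_length', max_length=512)
inputs = {k: v[None, ...] for k, v in encoded_input.items()}
output = model(**inputs)
output.logits.argmax(axis=-1)  # index of the highest-scoring ending
```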