|
## Pretrained Models |
|
|**Sentence Length**|**Trained Tokens**|**Link**|
|----------|----------|----------|
|128|~11B|[BiGS-11B-128](https://drive.google.com/drive/folders/1-nhzeWVgpXwMyNEQ5j-MwJxSzwKyT2an?usp=sharing)|
|128|~29B|[BiGS-29B-128](https://drive.google.com/drive/folders/10Mtl8_XUJb2mmHLyRC9x1wltdIWy6aaP?usp=sharing)|
|128|~97B|[BiGS-97B-128](https://huggingface.co/JunxiongWang/BiGS_128)|
|512|~108B|[BiGS-108B-512](https://huggingface.co/JunxiongWang/BiGS_512)|
|1024|~110B|[BiGS-110B-1024](https://huggingface.co/JunxiongWang/BiGS_1024)|
|4096|~110B|[BiGS-110B-4096](https://huggingface.co/JunxiongWang/BiGS_4096)|
|
### MNLI Checkpoints

|**Sentence Length**|**Trained Tokens**|**Model**|
|----------|----------|----------|
|128|~11B|[BiGS-11B-128MNLI](https://drive.google.com/drive/folders/1-tn5ar_tRi9DnK_bNMZtPpappUdNnVET?usp=sharing)|
|128|~29B|[BiGS-29B-128MNLI](https://drive.google.com/drive/folders/116JwMbChYp9tBuPTz5jbiaulhXrXt1P2?usp=sharing)|
|128|~97B|[BiGS-97B-128MNLI](https://huggingface.co/JunxiongWang/BiGS_128_MNLI)|
|512|~108B|[BiGS-108B-512MNLI](https://huggingface.co/JunxiongWang/BiGS_512_MNLI)|
|
## Example Usage

### Load Masked Language Model

```python
import jax
from jax import numpy as jnp
from transformers import BertTokenizer
from BiGS.modeling_flax_bigs import FlaxBiGSForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = FlaxBiGSForMaskedLM.from_pretrained('JunxiongWang/BiGS_128')

# 103 is the [MASK] token id of the bert-large-uncased tokenizer
text = "The goal of life is [MASK]."
encoded_input = tokenizer(text, return_tensors='np', padding='max_length', max_length=128)
output = model(**encoded_input)

# Top-10 predicted tokens at the [MASK] position
tokenizer.convert_ids_to_tokens(jnp.flip(jnp.argsort(jax.nn.softmax(output.logits[encoded_input['input_ids']==103]))[0])[:10])
# output: ['happiness', 'love', 'peace', 'perfection', 'life', 'enlightenment', 'god', 'survival', 'freedom', 'good']
jnp.flip(jnp.sort(jax.nn.softmax(output.logits[encoded_input['input_ids']==103]))[0])[:10]
# probability: [0.16052087, 0.04306792, 0.03651363, 0.03468223, 0.02927081, 0.02549769, 0.02385132, 0.02261189, 0.01672831, 0.01619471]

text = "Paris is the [MASK] of France."
encoded_input = tokenizer(text, return_tensors='np', padding='max_length', max_length=128)
output = model(**encoded_input)
tokenizer.convert_ids_to_tokens(jnp.flip(jnp.argsort(jax.nn.softmax(output.logits[encoded_input['input_ids']==103]))[0])[:10])
# output: ['capital', 'centre', 'center', 'city', 'capitol', 'prefecture', 'headquarters', 'president', 'metropolis', 'heart']
jnp.flip(jnp.sort(jax.nn.softmax(output.logits[encoded_input['input_ids']==103]))[0])[:10]
# probability: [0.9981787 , 0.00034076, 0.00026992, 0.00026926, 0.00017787, 0.00004816, 0.00004256, 0.00003716, 0.00003634, 0.00002893]
```
|
### Load Sequence Classification Model

```python
from BiGS.modeling_flax_bigs import FlaxBiGSForSequenceClassification

model = FlaxBiGSForSequenceClassification.from_pretrained('JunxiongWang/BiGS_512')
```
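
The classification head is only meaningful after task fine-tuning, so the minimal inference sketch below loads the MNLI checkpoint from the table above instead of the raw pretrained model. The premise/hypothesis sentences are illustrative, and the label order is assumed to follow the fine-tuning setup.

```python
import jax
from transformers import BertTokenizer
from BiGS.modeling_flax_bigs import FlaxBiGSForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = FlaxBiGSForSequenceClassification.from_pretrained('JunxiongWang/BiGS_512_MNLI')

# Encode a premise/hypothesis pair, padded to the checkpoint's training length
encoded_input = tokenizer(
    "A soccer game with multiple males playing.",
    "Some men are playing a sport.",
    return_tensors='np', padding='max_length', max_length=512,
)
output = model(**encoded_input)
probs = jax.nn.softmax(output.logits, axis=-1)  # shape (1, num_labels)
probs.argmax(axis=-1)  # index of the predicted label
```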
|
### Load Question Answering Model

```python
from BiGS.modeling_flax_bigs import FlaxBiGSForQuestionAnswering

model = FlaxBiGSForQuestionAnswering.from_pretrained('JunxiongWang/BiGS_512')
```
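
The question-answering head follows the usual extractive pattern: start and end logits over the input tokens, from which the highest-scoring span is decoded. A minimal sketch, assuming the standard Flax output fields (`start_logits`/`end_logits`); the base `BiGS_512` head is not fine-tuned for QA, so the question, context, and extracted span here are purely illustrative.

```python
from transformers import BertTokenizer
from BiGS.modeling_flax_bigs import FlaxBiGSForQuestionAnswering

tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = FlaxBiGSForQuestionAnswering.from_pretrained('JunxiongWang/BiGS_512')

question = "What is the capital of France?"
context = "Paris is the capital and most populous city of France."
encoded_input = tokenizer(question, context, return_tensors='np',
                          padding='max_length', max_length=512)
output = model(**encoded_input)

# Decode the span between the most likely start and end positions
start = int(output.start_logits[0].argmax())
end = int(output.end_logits[0].argmax())
print(tokenizer.decode(encoded_input['input_ids'][0][start:end + 1]))
```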
|
### Load Multiple Choice Classification Model

```python
from BiGS.modeling_flax_bigs import FlaxBiGSForMultipleChoice

model = FlaxBiGSForMultipleChoice.from_pretrained('JunxiongWang/BiGS_512')
```
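
Multiple-choice heads score each candidate separately, so every input tensor carries an extra choices dimension, (batch_size, num_choices, seq_len), and the model returns one logit per choice. A minimal sketch under that conventional interface; the prompt and candidate endings are made up for illustration, and the head needs task fine-tuning (e.g. SWAG-style) before the scores are meaningful.

```python
from transformers import BertTokenizer
from BiGS.modeling_flax_bigs import FlaxBiGSForMultipleChoice

tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
model = FlaxBiGSForMultipleChoice.from_pretrained('JunxiongWang/BiGS_512')

prompt = "The man poured water into the glass."
endings = ["Then he drank it.", "Then he folded it."]

# Pair the prompt with every ending, then add a leading batch axis so each
# tensor has shape (batch_size=1, num_choices, seq_len)
encoded_input = tokenizer([prompt] * len(endings), endings,
                          return_tensors='np', padding='max_length', max_length=512)
inputs = {k: v[None, ...] for k, v in encoded_input.items()}
output = model(**inputs)
output.logits.argmax(axis=-1)  # index of the highest-scoring ending
```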