---
datasets:
- EleutherAI/pile
language:
- en
---

# DenseRetNet-350M

An unofficial pretraining checkpoint for DenseRetNet-350M from the DenseMamba paper (https://arxiv.org/abs/2403.00818). The training data is 15B tokens randomly sampled from The Pile dataset.

- Recurrent generation example:

```python
import torch
import transformers

model_name_or_path = '/path to model'
MAX_NEW_TOKENS = 256
inference_dtype = torch.float16

# Greedy decoding with a fixed generation budget
generation_config = transformers.GenerationConfig(
    do_sample=False,
    max_new_tokens=MAX_NEW_TOKENS,
)

tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_name_or_path, use_fast=False, trust_remote_code=True)
config = transformers.AutoConfig.from_pretrained(
    model_name_or_path, trust_remote_code=True)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=inference_dtype, trust_remote_code=True)
model.cuda()
model.eval()

input_sents = 'I have a dream'
inputs = tokenizer(input_sents, return_tensors="pt", truncation=True, max_length=2048)

output = model.generate(
    input_ids=inputs["input_ids"].cuda(),
    generation_config=generation_config,
    return_dict_in_generate=True,
    output_scores=True,
)
# `output.sequences` contains the prompt followed by the generated continuation.
text = tokenizer.decode(output.sequences[0], skip_special_tokens=True)
print(text)
```
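
- Perplexity scoring example (a minimal sketch for sanity-checking the checkpoint; it assumes the remote-code model follows the standard `AutoModelForCausalLM` interface and returns a `loss` when `labels` are passed, and the sample sentence is only an illustration):

```python
import torch
import transformers

model_name_or_path = '/path to model'

tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_name_or_path, use_fast=False, trust_remote_code=True)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.float16, trust_remote_code=True).cuda().eval()

# Any short English text works here; this sentence is only an illustration.
text = 'The quick brown fox jumps over the lazy dog.'
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
input_ids = inputs["input_ids"].cuda()

with torch.no_grad():
    # Standard Hugging Face causal-LM call: passing `labels=input_ids` asks the
    # model to return the mean next-token cross-entropy over the sequence
    # (this assumes the remote-code implementation supports `labels`).
    loss = model(input_ids=input_ids, labels=input_ids).loss

print(f'perplexity: {torch.exp(loss).item():.2f}')
```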