checkpoints
- This model is a fine-tuned version of google/t5-v1_1-base on the vblagoje/lfqa dataset, trained for 2 epochs for a (somewhat) apples-to-apples comparison with t5-base on the standard eli5 dataset.
- This checkpoint does seem to be more coherent than t5-base on the original dataset.
- Compared to bart on lfqa, it seems to be able to respond to some questions independently of retrieval.
NOTE: the inference API is limited to generating approx. 64 chars for runtime reasons. For longer outputs, try using the model in Python as a transformers pipeline object.
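A minimal sketch of the pipeline usage mentioned in the note, assuming the `text2text-generation` task and passing the plain question as input (the card does not specify a prompt format). The generation settings are illustrative values, not ones recommended by the card, and `build_generation_kwargs` and `answer` are hypothetical helper names.

```python
MODEL_NAME = "pszemraj/t5-base-askscience-lfqa"


def build_generation_kwargs(max_length=128, num_beams=4):
    """Return generation settings. Values are illustrative assumptions."""
    return {
        "max_length": max_length,      # counted in tokens, not characters
        "num_beams": num_beams,        # beam search width
        "no_repeat_ngram_size": 3,     # discourage repeated phrases
        "early_stopping": True,        # stop once all beams have finished
    }


def answer(question, **overrides):
    """Generate an answer with a transformers pipeline object."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text2text-generation", model=MODEL_NAME)
    outputs = generator(question, **build_generation_kwargs(**overrides))
    return outputs[0]["generated_text"]


# Example call (downloads the checkpoint on first use):
# print(answer("Why is the sky blue?", max_length=256))
```

Raising `max_length` is what lifts the output past the short responses of the hosted API; the other settings are optional and can be dropped.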
Intended uses & limitations
- Q&A, information retrieval
- it is probably better to use it with a retrieval pipeline than alone
Training and evaluation data
- See the linked dataset. The dataset was filtered to include only the askscience subreddit in an attempt to focus on academic/technical queries.
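The filtering step could look roughly like the sketch below. It is not the author's preprocessing code: the `subreddit` field name, the `keep_subreddit` helper, and the sample records are all hypothetical, chosen only to illustrate keeping askscience entries.

```python
def keep_subreddit(records, subreddit="askscience", field="subreddit"):
    """Keep records whose (hypothetical) subreddit field matches, ignoring case."""
    target = subreddit.lower()
    return [r for r in records if str(r.get(field, "")).lower() == target]


# Made-up records for illustration only; not taken from the real dataset.
sample = [
    {"subreddit": "askscience", "title": "Why is the sky blue?"},
    {"subreddit": "explainlikeimfive", "title": "How do planes fly?"},
    {"subreddit": "AskScience", "title": "What is a black hole?"},
]

filtered = keep_subreddit(sample)
```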
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 2
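As a rough sketch, the values above could be expressed as keyword arguments for `Seq2SeqTrainingArguments`. Mapping `train_batch_size` to `per_device_train_batch_size` is an assumption, `output_dir` is a hypothetical path, and the distributed setup is left to the launcher rather than configured here.

```python
TRAINING_KWARGS = {
    "output_dir": "./t5-askscience-lfqa",  # hypothetical path, not from the card
    "learning_rate": 4e-5,
    "per_device_train_batch_size": 8,      # assumed mapping of train_batch_size
    "per_device_eval_batch_size": 8,       # assumed mapping of eval_batch_size
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 2,
}


def batch_per_optimizer_step(kwargs):
    """Samples accumulated per optimizer step on one process."""
    return kwargs["per_device_train_batch_size"] * kwargs["gradient_accumulation_steps"]


def make_training_args(kwargs=TRAINING_KWARGS):
    """Build the arguments object; needs transformers (and torch) installed."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**kwargs)
```

The product of the listed batch size and accumulation steps, 8 × 2 = 16, reproduces the listed `total_train_batch_size`; the card does not state how many devices were used.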
Training results
Framework versions
- Transformers 4.16.2
- Pytorch 1.10.0+cu113
- Datasets 1.18.3
- Tokenizers 0.11.0
Model tree for pszemraj/t5-base-askscience-lfqa
Base model
google/t5-v1_1-base