Tristan Thrush
Tristan
AI & ML interests
NLP, Datasets, Multimodality
Recent Activity
upvoted an article about 2 months ago
Optimizing Pretraining Data Mixes with LLM-Estimated Utility
updated a model about 2 months ago
Tristan/dclm-perplexity-correlations-410m-3
updated a model about 2 months ago
Tristan/dclm-perplexity-correlations-160m-3
Tristan's activity
Convert dataset to Parquet
1
#10 opened 5 months ago by Tristan

Trouble getting access to dataset
3
#9 opened 5 months ago by iliang1234
Update license_agreement.txt
#7 opened 11 months ago by Tristan

Update README.md
#2 opened about 1 year ago by Tristan

Streaming dataset generation
3
#6 opened about 1 year ago by davidmezzetti

Notifications from Datasets Server
2
#5 opened over 1 year ago by parquet-converter

readme: add language tag
1
#6 opened over 1 year ago by stefan-it

Add code highlighting to the README
1
#4 opened over 1 year ago by bryant1410

Add LM and MLM tasks
1
#1 opened about 2 years ago by lhoestq

Add TF weights
2
#1 opened about 2 years ago by joaogante

Update tokenizer_config.json
1
#2 opened about 2 years ago by joaogante

Add TF weights
2
#1 opened about 2 years ago by joaogante

Add TF weights
2
#1 opened about 2 years ago by joaogante

Add TF weights
2
#2 opened about 2 years ago by joaogante

Update tokenizer_config.json
1
#3 opened about 2 years ago by joaogante

Create README.md
4
#1 opened over 2 years ago by puffy310

Make filters shareable
4
#10 opened over 2 years ago by BramVanroy

Resource used to produce this version of dataset?
1
#1 opened over 2 years ago by spate141

Update `hf_hub_url` call
1
#2 opened over 2 years ago by xiaohk

Replace hf_hub_url calls with relative path
1
#3 opened over 2 years ago by mariosasko
