Spaces: CONDA-Workshop/Data-Contamination-Database
refs/pr/6
Data-Contamination-Database
14 contributors
History: 17 commits
vishaal27: Add data from "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus" (ad06fdc, verified, 7 months ago)
File                      Scan  Size       Updated       Last commit
.gitattributes            Safe  1.52 kB    8 months ago  initial commit
.gitignore                Safe  12 Bytes   8 months ago  Style + gitignore
README.md                 Safe  352 Bytes  8 months ago  Initital commit
app.py                    Safe  6.23 kB    7 months ago  Increase tab font size
contamination_report.csv  Safe  34.5 kB    7 months ago  Add data from "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus"
dataset.py                Safe  9.64 kB    7 months ago  Add PR links to previous commits
markdown.py               Safe  9.83 kB    7 months ago  update urls
requirements.txt          Safe  73 Bytes   8 months ago  Initital commit
utils.py                  Safe  6.11 kB    7 months ago  Get token from environment