Article: Announcing Finance Commons and the Bad Data Toolbox: Pioneering Open Data and Advanced Document Processing • By Pclanglais • Jul 19
Collection: Common Corpus • The largest public domain dataset for training LLMs. • 27 items • Updated Jul 17