Experimental-Models (Experimental Models)

conceptofmind

authored a paper 7 months ago

Consent in Crisis: The Rapid Decline of the AI Data Commons

Paper • 2407.14933 • Published Jul 20, 2024 • 12

conceptofmind

posted an update 11 months ago

Teraflop AI is excited to help support the Caselaw Access Project and the Harvard Library Innovation Lab in releasing over 6.6 million state and federal court decisions published throughout U.S. history. Democratizing fair access to this data matters for the public, the legal community, and researchers. This release is a processed and cleaned version of the original CAP data.

OCR errors were introduced when these texts were digitized. We post-processed each text for model training, fixing issues with encoding, normalization, repetition, redundancy, parsing, and formatting.
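As a rough illustration of what such post-processing passes can look like, here is a minimal sketch in Python. It is not Teraflop AI's actual pipeline: the function name `clean_ocr_text` and the specific rules are assumptions chosen for the example, using only the standard library.

```python
import re
import unicodedata


def clean_ocr_text(text: str) -> str:
    """Apply a few generic cleanup passes to OCR-derived text (illustrative only)."""
    # Normalization: fold compatibility characters such as ligatures.
    text = unicodedata.normalize("NFKC", text)
    # Encoding debris: drop control characters other than newline and tab.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Parsing: rejoin words split by a hyphen at a line break ("judg-\nment").
    # Caveat: this also merges genuine compounds that happen to break there.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Formatting: collapse runs of spaces and tabs, and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    # Repetition: skip a non-empty line that exactly repeats the line before it.
    cleaned = []
    for line in lines:
        if line and cleaned and line == cleaned[-1]:
            continue
        cleaned.append(line)
    # Formatting: reduce three or more newlines to a single blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(cleaned)).strip()


if __name__ == "__main__":
    sample = "The \ufb01nal judg-\nment  was\x0c entered.\nPage 12\nPage 12\n\n\n\nAffirmed."
    print(clean_ocr_text(sample))
```

Each pass is deliberately simple; a production pipeline would need document-aware rules (for example, a dictionary check before rejoining hyphenated words) and would run in parallel across shards.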

Teraflop AI’s data engine allows for the massively parallel processing of web-scale datasets into cleaned text form.

Link to the processed dataset: https://huggingface.co/datasets/TeraflopAI/Caselaw_Access_Project

The Caselaw Access Project dataset is licensed under the CC0 License.

We plan to release trillions of commercially licensed text tokens, images, audio, videos, and other datasets spanning numerous domains and modalities over the coming months. If you are interested in contributing commercially licensed data, be sure to reach out: https://twitter.com/EnricoShippole

Follow us for the next collaborative dataset releases: https://twitter.com/TeraflopAI

conceptofmind

authored 2 papers about 1 year ago

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

Paper • 2310.16787 • Published Oct 25, 2023 • 5

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Paper • 2401.11605 • Published Jan 21, 2024 • 22

conceptofmind

posted an update about 1 year ago

A 1B-parameter dense causal language model begins to "saturate" in accuracy at around 5 epochs on 1.2T tokens.

conceptofmind

authored a paper over 1 year ago

YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 66

conceptofmind

updated 4 models over 1 year ago

conceptofmind

updated a dataset over 1 year ago

Experimental-Models/D

Updated Aug 12, 2023 • 1

Experimental Models

AI & ML interests

Experimental-Models's activity

Experimental-Models/D-Llama-2-7b-4k-3e-6-500k-epoch-3

Experimental-Models/D-Llama-2-7b-4k-3e-6-500k-epoch-2

Experimental-Models/D-Llama-2-7b-4k-3e-6-500k-epoch-1

Experimental-Models/D-Llama-2-7b-4k-3e-6-1m

Team members 1
