Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Article • Mar 20 • 63
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels Paper • 2405.07526 • Published May 13 • 17
AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System Paper • 2402.15538 • Published Feb 23 • 6
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models Paper • 2402.15021 • Published Feb 22 • 12
TravelPlanner: A Benchmark for Real-World Planning with Language Agents Paper • 2402.01622 • Published Feb 2 • 33
DPO vs KTO vs IPO Collection A collection of datasets and models used for the Aligning LLMs with Direct Preference Optimization Methods blogpost • 2 items • Updated Jan 16 • 11
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Paper • 2312.03818 • Published Dec 6, 2023 • 32
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks Paper • 2310.19909 • Published Oct 30, 2023 • 20