AgentInstruct: Toward Generative Teaching with Agentic Flows Paper • 2407.03502 • Published Jul 3, 2024 • 51
Prompt Order Experiment Collection Prompt Order Experiment shows how to run a simple experiment on the hub and leverage tools like AutoTrain, and Inference Endpoints. • 16 items • Updated 21 days ago • 2
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published Oct 16, 2024 • 30
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published Oct 21, 2024 • 59
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Paper • 2410.01731 • Published Oct 2, 2024 • 16
BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation Paper • 2410.01171 • Published Oct 2, 2024 • 5
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024 • 36
Challenges and Responses in the Practice of Large Language Models Paper • 2408.09416 • Published Aug 18, 2024 • 1
Characterizing Prompt Compression Methods for Long Context Inference Paper • 2407.08892 • Published Jul 11, 2024 • 9
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering Paper • 2409.06595 • Published Sep 10, 2024 • 37
LiveBench: A Challenging, Contamination-Free LLM Benchmark Paper • 2406.19314 • Published Jun 27, 2024 • 20
Efficient Detection of Toxic Prompts in Large Language Models Paper • 2408.11727 • Published Aug 21, 2024 • 12
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval Paper • 2407.12883 • Published Jul 16, 2024 • 8
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27, 2024 • 138
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets Paper • 2406.18518 • Published Jun 26, 2024 • 24