Article Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK By davidberenstein1957 • Nov 21, 2024 • 35
Article 🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark By wolfram • about 10 hours ago • 14
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024 • 29
Open LLM Leaderboard best models ❤️‍🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard: • 64 items • Updated 24 minutes ago • 493
GTE models Collection General Text Embedding Models Released by Tongyi Lab of Alibaba Group • 19 items • Updated 13 days ago • 19
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published 17 days ago • 41
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena Paper • 2306.05685 • Published Jun 9, 2023 • 32
Sailor2 Language Models Collection Sailing in South-East Asia with Inclusive Multilingual LLMs • 9 items • Updated about 1 month ago • 22
Open-Sora Plan: Open-Source Large Video Generation Model Paper • 2412.00131 • Published Nov 28, 2024 • 32
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 452
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published Nov 26, 2024 • 47
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 111
Article Releasing the largest multilingual open pretraining dataset By Pclanglais • Nov 13, 2024 • 98
TableGPT2: A Large Multimodal Model with Tabular Data Integration Paper • 2411.02059 • Published Nov 4, 2024 • 5
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 258