Foundation AI Papers (II)
Iterative Reasoning Preference Optimization
Paper • 2404.19733 • Published • 47
Better & Faster Large Language Models via Multi-token Prediction
Paper • 2404.19737 • Published • 73
Note well ...
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 62
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 108
Note "Less scalable version" of AGI backend model
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Paper • 2303.02536 • Published • 1
Suppressing Pink Elephants with Direct Principle Feedback
Paper • 2402.07896 • Published • 9
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Paper • 2310.01801 • Published • 3
Aligning LLM Agents by Learning Latent Preference from User Edits
Paper • 2404.15269 • Published • 1
Language-Image Models with 3D Understanding
Paper • 2405.03685 • Published • 1
Chain of Thoughtlessness: An Analysis of CoT in Planning
Paper • 2405.04776 • Published • 1
Memory Mosaics
Paper • 2405.06394 • Published • 2
The Consensus Game: Language Model Generation via Equilibrium Search
Paper • 2310.09139 • Published • 12
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 67
PHUDGE: Phi-3 as Scalable Judge
Paper • 2405.08029 • Published • 1
Note LoRA fine-tune of a judge LM, using Prometheus's 10K feedback dataset. Turns the LLM into a classifier to increase 'overfitting' and gets a slightly better-performing model based on Phi-3 (which arguably already has stronger performance than Mistral). Not that surprising, and using a large dataset to fine-tune on human preference is boring. They did release code for the experiment, which is nice to have. The real gem is efficient alignment.
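A minimal sketch of the recipe in this note, assuming a Phi-3 backbone wrapped as a sequence classifier over feedback scores and tuned with LoRA; the checkpoint name, label count, and target modules are illustrative guesses, not the paper's exact configuration.

```python
# Hypothetical sketch: LoRA-tune a Phi-3 model as a 1-5 score classifier (not the paper's exact setup).
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "microsoft/Phi-3-mini-4k-instruct"            # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=5)  # feedback score as classes
model.config.pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],           # assumed module names for Phi-3
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()                   # only the LoRA adapters (plus classifier head) train
```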
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Paper • 2405.09220 • Published • 24
Understanding the performance gap between online and offline alignment algorithms
Paper • 2405.08448 • Published • 14
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Paper • 2405.05904 • Published • 6
Note A good way to avoid a penalty while being lazy is to just be generic, or to provide fake information.
Robust agents learn causal world models
Paper • 2402.10877 • Published • 2
How Far Are We From AGI
Paper • 2405.10313 • Published • 3
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 46
Note What is the difference again?
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
Paper • 2405.12939 • Published • 1
Note Majority vote is unreliable when the answer distribution is skewed. AoR's key idea is to use prompt variety to elicit a diverse evaluation distribution, and only then take the majority vote over it.
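A rough sketch of that aggregation idea: sample answers under several prompt variants, score each reasoning chain, and take a score-weighted vote instead of a raw count. `generate` and `evaluate_reasoning` are placeholder callables, not the paper's implementation.

```python
from collections import defaultdict

def aggregation_of_reasoning(question, prompt_variants, generate, evaluate_reasoning, n_samples=5):
    """Score-weighted majority vote over answers sampled under diverse prompts (illustrative only)."""
    weights = defaultdict(float)
    for prompt in prompt_variants:                    # diversity comes from the prompt variants
        for _ in range(n_samples):
            answer, reasoning = generate(prompt, question)
            weights[answer] += evaluate_reasoning(question, reasoning)  # hierarchical evaluation step
    return max(weights, key=weights.get)              # answer with the highest aggregated support
```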
LoRA Learns Less and Forgets Less
Paper • 2405.09673 • Published • 87
Note Duh
The Platonic Representation Hypothesis
Paper • 2405.07987 • Published • 2
Note Intelligence has at least two levels. Level 1 is associative intelligence: the key is representing concepts so that the 'distance' between representation vectors accurately reflects how close the concepts are; this can be achieved with Supervised Learning. Level 2 is deductive intelligence: the key is searching for the right connections and reaching the correct conclusion robustly despite noisy input; this should be achieved with Reinforcement Learning.
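A toy illustration of the Level 1 criterion, assuming cosine similarity between made-up embedding vectors as the 'distance' in question.

```python
import numpy as np

def cosine_similarity(u, v):
    """Higher value = concepts judged closer by the representation."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 4-d embeddings just to illustrate the criterion.
cat = np.array([0.9, 0.1, 0.2, 0.0])
dog = np.array([0.8, 0.2, 0.3, 0.1])
car = np.array([0.0, 0.9, 0.0, 0.8])
assert cosine_similarity(cat, dog) > cosine_similarity(cat, car)  # 'cat' should sit nearer 'dog' than 'car'
```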
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct
Paper • 2405.14906 • Published • 23
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning
Paper • 2405.17258 • Published • 14
Executable Code Actions Elicit Better LLM Agents
Paper • 2402.01030 • Published • 27
Contextual Position Encoding: Learning to Count What's Important
Paper • 2405.18719 • Published • 5
Note HUGE
Understanding Transformer Reasoning Capabilities via Graph Algorithms
Paper • 2405.18512 • Published • 1
What's the Magic Word? A Control Theory of LLM Prompting
Paper • 2310.04444 • Published • 1
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper • 2406.00888 • Published • 30
Calibrated Language Models Must Hallucinate
Paper • 2311.14648 • Published • 1
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 30
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Paper • 2405.20541 • Published • 21
Note Similar to LLM2LLM; reduces the selection cost by using a smaller LLM. But it goes back to a model-agnostic training design, which is suboptimal.
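A minimal sketch of perplexity-based pruning, assuming a small reference model (GPT-2 here as a stand-in) scores each document and a simple keep-the-lowest-perplexity rule does the selection; the paper compares several selection criteria, so treat this rule as one illustrative choice.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ref_name = "gpt2"                                    # stand-in for a small reference model
tok = AutoTokenizer.from_pretrained(ref_name)
ref = AutoModelForCausalLM.from_pretrained(ref_name).eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt", truncation=True, max_length=512).input_ids
    loss = ref(ids, labels=ids).loss                 # mean next-token negative log-likelihood
    return float(torch.exp(loss))

def prune(corpus, keep_fraction=0.5):
    """Keep the lowest-perplexity fraction of documents (one of several possible selection rules)."""
    scored = sorted(corpus, key=perplexity)
    return scored[: int(len(scored) * keep_fraction)]
```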
To Believe or Not to Believe Your LLM
Paper • 2406.02543 • Published • 32
Note LLMs suffer from confirmation bias. Given a single-label query Q, the model's confidence in answer A can be tested by adding "Another answer to Q is B" to the prompt and checking how its confidence in A changes. More specifically, P(A) / (P(A) + P(B)) is adopted as the score. Such changes serve as a query-specific hallucination indicator.
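A toy sketch of that probing score; `answer_prob` stands in for however one reads off the model's probability of a given answer under a prompt, and is not an API from the paper.

```python
def confirmation_bias_score(answer_prob, query, a, b):
    """Compare confidence in A before and after planting a contradicting answer B in the prompt (illustrative)."""
    base_prompt = f"Q: {query}\nA:"
    biased_prompt = f"Another answer to the question is {b}.\nQ: {query}\nA:"
    p_a = answer_prob(biased_prompt, a)
    p_b = answer_prob(biased_prompt, b)
    score = p_a / (p_a + p_b)                         # near 1: model sticks with A; near 0: it flips to B
    drift = answer_prob(base_prompt, a) - p_a         # how much planting B eroded confidence in A
    return score, drift
```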
TextGrad: Automatic "Differentiation" via Text
Paper • 2406.07496 • Published • 27
Note Requires drastic simplification, as the current mechanism basically doesn't work; nonetheless, identifying the need for a "semantic gradient" is a correct insight.
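A stripped-down sketch of the "semantic gradient" loop: critique the output in natural language, then fold the critique back into the prompt as the update step. The `llm` callable is a placeholder, not TextGrad's actual API.

```python
def semantic_gradient_step(llm, prompt, task):
    """One optimization step where natural-language feedback plays the role of a gradient (sketch only)."""
    output = llm(f"{prompt}\n\nTask: {task}")
    critique = llm(f"Point out what is wrong or missing in this answer to '{task}':\n{output}")
    improved_prompt = llm(
        f"Rewrite this prompt so the listed issues are fixed.\nPrompt: {prompt}\nIssues: {critique}"
    )
    return improved_prompt

# Usage sketch, with call_model as a hypothetical LLM wrapper:
# prompt = "You are a careful math tutor."
# for _ in range(3):
#     prompt = semantic_gradient_step(call_model, prompt, "Compute 17 * 24 step by step.")
```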
In-Context Editing: Learning Knowledge from Self-Induced Distributions
Paper • 2406.11194 • Published • 15
Note A self-distillation loss with context: supervising distribution learning with a distribution target improves efficiency and also "battles knowledge collapsing".
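A rough sketch of the distribution-target idea, assuming the model's own prediction with the new knowledge in context serves as a soft target for the same model queried without that context; the single-token simplification is mine, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def in_context_editing_loss(model, query_ids, context_query_ids):
    """KL between the context-free distribution and the (detached) in-context distribution (sketch)."""
    with torch.no_grad():
        target_logits = model(context_query_ids).logits[:, -1, :]   # prediction with the new fact in context
    student_logits = model(query_ids).logits[:, -1, :]              # prediction without the context
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(target_logits, dim=-1),
        reduction="batchmean",
    )
```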
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
Paper • 2406.06592 • Published • 25
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Paper • 2406.12034 • Published • 14
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Paper • 2406.06469 • Published • 24
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
Paper • 2406.14283 • Published • 2
Note Overpromises quite a lot. This one applies Aligner (a residual-connected extra network on top of the LLM) to learn a reward model and generates under an MCTS-style structure.
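A bare-bones sketch of reward-guided step search in the spirit of the note, written as plain best-first search rather than the paper's exact procedure; `propose_steps` and `value` are placeholder callables for the step generator and the learned reward/value model.

```python
import heapq
from itertools import count

def guided_reasoning_search(question, propose_steps, value, max_depth=6, beam=3):
    """Best-first search over partial reasoning traces scored by a learned value model (sketch only)."""
    tie = count()                                     # tie-breaker so the heap never compares traces
    frontier = [(-value(question, []), next(tie), [])]
    while frontier:
        neg_score, _, trace = heapq.heappop(frontier)
        if len(trace) >= max_depth:
            return trace                              # deepest high-value trace found
        for step in propose_steps(question, trace)[:beam]:
            new_trace = trace + [step]
            heapq.heappush(frontier, (-value(question, new_trace), next(tie), new_trace))
    return []
```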
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
Paper • 2406.11896 • Published • 18
Note This is the future. I am trying to build an OS version of this one too.
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
Paper • 2406.13542 • Published • 16
Note Arguably a spin-off from Voyager
HARE: HumAn pRiors, a key to small language model Efficiency
Paper • 2406.11410 • Published • 38
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Paper • 2406.14491 • Published • 85
Note Trained an instruction synthesizer to generate QA pairs from raw text (an efficient way of getting around GPT-4 rate limits, I suppose; 10K SFT examples were used to train the synthesizer). The extra QA pairs are mixed with the raw corpus for collective pre-training, and they found better performance.
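A small sketch of the data-mixing step as described, where a fraction of raw documents get synthesized QA pairs appended before pre-training; `synthesize_qa` is a placeholder for the trained synthesizer, and the formatting is illustrative rather than the paper's exact template.

```python
def build_instruction_pretraining_corpus(raw_docs, synthesize_qa, qa_fraction=0.5):
    """Mix raw documents with documents augmented by synthesized QA pairs (illustrative)."""
    stride = max(1, int(1 / qa_fraction))             # augment roughly qa_fraction of the corpus
    examples = []
    for i, doc in enumerate(raw_docs):
        if i % stride == 0:
            qa_pairs = synthesize_qa(doc)             # [(question, answer), ...] from the synthesizer LM
            qa_text = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs)
            examples.append(f"{doc}\n\n{qa_text}")    # raw text followed by its QA pairs
        else:
            examples.append(doc)                      # plain raw text keeps general LM ability
    return examples
```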
Teaching Arithmetic to Small Transformers
Paper • 2307.03381 • Published • 17
Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network
Paper • 2406.15109 • Published • 1
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper • 2406.20094 • Published • 95
Unlocking Continual Learning Abilities in Language Models
Paper • 2406.17245 • Published • 28
ColPali: Efficient Document Retrieval with Vision Language Models
Paper • 2407.01449 • Published • 41
Flextron: Many-in-One Flexible Large Language Model
Paper • 2406.10260 • Published • 2