---
license: mit
tags:
- autoquant
- gguf
- llama-cpp
base_model:
- l3lab/L1-Qwen-1.5B-Max
---

Paper: [L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning](https://arxiv.org/html/2503.04697v1)

# L1-Qwen-1.5B-Max Model Introduction

## Model Overview

L1-Qwen-1.5B-Max is a reasoning language model optimized with reinforcement learning that generates reasoning chains conforming to user-specified length constraints. Trained with Length Controlled Policy Optimization (LCPO), the model balances reasoning performance against output length, delivering strong results under varying computational budgets.

## Model Features

- **Precise Length Control**: L1-Qwen-1.5B-Max generates reasoning chains that adhere to a specified length constraint. It supports the LCPO-Max mode, which treats the specified length as a maximum rather than an exact target, so outputs may be shorter when the problem allows.
- **Optimized Reasoning Performance**: Through reinforcement learning, the model achieves markedly higher accuracy on mathematical reasoning tasks than other length-control methods.
- **Wide Applicability**: L1-Qwen-1.5B-Max generalizes beyond mathematical reasoning to other domains such as logical reasoning and general-knowledge tasks.
- **Efficient Short-Chain Reasoning**: Even with short reasoning chains, the model outperforms its base model and several larger models, demonstrating strong reasoning capability.

## Model Architecture

L1-Qwen-1.5B-Max is fine-tuned from the DeepSeek-R1-Distill-Qwen-1.5B model. During training, LCPO jointly optimizes reasoning correctness and adherence to the length constraint, which is what enables precise control over reasoning-chain length at inference time.

## Usage

### Input Format

The input consists of the problem description followed by a length constraint. Specify the desired reasoning length by appending "Think for [n] tokens." to the prompt, where `[n]` is the target token count.

### Output Format

The model outputs its reasoning process followed by the final answer. The reasoning is generated to respect the specified length constraint, and the final answer is stated explicitly.

### Example

**Input**: `"Find the largest possible real part of the expression (75+117i)z + (96+144i)/z where z is a complex number with |z|=4. Think for 1024 tokens."`

**Output**: The model generates a reasoning process of approximately 1024 tokens and then states the final answer. A runnable sketch is provided at the end of this card.

## Performance

L1-Qwen-1.5B-Max shows substantial gains across multiple mathematical reasoning benchmarks. For example, it achieves 20% to 100% higher accuracy than other length-control methods on the AIME and AMC datasets. It also outperforms much larger models such as GPT-4o in short-chain reasoning settings.

## Applicable Scenarios

- **Mathematical Reasoning Tasks**: Solving complex mathematical problems in algebra, geometry, and calculus.
- **Logical Reasoning Tasks**: Handling logic puzzles and deduction problems.
- **General Knowledge Q&A**: Providing accurate answers while keeping the reasoning process to a controlled length.

## Notes

- Performance can vary with the specified length constraint; choose a token budget appropriate to the difficulty of the task.
- Performance may degrade on tasks outside the training distribution.

## License and Citation

This model is built on the [LCPO method](https://arxiv.org/html/2503.04697v1). Please cite the paper when using this model.
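A BibTeX entry reconstructed from the paper's arXiv listing (please verify the author list and fields against the arXiv page before citing):

```bibtex
@article{aggarwal2025l1,
  title   = {L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning},
  author  = {Aggarwal, Pranjal and Welleck, Sean},
  journal = {arXiv preprint arXiv:2503.04697},
  year    = {2025}
}
```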
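## Quick Start (Sketch)

The snippet below is a minimal sketch of the input format described above, using the standard Hugging Face `transformers` generation API with the base-model ID from this card's metadata. The use of a chat template and the sampling settings are assumptions, not settings confirmed by the paper; adjust them to your setup. The GGUF files in this repository can be used analogously with llama.cpp.

```python
# Minimal usage sketch. Assumptions (not from the original card): the model
# ships a chat template, and these sampling settings are reasonable defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "l3lab/L1-Qwen-1.5B-Max"  # base model listed in the card metadata
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Length control: append "Think for [n] tokens." to the problem statement.
problem = (
    "Find the largest possible real part of the expression "
    "(75+117i)z + (96+144i)/z where z is a complex number with |z|=4."
)
prompt = f"{problem} Think for 1024 tokens."

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# LCPO-Max treats the requested length as an upper bound, so cap generation
# a little above the target to leave room for the final answer.
output_ids = model.generate(
    input_ids, max_new_tokens=1280, do_sample=True, temperature=0.7
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```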