- Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
  Paper • 2412.13171 • Published • 31
- o1-Coder: an o1 Replication for Coding
  Paper • 2412.00154 • Published • 43
- Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
  Paper • 2411.19943 • Published • 57
- MALT: Improving Reasoning with Multi-Agent LLM Training
  Paper • 2412.01928 • Published • 40
August Moharrami
August4293
AI & ML interests
None yet
Recent Activity
Updated a model 11 days ago: August4293/Llama3.1-8B-PRM-Deepseek-Data-4bit
Published a model 11 days ago: August4293/Llama3.1-8B-PRM-Deepseek-Data-4bit
Organizations
Collections
3
Collection of papers that utilize reinforcement learning to enhance tool usage and function calling.
- Toolformer: Language Models Can Teach Themselves to Use Tools
  Paper • 2302.04761 • Published • 11
- On the Tool Manipulation Capability of Open-source Large Language Models
  Paper • 2305.16504 • Published • 2
- WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
  Paper • 2411.02337 • Published • 34
models
6
August4293/Llama3.1-8B-PRM-Deepseek-Data-4bit • Text Generation • Updated • 12
August4293/tiny-llama3.1-8B-PRM-Deepseek-Data • Text Generation • Updated • 4
August4293/mistral_gsm8k_ssl_it2 • Updated
August4293/mistral_gsm8k_ssl_it1 • Updated
August4293/mistral_self_alignment_DPO • Updated
August4293/mistral_self_alignment_SFT • Updated
datasets
6
August4293/tldr-preference-sft-trl-style-sample • Viewer • Updated • 100 • 126
August4293/tool_sample_dataset • Viewer • Updated • 200 • 49 • 1
August4293/gsm8k_preference_dataset_it_2 • Viewer • Updated • 379 • 33
August4293/gsm8k_preference_dataset_it_1 • Viewer • Updated • 895 • 47
August4293/Self_Alignment_Preference-Dataset • Viewer • Updated • 4.45k • 51
August4293/CS_QA • Viewer • Updated • 969 • 14