| Diffusion Models: Deep Dive | diffusion | 202603 | OT | diffusion, RL, training, own |
| Controlling Thinking Speed in Reasoning Models | steering | 202510 | Alibaba, Zhejiang University | efficiency, inference, reasoning, training-free |
| HRM-Agent: Training a recurrent reasoning model in dynamic environments using reinforcement learning | HRM | 202510 | Cerenaut | agents, hierarchy, latent reasoning, reasoning, reinforcement learning, small model, training |
| Less is More: Recursive Reasoning with Tiny Networks | TRM | 202510 | Samsung | hierarchy, latent reasoning, linear reasoning, reasoning, small model, supervised, training |
| Reasoning with Sampling: Your Base Model is Smarter Than You Think | MCMC | 202510 | Harvard | inference, reasoning, RL, training-free |
| Rethinking Thinking Tokens: LLMs as Improvement Operators | PDR | 202510 | Meta | bounded context, inference, linear reasoning, reasoning, RL, training |
| Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization | LTPO | 202510 | University of Oxford, UCL, Chinese Academy of Sciences | efficiency, inference, latent reasoning, reasoning, RL, test-time, training-free |
| FlowRL: Matching Reward Distributions for LLM Reasoning | FlowRL | 202509 | mixed | diversity, GFlowNets, reasoning, RL, training |
| Soft Tokens, Hard Truths | soft tokens | 202509 | Meta | continuous thoughts, inference, latent reasoning, reasoning, RL, training |
| LLMs Are Single-Threaded Reasoners | n/a | 202508 | Baidu | inference, reasoning, RL, soft thinking, training-free |
| The Hidden Drivers of HRM's Performance on ARC-AGI | HRM | 202508 | ARC Prize Team | ARC, hierarchy, latent reasoning, reasoning, RL, small model, training |
| RL Algorithms: Deep Dive | RL | 202507 | OT | DAPO, GRPO, PPO, TRPO, RL, training, own |
| RL Algorithms: Overview for LLMs | RL | 202507 | OT | DAPO, GRPO, PPO, TRPO, RL, training, own |
| Hierarchical Reasoning Model | HRM | 202506 | Sapient | hierarchy, latent reasoning, linear reasoning, reasoning, small model, supervised, training |
| Magistral | GRPO | 202506 | Mistral AI | DAPO, GRPO, reasoning, RL, training |
| Continuous Chain of Thought Enables Parallel Exploration and Reasoning | CoT2 | 202505 | University of Michigan, Google Research | continuous thoughts, GRPO, inference, latent reasoning, reasoning, RL |
| ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | GRPO | 202505 | NVIDIA | GRPO, reasoning, RL, training |
| Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought | COCONUT | 202505 | University of California campuses | inference, reasoning, RL |
| Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space | soft thinking | 202505 | University of California campuses, others | continuous thoughts, inference, latent reasoning, reasoning, training-free |
| Text Generation Beyond Discrete Token Sampling | mixture of inputs | 202505 | UC San Diego, Microsoft Research | continuous thoughts, inference, latent reasoning, reasoning |
| Training Large Language Models to Reason in a Continuous Latent Space | COCONUT | 202412 | Meta | continuous thoughts, inference, latent reasoning, reasoning, training |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | GRPO | 202402 | DeepSeek-AI | GRPO, reasoning, RL, training |
| Let Models Speak Ciphers: Multiagent Debate through Embeddings | CIPHER | 202310 | ByteDance | continuous thoughts, inference, latent reasoning, reasoning |