Summaries with Sam

| Title | Method | Date (YYYYMM) | Lab | Tags |
| --- | --- | --- | --- | --- |
| Diffusion Models: Deep Dive | diffusion | 202603 | OT | diffusion, RL, training, own |
| Controlling Thinking Speed in Reasoning Models | steering | 202510 | Alibaba, Zhejiang University | efficiency, inference, reasoning, training-free |
| HRM-Agent: Training a recurrent reasoning model in dynamic environments using reinforcement learning | HRM | 202510 | Cerenaut | agents, hierarchy, latent reasoning, reasoning, reinforcement learning, small model, training |
| Less is More: Recursive Reasoning with Tiny Networks | TRM | 202510 | Samsung | hierarchy, latent reasoning, linear reasoning, reasoning, small model, supervised, training |
| Reasoning with Sampling: Your Base Model is Smarter Than You Think | MCMC | 202510 | Harvard | inference, reasoning, RL, training-free |
| Rethinking Thinking Tokens: LLMs as Improvement Operators | PDR | 202510 | Meta | bounded context, inference, linear reasoning, reasoning, RL, training |
| Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization | LTPO | 202510 | University of Oxford, UCL, Chinese Academy of Sciences | efficiency, inference, latent reasoning, reasoning, RL, test-time, training-free |
| FlowRL: Matching Reward Distributions for LLM Reasoning | FlowRL | 202509 | mixed | diversity, GFlowNets, reasoning, RL, training |
| Soft Tokens, Hard Truths | soft tokens | 202509 | Meta | continuous thoughts, inference, latent reasoning, reasoning, RL, training |
| LLMs Are Single-Threaded Reasoners | n/a | 202508 | Baidu | inference, reasoning, RL, soft thinking, training-free |
| The Hidden Drivers of HRM's Performance on ARC-AGI | HRM | 202508 | ARC Prize Team | ARC, hierarchy, latent reasoning, reasoning, RL, small model, training |
| RL Algorithms: Deep Dive | RL | 202507 | OT | DAPO, GRPO, PPO, TRPO, RL, training, own |
| RL Algorithms: Overview for LLMs | RL | 202507 | OT | DAPO, GRPO, PPO, TRPO, RL, training, own |
| Hierarchical Reasoning Model | HRM | 202506 | Sapient | hierarchy, latent reasoning, linear reasoning, reasoning, small model, supervised, training |
| Magistral | GRPO | 202506 | Mistral AI | GRPO, DAPO, reasoning, RL, training |
| Continuous Chain of Thought Enables Parallel Exploration and Reasoning | CoT2 | 202505 | Uni of Michigan, Google Research | continuous thoughts, GRPO, inference, latent reasoning, reasoning, RL |
| ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | GRPO | 202505 | NVIDIA | GRPO, reasoning, RL, training |
| Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought | COCONUT | 202505 | Unis of California | inference, reasoning, RL |
| Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space | soft thinking | 202505 | Unis of California, others | continuous thoughts, inference, latent reasoning, reasoning, training-free |
| Text Generation Beyond Discrete Token Sampling | mixture of inputs | 202505 | UC San Diego, Microsoft Research | continuous thoughts, inference, latent reasoning, reasoning |
| Training Large Language Models to Reason in a Continuous Latent Space | COCONUT | 202412 | Meta | continuous thoughts, inference, latent reasoning, reasoning, training |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | GRPO | 202402 | DeepSeek-AI | GRPO, reasoning, RL, training |
| Let Models Speak Ciphers: Multiagent Debate through Embeddings | CIPHER | 202310 | ByteDance | continuous thoughts, inference, latent reasoning, reasoning |