Magistral |
GRPO |
202506 |
Mistral-AI |
grpo,
reasoning,
rl,
training
|
RL Algorithms Deep-Dive: TRPO, PPO & GRPO |
n/a |
202506 |
Olesker-Taylor |
grpo,
ppo,
trpo,
rl,
training,
own
|
Continuous Chain of Thought Enables Parallel Exploration and Reasoning |
CoT2 |
202505 |
Gozeten, Ildiz, Zhang, Harutyunyan, Rawat, Oymak |
continuous thoughts,
inference,
reasoning,
superposition,
grpo,
rl
|
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought |
COCONUT |
202505 |
Zhu, Hao, Hu, Jiao, Russell, Tian |
inference,
reasoning,
rl,
superposition
|
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space |
soft thinking |
202505 |
Zhang, He, Yan, Shen, Zhao, Wang, Shen, Wang |
continuous thoughts,
inference,
reasoning,
superposition
|
Text Generation Beyond Discrete Token Sampling |
mixture of inputs |
202505 |
Zhuang, Liu, Singh, Shang, Gao |
continuous thoughts,
inference,
reasoning,
superposition
|
Training Large Language Models to Reason in a Continuous Latent Space |
COCONUT |
202412 |
Hao, Sukhbaatar, Su, Li, Hu, Weston, Tian |
continuous thoughts,
inference,
reasoning,
superposition
|
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models |
GRPO |
202404 |
DeepSeek-AI |
grpo,
reasoning,
rl,
training
|
Let Models Speak Ciphers: Multiagent Debate through Embeddings |
CIPHER |
202310 |
Pham, Liu, Yang, Chen, Liu, Yuan, Plummer, Wang, Yang |
continuous thoughts,
inference,
reasoning,
superposition
|