Musings

Title | Method | Date | Authors | Tags
Magistral | GRPO | 202506 | Mistral-AI | grpo, reasoning, rl, training
RL Algorithms Deep-Dive: TRPO, PPO & GRPO | n/a | 202506 | Olesker-Taylor | grpo, ppo, trpo, rl, training, own
Continuous Chain of Thought Enables Parallel Exploration and Reasoning | CoT2 | 202505 | Gozeten, Ildiz, Zhang, Harutyunyan, Rawat, Oymak | continuous thoughts, inference, reasoning, superposition, grpo, rl
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought | COCONUT | 202505 | Zhu, Hao, Hu, Jiao, Russell, Tian | inference, reasoning, rl, superposition
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space | soft thinking | 202505 | Zhang, He, Yan, Shen, Zhao, Wang, Shen, Wang | continuous thoughts, inference, reasoning, superposition
Text Generation Beyond Discrete Token Sampling | mixture of inputs | 202505 | Zhuang, Liu, Singh, Shang, Gao | continuous thoughts, inference, reasoning, superposition
Training Large Language Models to Reason in a Continuous Latent Space | COCONUT | 202412 | Hao, Sukhbaatar, Su, Li, Hu, Weston, Tian | continuous thoughts, inference, reasoning, superposition
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | GRPO | 202404 | DeepSeek-AI | grpo, reasoning, rl, training
Let Models Speak Ciphers: Multiagent Debate through Embeddings | CIPHER | 202310 | Pham, Liu, Yang, Chen, Liu, Yuan, Plummer, Wang, Yang | continuous thoughts, inference, reasoning, superposition