Your mum has probably heard of Thinking, Fast and Slow (Daniel Kahneman, 2011). So has everyone in LLMs, and this is yet another paper with the same theme:
"How can we combine the advantages of both System 1 and System 2 thinking within one model, thus simultaneously achieving both efficiency and accuracy?"
The authors argue that some LRMs intrinsically possess both slow- and fast-thinking abilities. They observe that the slow and fast outputs consistently start with distinct opening words: "okay" or "alright" versus "to" or "first". This provides a built-in switch that can be manually activated. A principal component aligning with the "slow → fast" direction is found.
A dynamic reasoning speed control method is used: at each hidden layer, the hidden state h is replaced by h + α·v, where v is a "slow → fast" steering vector and α controls the intensity (and direction) of the shift.
The authors observe that some LRMs inherently exhibit both fast- and slow-thinking modes. They analyse the leading-word frequencies in the top-100 shortest/longest responses from DeepSeek-R1-Distill-Qwen-7B on MATH-500.
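The counting itself is simple. A minimal sketch (character lengths stand in for token counts here, and ranking/cleaning details are my assumptions, not the paper's):

```python
from collections import Counter

def leading_word_freqs(responses, k=100):
    """Leading-word frequencies of the k shortest and k longest responses.

    Lengths are measured in characters for simplicity; the paper
    presumably ranks responses by token count.
    """
    def first_word(r):
        words = r.split()
        return words[0].strip(",.").lower() if words else ""

    ranked = sorted(responses, key=len)
    shortest = Counter(first_word(r) for r in ranked[:k])
    longest = Counter(first_word(r) for r in ranked[-k:])
    return shortest, longest
```

On real outputs, the short bucket is dominated by "to"/"first" and the long bucket by "okay"/"alright", per the paper's observation.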

Given this fairly stark difference, they hypothesise that different leading words steer LRMs to long/short responses. This is investigated by seeding the response with different words - presumably "To" and "Okay", but unclear.
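Mechanically, "seeding" presumably just means prefilling the model's response slot with the chosen word before generation continues. A sketch with a made-up chat template (the `<|user|>`/`<|assistant|>`/`<think>` tokens are illustrative; real LRM templates are model-specific):

```python
def seed_response(user_prompt, leading_word):
    """Prefill the assistant's reasoning slot so generation continues
    right after `leading_word`. The template tokens here are made up;
    a real model's chat template will differ."""
    return f"<|user|>{user_prompt}<|assistant|><think>{leading_word}"
```

Then `seed_response(q, "Okay")` should nudge the model into its long, slow trace, and `seed_response(q, "To")` into the short, fast one.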

There is a pretty significant drop in both performance and token count when going from slow to fast.
RepE (uncited) claims that abstract cognitive functions are encoded as linear directions in LLMs' representation space. The authors hypothesise similarly that the distinct reasoning modes are governed by such a direction. Three stages are used to compute the steering vector.
Designing Stimuli. Given an input, the fast/slow response is used as the positive/negative stimulus, denoted s⁺/s⁻.
Collecting Hidden Representations. For each layer, each input stimulus is processed, and the hidden state at the final position of the stimulus is collected.
Constructing the PCA Model. Denote the pairs collected in Step 2 as (h⁺, h⁻). For half the pairs, calculate the difference h⁺ − h⁻, and for the other half the reverse, h⁻ − h⁺; together these differences form the PCA dataset. The first principal component v aligns with the "slow → fast" direction.
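The three stages can be sketched in a few lines; here hidden-state collection is abstracted to two (n, d) arrays, and the sign-orientation step at the end is my assumption about how "aligns with slow → fast" is enforced:

```python
import numpy as np

def steering_vector(h_pos, h_neg):
    """h_pos/h_neg: (n, d) arrays of final-position hidden states for the
    fast (positive) and slow (negative) stimuli at one layer.
    Returns a unit steering vector v oriented along slow -> fast."""
    diffs = h_pos - h_neg
    diffs[::2] *= -1                  # reverse half the pairs, as in the paper
    diffs = diffs - diffs.mean(axis=0)
    # first right-singular vector = first principal component
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    v = vt[0]
    if np.mean((h_pos - h_neg) @ v) < 0:
        v = -v                        # orient v along "slow -> fast"
    return v
```

Flipping the sign on half the differences means the dataset is roughly zero-mean along the fast/slow axis, so PCA picks the axis up as variance rather than losing it to mean-centring.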
The first principal component v is used as the steering vector. Specifically, at a given layer, the hidden state h is modified with intensity α: h ← h + α·v.
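As an intervention this is just an additive shift; a toy sketch (the hook mechanics for wiring this into a real model's forward pass are omitted):

```python
import numpy as np

def steer(h, v, alpha):
    """The intervention above: h <- h + alpha * v. With v oriented along
    slow -> fast, alpha > 0 pushes toward fast thinking, alpha < 0 toward
    slow, and alpha = 0 recovers the unmodified model."""
    return np.asarray(h, dtype=float) + alpha * np.asarray(v, dtype=float)
```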
A collection of LRMs is benchmarked with differing values of α.

The previous experiments fix α throughout the run. By contrast, humans dedicate more/less computational energy to (perceived) difficult/easy parts. The goal of this section is to estimate difficulty on the fly and adjust α accordingly.
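The paper's actual estimator is in §4; purely to fix ideas, here is a hypothetical controller (NOT the authors' method) that uses normalised next-token entropy as the difficulty proxy:

```python
import numpy as np

def adaptive_alpha(probs, alpha_max, tau=0.5):
    """Hypothetical difficulty-aware schedule, not the paper's method.
    Easy steps (low next-token entropy) get the full fast-thinking push
    alpha_max; hard steps (high entropy) get a weaker or negative push,
    i.e. the model is allowed to slow down."""
    p = np.asarray(probs, dtype=float)
    ent = -np.sum(p * np.log(p + 1e-12)) / np.log(len(p))  # in [0, 1]
    return alpha_max * (1.0 - ent / tau)
```

With tau = 0.5, a near-deterministic step gets α ≈ alpha_max while a maximally uncertain one gets α ≈ -alpha_max.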
The details are rather technical, not particularly clearly explained and, I feel, probably not super important, so the reader is directed to §4 of the paper. Just the (apparently impressive) results are reported here, with a small amount of commentary. Throughout, α is restricted to a limited set of values.

The paper makes a strong case for activating already-existing fast-/slow-thinking transitions. The training-free approach is much more natural than some of the somewhat ad-hoc soft thinking, mixture of inputs or similar approaches.
That said, the choice of steering vector is slightly ad-hoc, taking a selection of observed words. Perhaps this could be improved by trying to learn it? It is model-specific, though, which is a good start.
A clearer analysis of the adaptive case would be nice, with some kind of real-time estimate of the "thinking speed" - eg, an "effective α".
Overall, it is a decent paper, making a much more concrete case for "System 1 vs System 2" than many others that I've read.
"We believe our work would inspire future research on the development of more efficient and intelligent AI systems. We foresee no negative societal impacts from our research[.]"
Well, that's good to know. Here I was, thinking there could be pros and cons. No, "no negative societal impacts".