In addition, they exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite an adequate token budget. By comparing LRMs with their standard LLM counterparts under equivalent inference compute, we identify three performance regimes: https://www.youtube.com/watch?v=snr3is5MTiU