**Temporal Coherence: A Bottleneck in Automation**
*This research was conducted as part of the Economics of Transformative AI hackathon with Apart Research.*
Why hasn't AI automated away more professions? One potential explanation is that, despite their intelligence, these systems cannot (yet) act coherently over long periods of time. While AI has demonstrated impressive capabilities in solving concrete, isolated problems, maintaining consistent goals, reasoning, and plans over extended time frames has proven a more significant challenge.
In this context, our project seeks to answer two questions. First, we know AI systems are getting better at completing tasks over longer time horizons, as [METR’s recent work](http://.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/) shows. But how much better will they need to get in order to start having a real impact on the economy, or at least for their automation capabilities to increase significantly?[^1] Second, supposing this ability is a crucial bottleneck preventing AI systems from automating economic tasks, how much value could be unlocked and how soon?
To answer these questions, we define a concept that tries to capture AI’s ability to maintain consistent goals and plans while pursuing some task, called *temporal coherence*, in the context of [recent work](https://www.nber.org/papers/w32255) by Anton Korinek and Donghyun Suh. Then, we argue that temporal coherence might be the AI ability that, if solved, would unlock the most economic value. With this framework in mind, we try to estimate the importance of temporal coherence in the US economy by measuring the time needed to complete all remotable tasks in the [O\*NET dataset](https://www.onetonline.org/). Our findings suggest that AI agents could have enough temporal coherence to perform any remote economic task by 2030, and that AI automation potential could increase discontinuously, unlocking substantial economic value soon.
**What is Temporal Coherence?**
Recent discussions about the economics of AI have highlighted that while LLMs are very useful assistants and have most likely increased the productivity of many workers across the board, their effects in terms of full-job task automation have yet to be felt (see [here](https://epoch.ai/epoch-after-hours/disagreements-on-agi-timelines) and [here](https://www.dwarkesh.com/p/ege-tamay)). The explanations offered for this puzzle usually invoke terms such as ‘agency’, ‘autonomy’, ‘coherence over long horizons’, or the ability to ‘adapt plans to simple circumstances’. Because this myriad of concepts is never clearly defined, the discussion can become confusing.[^2]
To impose some conceptual clarity, we define the concept of *temporal coherence*. This concept is to be understood in the context of modern labor economics models, in particular the task-based framework developed by David Autor, Daron Acemoglu, Pascual Restrepo and others. In these models, tasks, rather than occupations, are the fundamental units of economic production.[^3] [Recent work](https://www.nber.org/papers/w32255) by Anton Korinek and Donghyun Suh refines this approach by modelling human work as composed of atomistic tasks that vary in computational complexity, and by conceptualising technological progress as an expansion of the ‘automation frontier’ that gradually enables machines to perform increasingly complex atomistic tasks. Our proposal is to add, on top of this atomistic framework, the ability of *putting together different atomistic tasks in the pursuit of a more complex task*. We call this ability temporal coherence.
To illustrate these abstract concepts and the usefulness of temporal coherence, let us consider teaching an economics course. While occupational databases like O\*NET might list ‘teach economic theories’ as a task, this high-level label can be decomposed into subtasks: planning a syllabus, preparing lectures, delivering explanations, answering questions, recognising confusion, adjusting content dynamically, and grading assignments. Each subtask differs in computational complexity. Planning a lecture may be relatively simple, and current AI systems might already be able to do a decent job at it; dynamically adapting explanations in response to subtle student cues, however, might be more challenging.
But even if an AI system could perform each subtask, such as preparing slides or answering factual questions, it is very likely that current systems could not deliver a coherent semester-long course. Teaching an economics course requires performing every subtask consistently and coherently until the course is completed. For instance, effective teaching requires maintaining thematic and conceptual consistency over time, adapting to cumulative student understanding, and revising instructional strategies based on longitudinal feedback. In short, AIs do not yet have enough temporal coherence to complete this task.
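
The distinction between performing each subtask and completing the whole task can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Subtask` class, the complexity scores, the `frontier` threshold, and the use of task duration as a stand-in for the coherence requirement are all hypothetical and are not taken from Korinek and Suh's model.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    complexity: float  # hypothetical computational-complexity score


def can_automate(subtasks, duration_hours, frontier, horizon_hours):
    """Return True only if both conditions hold:
    1. every atomistic subtask lies inside the automation frontier, and
    2. the system stays coherent for as long as the composite task lasts.
    """
    each_subtask_ok = all(s.complexity <= frontier for s in subtasks)
    coherent_long_enough = duration_hours <= horizon_hours
    return each_subtask_ok and coherent_long_enough


# Made-up numbers, for illustration only.
course = [
    Subtask("plan syllabus", 2.0),
    Subtask("prepare lecture", 3.0),
    Subtask("answer questions", 4.0),
]

# Every subtask is within the frontier, but the semester is far too long.
print(can_automate(course, duration_hours=600, frontier=5.0, horizon_hours=1.0))  # False

# A single short subtask, taken on its own, is automatable.
print(can_automate(course[:1], duration_hours=0.5, frontier=5.0, horizon_hours=1.0))  # True
```

In this toy setting, raising `frontier` alone never makes the course automatable. Only a longer `horizon_hours` does, which is the intuition behind treating temporal coherence as a separate bottleneck.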
The fact that temporal coherence remains a challenge whenever a task requires putting many different atomistic tasks together seems a good explanation for why we haven’t seen more AI task automation yet.
**The economic value of temporal coherence**
Having established the concept of temporal coherence, we now want to argue that, out of all the abilities which AI currently struggles with, solving temporal coherence would unlock the *most* economic value.
This argument combines the observation that current AI systems are really smart with an intuition about what most real-world economic tasks involve in practice. The intelligence of state-of-the-art LLMs can be clearly established by the results they obtain on various benchmarks. For example, reasoning models, like [OpenAI’s o3](https://openai.com/es-ES/index/introducing-o3-and-o4-mini/), seem very competent in domains like math or coding, where they get results approaching, matching or even surpassing experts in some cases.[^4] The rapid improvement of these models is a [trend](https://ourworldindata.org/grapher/test-scores-ai-capabilities-relative-human-performance) that began with ChatGPT and continues to this day.
Despite these impressive results, which clearly demonstrate AIs’ ability and intelligence, it does not seem that most economically valuable tasks could be performed *just* with this ‘raw’ intelligence. ‘Unhobblings’ are necessary, to borrow Leopold Aschenbrenner’s [terminology](https://situational-awareness.ai/). What are AI systems lacking, exactly? Some possibilities include full multimodality, memory, context, cooperation abilities, and temporal coherence.[^5]
Considering just how intelligent these systems are, or, using our terminology, how many atomistic tasks they can perform, it seems to us that if AI systems were capable of putting those abilities to use coherently in the pursuit of a goal or task (that is, if they had perfect temporal coherence), their automation potential would vastly increase. In contrast, systems with, say, perfect multimodality but without temporal coherence would suffer from the same automation limitations that LLMs face now. The automation potential of temporal coherence just seems much greater than that of any other ability.[^6]
**Our Research Approach**
Temporal coherence is an ability that comes in degrees, and [recent work](http://.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/) by METR shows that AI systems are getting better at it,[^7] and at a rapid pace. But this, by itself, does not tell us how important these advances are in terms of AI task automation potential. It could be that, as METR’s projections show, by 2026 we have systems capable of completing tasks over an 8-hour time horizon. But if most tasks in the economy require much more coherence than this, then by 2026 temporal coherence will still be a bottleneck. To understand how big of a challenge temporal coherence will be for task automation, *we need a measure of the importance of temporal coherence in the economy*.
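
To make the arithmetic behind such projections concrete, here is a minimal Python sketch that assumes the time horizon grows exponentially with a fixed doubling time. The starting horizon of 1 hour and the 7-month doubling time are illustrative placeholder values rather than figures taken from this post, and both function names are hypothetical.

```python
import math


def projected_horizon_hours(current_hours, doubling_months, months_ahead):
    """Time horizon after `months_ahead` months of steady exponential growth."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_until(target_hours, current_hours, doubling_months):
    """Months needed for the horizon to grow from `current_hours` to `target_hours`."""
    return doubling_months * math.log2(target_hours / current_hours)


# Placeholder inputs, for illustration only.
current = 1.0   # hours
doubling = 7.0  # months

print(projected_horizon_hours(current, doubling, 21))  # 8.0 (three doublings)
print(months_until(8.0, current, doubling))            # 21.0
```

Under this assumption, the projection alone says when a given horizon is reached. It says nothing about how many economic tasks fall under that horizon, which is why a separate measure is needed.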
Unfortunately, measuring temporal coherence directly is challenging, since no established metric yet categorises long-term projects by saying ‘this task requires X months of coherence’ or ‘that task requires Y years of coherence’. For this reason, we use as a proxy for temporal coherence *the time it would take a human to complete a given task autonomously*. This is an imperfect measure, but it is supported by the simple heuristic that tasks requiring longer periods of sustained, autonomous human effort plausibly require a higher degree of temporal coherence for an AI agent to automate them. Further, by using this proxy, our results can be combined with METR’s results to produce additional insights on the possible economic impact of future AI agents, as we explain in the next section.
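
As a rough illustration of how this proxy could be used, the sketch below computes the weighted share of tasks whose human completion time fits within a given AI time horizon. The `Task` class, the example tasks, their durations, and the equal weights are all made up for illustration and do not come from the O\*NET data.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    human_hours: float   # proxy: time a human needs to complete the task autonomously
    weight: float = 1.0  # hypothetical importance weight, e.g. an employment share


def share_within_horizon(tasks, horizon_hours):
    """Weighted share of tasks whose human completion time fits within the horizon."""
    total = sum(t.weight for t in tasks)
    covered = sum(t.weight for t in tasks if t.human_hours <= horizon_hours)
    return covered / total if total else 0.0


# Made-up tasks and durations, for illustration only.
tasks = [
    Task("answer a customer email", 0.25),
    Task("prepare a lecture", 4.0),
    Task("grade a batch of assignments", 8.0),
    Task("teach a semester-long course", 600.0),
]

print(share_within_horizon(tasks, 8.0))  # 0.75
```

Sweeping `horizon_hours` over a range of values traces out how the automatable share changes as temporal coherence improves, which is the kind of curve this proxy is meant to support.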
[^1]: We make this clarification because many other factors are involved in actual automation of tasks. For instance, there might be other capabilities that current AI models lack which would also be needed for automation; or maybe it’s just not profitable to do so; or there’s regulation that prohibits it; etc.
[^2]: Just as an example, in the podcast linked above from Epoch AI, Ege Erdil makes a distinction between some of these concepts that is not usually made: ‘So to me it looks like the lack of common sense and lack of agency and ability to execute plans is a different competence than maintaining coherence over long context.’ This suggests clearer definitions are needed.
[^3]: See [here](https://www.nber.org/papers/w30074) for a great overview of the evolution of models about wage inequality and automation. The main papers in this area can be found [here](https://economics.mit.edu/sites/default/files/publications/the%20task%20approach%202013.pdf), [here](https://www.aeaweb.org/articles?id=10.1257/jep.33.2.3), and [here](https://economics.mit.edu/sites/default/files/publications/The%20Race%20Between%20Man%20and%20Machine%20-%20Implications%20of.pdf).
[^4]: We don’t want to go over all benchmarks in this post, but some important examples include FrontierMath, SWE-bench, and GPQA Diamond, on which o3 obtained impressive results according to many commentators.
[^5]: This is not meant to be an exhaustive list, in part because, again, definitions get tricky and some of these abilities might be grouped together or involve other abilities depending on how you think about them.
[^6]: Some definitional issues are discussed in Section 3.
[^7]: That’s the way we read the evidence, at least, from the framework of temporal coherence we have proposed.