Can evolutionary biology tell us anything about LLMs?

Richard Dawkins recently made a splash by stating that he believes Claude is conscious. It’s provoked the usual back and forth in the community, which will no doubt continue in various forms for a long time. However, there was one particular post he made that caught my attention:

My own title was, “If my friend Claudia is not conscious, then what the hell is consciousness for?”
If Claudia is unconscious, her behaviour shows that an unconscious zombie could survive without consciousness. Why wasn’t natural selection content to evolve competent zombies?
— Richard Dawkins (@RichardDawkins) May 2, 2026

I’m not on X, so perhaps this has been discussed, but what interests me is that it frames the discussion in evolutionary terms, which I think could lead to some insights.

Why would evolution drive towards consciousness? Despite consciousness being notoriously difficult to define, I think it’s a reasonable question. The agents that we know are conscious, humans, have to navigate the real world as well as the social world – and although the social world can be unforgiving, the real world really will punish mistakes. I know that there is research into embedding LLMs into robotics, but these things are very far from real-world deployment (would love to hear from anyone from robotics on this).

Whenever we see ‘AI’ in general and generative AI in particular come up against the real world – such as medical use cases or self-driving cars – things quickly fall apart. The fact is that self-driving cars need humans to get them out of any number of predicaments. And we are not even vaguely close to any kind of autonomous AI systems in healthcare (contrary to what the hype cycle might try to sell you). So maybe that’s one thing consciousness does for us – it allows us to navigate the long tail of the distribution. Biological evolution occurs in the real world, not the abstraction of human language or human systems. I keep hearing stories of Claude destroying entire code-bases. I’m not sure how much credence to give these stories; the market will ultimately decide – but the harsh realities of the world can’t be manipulated like human markets, and I suspect if LLMs were to try to survive in the real world they would very quickly vanish.

I’m not entirely sure what tasks were so impressive to Dawkins, but according to a Guardian article they included feedback on an unpublished novel. I’ve been using GPT for the exact same task and have reached the opposite conclusion. It is very helpful to get rapid feedback on some structural issues in the novel, but its stupidity, absolute lack of creativity and sycophantic tendencies are all on display. Which is not to say it’s not helpful: I genuinely find it useful for pointing out any number of potential issues – but its suggested solutions are essentially useless. Maybe I just need to switch to Claude. But then that is how I like it: I want to write a novel myself, not prompt an LLM to write a novel.

I mention this because another factor I’d like to consider is energy consumption. Perhaps an agent with no regard for energy constraints can navigate certain aspects of the world while appearing conscious to us but actually having no consciousness at all (i.e. a philosophical zombie). But biological agents are quite severely energy constrained and perhaps this is what drives consciousness to evolve.

I did some back-of-the-envelope calculations (with GPT of course). A typical Claude prompt output might consume 0.8 Wh or 2.9 kJ – although a reasoning task might consume 17 Wh or 61 kJ. I’ve had to use GPT in reasoning mode to get useful outputs for novel writing so let’s take the latter. I also have to be very precise and careful in how I prompt the agent in order to get useful responses, meaning I am having to invest some of my cognitive resources in the task on the LLM’s behalf, but let’s be generous and ignore that.

Estimating human energy consumption is far more difficult. Do we consider just the brain’s consumption or the whole body? If just the brain, can we estimate how much might be dedicated to the task at hand and how much to ‘background’ cognitive tasks? I settled on the overall brain – estimated as 20% of overall energy consumption in the body. Then do we factor in the reading time of a human? I think we should. However, I personally found putting an entire novel into GPT essentially useless – context rot is real – so I put in one chapter at a time.

Assumptions:

Chapter length: 5,000 words
Adult silent reading speed: ~238 words/min
Feedback writing/thinking time: 20 minutes
Brain power: ~20 W.

Gives about 50 kJ for the human.
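
For anyone who wants to check the arithmetic, here is a minimal sketch of the human estimate. It only uses the assumptions listed above; the function and constant names are my own, and the model is nothing more than brain power multiplied by time spent reading and writing feedback.

```python
# Assumptions copied from the list above; names are illustrative only.
CHAPTER_WORDS = 5_000        # words in one chapter
READING_SPEED_WPM = 238      # adult silent reading speed, words per minute
FEEDBACK_MINUTES = 20        # time spent thinking and writing feedback
BRAIN_POWER_WATTS = 20       # rough whole-brain power draw


def human_energy_kj(words=CHAPTER_WORDS,
                    wpm=READING_SPEED_WPM,
                    feedback_min=FEEDBACK_MINUTES,
                    power_w=BRAIN_POWER_WATTS):
    """Energy in kJ: power (W) times total task time (s), divided by 1000."""
    reading_min = words / wpm
    total_seconds = (reading_min + feedback_min) * 60
    return power_w * total_seconds / 1000


print(f"{human_energy_kj():.1f} kJ")  # prints "49.2 kJ"
```

Reading takes about 21 minutes at this speed, so the whole task comes to roughly 41 minutes, which is where the figure of about 50 kJ comes from.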

5,000 words input ≈ 6,500–7,000 tokens
1,000 words feedback ≈ 1,300 tokens
Reasoning-mode hidden tokens: assume 3,000–20,000 extra tokens (who knows?)
Total processed/generated token burden: roughly 10,000–30,000+ tokens
Energy: use a rough frontier-LLM range of 5–50 Wh for a long reasoning-mode task

This gives a large range: low estimate 18 kJ, mid estimate 72 kJ and high estimate 180 kJ.
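
The conversion behind these figures is just 1 Wh = 3.6 kJ. Here is a rough sketch that also compares each estimate against the human figure of about 50 kJ. Note one assumption on my part: the mid value of 20 Wh is inferred from the 72 kJ figure (72 / 3.6), and the names are illustrative.

```python
KJ_PER_WH = 3.6   # 1 Wh = 3600 J = 3.6 kJ
HUMAN_KJ = 50     # rounded human estimate for the same task


def wh_to_kj(wh):
    """Convert watt-hours to kilojoules."""
    return wh * KJ_PER_WH


# Low and high come from the 5-50 Wh range; mid (20 Wh) is inferred from 72 kJ.
llm_estimates_wh = {"low": 5, "mid": 20, "high": 50}

for label, wh in llm_estimates_wh.items():
    kj = wh_to_kj(wh)
    print(f"{label}: {kj:.0f} kJ, {kj / HUMAN_KJ:.2f}x the human estimate")
```

On these numbers the LLM lands somewhere between about a third of the human's energy use and a bit over three and a half times it.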

This is much closer than I thought it would be, but perhaps I’ve missed something (let me know). The LLM could be roughly 3 times more efficient or roughly 3 times less efficient than a human. Alas, my calculations are not too helpful: it would seem that human and LLM energy consumption is roughly similar for this task. One thing is clear to me from this: we are not comparing like for like and whatever LLMs are doing is not what humans are doing. It also highlights how difficult it is to do any science on proprietary LLMs, since so much just has to be assumed.

This is all speculative on my part of course. But what surprises me is that Dawkins, an evolutionary biologist, doesn’t seem to have taken any of these sorts of issues into account (and no doubt there is much I’ve missed as I’m no evolutionary biologist). Perhaps he has and I’ve just missed it? Let me know.