The Eternal Sloptember

I’m calling it now: the adoption of AI agents into software development will be one of the most costly mistakes in the field’s history. Agents cannot program, and it’s taking longer and longer to realize that they can’t. They are highly sophisticated statistical models designed to mimic the distribution of programming. The output is broken, but in a way that’s getting harder and harder to detect. Which is exactly what you’d expect from an increasingly accurate statistical model.

At first, I rejected this. I bought into the Twitter explanation of status anxiety. I define some of my self-worth by my programming abilities, so wouldn’t it make sense to get defensive around that loss? Deny that the models can code for as long as I could to preserve my ego?

I mean, it’s very clear they can solve math problems I couldn’t hope to solve if I devoted my life to it. So why can’t they program? Maybe I’m just not a good enough programmer to recognize their genius.

I really tried for the last 6 months. I wrote some parts of tinygrad with agents. I reversed a USB <-> PCIe chip with agents. But each time I suspected I could have done it better and faster manually. The agent frontloads all the progress, then gives you a slot machine lever to pull to hope it gets the polish done. It never quite gets there.

And in before, “you are using it wrong.” I have tried all the different models, different harnesses, different prompts. It’s not that. The people who say this would probably say the same thing about slot machines: you see, you have to bet 5 lines after you get a cherry, no wonder you aren’t winning!

I’m not saying that AI isn’t useful; it clearly is. It’s definitely a better Google for most searches. And whenever you need a quick prototype and don’t care about polish, it is absurdly fast. But is it a software engineer? Not close to the bar at any company I have worked at. The key is knowing when to use it and when not to.

I thought more about the self-worth preservation thing. AFL found more bugs than LLMs and nobody felt that way about it. Chess and Go are more popular than ever. I cannot fucking wait until I have armies of robot associates I can trust to clean up my code! I don’t fear loss of status. I almost think this is some kind of psyop to sell agents. Fear of loss is one of the only ways to make big companies move. Though I think in that fear they are making a big mistake.

Agents will end up hurting large organizations more than high-performing individuals or small orgs. I’ve watched how my friends and coworkers have adopted these tools over the last 6 months. A trait you find in all high-performing people is the ability to error correct, and they have mostly been good at seeing when slop is slop. It takes a bit to explore/exploit and tune the outer loops around when to use them, when to trust them, how to use them, etc., but I haven’t seen any of them move to a model where they don’t carefully read and understand each line, except in some confined domains.

Contrast this with a large organization. Much slower feedback loops, much less alignment. The bottom performers won’t have that self check. They are the ones producing 10x output with the agents. What do you think is happening to the average output of that organization? What is happening to the average output of the world?
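
To put rough numbers on that question, here is a back-of-the-envelope sketch. Every figure in it is made up for illustration: the team split, the quality scores, and the assumption that only the bottom group goes 10x while everyone's per-unit quality stays the same. The only point is how an output-weighted average moves when the lowest-quality contributors multiply their volume.

```python
# All numbers are hypothetical, chosen only to illustrate the weighting effect.
# Each group is (headcount, output per person, quality score between 0 and 1).

def average_quality(groups):
    """Return the output-weighted average quality across all groups."""
    total_output = sum(n * out for n, out, _ in groups)
    weighted_quality = sum(n * out * q for n, out, q in groups)
    return weighted_quality / total_output

# Before agents: everyone ships one unit of output.
before = [
    (20, 1.0, 0.9),  # top performers
    (60, 1.0, 0.6),  # the middle
    (20, 1.0, 0.3),  # bottom performers
]

# After agents (assumed): only the bottom group ships 10x, at unchanged quality.
after = [
    (20, 1.0, 0.9),
    (60, 1.0, 0.6),
    (20, 10.0, 0.3),
]

print(f"before: {average_quality(before):.2f}")  # before: 0.60
print(f"after:  {average_quality(after):.2f}")   # after:  0.41
```

Under these made-up numbers nobody got worse at their job, yet the average unit of output the organization ships drops by about a third, because the bottom group went from a fifth of the output to most of it.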

Agents will end up producing more code, more apps, and more features than ever before. It is a golden era for buckets and buckets of slop, and a dark age for gems of quality.

I hear that Apple is pushing AI on all their engineers. When people think in the abstract, they think AI will do all this stuff, but let’s focus on a concrete example. Do you think macOS will get better or worse in the next 2 years?

When people see an artifact, they make assumptions about the process that was used to create it. Without even thinking about it, they assume the creator had a basically human state of mind. This assumption is no longer true. Things can be broken in ways that weren’t previously possible, and old proxies of underlying quality like syntax and grammar are useless. AI-produced artifacts are not produced by the same process as human ones, and this difference, while extremely subtle in statistics, makes itself obvious when you try to interact with and build on the artifact in human ways.

Without fully endorsing all their ideas, I’m now in the LeCun/Marcus camp on LLMs. I don’t think models like this will ever be able to program; I think the process matters. I think that deep learning is still the solution, but real programming agents will need world models, not some RLVR shit that comments out the failing test and tells you all the tests are now passing.

The real story of this era will be who manages to avoid harming themselves in their AI psychosis.
