What exhausts me isn’t “falling behind.” It’s watching the profession collectively decide that the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
This agentic arms race by C-suite know-nothings feels less like leverage and more like denial. We took a stochastic text generator, noticed it lies confidently, wipes entire databases and hard drives, and responded by wrapping it in managers, sub-agents, memories, tools, permissions, workflows, and orchestration layers so we don’t have to look directly at the fact that it still doesn’t understand anything.
Now we’re expected to maintain a mental model not just of our system, but of a swarm of half-reliable interns talking to each other in a language that isn’t executable, reproducible, or stable.
Work now feels duller than dishwater, enough to push me toward a career pivot in 2026.
I think AI-assisted programming may be having the opposite effect, at least for me.
I'm now incentivized to use fewer abstractions.
Why do we code with React? It's because synchronizing state between a UI and a data model is difficult and it's easy to make mistakes, so it's worth paying the React complexity/page-weight tax for a "better developer experience" that allows us to build working, reliable software with less typing of code into a text editor.
If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
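To illustrate what skipping the abstraction can look like, here's a rough sketch of hand-rolled state-to-UI sync in vanilla JS, the kind of thing I'd expect the LLM to write and keep under test (the state shape and element ids are hypothetical):

    // One plain state object, one render function that re-syncs the DOM.
    // No virtual DOM, no build step.
    const state = { count: 0, showItems: false };

    function setState(patch) {
      Object.assign(state, patch);
      render();
    }

    function render() {
      // Hypothetical elements; a real page would have more of these lines.
      document.querySelector("#count").textContent = state.count;
      document.querySelector("#items").hidden = !state.showItems;
    }

    document.querySelector("#increment")
      .addEventListener("click", () => setState({ count: state.count + 1 }));

    render();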
How often have you dropped in a big complex library like Moment.js just because you needed to convert a time from one format to another, and it would take too long to hand-write that one feature (and add tests for it to make sure it's robust)? With an LLM that's a single prompt and a couple of minutes of wait.
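As a concrete sketch of what that single prompt might produce (the function name and format are hypothetical, and the tests use Node's built-in test runner):

    // Hypothetical replacement for the one Moment.js feature we needed:
    // convert an ISO 8601 timestamp into a "DD/MM/YYYY HH:mm" display string.
    function formatTimestamp(isoString) {
      const date = new Date(isoString);
      if (Number.isNaN(date.getTime())) {
        throw new Error(`Invalid timestamp: ${isoString}`);
      }
      const pad = (n) => String(n).padStart(2, "0");
      return `${pad(date.getDate())}/${pad(date.getMonth() + 1)}/` +
        `${date.getFullYear()} ${pad(date.getHours())}:${pad(date.getMinutes())}`;
    }

    // A couple of the tests the LLM should maintain alongside it.
    const { test } = require("node:test");
    const assert = require("node:assert");

    test("formats a valid ISO timestamp", () => {
      // Note: parsing an ISO string without a timezone uses local time,
      // which is exactly the kind of detail a reviewer has to check.
      assert.match(formatTimestamp("2025-01-31T09:05:00"), /^31\/01\/2025 \d{2}:\d{2}$/);
    });

    test("rejects garbage input", () => {
      assert.throws(() => formatTimestamp("not a date"));
    });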
Using LLMs to build black box abstraction layers is a choice. We can choose to have them build FEWER abstraction layers for us instead.
> If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
I've had plenty of junior devs justify massive code bases of random scripts and 100+ line functions with the same logic. There's a reason senior devs almost always push back on this when it's encountered.
Everything hinges on that "if". But you're baking a tautology into your reasoning: "if LLMs can do everything we need them to, we can use LLMs for everything we need".
The reason we stop junior devs from going down this path is because experience teaches us that things will break and when they do, it will incur a world of pain.
So "LLM as abstraction" might be a possible future, but it assumes LLMs are significantly more capable than a junior dev at managing a growing mess of complex code.
This is clearly not the case with simplistic LLM usage today. "Ah! But you need agents and memory and context management, etc!" But all of these are abstractions. This is what I believe the parent comment is really pointing out.
If AI could do what we originally hoped it could, follow simple instructions to solve complex tasks, we'd be in great shape and I would agree with your argument. But we are very clearly not in that world. Especially since Karpathy can't even keep up with the sophisticated machinery necessary to properly orchestrate these tools. All of the people decrying "you're not doing it right!" are emphatically proving that LLMs cannot perform these tasks at the level we need them to.
I'm saying that a key component of the dependency calculation has changed.
It used to be that one of the most influential facts affecting your decision to add a new library was the cost of writing the subset of code that you needed yourself. If writing that code and the accompanying tests represented more than an hour of work, a library was usually a better investment.
If the code and tests take a few minutes those calculations can look very different.
Making these decisions effectively and responsibly is one of the key characteristics of a senior engineer, which is why it's so interesting that all of those years of intuition are being disrupted.
The code we are producing remains the same. The difference is that a senior developer may have written that function + tests in several hours, at a cost of thousands of dollars. Now that same senior developer can produce exactly the same code at a time cost of less than $100.
React is hundreds of thousands of lines of code (or millions, I haven't looked in a while). Sure, you can start by having the LLM create a simple way to sync state across components, but in a serious project you’re going to run into edge cases that cause the complexity of your LLM-built library to keep growing. The complexity may eventually reach a point where the LLM itself can’t maintain the library effectively. I think the same rough argument applies to Moment.js.
What's concerning to many of us is that you (and others) have said this same thing, s/Opus 4.5/some other model/
That feels more like chasing than a clear line of improvement. It reads very differently from something like "my habits have changed quite a bit since reading The Art of Computer Programming". They're categorically different.
It's because the models keep getting better! What you could do with GPT-4 was more impressive than what you could do with GPT 3.5. What you could do with Sonnet 3.5 was more impressive yet, and Sonnet 4, and Sonnet 4.5.
Some of these improvements have been minor, some of them have been big enough to feel like step changes. Sonnet 3.7 + Claude Code (they came out at the same time) was a big step change; Opus 4.5 similarly feels like a big step change.
If you're sincerely trying these models out with the intention of seeing if you can make them work for you, and doing all the things you should do in those cases, then even if you're getting negative results somehow, you need to keep trying, because there will come a point where the negative turns positive for you.
If you're someone who's been using them productively for a while now, you need to keep changing how you use them, because what used to work is no longer optimal.
Models keep getting better but the argument I'm critiquing stays the same.
So does the comment I critiqued in the sibling comment to yours. I don't know why it's so hard to believe we just haven't tried. I have a Claude subscription. I'm an ML researcher myself. Trust me, I do try.
But that last part also makes me keenly aware of their limitations and failures. Frankly I don't trust experts who aren't critiquing their field. Leave the selling points to the marketing team. The engineer and researcher's job is to be critical. To find problems. I mean how the hell do you solve problems if you're unable to identify them lol. Let the marketing team lead development direction instead? Sounds like a bad way to solve problems
> benchmark shows huge improvements
Benchmarks are often difficult to interpret. It is really problematic that they got incorporated into marketing. If you don't understand what a benchmark measures, and more importantly, what it doesn't measure, then I promise you that you're misunderstanding what those numbers mean.
For METR, I think they say a lot right here (emphasis my own) that reinforces my point:
> Current frontier AIs are vastly better than humans at text prediction and knowledge tasks. They outperform experts on most *exam-style problems* for a fraction of the cost. ... And yet the best AI agents are not currently able to carry out substantive projects by themselves or directly substitute for human labor. *They are unable to reliably handle even relatively low-skill*, computer-based work like remote executive assistance. It is clear that capabilities are increasing very rapidly in some sense, but it is unclear how this corresponds to real-world impact.
So make sure you're really careful to understand what is being measured. What improvement actually means. To understand the bounds.
It's great that they include longer tasks, but also notice the biases and the distribution of the human workers. This is important for evaluating the results properly.
Also remember what exactly I quoted. For a long time we've all known that being good at leetcode doesn't make someone a good engineer. But it's an easy thing to test, and the test correlates with other skills that people are likely to pick up on the way to getting good at those tests (despite the possibility of metric hacking). We're talking about massive compression machines that pattern match. Pattern matching tends to get much more difficult as task time increases, but this is not a necessary condition.
Treat every benchmark adversarially. If you can't figure out how to metric hack it, then you don't know what the benchmark is measuring (and just because you know how it can be hacked doesn't mean you understand it, nor that that's what is actually being measured).
I think you should ask yourself: If it were true that 1) these things do in fact work, 2) these things are in fact getting better... what would people be saying?
The answer is: Exactly what we are saying. This is also why people keep suggesting that you need to try them out with a more open mind, or with different techniques: Because we know with absolute first-person iron-clad certainty what is possible, and if you don't think it's possible, you're missing something.
It seems to be "people keep saying the models are good"?
That's true. They are.
And the reason people keep saying it is because the frontier of what they do keeps getting pushed back.
Actual, working, useful code completion in the GPT 4 days? Amazing! It could automatically write entire functions for me!
The ability to write whole classes and utility programs in the Claude 3.5 days? Amazing! This is like having a junior programmer!
And now, with Opus 4.5 or Codex Max or Gemini 3 Pro we can write substantial programs one-shot from a single prompt and they work. Amazing!
But now we are beginning to see that programming in 6 months' time might look very different to now, because these AI systems code very differently to us. That's exactly the point.
So what is it you are arguing against?
I think you said you didn't like that people are saying the same thing, but in this post it seems more complicated?
> And now, with Opus 4.5 or Codex Max or Gemini 3 Pro we can write substantial programs one-shot from a single prompt and they work. Amazing!
People have been doing this parlor trick with various "substantial" programs [1] since GPT 3. And no, the models aren't better today, unless you're talking about being better at the same kinds of programs.
[1] If I have to see one more half-baked demo of a running game or a flight sim...
It’s a vague statement that I obviously cannot defend in all interpretations, but what I mean is: the performance of models at making non-trivial applications end-to-end, today, is not practically better than it was a few years ago. They’re (probably) better at making toys or one-shotting simple stuff, and they can definitely (sometimes) crank out shitty code for bigger apps that “works”, but they’re just as terrible as ever if you actually understand what quality looks like and care to keep your code from descending into entropy.
I think "substantial" is doing a lot of heavy lifting in the sentence I quoted. For example, I’m not going to argue that aspects of the process haven’t improved, or that Claude 4.5 isn't better than GPT 4 at coding, but I still can’t trust any of the things to work on any modestly complex codebase without close supervision, and that is what I understood the broad argument to be about. It's completely irrelevant to me if they slay the benchmarks or make killer one-shot N-body demos, and it's marginally relevant that they have better context windows or now hallucinate 10% less often (in that they're more useful as tools, which I don't dispute at all), but if you want to claim that they're suddenly super-capable robot engineers that I can throw at any "substantial" problem, you have to bring evidence, because that's a claim that defies my day-to-day experience. They're just constantly so full of shit, and that hasn't changed, at all.
FWIW, this line of argument usually turns into a motte and bailey fallacy, where someone makes an outrageous claim (e.g. "models have recently gained the ability to operate independently as a senior engineer!"), and when challenged on the hyperbole, retreats to a more reasonable position ("Claude 4.5 is clearly better than GPT 3!"), but with the speculative caveat that "we don't know where things will be in N years". I'm not interested in that kind of speculation.
Have you spent much time with Codex 5.1 or 5.2 in OpenAI Codex, or with Claude Opus 4.5 in Claude Code, over the last ~6 weeks?
I think they represent a meaningful step change in what models can build. For me they are the moment we went from them building relatively trivial things unassisted to building quite large and complex systems that take multiple hours, often still triggered by a single prompt.
- A WebAssembly runtime in Python which I haven't yet published
The above projects all took multiple prompts, but were still mostly built by prompting Claude Code for web on my iPhone in between Christmas family things.
I'm not confident any of these projects would have worked with the coding agents and models we had four months ago. There is no chance they would've worked with the models available in January 2025.
I’ve used Sonnet 4.5 and Codex 5 and 5.1, but not in their native environment [1].
Setting aside the fact that your examples are mostly “replicate this existing thing in language X” [2], again, I’m not saying that the models haven’t gotten better at crapping out code, or that they’re not useful tools. I use them every day. They're great tools, when someone actually intelligent is using them. I also freely concede that they're better tools than a year ago.
The devil is (as always) in the details: how many prompts did it take? what exactly did you have to prompt for? how closely did you look at the code? how closely did you test the end result? Remember that I can, with some amount of prompting, generate perfectly acceptable code for a complex, real-world app, using only GPT 4. But even the newest models generate absolute bullshit on a fairly regular basis. So telling me that you did something complex with an unspecified amount of additional prompting is fine, but not particularly responsive to the original claim.
[1] Copilot, with a liberal sprinkling of ChatGPT in the web UI. Please don’t engage in “you’re holding it wrong” or "you didn't use the right model" with me - I use enough frontier models on a regular basis to have a good sense of their common failings and happy paths. Also, I am trying to do something other than experiment with models, so if I have to switch environments every day, I’m not doing it. If I have to pay for multiple $200 memberships, I’m not doing it. If they require an exact setup to make them “work”, I am unlikely to do it. Finally, if your entire argument here hinges on a point release of a specific model in the last six weeks…yeah. Not gonna take that seriously, because it's the same exact argument, every six weeks. </caveats>
[2] Nothing really wrong with this -- most programming is an iterative exercise of replicating pre-existing things with minor tweaks -- but we're pretty far into the bailey now, I think. The original argument was that you can one-shot a complex application. Now we're in "I can replicate a large pre-existing thing with repeated hand-holding". Fine, and completely within my own envelope for model performance, but not really the original claim.
I know you said don't engage in "you're holding it wrong"... but have you tried these models running in a coding agent tool loop with automatic approvals turned on?
Copilot style autocomplete or chatting with a model directly is an entirely different experience from letting the model spend half an hour writing code, running that code and iterating on the result uninterrupted.
Here's an example where I sent a prompt at 2:38pm and it churned away for 7 minutes (executing 17 bash commands), then I gave it another prompt and it churned for half an hour and shipped 7 commits with 160 passing tests: https://static.simonwillison.net/static/2025/claude-code-mic...
> I know you said don't engage in "you're holding it wrong"... but have you tried these models running in a coding agent tool loop with automatic approvals turned on?
edit: I wrote a different response here, then I realized we might be talking about different things.
Are you asking if I let the agents use tools without my prior approval? I do that for a certain subset of tools (e.g. run tests, do requests, run queries, certain shell commands, even use the browser if possible), but I do not let the agents do branch merges, deploys, etc. I find that the best models are just barely good enough to produce a bad first draft of a multi-file feature (e.g. adding an entirely new controller+view to a web app), and I would never ever consider YOLOing their output to production unless I didn't care at all. I try to get to tests passing clean before even looking at the code.
Also, while I am happy to let Copilot burn tokens in this manner and will regularly do it for refactors or initial drafts of new features, I'm honestly not sure if the juice is worth the squeeze: I still typically have to spend substantial time reworking whatever they create, and the revision time required scales with the amount of time they spend spinning. If I had to pay per token, I'd be much more circumspect about this approach.
Yes, that's what I meant. I wasn't sure if you meant classic tab-based autocomplete or Copilot's tool-based agent mode.
Letting it burn tokens on running tests and refactors (but not letting it merge branches or deploy) is the thing that feels like a huge leap forward to me. We are talking about the same set of capabilities.
For me it is something I can describe in a single casual prompt.
For example I wrote a fully working version of https://tools.nicklothian.com/llm_comparator.html in a single prompt. I refined it and added features with more prompts, but it worked from the start.
Good question. No strict line, and it's always going to be subjective and a little bit silly to categorize, but when I'm debating this argument I'm thinking: a product that does not exist today (obviously many parts of even a novel product will be completely derivative, and that's fine), with multiple views, controllers, and models, and a non-trivial amount of domain-specific business logic. Likely 50k+ lines of code, but obviously that's very hand-wavy and not how I'd differentiate.
Think: a SaaS application that solves some domain-specific problem in corporate accounting, versus an "in-browser spreadsheet", or a "first-person shooter video game with AI, multi-player support, editable levels, networking and high-resolution 3D graphics" vs a "flappy bird clone".
When you're working on a product of this size, you're probably solving problems like the ones cited by simonw multiple times a week, if not daily.
Is there an endpoint for AI improvement? If we can go from functions to classes to substantial programs then it seems like just a few more steps to rewriting whole software products and putting a lot of existing companies out of business.
"AI, I don't like paying for my SAP license, make me a clone with just the features I need".
There are two claims in contention here:
- Models keep getting better[0]
- Models since GPT 3 are able to replace junior developers
It's true that both of these can be true at the same time, but they are still in contention. We're not seeing agents ready to replace mid-level engineers, and quite frankly I've yet to see a model actually ready to replace juniors. Possibly low-end interns, but the major utility of interns is as a trial run for employment. Frankly it still seems like interns and juniors are advancing faster than these models in the type of skills that matter for companies (not to mention that institutional knowledge is quite valuable). But there are interns who started when GPT 3.5 came out who are seniors now.
The problem is we've been promised that these employees would be replaced[1] any day now, yet that's not happening.
People forget that it is harder to advance when you're already skilled. It's not hard to go from non-programmer to junior level. It's hard to go from junior to senior, and even harder to advance to staff. The difficulty only increases. This is true for most skills, and this is where there's a lot of naivety. We can be advancing faster while the actual capabilities begin to crawl forward rather than leap.
[0] Implication is not just at coding test style questions but also in more general coding development.
[1] Which has another problem in the pipeline. If you don't have junior devs and are unable to replace both mid and seniors by the time that a junior would advance to a senior then you have built a bubble. There's a lot of big bets being made that this will happen yet the evidence is not pointing that way.
Opus 4.5 is categorically a much better model from benchmarks and personal experience than Opus 4.1 & Sonnet models. The reason you're seeing a lot of people wax about O4.5 is that it was a real step change in reliable performance. It crossed for me a critical threshold in being able to solve problems by approaching things in systematic ways.
Why do you use the word "chasing" to describe this? I don't understand. Maybe you should try it and compare it to earlier models to see what people mean.
> Why do you use the word "chasing" to describe this?
I think you'll get the answer to this if you reread my comment and your response, and work out why yours didn't address mine.
Btw, I have tried it. It's annoying that people think the problem is not trying. It was getting old when GPT 3.5 came out. Let's update the argument...
Looking forward to hearing about how you're using Opus 4.5. From my experience and what I've heard from others, it's been able to overcome many obstacles that previous iterations stumbled on.
Please do. I'm trying to help other devs in my company get more out of agentic coding, and I've noticed that not everyone is defaulting to Opus 4.5 or even Codex 5.2, and I'm not always able to give good examples to them for why they should. It would be great to have a blog post to point to…
> Opus 4.5 is categorically a much better model from benchmarks and personal experience than Opus 4.1 & Sonnet models. The reason you're seeing a lot of people wax about O4.5 is that it was a real step change in reliable performance. It crossed for me a critical threshold in being able to solve problems by approaching things in systematic ways.
Reality is we went from LLMs as chatbots editing a couple of files per request with decent results, to running multiple coding agents in parallel to implement major features based on a spec document and some clarifying questions, in a year.
Even IF LLMs don't get any better, there is a mountain of lemons left to squeeze in their current state.
As it should, normally, because "we'll rewrite it in React later" used to represent weeks if not months of massively disruptive work. I've seen migration projects like that push on for more than a year!
The new normal isn't like that. Rewriting an existing, cleanly implemented vanilla JavaScript project (with tests) in React is the kind of rote task you can throw at a coding agent like Claude Code, then come back the next morning and expect most (and occasionally all) of the work to be done.
I’m going to add my perspective here as they seem to all be ganging up on you Simon.
He is right. The game has changed. We can now refactor using an agent and have it done by morning. The cost of architectural mistakes is minimal and if it gets out of hand, you refactor and take a nap anyway.
What’s interesting is now it’s about intent. The prompts and specs you write, the documents you keep that outline your intended solution, and you let the agent go. You do research. Agent does code. I’ve seen this at scale.
> The new normal isn't like that. Rewriting an existing, cleanly implemented vanilla JavaScript project (with tests) in React is the kind of rote task you can throw at a coding agent like Claude Code, then come back the next morning and expect most (and occasionally all) of the work to be done.
... meant that person would do it in a clandestine fashion rather than it being an agreed-upon task beforehand? Is this how you operate?
> And everyone else's work has to be completely put on hold
On a big enough team, getting everyone to a stopping point where they can wait for you to do your big bang refactor to the entire code base, even if it is only a day later, is still really disruptive.
The last time I went through something like this, we did it really carefully, migrating a page at a time from a multi page application to a SPA. Even that required ensuring that whichever page transitioned didn't have other people working on it, let alone the whole code base.
Again, I simply don't buy that you're going to be able to AI your way through such a radical transition on anything other than a trivial application with a small or tiny team.
> meant that person would do it in a clandestine fashion rather than it being an agreed-upon task beforehand? Is this how you operate?
It doesn't mean that at all.
In an AI heavy project it's not unusual to have many speculative refactors kicked off and then you come back to see what it is like.
Wonder if you can do a Rust SIMD-optimized version of that NumPy code you have? Try it! You don't even need to waste review time on it, because you have heavy test coverage and can see whether it is worth looking at.
If you have 100s of devs working on the project, it's not possible to do a full rewrite in one go. So it's not about being clandestine, but rather that there's just no way to get it done regardless of how much AI superpower you bring to bear.
Let's say I'm mildly convinced by your argument. I've read your blog post that was popular on HN a week or so ago and I've made similar little toy programs with AI that scratch a particular niche.
Do you care to make any concrete predictions on when most developers will embrace this new normal as part of their day to day routine? One year? Five?
And how much of this is just another iteration in the wheel of reincarnation[0]? Maybe we're looking at a future where we see a return to the monoculture, library-dense supply chain that we use today, but the libraries are made by swarms of AI agents instead and the programmer/user is responsible for guiding other AI agents to create business logic?
It's really hard to predict how other developers are going to work, especially given how resistant a lot of developers are to fully exploring the new tools.
I do think there's been a bit of a shift in the last two months, with GPT 5.1 and 5.2 Codex and Opus 4.5.
We have models that can reliably follow complex instructions over multiple hour projects now - that's completely new. Those of us at the cutting edge are still coming to terms with the consequences of this (as illustrated by this Karpathy tweet).
I don't trust my predictions myself, but I think the next few months are going to see some big changes in terms of what mainstream developers understand these tools as being capable of.
"The future is already here, it's just unevenly distributed."
At some companies, most developers already are using it in their day to day. IME, the more senior the developer is, the more likely they are to be heavily using LLMs to write all/most of their code these days. Talking to friends and former coworkers at startups and Big Tech (and my own coworkers, and of course my own experience), this isn't a "someday" thing.
People who work at more conservative companies, the kind that don't already have enterprise Cursor/Anthropic/OpenAI agreements, and are maybe still cautiously evaluating Copilot... maybe not so much.
"React is hundreds of thousands of lines of code".
Most of which are irrelevant to my project. It's easier to maintain a few hundred lines of self written code than to carry the react-kitchen-sink around for all eternity.
Not all UIs converge to a React like requirement. For a lot of use cases React is over-engineering but the profession just lacks the balls to use something simpler, like htmx for example.
Without commenting on whether the parent is right or wrong (I suspect it is correct):
If it's true, the market will soon reward it. Being able to competently write good code more cheaply will be rewarded. People don't employ programmers because they care about them; they are employed to produce output. If someone can use LLMs to produce more output for less $$, they will quickly make the people that don't understand the technology less competitive in the workplace.
That's a trap: for those without experience in both business and engineering, it's not obvious how to estimate, or later calculate, this $$. The trap is in the cost of changes and the fix budget when things break. And things will break. Often. Also, the requirements will change often; that's normal (our world is not static). So the cost has a tendency to change (guess in which direction). The thoughtless copy-paste and rewrite-everything approach is nice, but the cost soon climbs steeply with time. Those who don't know this will walk into the trap and lose the business.
A major difference is when we have to read and understand it because of a bug. Perhaps the LLM can help us find it! But abstraction provides a mental scaffold.
I feel like "abstraction" is overloaded in many conversations.
Personally I love abstraction when it means "generalize these routines to a simple and elegant version". Even if it's harder to understand than a single instance it is worth the investment and gives far better understanding of the code and what it's doing.
But there's also abstraction in the sense of making things less understandable or more complex, and I think LLMs operate this way. It takes a long time to understand code, not because any single line of code is harder to understand, but because the lines need to be understood in context.
I think part of this is people misunderstanding elegance. It doesn't mean aesthetically pleasing; it means doing something in a simple and efficient way. Yes, write it rough the first round, but we should also strive for elegance. Instead it seems like we are just trying to get the first rough draft out and move on to the next thing.
> Making these decisions effectively and responsibly is one of the key characteristics of a senior engineer, which is why it's so interesting that all of those years of intuition are being disrupted.
They're not being disrupted. This is exactly why some people don't trust LLMs to re-invent wheels. It doesn't matter if it can one-shot some code and tests - what matters is that some problems require experience to know what exactly is needed to solve that problem. Libraries enable this experience and knowledge to centralize.
When considering whether inventing something in-house is a good idea vs using a library, "up front dev cost" factors relatively little to me.
Rather, the problem more often I see with junior devs is pulling in a dozen dependencies when writing a single function would have done the job.
Indeed, part of becoming a senior developer is learning why you should avoid left-pad but accept date-fns.
We’re still in the early stages of operationalising LLMs. This is like mobile apps in 2010 or SPA web dev in 2014. People are throwing a lot of stuff at the wall and there’s going be a ton of churn and chaos before we figure out how to use it and it settles down a bit. I used to joke that I didn’t like taking vacations because the entire front end stack will have been chucked out and replaced with something new by the time I get back, but it’s pretty stable now.
Also I find it odd you’d characterise the current LLM progress as somehow being below where we hoped it would be. A few years back, people would have said you were absolutely nuts if you’d have predicted how good these models would become. Very few people (apart from those trying to sell you something) were exclaiming we’d be imminently entering a world where you enter an idea and out comes a complex solution without any further guidance or refining. When the AI can do that, we can just tell it to improve itself in a loop and AGI is just some GPU cycles away. Most people still expect - and hope - that’s a little way off yet.
That doesn’t mean the relative cost of abstracting and inlining hasn’t changed dramatically or that these tools aren’t incredibly useful when you figure out how to hold them.
Or you could just do what most people always do and wait for the trailblazers to either get burnt or figure out what works, and then jump on the bandwagon when it stabilises - but accept that when it does stabilise, you’ll be a few years behind those who have been picking shrapnel out of their hands for the last few years.
> The reason we stop junior devs from going down this path is because experience teaches us that things will break and when they do, it will incur a world of pain.
Hyperbole. It's also very often a "world of pain" with a lot of senior code.
> things will break and when they do, it will incur a world of pain
How much of this is still true, and how much is exaggerated, in today's environment where the cost of making things is near zero?
I think “Evolution” would say that the cost of producing is near zero, so the possibility of creating what we want is high. The cost of trying again is low, so mistakes and pain aren't super costly. For really high-stakes situations (which most situations are not), bring the expert human into the loop until the AI is a better expert than that human.
Consider the dependency hell that is modern development: how wide the openings are for supply chain attacks, and how seemingly every other week we get a new RCE.
I'd rather have 100 loosely coupled scripts peer reviewed by half a dozen LLM agents.
But this doesn't solve dependency hell. If the functionality is loosely coupled, you can already vendor the code in and manually review it. If it isn't, say it's a db, you still have to depend on that?
Or maybe you can use AI to vendor dependencies, review existing dependencies and updates. Never tried that, maybe that is better than the current approach, which is just trusting the upstream most of the time until something breaks.
By vendoring the code in, in this case I mean copying the related code into the project. You don't review everything. It is a bad way to deal with dependencies, but it feels similar to how people are using LLMs now for utility functions.
> "LLM as abstraction" might be a possible future, but it assumes LLMs are significantly more capable than a junior dev at managing a growing mess of complex code.
Ignoring for a second that they actually already are, it doesn't matter, because the cost of rewriting the mess drops by an order of magnitude with each frontier model release. You won't need good code because you'll be throwing everything away all the time.
In everyday life I am a plodding and practical programmer who has learned the hard way that any working code base has numerous “fences” in the Chesterton sense.
I think, though, that for small systems and small parts of systems LLMs do move the repair-replace line in the replace direction, especially if the tests are good.
I'm incentivised to use abstractions that are harder to learn, but execute faster or more safely once compiled. E.g. more Rust, Lean.
> If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
LLMs benefit from abstractions the same way as we do.
LLMs currently copy our approaches to solving problems and copy all the problems those approaches bring.
Letting LLMs skip all the abstractions is about as likely to succeed as genetic programming is to be efficient.
For example, writing more vanilla JS instead of React, you're just reinventing the necessary abstractions more verbosely and with a higher risk of duplicate code or mismatching abstractions.
In a recent interview with Bret Weinstein, a former professor of evolutionary biology, he proposed that one property of evolution that makes the story of one species evolving into another more likely is that it's not just random permutations of single genes; it also involves permutations of counter variables encoded as telomeres and possibly microsatellites.
Bret compares this to flipping random bits in a program to make it work better vs. tweaking variables randomly in a high-level language. Mutating parameters at a high level in something that already works is more likely to result in something else that works than mutating parameters at a low level.
So I believe LLMs benefit from high abstractions, like us.
We just need good ones; and good ones for us might not be the same as good ones for LLMs.
> For example, writing more vanilla JS instead of React, you're just reinventing the necessary abstractions more verbosely and with a higher risk of duplicate code or mismatching abstractions.
Right, but I'm also getting pages that load faster and don't require a build step, making them more convenient to hack on. I'm enjoying that trade-off a lot.
Exactly. LLMs are a lot like human developers: they benefit from existing abstractions. Reinventing everything from scratch is a recipe for disaster—especially given an LLM’s limited context window.
I find it interesting that for your example you chose Moment.js, a time library, instead of something utilitarian like Lodash. For years I've followed Jon Skeet's blog about implementing his time library NodaTime (a port of JodaTime). There are a crazy number of edge cases and many unintuitive things about modeling time within a computer.
If I just wanted the equivalent of Lodash's _.intersection() method, I get it. The requirements are pretty straightforward and I can verify the LLM code & tests myself. One less dependency is great. But with time, I know I don't know enough to verify the LLM's output.
Similar to encryption libraries, it's a common recommendation to leave time-based code to developers who live and breathe those black boxes. I trust the community to verify the correctness of those concepts, something I can't do myself with LLM output.
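For contrast, the _.intersection() case is roughly this much code, small enough that reviewing it and its tests is genuinely feasible (my own illustrative sketch, not Lodash's actual implementation):

    // The one Lodash feature needed: intersection of two arrays,
    // deduplicated, preserving the order of the first array.
    function intersection(a, b) {
      const inB = new Set(b);
      return [...new Set(a)].filter((item) => inB.has(item));
    }

    const { test } = require("node:test");
    const assert = require("node:assert");

    test("keeps only items present in both arrays", () => {
      assert.deepStrictEqual(intersection([2, 1, 2, 3], [3, 2]), [2, 3]);
    });

    test("returns an empty array when nothing overlaps", () => {
      assert.deepStrictEqual(intersection([1], [2]), []);
    });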
I'd rather have LLMs build on top of proven, battle-tested production libraries than keep writing their own from scratch. You're going to fill up context with all of its re-invented wheels when it already knows how to use common options.
Not to mention that testing things like this is hard. And why waste time (and context and complexity) for humans and LLMs trying to do something hard like state syncing when you can focus on something else?
For smol things like left-pad, sure but the two examples given (moment and react) solve really hard problems. If I were reviewing a PR where someone tried to re-implement time zone handling in JS, that’s not making it through review.
In JS, the DOM and time zones are some of the most messed up foundations you’re building on top of ime. (The DOM is amazing for documents but not designed for web apps.)
I think we really need to be careful about adding dependencies that we’re maintaining ourselves, especially when you factor in employee churn and existing options. Unless it’s the differentiator for the business you’re building, my advice to engineers is to strongly consider other options and have a case for why they don’t fit.
AI can play into the engineering blind spot of building it ourselves because it’s fun. But engineering as a discipline requires restraint.
Whether that's true about React and Moment varies on a case-by-case basis.
If you're building something simple like a contact form, React may not be the right choice. If you're building something like Trello, that calculation is different.
LLMs also have encyclopedic knowledge. Several times LLMs have found some huge block of code I wrote and reduced it down to a few lines. The other day they removed several thousand lines of brittle code I wrote previously for some API calls with a well-tested package I didn't know about. Literally thousands down to dozens.
My code is constantly shrinking, becoming better quality, more performant, more best-practice on a daily basis. And I'm learning like crazy. I'm constantly looking up changes it recommends to see why and what the reasons are behind them.
It can be a big damned dummy too, though. Just today it was proposing a massive server-side script to workaround an issue with my app I was deploying, when the actual solution was to just make a simple one-line change to the app. ("You're absolutely right!")
Has anyone tried the experiment that is sort of implied here? I was wondering earlier today what it would be like to pick a simple app, pick an OS, and just tell an LLM to write that app using only machine code and native ADKs, and skip all the intermediate layers.
We seem to have created a large bureaucracy for software development, where telling a computer how to execute an app involves keeping a lot of cogs in a big complicated machine happy. But why use the automation to just roll the cogs? Why not just simplify/streamline? Does an LLM need to worry about using the latest and greatest abstractions? I have to assume this has been tried already...
> If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
I'm worried that there is a tendency in LLM-generated code to avoid even local abstractions, such as putting common code into separate (local) functions, or even using records/structures. You end up with code that is best maintained with an LLM, which is good for the LLM provider and their future revenue. But we humans, as reviewers and ultimate long-term maintainers, benefit from those minor abstractions.
Yeah, I find myself needing to watch out for that. I'll frequently say "refactor that to reduce duplicated code" - which is generally very safe once the LLM has added test coverage for the new feature.
> If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
But this is a highly non-trivial problem. How do you even possibly manually verify that the test suite is complete and tests all possible corner cases (of which there are so many because synchronizing state is a hard problem)?
At least React solves this problem in a non-stochastic, deterministic manner. What can be a good reason to replace something like React, which works deterministically, with LLM-assisted code that is generated stochastically and where there's no easy way to manually verify that the implementation or the test suite is correct and complete?
You don't, same as for the "generate Moment.js and use it" case. People now firmly believe they can use an LLM to build custom versions of these libraries and rewrite whole ecosystems out of nowhere, because Claude said "here's the code".
I've come to realize fighting this is useless. People will do this, it's going to create large fuck-ups, and there will be heaps of money to be made on the cleanup jobs.
I think the gap between people dealing with JavaScript cruft all day and backend large systems development is creating a massive conversational disconnect… like, this thread is plain-faced and seriously discussing reinventing date handling locally for funsies.
I also think that any company creating a reverse-centaur workforce of blind and dumb half baked devs ritualistically shaking chicken bones at their pay-as-you-go automaton has effectively outsourced their core business to OpenAI/MS while paying for the privilege. And, on the twenty year timeline as service and capital costs create crunches, those mega corps will literally be sitting on whole copies of internal business schematics and critical code of their subservient customers…
They say things, they do other things. Trusting Microsoft not to eat your sector through abusive partner programs and licensing entanglements backed with government capture? Surely the LLMs can explain how that has gone historically and how smart that is going forward.
They’ve done this before in their locked environments and programming languages, anyone that doesn’t think this is going to end the same way is delusional.
I’m starting to think actually knowing how to write code might end up being a superpower with so many people completely lost to the stochastic parrots. I’m already getting inbounds from friends and acquaintances that need “help” with their generated shit, gonna start asking for money for it.
There's going to be lots of fuck ups, but with frontier models improving so much there's also going to be lots of great things made. Horrible, soul crushing technical debt addressed because it was offloaded to models rather than spending a person's thought and sanity on it.
I think overall for engineering this is going to be a net positive.
> Why do we code with React?
...is a loaded question, with a complex and nuanced answer. Especially when you continue:
> it's worth paying the React complexity/page-weight tax
All right; then why do we code in React when a smaller alternative, such as Preact, exists, which solves the same problem, but for a much lower page-weight tax?
Why do we code in React when a mechanism to synchronize data with tiny UI fragments through signals exists, as exemplified by Solid?
Why do people use React to code things where data doesn't even change, or changes so little that to sync it with the UI does not present any challenge whatsoever, such as blogs or landing pages?
I don't think the question 'why do we code with React?' has a simple and satisfactory answer anymore. I am sure marketing and educational practices play a large role in it.
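For what it's worth, the signal mechanism mentioned above is small enough to sketch in a few lines (a toy illustration, not Solid's actual implementation; the #count element is hypothetical):

    // Toy signal implementation: reads subscribe the current effect,
    // writes re-run every subscriber.
    let currentEffect = null;

    function createSignal(value) {
      const subscribers = new Set();
      const read = () => {
        if (currentEffect) subscribers.add(currentEffect);
        return value;
      };
      const write = (next) => {
        value = next;
        subscribers.forEach((fn) => fn());
      };
      return [read, write];
    }

    function createEffect(fn) {
      currentEffect = fn;
      fn(); // run once so the effect subscribes to the signals it reads
      currentEffect = null;
    }

    // Usage: this DOM fragment updates whenever the signal changes.
    const [count, setCount] = createSignal(0);
    createEffect(() => {
      document.querySelector("#count").textContent = count();
    });
    setCount(1); // re-runs only the effect that read `count`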
My cynical answer is that most web developers who learned their craft in the last decade learned frontend React-first, and a lot of them genuinely don't have experience working without it.
Which means hiring for a React team is easier. Which means learning React makes you more employable.
> most web developers who learned their craft in the last decade learned frontend React-first, and a lot of them genuinely don't have experience working without it
That's not cynical, that's the reality.
I do a lot of interviews and mentor juniors, and I can 100% confirm that.
And funny enough, React-only devs was a bigger problem 5 years ago.
Today the problem is developers who can *only* use Next.js. A lot can't use Vite+React or plain React, or whatever.
And about 50% of Ruby developers I interviewed from 2022-2024 were unable to code a FizzBuzz in Ruby without launching a whole Rails project.
My job during the hiring process is to filter them.
But that's me. Other companies might be interested.
I often choose to work on non-cookie-cutter products, so it's better to have developers with more curiosity to ask questions, like yourself asked above.
Seeing these people gang up on you felt really bad, because I support your claim.
Let me offer a context where LLMs actually shine and are a blessing. I think it is the same for Karpathy, who comes from research.
In any research field, replicating a paper is a wildly difficult task. It can take 6-24 months of dedicated work across an entire team to replicate a good research paper.
Now, there is a reason why we want to do it: sometimes the solution actually lies in the research. Most research code is experimental garbage anyway.
For each of us working in research, an LLM is a blessing because of the rapid prototyping it provides.
Then there are research engineers whose role is to apply research to production code. We as research engineers really don't care about the popular library. As long as something does the job, we will just roll with it.
The reason is simple: there is nothing out there that has solved the problem.
As we move further from research, the tools we build will run into all sorts of issues, and we will improve on them.
Idk about what people think about webdev, but this has been my perspective in SWE in general.
Most of the webdevs here coping about whether their React skills still matter are quite delusional, because they have never traversed the stack down to the foundation. It doesn't matter how you render the document as long as you render it.
Every abstraction originates from research and some small proof of concept. You might reinvent an abstraction, but when the cost of reinventing it is essentially zero, choosing only to exploit rather than to explore is stifling your own learning.
There is a balance and good engineers know it. Perhaps all of the people who ganged up on you never approached their work this way.
> If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
for simple stuff, sure, React was ALWAYS inefficient. Even JavaScript/client-side logic is still overkill a lot of the time except for that pesky "user expectations" thing.
for any codebase that's long-lived and complex, combinatorics tells us it'll be near-impossible to have good+fast test coverage on all of that.
part of the reason people don't roll their own is because being able to assume that the library won't have major bugs leads to an incredible reduction in necessary test surface, and generally people have found it a safe-enough assumption.
throwing that out and trying to just cover the necessary stuff instead - because you're also throwing out your ability to quickly recognize risky changes since you aren't familiar with all the code - has a high chance of painting you into messy corners.
"just hire a thousand low-skilled people and force them to write tests" had more problems as a hiring plan then just "people are expensive."
I've come to a similar conclusion. One example is how much easier it is to put an interface on top of SQLite. I've been burned badly by the hidden details of ORMs. ORMs are the siren's call of getting rid of all that boilerplate code for encoding and decoding objects into a db. However, this abstraction breaks in many hidden ways: lazy-loading details, in-memory state vs. db mismatches, cascading details, etc. all have unexpected problems that can be hard to predict. Using an LLM to do the grunt work lets you easily see and reason about all the details. You don't have to guess about what's happening, and you can make your own choices.
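For what it's worth, the kind of thin, explicit interface I mean looks roughly like this (a sketch using better-sqlite3 in Node; the table and helper functions are made up for illustration):

    // Thin data-access layer: explicit SQL, explicit mapping, no ORM magic.
    // No lazy loading, no session cache, no cascade rules to guess about.
    const Database = require("better-sqlite3");
    const db = new Database("app.db");

    db.exec(`CREATE TABLE IF NOT EXISTS users (
      id INTEGER PRIMARY KEY,
      email TEXT NOT NULL UNIQUE,
      created_at TEXT NOT NULL DEFAULT (datetime('now'))
    )`);

    function createUser(email) {
      const result = db.prepare("INSERT INTO users (email) VALUES (?)").run(email);
      return getUser(result.lastInsertRowid);
    }

    function getUser(id) {
      // Returns a plain object (or undefined): exactly what the SQL says, nothing more.
      return db.prepare("SELECT id, email, created_at FROM users WHERE id = ?").get(id);
    }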
If you work at a megacorp right now, you know what's happening isn't people deciding to use fewer libraries. It's developers being measured by their lines of code, and the more AI you use, the more lines of code and 'features' you can ship.
However, the quality of this code is fucking terrible, no one is reading what they push deeply, and these models don't have enough 'sense' to make really robust and effective test suites. Even if they did, a comprehensive test suite is not the solution to poorly designed code; it's a band-aid, and an expensive one at scale.
Most likely we will see some disasters happening in the next few years due to this mode of software development, and only then will people understand to use these agents as tools and not replacements.
...Or maybe we'll get AGI and it will fix/maintain the trash going out there today.
I don't trust LLMs enough to handle the maintenance of all the abstraction buried in React or similar libraries. I've caught some of the LLMs taking nasty shortcuts (e.g. removing test constraints or validations in order to make the tests green). Multiple times. Which completely breaks trust.
And if I have to closely supervise every single change, I don't believe my development process will be any better. If not worse.
Let alone new engineers who join the team and all of a sudden have to deal with a unique solution layer which doesn't exist anywhere else.
The same question might be asked about ASML: if ASML EUV machines are so great, why does ASML sell them to TSMC instead of fabbing chips themselves? The reality is that firms specialize in certain areas, and may lose their comparative advantage when they move outside of their specialty.
I would guess fear of losing market share and valuable data, as well as pressure to appear to be winning the AI race for the companies' own stock price.
i.e. competition. If there were only one AI company, they would probably not release anything close to their most capable version to the public, a la Google pre-ChatGPT.
I’m not sure that really answers the question? Or perhaps my interpretation of the question is different.
If (say) the code generation technology of Anthropic is so good, why be in the business of selling access to AI systems? Why not instead conquer every other software industry overnight?
Have Claude churn out the best office application suite ever. Have Claude make the best operating system ever. Have Claude make the best photo editing software, music production software, 3D rendering software, DNA analysis software, banking software, etc.
Why be merely the best AI software company when you can be the best at all software everywhere for all time?
Huh, I've been assuming the opposite: better to use React even if you don't need it, because of its prevalence in the training data. Is it not the case that LLMs are better at standard stacks like that than custom JS?
Hard to say for sure. I've been finding that frontier LLMs write very good code when I tell them "vanilla JS, no React" - in that their code matches my personal taste at least - but that's hardly a robust benchmark.
> and it can maintain a test suite that shows everything works correctly
Are you able to efficiently verify that the test suite is testing what it should be testing? (I would not count "manually reviewing all the test code" as efficient if you have a similar amount of test code to actual code.)
Sometimes a change to the code under test means that a (perhaps unavoidably brittle) test needs to be changed. In this case, the LLM should change the test to match the behaviour of the code under test. Other times, a change to the code under test represents a bug that a failing test should catch -- in this case, the LLM should fix the code under test, and leave the test unchanged. How do you have confidence that the LLM chooses the right path in each case?
One of my personal rules for automated test suites is that my tests should fail if one of the libraries I'm using changes in a way that breaks my features.
Makes upgrading dependencies so much less painful!
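In practice that just means exercising the feature through the library rather than mocking it. A hypothetical sketch (invoiceLabel and its date-fns usage are made up for illustration):

    // Feature code that leans on date-fns for formatting.
    const { format } = require("date-fns");

    function invoiceLabel(invoice) {
      // invoice = { number, issuedAt }: a hypothetical shape for illustration.
      return `Invoice ${invoice.number} (${format(invoice.issuedAt, "yyyy-MM-dd")})`;
    }

    // The test exercises the feature through the library rather than mocking it,
    // so a breaking change in a date-fns upgrade shows up as a failing test here.
    const { test } = require("node:test");
    const assert = require("node:assert");

    test("labels an invoice with its issue date", () => {
      assert.strictEqual(
        invoiceLabel({ number: 42, issuedAt: new Date(2025, 0, 31) }),
        "Invoice 42 (2025-01-31)"
      );
    });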
Of course, but this is largely unmaintainable, shifting the responsibility for correctness checks from libraries to users. That's why we modularize/abstract/simplify: to minimize the need for actual checks.
Our industry wants disruption, speed, delivery! Automatic code generation does that wonderfully.
If we wanted safety, stability, performance, and polish, the impact of LLMs would be more limited. They have a tendency to pile up code on top of code.
I think the new tech is just accelerating an already existing problem. Most tech products are already rotting, take a look at windows or iOS.
I wonder what it will take for a significant turning point in this mentality.
One possible positive outcome of all this could be sending LLMs to clean up oceans of low value tech debt. Let the humans move fast, let the machines straighten out and tidy up.
The ROI of doing this is weak because of how long it takes an expensive human. But if you could clean it up more cheaply, the ROI strengthens considerably, and there's a lot of it to clean up.
It's not something that suddenly changed. "I'll generate some code" is as nondeterministic as "I'll look for a library that does it", "I'll assign John to code this feature", or "I'll outsource this code to a consulting company". Even if you write yourself, you're pretty nondeterministic in your results - you're not going to write exactly the same code to solve a problem, even if you explicitly try.
If I use a library, I know it will do the same thing given the same inputs, every time. If I don't understand something about its behavior, I can look at the documentation. Some are better about this, some are crap. But a good library will continue doing what I want years or decades later.
An LLM can't decide between one sentence and the next what to do.
The library is deterministic, but looking for the library isn't. In the same way that generating code is not deterministic, but the generated code normally is.
I...guess? But once you know of a good library for problem X, you don't need to look for it anymore. I guess if you have a bunch of developers and 0 control over what they do, and they're free to drag in additional dependencies willy-nilly, then yes, that part isn't deterministic? But that's a much bigger problem than anything library-related...
Unlike code generation, all the other examples share one decisive advantage: alignment between your objective and their actions. With a good enough incentive, they may as well be deterministic.
When you order home delivery, you don't care about who delivers it or how. Only the end result matters. And we've ensured that reliability is good enough that failures are accidents, not a common occurrence.
Code generation is not reliable enough to earn the same quasi-deterministic label.
It's not the same: LLMs are qualitatively different due to the stochastic and non-reproducible nature of their output. From the LLM's point of view, non-functional or incorrect code is exactly the same as correct code, because it doesn't understand anything it's generating. When a human does it, you can say they did a good or bad job, but there is a thought process and actual "intelligence" and reasoning behind the decisions.
I think this insight was really the thing that made me understand the limitations of LLMs a lot better. Some people say when it produces things that are incorrect or fabricated it is "hallucinating", but the truth is that everything it produces is a hallucination, and the fact it's sometimes correct is incidental.
I'm not sure who generates random code without a goal or checking if it works afterwards. Smells like a straw man. Normally you set the rules, you know how to validate if the result works, and you may even generate tests that keep that state. If I got completely random results rather than what I expect, I wouldn't be using that system - but it's correct and helpful almost every time. What you describe is just not how people work with LLMs in practice.
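A sketch of what "you set the rules" looks like in practice, assuming a hypothetical slugify() function whose implementation the LLM generates: the human writes (or at least reviews and freezes) the acceptance tests, and any generated implementation only lands if it passes them.

    // Acceptance gate owned by the human; the LLM-generated slugify()
    // implementation is only accepted if these keep passing.
    const test = require('node:test');
    const assert = require('node:assert');
    const { slugify } = require('./slugify');

    test('lowercases and hyphenates', () => {
      assert.strictEqual(slugify('Hello World'), 'hello-world');
    });

    test('drops characters that are not URL-safe', () => {
      assert.strictEqual(slugify('C# & F#!'), 'c-f');
    });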
Correct. The thing has no concept of true or false. 0 or 1.
Therefore it cannot reliably distinguish between two statements that are practically identical in the eyes of humans. This doesn't make the technology useless, but it's clearly none of that AGI nonsense.
It's wild that management would be willing to accept it.
I think that for some people it is harder to reason about determinism because it is similar to correctness, and correctness can, in many scenarios, be something you trade off - for example, in relation to scaling and speed you will often trade off correctness.
If you do not think clearly about the difference between determinism and other, similar properties like (real-time) correctness, which you might be willing to trade off, you might think that trading off determinism is just more of the same.
Note: I'm against trading off determinism, but I'm willing to accept that there might be a reason to trade it off; I just worry that people are not actually thinking through what it is they're trading when they do it.
hmm, OK good point. But programs that are not deterministic would seem to have a bug that needs fixing. And it can't be fixed, but I guess the employees can't be fixed either.
Determinism requires formality (the enactment of rules) and some kind of omniscience about the system. Both are hard to acquire. I've seen people try hard not to read any kind of manual and fail to reason logically even when given hints about the solution to a problem.
I think those who are most successful at creating maintainable code with AI are the ones who spend more time up front limiting nondeterminism through design and context.
I am still using LLMs just to ask questions and never giving them the keyboard so I haven’t quite experienced this yet. It has not made me a 10x dev but at times it has made me a 2x dev, and that’s quite enough for me.
It’s like jacking off, once in a while won’t hurt and may even be beneficial. But if you do it constantly you’re gonna have a problem.
There has always been a laissez-faire subset of programmers who thrive on living in the debugger, getting occasional dopamine hits every time they remove any footgun they previously placed.
I cannot count the times that I've had essentially this conversation:
"If x happens, then y, and z, it will crash here."
"What are the odds of that happening?"
"If you can even ask that question, the probability that it will occur at a customer site somewhere sometime approaches one."
It's completely crazy. I've had variants of the conversation with hardware designers, too. One time, I was asked to torture a UART, since we had shipped a broken one. (I normally build stuff, but I am your go-to whitebox tester, because I home in on things that look suspicious rather than shying away from them.) When I was asked the inevitable "Could that really happen in a customer system?" after creating a synthetic scenario where the UART and DMA failed together, my response was:
"I don't know. You have two choices. Either fix it where the test passes, or prove that no customer could ever inadvertently recreate the test conditions."
My dad worked in the auto industry and they came across a defect in an engine control computer where they were able to give it something like 10 million to one odds of triggering.
They then turned the thing on, it ran for several seconds, encountered the error, and crashed.
Oh, that's right, the CPU can do millions of things a second.
Something I keep in the back of my mind when thinking about odds in programming: you need to do extra legwork to make sure you're measuring things in a way that's practical.
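A quick back-of-the-envelope in code (the numbers are illustrative, not from the anecdote): at one-in-ten-million odds per iteration, a code path that runs a million times a second is expected to hit the bug in seconds, not years.

    // "Rare" per iteration is not rare per hour once the rate is high.
    const probabilityPerIteration = 1e-7; // one in ten million
    const iterationsPerSecond = 1e6;      // a busy control loop
    const expectedSecondsToFailure =
      1 / (probabilityPerIteration * iterationsPerSecond);
    console.log(expectedSecondsToFailure); // 10 seconds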
I've recently had a lot of fun teaching junior devs the basics of defensive programming.
The phrasing that usually makes it click for them is: "Yes, this is an unlikely bug, but if it were to happen, how long would it take you to figure out that this is the problem and fix it?"
In most cases these are extremely subtle issues, and the juniors immediately realize they would be nightmares to debug - the kind that could easily eat up days of hair-pulling work while someone non-technical above them, waiting for the solution, rapidly loses patience.
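The check involved is usually nothing fancy. Roughly this, with parseConfig() and its fields being hypothetical: fail loudly at the boundary with an error that names the bad value, instead of letting it propagate and surface far from its cause.

    // Defensive check: reject the malformed value where it enters the
    // system, with an error that says exactly what was wrong.
    function parseConfig(raw) {
      const config = JSON.parse(raw);
      if (!Number.isInteger(config.retryCount) || config.retryCount < 0) {
        throw new Error(
          `invalid retryCount: ${JSON.stringify(config.retryCount)}`);
      }
      return config;
    }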
The best senior devs I've worked with over my career have all shared an uncanny knack for seeing a problem months before it impacts production. While they are frequently ignored, more often than not they get an apology a few months down the line when exactly what they predicted happens.
Delving a bit deeper... I've been wondering if the problem's related to the rise in H1B workers and contractors. These programmers have an extra incentive to avoid pushing back on c-suite/skip level decisions - staying out of in-office politics reduces the risk of deportation. I think companies with a higher % of engineers working with that incentive have a higher risk of losing market share in the long-term.
I’ll answer that with a simple “No”. My H1B colleagues are every bit as rigorous and innovative as any engineer. It is in no one’s long-term interest to generate shoddy code.
I'm not saying the code is shoddy - I agree the quality's fine. I'm referring to the IC engineer's role in pushing back against unrealistic demands and design decisions passed down by PMs and the c-suite. Doing this can increase internal tension, but it makes the product and customer experience better in the long run. In my career, I've felt safe pushing back because I don't have to worry about moving if my pushback is poorly received.
This whole AI-assisted and vibe coding phenomenon, including the other comments here, reminds me of a very popular post that keeps reappearing on HN almost every year [1], [2].
[1] Don't Call Yourself A Programmer, And Other Career Advice:
My work is better than it has been for decades. Now I can finally think and experiment instead of wasting my time coding nitty-gritty details that are impossible to abstract. Last autumn was the game changer, basically Codex and, later, Opus 4.5; the latter is good with any decent scaffolding.
I have to admit, LLMs do save a lot of typing and the associated syntax errors. If you know what you want and can spot and fix mistakes made by the LLM, then they can be pretty useful. I don't think it's wise to use them for development if you are not knowledgeable enough in the domain and language to recognize errors or dead ends in the generated code, though.
Plumbing seems like a relatively popular AI-proof pivot. If AI really does start taking jobs en masse, then plumbers are going to be plentiful and cheap.
What we really need is a lot more housing, so construction work is a safer pivot. But construction work is difficult and dangerous and not something everyone can do. Also, society will (apparently) collapse if we ever make housing affordable, so maybe the powers that be won't allow an increase in construction work, even if there are plenty of construction workers.
That's similar to what happened in Java enterprise stack: ...wrapper and ...factory classes and all-you-can-eat abstractions that hide implementation and make engineering crazy expensive while not adding much (or anything, in most cases) to product quality. Now the same is happening in work processes with agentic systems and workflows.
Could we all just agree to stop using the term "abstraction"? It's meaningless and confusing - cover for a multitude of sins, because it really could mean anything at all. Don't lay all the blame on the C-suite; they are what they are and have their own view. Don't moan about the latest egregious excess of some LLM. If it works for you, use it; if it doesn't, don't. But stop whinging.
> It’s watching the profession collectively decide that the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
No profession collectively made such a decision. Programming has always been split into many, many subcultures, each with its own ideas - mutually incompatible across the profession as a whole - about what makes a good program.
So I guess it was rather some programmers inside some part of a Silicon Valley echo chamber - one you also live in - who made such a decision.
> the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
I've usually found complaints about abstraction in programming odd because, frankly, all we do is abstraction. The complaint often seems to mean /I/ don't understand it, therefore we should do something more complicated, with many more lines of code, that's less flexible.
But this usage? I'm fully on board. Too much abstraction is when it's incomprehensible. To whom is the next question (my usual complaint is that a junior shouldn't be the bar), and I think you're right to point out that the "who" here is everyone.
We're killing a whole side of creativity and elegance while only slightly aiding another side. There's utility to this, but also a cost.
I think what frustrates me most about CS is that, as a community, we tend to go all in on something. We went all in on VR, then crypto, and now AI. We should be trying new things, but it feels more like we take these sides as if they're objective, and anyone not hopping on the hype train is an idiot or a luddite. The way the whole industry jumps to these things feels more like FOMO than intelligent strategy. Like making a sparkling water company an "AI first" company... it's like we love solutions looking for problems.
The system is designed to do exactly that. This is called ‘productivity increase’ and is deflationary in large dosages. Deflation sounds good until you understand where it’s coming from.
> It’s watching the profession collectively decide that the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
The ubiquitous adoption of LLMs for generating code is mostly a sign of bad abstraction or the absence of abstraction, not the excess of abstraction.
And choosing/making the right abstraction is kind of the name of the game, right? So it's not abstraction per se that's a problem.
This is true, though the people that actually push the field forward do know enough about every level of abstraction to get the job done. Making something (very important) horrible just to rush to market can be a pretty big progress blocker.
Jensen is someone I trust to understand the business side and some of those lower technical layers, so I'm not too concerned.
And if you're writing machine code directly, you're still relying on about ten layers of abstraction that the wizards at the chip design firms have built for you.