I feel this. I take breaks. But I also get drawn to overworking ( as I'm doing r...

lubujackson · 2025-12-16T21:26:20 1765920380

I think it taxes your brain in two different ways - the mental model of the code is updated in the same way as a PR from a co-worker updates code, but in a minute instead of every now and then. So you need to recalibrate your understanding and think through edge cases to determine if the approach is what you want or if it will support future changes etc. And this happens after every prompt. The older/more experienced you are, the harder it is to NOT DO THIS thinking even if you are intending to "vibe" something, since it is baked into your programming flow.

The other tax is the intermittent downtime when you are waiting for the LLM to finish. In the olden days you might have productive downtime waiting for code to compile or a test suite to run. While this was happening you might review your assumptions or check your changes or realize you forgot an edge case and start working on a patch immediately.

When an LLM is running, you can't do this. Your changes are being done on your behalf. You don't know how long the LLM will take, or how you might rephrase your prompt if it does the wrong thing until you see and review the output. At best, you can context switch to some other problem but then 30 seconds later you come back into "review mode" and have to think architecturally about the changes made then "prompt mode" to determine how to proceed.

When you are doing basic stuff all of this is ok, but when you are trying to structure a large project or deal with multiple competing concerns you quickly overwhelm your ability to think clearly because you are thinking deeply about things while getting interrupted by completed LLM tasks or context switching.

CPLX · 2025-12-16T19:26:52 1765913212

My least favorite part is where it runs into some stupid problem and then tries to go around it.

Like when I'm asking it to run a bunch of tests against the UI using a browser tool, and something doesn't work. Then it goes and just writes code to update the database instead of using the user element.

My other thing that makes me insane is when I tell it what to do, and it says, "But wait, let me do something else instead."

colechristensen · 2025-12-16T19:12:45 1765912365

Build tools to keep it in check.

vidarh · 2025-12-16T19:23:12 1765912992

Really, this. You still need to check its work, but it is also pretty good at checking its work if told to look at specific things.

Make it stop. Tell it to review whether the code is cohesive. Tell it to review it for security issues. Tell it to review it for common problems you've seen in just your codebase.

Tell it to write a todo list for everything it finds, and tell it fix it.

And only review the code once it's worked through a checklist of its own reviews.

We wouldn't waste time reviewing a first draft from another developer if they hadn't bothered looking over it and test it properly, so why would we do that for an AI agent that is far cheaper.

lelanthran · 2025-12-17T10:51:54 1765968714

I wouldn't mind see a collection of objectives and the emitted output. My experience with LLM output is that they are very often over-engineered for no good reason, which is taxing on me to review.

I want to see this code written to some objective, to compare with what I would have written to the same objective. What I've seen so far are specs so detailed that very little is left to the discretion of the LLM.

What I want to see are those where the LLM is asked for something, and provided it because I am curious to compare it to my proposed solution.

(This sounds like a great idea for a site that shows users the user-submitted task, and only after they submit their attempt does it show them the LLM's attempt. Someone please vibe code this up, TIA)

prmph · 2025-12-16T21:08:55 1765919335

So why can't the deterministic part of the agent program embed in all these checks?

colechristensen · 2025-12-16T23:31:43 1765927903

It absolutely can, I'm building things to do this for me. Claude Code has hooks that are supposed to trigger upon certain states and so far they don't trigger reliably enough to be useful. What we need are the primitives to build code based development cycles where each step is executed by a model but the flow is dictated by code. Everything today relies too heavily on prompt engineering and with long context windows instruction following goes lax. I ask my model "What did you do wrong?" and it comes back clearly with "I didn't follow instructions" and then gives clear and detailed correct reasons about how it didn't follow instructions... but that's not supremely helpful because it still doesn't follow instructions afterwards.

vidarh · 2025-12-17T10:02:09 1765965729

It increasingly is. E.g. if you use Claude Code, you'll notice it "likes" to produce todo lists that rendered specially via the TodoWrite tool that's built in.

But it's also a balance of avoiding being over-prescriptive in tools that needs to support very different workflows, and it's easy to add more specific checks via plugins.

We're bound to see more packaged up workflows over time, but the tooling here is still in very early stages.

colechristensen · 2025-12-16T19:29:55 1765913395

Tell it to grade its work in various categories and that you'll only accept B+ or greater work. Focusing on how good it's doing is an important distinction.

habinero · 2025-12-16T19:48:03 1765914483

It's very funny that I can't tell if this is sarcasm or not. "Just tell it to do better."

colechristensen · 2025-12-16T19:51:09 1765914669

Oh I'm not at all joking. It's better at evaluating quality than producing it blindly. Tell it to grade it's work and it can tell you most of the stuff it did wrong. Tell it to grade it's work again. Keep going through the cycle and you'll get significantly better code.

The thinking should probably include this kind of introspection (give me a million dollars for training and I'll write a paper) but if it doesn't you can just prompt it to.

Izkata · 2025-12-16T21:29:21 1765920561

An experiment on that from a year ago: https://news.ycombinator.com/item?id=42584400

vidarh · 2025-12-17T10:12:17 1765966337

Think of it as a "you should - and is allowed to - spend more time on this" command, because that is pretty much what it is. The model only gets so much "thinking time" to produce the initial output. By asking it to iterate you're giving it more time to think and iterate.