I would expect older models to make you feel this way.
* Agents have gotten significantly better over the past few months at not trying to do the impossible (or at not being an "over eager people pleaser", as it has been described). No wonder the older models fail.
I find myself using VS Code for "things like this" (its visual extension ecosystem).
I've grown attached to the git diff view, so I mostly use it for reviewing PRs (especially larger ones, as the GitHub UI has been struggling with them lately).
The rest of my code is written in Vim or by Claude.
I spent some time last night "over iterating" on a plan to do some refactoring in a large codebase.
I created the original plan with a very specific ask: create an abstraction to remove some tight coupling. It was a small problem with a big surface area. The planning/brainstorming was great and I like the plan we came up with.
I then tried to use a prompt like OP's to improve it (as I said, large surface area so I wanted to review it) - "Please review PLAN_DOC.md - is it a comprehensive plan for this project?". I'd run it -> get feedback -> give it back to Claude to improve the plan.
I (naively perhaps) expected this process to converge to a "perfect plan". At this point I think of it more like a probability tree where each round has a chance of improving the plan, but a non-zero chance of going off the rails. And once you go off the rails, you only veer further and further from the truth.
There are certainly problems where "throwing compute" at it and continuing to iterate with an LLM will work great. I would expect those to have firm success criteria. Providing definitions of quality would significantly improve the output here as well (or decrease the probability of going off the rails, I suppose). Otherwise Claude has to guess at what "quality" means and will get it wrong, like we see here.
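To make the "firm success criteria" idea concrete, here is a rough sketch of how I'd bound the loop. Everything in it is hypothetical: `iterate_plan`, the `criteria` checks, and the `revise` callable (which stands in for "give the feedback back to Claude") are names I made up for illustration, not any real API. The point is only that each round is scored against explicit checks and a revision is kept only if it scores strictly better, so one bad round can't compound.

```python
from typing import Callable, List, Tuple


def iterate_plan(
    plan: str,
    criteria: List[Callable[[str], bool]],
    revise: Callable[[str, List[int]], str],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Revise `plan` until every criterion passes or the round budget runs out.

    `criteria` and `revise` are assumptions for this sketch: the first is a
    list of explicit pass/fail checks, the second wraps whatever produces a
    new draft (for example, an LLM call). Returns the best plan seen and the
    number of criteria it satisfies.
    """

    def score(p: str) -> int:
        return sum(1 for check in criteria if check(p))

    best_plan, best_score = plan, score(plan)
    for _ in range(max_rounds):
        if best_score == len(criteria):
            break  # firm success criteria met: stop iterating
        # Tell the reviser which checks (by index) the current best plan fails.
        failing = [i for i, check in enumerate(criteria) if not check(best_plan)]
        candidate = revise(best_plan, failing)
        candidate_score = score(candidate)
        # Keep a revision only if it scores strictly better; otherwise drop it
        # and retry from the best plan so far.
        if candidate_score > best_score:
            best_plan, best_score = candidate, candidate_score
    return best_plan, best_score
```

With checks like "mentions tests" and "mentions rollback", a reviser that wanders off topic just gets ignored and you end up with the plan you started with, rather than something 15 minutes further from the truth. The hard part is still writing criteria that actually capture quality; this only limits the damage when a round goes wrong.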
Shout-out to OP for sharing their work and moving us forward.
I think I end up doing that with plans inadvertently too. Oftentimes I'll iterate on a plan too many times, and only recognize that it's too far gone and needs a restart with more direction after sinking 15 minutes into it.
I enjoyed the read. Felt like Russell just thought about this question for a while, and shared his thoughts. It was very practical and enjoyable.
Disclaimer: I recall some "wow, we don't talk like that anymore" moments. And I didn't enjoy the hyperbole of the cover quotes. But the content of the book debunks those.
So this being from github.github.io implies it's published by the "github" account on GitHub.