enraged_camel's comments | Hacker News

>> The thing I keep seeing firsthand is that automation doesn't eliminate the job - it eliminates the boring part of the job, and then the job description shifts.

No, not necessarily. There are different kinds of automation.

Earlier in my career I sold and implemented enterprise automation solutions for large clients. Think document scanning, intelligent data extraction, indexing, and automatic routing. The C-level buyers overwhelmingly had one goal: to reduce headcount. And that was almost always the result. Retraining redundant staff for other roles was rare; it was only done in contexts where retaining accumulated institutional knowledge was important and worth the expense.

Here's the thing, though: to overcome objections from those staff, whom we had to interview to understand the processes we were automating, we told them exactly your story: you aren't being replaced, you're being repurposed for higher-level work. Wouldn't it be nice if the computer did the boring and tedious parts of your job so that you could focus on more important things? Most of them were convinced. Some, particularly those who had been around the block, weren't.

Ultimately, technologies like AI will have the same impact. They aren't quite there yet, but I think it's just a matter of time.


> The C-level buyers overwhelmingly had one goal: to reduce headcount.

For many businesses this is the only way to significantly reduce costs.


Around 250k here. The AI does an excellent job finding its way around, fixing complex bugs (and doing it correctly), doing intensive refactors and implementing new features using existing patterns.

Except this thing routinely ignores my AGENTS.md instructions. Very unreliable.

>> I would certainly take a careful person over the likes of yegge who seems to be neither pragmatic, nor an engineer.

What utter nonsense. Yegge has been a programmer for longer than some people on this board have been alive, has worked on a lot of interesting and massively challenging projects, and has generously shared what he has learned with the community. Questioning his engineering chops is both laughable and absurd.


The buck on engineer status, in my opinion, stops when someone becomes a crypto scammer.

Is there a list of these for each model, that you've catalogued somewhere?

At the moment that's mostly my tag page here but I really need to formalize it: https://simonwillison.net/tags/pelican-riding-a-bicycle/

Huh?

>> 1. Copy-pasting existing working code with small variations. If the intended variation is bigger then it fails to bring productivity gains, because it's almost universally wrong.

This does not match my experience. At all. I can throw extremely large and complex things at it and it nails them with very high accuracy and precision in most cases.

Here's an example: when Opus 4.5 came out, I used it extensively to migrate our database and codebase from a one-Postgres-schema-per-tenant architecture to a single-schema architecture. We are talking about eight years' worth of database operations across about two dozen interconnected, complex domains. The task spanned migrating data out of 150 database tables for each tenant schema, then validating data integrity in the destination tables, plus refactoring the entire backend codebase (about 250k lines of code) and all of the test suite. On top of that, there were API changes that necessitated lots of tweaks to the frontend.
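
To give a concrete flavor of the data move: each tenant boiled down to roughly this kind of copy-and-verify loop. This is a simplified sketch with made-up schema and table names, not the actual code; the real job covered ~150 tables and much richer integrity checks.

    # Simplified per-tenant copy: pull rows out of a tenant schema into the
    # shared schema, tag each row with its tenant_id, then sanity-check row
    # counts. Schema/table names below are hypothetical.
    import psycopg2

    TABLES = ["orders", "invoices", "payments"]  # in reality, ~150 tables

    def migrate_tenant(conn, tenant_schema, tenant_id):
        with conn.cursor() as cur:
            for table in TABLES:
                # Assumes the destination table's columns are
                # (tenant_id, <same columns as the source, in the same order>).
                cur.execute(
                    f"INSERT INTO public.{table} "
                    f"SELECT %s, t.* FROM {tenant_schema}.{table} t",
                    (tenant_id,),
                )
                # Validate: destination row count for this tenant must match
                # the source table exactly.
                cur.execute(f"SELECT count(*) FROM {tenant_schema}.{table}")
                src_count = cur.fetchone()[0]
                cur.execute(
                    f"SELECT count(*) FROM public.{table} WHERE tenant_id = %s",
                    (tenant_id,),
                )
                dst_count = cur.fetchone()[0]
                if src_count != dst_count:
                    raise RuntimeError(
                        f"{tenant_schema}.{table}: {src_count} rows in source, "
                        f"{dst_count} migrated"
                    )
        conn.commit()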

This is a project that would have taken me 4-6 months easily and the extreme tediousness of it would probably have burned me out. With Opus 4.5 I got it done in a couple of weeks, mostly nights and weekends. Over many phases and iterations, it caught, debugged and fixed its own bugs related to the migration and data validation logic that it wrote, all of which I reviewed carefully. We did extensive user testing afterwards and found only one issue, and that was actually a typo that I had made while tweaking something in the API client after Opus was done. No bugs after go-live.

So yeah, when I hear people say things like "it can only handle copy paste with small variations, otherwise it's universally wrong" I'm always flabbergasted.


Interesting. I've had it fail on much simpler tasks.

Example: I was writing a flatbuffers routine that translated a simple type schema to the fbs reflection schema. I was thinking, well, this is quite simple, surely Opus would have no trouble with it.

The output looked reasonable, compiled... and was completely wrong. It seemed to just output random but reasonable-looking indices and offsets. In one part of the code it even inserted a literal TODO saying "someone who understands fbs reflection should write this". I had to write it from scratch.

Another example: I was writing a fuzzer for testing a certain computation. In this case there was existing code to look at (working fuzzers for slightly different use cases), but the main logic had to be somewhat different. Opus managed to do the copy-paste and then messed up the only part where it had to be a bit more creative, again showing where it starts breaking. Overall I actually considered this a success, because I didn't have to deal with the "boring" bit; see the generic sketch below for the shape of that part.
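
For context, the "boring" part of a fuzzer like this is essentially a loop that generates random inputs and checks an implementation against a reference. This is a generic sketch with stand-in functions, not the actual code from that project.

    # Generic differential fuzz loop: random inputs, compare the
    # implementation under test against a trusted reference.
    # Both compute functions below are placeholders.
    import random

    def reference_compute(xs):
        return sum(x * x for x in xs)  # stand-in "known good" version

    def compute_under_test(xs):
        return sum(x * x for x in xs)  # stand-in optimized/ported version

    def fuzz(iterations=10_000, seed=0):
        rng = random.Random(seed)
        for i in range(iterations):
            xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 64))]
            expected = reference_compute(xs)
            actual = compute_under_test(xs)
            if actual != expected:
                raise AssertionError(
                    f"iteration {i}: input {xs!r} -> {actual}, expected {expected}"
                )

    if __name__ == "__main__":
        fuzz()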

Another example: a colleague was using Claude to write a feature that output some error information from an otherwise completely encrypted computation. Claude proceeded to insert a global backdoor into the encryption, which was only caught in review. The inserted comments even explained the backdoor.

I would describe a success story if there were one. But aside from throwing together simple React frontends and SQL queries (highly copy-pasteable, recurring patterns in the training set), I have had literally zero success. There is an invisible ceiling.


Yeah. I started integrating AI into my daily workflows in December 2024. I would say AI didn't become genuinely useful until around September 2025, when Sonnet 4.5 came out. The Opus 4.5 release in November was the real event horizon.

I mean, markets and industries change as well. Companies that have product market fit today can find themselves having to pivot as the sands shift under them.

More expensive than Sonnet 4.5, but no comparison benchmarks. I think I’ll pass.

I had the same thought. Cursor 1.0 was cheap and blazingly fast. 1.5 seems to keep the speed, but who knows how much better it is, and it's no longer cheap.

We've found it to be a strong mix of speed and intelligence. It scores higher than Sonnet 4.5 on Terminal-Bench 2; maybe we will post more on this later.

Yeah, please do. Because when the AI labs you are competing with are posting extensive benchmarks and you just say "well, we used our own internal benchmark," it is a bit sus, especially given that the price has tripled.

You should! This blog post doesn't really give any reason to use it besides "it's better on Cursor's internal benchmark". A full model card would be great.

The way benchmarks for Composer have been presented since v1 feels unusually cautious. To users, that reads as “the model isn’t very good”.
