Hacker News | regularfry's comments

> Except they can't. Their costs are not magically lower when you use claude code vs when you use a third-party client.

I don't have a dog in this fight but is this actually true? If you're using Claude Code they can know that whatever client-side model selection they put into it is active. So if they can get away with routing 80% of the requests to Haiku and only route to Opus for the requests that really need it, that does give them a cost model where they can rely on lower costs than if a third-party client just routes to Opus for everything. Even if they aren't doing that sort of thing now, it would be understandable if they wanted to.
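
To make that concrete, here's a toy sketch of what client-side cost-aware routing could look like. The heuristic, the prices, and the function names are all invented for illustration; this isn't anything Anthropic has documented:

    # Toy sketch only: the heuristic, prices, and split are invented for
    # illustration, not taken from any real client or price list.
    PRICE_PER_MTOK = {"haiku": 1.0, "opus": 15.0}  # assumed $/MTok input

    def pick_model(prompt: str) -> str:
        # Escalate to the big model only when the request looks hard.
        looks_hard = len(prompt) > 4000 or "plan" in prompt.lower()
        return "opus" if looks_hard else "haiku"

    def blended_cost(requests: list[tuple[str, int]]) -> float:
        # requests: (prompt, input_token_count) pairs.
        return sum(
            tokens / 1e6 * PRICE_PER_MTOK[pick_model(prompt)]
            for prompt, tokens in requests
        )

If 80% of traffic takes the cheap branch, the blended cost works out to 0.8*1 + 0.2*15 = $3.80 per million tokens instead of $15, and that's exactly the margin they can only count on when they control the client.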


It (CC) does have a /models command; you can still decide to route everything to Opus if you just want to burn tokens. I guess it's not the default, so most wouldn't, but people willing to go to a third-party client are more likely to be that kind of power user anyway.

They still have the total consumption under their control (bar prompt caching and other specific optimizations), and in the past they even had different quotas per model. It shouldn't cost them more money, just be a worse/different service, I guess.


> it shouldn't cost them more money

As things currently stand, better models mean bigger models that take more storage, RAM, and CPU, or that just spend more time processing a request. All of this translates to higher costs, which may be mitigated by particular configurations triggered by the knowledge that a given client, one providing particular guarantees, is on the other side.


That’s kind of the point. Even if users can choose which model to use (and apparently the default is the largest one), they could still say, for roughly the same cost: your Opus quota is X, your Haiku quota is Y, go ham. We’ll throttle you when you hit the limit.

But they don't want the subscription to be quota'd like that. The API automatically does that, though, since different models use different amounts of tokens when generating responses and billing is per token. That quite literally has the user account for the actual costs of usage, which is exactly what said users are trying to avoid: they want usage on their own terms, and get upset when they don't get them.
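
To put rough numbers on that (the prices below are placeholders for illustration, not Anthropic's actual rate card): a fixed dollar budget buys wildly different token quotas per model, so per-token billing is a quota system all by itself.

    # Placeholder prices, for illustration only; not Anthropic's rate card.
    OPUS_OUT = 75.0    # assumed $/MTok output
    HAIKU_OUT = 4.0    # assumed $/MTok output
    budget = 20.0      # hypothetical monthly spend

    print(budget / OPUS_OUT * 1e6)   # ~266,667 tokens if it's all Opus
    print(budget / HAIKU_OUT * 1e6)  # 5,000,000 tokens if it's all Haiku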

> It (CC) does have a /models command; you can still decide to route everything to Opus if you just want to burn tokens. I guess it's not the default, so most wouldn't

Opus is Claude Code's default model as of sometime recently (around Opus 4.6?)


That’s not how Claude Code works. It’s not like a web chatbot with a layer that routes based on complexity of request.

You don't control what happens when a request hits their endpoint though.

> Regarding the 4% improvement for human written AGENTS.md: this would be huge indeed if it were a _consistent_ improvement. However, for example on Sonnet 4.5, performance _drops_ by over 2%. Qwen3 benefits most and GPT-5.2 improves by 1-2%.

OK, so that's interesting in itself. Apologies if you go into this in the paper (I've not had time to read it yet), but does this tell us something about the models themselves? Is there a benchmark lurking here? It feels like this is revealing something about the training, but I'm not sure exactly what.


It could... but as pointed out by others, the significance is unclear, and the per-model results have even fewer samples than the benchmark average. So: maybe :)
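
For what it's worth, with per-task pass/fail results you can bootstrap a confidence interval on the pass-rate delta to see how fragile a few percent really is. A sketch with made-up data (the task count and rates are invented; real numbers would come from the benchmark harness):

    import random

    N = 250  # hypothetical task count
    baseline = [random.random() < 0.50 for _ in range(N)]
    with_agents_md = [random.random() < 0.54 for _ in range(N)]

    def bootstrap_ci(a, b, iters=10_000):
        # Resample tasks with replacement; collect the pass-rate delta.
        deltas = []
        for _ in range(iters):
            idx = [random.randrange(N) for _ in range(N)]
            deltas.append((sum(b[i] for i in idx) - sum(a[i] for i in idx)) / N)
        deltas.sort()
        return deltas[int(0.025 * iters)], deltas[int(0.975 * iters)]

    print(bootstrap_ci(baseline, with_agents_md))  # 95% CI on the delta

With a few hundred tasks, a 2-4% delta usually lands inside an interval that straddles zero, which is consistent with "maybe".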

So initially my thought was "why would this be better than existing infill patterns?", but my second thought was that the reason Miura-ori patterns are interesting in the first place is that they fold. Not so much in this application, but in general, the way they flex is why they're interesting. The upshot here is that if you embedded that sort of pattern in a closed box, the degrees of freedom would try to transfer the force of a vertical load on the top into a horizontal stress in the outer shell of the base, in both x and y. A bit like a spherical dome.

I'm not sure that it's better than a dome; it might be for cases where you can't predict where on the top surface the load is going to be? I'm also not sure that a sheet of printed infill is sufficiently similar in its physical properties to a sheet of paper/card for this to transfer well, but it would be an interesting experiment to do.
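
The dome analogy can be put in toy-statics terms: treat one fold facet as a pin-ended strut at an angle theta from horizontal carrying a vertical load, and look at the horizontal thrust at its base. This is a deliberately crude sketch, nothing Miura-specific:

    import math

    def horizontal_thrust(vertical_load: float, theta_deg: float) -> float:
        # Axial force in the strut is V / sin(theta); its horizontal
        # component at the base is therefore V / tan(theta).
        return vertical_load / math.tan(math.radians(theta_deg))

    print(horizontal_thrust(100.0, 60.0))  # ~57.7: steep folds, modest outward push
    print(horizontal_thrust(100.0, 30.0))  # ~173.2: shallow folds shove the walls out

Whether a printed shell can actually take that hoop stress is the part the experiment would have to answer.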


What's particularly interesting here is that one of the success cases (BMW era Mini) was built on a Rover design. They picked it up and ran with it - basically gutted the electrics and power-train to match their existing systems and supply chains - but it was already in flight when BMW came on.

That factory was fascinating to work in; looking back on it, I saw a lot of Deming-compatible stuff going on that I wasn't equipped to recognise at the time. There was strong German representation in factory management, and lots of interaction with people coming and going from Munich all the time. But the production line staff had a large agency contingent, so it didn't have the "job for life" ethos that the Toyota Way would say is essential.


It was the electrics and the power train that were the problem. Oh, that and process.

There'll inevitably be cargo-culting driven by MBA curricula and "they're making a lot of money, let's do what they did" thinking, without examining the specifics of the situation to distinguish luck from judgement.

The chief problem I have with Reinertsen (and it's not his fault, at all) is how difficult it is to get people to buy in to the idea that cost of delay exists, let alone buy in to measuring it.

They're claiming 20+ tps inference on a MacBook with the Unsloth quant.

Yeah, I'm guessing the Mac users still aren't very fond of sharing the time the prefill takes. They usually only share the tok/s output, never the input.
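
For anyone without a feel for why that omission matters: wall-clock time for a turn is prefill plus decode, and on long agentic prompts the prefill dominates. Illustrative numbers only; both throughputs here are hypothetical:

    prompt_tokens, output_tokens = 30_000, 500  # assumed agent-ish turn
    prefill_tps, decode_tps = 150.0, 20.0       # hypothetical Mac rates

    prefill_s = prompt_tokens / prefill_tps     # 200 s before the first token
    decode_s = output_tokens / decode_tps       # 25 s of generation
    print(prefill_s, decode_s)                  # quoting 20 tok/s hides the 200 s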

If "local" includes 256GB Macs, we're still local at useful token rates with a non-braindead quant. I'd expect there to be a smaller version along at some point.

My daily drivers are 44-key boards with the keyboard.io Atreus layout. It's taken me quite a bit of tuning to avoid excessive chording with what's on each layer while keeping things conceptually related, but it feels like I've had fewer trade-offs to navigate than this. It does surprise me that this particular configuration of keys (5x4+2) isn't more popular - you don't have quite the number of contortions that the 5x3+3 pads force you into, and you don't have the bulk and extra stretch of the next common size up, which seems to be 6x4+2...

I remain convinced that Graal is deep magic.
