
The GPT-5 series is a new model, based on the o1/o3 series. It's very much inaccurate to say that it's a routing system and prompt chain built on top of 4o. 4o was not a reasoning model and reasoning prompts are very weak compared to actual RLVR training.

No one knows whether the base model has changed, but 4o was not a base model, and neither is 5.x. Although I would be kind of surprised if the base model hadn't also changed, FWIW: they've significantly advanced their synthetic data generation pipeline (as made obvious via their gpt-oss-120b release, which allegedly was entirely generated from their synthetic data pipelines), which is a little silly if they're not using it to augment pretraining/midtraining for the models they actually make money from. But either way, 5.x isn't just a prompt chain and routing on top of 4o.


Prior to 5.2 you couldn’t expect good answers to questions about anything after March 2024. It was arguing with me that Bruno Mars did not have two hit songs in the last year. It’s clear that in 2025 OpenAI used the old 4o base model and tried to supercharge it using RLVR. That had very mixed results.

That just means their pretraining data set was older. You can train as many models as you want on the same data.

I’m sure all these AI labs have extensive data gathering, cleanup, and validation processes for new data they train the model on.

Or at least I hope they don’t just download the current state of the web on the day they need to start training the new model and cross their fingers.


The U.S. already standardized on a charging port: Tesla's. https://en.wikipedia.org/wiki/North_American_Charging_Standa...

There is no legal mandate for NACS.

Cars are still sold with J1772/CCS ports, there are still CCS chargers being deployed, there are still J1772 home chargers being sold, almost every level 2 charger is J1772, and my NACS EV came with two dongles.

(FWIW, the new Leaf has a NACS port that's only used for level 3 charging, and a separate J1772 port for level 1/2 charging.)

If there was a legal mandate for a changeover, it would be a very different story.

---

We pretty much need to force NACS: Force all public chargers (level 2 and 3) to be NACS, force all cars sold to be NACS, and make it super-easy for people with older cars to get dongles.


AFAIK every major car manufacturer has announced they're switching to NACS for the American market (or has already switched). I think you're underselling how standard it is. And it's already easy to get dongles for old cars! You can get them on Amazon with two day shipping.

The manufacturers all want you to use their dongle. It's not CYA, either. A lot of the Amazon ones aren't safe.

> I think you're underselling how standard it is.

It's about availability:

There's still way more CCS / J1772 than NACS when I use public chargers, or when I look to purchase home chargers. The dealer that I bought my Ioniq 9 from had a CCS charger, and the other dealer that I took it to for service had a CCS charger. When I park it near work, it's a J1772. (I wouldn't have bought the Ioniq 9 if it was CCS/J1772.)

Searching Google for "What percentage of EVs for sale in the US are NACS" says:

> Transition Period for New Sales: While nearly all major automakers have committed to the NACS standard, many 2025 model year vehicles are still a mix of CCS ports with available NACS adapters, or new models coming with a native NACS port.

> 2026 Model Year: Virtually all new models from every major automaker are expected to come standard with the NACS port

Searching Google for "What percentage of EV chargers in the US are NACS" says:

> As of late 2025, NACS (Tesla's standard, now SAE J3400) dominates in available ports, especially DC fast charging, due to Tesla's massive Supercharger network (over 57% of ports) and rapid adoption by other automakers, with NACS already representing a significant portion of all installed ports, though CCS1 still sees new deployments, creating a dynamic transition where NACS is the majority in Tesla vehicles and rapidly growing across the infrastructure.

---

What distorts the issue is that so many EVs are Teslas, and that so many chargers are Supercharger. Once you exclude Tesla / Supercharger from the comparison, there's still too much CCS/J1772.


The fact that virtually all new models in 2026 will have NACS tells you we don't need to regulate in 2026 that all new cars must be built with NACS. That's what's happening anyway.

> Once you exclude Supercharger

Why would you exclude Superchargers from the comparison of American charging networks? Most V3/V4 Superchargers support charging non-Tesla NACS cars (or non-NACS cars with a dongle), and they're much more reliable than the non-Tesla chargers e.g. EVGo. The reason NACS took off is because the Supercharger network is so good, even for non-Tesla cars.


One of the greatest mass murderers in history...? I uh, am morbidly curious to hear your thought process here.

If you are unfamiliar with the shuttering of USAID this year, you may, uh, be in an information bubble that is not serving you.

Search terms that will help you on your journey include “DOGE”, “Kenya”, and “cholera”.


...Curiosity satisfied, I suppose. While I disagree with some of the USAID cuts, I don't think that "not giving charity" is the same thing as "mass murder."

Well, that’s between you and your worth as a human being I suppose. I do thank you for doing the legwork.

According to Wikipedia, the Yunnan mushrooms indeed have their hallucinogens broken down after cooking: https://en.wikipedia.org/wiki/Hallucinogenic_bolete_mushroom

Good guess!

Although, the local hospital records imply that hallucinations can last for days or even months, so uh, probably not a great idea to go looking for them...


According to a voluminous illustrated tome I acquired during my extended stay, Yunnan has at least seven species of native psilocybe. Like nearby areas along the Himalayas, cannabis and opium are endemic and widely utilized in traditional cultures of the area. Heroin processed in Myanmar became a problem in rural Yunnan in the early 2000s, and the present-era government shut it down with a heavy-handed campaign around 15 years ago. These days it's probably trans-shipped more than locally consumed.

My guess would be there is probably some contamination with something ergot-like going on. Long-lasting, but maybe hard to detect because such a small amount is needed for an effect that it's easy to miss.

Sadly, it's worse. We don't have one experiment that works in mice: we have dozens, if not hundreds. We've cured "Alzheimer's in mice" many times over. The treatments never work in humans, because it's not the same disease. We don't know the root cause of the human disease and so we can't model it accurately in mice.

> We don't know the root cause of the human disease

It's increasingly likely that there is no "root cause" to find in humans, but rather, that Alzheimer's is what happens when there's enough external stressors acting on the brain.

I've seen an analogy of a leaky roof being used: the leaks are things like age, stress, heavy metals, mold, bad sleep, bad diet. Genetics defines the original building materials (resilience) of the roof. You can put buckets under a certain number of leaks but if there are too many your ability to repair gets overwhelmed and the result is diseases like dementia.

I think something similar applies to other diseases of aging like heart disease, arthritis, osteoporosis, diabetes, perhaps even cancer.

The downside of this is that it's hard to imagine a miracle drug being the solution. But the upside is that a combination therapy that identifies the "leaks" and works on reducing or eliminating them will likely be effective against a wide range of age-related diseases.

The therapy will likely consist of drugs and supplements in combination with lifestyle changes.


I totally get that people are not mice; however, animal studies have been useful for all sorts of diseases. Are they really uniquely bad for Alzheimer's?

To put it simply, mice don't get Alzheimer's. We're not studying mice with Alzheimer's; we're studying mice with a mutation chosen for resembling Alzheimer's. But we don't know whether this model replicates the actual mechanisms of the disease, or if the resemblance is only superficial.

Thanks for the explanation, this really clears up the concerns here. It's easy to imagine scientists attempting to model it in mice and making real progress, but it's also easy to imagine us misunderstanding the real disease badly enough that what we've modeled in mice does not produce real results.

No. I believe the problem is with our artificial models of Alzheimer’s in mice.

They're not eliminating a competitor, they're (effectively) acquiring a competitor. Nvidia's GPUs are great for training, and not bad for inference, but the custom chips are better for inference and Nvidia's worried about losing customers. Nvidia will no doubt sell custom Groq-like chips for inference now.

> Nvidia will no doubt sell custom Groq-like chips for inference now.

For $0bn they could have sold an Nvidia-like chip for inference.


This generally isn't true. Cloud vendors have to make back the cost of electricity and the cost of the GPUs. If you already bought the Mac for other purposes, also using it for LLM generation means your marginal cost is just the electricity.

Also, vendors need to make a profit! So tack a little extra on as well.

However, you're right that it will be much slower. Even just an 8xH100 can do 100+ tps for GLM-4.7 at FP8; no Mac can get anywhere close to that decode speed. And for long prompts (which are compute constrained) the difference will be even more stark.
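To make the marginal-cost point concrete, here's a back-of-envelope sketch; the wattage, decode speed, and electricity price are all illustrative assumptions, not measurements:

    # Marginal electricity cost of local decoding; every number below is assumed.
    watts = 200          # assumed Mac power draw under sustained load
    tps = 20             # assumed decode speed in tokens/sec
    price_kwh = 0.15     # assumed electricity price in $/kWh

    kwh_per_mtok = (watts / 1000) * (1_000_000 / tps) / 3600
    print(f"~${kwh_per_mtok * price_kwh:.2f} per million output tokens in electricity")

Under those assumptions it works out to roughly $0.42 per million output tokens, which is all you pay at the margin on hardware you already own; a cloud provider also has to recover GPU costs and margin, but wins on speed.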


A question on the 100+ tps - is this for short prompts? For large contexts that generate a chunk of tokens at context sizes of 120k+, I was seeing 30-50 tps - and that's with a 95% KV cache hit rate. Am wondering if I'm simply doing something wrong here...

Depends on how well the speculator predicts your prompts, assuming you're using speculative decoding — weird prompts are slower, but e.g. TypeScript code diffs should be very fast. For SGLang, you also want to use a larger chunked prefill size and larger max batch sizes for CUDA graphs than the defaults IME.
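For reference, a minimal launch sketch with those knobs turned up; the model path and values are assumptions for illustration, not recommendations, and the flag names are from recent SGLang releases (check `python -m sglang.launch_server --help` on your install):

    import subprocess

    subprocess.run([
        "python", "-m", "sglang.launch_server",
        "--model-path", "zai-org/GLM-4.6",   # assumed model path
        "--tp-size", "8",                     # e.g. one 8xH100 node
        "--chunked-prefill-size", "16384",    # larger chunked prefill than the default
        "--cuda-graph-max-bs", "64",          # capture CUDA graphs for larger batch sizes
    ])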

No, it's not Harmony; Z.ai has their own format, which they modified slightly for this release (by removing the required newlines from their previous format). You can see their tool call parsing code here: https://github.com/sgl-project/sglang/blob/34013d9d5a591e3c0...

Man, really? Why, just why? If it's similar, why not just the same? It's like they're purposefully adding more work for the ecosystem to support their special model instead of just trying to add more value to the ecosystem.

The parser is a small part of running an LLM, and Zai's format is superior to Harmony: it avoids having the model escape JSON in most cases by using XML, so e.g. long code edits are more in-domain compared to pretraining data (where code is typically not nested in JSON and isn't JSON-escaped). FWIW almost everyone has their own format.
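To make the escaping point concrete, compare how a small code edit looks as JSON-escaped arguments versus XML-style arguments (the tag names here are illustrative, not necessarily Z.ai's exact schema):

    import json

    code_edit = 'def greet(name):\n    print(f"hello {name}")\n'

    # JSON-style arguments: the code gets escaped, so the model has to emit
    # \n and \" sequences it rarely sees wrapped around code in pretraining data.
    print(json.dumps({"path": "greet.py", "content": code_edit}))

    # XML-style arguments: the code is emitted verbatim, much closer to how
    # code normally appears in the wild.
    print(f"<arg_key>content</arg_key>\n<arg_value>{code_edit}</arg_value>")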

Also, Harmony is a mess. The common API specs adopted by the open-source community don't have developer roles, so including one is just bloat for the Responses API no one outside of OpenAI adopted. And why are there two types of hidden CoT reasoning? Harmony tool definition syntax invents a novel programming language that the model has never seen in training, so you need even more post-training to get it to work (Zai just uses JSON Schema). Etc etc. It's just bad.

Re: removing newlines from their old format, it's slightly annoying, but it does give a slight speed boost, since it removes one token per call and one token per argument. Not a huge difference, but not nothing, especially with parallel tool calls.


Sometimes worse is better, I don't really care what the specific format is, just that providers/model releasers would use more of the same, because compatibility sucks when everyone has their very own format. Conveniently for them, it gets harder to compare models when everyone has different formats too.

s/Sonnet 3.5/Sonnet 4.5

The model output also IMO looks significantly more beautiful than GLM-4.6's; no doubt in part helped by ample distillation data from the closed-source models. Still, not complaining, I'd much prefer a cheap and open-source model vs. a more-expensive closed-source one.


I don't mind if they're distilling frontier models to make them cheaper, and open-sourcing the weights!

Same, although Gemini 3 Flash already gives it a run for its money on price. But part of me really wants open source too, because that way, if I really want to some day, I can have privacy or get my own hardware to run it.

I genuinely hope that Gemini 3 Flash gets open sourced, though I feel like something like that could actually crash the AI bubble. Although there are still some issues with vibing with the model itself, I find it very competent and fast overall; at this point there might be some placebo effect too, but the model feels really solid.

Most Western countries wouldn't really have a point, or incentive, to compete if someone open sourced the model, because then the competition would instead be between providers and their speeds (like how Groq and Cerebras hit insane speeds).

I had heard that Google would allow institutions like universities to self-host Gemini models or similar, so there's a chance the AI bubble could pop if Gemini or other top-tier models accidentally got leaked. But I genuinely doubt that will happen, and there are many other ways the AI bubble could pop.


Models being open weights lets infrastructure providers compete on delivering them as a service, fastest and cheapest.

At some point, companies should be forced to release the weights after a reasonable time has passed since they first sold the service. Maybe after 3 years or so.

It would be great for competition and security research.

