
> We anthropomorphize these systems too much.

They're sold as AGI by the cloud providers and the whole stock market scam will collapse if normies are allowed to peek behind the curtain.


The stock market being built on conjecture? Surely not, sir.

> We have models that are doing better than humans at IMO.

Not really. In my brief experience they can guess the final answer, but the intermediate justifications and proofs are completely hallucinated bullshit.

(Possibly because the final answer is usually something neat and beautiful, and human evaluators don't care about the final answer anyway; in any olympiad you're graded on the soundness of your reasoning.)


What's the best way to falsify it?

You could start by reading research on the topic instead of disregarding expert opinion based on your own gut feeling.

E.g. https://www.anthropic.com/research/tracing-thoughts-language...


It’s specific to Claude.

You're making too much sense for a computer security specialist.

Do you really need political commentary in your family photos?

(Apparently the answer is "yes", but the commentary must be of the partisan-approved kind.)



Make it for Qwen 2.5 and I'd buy it.

You don't actually need "frontier models" for Real Work (c).

(Summarization, classification and the rest of the usual NLP suspects.)
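
For the record, a minimal sketch of the kind of thing I mean, assuming a Qwen 2.5 instruct model served locally behind an OpenAI-compatible endpoint (the URL and model name here are my own assumptions, purely for illustration):

    # Hypothetical sketch: ticket classification with a small local model.
    # Endpoint URL and model name are assumptions, not from this thread.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="Qwen2.5-7B-Instruct",
        messages=[
            {"role": "system",
             "content": "Classify the ticket as exactly one of: billing, bug, feature."},
            {"role": "user", "content": "I was charged twice this month."},
        ],
        temperature=0,
    )
    print(resp.choices[0].message.content)  # expected: billing

Constrained labeling like this is exactly the kind of task where a small model is plenty.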


I completely agree. So many things can benefit from having "smart classifiers".

Like, give me semantic search that can detect the difference between SSL and TLS without needing to put a full LLM in the loop.
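
To make that concrete, here is roughly what the embedding-based version looks like; a minimal sketch assuming the sentence-transformers library and the all-MiniLM-L6-v2 model (both picked purely for illustration). The complaint is that small embedding models like this tend to score "SSL" and "TLS" as near-synonyms:

    # Hypothetical sketch: embedding search over two protocol snippets.
    # Library and model choice are assumptions, not from this thread.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    docs = [
        "SSL 3.0 is deprecated and considered insecure.",
        "TLS 1.3 removed support for legacy cipher suites.",
    ]
    query = "Is TLS 1.3 backwards compatible with old ciphers?"

    doc_emb = model.encode(docs, convert_to_tensor=True)
    q_emb = model.encode(query, convert_to_tensor=True)
    print(util.cos_sim(q_emb, doc_emb))  # both docs score close together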


This doesn't work. The model outputs the most probable tokens. Running it again and asking for less probable tokens just results in the same output, with more errors.
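
To illustrate with a toy example (the logits are made up, not from any real model): raising the sampling temperature just flattens the token distribution, shifting probability mass from the top token into the long tail.

    # Toy sketch of temperature sampling; the logit values are invented.
    import numpy as np

    logits = np.array([5.0, 2.0, 1.0, 0.5])  # "correct" token first

    def probs(logits, temperature):
        z = logits / temperature
        e = np.exp(z - z.max())
        return e / e.sum()

    for t in (0.2, 1.0, 2.0):
        print(t, probs(logits, t).round(3))
    # Higher temperature puts more mass on low-probability tokens,
    # i.e. "less probable tokens" just means more errors.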

Do you not have experience with agents solving problems? They already successfully do this. They try different things until they get a solution.

> they'd fail on any novel problem not in their training data

Yes, and that's exactly what they do.

No, none of the problems you gave the LLM while toying around with it are in any way novel.


None of my codebases are in their training data, yet they routinely contribute to them in meaningful ways. They write code that I'm happy with that improves the codebases I work in.

Do you not consider that novel problem solving?


The Chinese one, obviously.

> You can absolutely send ChatGPT to look for a cheap flight and it will do pretty well.

Sure, once they figure out how to count to three.

