
> We anthropomorphize these systems too much.

They're sold as AGI by the cloud providers and the whole stock market scam will collapse if normies are allowed to peek behind the curtain.


The stock market being built on conjecture? Surely not, sir.

> We have models that are doing better than humans at IMO.

Not really. In my brief experience they can guess the final answer, but the intermediate justifications and proofs are completely hallucinated bullshit.

(Possibly because the final answer is usually something neat and beautiful, and human evaluators don't care about the final answer anyway; in any olympiad you're graded on the soundness of your reasoning.)


What's the best way to falsify it?

You could start by reading research on the topic instead of disregarding expert opinion based on your own gut feeling.

E.g. https://www.anthropic.com/research/tracing-thoughts-language...


It’s specific to Claude.

You're making too much sense for a computer security specialist.

Do you really need political commentary in your family photos?

(Apparently the answer is "yes", but the commentary must be of the partisan-approved kind.)



Make it for Qwen 2.5 and I'd buy it.

You don't actually need "frontier models" for Real Work (c).

(Summarization, classification and the rest of the usual NLP suspects.)
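
For the record, a minimal sketch of the kind of thing I mean, assuming a Qwen 2.5 instruct model served locally behind an OpenAI-compatible endpoint (the URL and model name here are my own assumptions, purely for illustration):

    # Hypothetical sketch: ticket classification with a small local model.
    # Endpoint URL and model name are assumptions, not from this thread.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="Qwen2.5-7B-Instruct",
        messages=[
            {"role": "system",
             "content": "Classify the ticket as exactly one of: billing, bug, feature."},
            {"role": "user", "content": "I was charged twice this month."},
        ],
        temperature=0,
    )
    print(resp.choices[0].message.content)  # expected: billing

Constrained labeling like this is exactly the kind of task where a small model is plenty.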


I completely agree. So many things can benefit from having "smart classifiers".

Like, give me semantic search that can detect the difference between SSL and TLS without needing to put a full LLM in the loop.
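
To make that concrete, here is roughly what the embedding-based version looks like; a minimal sketch assuming the sentence-transformers library and the all-MiniLM-L6-v2 model (both picked purely for illustration). The complaint is that small embedding models like this tend to score "SSL" and "TLS" as near-synonyms:

    # Hypothetical sketch: embedding search over two protocol snippets.
    # Library and model choice are assumptions, not from this thread.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    docs = [
        "SSL 3.0 is deprecated and considered insecure.",
        "TLS 1.3 removed support for legacy cipher suites.",
    ]
    query = "Is TLS 1.3 backwards compatible with old ciphers?"

    doc_emb = model.encode(docs, convert_to_tensor=True)
    q_emb = model.encode(query, convert_to_tensor=True)
    print(util.cos_sim(q_emb, doc_emb))  # both docs score close together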


This doesn't work. The model outputs the most probable tokens. Running it again and asking for less probable tokens just results in the same output, with more errors.
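
To illustrate with a toy example (the logits are made up, not from any real model): raising the sampling temperature just flattens the token distribution, shifting probability mass from the top token into the long tail.

    # Toy sketch of temperature sampling; the logit values are invented.
    import numpy as np

    logits = np.array([5.0, 2.0, 1.0, 0.5])  # "correct" token first

    def probs(logits, temperature):
        z = logits / temperature
        e = np.exp(z - z.max())
        return e / e.sum()

    for t in (0.2, 1.0, 2.0):
        print(t, probs(logits, t).round(3))
    # Higher temperature puts more mass on low-probability tokens,
    # i.e. "less probable tokens" just means more errors.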

Do you not have experience with agents solving problems? They already successfully do this. They try different things until they get a solution.

> they'd fail on any novel problem not in their training data

Yes, and that's exactly what they do.

No, none of the problems you gave the LLM while toying around with it are in any way novel.


None of my codebases are in their training data, yet they routinely contribute to them in meaningful ways. They write code that I'm happy with that improves the codebases I work in.

Do you not consider that novel problem solving?


The Chinese one, obviously.

> You can absolutely send ChatGPT to look for a cheap flight and it will do pretty well.

Sure, once they figure out how to count to three.

