This is very correct. The only thing I, as an engineer, care about is that we rigorously optimize our spend to get the best possible CAC (Customer Acquisition Cost).
It’s an incredibly hard problem with many variables; that’s why the platforms charge what they do: they let you work with those variables in a somewhat approachable way.
I expect Xbox to be nothing but a subscription service in five years: no studios and no consoles. The acquisition feels like someone was bored and wanted to spend some money; once they had it, they immediately lost interest, and now it’s all just fading into obscurity.
I wouldn’t be surprised if any given screenshot is fake (as in not made the way it claims); in my experience, Occam’s razor tends to lead that way when extraordinary claims are made about LLMs.
Considering how cheap and easy it is to buy views/likes/subscribers, I wouldn’t trust it blindly. I suspect the people pushing AI music would game the system too, but unfortunately I don’t have any proof.
I did that when I was 14 because I had no other choice, damn you SoundBlaster! I didn't get any menu but I got sound in the end.
I don't think conflating intelligence with "what a computer can do" makes much sense, though. I can't calculate the Xth digit of pi in less than Z, yet I'm still intelligent (or I pretend to be).
But the question is not about intelligence; that's a red herring. It's just about utility, and LLMs are useful.
Does GW have competitors? It feels like they own their niche (and the associated IP) completely, with extreme amounts of content.
Similar to how Magic rules their segment of the market.
Magic has competition in Yu-Gi-Oh and Pokemon. I think Pokemon outsells MTG now. Warhammer doesn't have anything in its league; the other games are a tiny percentage of an already small niche.
I find it very easy to understand: people generally don’t want to work for free to support billionaires, and they have few avenues to act on that; this is one of them.
There are no "commons" in this scenario: there are a few frontier labs owning everything (taking it without attribution), and they have the capability to take it away, or to raise prices to the point where it becomes a tool for the rich.
Nobody is doing this for the good of anything, it’s a money grab.
Were these contributions not a radical act against zero-sum games in the first place? And now you're gonna let the zero-sum people win by restricting your own outputs to similarly zero-sum endeavors?
I don't wanna look a gift horse in the mouth here. I'm happy to have benefited from whatever contributions were originally forthcoming and I wouldn't begrudge anybody for no longer going above and beyond and instead reverting to normal behavior.
I just don't get it. It's like you're opposed to people building walls, but you see a particularly large wall that makes you mad, so your response is to go build a wall yourself.
It's not about building a wall. It's about ensuring that the terms of the license chosen by the author are respected.
This is why I think permissive licenses are a mistake for most projects. Unlike copyleft licenses, they allow users to strip, from users of derivative works, the very freedoms they themselves enjoyed. It's no surprise that dishonest actors take advantage of this for their own gain. This is the paradox of tolerance.
"AI" companies take this a step further, and completely disregard the original license. Whereas copyleft would somewhat be a deterrent for potential abusers, it's not for this new wave of companies. They can hide behind the already loosely defined legal frameworks, and claim that the data is derivative enough, or impossible to trace back, or what have you. It's dishonest at best, and corrupts the last remnants of public good will we still enjoy on the internet.
We need new legal frameworks for this technology, but since that is a glacial process, companies can get rich in the meantime. Especially shovel salespeople.
Just last week, Opus 4.5 decided that the way to fix a test was to change the code so that everything but the test broke.
When people say "fix stuff" I always wonder if it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).
Sure, I get an occasional bad result from Opus - then I revert and try again, or ask it for a fix. Even with a couple of restarts, it's going to be faster than me on average. (And that's ignoring the situations where I have to restart myself)
Basically, you're saying it's not perfect. I don't think anyone is claiming otherwise.
The problem is it’s imperfect in very unpredictable ways. Meaning you always need to keep it on a short leash for anything serious, which puts a limit on the productivity boost. And that’s fine, but does this match the level of investment and expectations?
It’s not about being perfect, it’s about not being as great as the marketing, and many proponents, claim.
The issue is that there’s no common definition of "fixed". "Make it run no matter what" is a more apt description in my experience, which works to a point but then becomes very painful.
Nope, I did get a lot of fancy markdown with emojis though so I guess that was a nice tradeoff.
In general, even with access to the entire code base (which is very small), I find the models' inherent need to satisfy the prompter to be their biggest flaw, since it tends to constantly lead down this path. I often have to correct overly convoluted SQL too, because my problems are simple and the training data seems to favor extremely advanced operations.
As if any taxes will be paid to the affected areas. And add to that the billions in taxes used to subsidize everything before a single cent is a net positive.
This is the problem: the entire internet is a really bad set of training data because it’s extremely polluted.
Also, the derivation argument doesn’t really hold: just because you know about two things doesn’t mean you’d be able to come up with the third. That’s actually very hard most of the time, and it requires more than next-token prediction.
The emergent phenomenon is that the LLM can separate truth from fiction when you give it a massive amount of data. It can figure the world out just as we can when we too are inundated with bullshit data. The pathways exist in the LLM, but it won’t necessarily reveal them to you unless you tune it with RL.
> The emergent phenomenon is that the LLM can separate truth from fiction when you give it a massive amount of data.
I don't believe they can. LLMs have no concept of truth.
What's likely is that for many subjects the "truth" is represented far more than fiction, and where there is objective truth it's consistently represented in a similar way. Fiction, on the other hand, comes in many variations for the same subject.
They can, and we have definitive proof. When we tune LLMs with reinforcement learning, the models end up hallucinating less and becoming more reliable. In a nutshell, we reward the model when it tells the truth and punish it when it doesn’t.
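As a toy illustration of that reward/punish step (a deliberate simplification with made-up numbers; real pipelines use PPO against a learned reward model, and there is no literal truth oracle, human raters stand in for one):

    import torch

    # REINFORCE-style update: scale the gradient of the answer's
    # log-probability by a scalar reward. +1 if raters liked the answer,
    # -1 if they didn't. No factual content flows through this signal.
    log_prob_of_answer = torch.tensor(-2.3, requires_grad=True)
    reward = 1.0

    loss = -reward * log_prob_of_answer
    loss.backward()
    print(log_prob_of_answer.grad)  # the direction the weights get nudged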
So think of it like this: to create the model we use terabytes of data. Then we do RL, which probably adds less than one percent on top of the data involved in the initial training.
The change in the model is that reliability is increased and hallucinations are reduced at a far greater rate than one percent. So much so that modern models can be used for agentic tasks.
How can reinforcement training amounting to less than one percent of the data improve the model’s truthfulness by far more than one percent?
The answer is obvious. It ALREADY knew the truth. There’s no other logical way to explain this. The LLM in its original state just predicts text but it doesn’t care about truth or the kind of answer you want. With a little bit of reinforcement it suddenly does much better.
It’s not a perfect process, and reinforcement learning often makes the model deceptive: instead of telling the truth, it gives an answer that merely looks true, or that the trainer wants to hear. In general, though, we can measurably see a difference in truthfulness and reliability to an extent far greater than the data involved in the training, and that is logical proof that it knows the difference.
Additionally, while I say it already knows the truth, this is a blurry line. Even humans don’t fully know the truth, so my claim is that an LLM knows the truth to a certain extent. It can be wildly off about certain things, but in general it knows, and this “knowing” has to be coaxed out of the model through RL.
Keep in mind that the LLM is auto-trained on reams and reams of data; that training is massive. Reinforcement training is done by humans: a person must rate the answers, so it is significantly smaller.
> The answer is obvious. It ALREADY knew the truth. There’s no other logical way to explain this.
I can think of several offhand.
1. The effect was never real, you've just convinced yourself it is because you want it to be, ie you Clever Hans'd yourself.
2. The effect is an artifact of how you measure "truth" and disappears outside that context ("It can be wildly off for certain things")
3. The effect was completely fabricated and is the result of fraud.
If you want to convince me that "I threatened a statistical model with a stick and it somehow got more accurate, therefore it's both intelligent and lying" is true, I need a lot less breathless overcredulity and a lot more "I have actively tried to disprove this result, here's what I found"
You asked for something concrete, so I’ll anchor every claim to either documented results or directly observable training mechanics.
First, the claim that RLHF materially reduces hallucinations and increases factual accuracy is not anecdotal. It shows up quantitatively in benchmarks designed to measure this exact thing, such as TruthfulQA, Natural Questions, and fact verification datasets like FEVER. Base models and RL-tuned models share the same architecture and almost identical weights, yet the RL-tuned versions score substantially higher. These benchmarks are external to the reward model and can be run independently.
Second, the reinforcement signal itself does not contain factual information. This is a property of how RLHF works. Human raters provide preference comparisons or scores, and the reward model outputs a single scalar. There are no facts, explanations, or world models being injected. From an information perspective, this signal has extremely low bandwidth compared to pretraining.
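For concreteness, here is what that scalar signal typically looks like in code: a minimal sketch of the standard Bradley-Terry pairwise loss used to train reward models (variable names and toy numbers are mine):

    import torch
    import torch.nn.functional as F

    # Toy scalar rewards for four preference pairs. In real RLHF these
    # come from a reward-model head mapping (prompt, answer) -> one number.
    reward_chosen = torch.tensor([1.2, 0.3, 0.8, -0.1])
    reward_rejected = torch.tensor([0.4, 0.9, -0.2, -0.5])

    # Bradley-Terry loss: push the preferred answer's score above the
    # rejected one's. Each pair transmits roughly one ordering bit --
    # "this one was better" -- and nothing about why it was better.
    loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
    print(loss.item())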
Third, the scale difference is documented by every group that has published training details. Pretraining consumes trillions of tokens. RLHF uses on the order of tens or hundreds of thousands of human judgments. Even generous estimates put it well under one percent of the total training signal. This is not controversial.
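A rough back-of-envelope makes the gap vivid (all numbers here are illustrative assumptions, not figures from any specific published run):

    # Pretraining vs. RLHF signal, order-of-magnitude only.
    pretraining_tokens = 10e12      # ~10 trillion tokens
    rlhf_judgments = 300_000        # human preference comparisons
    tokens_per_judgment = 2_000     # prompt plus two candidate answers
    rlhf_tokens = rlhf_judgments * tokens_per_judgment

    print(rlhf_tokens / pretraining_tokens)  # 6e-05, i.e. ~0.006 percent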
Fourth, the improvement generalizes beyond the reward distribution. RL-tuned models perform better on prompts, domains, and benchmarks that were not part of the preference data and are evaluated automatically rather than by humans. If this were a Clever Hans effect or evaluator bias, performance would collapse when the reward model is not in the loop. It does not.
Fifth, the gains are not confined to a single definition of “truth.” They appear simultaneously in question answering accuracy, contradiction detection, multi-step reasoning, tool use success, and agent task completion rates. These are different evaluation mechanisms. The only common factor is that the model must internally distinguish correct from incorrect world states.
Finally, reinforcement learning cannot plausibly inject new factual structure at scale. This follows from gradient dynamics. RLHF biases which internal activations are favored, it does not have the capacity to encode millions of correlated facts about the world when the signal itself contains none of that information. This is why the literature consistently frames RLHF as behavior shaping or alignment, not knowledge acquisition.
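The standard objective makes the behavior-shaping framing concrete: PPO-style RLHF maximizes reward minus a KL penalty that anchors the tuned model to the pretrained one, so training can only re-rank outputs the base model already assigns probability to. Schematically (a sketch of the common formulation, not any lab's actual code):

    # Per-sample RLHF objective, schematically:
    #   maximize  reward(x, y) - beta * KL(tuned || base)
    # The KL term keeps the tuned policy near the pretrained policy, so
    # optimization reweights existing representations rather than
    # injecting new factual structure.
    def rlhf_objective(reward, logp_tuned, logp_base, beta=0.1):
        kl_estimate = logp_tuned - logp_base  # per-token KL estimate
        return reward - beta * kl_estimate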
Given those facts, the conclusion is not rhetorical. If a tiny, low-bandwidth, non-factual signal produces large, general improvements in factual reliability, then the information enabling those improvements must already exist in the pretrained model. Reinforcement learning is selecting among latent representations, not creating them.
You can object to calling this “knowing the truth,” but that’s a semantic move, not a substantive one. A system that internally represents distinctions that reliably track true versus false statements across domains, and can be biased to express those distinctions more consistently, functionally encodes truth.
Your three alternatives don’t survive contact with this. Clever Hans fails because the effect generalizes. Measurement artifact fails because multiple independent metrics move together. Fraud fails because these results are reproduced across competing labs, companies, and open-source implementations.
If you think this is still wrong, the next step isn’t skepticism in the abstract. It’s to name a concrete alternative mechanism that is compatible with the documented training process and observed generalization. Without that, the position you’re defending isn’t cautious, it’s incoherent.
> Your three alternatives don’t survive contact with this. Clever Hans fails because the effect generalizes. Measurement artifact fails because multiple independent metrics move together. Fraud fails because these results are reproduced across competing labs, companies, and open-source implementations.
He doesn't care. You might as well be arguing with a Scientologist.
I’ll give it a shot. He’s hiding behind that Clever Hans story, thinking he’s above human delusion, but in reality he’s the picture-perfect example of how humans fool themselves. It’s so ironic.