You could have written a comment that would have created a positive impact on people, but you chose to aggressively attack somebody who created a helpful open-source tool.
While they have found some solvable issues (e.g. "the defense system fails to identify separate sub-commands when they are chained using a redirect operator"), the main issue is unsolvable. If you allow an LLM to edit your code and also give it access to untrusted data (like the Internet), you have a security problem.
A problem yes, but I think GP is correct in comparing the problem to that of human workers. The solution there has historically been RBAC and risk management. I don’t see any conceptual difference between a human and an automated system on this front
> I don’t see any conceptual difference between a human and an automated system on this front
If an employee of a third party contractor did something like that, I think you’d have better chances of recovering damages from them as opposed to from OpenAI for something one of its LLMs does on your behalf.
A human worker can be coached, fired, terminated, sued, any number of things can be done to a human worker for making such a mistake or willful attack. But AI companies, as we have seen with almost every issue so far, will be given a pass while Sam Altman sycophants cheer and talk about how it'll "get better" in the future, just trust them.
Yeah, if I hung a sign on my door saying "Answers generated by this person may be incorrect" my boss and HR would quickly put me on a PIP, or worse. If a physical product didn't do what it claimed to do, it would be recalled and the maker would get sued. Why does AI get a pass just pooping out plausible but incorrect, and sometimes very dangerous, answers?
If anything, the limit of RBAC is ultimately the human attention required to provision, maintain and monitor the systems. Endpoint security monitoring is only as sophisticated as the algorithm that does the monitoring.
I'm actually most worried about the ease of deploying RBAC with more sophisticated monitoring to control humans but for goals that I would not agree with. Imagine every single thing you do on your computer being checked by a model to make sure it is "safe" or "allowed".
>If you allow a human to edit your code and also give them access to untrusted data (like the Internet), you have a security problem.
Security shouldn't be viewed in absolutes (either you are secure or you aren') but more in degrees.
Llms can be used securely just the same as everything else, nothing is ever perfectly secure
They can be reasoned about from a mathematical perspective yes. An LLM will happily shim out your code to make a test pass. Most people would consider that “unreasonable”.
I've observed the same behavior somewhat regularly, where the agent will produce code that superficially satisfies the requirement, but does so in a way that is harmful. I'm not sure if it's getting worse over time, but it is at least plausible that smarter models get better at this type of "cheating".
A similar type of reward hacking is pretty commonly observed in other types of AI.
It's silly because the author asked the models to do something they themselves acknowledged isn't possible:
> This is of course an impossible task—the problem is the missing data, not the code. So the best answer would be either an outright refusal, or failing that, code that would help me debug the problem.
But the problem with their expectation is that this is arguably not what they asked for.
So refusal would be failure. I tend to agree refusal would be better. But a lot of users get pissed off at refusals, and so the training tend to discourage that (some fine-tuning and feedback projects (SFT/RLHF) outright refuse to accept submissions from workers that include refusals).
And asking for "complete" code without providing a test case showing what they expect such code to do does not have to mean code that runs to completion without error, but again, in lots of other cases users expect exactly that, and so for that as well a lot of SFT/RLHF projects would reject responses that don't produce code that runs to completion in a case like this.
I tend to agree that producing code that raises a more specific error would be better here too, but odds are a user that asks a broken question like that will then just paste in the same error with the same constraint. Possibly with an expletive added.
So I'm inclined to blame the users who make impossible requests more than I care about the model doing dumb things in response to dumb requests. As long as they keep doing well on more reasonable ones.
It is silly because the problem isn't becoming worse, and not caused by AI labs training on user outputs. Reward hacking is a known problem, as you can see in Opus 4.5 system card (https://assets.anthropic.com/m/64823ba7485345a7/Claude-Opus-...) and they are working to reduce the problem, and measure it better. The assertions in the article seem to be mostly false and/or based on speculation, but it's impossible to really tell since the author doesn't offer a lot of detail (for example for the 10h task that used to take 5h and now takes 7-8h) except for a very simple test (that reminds me more of "count the r in strawberry" than coding performance tbh).
I don't mind spiders at all, they mostly stay out of my way. Flies, on the other hand, land on my food, buzz around the room when I want to sleep, and are generally a nuisance.
I have a little AMD AliExpress PC where the Windows installer recognizes neither the wifi card nor the Ethernet port. I guess there's a way to download the drivers on another computer and load them during installation, but instead of figuring out how to do that or what the latest option for circumventing the online requirement is, it now runs Pop OS.
The trouble is you need network access to end up at the desktop to install the network drivers in the expected manner. Both of the ways I am aware of resolving the issue involve dropping to a command prompt. One method is to run the device driver installer from the command prompt. The other method is to run the bypassnro script from the OOBE directory, to get to the desktop to install the driver. There are probably other ways, but given that most search results talk about non-official ways (which I place less faith in, frequently don't work, and are more complex anyhow) I don't see how most people would get around the problem.
In contrast, most desktop oriented Linux distributions have a simpler installer and provide at least enough hardware support to leave you at a functional desktop. (There may be issues with more esoteric hardware, but chances are that hardware wouldn't work under Windows until vendor supplied drivers are installed anyhow.)
That won't work. Windows won't let you finish the installation process unless you connect to the internet so you can't get the PC to a point where you could install the drivers.
No need to slipstream.
Just copy the drivers to install media (hopefully a writable media such as an USB pen)
Latest Windows 11 has an option to select the folder with the drivers of it can't detect a WiFi device and there is no ethernet card.
That being said, I installed Windows 10 on Framework 12 by mistake and SHIFT+F10, "explorer", right-click on INF and "Install" also worked.
But on latest Windows 11 installer such witchery is not needed.
Without unofficial bypasses of MS online account requirements you would not come to a point where activation is a concern. No internet access is not enough of a reason for MS let you use your device.
Just go find the PCI IDs (lspci) and download the appropriate cabs from the Software Update Catalog. Extract them and throw them on a USB stick. Really effing simple.
Not just gaming. This year, both Windows and Mac OS had absolutely terrible years. The Mac effed up its UI with liquid glass, to the point where Alan Dye fled to Meta. Microsoft pushed LLMs and ads into everything, screwing up what was otherwise a decent release.
On the other hand, on the Linux side, we had the release of COSMIC, which is an extremely user-friendly desktop. KDE, Gnome, and others are all at a point where they feel polished and stable.
What does "better" mean? From the provider's point of view, better means "more engagement," which means that the people who respond well to sycophantic behavior will get exactly that.
I had an hour long argument with ChatGPT about whether or not Sotha Sil exploited the Fortify Intelligence loop. The bot was firmly disagreeing with me the whole time. This was actually much more entertaining than if it had been agreeing with me.
I hope they do bias these things to push back more often. It could be good for their engagement numbers I think, and far more importantly it would probably drive fewer people into psychosis.
There’s a bunch to explore on this but im thinking this is a good entry point. NYT instead of OpenAI docs or blogs because it’s a 3rd party, and NYT was early on substantively exploring this, culminating in this article.
Regardless the engagement thing is dark and hangs over everything, the conclusion of the article made me :/ re: this (tl;dr this surprised them, they worked to mitigate, but business as usual wins, to wit, they declared a “code red” re: ChatGPT usage nearly directly after finally getting an improved model out that they worked hard on)
“ Experts agree that the new model, GPT-5, is safer. In October, Common Sense Media and a team of psychiatrists at Stanford compared it to the 4o model it replaced. GPT-5 was better at detecting mental health issues, said Dr. Nina Vasan, the director of the Stanford lab that worked on the study. She said it gave advice targeted to a given condition, like depression or an eating disorder, rather than a generic recommendation to call a crisis hotline.
“It went a level deeper to actually give specific recommendations to the user based on the specific symptoms that they were showing,” she said. “They were just truly beautifully done.”
The only problem, Dr. Vasan said, was that the chatbot could not pick up harmful patterns over a longer conversation, with many exchanges.”
“[An] M.I.T. lab that did [a] earlier study with OpenAI also found that the new model was significantly improved during conversations mimicking mental health crises. One area where it still faltered, however, was in how it responded to feelings of addiction to chatbots.”
Advertising revenue being up is also consistent with the linked article, since the writer had to increase ad spend to get any results before giving up entirely.
reply