AFAICT, the future of software development looks like a lot of unprofessional software development.
Not unlike digital photography and Instagram. Has it killed the film-photo divisions of photography companies? Yes. Has it put professional photographers out of business? Hardly; in fact, the opposite. What the ubiquitous phone camera has done is expose many more people to the steep challenge of making truly good photographs. It has raised the population-scale average of photo-erudition and taste to ever more sophisticated levels. And it has pushed the envelope on what photography can do.
So---assuming the AI overlords prevail (which I'm deeply skeptical of, but suppose a trillion dollars are right and I'm wrong)---what happens when LLMs allow anyone to vibe-code their own SaaS or Database or IDE or bespoke health-monitoring app or whatever...?
The important difference with digital photography is the phone photographer won't use pro lighting, different lenses, reflectors, bounced flash or other gear that contributes to the "pro photography" look.
With software, vibe-coders might use AI agents that have all the equivalent "pro photo gear" for professional output.
There's a moat around pro-photography protecting it from its snack-size phone-camera cousin. All those lights, lenses and tripods are the physical moat. If we ponder the question whether software development has an equivalent moat, the gp's gloom may be warranted.
- How much does a top-end "pro" AI account cost? Who can pay that on a sustained basis through a career? (Someone has to pay --- the cost gets loaded into the fully-loaded employee cost. And when it's the business doing it, you can bet that eventually CFOs are going to ask the hard questions.)
- How much does pro photography gear cost? Does every photographer own all the gear? (No. Renting is standard, and in fact preferred. Nobody wants to buy a thousand-dollar macro lens they will use only a few times to learn macro photography and do the occasional pro shoot. A specialist macro photographer will, because they can amortize the cost.)
The people whose behaviors are ultimately leading to this statement don't care about the spirit of the prize or whether it counts. It's about who's "the greatest" and forcing others to submit to power.
These prompt injection vulnerabilities give me the heebie-jeebies. LLMs feel so non-deterministic that it appears to me to be really hard to guard against. Can someone with experience in the area tell me if I'm off base?
> it appears to me to be really hard to guard against
I don't want to sound glib, but one could simply not let an LLM execute arbitrary code without reviewing it first, or only let it execute code inside an isolated environment designed to run untrusted code
the idea of letting an LLM execute code it's dreamt up, with no oversight, in an environment you care about, is absolutely bananas to me
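For the second option, the plumbing isn't exotic. A minimal sketch, assuming Docker is available (the image name, resource limits, and timeout are placeholders, not a recommendation):

    import subprocess

    def run_untrusted(command: str, workdir: str) -> subprocess.CompletedProcess:
        """Run an LLM-proposed shell command in a throwaway, network-less container."""
        return subprocess.run(
            ["docker", "run", "--rm",
             "--network=none",            # no downloading payloads, no exfiltration
             "--memory=512m", "--cpus=1", # keep a runaway process from eating the host
             "-v", f"{workdir}:/work",    # only the project directory is visible
             "-w", "/work",
             "python:3.12-slim",          # placeholder image; use one with your toolchain
             "sh", "-c", command],
            capture_output=True, text=True, timeout=120,
        )

The point isn't the specific flags; it's that anything touching the network or the real filesystem has to come back to a human first.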
> if a skilled human has to check everything it does then "AI" becomes worthless
Well, perhaps not worthless, but certainly not "a trillion-dollar revolution that will let me fire 90% of my workforce and then execute my Perfect Rich Guy Visionary Ideas without any more pesky back-talk."
That said, the "worth" it brings to the shareholders will likely be a downgrade for everybody else, both workers and consumers, because:
> The market’s bet on AI is that an AI salesman will visit the CEO of Kaiser and make this pitch: “Look, you fire 9/10s of your radiologists [...] and the remaining radiologists’ job will be to oversee the diagnoses the AI makes at superhuman speed, and somehow remain vigilant as they do so, despite the fact that the AI is usually right, except when it’s catastrophically wrong.
> “And if the AI misses a tumor, this will be the human radiologist’s fault, because they are the ‘human in the loop.’ It’s their signature on the diagnosis.”
> This is a reverse centaur, and it’s a specific kind of reverse-centaur: it’s what Dan Davies [calls] an “accountability sink.” The radiologist’s job isn’t really to oversee the AI’s work, it’s to take the blame for the AI’s mistakes.
> Like an Amazon delivery driver, who sits in a cabin surrounded by AI cameras, that monitor the driver’s eyes and take points off if the driver looks in a proscribed direction, and monitors the driver’s mouth because singing isn’t allowed on the job, and rats the driver out to the boss if they don’t make quota.
> The driver is in that van because the van can’t drive itself and can’t get a parcel from the curb to your porch. The driver is a peripheral for a van, and the van drives the driver, at superhuman speed, demanding superhuman endurance. But the driver is human, so the van doesn’t just use the driver. The van uses the driver up.
I guess it resonates for me because it strikes at my own justification for my work automating things, as I'm not mercenary or deluded enough to enjoy the idea of putting people out of work or removing the fun parts. I want to make tools that empower individuals, like how I felt the PC of the 1990s was going to give people more autonomy and more (effective, desirable) choices... As opposed to, say, the dystopian 1984 Telescreen.
Right. This feels more and more like a situation of extraction, abuse, and theft of the people's empowerment, funneling it up to the top. It's apparent, and people are too afraid and weak to do anything.
Or so they think.
And I think of a saying that all capitalistic systems eventually turn into socialist ones or get replaced with dictators. Is this really the history of humanity over and over? I can't help but hope for more.
The really fast part is the challenge though. If we assume that in the pre-LLM world there was just enough resource for mid/senior engineers to review junior engineers' code, and that in the LLM world we can produce 10x the code, then unless we also 10x the mid/senior review capacity, what was once possible is no longer possible...
I do feel like I can review 2-3x with a quicker context-switching loop. Picking back up and following what the junior engineer did a couple of weeks after we discussed the scope of work is hard.
It could be as useful as a junior dev. You probably shouldn't let a junior dev run arbitrary commands in production without some sort of oversight or rails, either.
Even as a more experienced dev, I like having a second pair of eyes on critical commands...
I think a nice compromise would be to restrict agentic coding workflows to cloud containers and a web interface. Bootstrap a project and new functional foundations locally using traditional autocomplete/chat methods (which you want to do anyway to avoid a foundation of StackOverflow-derived slop), then implement additional features using the cloud agents. Don't commit any secrets to SCM and curate the tools that these agents can use. This way your dev laptops stay firmly in human control (with IDEs freed up for actual coding) while LLMs are safely leveraged. Win-win.
You could literally ask the LLM to obfuscate it and I bet it would do a pretty good job. Good luck parsing 1,000 lines of code manually to identify an exploit that you’re not even specifically looking for.
LLMs are vulnerable in the same way humans are vulnerable. We found a way to automate PEBKAC.
I expect that agent LLMs are going to get more and more hardened against prompt injection attacks, but it's hard to get the chance of an attack working all the way down to zero while still having a useful LLM. So the "solution" is to limit AI privileges and avoid the "lethal trifecta".
Determinism is one thing, but the more pressing thing is permission boundaries. All these AI agent tools need to come with no permissions at all out of the box, and everything should be granularly granted. But that would break all the cool demos and marketing pitches.
Allowing an agent to run wild with arbitrary shell commands is just plain stupid. This should never happen to begin with.
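To make "granularly granted" concrete, here's a generic deny-by-default gate an agent loop could route every tool call through. This is an illustration, not any particular vendor's API; the tool names and the interactive prompt are made up:

    # Deny-by-default: a tool call runs only if it's allowlisted or the user grants it.
    ALLOWED_TOOLS = {"read_file", "list_dir"}   # safe, read-only defaults
    GRANTED: set[str] = set()                   # grants accumulated this session

    def authorize(tool_name: str, args: dict) -> bool:
        if tool_name in ALLOWED_TOOLS or tool_name in GRANTED:
            return True
        answer = input(f"Agent wants {tool_name}({args}). Allow? [y/N/always] ").strip().lower()
        if answer == "always":
            GRANTED.add(tool_name)
            return True
        return answer == "y"

    def dispatch(tool_name: str, args: dict, tools: dict):
        if not authorize(tool_name, args):
            return {"error": f"{tool_name} denied by user"}  # send the refusal back to the model
        return tools[tool_name](**args)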
> All these AI agent tools need to come with no permissions at all out of the box, and everything should be granularly granted.
That's what the tools already do. If you were watching some cool demo that didn't have all the prompts, they may have been running the tools in "yolo mode", which is not usually a normal thing.
So the attacker doesn't need to send an evil-bit over the network, if they can trigger the system into dreaming up the evil-bit indirectly as its own output at some point.
The problem isn't non-determinism per se, an agent that reliably obeys a prompt injection in a README file is behaving entirely deterministically: its behavior is totally determined by the inputs.
You're correct, but the answer is that - typically - they don't access untrusted content all that often.
The number of scenarios in which you have your coding agent retrieving random websites from the internet is very low.
What typically happens is that they use a provider's "web search" API if they need external content, which already pre-processes and summarises all content, so these types of attacks are impossible.
Don't forget: this attack relies on injecting a malicious prompt into a project's README.md that you're actively working on.
> a provider's "web search" API [...] pre-processes and summarises all content, so these types of attacks are impossible.
Inigo Montoya: "Are you sure the design is safe?"
Vizzini: "As I told you, it would be absolutely, totally, and in all other ways inconceivable. The web-gateway API sanitizes everything, and no user of the system would enter anything problematic. Out of curiosity, why do you ask?"
Inigo Montoya: "No reason. It's only... I just happened to look in the logs and something is there."
Vizzini: "What? Probably some local power-user, making weird queries out of curiosity, after hours... in... malware-infested waters..."
At least for now the malware runs on the coder's machine. The fun starts when the malware runs on users' machines and the coders aren't coders anymore, just prompters, with no idea how such a thing could happen.
Isn't that already the case? Coders already think composer and node are great, an ecosystem predicated upon running thousands of untrusted pieces of code without any review or oversight.
If someone can write instructions to download a malicious script into a codebase, hoping an AI agent will read and follow them, they could just as easily write the same wget command directly into a build script or the source itself (probably more effectively). In that way it's a very similar threat to the supply chain attacks we're hopefully already familiar with. So it is a serious issue, but not necessarily one we don't know how to deal with. The solutions (auditing all third-party code, isolating dev environments) just happen to be hard in practice.
Given the displeasure a lot of developers have towards AI, I would not be surprised if such attacks became more common. We’ve seen artists poisoning their uploads to protect them (or rather, try and take revenge), I don’t doubt it might be the same for a non-negligible part of developers.
Yes, fetching arbitrary webpages is its own can of worms. But feels less intractable to me, it's usually easy to disable web search tools by policy without hurting the utility of the tools very much (depends on use case of course).
Just to be the pedant here, LLMs are fully deterministic (the same LLM, in the same state, with the same inputs, will deliver the same output, and you can verify that by running an LLM locally). It's just that they are chaotic (a prompt, and a second one with slight and seemingly minor changes, can produce not just different but conflicting outputs).
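If you want to see the pedantic point concretely, greedy decoding on a local model reproduces exactly run to run (a toy sketch using gpt2 as a stand-in, hardware floating-point quirks aside):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def complete(prompt: str) -> str:
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=20, do_sample=False)  # greedy decoding
        return tok.decode(out[0], skip_special_tokens=True)

    # Same state, same input, same output:
    assert complete("The agent reads the README and") == complete("The agent reads the README and")
    # ...while a seemingly minor edit to the prompt can still swing the continuation completely.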
You are very on base. In fact, there is a deep conflict that needs to be solved: the non-determinism is the feature of an agent. Something that can "think" for itself and act. If you force agents to be deterministic, don't you just have a slow workflow at that point?
I think it should be on the article to prove its title. I hardly think presenting one test case to some different models substantiates the claim that "AI Coding Assistants Are Getting Worse." Note that I have no idea if the title is true or not, but it certainly doesn't follow from the content of the article alone.
With LLMs being hard to test objectively, any claim made about them has to be substantiated with at least anecdotes. The article presented some backing; if you don't think it's enough, you've got to present some of your own, or people can't take you seriously.
I did present my own evidence to support _my_ argument that the article is woefully lacking data to support its conclusion. It's not on me to try to make the counterargument (that AI coding assistants aren't getting worse) because that's not my opinion.
I am used to seeing technical papers from IEEE, but this is an opinion piece? I mean, there is some anecdata and one test case presented to a few different models, but nothing more.
I am not necessarily saying the conclusions are wrong, just that they are not really substantiated in any way
This may be a situation where HackerNews' shorthand of omitting the subdomain is not good. spectrum.ieee.org appears to be more of a newsletter or editorial part of the website, but you wouldn't know that's what this was just based on the HN tag.
I've been on this site for over a decade now and didn't know this. That's a genuinely baffling decision given how different content across subdomains can be.
And the example given was specific to OpenAI models, yet the title is a blanket statement.
I agree with the author that GPT-5 models are much more fixated on solving exactly the problem given and not as good at taking a step back and thinking about the big picture. The author also needs to take a step back and realize other providers still do this just fine.
Ah you're right, scrolled past that - the most salient contrast in the chart is still just GPT-5 vs GPT-4, and it feels easy to contrive such results by pinning one model's response as "ideal" and making that a benchmark for everything else.
And they are using OpenAI models, which haven't had a successful training run since Ilya left; GPT 5x is built on GPT 4x, not from scratch, AIUI.
I'm having a blast with gemini-3-flash and a custom Copilot-replacement extension; it's much more capable than Copilot ever was with any model for me, and gives a personalized DX with deep insights into my usage and what the agentic system is doing under the hood.
Can you talk a little more about your replacement extension? I get Copilot from my workplace and I'd love to know what I can do with it. I've been trying to build some containerized stuff with Copilot CLI, but I'm worried I have to give it more permissions than I'm comfortable with around git etc.
The entire extension and agent framework is in that repo too
extensions/vscode and lib/agent
I let my agent do whatever because I know exactly what it can and can't do. For example, it can use git, but cannot push, and any git changes are local to its containerized environment and don't get exported back to my filesystem where I do real git work. I could create an envelope where they could push git, and more likely I'll give them something where they can call the GitHub API; that's really more useful anyway.
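Not my exact setup, but the gist of "local to its containerized environment" can be approximated with something like this (the image name, paths, and agent command are placeholders):

    import shutil, subprocess, tempfile

    def run_agent_on_copy(repo: str, agent_cmd: list[str]) -> str:
        """Hand the agent a disposable copy of the repo; nothing it does leaks back."""
        scratch = tempfile.mkdtemp(prefix="agent-repo-")
        shutil.copytree(repo, scratch, dirs_exist_ok=True)
        subprocess.run(
            ["docker", "run", "--rm",
             "-v", f"{scratch}:/workspace",  # no SSH keys or tokens mounted, so no push
             "-w", "/workspace",
             "agent-image",                  # placeholder image with git + the agent installed
             *agent_cmd],
            check=True,
        )
        return scratch  # review the diff here, then cherry-pick into the real checkout by hand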
I disagree with this. You can tell someone that a question is not appropriate for a community without being a jerk. Or you can tell them that there needs to be more information in a question without being a jerk. I do not think being mean is a prerequisite for successful curation of information.
As an Ops person who has to tell people "This is a terrible idea, we are not doing it.", I've always struggled with how to tell someone "No" nicely without them taking it as "Well, I guess my idea delivery is off; my idea is fine though."
When dealing with those personalities, it seems the only way to get them to completely reconsider their approach is a hard "F off". Which is why I understand the old Linus T. emails. They were clearly in response to someone acting like "I just need to convince them".
There are bad questions (and ideas, like you said). Stackoverflow tried to incentivize asking good, novel questions. You grow up often being told "there are no stupid questions" but that is absolutely not the case.
A good question isn't just "how do I do x in y language?" But something more like "I'm trying to do x in y language. Here's what I've tried: <code> and here is the issue I have <output or description of issue>. <More details as relevant>"
This does two things: 1. It demonstrates that the question ask-er actually cares about whatever it is they are doing, and isn't just trying to get free homework answers. 2. Ideally it forces the ask-er to provide enough information that an answer-er can answer without asking follow-ups.
Biggest thing, as someone who has been in Discords geared towards support: you can either gear towards new people or towards professionals, but walking the line between both is almost impossible.
I believe in gearing towards teachers. Q&A sites are often at their best when the Q and A come from the same source. But it needs to be someone who understands that the Q is common and can speak the language of those who don't know the A. Unfortunately, not a common skillset (typically doesn't pay the bills).
The key part of your post is "has to tell people". Absolutely nobody on SO was obligated to respond to anything. The toxicity was a choice, and those doing it enjoyed it.
To play devil's advocate, I think some people confuse terse, succinct communication with being mean. I find this to be a common cultural difference between contributors from different backgrounds.
Personally I find a lot of "welcoming" language to be excessively saccharine and ultimately insincere. Something between being talked down to like I'm a child and corpo-slop. Ultimately I don't think there's necessarily a one-size-fits-all solution here and it's weird that some people expect that such a thing can or should exist.
Personally I'm not a fan of terse writing; if something's worth saying at all it's worth using suitably expressive language to describe it, and being short and cold with people isn't a good form of interpersonal communication in my view. Pleasantries are important for establishing mutual respect, if they're considered the baseline of politeness in a particular culture then it's overtly disrespectful to forgo them with strangers. Terseness is efficient for the writer certainly, but it's not necessarily for the reader.
Written like you're on one side of the cultural barrier and think that you have to be somehow naturally correct because that's what's natural to you. To others, that attitude is just arrogant and self-centered. Why should one particular culture dictate the behavior of everyone, and especially why should it be your culture?
What you call "establishing mutual respect" is just "insincere and shallow" to others. I do not believe for a second that a grocery store cashier wants to know how my day has been.
That's not what I mean, I don't like corpo-speak either. I mean just treating people like they're human beings, neither with affected shortness nor affected warmth. I really don't like the common notion that you have to be cold and short with people to be a good engineer, it makes the culture considerably less pleasant and more abrasive than it needs to be in my view.
I could just as well turn that around and say why should we all adopt your preference of unpleasantly curt communication? Is that not also an imposition of someone else's culture?
What if short isn't "cold" at all? That's a value you're projecting onto it.
I understand there are cultures that value flowery speech more than mine. I'm asking you to stop using emotionally loaded words to describe how other people behave.
I very much disagree with the first point, that a critical mass of questions had somehow been reached that explains the slowdown. With the change in tech, outdated answers, and the increasing population of people writing code, I just don't buy it.