14% - that's the first time I've seen a hard number like that. It's significant.
Have there been any good studies on programmer productivity? 14% would be just a blip in the constant flow of productivity-enhancing languages, tools, architectures etc. that have been steadily accelerating the pace and scale of development for decades ...
When I moved from Java to Rails, for example ... definitely at least 14% more productive, I'm guessing much much higher. Even just from load times.
Or going from printed documentation all the way to Stack Overflow - how many multiples was that?
68000 to M2 ... page reload to React's fast refresh ... etc. etc.
I feel like it's making me somewhat more productive because of how it gives me a kick in the butt and stops my mind wandering off. Getting immediate suggestions and ideas from a "third party" on whatever I'm doing leads me on to the next step of whatever I'm doing rather than getting tempted to procrastinate. I doubt this will benefit everyone, but I guess this is how pair programming can be quite effective for some people too.
I've written a few things (words) where one of the LLMs gave me a pretty good starting point. It needed some work but it saved me Googling around to get some sort of starting draft down on paper--which invariably turns into more Googling around and distractions. It's always been stuff I've known well enough that I knew what was right, wrong, and insipid. But it was a starting point (which can be useful, especially if you're just going for workmanlike anyway).
Both. v4 is fast during periods of low load, but it is slower now than when I first got access. My usage patterns have shifted because of this, so I write longer prompts, have it do more work, and switch to a different tab. I'd pay more than twice as much to have it be at least 50% faster.
I feel it too. Even if the suggestion is wrong, it still triggers my brain to think "No, it should be like this," and then I write the thing. Rinse and repeat.
> Have there been any good studies on programmer productivity?
Yes, but mostly from the companies developing these products:
* The CoPilot productivity study by Peng et al. that is summarized in a Blog post [1] with some additional details in the Arxiv preprint [2]. They find "developers who used GitHub Copilot completed the task significantly faster–55% faster than the developers who didn’t use GitHub Copilot". (Grep for "but speed is important, too" in [1].)
* Amazon's Q1 earnings report [3] includes a paragraph stating that "participants [using] CodeWhisperer completed tasks 57% faster (on average) and were 27% more likely to complete them successfully than those who didn’t use CodeWhisperer."
* I seem to remember seeing something on Replit's blog a while back with a similar number, but can't find it anymore, so maybe I'm mistaken.
These speed numbers are on specific and well-specified programming tasks, which is of course only one part of a developer's job, so there's a lot of opportunity for impactful research in this space. I suspect that the Empirical Software Engineering community will be awash in third-party empirical studies asking all sorts of more specific productivity questions by this time next year.
I agree with the overall assessment, but there's a catch: even "highly skilled" workers are not skilled in "all the things". I've felt that ChatGPT hasn't increased my productivity in my core languages and frameworks, but it has helped immensely in areas where I'm not an expert.
To be concrete about it, I recently ported a project from SwiftUI on iOS (an ecosystem I am very comfortable with) to an Android app written in Kotlin (an ecosystem I mostly dread). I don't find ChatGPT very helpful with my day-to-day Swift stuff, but it was incredibly helpful with the Android work, from language syntax to idioms to outright translation of code.
What framework or approach did it recommend you use for the Android UI? Compose? XML layouts? From code?
I tried using the Bing chat interface, and it repeatedly pushed me in the direction of using Microsoft tools to accomplish tasks for a cross-compile mid tier solution. I had to explicitly tell it to exclude them.
In areas where I'm an expert, I find myself correcting ChatGPT constantly. But for a language I want to learn, it's been really helpful to get me started on a simple algorithm or debug an error message.
I can't help but think that it sounds like productised Gell-Mann Amnesia. If a system is useless and/or counterproductive in areas you're an expert in, but appears useful in areas that you aren't, shouldn't it be a red flag? How would you know if solutions it comes up with are bad, wrong, or just bad practices?
I really love your optimism here, but as an ex-executive I can see how this will be used -- to put wage pressure on skilled/senior leaders by bringing in a churn of unskilled workers to replace them.
I don't think you grasp how this will play out over multiple rounds of the game.
Different companies and executives will play this out differently. Some will replace their unskilled workers, some will fire everyone, some will ignore AI altogether.
What matters is which of these strategies will result in actual success. This is honestly far too hard to tell at this point.
You can apply the law of "power wins" pretty well to this, I think. It will initially be used to put labor in a lower position and expand ownership margins -- until that creates an opposing power scenario. Which may be sooner... or could be later. Hopefully before robots become real, because otherwise it will be never.
I'll say here, I've spoken to and spent time with a few well known billionaires and I'd say deep down, they are exterminists no matter how nice a face they try to put on it. Over time they just come to believe most people don't matter at all. It's really dark, and the sooner we come to terms with that the better.
Software has been cannibalising itself for 70 years, and yet, lo and behold - being a programmer is still one of the best-paid jobs. I don't believe a simple 20-50% boost for intermediate users is going to change things much.
70 years? I’d argue that programming has only recently (last 10-20 years) become a mainstream source of high paying jobs. Before that it was relatively low paying or inaccessible/undesirable to most people. Recent simplification through the movement of much CRUD work toward web frameworks has made it more accessible.
As others have remarked, it allows them to be marginally more skilled in areas where they are lacking. I am expert level in a couple of domains, but now I am an advanced novice to intermediate in a whole lot more due to GenAI. I can now hang with junior folks in their domain, not mine.
While I agree with your assessment on wage pressure, the folks this is going to hurt the most are the new graduates that don't have the knowledge or experience. Their competition just got a whole lot stiffer.
It benefits two groups the most: someone with literally zero experience, and experts.
Yes, I think that middle section will see enormous pressure from below (bootcamp folks probably doubled their productivity) and from above (skilled folks can roll up their sleeves rather than delegate).
That's the way with just about any real leap in productivity applications. Just about every time you add "smart" to something, you take from skilled domain experts.
Just think of when Photoshop 4.0 added layers and suddenly compositing was for everyone, not just for those venturing into channel operations… (To be fair, things got much easier even for those who managed previously without this.)
> In other words, if you're skilled, there's nothing AI can do for you…
That holds if the task you are working on is phone support. If you are a dev, you are perpetually learning new things; it's not possible to memorise the whole field, so AI has more opportunity to help.
The study group were mostly customer support agents. The nature of that work is very different from other knowledge workers so probably the results don't map to more creative fields.
> "Customer support agents using an AI tool to guide their conversations saw a nearly 14 percent increase in [overall] productivity, with 35 percent improvements for the lowest skilled and least experienced workers, and zero or small negative effects on the most experienced/most able workers... [out of] 5,000 agents working for a Fortune 500 software company."
AI quality is somewhere in the middle between highly skilled and neophyte (at present anyway).
This would mean companies can fire or put wage pressure much more effectively on highly skilled or experienced workers by onboarding low-skill, low-experience workers much faster.
AI will further consolidate mega corp power to a terrifying degree.
It could go that way, but another possibility is that the technology undermines the education and the informal apprenticeship situation to such a degree that only older workers are effective while younger workers can only achieve what the AI allows them to achieve. Cheating is already pervasive in education.
Consider the relationship between atmospheric nuclear testing and low-background steel [1] but for AI and older workers whose knowledge predates the introduction of LLMs in the workforce.
It can onboard low-skilled labor faster, but they don't have the rest of the skills to round them out. I am not saying that AI isn't coming for your jobs, because it is. GenAI is extremely useful for highly skilled folks, just not directly in the thing you are super skilled at, but it can play an enormous supporting role in everything else.
There are quite a few scary things about this. For one, who’s to say everyone will have access to this (or similar) technology? It will further concentrate wealth in the hands of the few. Also, since that increased productivity is not coming from the human, and is instead coming from the AI, workers’ share of profits will likely decrease, and with it, their bargaining power, and their overall position in the social hierarchy.
Elaborating further on the idea of productivity, as I said above, this doesn’t make employees more productive. It is doing some of the producing. It is not like an improved tool, where someone has to operate it to reap the benefits. It “upgrades productivity” more like how a kiosk at a McDonald’s upgrades productivity - not of the worker, but of the business. And much like the kiosks, we would be naive to believe they won’t replace the humans in due time.
This also has larger consequences for knowledge workers. Previously, to attain a higher level of productivity out of a worker, companies would have to invest in them so they could develop the necessary skills. Gone (or at least lessened) is that need. So workers will have less skill, less job freedom, a smaller share of profit, less social mobility. This is a nightmare for the working class.
The point of these models isn't to help poor people, it's to make as much money as possible. One of the most straightforward ways for that to happen is to drastically suppress salaries. This tech could very easily harm way more people than it helps.
Absolutely no one is spending tens of millions of dollars developing these tools in order to let random plebs capture much of the value produced by them.
For now. I really don't like where it's going long term. Right now your worth as a human is in what you are able to do. I am terrified to think of a time when for everything you could ever hope to learn there would be a model that would be able to do this 10 times better than you for $3/hr amortized cost. The few who would own the models would own the world and the rest would be rendered essentially worthless.
My gosh, 'the robots that replaced the factory workers ... will just help the workers - so no worries!'
Do you know what happened to manufacturing when automation and low-cost outsourcing happened?
It was wiped out.
It was devastating for certain sectors of the economy, even as 'net productivity rose' - the surpluses were acquired by some, not others.
A lot of 'ghettos' in the US are a direct result of mass factory closures.
Now - imagine that happening over the developing world, or rather, factories that were supposed to open, never did.
Much of the developing world has a 'services export' economy, with things like call centres etc. - and AI will more likely than not just evaporate those roles.
That those people will have 'access to ChatGPT' is beside the point when most of them don't even have computers (just mobile phones), or any way to apply that knowledge.
It's a bit like saying: 'The developing world has access to Wikipedia and all of Harvard courses online! They should all have great jobs!'
Unfortunately that's not how it works.
AI is going to help white collar workers, not pink or blue collar work which is low-skilled.
I don’t know where people are getting such low numbers. I’ve been a developer for 10 years. I care a lot about optimizing my workflow.
When copilot was released, I’d say I got a 15% increase. When ChatGPT was released it was like 50% at least and I can’t imagine going back. I remember how slow it was now.
My advice would be to force yourself to leverage it more or something. I hate googling now. I'll find a page, copy the entire thing into GPT-4 along with the file I'm using and the error message, and I have to do nothing.
14% better using a model from late 2020. I’d expect this to increase materially with most recent models and better tooling to pull in relevant data, particularly for text agent chats
Voice support will begin to be replaced or augmented with AI once people are comfortable talking to a competent AI with good voice synthesis and response latency drops to milliseconds.
Not good for senior agent wages in the short term; only good for senior agents in the long term, when companies hire only for escalations. Likely a large reduction of the workforce in these roles overall.
> I could do insane things with a hypothetical version of GPT 4 that’s private, effectively free, and a few orders of magnitude faster.
You are describing a privately hosted LLaMA, Falcon or Starcoder instance (albeit with lower quality than GPT4). You pay for the speed you want, though TBH they are already very fast on modest GPUs.
I've personally found that, on net, it negates ALL the benefits for areas I'm at least "modestly skilled" in. That is, if I could only use GPT-4 vs. GPT-3, I'd take GPT-3, because its faster speed outweighs GPT-4's higher accuracy on some tasks.
Could you really expect ASICs to increase performance exponentially if Moore's law is dead?
As for new algorithms, surely this would be more of a logarithmic increase for any specific problem as it gets harder and harder to improve on existing knowledge. Has, say, sorting become exponentially faster since Tony Hoare invented quicksort in 1959? My guess is no.
The NVIDIA A100 and H100 GPUs being used to run the GPT models are general-purpose devices. If they had video ports, you could run 3D games on them!
They're like a CPU: highly programmable, flexible, multi-purpose. This means that when any single "part" of the GPT inference code is running, it's probably using only 10-40% of the die area while the rest sits idle.
Imagine a future where a nearly-optimal architecture for LLMs has been figured out and stops changing every few months. At that point, it would be possible to make a dedicated chip that does "nothing else but that". It would be 100% utilised and faster because it would have more units dedicated to the specific task at hand. The data paths could also be optimised to minimise wasteful trips to-and-from external memory, etc...
The ideal "inference chip" would be something akin to an FPGA. Huge numbers of simple cells that can do just the dozen-or-so instructions required, with local memory for the weights and biases. Make these on a cheaper process, run them at 500 MHz for power efficiency, and tile them out by their hundreds in a hybrid memory+compute architecture.
It would be relatively easy to get a factor of 10x, and I suspect 100x is possible.
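A rough back-of-envelope on how those numbers could shake out, using only the utilisation figures above - the memory-traffic factors in this sketch are pure assumptions for illustration, not measurements:

    # If a general-purpose GPU keeps only 10-40% of its die busy on any one
    # inference kernel, dedicated silicon at ~100% utilisation gains roughly
    # the reciprocal; avoiding off-chip memory trips multiplies that further.
    # The memory factors below are hypothetical.
    for utilisation in (0.10, 0.25, 0.40):
        for memory_factor in (1, 3, 10):
            speedup = (1.0 / utilisation) * memory_factor
            print(f"utilisation {utilisation:.0%}, memory x{memory_factor}: ~{speedup:.1f}x")

Which is one way a 10x-100x range could come about; the real number depends on how memory-bound the workload actually is.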
Absolutely - the sibling comment gave a good rundown, but you could even go farther and make an ASIC with a completely baked-in model. Like an old text-to-speech chip, kinda.
Customer service is a funny example because most of the issues are self-inflicted. Take something like cancelling an order. Some websites do not allow you to do this via the website or touch-tone. Sure, you can use AI to figure out the request and cancel it, but it was unnecessary.
I’d be more curious to see the benefit vs good non AI UX
I previously worked on ecommerce and SaaS UX and have first-hand experience with longer customer support rotations in those fields. While I'm sure there's a lot of low-hanging UX fruit, this seems to be an overestimation of how sophisticated most of the customer base is and how much they are able to just read things and figure stuff out without hand-holding. Add to this how past a certain scale even freak issues affecting <0.01% of users/sessions suddenly become quite frequent in absolute terms, and I don't think customer support, automated or not, is going anywhere.
> The AIs behavior was trained on the company's most productive employees
I think this line explains the study results quite well. In a closed domain like customer support for a specific product, you are bound to uncover patterns that inexperienced employees can copy. It is also kind of self-limiting, since as you progress you will inevitably stop relying on it.
I think this sort of tool can have a great impact for onboarding in these domains, but other than that, this and personal experience suggest that open-ended knowledge work only benefits marginally in the moment.
The most interesting thing to me is how little of a productivity boost you are seeing among higher skilled agents. Top quintile is getting an expected 2% boost with no change in the confidence interval.
This jibes with a lot of the heterogeneous commentary you see on HN among engineers. GPT-3/4 are a godsend when approaching new systems; the productivity gain is quite small if you are already skilled.
I have one dedicated browser window with 3 tabs for: GPT, Bing, Bard. I've been experimenting for the last 2 months, with increasing frequency. I tend to paste the same question into all 3, and compare the answers. None is a clear winner. As others have pointed out ad nauseam, you have to understand what these tools are doing: they are stochastic parrots that are optimized to generate the most plausible answer possible. Plausible and Correct have large Venn diagram overlaps, but they are not strictly synonymous.
It's interesting to me why these tools are actually useful. I've been at this for a while. I used to use the web in general, and various community forums to help advance my knowledge. Comp.lang.* and mailing lists were so helpful. And they were very approachable (low cost and multiple client tools). In its early days, before the organization nazis showed up with more focus on rules than helpfulness, StackOverflow was that way. But at this point the internet feels like one of the tourist bazaars when I visited Jamaica (loved Jamaica). It used to feel more like a public library. Various attempts to monetize/fund various things, and an explosion in options, have all rendered the internet far less usable. It used to be so much easier to find answers to things I know are possible, but don't remember the exact syntax for (e.g. "how do I test for directory existence in a bash script?"). To some degree, I view the rise of these GAs as a condemnation of what the internet has become from its original aspirations.
The newer lower skilled workers improved the most. I know that my art skill has gone from far below average to pretty good with generative AI art tools.
Maybe the whole "people will need UBI because robots will replace all the marginal workers" argument will fail because marginally productive workers will get so much better at their jobs.
In reality, automation and mechanization of manufacturing got rid of the skilled craftsman. Using the assembly line, Ford replaced artisans with people who were much less skilled.
I still don't have good intuitions about the overall economic impacts, but that part struck me as well because I've been wondering whether it'll be more "the best can use these tools better" or "the best will lose their competitive advantage".
What is important to consider though is that in this case here, the AI tool in question merely provided pre-written responses for agents to accept/reject/adapt, and how the effects are distributed may well shift as tools become more sophisticated and will just automatically handle most low-stakes/simple cases.
But things like Google/Amazon pushed the unskilled part onto the buyer. Ordering online wasn't a thing before the internet; you needed to make a call or go to a store.
Call center agent is no longer a job with a secure future I guess.
If 5000 call center agents saw an average increase in productivity of 14%, that means the company now needs 700 fewer people to handle all incoming calls.
I wouldn't be surprised if that 14% goes up to over 50% soon. In the past, I often asked on IRC when I had a question only someone with deep knowledge about a technology could answer. Nowadays I get a better answer from ChatGPT in over 50% of these cases.
You committed the cardinal sin of assuming an X% increase is offset by an X% decrease. The number is a reduction of 12.3%, or 614 agents. Your point is unchanged, but I just want to call out that a 50% productivity boost won't be a loss of half the agents - "only" a reduction of 33%.
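For anyone who wants the arithmetic spelled out, a quick sketch using the study's 5,000 agents:

    # An X% productivity gain means the same workload needs 1/(1+X) of the
    # staff, so headcount falls by X/(1+X), not by X.
    agents = 5000
    for gain in (0.14, 0.50):
        needed = agents / (1 + gain)
        cut = 1 - 1 / (1 + gain)
        print(f"{gain:.0%} gain -> {agents - needed:,.0f} fewer agents ({cut:.1%} reduction)")
    # 14% gain -> 614 fewer agents (12.3% reduction)
    # 50% gain -> 1,667 fewer agents (33.3% reduction)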
Quibble: It's probably not going to be a 1:1 loss of jobs.
A call center's goal is to find the right spot along the cost/satisfaction curve. You can pay more money to answer calls quickly and increase satisfaction, or a smaller amount of money to answer calls slowly with lower satisfaction. More satisfied customers are more likely to spend more money with you, so as an operator you're trying to find the level of support that optimizes your bottom line.
If AI reduces the cost per call resolution, you're shifting that cost/satisfaction curve. Unless the wait-time to satisfaction relationship is perfectly linear, the new optimum will result in not firing 100% of the workers who could be "replaced" by productivity gains. You'll instead keep some of them around, with a lower total cost, and higher total customer satisfaction.
You'll still reduce staffing, but probably less than the raw productivity gain number.
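A toy numerical version of that argument - every parameter and the shape of the satisfaction curve below are invented purely to illustrate the point, not taken from any real call center:

    import math

    # Toy model: value of support saturates with total resolutions handled,
    # staffing cost is linear in headcount. All parameters are hypothetical.
    V_MAX, K, WAGE = 10_000.0, 2_000.0, 1.0

    def profit(agents: int, per_agent_rate: float) -> float:
        resolutions = agents * per_agent_rate
        value = V_MAX * (1 - math.exp(-resolutions / K))  # diminishing returns
        return value - WAGE * agents

    def best_headcount(per_agent_rate: float) -> int:
        return max(range(1, 10_000), key=lambda n: profit(n, per_agent_rate))

    baseline = best_headcount(1.00)   # ~3,219 agents
    boosted = best_headcount(1.14)    # ~3,053 agents with 14% higher throughput
    print(f"headcount falls {1 - boosted / baseline:.1%}")  # ~5%, not ~12%

The exact figure depends entirely on the assumed curve; the point is only that with diminishing returns the optimal headcount does not shift one-for-one with per-agent productivity.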
> They attribute the increase to three factors: agents, who could participate in multiple chats at once, spent about 9 percent less time per chat, handled about 14 percent more chats per hour, and successfully resolved about 1.3 percent more chats overall.
14% is good during focus time, but it's interesting that this results in "only" a 1.3% overall increase. One potential limiting factor that comes to mind is the energy of the support rep: doing 14% more during the same time might result in more fatigue, more recovery time, and a smaller increase in overall productivity. I also wonder if there's some other process bottleneck downstream, but that's a thought that stuck out to me.
edit: Reading it again, I see "successfully" resolved. Maybe it's as boring as incomplete data.
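A quick bit of arithmetic on the quoted figures (reading the 1.3% as a resolution-rate measure is my interpretation, not the study's wording):

    # Using only the numbers in the quote above:
    time_per_chat_change = -0.09            # "about 9 percent less time per chat"
    throughput_from_speed = 1 / (1 + time_per_chat_change) - 1
    print(f"{throughput_from_speed:.1%}")   # ~9.9% more chats/hour from speed alone
    # The reported ~14% presumably also reflects juggling multiple chats at once,
    # and the ~1.3% is about chats *successfully resolved* - a rate-like measure
    # that needn't move in step with raw throughput.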
14% actually seems impactful and about right. I can't say current AI has influenced my work in web dev much, as most of the stuff I do is one of:
- Not well documented, leaving AI with no idea what to do;
- Somewhat complicated, where pure boilerplate code from an LLM just doesn't cut it anymore;
- Requires a lot more context than current LLMs can handle.
That said, tools like GH Copilot certainly gave me at least 10% boost. With the code often being repetitive (but not similar enough to e.g. be extracted to a separate function) the auto-completions definitely save a lot of time.
While I'm sceptical as to how far LLMs will go, AI as a whole (probably with different architecture in the future) will definitely go far beyond any productivity improvements currently possible.
"AI will enhance jobs rather than replace them" has always been the more practical argument for AI in the workforce. Humans cannot be eliminated, and those that have tried to fully replace humans with chatbots like ChatGPT or artists with Midjourney/Stable Diffusion haven't had as much success.
Unfortunately discourse around this particular topic tends to be less nuanced than it should be.
>and those that have tried to fully replace humans with chatbots like ChatGPT or artists with Midjourney/Stable Diffusion haven't had as much success.
Being that it's an application that's really only a few months/years old, it would be rather insane and worrying if we did replace people that fast. A better way to look at it would be farming in the US. In the 1850s a very large percentage of people worked in and around feeding the population; today that number is rather small. This change led to a massive number of people moving from rural areas and a further increase in industrialization and factory workers.
The question we have today is, ok, we won't eliminate all humans in jobs, but what percent will we eliminate? How fast will it happen? What meaningful labor that actually pays enough for people to live will crop up in its place? If we look back at the history of the 1800s industrial revolution it was a very unstable time. There were labor wars in the US over unionization. Do we want to blindly stumble into another age of instability because of rapid changes in the labor market?
Never forget: it's not a company's job to have as many employees as possible; it's just that currently that correlates with how much money they make. It is completely possible for AI to tip this balance so that companies make huge profits, but those profits are tightly held and not distributed to a large labor pool. Income inequality is a huge problem in a world with expensive assets.
This conversation is rarely in good faith. AI will replace jobs is code for "we'll pay you less and you'll be happy to have a job. in fact, we could replace you but we are sooo generous!!"
Tell the fed to squash demand! We can't be having demand for all this new productivity.
Jokes aside, when it comes to programming this stuff will actually have even bigger boosts for people who really understand how all the pieces fit together. It'll make gluing stuff up much less tedious. 14% seems way understated.
Your joke carries some weight though. The Fed always seems to be doing exactly the right thing for yesterday and exactly the wrong thing for tomorrow. They play technocrat for yesterday and harm the future while back-dating their impact.
Well, I meant more how the building blocks connect, e.g. how HTML, JavaScript, CSS and React work together. For someone without that knowledge the code generators are pure magic. But if you already know how things connect you have a huge advantage. If you're able to say to the code generator "I want to connect this, this and that", you're much more literate in your ability to interact with code generators.
IMHO generative AI is an automated way of generating bullshit, which is fine - as long as the data used for training is not BS itself, you can probably derive some value from it in a limited set of contexts. Once the BS makes it into the wild and is fed back into the generator, we will enter a BS downspiral.
If you're old enough to remember the world before search engines and how awesome search engines were in the beginning you've already seen this in action. Try using a search engine today. SEO spam made them almost unusable with really bad results.
This is the future of generative "AI". Show it working really, really well for a small set of cases, get the imagination going, and you're going head first into a situation involving an emperor and a lack of clothes.
The AI hype is also not new. We've had several cycles of "thinking machines" taking over. I don't see how this time it's different.
It's always something that gets hyped up and gets the hustlers going: VR, ML, self-driving, crypto, LLMs. Look at the tech and how it works, use it, ignore the hype.
> Once the BS makes it into the wild and is fed back into the generator, we will enter a BS downspiral.
This has already been shown not to be an issue in research from the past year. It is possible to synthesize high-quality data that still provides benefit when subsequently trained on.
It makes sense when you remember that this is what organic intelligence was doing all along.
I can't wait for the internet games of "jailbreak the call center AI over the phone and get it to say offensive things or phone-sex you".
No, really, it's going to happen - probably the same day the system is implemented.
Let's keep in mind that current AI is probably similar to where the internet was in 1993: somewhat useful, but still very far away from realizing its real potential.