
Hey Tombert,

Regarding "did you read the article?": I was quite specific about the ways I think LLMs are blurring the lines. I don't think it's true for general engineering, but I do think it's true for applications being built with LLMs.

Also, it's still very early


It's not early; it has reached a plateau. Are there betting odds for "AI" (LLM) benchmarks somewhere? I will bet money.


Reaching a plateau doesn't imply that it's not early. It's still entirely plausible that we come up with a newer, better model in ten years that gives us true AGI, or runs on cheaper hardware, or just gives us a closer approximation to human reasoning.



Is there a paid market? This seems to be "play money" or "play points". I can't find anything like Polymarket with similar contracts.


I'm just saying that I don't think I agree with some of this; even if PMs are writing the prompts (and calling that "prompt engineering"), it's not equivalent because they don't know how to audit the code given to them.

A PM might generate that SQL thing I mentioned and just blindly cut and paste it. For any application with more than one user, that's a bug; it's incorrect. And it's not like this is some deep cut: upserts happen all over the place.
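To make the shape of that bug concrete, here's a hypothetical sketch (made-up schema and names, not the exact query I was talking about):

    # Hypothetical sketch of the upsert bug (made-up schema and names).
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE settings ("
        "  user_id INTEGER, key TEXT, value TEXT,"
        "  UNIQUE (user_id, key))"
    )

    # What an LLM might hand a PM: an upsert keyed on `key` alone. In a
    # multi-user app, one user's write silently clobbers everyone else's:
    #   INSERT INTO settings (key, value) VALUES (?, ?)
    #   ON CONFLICT (key) DO UPDATE SET value = excluded.value

    # Correct version: the conflict target must include user_id.
    conn.execute(
        "INSERT INTO settings (user_id, key, value) VALUES (?, ?, ?) "
        "ON CONFLICT (user_id, key) DO UPDATE SET value = excluded.value",
        (42, "theme", "dark"),
    )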

I didn't finish the entire article; I disagreed with the line "Prompting Is Here To Stay and PMs—Not Engineers—Are Going to Do It" because I fundamentally do not think that's true unless AI models get considerably better.

It's possible they will; maybe OpenAI will crack AGI, or maybe these models will just get a lot better at figuring out your intent, or maybe there's another variable I'm not thinking of that will fix it.

I hate the term "prompt engineer" because I don't think it's engineering, at least not really. I will agree that there's a skill to getting ChatGPT to give you what you want (I think I'm pretty good at it, even), but I hesitate to call it engineering because it lacks a lot of "objectivity". I can come up with a "good" prompt that gives me a good answer 90% of the time but utter rubbish the other 10%, which doesn't really feel like engineering to me.

I saw the line `As AI models become able to write complex applications end-to-end, the work of an engineer will increasingly start to resemble that of a product manager`, and while I don't completely disagree, I don't completely agree either. Even when I heavily abuse ChatGPT for code generation, it doesn't feel at all like I'm barking orders at a human. It might superficially resemble it, but I'm not convinced it's actually that similar.

I hope I'm not coming off as too much of a dick here; I apologize if I am, and obviously a blog post in which you wax philosophical about the implications of new technology is perfectly fine. I think I'm just a bit on edge with this stuff, because you get morons like Zuckerberg claiming they'll be able to replace all their junior and mid-level engineers with AI soon, and I think that's ridiculous unless they have access to considerably better models than I do.


My read (which may be wrong) is that much of the article is discussing applications where the end user interacts with an interface that queries an LLM using baked-in prompts (in one case, a marketing content generation tool). These prompts are being written by the PM. The PM is not writing prompts asking LLMs to generate code; the PM is writing prompts that are hidden behind a web form or button or something in an interface, hence the prompts being part of the codebase. The author argues that when a PM edits these prompts, they are delivering an artifact that traditionally looks more like an engineer's than a PM's.
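In other words, something with roughly this shape (a sketch with made-up names and prompt text, using the standard OpenAI chat completions API; the article's actual code could look quite different):

    # Sketch of a "baked-in" prompt: the end user fills in a form and never
    # sees the prompt; the prompt string lives in the codebase, where a PM
    # can edit it. Names and prompt text are illustrative.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    MARKETING_PROMPT = (
        "You are a marketing copywriter. Write a short, upbeat product "
        "description for {product}, aimed at {audience}."
    )

    def generate_copy(product: str, audience: str) -> str:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": MARKETING_PROMPT.format(product=product,
                                                   audience=audience),
            }],
        )
        return response.choices[0].message.content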


> Also, it's still very early

“Jam tomorrow” will only get you so far.


I agree with that. What do you think, though, about the point that for LLM agents and applications, prompts and tool definitions might matter more than code?
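For example, a single tool definition like this (an illustrative sketch in the JSON-schema style used by OpenAI function calling, with a made-up tool name and fields) can shape an agent's behaviour as much as the surrounding code does:

    # Illustrative tool definition for an LLM agent, in the JSON-schema
    # style used by OpenAI function calling. The tool name and fields are
    # made up; the point is that this spec does much of the work that
    # application code used to do.
    get_refund_status = {
        "name": "get_refund_status",
        "description": "Look up the status of a customer's refund by order id.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The customer's order id.",
                },
            },
            "required": ["order_id"],
        },
    }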


Hi,

I totally agree that we're not at a point where AI can write most code, though I never said that it could. I just think it's blurring the boundary between engineers and PMs, with both taking on more of the other's role.

Also, it shouldn't be surprising that the product we're building is aligned with what we believe about the world :)

R


Hi Hexator,

OP here. Thanks for the (harsh!) feedback; I'll take it with a growth mindset.

The post does genuinely reflect my experiences and I do believe what I said. How would you advise I change the post to make it better?

Which parts do you think are untrue?

Thanks!


Your article starts off with a grand proclamation that isn't true in most cases. Then you talk about how anyone can prompt an LLM. Most of HN already knows that engineers aren't needed to prompt an LLM. Then you state:

"By allowing non-technical people and domain experts to use English as the programming language, AI blurs the line between specification and implementation."

This is a non sequitur. You are saying that some PMs can update the prompts for an AI application, but it does not follow that AI can now specify and implement software. If you are talking specifically about "LLM applications that just pre-prompt a model can be updated by a PM instead of an engineer", then yes, I would agree with that. But you've extrapolated this wildly and closed out with marketing for your tool.


OK, I think I need to go into more depth on the examples.

I think HN knows that anyone can prompt LLMs. I do think it's interesting, though, that this has allowed PMs/SMEs to directly influence products that are deployed to millions of people. That seems genuinely novel. Maybe I over-egged it.


[flagged]


I think I'm just trying hard to be overly polite in the face of negative criticism and that sounds a lot like ChatGPT!


Nope :(

But I guess I need to up my game if you can't tell the difference


Not the OP but can confirm that Humanloop has full support for OpenAI function calling.
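For anyone unfamiliar, plain OpenAI function calling looks roughly like this (standard openai v1 Python SDK usage, not Humanloop's own API, which I won't guess at here; the tool and user message are made up):

    # Generic OpenAI function-calling sketch (openai v1 SDK).
    from openai import OpenAI

    client = OpenAI()

    tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "What's the weather in London?"}],
        tools=[tool],
    )

    tool_calls = response.choices[0].message.tool_calls
    if tool_calls:  # the model may or may not decide to call the tool
        print(tool_calls[0].function.name, tool_calls[0].function.arguments)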


I think I worded this poorly. What he said was that a lot of people say they want open-source models but they underestimate how hard it is to serve them well. So he wondered how much real benefit would come from open-sourcing them.

I think this is reasonable. Giving researchers access is great, but most small companies are likely better off having a service provider manage inference for them rather than navigating the infra challenges themselves.


The beauty of open source is that the community will either figure out how to make it easier, or collectively decide it's not worth the effort. We saw this with Stable Diffusion, and we are seeing it with all the existing OSS LLMs.

“It’s too hard, trust us” doesn’t really make sense in that context. If it is indeed too hard for small orgs to self-host, then they won’t. Hiding behind the guise of protecting these people by not open-sourcing it seems a bit disingenuous.


Here is how hard it is to serve and use LLMs: https://github.com/ggerganov/llama.cpp


“The original implementation of llama.cpp was hacked in an evening.”


You're saying the same thing.

"I'm not sharing my chocolate with you because you probably wouldn't like it"


If it goes the same way as other open-sourced models, it will take about five days before someone gets it running on an M1.


If he says he's inclined to open-source GPT-3, I don't see any good argument against giving startups the choice of how they run inference.


Reading this atm. About halfway through and already it's one of my favourite books. Would love to contribute to the notes if you're accepting PRs.


Absolutely! You're more than welcome. Thank you.

https://github.com/team-reflect/beginning-of-infinity


Hi, Raza here, one of the other co-founders.

I know that HN likes to nerd out over technical details, so I thought I'd share a bit more on how we aggregate the noisy labels to clean them up.

At the moment we use the great open-source Skweak library [1] to do this. Skweak uses an HMM to infer the most likely unobserved label given the evidence of the votes from each of the labelling functions.
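As a toy illustration of the aggregation problem (plain majority vote here, for intuition only; Skweak's HMM goes further and also models each labelling function's accuracy and the transitions between labels in a sequence):

    # Toy illustration: aggregating noisy votes from labelling functions.
    # Skweak's HMM is more sophisticated; majority vote shown for intuition.
    from collections import Counter

    # One list of votes per example; None means the function abstained.
    votes_per_example = [
        ["PERSON", "PERSON", None],   # two functions agree, one abstains
        ["ORG", "PERSON", "ORG"],     # disagreement: ORG wins 2-1
    ]

    def majority_vote(votes):
        counts = Counter(v for v in votes if v is not None)
        return counts.most_common(1)[0][0] if counts else None

    print([majority_vote(v) for v in votes_per_example])  # ['PERSON', 'ORG']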

This whole strategy of first training a label model and then training a neural net was pioneered by Snorkel. We’ve used this approach for now but we actually think there are big opportunities for improvement.

We’re working on an end-to-end approach that de-noises the labelling functions and trains the model at the same time. So far we’ve seen improvements on the standard benchmarks [2] and are planning to submit to NeurIPS.

R

[1] Skweak package: https://github.com/NorskRegnesentral/skweak

[2] WRENCH benchmark: https://arxiv.org/abs/2109.11377


Humanloop | Infrastructure for AI | Backed by YC and Index | London + Remote Hiring

- Software Engineers: front-end specialist

- Machine Learning Engineer

- Interaction designer

(see jobs.humanloop.com for full details)

We're a team of ML researchers and engineers who've worked at Google, Amazon, and Microsoft Research on some of the biggest ML projects out there.

ML and deep learning are a new software paradigm that needs new tools. We're building a platform for human-in-the-loop ML that drastically reduces data needs and accelerates time to deployment. In the future, people will program by teaching and curating datasets (https://medium.com/@karpathy/software-2-0-a64152b37c35). We're making Software 2.0 possible.

Team: humanloop.com/about

Contact the founders at founders@humanloop.com


Hi all, I wrote this piece and will be around for the next hour or two if anyone fancies a chat about GPT-3 and large scale language models!

