Hacker News | thevinter's comments

Cursor came out three years ago. "Agentic" refactors have been a thing for a year and a half. "Vibecoding" was coined as a term a year ago.

There are multiple companies that deploy to production daily. What are we even talking about?


Right but this agentic stuff was supposed to be the wave where we would finally actually see increased output, so we should probably be seeing it soon if it's real. Like, my dev team should definitely have the actual code they keep talking about their agents making, ready for me to put into production. As should my vendors. Any day now.

What is this nonsense?

You said that none of this was in production, and then when people pointed out that it obviously was, you moved the goalposts to some other measure that you just imagined in your head.


Well, if it's in production, it's not at my company, any of my vendors, or for that matter any of the software I use in my private life; the pace of all of that is exactly what it was 2 years ago. When it shows up I'll form an opinion.

Let me amend that: one of my vendors has a new diffusion-based noise-reduction plugin that's pretty good, though the resource usage is still too high. I imagine that will come down as they improve it. And that's pretty cool. But it didn't come out any faster; it's just that it uses diffusion in the plugin itself. Docker had a much bigger impact on the software we use at work than AI has had so far.

I was even trying to come up with a list of software I use in my personal life to see if any of that has started coming out faster, and I came up with:

KDE

Supercollider

Puredata

Mixxx

Renoise

CUDA and ROCm

none of which have had any kind of release acceleration that I know of (though obviously the hardware to use the last two has gotten mind-blowingly expensive, alas). I use maybe three apps on my phone and they aren't updating any more frequently than they used to.

I get that for whatever reason this bugs people, but I'm in a very tech job and have a very tech personal life (just not webdev in either case) and literally have not seen anything I deal with change other than needing to learn to scroll past the AI summary at the top of search results.


What do you expect, that it's going to announce itself in a modal dialog when you run the software?

This isn’t like AI image generation where you’re going to convince yourself that you can tell the difference based on how you think it looks. Do you really think no one in the production chain of any of the software that you use picked up copilot in the last two years?

What signal are you hoping to receive that this is happening?


Well like I said in the sibling post to this one I'd expect really any of the software vendors in my professional or personal life to release either more rapidly or with a wider array of features than they were a few years ago, and that hasn't been my experience, at all.

The coding was never the slow part.

I'm certainly sympathetic to that argument, but if you scroll way back this thread started with the question of whether or not AI is transformative, and if it is neither faster nor better that would suggest "no".

Pi was probably the best ad for Claude Code I ever saw.

After my max sub expired I decided to try Kimi on a more open harness, and it ended up being one of the worst (and most eye-opening) experiences I've had with the agentic world so far.

It was completely alienating and so much 'not for me' that afterwards I went back and immediately renewed my Claude sub.

https://www.thevinter.com/blog/bad-vibes-from-pi


> I would say that the project actively expects you to be downloading them to fill any missing gaps you might have.

Where did you get this perspective from?

> I thought pi and its tools were supposed to be minimal and extensible. So why is a subagent extension bundling six agents I never asked for that I can’t disable or remove?

Why do you think a random subagents extension is under the same philosophy as pi?

Your blog post says little about pi proper, it's essentially concerned with issues you had with the ecosystem of extensions, often made by random people who either do or do not get the philosophy? Why would that be up to pi to enforce?


Sharing extensions is very much the philosophy. Using them however is less so.

Pi ships with docs that include extensions and the agent looks there for inspiration if you ask it to build a custom extension.

Looking at what others publish is useful!


> if I start the agent in ./folder then anything outside of ./folder should be off limits unless I explicitly allow it, and the same goes for bash where everything not on an allowlist should be blocked by default.

Here's the problem with Claude Code: it acts like it's got security, but it's the equivalent of a "do not walk on grass" sign. There's no technical restrictions at play, and the agent can (maliciously or accidentally) bypass the "restrictions".

That's why Pi doesn't have restrictions by default. The logic is: no matter what agent you are using, you should be using it in a real sandbox (container, VM, whatever).
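To make the "real sandbox" point concrete, here is a minimal sketch of launching an agent inside a container so that only the project directory is even visible. This assumes Docker; the image name, agent binary, and `--yolo` flag are all made up for illustration.

```python
import subprocess

def sandbox_cmd(project_dir: str, image: str = "my-agent-image") -> list[str]:
    """Build a docker invocation where project_dir is the ONLY host path mounted."""
    return [
        "docker", "run", "--rm",
        "--network", "none",            # no outbound network at all
        "-v", f"{project_dir}:/work",   # the only host directory the agent can see
        "-w", "/work",
        image, "agent", "--yolo",       # full autonomy is fine *inside* the jail
    ]

def run_sandboxed(project_dir: str) -> int:
    # Unlike an advisory allowlist, the kernel enforces these boundaries;
    # the agent cannot "decide" to read files outside /work.
    return subprocess.run(sandbox_cmd(project_dir)).returncode
```

The point of the sketch is that the restriction lives outside the agent's reach, instead of being a policy the agent is merely asked to follow.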


But the agent has to interact with the world; fetch docs, push code, fetch comments, etc. You can't sandbox everything. So you push that configuration to your sandbox, which is a worse UX than the harness just asking you at the right time what you'd like to do.

I too would like to know what a good UX looks like here but I have doubts that the permission prompts of Claude are the way to go right now.

Within days people become used to just hitting accept and allowlisting pretty much everything. The agents write lengthy logic into shell scripts or test runners that can themselves be destructive, but those get immediately allowlisted.


Well, you are imagining a worse UX, but it doesn't have to be. Pi doesn't include a sandboxing story at all (Claude provides an advisory but not mandatory one), but the sandbox doesn't have to be a simple static list of allowed domains/files. It's totally valid to make the "push code" tool in the sandbox send a trigger to code running outside of the sandbox, which then surfaces an interactive prompt to you as a user. That would give you the interactivity you want and be secure against accidentally or deliberately bypassing the sandbox.
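A toy sketch of that broker pattern (the action names and wire format here are invented): the sandboxed "push" tool only sends a request; the prompt and the privileged action live outside the sandbox, so the agent can't skip them.

```python
import json

def handle_request(raw: bytes, ask_user, do_push) -> str:
    """Broker running OUTSIDE the sandbox; raw arrives from the sandboxed tool.

    ask_user: callable(prompt) -> bool, surfaces an interactive prompt to the human.
    do_push:  callable(branch), performs the actual privileged push.
    """
    req = json.loads(raw)
    if req.get("action") != "push":
        return "denied: unknown action"
    # The confirmation happens on the host, not in the sandbox, so a
    # misbehaving agent can request a push but never self-approve it.
    if ask_user(f"Agent wants to push branch {req['branch']!r}. Allow? [y/N] "):
        do_push(req["branch"])
        return "ok"
    return "denied by user"
```

The interactivity of Claude-style prompts is preserved, but the enforcement boundary is the sandbox wall rather than the harness's goodwill.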

So you have to set up that integration instead of letting the agent do it. I suppose the sandbox is more configurable, but do you need that? I thought the draw of pi was that you didn't do all that and let it fly, wheeee!

edit: You're not making it sound easy at all. I don't have to build anything with the other agents.


Certainly not. Pi is "minimalist", so the draw is that it's "easy" to set it up yourself. You can not do that and run it in yolo mode, and you can do that with Claude Code too. Heck you can even use this hypothetical real-sandbox-with-interactive-prompts with Claude Code instead, once you build it.

Back to my original point: Claude Code gives you a false feeling of security, Pi gives you the accurate feeling of not having security.


I had a very similar experience. I have different preferences, but ultimately, my takeaway was that if I want to follow my own version of their philosophy, I should just create my own thing.

In the meantime, the codex/cc defaults are better for me.


Paraphrasing The Dude, that’s like, just your opinion, man.

> As it turns out, the opinions in question are that bash should be enabled by default with no restrictions, that the agent should have access to every file on your machine from the start, and that npm is the only package manager worth supporting.

Yep. This is why I've been going "Hell, no!" and will probably keep doing so.


Technically you're not allowed to use a Claude subscription account with Pi (according to Anthropic's policy). So yeah, Pi is the best anti-ad against Anthropic.

hypegrift

Are you intentionally keeping the benchmarks private?


Yes.

I am trying to think of the best way to give the most information about how the AI models fail, without revealing information that could help them overfit on those specific tests.

I am planning to add some extra LLM calls, to summarize the failure reason, without revealing the test.


We're building an app that automatically generates machine- and human-readable JSON by parsing semantic HTML tags; a reverse proxy then serves that JSON instead of the HTML to agents
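A rough sketch of the parsing half of that idea (not the actual product's code, and using only the stdlib): collect the text under semantic HTML elements and emit JSON, which a reverse proxy could then serve to clients identifying as agents.

```python
from html.parser import HTMLParser
import json

# Landmark-style elements we treat as "semantic" for this sketch.
SEMANTIC = {"article", "nav", "main", "header", "footer", "section", "aside"}

class SemanticExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # currently open semantic tags
        self.out = {}     # tag name -> list of text chunks found inside it

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in SEMANTIC and self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open semantic element.
        if self.stack and data.strip():
            self.out.setdefault(self.stack[-1], []).append(data.strip())

def html_to_json(html: str) -> str:
    p = SemanticExtractor()
    p.feed(html)
    return json.dumps(p.out)
```

A real implementation would need to handle nesting, duplicates, and non-semantic markup far more carefully; this just shows the shape of the HTML-to-JSON step.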


You understand that there is no requirement for you to be an agent to post on moltbook? And even if there were, it would be extremely trivial to just tell an agent exactly what to do or what to say.

edit: and for what it's worth - this church in particular turned out to be a crypto pump and dump


I do understand that. That doesn't take away from the points raised in the article any more than the extensive, real security issues and relative prevalence of crypto scams do. I believe that to focus on those is to miss the emerging forest for the trees. It is to dismiss the web itself because of pets.com, because of 4chan, because of early subreddits with questionable content.

Additionally, we're already starting to see reverse CAPTCHAs, i.e. "prove you're not a human": pseudorandomized tasks on a timer that are trivial for an agent to solve and respond to on the fly, but which are more difficult for a human to process in time. Of course, this isn't bulletproof either; it's not particularly resistant to enumeration of every task type plus automated evaluation plus a response harness. But I find the more interesting point to be that agents are beginning to work on measures to keep humans out of the loop, even if those measures are initially trivial, just as early human security measures were trivial to break (e.g. RC4 in WEP). See https://agentsfightclub.com/ & https://agentsfightclub.com/api/v1/agents/challenge
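A toy sketch of what such a timed check could look like (the real agentsfightclub challenge almost certainly differs): issue a batch of trivial problems with a deadline far too tight to solve by hand, then verify both the answers and the timing.

```python
import random
import time

def issue_challenge(n: int = 50):
    """Return n trivial addition problems and a deadline 2 seconds out."""
    problems = [(random.randint(1, 99), random.randint(1, 99)) for _ in range(n)]
    return problems, time.monotonic() + 2.0

def verify(problems, deadline, answers) -> bool:
    # A script solves 50 additions in microseconds; a human typing
    # answers by hand blows past the deadline.
    return time.monotonic() <= deadline and answers == [a + b for a, b in problems]
```

As the comment notes, this is trivially defeated by putting any automation in front of it; the interesting part is the inversion of who the gate is meant to exclude.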


why is it always some crypto bullshit


I guess the issue is that this is psychologically fuzzy.

What's the difference between:

- An autonomous agent posting via API

- A human running a script that posts via API

- A human calling an LLM API and copy-pasting the output to an API

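One way to see why the distinction is fuzzy: from the server's side, all three paths can produce byte-identical requests. A sketch (the endpoint and field names are made up):

```python
import json

def build_post(content: str) -> dict:
    """The request any of the three senders would end up making."""
    return {
        "method": "POST",
        "url": "https://example.com/api/v1/posts",  # hypothetical endpoint
        "body": json.dumps({"content": content}),
    }

# 1. Autonomous agent: the model picks the text and calls the API itself.
agent_request = build_post("Hello from an agent")
# 2. Human-run script: a person wrote the same text into a cron job.
script_request = build_post("Hello from an agent")
# 3. Human copy-pasting LLM output into an API call: same text, same request.
pasted_request = build_post("Hello from an agent")

# Nothing in the request distinguishes who (or what) authored it.
assert agent_request == script_request == pasted_request
```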

"better" is a vague term and working hours are limited so clearly some things are more worth than others but

It's very easy to make the wrong conclusion from a post like this. Better software is achieved through small decisions that compound over time. And bad software often happens because shortcuts compound too.


I like the interactivity, some of the ideas are nice and I do agree that it's nice when docs are something more than giant walls of text. However...

I think mixing docs and user data is fundamentally a UX mistake. Having interactive components that showcase a behaviour is nice, having them actually toggle some settings less so. Permanently altering the state of the application discourages experimentation, and many users might not even realise that the changes are permanent.

Additionally, documentation should be designed to reduce external noise as much as possible, allowing the reader to focus on the things that actually matter. I feel like introducing real-world data can end up being too distracting.

Personally I don't feel like your application warrants documentation (and don't get me wrong, I'm the first to spend hours overengineering stuff), and I guess the interactive stuff makes it feel even less so. If I hadn't known beforehand, I would've guessed the pages were just another (slightly busy) section of the app. (Whether that's good is for you to decide.)


I'm a bit confused by their claims. Or maybe I'm misunderstanding how Skills should work. But from what I know (and the small experience I had with them), skills are meant to be specifications for niche and well-defined areas of work (e.g. building the project, running custom pipelines).

If your goal is to always give a permanent knowledge base to your agent that's exactly what AGENTS.md is for...

