the "Prompt Management" part of these products always seemed odd. Does anyone use it? Why?

I do understand why it’s a product - it feels a bit like what Databricks has with model artifacts. I.e. having a repo of prompts you can track performance changes against is useful, especially if you have users other than engineers touching them (e.g. a product manager who wants to A/B test).

Having said that, I struggled a lot with actually implementing Langfuse due to numerous bugs and confusing AI-driven documentation, so I’m amazed that it’s being bought, to be really frank. I was just on the free version in order to look at it and make a broader recommendation, and I wasn’t particularly impressed. Mileage may vary though, perhaps it’s a me issue.


I thought the docs were pretty good just going through them to see what the product was. For me, I just don't see the use case, but I'm not well versed in their industry.

I think the docs are great to read, but implementing was a completely different story for me, e.g. the solution their Ask AI recommended for integrating Claude just didn’t work for me.

They do have GitHub discussions where you can raise things, but I also encountered some issues with installation that just made me want to roll the dice on another provider.

They do have a new release coming in a few weeks so I’ll try it again then for sure.

Edit: I think I’m coming across as negative and do want to recommend that it is worth trying out langfuse for sure if you’re looking at observability!


Iterating on LLM agents involves testing on production(-like) data. The most accurate way to tell whether your agent is performing well is to watch it work on production.

You want to see the best results you can get from a prompt, so you use features like prompt management and A/B testing to see which version of your prompt performs better (i.e. is a better fit for the model you are using) on production.
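A minimal sketch of the idea (everything here is made up for illustration: the prompt variants, the split, and the `call_llm`/`log_result` stubs; a prompt-management product handles the versioning, assignment, and logging for you):

    import hashlib

    # Hypothetical prompt variants pulled from a prompt-management store.
    PROMPT_VERSIONS = {
        "summarize-v1": "Summarize the following ticket in two sentences:\n{ticket}",
        "summarize-v2": "You are a support analyst. Summarize this ticket and "
                        "highlight the customer's main problem:\n{ticket}",
    }

    def pick_variant(user_id: str) -> str:
        """Deterministically assign a user to one of two variants (50/50 split)."""
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        return "summarize-v1" if int(digest, 16) % 2 == 0 else "summarize-v2"

    def call_llm(prompt: str) -> str:
        return "stubbed model response"   # your actual model call goes here

    def log_result(user_id: str, variant: str, prompt: str, response: str) -> None:
        print(user_id, variant, len(response))  # feed this back into your evals

    def run(user_id: str, ticket: str) -> str:
        variant = pick_variant(user_id)
        prompt = PROMPT_VERSIONS[variant].format(ticket=ticket)
        response = call_llm(prompt)
        log_result(user_id, variant, prompt, response)
        return response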


We use it for our internal doc analysis tool. We can easily extract production generations, save them to datasets, and test edge cases. It also lets us separate prompts into folders. With this, we have a pipeline for doc analysis with default prompts, and the user can set custom prompts for parts of the pipeline. Execution checks for a user prompt before inference; if there isn't one, it uses the default prompt, which is already cached in code. We plan to evaluate user prompts to see which may perform better and use them to improve the default prompts.

I made something called `ultraplan`. It's a CLI tool that records multi-modal context (audio transcription via local Whisper, screenshots, clipboard content, etc.) into a timeline that AI agents like Claude Code can consume.

I have a Claude skill `/record` that runs the CLI and starts a new recording. I debug, research, etc., then say "finito" (or choose your own stopword). It outputs a markdown file with your transcribed speech interleaved with screenshots and text you copied. You can say other keywords like "marco" and it will take a screenshot hands-free.

When the session ends, Claude reads the timeline (e.g. looks at the screenshots) and gets to work.

I can clean it up and push to github if anyone would get use out of it.
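In the meantime, here's the rough shape of it (a simplified, offline sketch rather than the actual code: it assumes openai-whisper, mss for screenshots, and pyperclip for the clipboard, and it processes a pre-recorded audio file instead of streaming):

    import whisper    # openai-whisper, runs locally
    import mss        # cross-platform screenshots
    import pyperclip  # clipboard access

    STOPWORD = "finito"
    SCREENSHOT_WORD = "marco"

    def build_timeline(audio_path: str, out_path: str = "timeline.md") -> None:
        """Transcribe a recording and write a markdown timeline for an agent to read."""
        model = whisper.load_model("base")
        result = model.transcribe(audio_path)

        lines = ["# Recording timeline", ""]
        with mss.mss() as sct:
            for seg in result["segments"]:
                text = seg["text"].strip()
                lines.append(f"**[{seg['start']:.1f}s]** {text}")

                # In the streaming version a screenshot is captured the moment the
                # keyword is heard; in this offline sketch it's taken at processing time.
                if SCREENSHOT_WORD in text.lower():
                    shot = sct.shot(output=f"shot_{seg['start']:.0f}.png")
                    lines.append(f"![screenshot]({shot})")

                # Stopword ends the session.
                if STOPWORD in text.lower():
                    break

        # Grab whatever is on the clipboard at the end of the session.
        clip = pyperclip.paste()
        if clip:
            lines += ["", "## Clipboard", "", clip]

        with open(out_path, "w") as f:
            f.write("\n".join(lines))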



Definitely interested in that!

Added link above!

Sounds interesting, I would love to use it if you get a chance to push it to GitHub.


To keep your Mac awake:

    caffeinate -di
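    # -d prevents the display from sleeping, -i prevents system idle sleep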

Thank you! Did not know about this command before - good to know

tacking on to the "New Kind Of" section:

New Kind of QA: One bottleneck I have (as a founder of a B2B SaaS) is testing changes. We have unit tests, we review PRs, etc., but those don't account for taste. I need to know if the feature feels right to the end user.

One example: we recently changed something about our onboarding flow. I needed to create a fresh team and go thru the onboarding flow dozens of times. It involves adding third-party integrations (e.g. Postgres, a CRM, etc.) and each one can behave a little differently. The full process can take 5 to 10 minutes.

I want an agent to go thru the flow hundreds of times, trying different things (i.e. trying to break it) before I do it myself. There are some obvious things I catch on the first pass that an agent should easily identify and figure out solutions to.
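Even a dumb scripted pass would catch some of it. Something like the following (a Playwright sketch with a made-up URL and selectors) is roughly the harness I'd want an agent to drive and mutate:

    from playwright.sync_api import sync_playwright

    # Hypothetical onboarding flow: the URL and selectors are made up for illustration.
    ONBOARDING_URL = "https://app.example.com/onboarding"

    def run_onboarding_pass(team_name: str) -> None:
        """Drive one pass through the onboarding flow and fail loudly if a step hangs."""
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            page.goto(ONBOARDING_URL)

            page.fill("#team-name", team_name)
            page.click("text=Create team")

            # Each integration behaves a little differently; exercise them all.
            for integration in ["Postgres", "CRM"]:
                page.click(f"text=Add {integration}")
                page.wait_for_selector("text=Connected", timeout=30_000)

            browser.close()

    if __name__ == "__main__":
        for i in range(100):
            run_onboarding_pass(f"qa-team-{i}")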

New Kind of "Note to Self": Many of the voice memos, Loom videos, or notes I make (and later email to myself) are feature ideas. These could be 10x better with agents. If there were a local app recording my screen while I talk thru a problem or feature, agents could be picking up all sorts of context that would improve the final note.

Example: You're recording your screen and say "this drop down menu should have an option to drop the cache". An agent could be listening in, capture a screenshot of the menu, find the frontend files / functions related to caching, and trace to the backend endpoints. That single sentence would become a full spec for how to implement the feature.


> Why LLMs Suck at OCR

I paste screenshots into claude code everyday and it's incredible. As in, I can't believe how good it is. I send a screenshot of console logs, a UI and some HTML elements and it just "gets it".

So saying they "Suck" makes me not take your opinion seriously.


Yeah, models are definitely improving, but we've found even the latest ones still hallucinate and infer text rather than doing pure transcription. We carry out very rigorous benchmarks against all of the frontier models. We think the differentiation is in accuracy on truly messy docs (nested tables, degraded scans, handwriting) and being able to deploy on-prem/VPC for regulated industries.


They need to convince customers it's what they need.


The only SaaS that AI has eaten for us is Retool. It wasn't the cost (we were paying < $200 per month). Retool has become more clunky to use vs. writing code with AI to solve the problems we used Retool for.


Curious if you can share more about the types of things you were using retool for that you've migrated off?


We released a meltano target for DuckLake[0]. dlt has one now too. Pretty easy to sync pg -> ducklake.

I've been really happy with DuckLake, happy to answer any questions about it.

DuckDB has always felt easier to use vs. Clickhouse for me, but both are great options. If I were you, I'd try both options for a few hours with your use case and pick the one that feels better.

0 - https://www.definite.app/blog/target-ducklake


I love DuckDB from a product perspective and appreciate the engineering excellence behind it. However, DuckDB was primarily built for seamless in-process analytics, data science, and data-preparation/ETL workloads rather than real-time customer-facing analytics.

ClickHouse’s bread and butter is real-time analytics for customer-facing applications, which often come with demanding concurrency and latency requirements.

Ack, totally makes sense that both are amazing technologies - you could try both and test them at the scale your real-time application may reach, and then choose the technology that best fits your needs. :)


I tested DuckDB and even Motherduck and this was my takeaway. Square hole, round peg situation.


Nice, what would be your typical setup?

You keep like 1 year's worth of data in your "business database", and then archive the rest in S3 with parquet and query with DuckDB ?

And if you want to sync everything, even "current data", to do data science/analytics, can you just write the recent data (e.g. the last week of data or whatever) to S3 every hour/day to get relatively up-to-date data? And doesn't that cause the S3 data to grow needlessly (e.g. does it replace, rather than store an additional copy of, the recent data each hour)?

Do you have kind of "starter project" for a Postgres + DuckLake integration that I could look at to see how it's used in practice, and how it makes some operations easier?


Once you have meltano installed, it's just:

    meltano run tap-postgres target-ducklake

Setting up meltano would be a bit more involved[0]

0 - https://www.notion.so/luabase/Postgres-to-DuckLake-example-2...
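Once data is in the lake, querying it from DuckDB is just attaching the catalog (rough sketch; the paths are placeholders, it assumes the DuckLake extension is available, and S3 credential/httpfs setup is omitted):

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL ducklake")
    con.execute("LOAD ducklake")

    # Placeholders: point these at wherever your catalog and data files actually live.
    con.execute(
        "ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 's3://my-bucket/lake/')"
    )

    con.sql("SHOW ALL TABLES").show()
    # e.g. con.sql("SELECT count(*) FROM lake.some_schema.some_table").show()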


> commoditized

Not for Disney content. Disney can pick OpenAI as the winner for this by not signing deals with anyone else and suing them instead.


I just had a sales call with someone from Microsoft who was looking for an AI tool to automate some Excel work they were doing. I doubt they'll buy our product, but it gave me a good laugh.


Cursor promises to do this[0] in the product, so, especially on HN, it'd be best to start with "why this is better than Cursor".

> favorite doc sites so I do not have to paste URLs into Cursor

This is especially confusing, because cursor has a feature for docs you want to scrape regularly.

0 - https://cursor.com/docs/context/codebase-indexing


The goal here is not to replace Cursor’s own local codebase indexing; Cursor already does that part well. What Nia focuses on is external context: it lets agents pull in accurate information from remote sources like docs, packages, APIs, and broader knowledge bases.


That’s what GP is saying. This is the Docs feature of Cursor. It covers external docs/arbitrary web content.

`@Docs` — will show a bunch of pre-indexed Docs, and you can add whatever you want and it’ll show up in the list. You can see the state of Docs indexing in Cursor Settings.

The UX leaves a bit to be desired, but that’s a problem Cursor seems to have in general.


Yeah, the UX is pretty bad, and so is the overall functionality. It still relies on a static retrieval layer and a limited index scope.

Plus, as I mentioned above, there are many more use cases than just coding. Think docs, APIs, research, knowledge bases, even personal or enterprise data sources the agent needs to explore and validate dynamically.


As an AI user (Claude Code, Rovo, GitHub Copilot) I have come across this. In code, it didn't build something right where it needed to use up-to-date docs. Luckily those people have now made an MCP, but I had to wait. For a different project I may be SOL. Surprised this isn't solved; well done for taking it on.

From a business point of view, I am not sure how you get traction without being 10x better than what Cursor can produce tomorrow. If you are successful, the coding agents will copy your idea, and then people, being lazy and using what works, will have no incentive to switch.

I am not trying to discourage. More like encourage you to figure out how you get that elusive moat that all startups seek.

As a user I am excited to try it soon. Got something in mind that this should make easier.


thanks! will be waiting for your feedback


This is different because of the background refresh, the identifier extraction, and the graph. I know because I use Cursor and, oddly enough, am building the exact same thing.

