> The cynic in me feels like the former could probably be done by chatgpt off the shelf.
Hello! I'm the owner of the feature in question. I experimented with ChatGPT last year in the course of building it (and later worked with Hamel to improve it via fine-tuning).
Even today, this couldn't be done with ChatGPT off the shelf. To generate valid queries, you need to know which subset of a user's dataset schema is relevant to their question, which makes it as much a retrieval problem as a generation problem.
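To make the retrieval half concrete, here's a minimal sketch of what column selection could look like, assuming an off-the-shelf embedding model; the function name, model choice, and columns are all illustrative, not our actual implementation:

    # Sketch: rank a dataset's columns by similarity to the user's question
    # so only a relevant subset goes into the generation prompt.
    # Everything here (model, names) is illustrative, not Honeycomb's code.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def relevant_columns(question: str, schema: list[str], k: int = 10) -> list[str]:
        q_emb = model.encode(question, convert_to_tensor=True)
        col_embs = model.encode(schema, convert_to_tensor=True)
        scores = util.cos_sim(q_emb, col_embs)[0]
        top = scores.topk(min(k, len(schema)))
        return [schema[i] for i in top.indices.tolist()]

    # Only the retrieved subset is shown to the model, which is what
    # keeps it from inventing columns that don't exist in the dataset.
    relevant_columns("count my errors by route",
                     ["error", "http.route", "duration_ms", "db.statement"])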
Beyond that, though, the details of "what makes a good query" are quite tricky and subtle. Honeycomb as a querying tool is unique in the market because it lets you arbitrarily group and filter by any column/value in your schema, without pre-indexing and without any cost with respect to cardinality. So there are many cases where you can quite literally answer someone's question, yet there are many ways to be even more helpful, often by introducing a grouping they didn't directly ask for. For example, "count my errors" is just a COUNT where the error column exists, but if you group by something like the HTTP route, the name of the operation, etc. -- or, for requests, the name of a child operation and its calling HTTP route -- you end up actually showing people where these errors come from and how (illustrated below).

In my experience, the large majority of power users already do this themselves (it's how you use HNY effectively), while the large majority of new users simply have no idea the tool is this flexible. Query Assistant helps them with that, and users who try it have a pretty good activation rate.
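To make the "count my errors" example concrete, here's the difference as Python dicts in the rough shape of our query spec; the column names are just examples:

    # "count my errors", answered literally: a bare COUNT over rows
    # where the error column exists. (Column names are illustrative.)
    literal = {
        "calculations": [{"op": "COUNT"}],
        "filters": [{"column": "error", "op": "exists"}],
    }

    # The more helpful version: the same COUNT, grouped so the results
    # show where the errors come from, not just how many there are.
    helpful = {
        "calculations": [{"op": "COUNT"}],
        "filters": [{"column": "error", "op": "exists"}],
        "breakdowns": ["http.route", "name"],
    }

The second query answers the same question, but the breakdowns turn a single number into a ranked view of which routes and operations are actually producing the errors.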
Unfortunately, ChatGPT, and even good old-fashioned RAG, is often not up to the task. That's why fine-tuning is so important for this use case.
Thanks for the reply. Huge fan of Honeycomb and the feature. I spent many years in observability and built some of the largest log platforms in use. Tracing is the way of the future, and I hope to see you guys eat that market. I did some executive tech-strategy work on observability at some megacorp; it's really hard to unwedge metrics and logs, but I did my best while it was my focus. Good luck, and thanks for all you're doing over there.