Show HN: Claude vs. GPT Agent Comparison (github.com/kadoa-org)
2 points by hubraumhugo on April 11, 2024
Anthropic's recent announcement of tool use/function calls caught my attention, specifically their claim that the Claude models can correctly handle 250+ tools with >90% accuracy. I've been working with GPT function calling for a while and noticed that the recall for larger and more complex functions is quite low. So, I decided to compare GPT and Claude's performance in using different tools for tasks like web scraping and browser automation.

Learnings:

- AI agents still work best for simple, well-constrained tasks.

- To create a successful agent, you need to provide it with good tools. The LLM can then figure out the correct sequence of tool calls itself, which feels like a promising direction.

- Tool use is still quite slow and often very expensive. I've spent around $50 in a single day just experimenting with Claude. Imagine what the testing would cost for a production-scale system. Making the unit economics work is difficult, but this will improve as LLM costs continue to drop.
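
To make the "good tools, model picks the sequence" point concrete, here is a minimal sketch of a tool registry and agent loop. It is not the code from the linked repo and calls no real API: `fetch_page`, `extract_title`, `scripted_model`, and `run_agent` are all hypothetical names, and `scripted_model` is a hard-coded stand-in for the LLM, so the loop can run offline.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools. Real ones would drive an HTTP client or a browser.
def fetch_page(url: str) -> str:
    return f"<html><title>Example</title><body>content of {url}</body></html>"

def extract_title(html: str) -> str:
    start = html.find("<title>") + len("<title>")
    end = html.find("</title>")
    return html[start:end]

# Registry mapping tool names to callables; the model only ever sees names.
TOOLS: Dict[str, Callable[[str], str]] = {
    "fetch_page": fetch_page,
    "extract_title": extract_title,
}

ToolCall = Tuple[str, str]  # (tool name, single string argument)

def scripted_model(task: str, history: List[ToolCall]) -> Optional[ToolCall]:
    """Stand-in for the LLM (assumption): picks the next call from the history."""
    if not history:
        return ("fetch_page", task)
    if len(history) == 1:
        return ("extract_title", history[-1][1])
    return None  # nothing left to do

def run_agent(task: str, max_steps: int = 5) -> List[ToolCall]:
    """Ask the model for a tool call, run it, record the result, repeat."""
    history: List[ToolCall] = []
    for _ in range(max_steps):  # hard cap keeps cost and latency bounded
        call = scripted_model(task, history)
        if call is None:
            break
        name, arg = call
        if name not in TOOLS:
            raise KeyError(f"model requested unknown tool: {name}")
        history.append((name, TOOLS[name](arg)))
    return history

if __name__ == "__main__":
    for name, result in run_agent("https://example.com"):
        print(name, "->", result)
```

In a real agent, `scripted_model` would be replaced by a call to a model with tool definitions attached. The `max_steps` cap matters because every extra round trip adds both latency and cost.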


