wow thanks for leaving this comment - i now realize two things:
1. the farmer's almanac i thought of when i saw the title and even read the article is not going anywhere
2. i have never before heard of the farmer's almanac referred to in this notice
yeah i think they shot themselves in the foot a bit here by creating the o series. the truth is that GPT-5 _is_ a huge step forward for the "GPT-x" line of models. The current GPT-x model was basically still 4o, with 4.1 available in some capacity. GPT-5 vs GPT-4o looks like a massive upgrade.
But it's only an incremental improvement over the existing o line. So people feel like the improvement from the current OpenAI SoTA isn't there to justify a whole bump. They probably should have just called o1 GPT-5 last year.
"The sculpture is already complete within the marble block, before I start my work. It is already there, I just have to chisel away the superfluous material."
Chat is a great UX _around_ development tools. Imagine having a pair programmer and never being allowed to speak to them. You could only communicate by taking over the keyboard and editing the code. You'd never get anything done.
Chat is an awesome powerup for any serious tool you already have, so long as the entity on the other side of the chat has the agency to actually manipulate the tool alongside you as well.
a little glossed over, but they do point out that the most important improvement o1 has over gpt-4o is not its "correct" score rising from 38% to 42%, but its "not attempted" rate going from 1% to 9%. The improvement is even more stark for o1-mini vs gpt-4o-mini: 1% to 28%.
They don't really describe what "success" would look like, but it seems to me like the primary goal is to minimize "incorrect" rather than to maximize "correct". The mini models would get there by maximizing "not attempted", with the larger models having much higher "correct". Then both model sizes could hopefully reach 90%+ "correct" when given access to external lookup tools.
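To make the arithmetic concrete: since every answer is either correct, incorrect, or not attempted, the headline numbers imply the "incorrect" rates directly. A quick sketch using only the figures quoted above:

```python
# Each answer falls in exactly one bucket, so the "incorrect" rate
# is whatever percentage the other two buckets leave over.
scores = {
    "gpt-4o": {"correct": 38, "not_attempted": 1},
    "o1":     {"correct": 42, "not_attempted": 9},
}
for model, s in scores.items():
    incorrect = 100 - s["correct"] - s["not_attempted"]
    print(f"{model}: {incorrect}% incorrect")
# gpt-4o: 61% incorrect
# o1: 49% incorrect
```

So a 4-point gain in "correct" hides a 12-point drop in "incorrect", which is the real headline if hallucination is what you care about.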
disagree - good products meet their users where they are and bury complexity under the hood. i can't imagine trying to use a calendar app (or any app really) that refuses to operate in any mode other than UTC.
OK but most people would agree that "only UTC" is not an ergonomic default. There is a balance.
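For what it's worth, the usual compromise is exactly that: UTC under the hood, local time at the edge. A minimal Python sketch (the timestamp and zone are made up for illustration):

```python
# Store timestamps in UTC internally; convert to the user's zone
# only when rendering. zoneinfo is in the standard library (3.9+).
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

stored = datetime(2024, 7, 1, 16, 30, tzinfo=timezone.utc)   # what the backend keeps
local = stored.astimezone(ZoneInfo("America/New_York"))      # what the UI shows
print(local.isoformat())  # 2024-07-01T12:30:00-04:00
```

The complexity (DST rules, zone changes) stays buried in the tz database; the user never sees UTC unless they ask for it.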
Also, are the users where they are because they want to be there, or because long ago some government or religious leader forced something through and they go along with it because of some kind of inertia?
It's kind of interesting because I think most people implementing RAG aren't even thinking about tokenization at all. They're thinking about embeddings:
1. chunk the corpus of data (various strategies but they're all somewhat intuitive)
2. compute embedding for each chunk
3. generate search query/queries
4. compute embedding for each query
5. rank corpus chunks by distance to query (vector search)
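The five steps above can be sketched end-to-end. Everything here is illustrative: the toy corpus is made up, and the bag-of-words `embed` is a stand-in for a real embedding model and vector store.

```python
# Minimal RAG retrieval sketch. The bag-of-words "embedding" is a
# stand-in for a learned embedding model; real systems call a model
# API here and rank with an approximate-nearest-neighbor index.
import math
from collections import Counter

def embed(text):
    # stand-in for a learned embedding: lowercase word counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. chunk the corpus (here, pre-chunked toy strings)
corpus = [
    "tokenization splits text into subword units",
    "embeddings map text into vectors",
    "vector search ranks chunks by distance",
]
# 2. compute an embedding for each chunk
chunk_vecs = [embed(c) for c in corpus]
# 3-4. generate a search query and embed it
query_vec = embed("how do embeddings represent text")
# 5. rank corpus chunks by similarity to the query
ranked = sorted(corpus, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
print(ranked[0])  # embeddings map text into vectors
```

Note that tokenization hides inside `embed` in both step 2 and step 4, which is exactly the article's point: it runs on every chunk and every query, yet nobody in the pipeline above ever mentions it.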
So this article really gets at the importance of a hidden, relatively mundane-feeling operation that can have an outsized impact on the performance of the system. I do wish it had more concrete recommendations in the last section, and a code sample of a robust pipeline with normalization, fine-tuning, and evals.