kodablah's comments

> The thing is, if you want people to understand durability but you also hide it from them, it will actually be much more complicated to understand and work with a framework.

> The real golden ticket I think is to make readable intuitive abstractions around durability, not hide it behind normal-looking code.

It's a tradeoff. People tend to want to use languages they are familiar with, even at the cost of being constrained within them. A naive DSL would not be expressive enough for the Turing completeness one needs, so effectively you'd need a new language/runtime. It's far easier to constrain an existing language than to write a new one, of course.

Some languages/runtimes are easier to apply durable/deterministic constraints to (e.g. WASM, which is deterministic by design, and JS, which has a tiny stdlib that just needs a few things like time and rand replaced), but they still don't take the ideal step you mention - putting the durable primitives and their benefits/constraints in front of the dev clearly.
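
As a toy illustration of constraining an existing language (Python here; a real sandbox such as Temporal's intercepts far more than this), the idea is just to swap the obvious sources of nondeterminism for versions the runtime controls:

    import random
    import time

    def make_deterministic(seed: int, recorded_times: list[float]) -> None:
        # Seeded RNG: replay sees the same sequence as the first execution
        random.random = random.Random(seed).random
        # Timestamps recorded on first execution are replayed verbatim
        times = iter(recorded_times)
        time.time = lambda: next(times)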


This still assumes an all-encompassing, transparent durability layer. What I'm arguing for is the opposite: something that can just be a library in any language and any runtime, because it does not try to be clever about injecting durability into otherwise idiomatic code.


> that your entire workflow still needs to be idempotent

If you mean just the workflow logic: as the article mentions, it has to be deterministic, which implies idempotency, but that is fine because workflow logic doesn't have side effects. The side-effecting functions invoked from a workflow (what Temporal dubs "activities") of course _should_ be idempotent so they can be retried upon failure, as is the case for all retryable code, but this is not a requirement. These side-effecting functions can be configured at the call site to have at-most-once semantics.

In addition to many other things like observability, the value of durable execution is persisting advanced logic like loops, try/catch, concurrent async ops, sleeping, etc., and making all of that crash-proof (i.e. it resumes from where it left off on another machine).
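
For a concrete feel, here is a minimal sketch using Temporal's Python SDK (the charge activity is a hypothetical stand-in); note the at-most-once behavior is just a one-attempt retry policy at the call site:

    from datetime import timedelta
    from temporalio import activity, workflow
    from temporalio.common import RetryPolicy

    @activity.defn
    async def charge(customer_id: str) -> None:
        ...  # hypothetical side-effecting call, e.g. a payment processor

    @workflow.defn
    class BillingWorkflow:
        @workflow.run
        async def run(self, customer_id: str) -> None:
            # Ordinary loop/sleep/try-catch, yet crash-proof: after a crash,
            # the workflow resumes right here on another machine via replay.
            for _ in range(12):
                try:
                    await workflow.execute_activity(
                        charge,
                        customer_id,
                        start_to_close_timeout=timedelta(seconds=30),
                        # At-most-once semantics, configured at the call site
                        retry_policy=RetryPolicy(maximum_attempts=1),
                    )
                except Exception:
                    workflow.logger.warning("charge failed; will try next cycle")
                await workflow.sleep(timedelta(days=30))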


> The author's point about the friction from explicit step wrappers is fair, as we don't use bytecode generation today, but we're actively exploring it to improve DX.

There is value in such a wrapper/call at invocation time instead of using the proxy pattern. Specifically, it makes it very clear to both the code author and the code reader that this is not a normal method invocation. This is important because it is very common to perform normal method invocations, and the caller needs to author code knowing the difference. Java developers, perhaps more than most, likely prefer such invocation explicitness over a JVM agent doing bytecode manipulation.

There is also another reason for preferring a wrapper-like approach: providing options. If you need to provide options (say, timeout info) from the call site, that is hard to do if your call is limited to the signature of the implementation; the options will have to be provided in a different place.
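
A sketch of the difference (hypothetical execute_step wrapper; Python for brevity):

    from datetime import timedelta

    def charge_card(order_id: str) -> str:
        return f"charged {order_id}"  # stand-in for the real step body

    # Hypothetical explicit wrapper: clearly not a plain invocation, and
    # per-call options live right at the call site
    def execute_step(fn, *args, timeout: timedelta):
        # a real engine would persist, dispatch, and enforce the timeout here
        return fn(*args)

    # Proxy style would be `stub.charge_card("o-1")` - it reads like a normal
    # call, and a per-call timeout has nowhere to go at the call site.
    result = execute_step(charge_card, "o-1", timeout=timedelta(seconds=30))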


I'm still swinging back and forth on which approach I ultimately prefer.

As stated in the post, I like how the proxy approach largely avoids any API dependency. I'd also argue that Java developers actually are very familiar with this kind of implicit enrichment of behaviors and execution semantics (e.g. transaction management is woven into applications that way in Spring or Quarkus applications).

But there are also limits to this in regard to flexibility. For example, if you wanted to delay a method for a dynamically determined period of time, rather than for a fixed time, the annotation-based approach would fall short.


At Temporal, for Java we did a hybrid of the approaches you describe. Specifically, we do the java.lang.reflect.Proxy approach, but the user has to make a call instantiating the proxy from the implementation. This allows users to provide those options at proxy creation time and does not require them to configure a build step. I can't speak for all JVM people, but I get nervous if I have to use a library that requires an agent or annotation processor.

Also, since Temporal activity invocations are (often) remote, many times a user may only have the definition/contract of the "step" (aka activity in Temporal parlance) without a body. Finally, many times users _start_ the "step" rather than just _execute_ it, which means it needs to return a promise/future/task. Sure, this can be wrapped in a suspended virtual thread, but that makes reasoning about things like cancellation harder, and from a client-not-workflow POV, it makes it harder to reattach to an invocation in a type-safe way to, say, wait for the result of something started elsewhere.

We did the same proxying approach for TypeScript, but we saw as we got to Python, .NET, and Ruby that being able to _reference_ a "step" while also providing options and having many overloads/approaches of invoking that step has benefits.
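
A rough sketch of how that reads in the Python SDK (simplified; greet is a stand-in activity):

    from datetime import timedelta
    from temporalio import activity, workflow

    @activity.defn
    async def greet(name: str) -> str:
        return f"Hello, {name}"

    @workflow.defn
    class GreetingWorkflow:
        @workflow.run
        async def run(self, name: str) -> str:
            # Execute: run the step and wait for its result. The step is a
            # reference plus call-site options, not a proxied method call.
            first = await workflow.execute_activity(
                greet, name, start_to_close_timeout=timedelta(seconds=30)
            )
            # Start: get a handle back, which helps with cancellation and
            # with waiting on the result somewhere else
            handle = workflow.start_activity(
                greet, name, start_to_close_timeout=timedelta(seconds=30)
            )
            return first + " / " + await handle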


> is it the dev ergonomics that's cool here?

Yup. Being able to write imperative code that automatically resumes where it left off is very valuable. It's best to represent durable Turing completeness using the modern approach to authoring such logic - programming languages. Being able to loop, try/catch, apply advanced conditional logic, etc. in a crash-proof algorithm that can run for weeks/months/years and is introspectable has a lot of value over just using queues.

Durable execution is all just queues and task processing and event sourcing under the hood, though.
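
A toy sketch of that last point (not any real engine's internals): on re-execution, results of already-completed steps come from the persisted event history instead of re-running the side effect, which is what makes resumption work:

    history: list = []  # persisted event log (a real engine uses a database)
    replay_index = 0

    def durable_step(fn, *args):
        global replay_index
        if replay_index < len(history):
            result = history[replay_index]  # replaying: reuse recorded result
        else:
            result = fn(*args)              # first run: perform the side effect
            history.append(result)          # ...and record it before moving on
        replay_index += 1
        return result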


I think correlating "pushes per repository" to certain languages is interesting. The top "pushes per repository" are C++, TeX, Rust, C, and CSS. I guess it's no surprise many would also consider those the most guess-and-check or hard-to-get-right-upfront-without-tooling languages.


It's unclear if that's the takeaway here. Pushes per repository can just as well indicate a project that's old, or active, or popular, etc.


Really? I don't think Rust is like that because it has such strong compile-time checking. More likely it's because Rust 1.0 hadn't even been released in 2014, so by definition every Rust project was extremely new and active.


Yes, maybe the causation assumption here is inaccurate.


Deterministic output is needed when LLMs are used for validations. This can be anything from input validation at runtime to a CI check leveraging LLMs. It can be argued this is not an acceptable use of AI, but it will become increasingly common and it will need to be tweaked/tested. You cannot tweak/test a response you don't know you're going to get.
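
A sketch of the knobs you end up pinning for that (OpenAI's Python client here; note that even with a pinned snapshot, zero temperature, and a seed, determinism is only best-effort):

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",  # pin an exact model snapshot, not an alias
        temperature=0,              # remove sampling variance as far as possible
        seed=42,                    # best-effort reproducibility
        messages=[{"role": "user", "content": "Is this input valid? ..."}],
    )
    print(resp.choices[0].message.content)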


Yeah, indeed - regression testing for chatbots that use RAG would involve making sure the correct response comes from the RAG.

Today we have an extremely hacky workaround by ensuring that at least the desired chunk from the RAG is selected, but it's far from ideal and our code is not well written (a temporary POC written by AI that has been there for quite some months now ...)
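
For what it's worth, the less hacky version of that check can be an ordinary test of the retrieval step rather than the generated prose (a sketch; the retriever interface and chunk IDs are hypothetical):

    # Hypothetical retriever interface; swap in your real one
    class Retriever:
        def retrieve(self, query: str, top_k: int) -> list[str]:
            ...  # return the IDs of the chunks selected for the prompt

    def test_refund_question_selects_policy_chunk(retriever: Retriever):
        chunk_ids = retriever.retrieve("How do I get a refund?", top_k=5)
        # Assert the desired chunk is selected, regardless of phrasing
        assert "refund-policy-003" in chunk_ids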


As someone who has had to build libraries for the nuances of coroutine vs. thread async in several languages (Python, .NET, Java, Ruby, etc.), I believe Ruby's approach to fibers is the best.

Ruby's standard library was not littered with too many sync helpers, so making them fiber-capable without much standard library effect is a big win. In Python, explicit coloring is required and it's easy to block your asyncio coroutines with sync calls. In .NET, it is nice that tasks can be blocking or not, but there is one fixed global static thread pool for all tasks, so one is tacitly encouraged to do CPU-bound work in a task (granted, CPU-bound fibers are an issue in Ruby too), not to mention issues with changing the default scheduler. In Java, virtual threads take a Ruby-esque approach of letting most code work unchanged, but the Java concurrency standard library is large, with slight potential incompatibilities.
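
The Python pitfall in concrete form (minimal sketch):

    import asyncio
    import time

    async def handler():
        time.sleep(1)            # sync call: silently blocks the event loop
        await asyncio.sleep(1)   # the "colored" equivalent yields to other tasks
        # escape hatch for sync code you can't rewrite:
        await asyncio.to_thread(time.sleep, 1)

    asyncio.run(handler())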

Ruby is both 1) lucky it did not have a large standard library of thread primitives to adapt, and 2) smart in that they can recognize when they are in a fiber-scheduler-enabled environment or not.

Granted that lack of primitives sure does hurt if you want to use concurrency utilities like combinators. And at that point, you reach for a third party and you're back in the situation of not being as portable/obvious.


I use RBS/Steep with great success to catch plenty of nil issues early, but it's similarly not great from a dev POV to have to maintain a completely separate set of .rbs files (or special comments with rbs-inline). Also, in my experience, modern editors don't leverage it for typing/intellisense.


I believe the definition of workflows in this article is inaccurate. Workflows in modern engines are not limited to predefined code paths, and agents are effectively the same as workflows in these cases. The redefinition of workflows seems to be an attempt to differentiate, but for the most part an agent is nothing more than a workflow consisting of a loop that dynamically invokes things based on LLM responses. Modern workflow engines are very dynamic.


I think the distinction is more about the "level of railroading".

Workflows have a lot more structure and rules about information and control flow. Agents, on the other hand, are often given a set of tools and a prompt. They are much more free-form.

For example, a workflow might define a fuzzy rule like "if customer issue is refund, go to refund flow," while an agent gets customer service tools and figures out how to handle each case on its own.

To me, this is a meaningful distinction to make. Workflows can be more predictable and reliable. Agents have more freedom and can tackle a greater breadth of tasks.
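
A minimal sketch of the distinction (hypothetical call_llm and tool helpers, no real framework):

    def call_llm(prompt: str, tools: list[str] | None = None) -> str:
        ...  # stand-in for a real model call

    # Workflow: the code dictates the path; the LLM fills in one decision
    def handle_ticket(ticket: str) -> str:
        category = call_llm(f"Classify as refund/billing/other: {ticket}")
        if category == "refund":
            return "refund flow result"
        return "escalated"

    # Agent: the LLM dictates the path; the code just runs the chosen tool
    def run_agent(ticket: str, tools: dict) -> str:
        state = ticket
        while (choice := call_llm(f"Pick a tool or DONE: {state}", list(tools))) != "DONE":
            state = tools[choice](state)
        return state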


Just to emphasize your point, below is a workflow I wrote for an LLM recently to do language tagging (e.g., of vocab, grammar structures, etc.). It's very different from what you'd think of as an "agent", where the LLM has tools and can take initiative.

LLMs are amazingly powerful in some ways, but without this kind of "scaffolding", they're simply not reliable enough to make consistent choices.

---

1. Here are: a) a "language schema" describing what kinds of tags I want and why, with examples, b) The text I want you to tag c) A list of previously-defined tags which could potentially be relevant (simple string match)

List for yourself which pre-existing tags you plan to use when doing tagging.

[LLM generates a list of tags]

2. Here is a,b,c from above, and d) your own tag list

Please write a draft tag.

[LLM writes a draft]

3. Here is a-d from above, plus e) your first draft, and f) Some programmatically-generated "linter" warnings which may or may not be violations of the schema.

Please check over your draft to make sure it follows the schema.

[LLM writes a new draft]

Agent checks for "hard" rules, like making sure there's a 1-1 correlation between the text and the tags. If no rules are violated, move to step 5.

4. Here is a-e from above, plus g) your most recent draft, and h) known rule violations. Please fix the errors.

[LLM writes a new draft]

Repeat 4 until no hard rules are broken.

5. [and so on]
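
If it helps to see the shape of it, the core of steps 2-4 is a draft-lint-repair loop (rough sketch; every helper here stands in for one of the prompts or checkers described above):

    def tag_text(schema, text, candidate_tags):
        plan = llm_plan(schema, text, candidate_tags)            # step 1
        draft = llm_draft(schema, text, candidate_tags, plan)    # step 2
        warnings = lint(schema, draft)                           # soft warnings
        draft = llm_revise(schema, text, plan, draft, warnings)  # step 3
        while (errors := hard_rule_violations(text, draft)):     # step 4
            draft = llm_revise(schema, text, plan, draft, errors)
        return draft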


> Agents, on the other hand, are often given a set of tools and a prompt. They are much more free-form.

This describes how workflows are used with modern systems, in my experience. Workflows are often not predictable; they often execute one of a set of tools based on a response from a previous invocation (e.g. an LLM call).


You appear to be making the mistake of assuming that the only valid definition for the term "workflow" is the definition used by software such as https://airflow.apache.org/

https://www.merriam-webster.com/dictionary/workflow thinks the word dates back to 1921.

There's no reason Anthropic can't take that word and present their own alternative definition for it in the context of LLM tool usage, which is what they've done here.


Right, and I am saying I don't think their definition is an accurate one given the modern use of the term. It's an artificially limited definition to fit a narrative. An agent is nothing more than a very limited workflow.


> I hate this idea that Ruby needs to be more like Python or Typescript

It's not "be more like those", it's "be more like helpful, author-friendly programming", which is very much Ruby's ethos.

Every time I think about ripping out all of the RBS sig files in my project because I'm tired of maintaining them (I can't use Sorbet for a few reasons), Steep catches a `nil` error ahead of time. Sure, we can all say "why didn't you have test coverage?", but ideally you want all the help you can get.


As a Pythoner, the most direct value I get out of types is the IDE being smart about autocompletion, so if I'm writing

   with db.session() as session:
      ... use the session ...
I can type "session." and the IDE knows what kind of object it is and offers me valid choices. If there's trouble with it, it's that many Pythonic idioms are too subtle, so I can't fully specify an API like

   collection = db["some_collection"]
   collection.filter(lambda doc: doc["amount"].as_integer() > 500)
   collection.filter({"name": "whatever"})
   collection.filter({"field": lambda f: f["subfield"] = jsonb({"a":1, "b":"2})})
not to mention I'd like to be able to vary what gets returned using the same API as SQLAlchemy so I could write

   collection.filter(..., yield_per=100)
and have the type system know it is returning an iterator that yields iterators that yield rows, as opposed to an iterator that yields rows. It is simple, cheap, and reusable code to forward a few fetch-control arguments to SQLAlchemy and add some unmarshalling of rows into documents, but if I want types to work right I need an API that looks more like Java.


If I understand correctly, you can do this with overloads. They don't change the function implementation, but you can type different combinations of parameters and return types.
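
A sketch of the yield_per case from the sibling comment (hypothetical Collection/Doc types):

    from typing import Iterator, overload

    Doc = dict  # stand-in document type

    class Collection:
        @overload
        def filter(self, query: Doc) -> Iterator[Doc]: ...
        @overload
        def filter(self, query: Doc, *, yield_per: int) -> Iterator[list[Doc]]: ...
        def filter(self, query, *, yield_per=None):
            # One implementation; the overloads only affect type checking
            rows = self._fetch(query)
            if yield_per is None:
                yield from rows
            else:
                for i in range(0, len(rows), yield_per):
                    yield rows[i:i + yield_per]

        def _fetch(self, query: Doc) -> list[Doc]:
            return []  # stand-in for the real query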


> Sure we can all say "why didn't you have test coverage?"

Well, types are a form of test performed by the compiler.


LLMs can probably help maintain them, so that could probably be solved if you start using LLMs more.

This may already exist, but it would be nice if RBS or Sorbet had a command you could run that checks that all methods have types and tries to 'fix' anything missing with help from an LLM. You'd still be able to review the changes before committing, just like with lint autofixing. Also, you'd need to set up an LLM API key and be comfortable sharing your code with it.

