
I'm sad that less-experienced programmers will read this and think it's a solid run-down of good ideas.




I’m more afraid that some manager will read this and impose rules on their team. On the surface it might seem that more test coverage is universally good, without considering the trade-offs. I have a gut feeling that Goodhart’s Law accelerated by AI is a dangerous mix.

Goodhart's Law works on steroids with AI. If you tell a human dev "we need 100% coverage," they might write a few dummy tests, but they'll feel shame. AI feels no shame - it has a loss function. If the metric is "lines covered" rather than "invariants checked," the agent will flood the project with meaningless tests faster than a manager can blink. We'll end up with a perfectly green CI/CD dashboard and a completely broken production system, because the tests will verify tautologies, not business logic.
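As a toy illustration (hypothetical apply_discount, plain pytest-style tests), both tests below produce identical line coverage, but only the second checks an invariant:

    def apply_discount(price, percent):
        """Toy function under test."""
        return price * (1 - percent / 100)

    # Coverage-gaming tautology: executes the code but asserts nothing useful.
    def test_apply_discount_runs():
        result = apply_discount(100, 10)
        assert result == result  # always true, even if the function is broken

    # Invariant check: the discounted price stays between 0 and the original.
    def test_apply_discount_invariants():
        for percent in (0, 10, 50, 100):
            assert 0 <= apply_discount(100, percent) <= 100

The coverage report can't tell these apart; only the second one fails if apply_discount starts returning garbage.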

"fast, ephemeral, concurrent dev environments" seems like a superb idea to me. I wish more projects would do it, it lowers the barrier to contributions immensely.

> "fast, ephemeral, concurrent dev environments" seems like a superb idea to me.

I've worked at one (1) place that, whilst not quite fully that, did have a spare dev environment you could claim temporarily for deploying changes, doing integration tests, etc. Super handy when people are working on (often wildly) divergent projects and you need at least one stable dev environment + integration testing.

Been trying to push this at $CURRENT without much success, but that's largely down to a lack of cloudops resources (although we do have a sandbox environment, it's sufficiently different from dev that it's essentially worthless).


Yeah, this is something I'd like more of outside of agentic environments; in particular for working in parallel on multiple topics when there are long-running tasks to deal with (e.g. running slow tests or a bisect against a checked-out branch -- leaving that in worktree 1 while writing new code in worktree 2).

I use devenv.sh to give me quick setup of individual environments, but I'm spending a bit of my break trying to extend that (and its processes) to easily run inside containers that I can attach Zed/VSCode remoting to.

It strikes me that (as the article points out) this would also be useful for using Agents a bit more safely, but as a regular old human it'd also be useful.


What’s bad about them? We make things baby-safe and easy to grasp and discover for LLMs. Understandability and modularity will improve.

I have almost 30 years of experience as a programmer and all of this rings true to me. It precisely matches how I've been working with AI this year and it's extremely effective.

Could you be more specific in your feedback, please?

100% test coverage, for most projects of modest size, is extremely bad advice.

Pre-agents, 100% agree. Now it's not a bad idea; the cost to do it isn't terrible, though there are diminishing returns as you get above 90-95%.

LLMs don't make bad tests any less harmful. Nor do they write good tests for the stuff people mostly can't write good tests for.

Okay, but is aiming for 100% coverage really why the bad tests are bad?

Aiming for 100% coverage is almost certain to cause bad tests, yes.

But not all bad tests come from a goal of 100% coverage.


In most cases where I have seen bad tests, yes.

You just end up writing needless tests trying to trigger or mock error states from a 3rd-party library that never actually returns an error; the lib just had a rule of "every call returns an error code" in case something changes and it's needed someday.
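Something like this sketch (hypothetical BillingClient and wrapper, purely for illustration):

    from unittest import mock

    class BillingClient:
        """Hypothetical 3rd-party client: every call returns (result, err),
        even though in practice err is always None."""
        def fetch_balance(self, account):
            return 42, None

    DEFAULT_BALANCE = 0

    def balance_or_default(client, account):
        result, err = client.fetch_balance(account)
        return DEFAULT_BALANCE if err else result

    # Written purely to cover the `if err` branch. The mocked error code can
    # never actually come out of the real library, so a green run here tells
    # you nothing about production behaviour.
    def test_balance_or_default_handles_error():
        client = mock.Mock(spec=BillingClient)
        client.fetch_balance.return_value = (None, "E_NEVER_HAPPENS")
        assert balance_or_default(client, "acct-1") == DEFAULT_BALANCE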

The problem is that it is natural to have code that is unreachable. Maybe you are trying to defend against potential cases that may appear in the future (e.g., things that are not yet implemented), or algorithms written in a general way but only used in a specific way. 100% test coverage requires removing these, and that can hurt future development.

It doesn't require removing them if you think you'll need them. It just requires writing tests for those edge cases so you have confidence that the code will work correctly if/when those branches do eventually run.

I don't think anyone wants production code paths that have never been tried, right?
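For example (a minimal sketch, assuming a hypothetical parse_record with a defensive branch for a record type nothing emits yet), you can exercise the branch directly instead of deleting it:

    import pytest

    def parse_record(record: dict) -> str:
        kind = record["kind"]
        if kind == "v1":
            return record["payload"]
        # Defensive branch: nothing emits "v2" records yet, but a future
        # schema change is expected to.
        raise ValueError(f"unsupported record kind: {kind!r}")

    def test_parse_record_rejects_unknown_kind():
        # Feed the "unreachable" input directly so the branch is covered
        # and its behaviour is pinned down before v2 actually ships.
        with pytest.raises(ValueError):
            parse_record({"kind": "v2", "payload": "x"})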


laziness? unprofessionalism? both? or something else?

You forgot "difficult". How do you test a system call failure? How do you test a system call failure when the first N calls need to pass? Be careful how you answer; some answers technically fall into the "undefined behavior" category (if you are using C or C++).

... Is that not what mocking is for?
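In Python, at least, a mock with a side_effect sequence handles the "first N calls need to pass" case directly: the first entries are returned, then an exception is raised. A minimal sketch (hypothetical copy_stream that takes an injectable write function; in C/C++ you'd typically need a link-time wrapper or an injectable function pointer to get the same seam without UB):

    import io
    import pytest
    from unittest import mock

    def copy_stream(src, write):
        """Hypothetical code under test: copies src in fixed-size chunks via write()."""
        while chunk := src.read(4096):
            write(chunk)

    def test_copy_stream_surfaces_write_failure_after_two_successes():
        # The first two writes "succeed", the third raises as if the disk
        # filled up -- exactly the hard-to-reproduce scenario above.
        write = mock.Mock(side_effect=[4096, 4096, OSError(28, "No space left on device")])
        src = io.BytesIO(b"x" * (4096 * 5))
        with pytest.raises(OSError):
            copy_stream(src, write)
        assert write.call_count == 3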

all of the above.


