
I'm sad that less-experienced programmers will read this and think it's a solid run-down of good ideas.




I’m more afraid that some manager will read this and impose rules on their team. On the surface it might seem that more test coverage is universally good, without considering the trade-offs. I have a gut feeling that Goodhart’s Law accelerated by AI is a dangerous mix.

Goodhart's Law works on steroids with AI. If you tell a human dev "we need 100% coverage," they might write a few dummy tests, but they'll feel shame. AI feels no shame - it has a loss function. If the metric is "lines covered" rather than "invariants checked," the agent will flood the project with meaningless tests faster than a manager can blink. We'll end up with a perfectly green CI/CD dashboard and a completely broken production system, because the tests will verify tautologies, not business logic.
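As a toy illustration (hypothetical apply_discount, plain pytest-style tests), both tests below produce identical line coverage, but only the second checks an invariant:

    def apply_discount(price, percent):
        """Toy function under test."""
        return price * (1 - percent / 100)

    # Coverage-gaming tautology: executes the code but asserts nothing useful.
    def test_apply_discount_runs():
        result = apply_discount(100, 10)
        assert result == result  # always true, even if the function is broken

    # Invariant check: the discounted price stays between 0 and the original.
    def test_apply_discount_invariants():
        for percent in (0, 10, 50, 100):
            assert 0 <= apply_discount(100, percent) <= 100

The coverage report can't tell these apart; only the second one fails if apply_discount starts returning garbage.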

"fast, ephemeral, concurrent dev environments" seems like a superb idea to me. I wish more projects would do it, it lowers the barrier to contributions immensely.

> "fast, ephemeral, concurrent dev environments" seems like a superb idea to me.

I've worked at one (1) place that, whilst not quite fully that, did have a spare dev environment you could claim temporarily for deploying changes, doing integration tests, etc. Super handy when people are working on (often wildly) divergent projects and you need at least one stable dev environment + integration testing.

Been trying to push this at $CURRENT without much success, but that's largely down to a lack of cloudops resources (although we do have a sandbox environment, it's sufficiently different from dev that it's essentially worthless).


Yeah, this is something I'd like more of outside of agentic environments; in particular for working in parallel on multiple topics when there are long-running tasks to deal with (e.g. running slow tests or a bisect against a checked-out branch -- leaving that in worktree 1 while writing new code in worktree 2).

I use devenv.sh to give me quick setup of individual environments, but I'm spending a bit of my break trying to extend that (and its processes) to easily run inside containers that I can attach Zed/VSCode remoting to.

It strikes me that (as the article points out) this would also be useful for using Agents a bit more safely, but as a regular old human it'd also be useful.


What’s bad about them? We make things baby-safe and easy to grasp and discover for LLMs. Understandability and modularity will improve.

I have almost 30 years of experience as a programmer and all of this rings true to me. It precisely matches how I've been working with AI this year and it's extremely effective.

Could you be more specific in your feedback, please?

100% test coverage, for most projects of modest size, is extremely bad advice.

Pre-agents, 100% agree. Now it's not a bad idea; the cost to do it isn't terrible, though there are diminishing returns as you get above 90-95%.

LLMs don't make bad tests any less harmful. Nor do they write good tests for the stuff people mostly can't write good tests for.

Okay, but is aiming for 100% coverage really why the bad tests are bad?

Aiming for 100% coverage is almost certain to cause bad tests, yes.

But not all bad tests come from a goal of 100% coverage.


In most cases where I have seen bad tests, yes.

You just end up writing needless tests trying to trigger or mock error states from a 3rd-party library that never actually returns an error; the lib just had a rule of "every call returns an error code" in case something changes and it's needed someday.
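Something like this sketch (hypothetical BillingClient and wrapper, purely for illustration):

    from unittest import mock

    class BillingClient:
        """Hypothetical 3rd-party client: every call returns (result, err),
        even though in practice err is always None."""
        def fetch_balance(self, account):
            return 42, None

    DEFAULT_BALANCE = 0

    def balance_or_default(client, account):
        result, err = client.fetch_balance(account)
        return DEFAULT_BALANCE if err else result

    # Written purely to cover the `if err` branch. The mocked error code can
    # never actually come out of the real library, so a green run here tells
    # you nothing about production behaviour.
    def test_balance_or_default_handles_error():
        client = mock.Mock(spec=BillingClient)
        client.fetch_balance.return_value = (None, "E_NEVER_HAPPENS")
        assert balance_or_default(client, "acct-1") == DEFAULT_BALANCE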

The problem is that it is natural to have code that is unreachable. Maybe you are trying to defend against potential cases that may appear in the future (e.g., things that are not yet implemented), or algorithms written in a general way but only used in a specific way. 100% test coverage requires removing these, and that can hurt future development.

It doesn't require removing them if you think you'll need them. It just requires writing tests for those edge cases so you have confidence that the code will work correctly if/when those branches do eventually run.

I don't think anyone wants production code paths that have never been tried, right?
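For example (a minimal sketch, assuming a hypothetical parse_record with a defensive branch for a record type nothing emits yet), you can exercise the branch directly instead of deleting it:

    import pytest

    def parse_record(record: dict) -> str:
        kind = record["kind"]
        if kind == "v1":
            return record["payload"]
        # Defensive branch: nothing emits "v2" records yet, but a future
        # schema change is expected to.
        raise ValueError(f"unsupported record kind: {kind!r}")

    def test_parse_record_rejects_unknown_kind():
        # Feed the "unreachable" input directly so the branch is covered
        # and its behaviour is pinned down before v2 actually ships.
        with pytest.raises(ValueError):
            parse_record({"kind": "v2", "payload": "x"})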


laziness? unprofessionalism? both? or something else?

You forgot "difficult". How do you test a system call failure? How do you test a system call failure when the first N calls need to pass? Be careful how you answer; some answers technically fall into the "undefined behavior" category (if you are using C or C++).

... Is that not what mocking is for?
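In Python, at least, a mock with a side_effect sequence handles the "first N calls need to pass" case directly: the first entries are returned, then an exception is raised. A minimal sketch (hypothetical copy_stream that takes an injectable write function; in C/C++ you'd typically need a link-time wrapper or an injectable function pointer to get the same seam without UB):

    import io
    import pytest
    from unittest import mock

    def copy_stream(src, write):
        """Hypothetical code under test: copies src in fixed-size chunks via write()."""
        while chunk := src.read(4096):
            write(chunk)

    def test_copy_stream_surfaces_write_failure_after_two_successes():
        # The first two writes "succeed", the third raises as if the disk
        # filled up -- exactly the hard-to-reproduce scenario above.
        write = mock.Mock(side_effect=[4096, 4096, OSError(28, "No space left on device")])
        src = io.BytesIO(b"x" * (4096 * 5))
        with pytest.raises(OSError):
            copy_stream(src, write)
        assert write.call_count == 3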

all of the above.


