These are basically the problems that https://hypofuzz.com is designed to solve.
- You write standard property-based tests with Hypothesis, which fit into a traditional unit-test workflow (run with pytest, or unittest, or whatever - see the sketch below)
- then HypoFuzz runs indefinitely, using feedback-guided fuzzing to find bugs that random generation is too slow for
- when the search restarts on each new commit (or daily, etc.), it begins by replaying every distinguishable input found earlier, so you don't have to start from scratch. There is _some_ loss of context, but that's inherent in testing different code.
Plus other nice stuff like allocating more compute time to tests which are finding new behaviour faster, for maximum efficiency. The roadmap includes exploiting SAT solvers to find very rare behaviour, using git history to focus on recently changed code, statistical risk analysis ("what's the expected time to the next bug?"), and a budget mode ("my compute costs $0.08/hr, stop when you expect the next bug to cost $100 or more to find"). It's a good time to be working on testing!
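For concreteness, the tests in question are just ordinary Hypothesis properties; a minimal sketch (the property itself is made up for illustration):

```python
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_is_idempotent(xs):
    # runs under plain pytest in CI, and under HypoFuzz unchanged
    once = sorted(xs)
    assert sorted(once) == once
```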
I've recently launched HypoFuzz - https://hypofuzz.com/ - which solves this by running your property-based tests continuously on a separate server.
You run the tests briefly in CI to check for regressions, and then leave them running permanently on a fuzz server to search for new bugs. Nelson Elhage has a good writeup of this approach at https://blog.nelhage.com/post/two-kinds-of-testing/
I'd check that carefully when you're writing or changing the tests, though; for nontrivial conditions it can take a very long time to get a negligible probability of any metatest failing in a given run, and flaky metatests are just as bad as the usual kind.
If this split is particularly important, we'd usually recommend just writing separate tests for data that satisfy A or B; you can even supply the generators with `pytest.mark.parametrize` if copy-pasting the test body offends.
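A minimal sketch of that pattern, with hypothetical stand-in strategies for "data satisfying A" and "data satisfying B" (note that `parametrize` must be applied outside `@given`):

```python
import pytest
from hypothesis import given, strategies as st

satisfies_a = st.integers(min_value=0)  # stand-in for condition A
satisfies_b = st.text(min_size=1)       # stand-in for condition B

@pytest.mark.parametrize("strategy", [satisfies_a, satisfies_b])
@given(data=st.data())
def test_shared_property(strategy, data):
    value = data.draw(strategy)
    # the real assertion about A- or B-shaped data goes here
    assert value == value
```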
Re: takeover by US firms, note that while Murdoch was born Australian he gave up his Australian citizenship in favor of US citizenship... precisely because it would have been illegal to own such a stake in US media firms as a non- or dual-citizen.
We would also like the Murdoch media and/or tech giants to pay tax in Australia, yes, but neither do at the moment.
Alas, Numpy arrays are limited to thirty-two dimensions.
This is fine in practice, because with a high number of dimensions you really can't afford to store a dense-matrix representation in RAM, and 32 is plenty for any low-dimensional problem.
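A quick demonstration of the ceiling (this assumes Numpy 1.x; the limit was raised to 64 in Numpy 2.0):

```python
import numpy as np

np.zeros((1,) * 32)        # fine: exactly 32 dimensions
try:
    np.zeros((1,) * 33)    # one more dimension is rejected
except ValueError as err:
    print(err)             # complains that the maximum supported dimension is 32
```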
Numpy arrays do indeed support the @-operator for matrix multiplication.
In fact, one of the common design questions when adding the @-operator to Python was "why do this, when it's not used anywhere in the core language or standard library?" - and the answer was "because it's important for Numpy and the numerical user community".
(and finally, adding `__matmul__` and `__rmatmul__` is entirely backwards compatible; you can use objects with those methods on Python 3.4 - it's only `A @ B` which is a syntax error)
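To illustrate, a toy class with such a method (the `Scaler` example is invented, not from any real library):

```python
import numpy as np

class Scaler:
    def __init__(self, factor):
        self.factor = factor

    def __matmul__(self, other):
        # scale the right-hand operand; stands in for real matmul logic
        return self.factor * np.asarray(other)

s = Scaler(2.0)
print(s.__matmul__([1, 2, 3]))   # explicit call works on any Python 3
print(s @ [1, 2, 3])             # same result, but needs the 3.5+ syntax
```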
If binning data, discretize it and then use a dict lookup - `grades_to_letters[grade//10]`, for example.
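For example, a letter-grades sketch (the cutoffs here are invented):

```python
grades_to_letters = {10: "A", 9: "A", 8: "B", 7: "C", 6: "D"}

def to_letter(grade):
    # grade // 10 discretizes 0-100 into eleven bins;
    # anything below the listed cutoffs falls through to "F"
    return grades_to_letters.get(grade // 10, "F")

assert to_letter(95) == "A"
assert to_letter(71) == "C"
assert to_letter(12) == "F"
```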
For insort, and indeed anything with sorted collections, just use the http://www.grantjenks.com/docs/sortedcontainers/ module. Inserting an element takes worst-case sublinear time, and in practice it's faster than comparable C-extension implementations. It's one of the very few data-structure libraries I use regularly.
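A quick taste of the API, as a drop-in replacement for the bisect/insort pattern:

```python
from sortedcontainers import SortedList

sl = SortedList([3, 1, 4, 1, 5])   # always kept in sorted order
sl.add(2)                          # replaces bisect.insort; stays fast at scale
assert list(sl) == [1, 1, 2, 3, 4, 5]
assert sl[0] == 1                  # indexed access
assert sl.bisect_left(3) == 3      # binary-search position lookups
```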