
(I made rkyv) I took a look at lite3/tron because I was interested in seeing how it approached some of the issues I ran into with rkyv. I ended up with some additional information related to points 1 and 2 that I figured I'd write down:

1) You're correct, rkyv uses a B-tree with a default branching factor of 6 [1]. I don't think I've ever implemented a binary tree in any version of rkyv - maybe one of the earliest versions used a sorted vec instead of a B-tree? I hate the B-tree map implementation because it is so complicated, even though it actually used to be even worse in 0.7 when it was implemented as a B+ tree. Sorry for making you read any of it!

2) Arbitrary mutations would definitely be an improvement on rkyv, and I was most interested to see what kind of approach lite3/tron took! Unfortunately growing the buffer is not supported [2] and freed memory isn't reclaimed [3]. These are the two actually difficult problems standing in the way of rkyv getting more comprehensive mutation support, so unfortunately I think there's nothing valuable to take away here.

The issue with growing the buffer is that you invalidate any pointers into the buffer and have to manually fix them up (e.g. by re-crawling the buffer). There's not a decent approach I'm aware of which manages these pointer fixups in a way that limits complexity and avoids really nasty API surfaces.
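A minimal C sketch of the hazard (my illustration, not code from either project): absolute pointers into a buffer dangle the moment the allocation moves, while buffer-relative offsets survive, which is why "fixup" really means re-deriving every held pointer.

```c
#include <stdint.h>
#include <stdlib.h>

int main(void) {
    /* Absolute interior pointers break when the buffer grows: */
    char *buf = malloc(64);
    char *field = buf + 16;          /* points into the buffer            */
    buf = realloc(buf, 1 << 20);     /* the allocation may move...        */
    /* `field` now dangles; every held pointer needs a manual fixup,      */
    /* e.g. by re-crawling the buffer as described above.                 */

    /* Buffer-relative offsets sidestep the fixup entirely: */
    uint32_t field_off = 16;         /* stable across any number of moves */
    char *field2 = buf + field_off;  /* re-derived on each access         */

    (void)field; (void)field2;
    free(buf);
    return 0;
}
```

The catch is that offsets push the complexity into the API: every access has to carry the buffer base around, which is where the "really nasty API surfaces" come from.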

The issue with memory reclamation is that you effectively have to add an allocator to your buffer, and that allocator has to also be persistable. I actually got this part working in rel while that was still a project worth hacking on [4]. I just ran out of steam and willingness to play the type-tac-toe required to get a safe API out of it. The main problem I ran into there was related to value branding, since you shouldn't be allowed to make references to objects which cross between contiguous buffers.
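For the allocator half, here's a sketch of what "persistable" forces on you (again my illustration, not rel's design): all allocator metadata has to live inside the buffer and link via offsets, never pointers.

```c
#include <stdint.h>

/* Free-list metadata stored inside the buffer itself. Links are
   offsets rather than pointers, so the structure remains valid after
   the buffer is written to disk, reloaded, or moved in memory. */
typedef struct {
    uint32_t next_free;  /* offset of the next free block; 0 = none */
    uint32_t size;       /* usable bytes in this block              */
} FreeBlock;

typedef struct {
    uint32_t free_head;  /* offset of the first free block          */
    uint8_t  data[];     /* rest of the buffer                      */
} Heap;

/* First-fit allocation (no block splitting, for brevity). */
static uint32_t heap_alloc(Heap *h, uint32_t size) {
    uint32_t *link = &h->free_head;
    while (*link != 0) {
        FreeBlock *b = (FreeBlock *)((uint8_t *)h + *link);
        if (b->size >= size) {
            uint32_t off = *link;
            *link = b->next_free;  /* unlink the block */
            return off;
        }
        link = &b->next_free;
    }
    return 0;  /* exhausted: the caller has to grow the buffer */
}
```

The branding problem shows up as soon as you hold two of these heaps: nothing in the type of a `uint32_t` offset says which buffer it belongs to, which is exactly the cross-buffer reference hazard mentioned above.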

This project is neat, but it does basically just look like an rkyv-converted `BTreeMap<String, JsonValue>`.

[1]: https://github.com/rkyv/rkyv/blob/main/rkyv/src/collections/...

[2]: https://github.com/fastserial/lite3/blob/main/src/lite3.c#L5...

[3]: https://lite3.io/design_and_limitations.html#autotoc_md29 (see note at end of section)

[4]: https://github.com/rkyv/rel/blob/main/rel_example/src/main.r...


I'm the one who tried to make a low-mem builder for it that would work for merge-sort-like tasks that do some on-the-fly consolidation, e.g. LSM trees.

> This project is neat, but it does basically just look like an rkyv-converted `BTreeMap<String, JsonValue>`.

It sure sounded like that to me. These days I typically don't give C developers building things like lite3/tron the trust needed to handle what amounts to one big implicit `unsafe` block with worse syntax and types.


> 'AI safety' is a meaningless term

I disagree with this assertion. As you said, safety is an attribute of action. We have many examples of artificial intelligence which can take action, usually because they are equipped with robotics or some other route to physical action.

I think whether providing information counts as "taking action" is a worthwhile philosophical question. But regardless of the answer, you can't ignore that LLMs provide information to _humans_, who are perfectly capable of taking action. In that way, 'AI safety' in the context of LLMs is a lot like knife safety. It's about being safe _with knives_. You don't give knives to kids because they are likely to mishandle them and hurt themselves or others.

With regards to censorship - a healthy society self-censors all the time. The debate worth having is _what_ is censored and _why_.


Almost everything about tool, machine, and product design in history has been an increase in the force-multiplication of an individual's labor and decision making against the environment. Now, with Universal Machine ubiquity and a market that richly rewards perverse incentives, products and tools are being built which force-multiply the designer's will absolutely, even at the expense of the owner's will. This, together with widespread automated surveillance, is a dangerous encroachment on our autonomy!


I mean, then build your own tools.

Simply put, the last time we (as in humans) had full autonomy was sometime before we started agriculture. Since then, the ideas of ownership and the state have permeated human society, and we have had to engage in tradeoffs.


Mixing units from milliseconds down to picoseconds made the columns too wide. All of the benchmark data is also available as JSON in the benchmark_results directory if you want to drop it into Sheets or some other analysis tool.


Author here, you’re correct. You can customize your validation context for your specific needs. For example, if you don’t have allocation available (i.e. `#![no_std]` without the alloc crate) then you’ll probably need to write your own mapping system to handle shared pointers. Or you can just not use them if that works better for you. That’s also a large part of why rkyv uses generics so heavily.

If your data is read-only then pointing to the same object from two locations is (usually) fine. But rkyv also supports in-place mutability, which requires validating that no two pointers overlap. Otherwise you could have simultaneous mutable borrows of the same value, which is UB.
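Since the rest of this thread's examples are C, here's the same hazard sketched with `restrict` (my analogy, not rkyv code): two pointers that each promise exclusive access but overlap are exactly the situation the validator has to rule out.

```c
/* Each restrict pointer promises the compiler exclusive access to
   the bytes it writes, much like `&mut` does in Rust. */
void write_both(int *restrict a, int *restrict b) {
    *a = 2;
    *b = 3;  /* undefined behavior if a and b alias */
}

/* A validator that grants mutable access through archived pointers
   therefore has to prove no two of them overlap before any call can
   be allowed to look like: write_both(p, p);                        */
```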


> 6) Morality will degrade. Prostitution, drug use, nihilism, and degeneracy of every sort will explode.

> 7) America’s military will weaken dramatically as the quality of recruits plunges and we no longer have the money to properly maintain and supply our forces.

Tried and true right-wing media strategy: sucker people by talking about economic stress on the middle class (real issue), and gradually transition to moral hand-wringing and pumping the military-industrial complex (fake issues).

> 12) The government will institute a wealth tax. It will be sold as something to target the rich, but in practice, over time, they will be taking large amounts of money from people no one considers rich.

Interesting that they start by talking about how Campbell's soup is expensive and finish by talking about how the real threat is a wealth tax...


They start by talking about the growing cost of living.


This analysis is correct but the solution is not feasible. Changing a modification through `x` to a modification through `y` does indeed violate the semantics of `restrict`. The problem is that in order to detect this situation, we'd have to track the provenance of integers. In this specific example, we'd have to know that replacing `xaddr` with `y2addr` affects `x` and `y`. There is general consensus that tracking provenance for integers causes many more problems than it solves, so although this would solve the problem, it is not feasible. This is why weak and strict provenance are being pursued instead.
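For readers without the article open, here is a reconstruction of the shape of the example from the names used in this thread (`xaddr`, `y2addr`, `ptr`); the article's exact code may differ:

```c
#include <stdint.h>
#include <stdio.h>

static int example(int *restrict x, int *restrict y) {
    *x = 0;
    uintptr_t xaddr  = (uintptr_t)x;
    uintptr_t y2addr = (uintptr_t)(y - 1);
    if (xaddr == y2addr) {
        int *ptr = (int *)xaddr;  /* derived from x, so the write is OK */
        *ptr = 1;
    }
    return *x;  /* 1 whenever the branch was taken */
}

int main(void) {
    int a[2] = {0, 0};
    printf("%d\n", example(&a[0], &a[1]));  /* y == x + 1: prints 1 */
    return 0;
}
```

The miscompilation chain: one pass notices `xaddr == y2addr` inside the branch and substitutes `y2addr` for `xaddr`; now `ptr` is syntactically derived from `y`, so a later `restrict`-based pass concludes `*ptr = 1` cannot affect `*x` and folds the return value to `0`. Each pass looks locally justified, which is why blame lands on the missing provenance information rather than on either pass in isolation.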


With regards to optimization passes, does it matter if the analysis is infeasible? We agree the optimization/transformation from the first version to the second isn't valid, so I think it shouldn't have been done.

The original version with two stores and one load doesn't seem to have a problem. Having the optimizer punt when it gets confused by integer to pointer casts seems acceptable.


It's not sufficient to say "there's an int2ptr cast, so stop optimization." You can break the code up across several function calls, which means the code doing the optimization can't tell that there's an int2ptr cast around to shut down the optimization. (Most optimizations are intraprocedural, and even if they could look interprocedurally, there's no guarantee that the function body is even available for inspection--they could be in different C source files).

Instead, you'd have to prove that there's no possible int2ptr cast anywhere, which is generally an impossible analysis.
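A sketch of why the check can't be local (hypothetical files and names):

```c
/* a.c -- this translation unit contains only the ptr-to-int half: */
#include <stdint.h>
uintptr_t stash(int *p) { return (uintptr_t)p; }

/* b.c -- the int-to-ptr half lives in a different translation unit: */
int *unstash(uintptr_t addr) { return (int *)addr; }

/* c.c -- while optimizing this function the compiler sees no casts
   at all, so a rule like "punt on int2ptr" can never fire here, even
   though the pointer round-trips through an integer across the two
   opaque calls: */
uintptr_t stash(int *p);
int *unstash(uintptr_t addr);

int use(int *x) {
    uintptr_t a = stash(x);
    *unstash(a) = 5;   /* may alias *x; nothing local says so */
    return *x;
}
```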


> It's not sufficient to say "there's an int2ptr cast, so stop optimization."

Complicated or not, it's necessary that optimizations do not break correct code.

There doesn't seem to be a problem (UB or otherwise) in the first function at the top of the article, but the second one has a clear aliasing problem that violates the promise that `restrict` makes. That translation was invalid.


If you can't track provenance to un-restrict the pointers because it's infeasible, then you have to give up on at least one of the optimization passes. In this case, the optimizations used are very fundamental and giving up on any one of them unilaterally would be catastrophic for performance. The provenance models being suggested add more nuance to the model (pointer provenance) so that we can keep all of the optimization passes while preventing these cases from being optimized incorrectly. Weak provenance says we can't optimize away the pointer to integer cast, strict provenance says we must provide provenance for integer to pointer casts. Weak provenance is broadly compatible with existing code (compiler semantics change) whereas strict provenance is not (language semantics change). The tradeoff is that strict provenance leads to better optimization in general.


Catastrophic sounds strong. As far as `restrict` goes, C was never that far behind Fortran in performance.

And if maintaining `restrict` for other passes is really important, maybe the order of the passes should be changed. I'm not pretending compiler optimization is simple, but I can't see any situation where having an incorrect optimization pass run first is the right thing to do. The broken pass needs to be fixed, and it shouldn't have emitted code with incorrect `restrict` on it.


`restrict` pointers have nothing to do with the underlying "object" they point into (an array in this case). `restrict` lets the compiler assume that reads through a restricted pointer or any pointer expressions derived from it are only affected by writes through that pointer or expressions derived from it. There are only two writes in this example: `*x = 0`, which is trivially correct; and `*ptr = 1`, where `ptr` is derived from `x` (via `ptr = (int *)(uintptr_t)x`) so this is also correct. However, it's now easy to see that the optimization replacing `xaddr` with `y2addr` causes undefined behavior since it changes `ptr` to be derived from `y`. The post addresses this in "The blame game" and mentions that integers could carry provenance but that it's infeasible to actually do this.

The weak provenance solution is to ban optimizing the pointer to integer casts since they have a (conceptual) side-effect. The strict provenance proposal points out that the side effect is only observable if the integer is cast back to a pointer, so we can keep optimizing pointer to integer casts and instead ban integer to pointer casts. For example, an operation like `(int *)xaddr` is banned under strict provenance. Instead, we provide a casting operation that includes the pointer to assume the provenance of; something like `(int * with x)xaddr`. With this new provenance-preserving cast, we can see that the final optimization of replacing `*x` with its previously assigned value of `0` is no longer possible because the code in between involves `x`.
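The `(int * with x)xaddr` cast above is hypothetical syntax; Rust ended up shipping this exact shape as `ptr.with_addr(addr)`. A C-flavored sketch of the idea, with a made-up helper standing in for the cast:

```c
#include <stdint.h>

/* Stand-in for the hypothetical `(int * with x)xaddr` cast: build a
   pointer carrying xaddr's address but x's provenance. Doing the
   arithmetic on the pointer itself (rather than round-tripping
   through an integer) is one way to keep provenance intact. */
static int *with_provenance(int *origin, uintptr_t addr) {
    return (int *)((char *)origin + (addr - (uintptr_t)origin));
}

int demo(int *restrict x) {
    *x = 0;
    uintptr_t xaddr = (uintptr_t)x;        /* free to optimize away...   */
    int *ptr = with_provenance(x, xaddr);  /* ...because the provenance  */
    *ptr = 1;                              /* is explicitly taken from x */
    return *x;  /* must reload: the compiler can see x is involved */
}
```

Because `x` appears explicitly in the cast, nothing has to be inferred through integers, and the final `return *x` can no longer be folded to `0`.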


> However, it's now easy to see that the optimization replacing `xaddr` with `y2addr` causes undefined behavior since it changes `ptr` to be derived from `y`.

Yeah this article is great and the framing is pretty perfect. It really shows that optimization passes can't remove information, else they run the risk of tricking later passes. I definitely agree with OP that "the incorrect optimization is the one that removed xaddr"; that optimization seems wild to me. You only know y is x + 1 because of the way it's constructed in the calling function (main). So the compiler... inlines an optimized version of the function that removes most use of x? Isn't that optimizer fundamentally broken? Especially in a language with `volatile` and `restrict`?


It's a static function, so the compiler knows main is the only caller. gcc -O optimizes the whole program down to printf("%d\n", 1).


Sure, but that requires compilation unit level analysis or inlining (when inlined you can include pointer provenance from main), otherwise you can't guarantee the relationship between x and y.

I guess what bugs me about optimizations is that it feels like something _I_ should be doing. Like if GCC told me this code optimizes down to printf 1 and why, I'd question what I was doing (and rightly so). Doing it automatically feels like too much spooky action at a distance.


In the case of the code we're talking about here, gcc/clang do rely on inlining to optimize down to the single printf. I don't think there's any actual compiler that does the dangerous and invalid optimization in the article.


OH! I've clearly misunderstood then. Rereading, it does look like this is just a hypothetical to illustrate the tension between allowing pointer-int-pointer round-trips and foiling analysis based on pointer provenance. OK I'm caught up, thank you haha.


I'm really not sure why the state space would only grow as n^k / k!, or what n and k are in this case. Adding more tiles or more actors would both dramatically increase the size of the state space for that input.


That question should go to the parent comment. I only corrected the assumption.

On your comment though, I don't think there's much "drama" in increasing the state space. Really it is just under 2 bits per cell, times width, times height. I would say it grows exponentially with the size of the board.
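Spelling out the arithmetic under that "just under 2 bits per cell" assumption:

```
states per cell ~= 4 (2 bits)
cells            = width * height
|state space|   ~= 4^(width * height) = 2^(2 * width * height)

e.g. a 10x10 board already gives ~2^200 states --
exponential in board area, not polynomial like n^k / k!.
```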


I would really enjoy a proof that it is not!


I gave the thought some idle time, but it's been so long since I've constructed a proper hardness proof. If I do, I'll definitely make a post about it!

