
Author here!

I wanted to share a follow-up to this post. https://bluuewhale.github.io/posts/further-optimizing-my-jav...

This time I went back with a profiler and optimized the actual hot path.

A huge chunk of time was going to Objects.equals() because of profile pollution and the missed devirtualization that follows from it.
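Here is a minimal Java sketch of the kind of fix described above. It assumes the table compares a stored key against the lookup key while probing, and the class and method names are hypothetical, not taken from the post. HotSpot records receiver-type profiles per bytecode call site. That means the `a.equals(b)` inside `Objects.equals` is shared by every caller in the JVM and can go megamorphic. Moving the call into the table's own code gives it a private profile.

```java
import java.util.Objects;

public class EqualsSketch {
    // Delegating to Objects.equals: its internal a.equals(b) call site has one
    // type profile shared by every caller in the JVM, so in a large application
    // it often sees many receiver classes and the JIT may not inline equals().
    static boolean viaObjectsEquals(Object stored, Object key) {
        return Objects.equals(stored, key);
    }

    // Same semantics as Objects.equals, but the equals() call site lives in the
    // table's own code, so its profile only reflects keys this code has seen.
    // The reference check first skips the virtual call entirely on identity hits.
    static boolean viaLocalEquals(Object stored, Object key) {
        return stored == key || (stored != null && stored.equals(key));
    }

    public static void main(String[] args) {
        String a = "key";
        String b = new String("key"); // equal but not identical
        System.out.println(viaObjectsEquals(a, b));  // true
        System.out.println(viaLocalEquals(a, b));    // true
        System.out.println(viaLocalEquals(null, b)); // false
    }
}
```

Whether this actually inlines still depends on the JIT and on how many key types reach the table's own call site. If one table class serves many key types in the same JVM, that site can get polluted too.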

After fixing that, the next bottleneck was ARM/NEON “movemask” pain (VectorMask.toLong()), so I tried SWAR… and it ended up faster (even on x86, which I did not expect).
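For reference, here is a sketch of a portable SWAR group match in plain Java, without the Vector API. The eight control bytes are treated as lanes of one `long`. An XOR turns matching lanes into zero bytes, and an exact zero-byte test then leaves 0x80 in each matching lane, so a bit scan replaces `VectorMask.toLong()`. This is the generic technique under my own assumptions (control bytes stored little-endian, 7-bit H2 tags). It is not the author's or Carbon's actual code, and `SwarMatch` and its methods are hypothetical.

```java
import java.util.Arrays;

public class SwarMatch {
    private static final long LSB  = 0x0101010101010101L; // 0x01 in every byte lane
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL; // low 7 bits of every byte lane

    // Packs ctrl[offset..offset+7] into a long, byte i in bits [8i, 8i + 8).
    static long load(byte[] ctrl, int offset) {
        long w = 0;
        for (int i = 0; i < 8; i++) {
            w |= (ctrl[offset + i] & 0xFFL) << (8 * i);
        }
        return w;
    }

    // Returns a mask with 0x80 set in every lane of `group` whose byte equals h2.
    static long match(long group, byte h2) {
        long x = group ^ ((h2 & 0xFFL) * LSB); // matching lanes become 0x00
        long t = (x & LOW7) + LOW7;            // high bit set iff low 7 bits nonzero; no carry between lanes
        return ~(t | x | LOW7);                // high bit set iff the whole lane is zero
    }

    // Converts a match mask into slot indices 0..7, lowest first.
    static int[] indices(long mask) {
        int[] out = new int[Long.bitCount(mask)];
        int n = 0;
        while (mask != 0) {
            out[n++] = Long.numberOfTrailingZeros(mask) >>> 3; // bit 8i+7 -> slot i
            mask &= mask - 1;                                   // clear the lowest set bit
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] ctrl = {0x12, 0x05, 0x12, (byte) 0x80, 0x7F, 0x12, 0x00, (byte) 0x92};
        long mask = match(load(ctrl, 0), (byte) 0x12);
        System.out.println(Arrays.toString(indices(mask))); // [0, 2, 5]
    }
}
```

The cheaper and more common `(x - LSB) & ~x & 0x8080...` zero-byte test can flag a false positive in the lane just above a real match. SwissTable-style tables can tolerate that because every candidate is confirmed by a key comparison anyway. A real implementation would also load the group with a single 8-byte read (for example via a little-endian `MethodHandles.byteArrayViewVarHandle`) rather than a byte loop.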



FYI, we ended up implementing a _really_ nice SWAR version in the Carbon derivative of SwissTable that might be worth looking at for inspiration: https://github.com/carbon-language/carbon-lang/blob/trunk/co...

See the rest of that file and the adjacent `raw_hashtable.h` for the remainder of the SwissTable-like implementation, and `hashing.h` for the hash function.

FWIW, it consistently outperforms SwissTable in some respects, but it uses a weaker, faster hash function: good enough for the hash table, but not suitable for other use cases.
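To give a feel for what a "weaker but faster" table hash can look like, here is a hypothetical Java sketch of a single folded multiply. It takes the 128-bit product and XORs the two halves together. It also shows the SwissTable-style split of the hash into H1 (probe position) and a 7-bit H2 control tag. I have not checked this against Carbon's `hashing.h`, and the constant and names are illustrative only.

```java
public class FastTableHash {
    // Illustrative odd multiplier (the 64-bit golden-ratio constant); not Carbon's.
    private static final long MUL = 0x9E3779B97F4A7C15L;

    // Hypothetical "folded multiply" mix: full 128-bit product, halves XORed together.
    // One multiply is cheap and spreads bits well enough for bucket selection, but it is
    // not designed to resist collision attacks or to serve as a general-purpose hash.
    static long hash(long key, long seed) {
        long a = key ^ seed;
        long lo = a * MUL;                   // low 64 bits of the product
        long hi = Math.multiplyHigh(a, MUL); // high 64 bits (signed, Java 9+)
        return lo ^ hi;
    }

    // SwissTable-style split: H1 picks the probe start, H2 is the 7-bit tag stored in the control byte.
    static long h1(long hash) { return hash >>> 7; }
    static byte h2(long hash) { return (byte) (hash & 0x7F); }

    public static void main(String[] args) {
        long h = hash(42L, 0x1234L);
        System.out.println("h1=" + h1(h) + " h2=" + h2(h));
    }
}
```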