This is an interesting article, and I'll probably take a look at the Go language's profiling capabilities and built-in machine-independent assembler.
The idea of using bitmaps for retrieval on multiple indexes isn't new. Not surprisingly, Knuth has a good discussion of the technique [Knuth]; it is explained very well, with a nice illustration using notched recipe cards that is worth taking a peek at.
-- straying off topic below this line --
One surprising note: Calvin N. Mooers invented zatocoding in 1947. The technique used notched cards for information retrieval and is cited by Knuth under the subsection on superimposed coding. Mooers went on to invent the "Reactive Typewriter" in the 1960s, a visionary implementation of how users might interact with computers through a terminal device (like a teletype). Mooers' Reactive Typewriter was programmed using a language he invented named TRAC.
TRAC is a macro-based language. Like the roughly contemporaneous GPM of Christopher Strachey (1965), it is a Turing-complete language built on macros that can define other macros, and so on. I learned about TRAC in the mid-1970s from Ted Nelson's book Computer Lib/Dream Machines [Nelson]; it was one of three languages featured in that book. A fun little book, Etudes for Programmers [Wetherell], which I bought in 1978, inspired me to implement a version of TRAC. It's a nice exercise.
Many years later (I believe it was 1991), during an Open Software Foundation meeting in Cambridge, Massachusetts, I got to meet Mooers in person; he was sitting in the front row during a presentation, and I spoke with him briefly afterwards.
By the way, Christopher Strachey wrote the first paper on the concept of time-sharing in 1959, and in the 1970s, together with Dana Scott, he is credited with the important development in the theory of programming languages known as denotational semantics [Strachey].
[Knuth] Donald Knuth. (1973). The Art of Computer Programming, Vol. 3: Sorting and Searching. Section 6.5, Retrieval on Secondary Keys. Addison-Wesley.
[Nelson] Theodor H. Nelson. (1974). Computer Lib: You Can and Must Understand Computers Now. ISBN 0893470023. OCLC 11182412. https://en.wikipedia.org/wiki/Computer_Lib/Dream_Machines
[Strachey] https://en.wikipedia.org/wiki/Christopher_Strachey#cite_note...
[Wetherell] Charles Wetherell. (1977). Etudes for Programmers. Prentice Hall. https://www.amazon.com/Etudes-Programmers-Charles-Wetherell/...
Writing an object indexer is a fun project. My Python indexer, ducks [1], went through a lot of the same development stages, such as supporting only exact-value matches at first. I tried binning too, but ultimately using a BTree was simpler and performed well enough (500 ns for a log-n tree traversal vs. 50 ns for a hash lookup).
Still looking for a way to apply bitmap indexing in ducks; it would surely speed up multi-index queries, but I haven't found an implementation that's worth the tradeoffs yet.
I was using a Python in-memory vector search engine called Annoy [1] to do semantic search on various kinds of data. It worked great for finding "similar" objects: story A has similar text to story B, image A looks like image B, etc.
So Annoy solved the hard part. But doing basic metadata lookups was surprisingly hard in Python. How do I get all images matching some criteria (say, a size range, or tags)? I'd have to serialize them all into a DB and use a DB index. Databases are great, but they add code bloat and overhead; I'm usually working in Jupyter notebooks and I like keeping as few external dependencies as possible.
So I wrote ducks as a quick, convenient way to index anything.
There are lots of other usage patterns of course; it's very generic. It makes a great Wordle / crossword solver too. "Find me words where the first letter is A and the fifth letter is L" is very fast in ducks.
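To make that concrete, here is a hand-rolled sketch of the same idea using nothing but a dict of sets; this is not the ducks API, just an illustration of keeping one index per (position, letter) attribute so the query reduces to a couple of dict lookups and a set intersection:

    # Illustrative only -- not the ducks API.
    # One index entry per (position, letter); a query is dict lookups + a set intersection.
    from collections import defaultdict

    words = ["apple", "ample", "angel", "bagel", "axial"]

    index = defaultdict(set)              # (position, letter) -> set of matching words
    for w in words:
        for i, ch in enumerate(w):
            index[(i, ch)].add(w)

    # "first letter is A and fifth letter is L"
    matches = index[(0, "a")] & index[(4, "l")]
    print(sorted(matches))                # ['angel', 'axial']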
Indexing is just one of those things you always need. Python didn't have a good way to do it, and now it does!
The slowest step in ducks right now is combining the results of a multi-index query. Suppose I get 1000 objects matching one attribute and another 1000 objects matching another; I've now got to intersect those two sets. Right now I'm using sets, or in some cases, sorted lists that act as sets [1]. But intuitively, a bitwise "and" ought to be faster, due to SIMD etc.
It's really just a matter of thinking about how to write it. This video was quite helpful as it mentioned that Postgres uses a similar approach, so I'll probably have a look at their implementation first and take inspiration from there.
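For what it's worth, here is a minimal sketch of that bitwise-AND idea using plain Python ints as bitmaps; the object positions and helper names are made up for illustration, and there's no explicit SIMD here, just the same bulk operation:

    # Sketch: intersect two query results as bitmaps instead of Python sets.
    # Each object gets a fixed position; a result is an int whose bit i is set
    # when object i matches.

    def to_bitmap(matching_positions):
        """Pack object positions into one big int."""
        bm = 0
        for pos in matching_positions:
            bm |= 1 << pos
        return bm

    def from_bitmap(bm):
        """Unpack a bitmap back into object positions."""
        out, pos = [], 0
        while bm:
            if bm & 1:
                out.append(pos)
            bm >>= 1
            pos += 1
        return out

    a = to_bitmap([3, 17, 42, 99])     # objects matching attribute 1
    b = to_bitmap([5, 17, 99, 123])    # objects matching attribute 2

    both = a & b                       # one bulk AND instead of a set intersection
    print(from_bitmap(both))           # [17, 99]

Even interpreted, the AND runs a machine word at a time; SIMD and compressed formats like Roaring push the same bulk-operation win further.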
> Suppose I get 1000 objects matching one attribute and another 1000 objects matching another; I've now got to intersect those two sets. Right now I'm using sets, or in some cases, sorted lists that act as sets [1]. But intuitively, a bitwise "and" ought to be faster, due to SIMD etc.
I worked on this exact problem today.
This is probably super obvious and I bet you're already doing something similar (or smarter still), but I'm excited about it so I'm telling you anyway.
I was doing the naive O(m log n) algorithm where you fetch one set of items and then test them each against the other index one by one (both implemented as B-trees).
I got an enormous speedup[1] after I realized that you can get a discount on index look-ups by testing a sorted list of multiple items in the same tree traversal operation. Not only does each item tested allow you to reduce the search scope, you can often omit repeating several of the calculations closer to the root.
[1] Eventually 10x, but that's in part due to naively parallelizing the workload. The single-threaded approach was still something like 4-5x faster than the original approach: 20k ops/s for thousands of items tested against an 800 MB index.
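Not the B-tree code described above, but here is a sorted-array sketch of the same "discount" idea, with made-up data: because the probe keys are sorted, each lookup can resume from where the previous one stopped instead of restarting from the left end (the array analogue of not re-descending from the root every time).

    # Batched membership tests of a sorted probe list against a sorted index.
    # Each binary search starts where the previous one ended, so later probes
    # search a shrinking suffix of the index rather than the whole thing.
    import bisect

    def batched_member_test(sorted_probes, sorted_index):
        """Yield the probe keys that are present in sorted_index."""
        lo = 0
        for key in sorted_probes:
            lo = bisect.bisect_left(sorted_index, key, lo)
            if lo < len(sorted_index) and sorted_index[lo] == key:
                yield key

    index = list(range(0, 1_000_000, 3))       # pretend secondary index
    probes = [9, 10, 300_000, 999_999]         # sorted hits from the first index
    print(list(batched_member_test(probes, index)))   # [9, 300000, 999999]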
100%, and they're freaking fast... and pretty efficient, especially for high-throughput applications tracking tons of changing data. We've found bitmaps handle updates, inserts, and schema changes really elegantly, and quickly (like sub-second). Depending on the use case, for cohorts viewed MoM or YoY that freshness doesn't entirely matter, but computing cohorts in real time, hour-over-hour or even minute-over-minute, for in-app actions/personalization is insanely valuable for keeping users engaged (or whatever entity type you are analyzing and taking action on, whether it's devices, nodes across a network, etc.).
I think it's fair to say that the use cases for bitmaps have not yet been fully explored. I believe their use in analytics and OLAP/RTOLAP will continue to grow as the desire to infer and target demographics across households increases.
I didn't realize Lucene used its own flavor of roaring bitmaps! Super cool. I certainly think of search when it comes to Lucene, whereas the cited material from Vikram & FeatureBase targets more general analytics.
Lucene is fast, but probably can't handle analytical type questions on massive amounts of data as fast as a pure binary index can. I could be wrong about that though. I do know Solr has it because Lucene does...
Sure, I completely get it. However, does that mean bitmaps have been fully explored? For example, OLAP is only growing as a market as the need for real-time analysis of the freshest data grows. I mean, we're never going to have 'less' data, nor are we ever going to want results 'slower'. Where I'm going with this: utilizing technologies like bitmaps in OLAP services is an area I believe will see continued growth. While Lucene is a known commodity, I think we can agree it hasn't solved the sector, right?
Yes. Molecula, the company behind Pilosa, has rebranded itself to FeatureBase. It's offering both a managed cloud/serverless option and a downloadable Open Source build for implementing binary indexes. Both ARM and Intel versions are available (but not for Windows, yet). I just joined the team and am working on some interesting demos for machine learning applications.
While FeatureBase is great for doing analytics on massive data sets, it's also well suited for being used as a model's feature store, for both training and inference.
Very approachable introduction to this data structure.
The assembly implementation, though, reminded me of the stone soup story, where a stranger boils a rock in a pot of water, the townspeople, intrigued by the idea of stone soup, bring various ingredients to add to it, and everyone is shocked that "stone soup" could taste so good.
The "Go" implementation ends up being pretty much all assembly by the end, in an article supposedly about a fast bitmap index implementation written in Go.
Three soldiers wander into a town and watch as the townspeople shutter themselves into their homes.
The first two soldiers, out of rations, start knocking on doors and asking if anyone has any food, but no one answers.
The third says: "I have an idea, start a cooking fire in the town square and get some water boiling. I've got to find the right ingredient."
They do as they're told, and the third makes a show of trying to find just the right stone, making sure the town hears and sees them. They bring the "perfect" stone back and place it in the pot of boiling water.
The other two soldiers start to question, but the third tells them to just shut up and watch.
After boiling a stone for 10 to 15 minutes, one of the villagers comes out to ask what the soldiers are doing. "Making stone soup." replies the soldier, "Though it is usually better with some potatoes or carrots."
The villager lights up, "I have carrots!" They run to their home, come back with a handful of carrots, and add them to the soup.
Seeing this, other villagers begin to emerge from their homes, each suggesting a new ingredient.
Over and over this happens until someone tries to put raw salmon in and is thrown out of the town.
I mean, until they have a full and hearty stew going, with more than enough to feed the entire town.
The moral of the story being to always have enough rations.
You can start with Ensopado de Borrego, slowly cooked in black pots like in the Beiras regions, or to stay with fish, Bacalhau à Minhota, from the Minho region.
I'm not sure I follow what it is they're benchmarking (although I struggle to read text that has too many images, so I may be confused as to what they're even doing).
I'd expect the bottleneck in a database index to be your disk or RAM access patterns. It seems very counterintuitive to see a real-world speed-up from SIMD if that is the case. The fact that they're seeing one in the benchmark would suggest maybe the benchmark isn't entirely capturing the right thing.
Generally speaking, the nice thing about bitmap indexes is that you're able to access the data in a very granular way. If you have a WHERE clause that's calling out specific values, you only access the data which is pertinent to those values within a column, you don't have to scan the whole column. This is simply due to the structure of a bitmap index where you have a separate bitmap for each value in the domain of a column.
Furthermore, access patterns for bitmaps tend to be very linear and cache/prefetch friendly.
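As an illustration (not any particular engine's implementation), a toy bitmap index and a two-column WHERE clause might look like this; note the query only touches the bitmaps for the two values it names:

    # One bitmap per distinct value of a column; bit i is set when row i has that value.
    rows = [
        {"color": "red",   "size": "S"},
        {"color": "blue",  "size": "M"},
        {"color": "red",   "size": "M"},
        {"color": "green", "size": "S"},
    ]

    def build_bitmap_index(rows, column):
        index = {}
        for i, row in enumerate(rows):
            index[row[column]] = index.get(row[column], 0) | (1 << i)
        return index

    color_idx = build_bitmap_index(rows, "color")   # {'red': 0b0101, 'blue': 0b0010, 'green': 0b1000}
    size_idx  = build_bitmap_index(rows, "size")

    # WHERE color = 'red' AND size = 'M': AND just the two relevant bitmaps.
    hits = color_idx["red"] & size_idx["M"]
    print([i for i in range(len(rows)) if (hits >> i) & 1])   # [2]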
I think it's very feasible that adding SIMD could result in a real-world speedup in an otherwise well-optimized in-memory system. I agree if you need to go to disk, that will likely dominate the overall performance of a single query, but it may still be overall more efficient which can still help in a multi-user situation.
Disk and memory can both be bottlenecks; however, with compression on Roaring bitmaps you typically keep your entire index in memory and compute in a compressed state, avoiding slow disk. You do in fact become CPU-bound on queries, even with intelligent multi-threading, and CPU efficiency comes into play (especially so as a bitmap begins expanding across several shards).
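For a feel of what "compute in a compressed state" looks like from Python, here's a small sketch assuming the third-party pyroaring package (a wrapper around CRoaring); the data is made up, and the intersection never materializes an uncompressed set:

    # pip install pyroaring  (assumed dependency; the data below is synthetic)
    from pyroaring import BitMap

    active_users   = BitMap(range(0, 10_000_000, 2))      # dense runs compress well
    clicked_signup = BitMap(range(0, 10_000_000, 1000))

    both = active_users & clicked_signup                   # stays in compressed form
    print(len(both))                                       # 10000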
Thank you for sharing this. As someone who is working to learn more about bitmaps, I'd like to ask how you came across this. Is there a specific community or other resource that you follow? Thanks!
I don't think they're unknown to most DB developers, just not the best fit for common query loads. You can scan through a bitmap really fast, yes, but you can also read a list of document IDs from a BTree very quickly for any specific value.
Would like to see some benchmarks of this database for comparison.