
I spent far too much of my last job teaching developers about indexes, query plans, and the underlying join types, and how each affects performance and memory consumption.
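For anyone who hasn't looked at a query plan before, here is a minimal sketch of the kind of thing involved. It assumes SQLite through Python's built-in sqlite3 module, chosen only because it is self-contained; the table, column, and index names are hypothetical, and other engines print their plans differently.

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable detail of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the textual description of the step.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table used purely for illustration.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer_id = 42"

# Without an index on customer_id the planner has to scan the whole table.
before = plan(conn, query)

# With an index the planner can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query)

print("before:", before)
print("after: ", after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of the table and the second a search using the index.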


Just curious: do you have similar knowledge of columnar databases such as Vertica?


Not Vertica, though it looks very interesting. I do have a lot of experience with Redshift. The difficulty is that most data warehouse implementations are fairly bespoke, even down to query planning and execution, so knowledge of Redshift may not fully transfer to Vertica, for instance.


Thanks. But how does one go about learning the internals of these things? It's not like MySQL, SQL Server, or PostgreSQL, where we have tons of books and very detailed documentation. For Vertica we only have the docs, no books, provided as is.

It seems to be the norm for everything that took off around 2010. Of course, many are open source, so those are OK, I guess.


The thing is finding the terminology. In the case of Redshift that is sort key, distribution key, and primary key (though these aren't true primary keys, they do influence the query planner).

It took me a few minutes, but I found that indexes are called projections in Vertica and are more like materialized views than true indexes. Here are the docs with a breakdown: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/Ad...

And here is a general walkthrough of the architecture, including key concepts such as projections: https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/C...

In a few cases I have had to go to published white papers on the technologies as well.

But honestly, it's all searching for the right words and then crawling through docs and papers.



