Here's the abstract: Practitioners use feature importance to rank and eliminate weak predictors during model development in an effort to simplify models and improve generality. Unfortunately, they also routinely conflate such feature importance measures with feature impact, the isolated effect of an explanatory variable on the response variable. This can lead to real-world consequences when importance is inappropriately interpreted as impact for business or medical insight purposes. The dominant approach for computing importances is through interrogation of a fitted model, which works well for feature selection, but gives distorted measures of feature impact. The same method applied to the same data set can yield different feature importances, depending on the model, leading us to conclude that impact should be computed directly from the data. While there are nonparametric feature selection algorithms, they typically provide feature rankings, rather than measures of impact or importance. They also typically focus on single-variable associations with the response. In this paper, we give mathematical definitions of feature impact and importance, derived from partial dependence curves, that operate directly on the data. To assess quality, we show that features ranked by these definitions are competitive with existing feature selection techniques using three real data sets for predictive tasks.
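The abstract's claim that "the same method applied to the same data set can yield different feature importances, depending on the model" is easy to demonstrate. A minimal sketch (not code from the paper; the synthetic data set and model choices are illustrative assumptions) comparing impurity-based importances from two fitted models on identical data:

```python
# Sketch: model-derived feature importances depend on the model, not just
# the data. Two models, same data, two different importance vectors.
import numpy as np
from sklearn.datasets import make_regression  # synthetic stand-in data
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

rf = RandomForestRegressor(random_state=0).fit(X, y)
gb = GradientBoostingRegressor(random_state=0).fit(X, y)

print("RF importances:", np.round(rf.feature_importances_, 3))
print("GB importances:", np.round(gb.feature_importances_, 3))
```

Both vectors rank features plausibly for selection purposes, yet their magnitudes disagree, which is exactly why interpreting them as feature *impact* is risky.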
Indeed: Lagrange multipliers are your friend! My problem initially was just the disconnect between the picture and the soft-constraint Lagrange multiplier. I just couldn't figure out how that thresholded like that. :)
The world certainly doesn't need yet another article on the mechanics of regularized linear models. What's lacking is a simple and intuitive explanation for what exactly is going on during regularization. The goal of this article is to explain how regularization behaves visually, dispelling some myths and answering important questions along the way.
haha. :) Yeah, I got stuck for SOOooo long trying to reconcile the standard picture from the ESL book with the math. Turns out they don't match! The picture is for a hard constraint, whereas the math's penalty term just makes bigger coefficients more expensive. Man, that really threw me.
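For anyone hitting the same wall: the two formulations being conflated can be written side by side (standard ridge-regression notation, not taken from ESL verbatim). The ESL picture depicts the hard-constrained problem, while the usual penalty form is the soft version:

```latex
% Hard constraint (the ESL picture): a fixed budget t on coefficient size
\min_{\beta}\; \sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i^{\top}\beta\bigr)^2
\quad \text{subject to} \quad \sum_{j=1}^{p}\beta_j^2 \le t

% Soft penalty (the usual math): larger coefficients are simply more expensive
\min_{\beta}\; \sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i^{\top}\beta\bigr)^2
\;+\; \lambda \sum_{j=1}^{p}\beta_j^2
```

The two are connected through the Lagrangian: for a given constraint budget $t$ there is a corresponding multiplier $\lambda$ with the same minimizer, but the picture shows a sharp boundary while the penalty just tilts the loss surface.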
From an educational point of view, I think it's best not to do a survey of models; rather, it's best to pick one model and first learn everything surrounding it (training, testing, preparing data, etc.).
howdy! We use a pie chart for classifier leaves, despite their bad reputation. For the purpose of indicating purity, the viewer only needs an indication of whether there is a single strong majority category. The viewer does not need to see the exact relationship between elements of the pie chart, which is one key area where pie charts fail.
Indeed. They were the inspiration for this visualization. I wanted to do something for my book with Jeremy Howard https://mlbook.explained.ai/ and those guys show the way, but of course it isn't a general library. Love that r2d3.us page.