Here's the abstract: Practitioners use feature importance to rank and eliminate weak predictors during model development in an effort to simplify models and improve generality. Unfortunately, they also routinely conflate such feature importance measures with feature impact, the isolated effect of an explanatory variable on the response variable. This can lead to real-world consequences when importance is inappropriately interpreted as impact for business or medical insight purposes. The dominant approach for computing importances is through interrogation of a fitted model, which works well for feature selection, but gives distorted measures of feature impact. The same method applied to the same data set can yield different feature importances, depending on the model, leading us to conclude that impact should be computed directly from the data. While there are nonparametric feature selection algorithms, they typically provide feature rankings, rather than measures of impact or importance. They also typically focus on single-variable associations with the response. In this paper, we give mathematical definitions of feature impact and importance, derived from partial dependence curves, that operate directly on the data. To assess quality, we show that features ranked by these definitions are competitive with existing feature selection techniques using three real data sets for predictive tasks.
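The abstract's claim that "the same method applied to the same data set can yield different feature importances, depending on the model" is easy to demonstrate. A minimal sketch (not code from the paper; the synthetic data set and model choices are illustrative assumptions) comparing impurity-based importances from two fitted models on identical data:

```python
# Sketch: model-derived feature importances depend on the model, not just
# the data. Two models, same data, two different importance vectors.
import numpy as np
from sklearn.datasets import make_regression  # synthetic stand-in data
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

rf = RandomForestRegressor(random_state=0).fit(X, y)
gb = GradientBoostingRegressor(random_state=0).fit(X, y)

print("RF importances:", np.round(rf.feature_importances_, 3))
print("GB importances:", np.round(gb.feature_importances_, 3))
```

Both vectors rank features plausibly for selection purposes, yet their magnitudes disagree, which is exactly why interpreting them as feature *impact* is risky.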
Indeed: Lagrange multipliers are your friend! My problem initially was just the disconnect between the picture and the soft-constraint Lagrange multiplier. I just couldn't figure out how that thresholded like that. :)
The world certainly doesn't need yet another article on the mechanics of regularized linear models. What's lacking is a simple and intuitive explanation for what exactly is going on during regularization. The goal of this article is to explain how regularization behaves visually, dispelling some myths and answering important questions along the way.
haha. :) Yeah, I got stuck for SOOooo long trying to reconcile the standard picture from the ESL book with the math. Turns out they don't match! The picture is for a hard constraint, whereas the math's penalty term just makes bigger coefficients more expensive. Man, that really threw me.
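For anyone hitting the same wall: the two formulations being conflated can be written side by side (standard ridge-regression notation, not taken from ESL verbatim). The ESL picture depicts the hard-constrained problem, while the usual penalty form is the soft version:

```latex
% Hard constraint (the ESL picture): a fixed budget t on coefficient size
\min_{\beta}\; \sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i^{\top}\beta\bigr)^2
\quad \text{subject to} \quad \sum_{j=1}^{p}\beta_j^2 \le t

% Soft penalty (the usual math): larger coefficients are simply more expensive
\min_{\beta}\; \sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i^{\top}\beta\bigr)^2
\;+\; \lambda \sum_{j=1}^{p}\beta_j^2
```

The two are connected through the Lagrangian: for a given constraint budget $t$ there is a corresponding multiplier $\lambda$ with the same minimizer, but the picture shows a sharp boundary while the penalty just tilts the loss surface.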
From an educational point of view, I think it's best not to do a survey of models; rather, it's best to pick one model and first learn everything surrounding it (training, testing, preparing data, etc.).
howdy! We use a pie chart for classifier leaves, despite their bad reputation. For the purpose of indicating purity, the viewer only needs an indication of whether there is a single strong majority category. The viewer does not need to see the exact relationship between elements of the pie chart, which is one key area where pie charts fail.
Indeed. They were the inspiration for this visualization. I wanted to do something for my book with Jeremy Howard https://mlbook.explained.ai/ and those guys show the way, but of course it isn't a general library. Love that r2d3.us page.