Random Forest Regression, Negative Variance Explained mechanism
Jeffery Evans (Senior Landscape Ecologist, The Nature Conservancy, Global Lands Science Team; Affiliate Assistant Professor, Zoology & Physiology, University of Wyoming) explains, in a humorous way, why a random forest regression can report a negative percent variance explained:

I have recently been asked the question: “Why do I receive a negative percent variance explained in a random forest regression?” Besides the obvious answer, “because your model is crap”, I thought that I would explain the mechanism at work here, so that the assumption is not that randomForest is producing erroneous results. For poorly supported models it is, in fact, possible to receive a negative percent variance explained.

Generally, explained variance (R²) is defined as:

R² = 1 - sum((y - ŷ)²) / sum((y - mean(y))²)

However, as indicated by Breiman (2001) and the R randomForest documentation, the (regression only) “pseudo R-squared” is derived as:

R² = 1 - (mean squared error) / var(y)

Which, mathe...
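To make the pseudo R-squared formula quoted above concrete, here is a minimal Python sketch, not the randomForest implementation. The function name `pseudo_r_squared` and the data are hypothetical. The predictions stand in for a model's out-of-bag predictions, and the variance is assumed to be the population variance (dividing by n). Under those assumptions, predicting the mean of y gives exactly zero, and any model with a larger mean squared error than that gives a negative value.

```python
def pseudo_r_squared(y, y_hat):
    """Return 1 - MSE / var(y) for observed y and predicted y_hat.

    Assumption: var(y) is the population variance (divide by n), so that
    predicting mean(y) everywhere yields exactly 0.
    """
    n = len(y)
    y_mean = sum(y) / n
    mse = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat)) / n
    var_y = sum((obs - y_mean) ** 2 for obs in y) / n
    return 1 - mse / var_y


# Hypothetical observations and three sets of hypothetical predictions.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_only = [3.0, 3.0, 3.0, 3.0, 3.0]        # always predict mean(y)
worse_than_mean = [5.0, 4.0, 3.0, 2.0, 1.0]  # predictions run the wrong way
close_fit = [1.1, 1.9, 3.2, 3.8, 5.0]        # small errors

r2_mean_only = pseudo_r_squared(y, mean_only)
r2_worse = pseudo_r_squared(y, worse_than_mean)
r2_close = pseudo_r_squared(y, close_fit)

print(r2_mean_only, r2_worse, r2_close)
```

The sign of the result depends only on whether the mean squared error is smaller or larger than the variance of y, so a negative value means the predictions are worse than simply using the mean, not that the calculation has failed.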