Random Forest

Predict outcomes and identify the most important variables.

Definition

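Random Forest is a supervised machine learning algorithm that builds an ensemble of decision trees, each trained on a bootstrap sample of the data, and aggregates their predictions (majority vote for classification, averaging for regression). It is relatively robust to outliers and noisy variables, and it provides a measure of variable importance as a by-product of training.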
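To make the definition concrete, here is a minimal sketch of the ensemble idea in plain Python. It is not StatsLab's implementation, which this page does not describe. The function names (`fit_forest`, `oob_error`, and so on), the synthetic data, and the settings (25 trees, depth 4, square-root feature sampling) are all assumptions chosen for illustration. The sketch covers bootstrap sampling, random feature selection at each split, majority voting, and the out-of-bag (OOB) error. It does not compute variable importance.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(rows, labels, features):
    """Return the (feature, threshold) that lowers weighted Gini, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, n_features, rng, depth=0, max_depth=4):
    """Grow a tree; each node considers a random subset of features."""
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    grow = lambda part: build_tree([r for r, _ in part], [y for _, y in part],
                                   n_features, rng, depth + 1, max_depth)
    return (f, t, grow(left), grow(right))

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    n_features = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        tree = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          n_features, rng)
        forest.append((tree, set(idx)))  # remember which rows the tree saw
    return forest

def oob_error(forest, rows, labels):
    """Error rate when each row is scored only by trees that never saw it."""
    wrong = counted = 0
    for i, (row, y) in enumerate(zip(rows, labels)):
        votes = [predict_tree(tree, row) for tree, seen in forest if i not in seen]
        if votes:
            counted += 1
            wrong += int(Counter(votes).most_common(1)[0][0] != y)
    return wrong / counted if counted else float("nan")

# Synthetic data (an assumption): the class depends on feature 0, with 5% label noise.
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
labels = [int((r[0] > 0.5) != (data_rng.random() < 0.05)) for r in rows]

forest = fit_forest(rows, labels)
err = oob_error(forest, rows, labels)
print(f"OOB error: {err:.1%}")
```

Because each tree is trained on a sample drawn with replacement, roughly a third of the rows are left out of any given tree. Scoring each row only with the trees that never saw it gives an error estimate without a separate test set.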

When to use it

Requirements

What StatsLab computes

Worked example

Context: Predicting school dropout (Yes/No) from 12 socio-demographic and academic variables.

Result: AUC = 0.89 · OOB error = 8.2% · Top variable: Absenteeism (importance = 0.31)

Interpretation: Excellent predictive power (AUC = 0.89). Absenteeism is the most predictive variable (importance = 0.31). Based on the out-of-bag (OOB) error, the model correctly classifies about 91.8% of students (100% - 8.2%).
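The arithmetic behind the interpretation can be checked in a few lines. The accuracy figure is simply the complement of the OOB error. The comparison of Absenteeism with an even share of importance assumes that importances are normalized to sum to 1 across the 12 variables, which this page does not state.

```python
oob_error_rate = 0.082          # OOB error reported in the result
n_variables = 12                # number of predictors in the example
top_importance = 0.31           # importance reported for Absenteeism

# Accuracy is the complement of the error rate.
oob_accuracy = 1 - oob_error_rate
print(f"OOB accuracy: {oob_accuracy:.1%}")  # prints "OOB accuracy: 91.8%"

# Assumption: importances sum to 1, so an even split would give 1/12 each.
uniform_share = 1 / n_variables
ratio = top_importance / uniform_share
print(f"Absenteeism carries {ratio:.1f}x an even share of importance")
```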
