Unmasking Black-Box Bias: Interpretable ML for Socioeconomic Inequality in Urban-Rural U.S. Income Prediction
Keywords:
Interpretable Machine Learning, Socioeconomic Inequality, Urban-Rural Divide, Income Prediction, Algorithmic Bias, Explainability, Fairness, SHAP, LIME

Abstract
Income inequality between urban and rural populations in the United States remains a persistent socioeconomic challenge, with significant implications for public policy and equitable resource distribution. This study investigates the use of interpretable machine learning (ML) models to predict income disparities across urban and rural settings while uncovering potential algorithmic biases inherent in traditional black-box models. The primary aim is to enhance both predictive performance and fairness in classifying income levels by leveraging socio-demographic and geographic features. To achieve this, we employed a range of traditional machine learning classifiers, including Logistic Regression, Random Forest, and Gradient Boosting, alongside interpretable counterparts such as Decision Trees, and applied post-hoc explanation tools including SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations). These models were evaluated not only on standard classification metrics such as precision, recall, and F1-score, but also on fairness- and bias-oriented measures, including disparate impact and demographic parity. This dual focus enables a holistic understanding of both model performance and ethical robustness. The results demonstrate that while black-box models offer superior predictive power, interpretable models reveal nuanced patterns of income stratification linked to geographic and demographic variables. SHAP and LIME explanations exposed critical features influencing predictions, such as employment type, education level, and location category, thereby illuminating latent structural inequalities. Moreover, interpretable models provided more transparent decision-making pathways, making them valuable for stakeholders interested in diagnostic and prescriptive analytics.
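The two fairness measures named above can be stated concretely. A minimal sketch, assuming binary income predictions (1 = predicted high income) and a hypothetical urban/rural group label per record; the function names and data are illustrative, not the study's actual pipeline:

```python
def selection_rate(preds, groups, group):
    """Fraction of positive (high-income) predictions within one group."""
    sub = [p for p, g in zip(preds, groups) if g == group]
    return sum(sub) / len(sub)

def disparate_impact(preds, groups, protected="rural", reference="urban"):
    """Ratio of group selection rates; values near 1.0 indicate parity,
    and values below 0.8 breach the common 'four-fifths rule' threshold."""
    return selection_rate(preds, groups, protected) / selection_rate(preds, groups, reference)

def demographic_parity_diff(preds, groups, protected="rural", reference="urban"):
    """Absolute difference in group selection rates; 0.0 indicates parity."""
    return abs(selection_rate(preds, groups, protected)
               - selection_rate(preds, groups, reference))

# Illustrative toy predictions: 5 urban records, then 5 rural records
preds  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
groups = ["urban"] * 5 + ["rural"] * 5

print(disparate_impact(preds, groups))        # rural rate 0.4 / urban rate 0.6
print(demographic_parity_diff(preds, groups))
```

Libraries such as Fairlearn and AIF360 provide production implementations of these metrics; the point of the sketch is only that both reduce to comparing per-group positive-prediction rates.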
In conclusion, this study underscores the importance of integrating interpretable ML into socioeconomic modeling, not merely as a technical enhancement but as a necessary step toward ethical and accountable AI systems. These findings support the adoption of interpretable ML frameworks for socially impactful applications, particularly where fairness, trust, and transparency are paramount. Policymakers can leverage these insights to guide data-driven decisions that promote equity across geographic boundaries.