Machine Learning-Driven Credit Risk Modeling: Transforming Loan Default Prediction and Portfolio Management in U.S. Commercial Banking
DOI:
https://doi.org/10.63125/0z894070
Keywords:
Machine Learning, Credit Risk, Loan Default, Predictive Modeling, Banking
Abstract
This study examined the effectiveness of machine learning-driven credit risk modeling in improving loan default prediction and portfolio management within U.S. commercial banking. A quantitative, retrospective longitudinal research design was employed using a dataset of 12,480 loan-level observations that incorporated borrower demographics, financial indicators, and behavioral variables. The dataset exhibited a class imbalance, with 81.9% non-default cases and 18.1% default cases, reflecting realistic credit risk conditions. The study compared traditional statistical methods, particularly logistic regression, with advanced machine learning algorithms including decision trees, random forests, gradient boosting, support vector machines, and neural networks. Descriptive statistics, correlation analysis, and predictive modeling techniques were applied, and the models were then validated with cross-validation and evaluated on accuracy, recall, F1-score, and area under the curve (AUC). The findings demonstrated that machine learning models significantly outperformed the baseline logistic regression model across all evaluation metrics. Logistic regression achieved an accuracy of 78.6% and an AUC of 0.74, whereas gradient boosting reached the highest accuracy of 90.1% and an AUC of 0.93. Random forest followed closely with an accuracy of 89.3% and an AUC of 0.91. Recall for default prediction improved substantially from 65.2% in logistic regression to 82.4% in gradient boosting, indicating enhanced capability in identifying high-risk borrowers. Effect size analysis further confirmed large practical improvements, with ensemble models demonstrating strong gains in predictive discrimination. Subgroup analysis revealed that machine learning models performed particularly well in high-risk borrower segments, achieving accuracy levels above 91%, while performance differences were smaller in low-risk segments.
The study also highlighted the importance of behavioral variables, such as payment delay frequency and credit utilization, which emerged as the most influential predictors of default. Overall, the results confirmed that machine learning approaches provide a more accurate, robust, and scalable framework for credit risk assessment, supporting improved decision-making and portfolio risk management in commercial banking environments.
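To illustrate why recall and F1-score are reported alongside accuracy when only 18.1% of cases are defaults, the following minimal Python sketch computes the threshold-based metrics named in the abstract from confusion-matrix counts. It is an illustration under stated assumptions, not the study's evaluation code. The class `ConfusionCounts`, the helper functions, and both sets of counts are hypothetical. The counts are scaled to 1,000 loans with the abstract's 81.9%/18.1% class split and are not taken from the study's data or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for a binary classifier; positive class = default."""
    tp: int  # defaulting loans correctly flagged
    fp: int  # non-defaulting loans wrongly flagged
    fn: int  # defaulting loans missed
    tn: int  # non-defaulting loans correctly passed


def accuracy(c: ConfusionCounts) -> float:
    """Share of all loans classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total


def recall(c: ConfusionCounts) -> float:
    """Share of actual defaults that were flagged."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of flagged loans that actually defaulted."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# HYPOTHETICAL counts for 1,000 loans (181 defaults, 819 non-defaults).
# These are illustrative assumptions, not figures reported by the study.
always_non_default = ConfusionCounts(tp=0, fp=0, fn=181, tn=819)
illustrative_model = ConfusionCounts(tp=149, fp=60, fn=32, tn=759)

for name, counts in (
    ("always non-default", always_non_default),
    ("illustrative model", illustrative_model),
):
    print(
        f"{name}: accuracy={accuracy(counts):.3f} "
        f"recall={recall(counts):.3f} "
        f"precision={precision(counts):.3f} "
        f"f1={f1_score(counts):.3f}"
    )
```

Under these assumed counts, a classifier that labels every loan as non-default reaches 81.9% accuracy while identifying no defaults at all. This is why recall on the default class is a more informative indicator of a model's ability to identify high-risk borrowers than accuracy alone.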
