Machine learning in credit scoring
Many financial institutions use scoring models to lower credit risk in credit appraisals, and in the granting and supervision of credit. Credit scoring models based on classical statistical theory are widely used. However, these models cope less well with large volumes of input data, where some of the assumptions behind classical statistical analysis no longer hold. This reduces the accuracy of predictions and how well the models generalize. In this blog post, we explain how machine learning can be used in credit scoring to produce more accurate scores from large amounts of data.
According to a large number of empirical studies, machine learning techniques, along with other data-mining algorithms based on computational innovation and transformation, seem to perform better when fitting data and forecasting. Machine learning algorithms are designed to learn from large amounts of historical data and then make forecasts.
Take credit scoring for retail bank loans as an example. The typical business process for providing a loan is: accept loan applications, evaluate the credit risk, decide whether to grant the loan, and supervise the repayment of principal and interest. Two problems then arise: how to accelerate the credit appraisal process, and how to supervise the repayment process and make timely adjustments once a possible default has been detected.
To address these two problems, we could build two models: one for the loan origination process and one for the supervision process.
In the origination process, our research population consists of all applicants for loans. Using historical application records, a model can be trained to judge whether a new applicant is reliable enough to be granted a loan, given the applicant’s characteristic indicators such as income, marital status, age, and previous defaults.
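As a rough illustration of the origination model, here is a minimal sketch using logistic regression, which is one common choice and not necessarily the model a given bank would use. Everything in it is an assumption made for the example: the eight application records are invented toy data, the features are pre-scaled by hand, and the 0.5 approval threshold is arbitrary.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def default_probability(weights, bias, features):
    """Estimated probability that an applicant with these features defaults."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = default_probability(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Invented toy records (not real data). Feature order:
# income (in units of 100k), age / 100, married (0 or 1), previous defaults.
applications = [
    (0.80, 0.45, 1, 0), (0.65, 0.38, 1, 0), (0.90, 0.52, 0, 0), (0.55, 0.41, 1, 0),
    (0.20, 0.23, 0, 2), (0.25, 0.30, 0, 1), (0.30, 0.27, 1, 2), (0.15, 0.35, 0, 3),
]
defaulted = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = the loan ended in default

weights, bias = train_logistic(applications, defaulted)

APPROVAL_THRESHOLD = 0.5  # hypothetical cut-off, chosen only for illustration

strong_applicant = (0.85, 0.40, 1, 0)
weak_applicant = (0.20, 0.25, 0, 2)
for applicant in (strong_applicant, weak_applicant):
    p = default_probability(weights, bias, applicant)
    decision = "approve" if p < APPROVAL_THRESHOLD else "decline"
    print(applicant, round(p, 3), decision)
```

In practice the training set would be far larger, the features would be scaled and encoded systematically, and the threshold would be set from the lender's risk appetite rather than fixed at 0.5.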
In the supervising process, our research target is the successful applicant. Using historical repayment records and the characteristics of customers who have completed the entire loan process, we could train another model to judge whether a new customer has a high probability of defaulting. By observing the customer’s repayment record over the first few payback periods and any changes in their characteristics, this model would adjust its assessment as updated information arrives. This automated process is faster and more accurate than the traditional approaches.
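The supervision step can be sketched in a similarly simplified way. The example below swaps the second trained model for a plain frequency table: from completed loans it estimates how often customers defaulted given the number of payments they missed in their first three periods, then scores a current customer. The history, the three-period window, and the 0.5 review threshold are all invented for illustration, and the sketch ignores changes in customer characteristics, which a real model would also use.

```python
from collections import defaultdict

def count_missed(payments, periods):
    """Number of missed payments within the first `periods` payback periods."""
    return sum(1 for paid in payments[:periods] if not paid)

def fit_default_rates(history, periods=3):
    """Estimate P(default | missed payments in the first `periods` periods).

    `history` holds (payments, defaulted) pairs from completed loans, where
    `payments` is a list of booleans (True = paid on time).
    """
    counts = defaultdict(lambda: [0, 0])  # missed -> [defaults, total loans]
    for payments, defaulted in history:
        missed = count_missed(payments, periods)
        counts[missed][0] += int(defaulted)
        counts[missed][1] += 1
    # Laplace smoothing keeps small groups away from exactly 0 or 1.
    return {m: (d + 1) / (t + 2) for m, (d, t) in counts.items()}

def estimate_default_risk(rates, payments, periods=3):
    """Look up the estimated default rate for a customer's early repayment record.

    Assumes every possible missed-payment count appears in the history.
    """
    return rates[count_missed(payments, periods)]

# Invented toy history of completed loans (not real data).
history = [
    ([True, True, True], False), ([True, True, True], False),
    ([True, True, True], False), ([True, True, True], True),
    ([True, False, True], False), ([True, False, True], True),
    ([False, False, True], True), ([False, True, False], True),
    ([False, False, False], True), ([False, False, False], True),
]

rates = fit_default_rates(history)

REVIEW_THRESHOLD = 0.5  # hypothetical cut-off for flagging a loan for review

for record in ([True, True, True], [False, False, True]):
    risk = estimate_default_risk(rates, record)
    status = "flag for review" if risk >= REVIEW_THRESHOLD else "no action"
    print(record, round(risk, 3), status)
```

Re-running the lookup after each new payback period is what lets the lender adjust early, instead of waiting for an actual default.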