Credit Scoring: Part 6 – Segmentation and Reject Inference
By: Natasha Mashanovich, Senior Data Scientist at World Programming, UK
“Segmentation and reject inference, or keep it simple? – That is the question!” This article explores two additional aspects that often need to be addressed during the scorecard development process: segmentation and reject inference (RI).
How many scorecards? What are the criteria? What is the best practice? These are the common questions we try to answer early in scorecard development, starting with the process of identifying and justifying the number of scorecards, known as segmentation.
Figure 1. Scorecard Segmentation
The initial segmentation pre-assessment is carried out during the business insights analysis. At this stage, the business should be informed of any heterogeneous population segments identified whose characteristics differ too much for them to be treated as a single group, so that an early business decision can be made about accepting multiple scorecards.
The business drivers for segmentation are: (1) marketing, such as product offerings or new markets; (2) different treatments across different groups of customers, for example, based on demographics; and (3) data availability, meaning that different data might be available through different marketing channels, or that some groups of customers might have no credit history available.
The statistical drivers for segmentation assume that there is a sufficient number of observations in each segment, including both “good” and “bad” accounts, and that there are interaction effects, meaning that predictive patterns vary across the segments.
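The first statistical driver can be checked mechanically before any model is built. The sketch below, assuming one segment label and one 0/1 “bad” flag per account, counts “good” and “bad” accounts per segment and flags the segments that meet minimum counts. The function name `segment_sufficiency` and the thresholds `min_goods` and `min_bads` are hypothetical; the minimums are a business and statistical judgement, so no defaults are given.

```python
from collections import Counter


def segment_sufficiency(segments, is_bad, min_goods, min_bads):
    """Count "good" and "bad" accounts per segment and flag segments meeting the minimums.

    min_goods and min_bads are hypothetical, business-chosen thresholds.
    """
    goods = Counter()
    bads = Counter()
    for seg, bad in zip(segments, is_bad):
        if bad:
            bads[seg] += 1
        else:
            goods[seg] += 1
    # (goods, bads, sufficient?) for each segment, in sorted segment order
    return {
        seg: (goods[seg], bads[seg], goods[seg] >= min_goods and bads[seg] >= min_bads)
        for seg in sorted(set(segments))
    }


# Toy example: segment "B" has no "bad" accounts, so it cannot support its own model.
report = segment_sufficiency(["A", "A", "A", "B", "B"], [0, 1, 0, 0, 0], min_goods=2, min_bads=1)
print(report)  # {'A': (2, 1, True), 'B': (2, 0, False)}
```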
Typically, the segmentation process includes the following steps:
- Identify a simple segmentation schema using supervised or unsupervised segmentation.
  - For supervised segmentation, a decision tree is often used to identify the potential segments and capture interaction effects. Alternatively, residuals from an ensemble model can be used to detect interactions in the data.
  - Unsupervised segmentation, such as clustering, can be used to create the segments, but this method does not necessarily capture the interaction effects.
- Identify a set of candidate predictors for each of the segments.
- Build a separate model per segment.
- Justify the segmentation by checking whether:
  - the segmented models have different predictive patterns. Failure to identify new predictive characteristics across segments indicates that the data scientist should search for a better segmentation split or build a single model.
  - the segmented models have similar predictive patterns but with significantly different magnitudes or opposing effects across the segments.
  - the segmented models produce a superior lift in predictive power compared with a single model built on the entire population.
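The last check, superior lift in predictive power, needs a common yardstick. The sketch below assumes the Gini coefficient (2 × AUC, minus 1), a measure widely used for scorecards, and assumes that the single-model and segmented-model PDs are scored on the same hold-out sample. The function names and the toy scores are hypothetical, and whether a given lift counts as “superior” remains a business decision.

```python
def auc(risk_scores, is_bad):
    """Probability that a random "bad" gets a higher risk score than a random "good" (ties count half)."""
    bad_scores = [s for s, b in zip(risk_scores, is_bad) if b]
    good_scores = [s for s, b in zip(risk_scores, is_bad) if not b]
    wins = 0.0
    for sb in bad_scores:
        for sg in good_scores:
            if sb > sg:
                wins += 1.0
            elif sb == sg:
                wins += 0.5
    return wins / (len(bad_scores) * len(good_scores))


def gini(risk_scores, is_bad):
    """Gini coefficient (accuracy ratio) derived from the AUC."""
    return 2.0 * auc(risk_scores, is_bad) - 1.0


def segmentation_lift(single_scores, segmented_scores, is_bad):
    """Gini of the segmented scorecards minus Gini of the single scorecard, same sample."""
    return gini(segmented_scores, is_bad) - gini(single_scores, is_bad)


is_bad = [1, 1, 0, 0]
single = [0.6, 0.3, 0.4, 0.2]     # hypothetical PDs from one model on the whole population
segmented = [0.7, 0.5, 0.3, 0.1]  # hypothetical PDs, each from its own segment's model
print(segmentation_lift(single, segmented, is_bad))  # 0.5
```

The pairwise AUC loop is quadratic and only meant for illustration; on real volumes a rank-based AUC would be used instead.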
Segmentation is an iterative process that requires constant judgement to decide between a single scorecard and multiple segments. In practitioners’ experience, segmentation rarely results in a significant lift, and every effort should be made to produce a single scorecard. Common ways to avoid segmentation include adding variables to the logistic regression to capture interaction effects, or identifying the most predictive variables per segment and combining them into a single model.
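As a minimal illustration of capturing an interaction in a single model instead of segmenting, assume a binary segment flag s (for example, 1 for applicants from one marketing channel and 0 otherwise) and a single predictor x; both are hypothetical. Adding the product term s·x to the logistic regression for the probability p of an account being “bad” gives:

```latex
\ln\frac{p}{1-p} = \beta_0 + \beta_1 x + \beta_2 s + \beta_3 \, s x
```

The effect of x on the log-odds is β1 when s = 0 and β1 + β3 when s = 1, so a significant β3 lets one scorecard express a predictive pattern whose magnitude, or even direction, differs between the two groups.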
Separate scorecards are usually built independently. However, if the reliability of the model factors is an issue, a parent/child model may offer an alternative approach. In this approach, a parent model is developed on the characteristics common to all segments, and its output is used as a predictor in each child model, supplemented by the characteristics unique to that child segment.
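A minimal sketch of how a parent model’s output feeds a child model at scoring time, assuming both are logistic regressions and that the parent’s log-odds is the value passed on. All characteristic names and coefficients are hypothetical and only show the wiring, not fitted values.

```python
import math


def linear_predictor(model, features):
    """Intercept plus the sum of coefficient * feature value over the model's features."""
    return model["intercept"] + sum(c * features[name] for name, c in model["coefs"].items())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parent model, built on characteristics common to all segments.
PARENT = {"intercept": -2.0, "coefs": {"utilisation": 1.5, "months_on_book": -0.02}}

# Hypothetical child model for one segment: the parent's log-odds output is used
# as a predictor, supplemented by a characteristic unique to this segment.
CHILD = {"intercept": 0.1, "coefs": {"parent_logit": 0.9, "bureau_enquiries": 0.3}}


def child_pd(applicant):
    """Probability of "bad" from the child model, fed by the parent model's output."""
    parent_logit = linear_predictor(PARENT, applicant)
    child_features = dict(applicant, parent_logit=parent_logit)
    return sigmoid(linear_predictor(CHILD, child_features))


applicant = {"utilisation": 0.5, "months_on_book": 50, "bureau_enquiries": 2}
print(round(child_pd(applicant), 4))
```

In development the parent would be fitted first on all segments, its output added as a column, and each child then fitted on its own segment only.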
The primary aim of multiple scorecards is to improve the quality of risk assessment compared with a single scorecard. Segmented scorecards should only be used if they offer significant value to the business that outweighs the higher development and implementation costs, the added complexity of the decision management process, the additional scorecard management and the greater use of IT resources.
Application scorecards have a naturally occurring selection bias if the modelling is based solely on the accepted population, whose performance is known. However, a significant group of customers, the rejects, is excluded from the modelling process because their performance is unknown. To address this selection bias, application scorecard models should include both populations. This means that the unknown performance of the rejects needs to be inferred, which is done using reject inference (RI).
Figure 2. Accepts and Rejects Populations
There are a few extra steps required during scorecard development when using RI:
- Build a logistic regression model on the accepts – this is the base_logit_model
- Infer the rejects using a reject inference technique
- Combine the accepts and the inferred rejects into a single dataset (complete_population)
- Build a new logistic regression model on complete_population – this is the final_logit_model
- Validate the final_logit_model
- Create a scorecard model based on the final_logit_model
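The first four steps above can be sketched end to end in plain Python. This is a minimal sketch under several assumptions: a small weighted logistic regression fitted by gradient descent stands in for a proper modelling library, the reject inference technique is passed in as a function (here the simplest “assign bad status to all rejects” option listed in Table 1 below), and the names `fit_logit`, `develop_with_ri` and `assign_all_bad` are hypothetical. The record weights are what fuzzy augmentation would use; validation and scorecard creation are not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(beta, x):
    """PD for one record; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logit(X, y, weights=None, lr=0.1, epochs=2000):
    """Weighted logistic regression by batch gradient descent (illustrative, not production)."""
    weights = weights or [1.0] * len(X)
    beta = [0.0] * (len(X[0]) + 1)
    total = sum(weights)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for x, target, w in zip(X, y, weights):
            err = w * (predict(beta, x) - target)  # weighted residual
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b - lr * g / total for b, g in zip(beta, grad)]
    return beta


def assign_all_bad(base_model, X_rej):
    """Simplest assignment technique: every reject is labelled "bad" with weight 1."""
    return X_rej, [1] * len(X_rej), [1.0] * len(X_rej)


def develop_with_ri(X_acc, y_acc, X_rej, infer):
    base_logit_model = fit_logit(X_acc, y_acc)            # model on the accepts only
    X_inf, y_inf, w_inf = infer(base_logit_model, X_rej)  # infer the rejects
    X_all = X_acc + X_inf                                 # complete_population
    y_all = y_acc + y_inf
    w_all = [1.0] * len(X_acc) + w_inf                    # accepts carry weight 1
    final_logit_model = fit_logit(X_all, y_all, w_all)    # model on complete_population
    return base_logit_model, final_logit_model


X_acc = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5]]
y_acc = [0, 0, 0, 1, 0, 1]
X_rej = [[3.0], [3.5]]
base, final = develop_with_ri(X_acc, y_acc, X_rej, assign_all_bad)
print(predict(base, [3.5]), predict(final, [3.5]))
```

Passing the inference technique as a function keeps the pipeline unchanged when swapping one RI technique for another.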
Figure 3. Scorecard Development using Reject Inference
Reject inference is a form of missing-value treatment in which the outcomes are “missing not at random” (MNAR), which results in significant differences between the accepted and rejected populations. There are two broad approaches to inferring the missing performance, assignment and augmentation, each with its own set of techniques. The most popular techniques within these two approaches are proportional assignment, simple and fuzzy augmentation, and parcelling.
|Assignment techniques|Augmentation techniques|
|---|---|
|Ignore rejects, do not use RI|Simple augmentation|
|Assign “bad” status to all rejects|Fuzzy augmentation|
|Proportional assignment|Case-based inferring|
Table 1. Reject Inference Techniques
Proportional assignment is the random partitioning of the rejects into “good” and “bad” accounts, with a “bad” rate two to five times greater than in the accepted population.
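A minimal sketch of proportional assignment, assuming the accepted population’s “bad” rate is already known. The name `proportional_assignment` and the multiplier `factor=3.0` are illustrative choices within the two-to-five range given above, and the seed only makes the random partition reproducible.

```python
import random


def proportional_assignment(n_rejects, accept_bad_rate, factor=3.0, seed=42):
    """Randomly label rejects as bad (1) or good (0).

    The reject bad rate is factor * accept_bad_rate; the text recommends a
    factor between 2 and 5 (3.0 here is an arbitrary choice).
    """
    target_rate = min(1.0, factor * accept_bad_rate)
    n_bad = round(target_rate * n_rejects)
    labels = [1] * n_bad + [0] * (n_rejects - n_bad)
    random.Random(seed).shuffle(labels)  # random partitioning, reproducible via the seed
    return labels


labels = proportional_assignment(n_rejects=1000, accept_bad_rate=0.05, factor=3.0)
print(sum(labels) / len(labels))  # 0.15
```

Fixing the exact number of “bad” labels, rather than drawing each reject independently, guarantees that the target “bad” rate is hit on small samples.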
Simple augmentation involves scoring the rejects using the base_logit_model and partitioning them into “good” and “bad” accounts based on a cut-off value. The cut-off value is selected so that the “bad” rate in the rejects is two to five times greater than in the accepts.
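A minimal sketch of simple augmentation, assuming the rejects’ scores from the base_logit_model are expressed as predicted probabilities of “bad” (higher means riskier). With a points-based score, where higher means safer, the sort order would flip. The function name and `factor=3.0` are illustrative.

```python
def simple_augmentation(reject_pds, accept_bad_rate, factor=3.0):
    """Label the riskiest rejects "bad" so the reject bad rate is factor * accept_bad_rate.

    reject_pds are the rejects' predicted probabilities of "bad" from base_logit_model.
    Returns the 0/1 labels and the PD cut-off (lowest PD still labelled "bad").
    """
    target_rate = min(1.0, factor * accept_bad_rate)
    n_bad = round(target_rate * len(reject_pds))
    # Reject indices ordered from highest to lowest predicted PD.
    order = sorted(range(len(reject_pds)), key=lambda i: reject_pds[i], reverse=True)
    labels = [0] * len(reject_pds)
    for i in order[:n_bad]:
        labels[i] = 1
    cutoff = reject_pds[order[n_bad - 1]] if n_bad > 0 else None
    return labels, cutoff


pds = [0.1, 0.4, 0.2, 0.3, 0.05, 0.6, 0.15, 0.25, 0.35, 0.5]
labels, cutoff = simple_augmentation(pds, accept_bad_rate=0.1, factor=3.0)
print(labels, cutoff)  # [0, 1, 0, 0, 0, 1, 0, 0, 0, 1] 0.4
```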
Fuzzy augmentation also involves scoring the rejects using the base_logit_model. Each reject record is effectively duplicated into a weighted “bad” component and a weighted “good” component, with both weights derived from the reject’s score. These weights, together with a weight of 1 for every accept, are used to build the final_logit_model. The recommended strategy is again a “bad” rate in the rejects two to five times greater than in the accepts.
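A minimal sketch of fuzzy augmentation, assuming the reject’s score is its predicted probability of “bad” p, so the “bad” copy gets weight p and the “good” copy weight 1 − p. How the two-to-five-times “bad” rate is reached is not specified above; rescaling the PDs so their mean hits the target, as done here, is an assumed calibration step, and all names are hypothetical.

```python
def fuzzy_augmentation(reject_pds, accept_bad_rate=None, factor=None):
    """Split each reject into a weighted "bad" and a weighted "good" record.

    reject_pds are predicted probabilities of "bad" from base_logit_model.
    Returns (reject_index, label, weight) tuples; accepts would be added with weight 1.
    If factor is given, PDs are first rescaled so their mean is factor * accept_bad_rate
    (an assumed calibration step; capping at 1 can pull the mean slightly below target).
    """
    pds = list(reject_pds)
    if factor is not None:
        scale = factor * accept_bad_rate / (sum(pds) / len(pds))
        pds = [min(1.0, p * scale) for p in pds]
    records = []
    for i, p in enumerate(pds):
        records.append((i, 1, p))        # "bad" component, weight = P(bad)
        records.append((i, 0, 1.0 - p))  # "good" component, weight = P(good)
    return records


recs = fuzzy_augmentation([0.1, 0.3], accept_bad_rate=0.1, factor=3.0)
bad_weight = sum(w for _, label, w in recs if label == 1)
print(round(bad_weight / 2, 6))  # 0.3, the reject "bad" rate after rescaling
```

These (label, weight) records would be passed to the final logistic regression through its sample-weight mechanism.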
Parcelling is a hybrid of simple augmentation and proportional assignment. Parcels are created by binning the rejects’ scores, generated using the base_logit_model, into score bands. Proportional assignment is then applied to each parcel, with a “bad” rate two to five times greater than the “bad” rate in the equivalent score band of the accepted population.
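A minimal sketch of parcelling, assuming both accepts and rejects have been scored by the base_logit_model as predicted probabilities of “bad” and that the score bands are given as sorted PD cut-points. The band edges, `factor=3.0` and the fallback to the overall accept “bad” rate for bands without accepts are all assumptions made for illustration.

```python
import bisect
import random


def parcelling(reject_pds, accept_pds, accept_is_bad, band_edges, factor=3.0, seed=42):
    """Proportional assignment applied separately within each score band (parcel).

    band_edges are hypothetical, sorted PD cut-points; band k holds PDs with
    band_edges[k-1] <= pd < band_edges[k]. PDs come from base_logit_model.
    """
    rng = random.Random(seed)
    n_bands = len(band_edges) + 1

    def band(pd):
        return bisect.bisect_right(band_edges, pd)

    # "Bad" rate of the accepts in each score band.
    counts = [0] * n_bands
    bads = [0] * n_bands
    for pd, is_bad in zip(accept_pds, accept_is_bad):
        k = band(pd)
        counts[k] += 1
        bads[k] += is_bad
    overall_rate = sum(bads) / sum(counts)

    # Group the rejects into parcels by score band.
    parcels = [[] for _ in range(n_bands)]
    for i, pd in enumerate(reject_pds):
        parcels[band(pd)].append(i)

    labels = [0] * len(reject_pds)
    for k, members in enumerate(parcels):
        # Assumption: fall back to the overall accept bad rate for bands with no accepts.
        accept_rate = bads[k] / counts[k] if counts[k] else overall_rate
        n_bad = round(min(1.0, factor * accept_rate) * len(members))
        for i in rng.sample(members, n_bad):  # random partitioning within the parcel
            labels[i] = 1
    return labels


accept_pds = [0.05] * 10 + [0.2] * 10 + [0.5] * 10
accept_is_bad = [0] * 10 + [1] + [0] * 9 + [1] * 3 + [0] * 7
reject_pds = [0.05] * 10 + [0.2] * 10 + [0.5] * 10
labels = parcelling(reject_pds, accept_pds, accept_is_bad, band_edges=[0.1, 0.3])
print(sum(labels[:10]), sum(labels[10:20]), sum(labels[20:]))  # 0 3 9
```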
Figure 4. Proportional Assignment
Figure 5. Simple Augmentation
Figure 6. Fuzzy Augmentation
Figure 7. Parcelling