Insurance: Part 1 - Predictive Analytics in Insurance





By: Natasha Mashanovich, Senior Data Scientist at World Programming, UK

Insurance is "a promise to provide compensation in the future if certain events take place during a specified time period". Unlike many other products, whose cost is known before the product is sold, insurance is a very different 'beast': the ultimate cost of an insurance policy is unknown at the time of purchase. Hence, selling an insurance product carries great financial risk.

In its simplest mathematical form, a product's price is the sum of its cost and profit. The primary aim, and biggest challenge, in the insurance sector is the accurate estimation of the product cost. Over the years, insurers have developed a plethora of tools, methodologies and mathematical models to calculate that cost. The big data revolution, along with advances in data processing, predictive analytics and artificial intelligence, has made this effort more achievable. Nevertheless, the fact that in 2015 the UK motor insurance market made an underwriting profit for the first time since 1994 shows that insurance is an extremely challenging business sector.
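As a worked illustration of the price = cost + profit relationship, the following minimal Python sketch estimates the expected cost of a policy from past experience and adds a profit amount. Splitting the cost into claim frequency and average severity is a common actuarial convention rather than something stated in this article, and all figures and function names are hypothetical.

```python
def expected_cost_per_policy(claim_count, total_claim_amount, policy_years):
    """Estimate the expected annual claim cost of one policy from past experience."""
    frequency = claim_count / policy_years        # claims per policy-year
    severity = total_claim_amount / claim_count   # average cost per claim
    return frequency * severity


def price(expected_cost, profit):
    """Price = cost + profit, as stated in the text."""
    return expected_cost + profit


# Hypothetical portfolio: 1,000 policy-years, 125 claims, 250,000 paid out in total.
cost = expected_cost_per_policy(
    claim_count=125, total_claim_amount=250_000, policy_years=1_000
)
premium = price(cost, profit=25.0)
print(f"expected cost: {cost:.2f}, premium: {premium:.2f}")
```

The cost here is only an estimate based on historical data. The actual claims on any policy sold at this price remain unknown at the time of sale, which is where the financial risk comes from.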

The key facts stated in the latest UK Insurance & Long-term Savings annual report from the Association of British Insurers (ABI) confirm the importance of the insurance industry for the UK's economic strength. The UK insurance industry is the largest in Europe and the fourth largest in the world, with total premium income of £300 billion generated in 2016. There are over 900 authorized general insurers in the UK, with more than 300,000 employees. The value of premiums written is constantly growing, with motor and contents insurance being the largest products. Over 75% of the UK's households have had motor and/or contents insurance. Despite total revenue measured in tens of billions, fine margins and fraudulent claims totaling £800 million led to a £200 million underwriting loss in motor insurance.

Figure 1. UK Insurance Key Facts (source:, 2017)

The paramount objective in the insurance market is, therefore, to set adequate, fair and competitive premiums. With a customer-centric approach, an insurance pricing system should be easy to understand, provide stable rates over time, respond quickly to economic changes, and include loss control that would ultimately provide affordable rates. These requirements are challenging and often conflict with one another, which places a large financial burden on insurers.

To be able to provide premiums, insurers try to answer many unknowns throughout the customer journey (Figure 2) such as: how risky is a customer; should they receive a discount offer; how much discount to offer; how to acquire more customers; how to retain existing customers; what is the likelihood of a customer making a claim and would it be possible to predict the total claim amount; can we identify fraudulent customers; how to encourage customers to purchase other products, and so on.

Figure 2. Customer Interactions throughout the Customer Journey

Calculation of adequate, fair, and competitive insurance policies is the key to answering these questions and ensuring a long-term customer relationship. Hence, insurance pricing, often referred to as ratemaking, is the key driver in the insurance industry and the art of data science in this industry. The two most important insurance concepts responsible for adequate, fair and competitive insurance policies are Pricing and Claims. These concepts, supported by Fraud Detection, are the key analytics elements that contribute to rapid advances in insurance technology innovations (Figure 3).

Figure 3. Insurance Analytics Framework

Table 1 illustrates how Data Science can be utilized across these three insurance concepts and assist in dealing with various business challenges at different stages in the customer lifetime cycle.

| Segment | Challenges | Analytics solution | Typical modeling approach | Business benefits |
|---|---|---|---|---|
| PRICING | The ultimate cost of an insurance policy is not known at the time of sale | Customer level Ratemaking (risk-based pricing) | Generalized linear models (e.g. the GENMOD procedure in the SAS programming language) | Adequate and fair pricing so the Premium = Loss + Profit |
| | Understanding competitive market and its dynamics | Market-based pricing models including: conversion, demand and retention models | Propensity models | Expanding the customer base, competitive advantage |
| | How valuable are my customers? | Customer lifetime value | Survival analysis, segmentation, propensity models | Optimal marketing campaigns |
| | What is a customer pricing tolerance? | Price elasticity | Optimization | Maximizing profit |
| CLAIMS | Reduce high operational/IT cost and maintain customer satisfaction | Claims management framework including first notification of loss | Holistic approach utilizing: propensity and regression models including bodily injury, claim cost, write-off | Real-time decision making, monetization |
| FRAUD DETECTION | Application and claims fraud detection | Fraud detection framework | Holistic approach utilizing: propensity models, anomaly detection, fraud rules, black lists, link analysis | Minimizing loss |

Table 1. Leveraging Data Science for InsurTech
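
To make the risk-based pricing row of Table 1 more concrete, here is a minimal Python sketch of a multiplicative claim-frequency model with two rating factors, fitted by the method of marginal totals. For categorical factors, this iterative scheme satisfies the same estimating equations as a Poisson generalized linear model with a log link, so it stands in for the fit that a GLM procedure would perform; it is not presented as the article's own method. The rating factors, exposures and claim counts are invented for illustration.

```python
# Hypothetical experience data: (age_band, region) -> (policy_years, claim_count)
cells = {
    ("young", "urban"): (100.0, 30),
    ("young", "rural"): (50.0, 10),
    ("mature", "urban"): (200.0, 30),
    ("mature", "rural"): (150.0, 15),
}


def fit_marginal_totals(cells, iterations=200):
    """Fit frequency = age_factor * region_factor so that fitted claims
    match observed claims in total for every level of each rating factor."""
    age_factor = {age: 1.0 for age, _ in cells}
    region_factor = {region: 1.0 for _, region in cells}
    for _ in range(iterations):
        # Rescale each age factor so fitted claims equal observed claims for that age band.
        for age in age_factor:
            observed = sum(n for (a, _), (_, n) in cells.items() if a == age)
            expected = sum(
                e * region_factor[r] for (a, r), (e, _) in cells.items() if a == age
            )
            age_factor[age] = observed / expected
        # Do the same for each region, holding the age factors fixed.
        for region in region_factor:
            observed = sum(n for (_, r), (_, n) in cells.items() if r == region)
            expected = sum(
                e * age_factor[a] for (a, r), (e, _) in cells.items() if r == region
            )
            region_factor[region] = observed / expected
    return age_factor, region_factor


age_factor, region_factor = fit_marginal_totals(cells)
for (age, region), (policy_years, claims) in cells.items():
    fitted = age_factor[age] * region_factor[region]
    print(f"{age:>6} {region:>5}: observed {claims / policy_years:.3f}, fitted {fitted:.3f}")
```

The product of the two factors is the fitted claim frequency per policy-year for a cell; only that product is identified, not the scale of each factor on its own. Multiplying the fitted frequency by an expected claim severity would give a loss estimate to which a profit amount can be added.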

Successful development, implementation and utilization of insurance predictive models depends heavily on the selected analytics platform, which must satisfy an extensive range of requirements including ETL (extract, transform, load) capabilities; data manipulation, preparation and visualization; model building and validation; model deployment; testing; production; and monitoring. Insurers often opt for a mixture of commercial and open-source tools to justify the cost of implementation. However, careful consideration is necessary because this can lead to a suboptimal solution, as the integration process can be time- and resource-consuming.

Figure 4. WPS Analytics Platform for Insurance
