Big Data for Fraud Detection
It is widely recognized that the volume of data is increasing at breakneck speed, with some research estimating that the world’s information is doubling every two years. But volume is only part of the equation. Big data also refers to variety and velocity.
James Ruotolo, SAS
Insurers have access to an ever-increasing variety of information they can use across every part of the business. Social media interactions, unstructured text, machine-to-machine (M2M) data, and visual media content like photos and videos are all rich sources of information that insurers want to explore.
[For more of James Ruotolo's insights on the application of analytics to the fraud challenge, see Operationalizing a Fraud Detection Solution: Buy or Build?]
And this information is being created and updated with increasing frequency. Telematics programs, for example, generate data faster than insurers can consume it. Dealing with this velocity can be as big a challenge as the volume of data itself.
How does all this help detect insurance fraud?
New data sources
One of the most exciting aspects of high-performance analytics is the ability to use data sources that were previously ignored because they were too large or changed too frequently for more traditional analytical approaches. For example, insurers can refresh fraud scoring in real time with streaming telematics data or social media updates. Or insurers could look for aberrant patterns of employee behavior within huge log files from claims or bill processing systems.
Historically, these sources were too large or changed too often to be useful in traditional fraud scoring, which relies on overnight batch processing and often takes hours or days to run. With high-performance analytics, models that evaluate billions of rows of data can run in mere seconds.
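As a rough illustration of what real-time re-scoring can look like in code, the sketch below refreshes a claim's fraud score each time a streaming telematics event arrives, rather than waiting for an overnight batch run. The event fields, claim profile, and scoring formula are hypothetical placeholders, not any vendor's actual model or API.

```python
# Minimal sketch: re-score a claim as streaming telematics events arrive.
# All field names and the scoring formula are illustrative placeholders.

from dataclasses import dataclass


@dataclass
class ClaimProfile:
    claim_id: str
    hard_brake_count: int = 0
    miles_observed: float = 0.0
    fraud_score: float = 0.0


def score(profile: ClaimProfile) -> float:
    """Stand-in for a pre-trained fraud model; returns a score in [0, 1]."""
    rate = profile.hard_brake_count / max(profile.miles_observed, 1.0)
    return min(1.0, 0.1 + 5.0 * rate)  # toy formula, not a real model


profiles: dict[str, ClaimProfile] = {}


def on_telematics_event(event: dict) -> float:
    """Refresh the claim's fraud score as soon as a telematics message arrives."""
    p = profiles.setdefault(event["claim_id"], ClaimProfile(event["claim_id"]))
    p.hard_brake_count += event.get("hard_brakes", 0)
    p.miles_observed += event.get("miles", 0.0)
    p.fraud_score = score(p)
    return p.fraud_score


# Two events for the same claim update its score immediately - no batch window needed.
print(on_telematics_event({"claim_id": "C-1001", "miles": 12.0, "hard_brakes": 1}))
print(on_telematics_event({"claim_id": "C-1001", "miles": 3.0, "hard_brakes": 2}))
```

The same event-driven pattern applies to the log-file example: each new processing log record would update an employee behavior profile and trigger a fresh score.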
The death of sampling
The insurance industry has long relied on red-flag business rules to drive the detection of suspicious claims. In recent years, there has been a slow evolution toward more advanced analytical techniques, like supervised predictive modeling, to improve this process. With the advent of big data, high-performance analytics technology represents an opportunity to completely revolutionize the way fraud is detected.
Since advanced models on large data sets can be run in seconds, many organizations are forgoing sampling – where models are built or run with a subset of data – in favor of simply using all the data. This eliminates the error that sampling methods introduce.
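The sketch below illustrates the trade-off in miniature: the same model fitted on a 1 percent sample and on the full data set, using scikit-learn on synthetic claim features. The data, features, and model choice are assumptions made purely for illustration, not the modeling approach described in the article.

```python
# Minimal sketch of "use all the data" vs. a traditional sample.
# Synthetic data and placeholder features; not any insurer's actual model.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1_000_000                                    # the full claim history
X = rng.normal(size=(n, 5))                      # placeholder claim features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 2).astype(int)  # minority-class label

sample_idx = rng.choice(n, size=n // 100, replace=False)   # traditional 1% sample

model_sample = LogisticRegression(max_iter=1000).fit(X[sample_idx], y[sample_idx])
model_full = LogisticRegression(max_iter=1000).fit(X, y)

for name, model in [("1% sample", model_sample), ("full data", model_full)]:
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    print(f"{name}: AUC = {auc:.4f}")
```

With distributed, in-memory processing, the full-data fit becomes as routine as the sampled one, which is the point of the argument above.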
Wash, rinse, repeat
One of the other big benefits of high-performance analytics is the ability to test changes in real time, on production data. In the context of insurance fraud detection, this means that analysts or administrators can test modifications to parameters, thresholds or new detection scenarios in real time against their entire claim population. This process shows what impact those changes would have on referral volume, false-positive rates and investigation counts.
With this capability, insurers can constantly tweak their fraud detection algorithms to maximize results. Instead of retuning a model once or twice per year, analysts can constantly test and deploy fraud detection scenarios.
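A minimal sketch of that what-if loop, assuming a claim population that has already been scored and labeled with known outcomes: sweep candidate referral thresholds across all claims and report the referral volume and false-positive rate each threshold would produce. The scores, fraud rate, and thresholds below are synthetic and purely illustrative.

```python
# Minimal sketch of what-if threshold testing over the whole claim population.
# Scores and outcome labels are synthetic; numbers are illustrative only.

import numpy as np

rng = np.random.default_rng(1)
n_claims = 2_000_000
fraud = rng.random(n_claims) < 0.02              # ~2% known fraud (assumed rate)
scores = np.where(fraud,
                  rng.beta(5, 2, n_claims),      # fraudulent claims tend to score higher
                  rng.beta(2, 5, n_claims))

for threshold in (0.60, 0.70, 0.80, 0.90):
    referred = scores >= threshold
    referral_volume = int(referred.sum())
    false_positives = int((referred & ~fraud).sum())
    fp_rate = false_positives / max(referral_volume, 1)
    print(f"threshold {threshold:.2f}: {referral_volume:,} referrals, "
          f"false-positive rate {fp_rate:.1%}")
```

Because the sweep runs against every claim rather than a sample, analysts can see the expected investigation workload for a proposed change before deploying it.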
Big benefits
High-performance analytics is not just another technology fad. It represents a revolutionary change in the way organizations harness data. With new distributed computing options like in-memory processing on commodity hardware, insurers can have access to a flexible and scalable real-time big data analytics solution at a reasonable cost. This is sure to change the way insurance companies manage big data across their business – especially in detecting fraud.
About the Author: James Ruotolo is an insurance fraud technologist, thought leader and the principal for Insurance Fraud Solutions at SAS. Readers can connect with him on Twitter or LinkedIn.