From Terabyte to Petabyte: Finding the Bad Guys in the Age of Big Data

Information overload has become a two-way street. We are exposed to a constant stream of data every day—consuming it from social networks, mobile phones, PDAs, the Internet and other sources. But we also give back. Data is collected from each interaction with technology. These massive stores of information and data sets have become so large and so unwieldy that they cannot be processed by traditional database management tools within an acceptable time frame. Welcome to the age of big data.

Big data is a general term used to describe this voluminous amount of unstructured and semi-structured data. Unstructured data, which accounts for approximately 80 percent of an organization's data, is mostly found in text files. Semi-structured data is available electronically in database systems, Web data and data exchange formats. Examples include XML, email and EDI. In order to process these enormous data sets, organizations are turning to non-traditional, innovative technologies that can capture, manage and process the information in its entirety. Technologies being applied to big data include massively parallel processing (MPP) databases, distributed file systems, distributed databases, cloud computing platforms, the Internet, scalable storage systems, data mining and advanced analytics.

The definition of big data is three-dimensional: volume (measured in terabytes and petabytes), velocity (the speed at which data is collected) and variety (the range of data types and sources). Big data has come about for a simple reason: data collection infuses nearly everything we do, from opening a bank account to searching Google to dialing a best friend in Texas. As our lives become ever more technology-based, that collection will continue to feed the explosion of big data.

Extracting Value from Big Data

Walmart logs one million transactions per hour. Twitter logs more than 90 million tweets per day. Facebook users create more than 30 billion pieces of content each month, ranging from Web links, news and blogs to photos. Every website click, device-to-device communication and social media interaction contains critical and valuable information. Big data churns out business intelligence.

For example, PayPal and Amazon have had years to amass databases containing transaction details for hundreds of millions of customers across thousands of merchants. Using this information, they have developed fraud detection tools that depend on huge data sets containing not only financial details for transactions, but IP addresses, browser information, and other technical data to predict, identify, and prevent fraudulent activity.

In an increasingly digital world, aggregating and analyzing large data sets will bring huge benefits to organizations across a broad spectrum of industries, including financial services.

Forward-thinking financial institutions are exploring solutions that will help them analyze the massive amount of information they collect. This information represents granular details of their business operations, customer behavior and interactions. The resulting value can be applied to a number of areas within the institution, including:

  • Risk assessment and management
  • Money laundering and fraud detection
  • Compliance and regulatory reporting
  • Credit risk scoring
  • Customer relationship management (CRM)
  • Trade surveillance and pattern analysis

IT departments face the technical challenge of analyzing and reporting on very large amounts of data in a reasonable amount of time. This time period is dictated by the business users who rely on the analysis and reporting to support compliance, risk management and other strategic functions. To realize maximum value, IT and business decision makers must collaborate on a holistic approach to managing an institution's data.

Recognizing that traditional databases, architecture and methodologies are no longer sufficient, financial institutions are beginning to adopt the new technologies needed to process, discover and analyze large data sets. The advanced analytics that can be performed on the increasing volume, velocity and variety of data generated provide real value to an organization.

Creating Actionable Intelligence to Combat Financial Crime

Advanced analytics is a collection of related techniques and tools based on artificial intelligence, data mining, statistics, predictive analytics, data visualization and natural language processing. Often referred to as discovery analytics, advanced analytics explores business operations and customer interactions at a very granular level that seldom makes it into databases or standard reports. While all these techniques have been around for several years, their use has grown exponentially with the onslaught of big data. According to a 2009 survey by the Data Warehousing Institute, 85 percent of organizations surveyed indicated they would be using some type of advanced analytics within three years.

All industries deal with money in one form or another, be it cash, check, credit card or electronic funds transfer. Banks and other financial institutions handle all of these. Building effective anti-money laundering (AML) programs is not an easy task because money laundering crimes are well hidden and usually mimic normal behavior. Large data sets and the nature of financial crime present challenges to first-generation, rules-based AML solutions, which rely on predefined sets of fixed thresholds. Data quality issues such as missing values, misspellings and abbreviations pose additional challenges. Discovery and predictive analysis, however, are effective in detecting fraud and money laundering in raw source data and in non-standard or poor-quality data.
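
To make the contrast between a fixed threshold and a behavior-based check concrete, the following is a minimal sketch in Python. It is an illustration only, not a description of any vendor's AML product: the 10,000 threshold, the z-score cutoff of 3.0 and the function names are all assumptions chosen for the example.

```python
from statistics import mean, stdev

# Hypothetical fixed threshold, chosen only for illustration.
FIXED_THRESHOLD = 10_000


def rule_based_flag(amount, threshold=FIXED_THRESHOLD):
    """First-generation style rule: flag any amount at or above a fixed threshold."""
    return amount >= threshold


def baseline_flag(history, amount, z_cutoff=3.0):
    """Flag an amount that deviates strongly from the account's own history.

    Uses a simple z-score against past amounts; z_cutoff is an assumed value.
    """
    if len(history) < 2:
        return False  # not enough history to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > z_cutoff


# An account that normally moves about 100 suddenly moves 9,500.
history = [120.0, 95.0, 110.0, 130.0, 105.0]
new_amount = 9_500.0

print(rule_based_flag(new_amount))         # False: just under the fixed threshold
print(baseline_flag(history, new_amount))  # True: far outside this account's normal range
```

The transaction slips under the fixed rule yet stands out sharply against the account's own behavior, which is the kind of case a rules-only system tends to miss.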

The ability to retrieve interrelated data for e-discovery purposes is of primary importance to compliance. Exploratory and discovery-oriented methods of advanced analytics, such as data mining, can quickly retrieve information and facilitate learning from big data. Mining processes also make it possible to establish relationships between data elements. When data mining algorithms and techniques are applied to financial transactions, hidden patterns of funds flow can be identified. These patterns yield scenarios for investigation that can lead to the detection of money laundering and fraud.
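
One very simple form of funds-flow pattern mining is counting how often money moves along the same route. The sketch below assumes transactions are available as (source, destination, amount) tuples; that layout, the account labels and the `min_count` cutoff are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter


def frequent_routes(transfers, min_count=3):
    """Return source-to-destination routes that occur at least min_count times."""
    counts = Counter((src, dst) for src, dst, _amount in transfers)
    return {route: n for route, n in counts.items() if n >= min_count}


# Hypothetical transfers: (source account, destination account, amount).
transfers = [
    ("A", "B", 900),
    ("A", "B", 950),
    ("C", "D", 40),
    ("A", "B", 875),
]

print(frequent_routes(transfers))  # {('A', 'B'): 3}
```

A repeated route is not suspicious in itself; it is a candidate scenario that an investigator or a downstream model would still need to evaluate.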

AML solutions that employ mining and link analysis help investigators relate a large number of objects of different types such as people, bank accounts, businesses and transactions. Powerful algorithms and analysis techniques can help institutions:

  • Detect hidden links between financial transactions based on their co-occurrence
  • Uncover transaction patterns that occur frequently
  • Classify accounts into pre-determined risk categories depending on the risk profiles of account holders
  • Cluster transactions and accounts by similarities and build risk profiles of suspicious transactions and customer accounts
  • Predict the possibility of money laundering activity based on demographic and behavioral variables
  • Identify hidden connections between different accounts based on funds transfer activity and account interactions
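
The last item in the list, finding hidden connections between accounts through funds transfer activity, can be sketched as a graph traversal. The example below is a minimal illustration under stated assumptions: transfers are (source, destination, amount) tuples, the account labels are invented, and a plain breadth-first search stands in for whatever link analysis technique a real AML system would use.

```python
from collections import defaultdict, deque


def build_graph(transfers):
    """Build an undirected graph linking accounts that have exchanged funds."""
    graph = defaultdict(set)
    for src, dst, _amount in transfers:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def linked_accounts(graph, start):
    """Return every account reachable from start through chains of transfers."""
    seen = {start}
    queue = deque([start])
    while queue:
        account = queue.popleft()
        for neighbor in graph.get(account, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical transfers: A and C never transact directly but are linked through B.
transfers = [
    ("A", "B", 900),
    ("B", "C", 850),
    ("D", "E", 40),
]

graph = build_graph(transfers)
print(sorted(linked_accounts(graph, "A")))  # ['A', 'B', 'C']
```

Accounts A and C share no direct transaction, yet the traversal places them in the same group, which is the sort of indirect relationship that is hard to spot by reviewing transactions one at a time.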

The complex web of disparate systems, geographies and functions within financial institutions makes it challenging to understand and manage big data. A holistic approach to risk combined with advanced analytics provides financial institutions with the leading edge needed to stay ahead of money launderers and fraudsters in a dynamic environment.

Join the Revolution

Ultimately, financial institutions will turn to big data platforms in order to meet compliance requirements. Getting on the big data bandwagon is not just a matter of collecting data. In order to turn information into actionable intelligence without additional resource costs, financial institutions must create data and process transparency, enable experimentation and replace some human decision making with executable models. Systems have become powerful and subtle enough to reduce human bias in decision making with self-learning algorithms, and they can do it in real time. This means fewer hunches and more facts. Early adopters will be the first to reap the anti-money laundering and risk mitigation benefits the technology delivers.

Carol Stabile, CAMS, senior business manager, Safe Banking Systems LLC, Mineola, NY, USA,
