Social media data: The next frontier for identity verification?

Worldwide, roughly 2 billion people [1] are using social media. From its beginnings in 1999, when SixDegrees.com [2] reached 3.5 million users, the number of recognized social media sites has exploded to over 200, with 69 percent of all U.S. Internet users participating on a social networking site. [3] Social media creates ways for businesses and people to connect in a manner and to a degree unlike any previous communication channel, generating an enormous amount of information. With increasing pressure to conduct transactions on the Internet for everything from purchasing cell phone minutes to buying a car, the question for many businesses is obvious: "Can social media data be used for identity verification?"

Two very different schools of thought, fueled by existing legislation, historical practices, and current trends and technologies, are emerging on this topic. On one side, companies providing services in the industry today recognize and appreciate the rigor needed to qualify appropriate data sources. At the same time, providers acknowledge that technical advances have historically been the motivators of efficiency and change. Therefore, as with all new technologies and data, considerable analysis must be completed prior to full adoption.

At its core, the discussion must analyze whether social media data could be used to meet anti-money laundering (AML) and know your customer (KYC) requirements. It is not yet clear how regulators in various regions will regard social media data, or aspects of it, as a valid identity source. Because regulations and AML requirements vary globally, a universal guideline is unlikely. However, some generalizations can be made by looking at a cross section of data requirements across the globe, focusing on countries that are known to have well-defined requirements.

Country           AML-supported data should come from qualified sources meeting these requirements
United States     obtained from consumer reporting agency, public database, or other source
United Kingdom    sufficiently extensive, reliable and accurate
Australia         reliable and independent electronic data
Netherlands       independent source
Singapore         reliable, independent sources

Table: Global sample of AML data requirements. Requirement language is sourced from each government's relevant AML legislation.

The table above shows key descriptive language from several countries' AML/KYC legislation on the requirements for data suitable for identity verification. These descriptors provide a framework for discussing the use of social media data in risk management processes. Specifically, is social media data:

  • Public?
  • Independent?
  • Extensive?
  • Reliable?
  • Accurate?

While the legislative language is largely open to interpretation, social media data generally appears to meet the requirements.

Public, independent, and extensive

Social media data is public by nature, as discussed in more detail later in this article. In addition, social media data is clearly independent, as it is not generally derived from another source. The volume of data is extensive and growing at a staggering rate: Facebook reported 1.01 billion users as of September 30, 2012, a 26 percent increase over the previous year. [4] In the six months ending in January 2012, Google+ amassed over 90 million registered users. While estimates of the total number of users on social media vary widely and are subject to daily revision, it can safely be stated that a sizeable percentage of the world's population of 7 billion [5] is active on social media sites. Hence, social media data is arguably the most extensive and global consumer source in the world, as it has few barriers to growth and no geographical boundaries. While these traits are certainly necessary for the use of social media data in identity verification, the most interesting debate regarding its suitability for AML/KYC use is likely to focus on its accuracy.


The accuracy debate

Although many social media sites prohibit posting of false information, there is little independent verification of the information submitted or posted by users. Researchers studying the accuracy question are finding mixed results. A University of Texas study [6] revealed that, essentially, who you are on Facebook is who you are in real life, at least in terms of personality traits and observable characteristics such as gender and race. A rather intriguing piece of evidence for the accuracy of social site information is the increase in identity fraud perpetrated by finding valid identity information on social sites. Identity fraud increased 13 percent in 2011, and consumer use of social media has been found to be a contributing factor. [7]

Other evidence, however, suggests that the accuracy of user information is declining. Danah Boyd, a senior researcher at Microsoft Research, [8] sees a trend of users asserting control over their social media profiles by creating and deleting Facebook accounts. Microsoft released data from a survey in which 91 percent of respondents reported doing something to manage their overall profile, with 14 percent reporting that they have experienced negative consequences, such as being fired from a job or losing their health insurance, as a result of information on social sites. It is no wonder that, as reported by Pew Internet, [9] more people are moving to unfriending, deleting comments, and untagging photos. This trend toward users managing their personal information may increase the relevancy and reliability of the data while also decreasing the availability of information. Undoubtedly, further studies and discussion of this topic will continue as social media expands, and while there is debate about the level of accuracy, it is clear that some percentage of social media data provides accurate information about the user.

Social media and privacy

Assuming that social media data meets risk mitigation parameters, and potentially comes close to fitting AML/KYC legislative parameters as well, in that it is public, independent, extensive, and somewhat accurate and reliable, the next question is whether its use is prohibited by privacy legislation. The logical conclusion on this topic may be surprising to many. Lothar Determann has written a well-argued and exhaustively cited summary of the facts surrounding privacy and social media for the Stanford Technology Law Review, [10] covering 12 myths concerning privacy and social media. In summary, privacy protections are limited, and Determann generally concludes that, based on the growing number of users, people prefer the benefits of free media over the protection of their personal information. While this may not always be the case, as many experts in the field predict the enactment of additional regulation, very few specific privacy concerns exist surrounding the use of public social media data for identity verification.

Availability of social media data

At this point in the discussion, it can be reasonably concluded that social media data may be suitable for identity verification based on requirements, standards, and the apparent lack of privacy restraints. However, two additional questions remain unanswered: Is social media data available, and why use it over traditional sources?

Most social media sites do not sell their information, and many are increasingly providing assurances to users that their private information will be protected. Information that is in public profiles, however, is open for use and companies are using sophisticated web scraping and data mining technologies to harvest social media data. Clearly, social media data is now available; companies have it and the technology used to obtain it is growing in maturity and capabilities.

Some argue that, despite the legality and increasing feasibility of using social media data for identity verification, there is no reason to use it. This may be true in countries such as the United States or Australia where ample public and government data sources exist to support technology services that require identity verification; however, it may not be true in many other parts of the world that lack such verification sources. Social media data may be the key that provides the ability to verify the identities of many individuals, providing access to the world of technology services to people in regions where technology is emerging such as APAC, South America and Eastern Europe. Social media data may also provide a verification alternative for low-risk consumer transactions worldwide.

The next frontier for identity verification?

The bottom line is that verification is best when different types of sources are used, a fact leading an increasing number of countries to require at least two separate sources for verification. Social media data appears to be a strong candidate for augmenting current processes and could be leveraged in conjunction with traditional sources. Incorporating social media data into verification processes with well-defined source disclosures may be the next frontier for verification providers.
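As an illustration only, the "at least two separate sources" idea can be sketched in a few lines of Python. Everything here is an assumption made for the example, not any provider's actual method or any regulator's rule: the `SourceMatch` record, the `is_verified` function, the source names, and the choice to count a source only when both name and address match are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMatch:
    """One data source's answer about an applicant (hypothetical record)."""
    source_name: str       # hypothetical labels, e.g. "credit_bureau"
    name_matches: bool
    address_matches: bool


def is_verified(matches, min_sources=2):
    """Return True when at least `min_sources` distinct sources confirm.

    Assumption for this sketch: a source confirms the identity only if
    both name and address match. Repeated answers from the same source
    are counted once, so one source cannot satisfy the rule by itself.
    """
    confirming = {
        m.source_name
        for m in matches
        if m.name_matches and m.address_matches
    }
    return len(confirming) >= min_sources


matches = [
    SourceMatch("credit_bureau", True, True),
    SourceMatch("social_profile", True, True),
    SourceMatch("social_profile", True, True),  # duplicate of the same source
]

print(is_verified(matches))      # a traditional source plus a social source
print(is_verified(matches[1:]))  # the social source alone, repeated
```

In this sketch a social media profile can supply the second confirmation alongside a traditional source, but two hits from the same profile still count as a single source.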

Colleen Howell, Managing Director, Global Data Company, Bozeman, Montana, USA, chowell@globaldatacompany.com

  1. The number of people using social media is difficult to quantify, since a person can be on more than one social media site. 2.3 billion is the number of users on the primary U.S. sites; given that the three largest social media sites in China collectively have 1.1 billion users, a figure of 2 billion is defensible. Samantha Felix, "CHARTS: See How Massive Social Media Is Now, By Users and Dollars", Business Insider, 27 September 2012, <http://www.businessinsider.com/how-big-social-media-has-become-2012-9>
  2. Danah M. Boyd, "Social Network Sites: Definition, History, and Scholarship", in Journal of Computer-Mediated Communication, School of Information, University of California-Berkeley. <http://www.postgradolinguistica.ucv.cl/dev/documentos/90,889,Social_network_boyd_2007.pdf>
  3. Joanna Brenner, 'Pew Internet: Social Networking (full detail)', in PewInternet: Pew Internet & American Life Project <http://pewinternet.org/Commentary/2012/March/Pew-Internet-Social-Networking-full-detail.aspx>
  4. Drew Olanoff, 'Facebook Announces Monthly Active Users Were At 1.01 Billion As Of September 30th, An Increase Of 26% Year-Over-Year', in techcrunch.com <http://techcrunch.com/2012/10/23/facebook-announces-monthly-active-users-were-at-1-01-billion-as-of-september-30th/>
  5. United States Census Bureau <http://www.census.gov/population/international/data/idb/worldpopinfo.php>
  6. Tony Fish, 'Sorry there is no difference — You really are the same physcially and digitally!', in My Digital Footprint <http://www.mydigitalfootprint.com/sorry-there-is-no-difference-you-really-are-t> [accessed January 2013]
  7. Jennifer Waters, 'Why ID Thieves Love Social Media', in The Wall Street Journal <http://online.wsj.com/article/SB10001424052702304636404577293851428596744.html>
  8. Quentin Hardy, 'Rethinking Privacy in an Era of Big Data', in Bits <http://bits.blogs.nytimes.com/2012/06/04/rethinking-privacy-in-an-era-of-big-data/> [accessed January 2013]
  9. Mary Madden, 'Privacy management on social media sites', in Pew Internet & American Life Project <http://www.networkworld.com/community/blog/data-privacy-day-social-media-private-data-fair-game-e-discovery-court> [accessed January 2013]
  10. Lothar Determann, 'Social Media Privacy: A Dozen Myths and Facts', Stanford Technology Law Review, 7 (2012), 1-14.
