Impossible Evidence: Words, Identity and AI—Introduction

“I don’t have a good feeling. I feel scared,” the voice of a terrified 17-year-old girl pleaded over the phone.1 “You know the borders are closed right now, so how am I going to get out?”2

The voice was Kadiza Sultana, and she was desperately trying to plan her escape.3 A year before, in February 2015, the former student at Bethnal Green Academy and two of her classmates left their homes in the U.K. to join the Islamic State group.4 Now, trapped in the organization’s clutches, she feared for her life.

On the other end of the line was Halima Khanom, her sister. Racking her brain, Halima asked Kadiza the chances of escape.5

“Zero,” said Kadiza.

Not long after the call, Kadiza was killed in an airstrike.6

The Devil Is in the Details

Kadiza did not leave without warning. The trio’s departure in February 2015 left a trail of clues that natural language processing technology can now detect and connect, but that technology was not in place (or even possible) at the time.

Scotland Yard interviewed the young women three months before their disappearance.7 A classmate and friend had vanished in the same manner, and law enforcement wrote up reports on the three as well as other at-risk students.

The girls’ social media presence leading up to their departure also bore troubling signs: Their posts had rapidly transformed from normal, life-of-a-student commentary to ideological rhetoric.8

Taken together, the data forms a perilous narrative of radicalization. If the full story had been available to airport security staff, the students would likely still be safe at home, complaining about teachers and prepping for exams.

However, these story fragments were housed in disparate data sources—state and school records, police reports and social media posts—and the pieces literally could not be put together in time.

The story of Kadiza Sultana is a tragic case of a problem that also appears in less tragic settings with wide-reaching impacts. Money laundering is one such area. An estimated 2-5 percent of global GDP, or $800 billion–$2 trillion in current U.S. dollars, is laundered annually worldwide.9 Financial institutions (FIs) are tasked with identifying potential and current criminals among the masses of people they onboard every day, a task that becomes more difficult with time.

The trouble is not that the needed information in these scenarios does not exist. In fact, quite the opposite is true: For the most part, all the intelligence FIs need to determine criminality exists in spades.

Instead, the difficulty lies in organizing and locating this information. It is usually in unstructured text, buried in other unstructured text and scattered throughout the internet and other sources.

Needle in a Stack of Needles: Structured and Unstructured Data

For machines, not all textual data is created equal. Overall, computers like structured data.

Structured data has a known, unambiguous order that computers can easily understand. Tables, spreadsheets, taxonomies and protocols are all examples of structured data. This data labels each row and column with what is inside it and how to interpret it (e.g., “date of birth” or “annual revenue”).

However, much of the identity-relevant information is found in unstructured data.

Unstructured data does not have a formal, clear order that can be easily understood. Unstructured does not mean “no rules”: Prose generally follows the rules of grammar while being unstructured. Instead, unstructured data describes information whose format is difficult for a computer to interpret.

For example, readers can understand the sentence, “The dog jumped,” because they know what type of word “the” is, and they know that verbs like “jump” get certain endings attached and come at the middle or end of a sentence. This simple example becomes more complex when it is changed to, “The quick brown fox jumped over the lazy dog,” which is still not a very complex sentence. However, the rules of grammar that guide both of these sentences make complete sense to people but are not easily understood by computers.
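The contrast between labeled fields and free prose can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names, dates and the regular expression are invented for the example, and the pattern stands in for the kind of brittle, hand-written rule that struggles with unstructured text.

```python
import csv
import io
import re

# Structured: the header row labels each column, so lookup is unambiguous.
# (The record below is invented for illustration.)
table = io.StringIO("name,date_of_birth\nJane Doe,1990-04-12\n")
row = next(csv.DictReader(table))
structured_dob = row["date_of_birth"]

# Unstructured: the same fact expressed in prose. A hand-written pattern
# matches one phrasing and silently misses every other phrasing.
DOB_PATTERN = re.compile(r"born on (\d{4}-\d{2}-\d{2})")

def extract_dob(sentence):
    """Return the date of birth if the sentence uses the expected phrasing."""
    match = DOB_PATTERN.search(sentence)
    return match.group(1) if match else None

found = extract_dob("Jane Doe was born on 1990-04-12 in Leeds.")
missed = extract_dob("Jane Doe, whose birthday is 12 April 1990, lives in Leeds.")
```

Both sentences carry the same information for a human reader, but the rule recovers it from only the first one, which is the gap that language-aware technology is meant to close.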

Because computer systems are really the only systems capable of processing data at internet scale, a very real problem presents itself: Much of the information that matters is found in a form that computers have traditionally been poor at handling.

Needle in a Stack of Needles: Here, There and Everywhere

It does not stop there. The quality of the information also presents a huge challenge to discovery, analysis and unification of identities:

  • Names often vary
  • Attributes can conflict
  • Different languages and scripts will confuse
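As a rough illustration of the first and third points, the sketch below reduces differently written versions of a name to a comparable form and scores their similarity. It uses only Python's standard library; the function names and sample names are hypothetical, and the approach does not attempt transliteration between scripts (for example, Arabic to Latin), which needs far more than this.

```python
import unicodedata
from difflib import SequenceMatcher

def normalize_name(name):
    """Strip accents, lowercase, drop punctuation and sort the tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    # Sorting tokens makes "Surname, Given" and "Given Surname" comparable.
    return " ".join(sorted(cleaned.split()))

def name_similarity(a, b):
    """Return a 0.0-1.0 similarity score between two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()

same_person = name_similarity("Álvarez, José", "Jose Alvarez")
likely_typo = name_similarity("Jose Alvares", "Jose Alvarez")
```

A score near 1.0 suggests the two strings may refer to the same name, but the threshold for treating them as a match is a judgment call that depends on the data.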

As information quality poses a challenge, so does information location. The identities of people, organizations and places are stories, and these stories are broken into difficult-to-reconcile fragments and spread across the digital landscape. These fragments can be found buried in databases, data lakes, log files, historical archives, content systems, social media posts, website pages and document libraries. The list goes on.

Taken together, these challenges present the parameters of a solution. If they are going to assemble comprehensive identities derived from textual data, FIs need technology capable of scanning a variety of different structured and unstructured data sources, finding relevant fragments and resolving them into unified identity stories.
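The idea of resolving fragments into a unified identity can be sketched as follows. This is a deliberately simplified illustration, not a description of any real product: the records, field names and the `match_key`, `resolve` and `conflicts` helpers are all invented, and matching on an exact normalized name is an assumption that a real system would replace with much more robust techniques.

```python
from collections import defaultdict

# Hypothetical fragments from different sources; all values are invented.
fragments = [
    {"source": "school_record", "name": "Jane Doe", "dob": "1999-03-01"},
    {"source": "police_report", "name": "DOE, Jane", "city": "London"},
    {"source": "social_media", "name": "jane doe", "dob": "1999-01-03"},
    {"source": "bank_onboarding", "name": "John Roe", "city": "Leeds"},
]

def match_key(name):
    """Build a crude matching key: lowercase, no punctuation, sorted tokens."""
    tokens = "".join(ch if ch.isalnum() else " " for ch in name.lower()).split()
    return " ".join(sorted(tokens))

def resolve(records):
    """Group fragments by matching key and collect every value seen per field."""
    profiles = defaultdict(lambda: defaultdict(set))
    for record in records:
        profile = profiles[match_key(record["name"])]
        for field, value in record.items():
            if field != "name":
                profile[field].add(value)
    return profiles

def conflicts(profile):
    """List fields (other than source) that hold more than one value."""
    return sorted(f for f, values in profile.items()
                  if f != "source" and len(values) > 1)

profiles = resolve(fragments)
```

Here the three "Jane Doe" fragments collapse into one profile, and `conflicts` flags the date of birth because two sources disagree, which is the kind of conflicting attribute described above.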

To understand this information, created by human minds and meant for consumption by other human minds, technology that mimics human understanding is needed. Enter artificial intelligence.

 Editor’s note: This is an excerpt of the article that will appear in the ACAMS Today March-May 2019 print edition magazine. Please look for the article in its entirety on March 1 on or for your copy in the mail. 

Steve Cohen, COO, Basis Technology, Cambridge, MA, U.S.A.

  1. Katie Forster, “London schoolgirl who ran away to join Isis ‘killed in air strike in Syria,’” Independent, August 11, 2016.
  2. Ibid.
  3. Rohit Kachroo, “Bethnal Green schoolgirl Kadiza Sultana who joined Islamic State ‘killed in airstrike in Syria’, ITV News reveals,” ITV, August 11, 2016.
  4. Katie Forster, “London schoolgirl who ran away to join Isis ‘killed in air strike in Syria,’” Independent, August 11, 2016.
  5. Rohit Kachroo, “Bethnal Green schoolgirl Kadiza Sultana who joined Islamic State ‘killed in airstrike in Syria’, ITV News reveals,” ITV, August 11, 2016.
  6. Ibid.
  7. David Barrett and Martin Evans, “Three ‘Jihadi brides’ from London who travelled to Syria will not face terrorism charges if they return,” The Telegraph, March 10, 2015.
  8. Erin Marie Saltman and Melanie Smith, “‘Till Martyrdom Do Us Part’ Gender and the ISIS Phenomenon,” Institute for Strategic Dialogue, February 2016.
  9. “Money-Laundering and Globalization,” United Nations Office on Drugs and Crime.
