Like just about everything else in the world of information technology here in 2019, the impact of artificial intelligence and machine learning is being felt in the relatively new sector of graph databases.
Redwood City, Calif.-based TigerGraph, which bills itself as “the only scalable graph database for the enterprise,” on March 21 will introduce its latest release, TigerGraph 2.4. This is way more than a simple point release; it’s the first time 7-year-old TigerGraph has combined graph pattern matching with real-time deep link analytics — a mix ideal for fraud and money laundering detection, security analytics, personalized recommendation engines, artificial intelligence and machine learning.
The company said the new release makes it easier for enterprises to use deep computational analytics to gain insights from data.
TigerGraph stores all data sources in a single, unified multiple-graph store that can scale out and up easily and efficiently to explore, discover and predict relationships.
Graph databases are a key ingredient in the secret development sauce that makes mega-websites such as Facebook, Google, LinkedIn and others like them work so fast and accurately. A graph database uses graph structures made of nodes, edges and properties to represent and store data and to answer semantic queries over it. Because relationships are stored directly as edges between connected records, a query can follow those connections without the costly table joins a conventional relational database needs, so connected data can be retrieved much faster.
Other graph database providers in this market include Neo4j, ArangoDB, OrientDB, FlockDB, Titan, Amazon Neptune, IBM Graph and Azure Cosmos DB; Apache Giraph, a graph processing framework, is also used for large-scale graph analytics.
Lots of Tech Coming Here, but It’s All Valuable
This gets pretty techy from here on, but well-informed database administrators and IT managers will obtain value from it.
Pattern matching has been around for a long time, but business insights from the technique have been dogged by two problems: difficulty in scaling the computational requirements for large datasets and an inability to perform deep-link analytics, which requires going more than three hops or levels deep into the dataset.
In this context, deep-link analytics means following chains of relationships many hops away from a starting entity to uncover indirect connections. It should not be confused with mobile-app deep linking, which lets users jump directly to specific screens inside an app.
A hop is a single step across one edge, from one node in the graph to a neighboring node. A query that starts at a customer account, moves to a payment made from that account and then to the account that received the payment has traveled two hops. The hop count is the number of edges traversed between the starting node and the final destination. (The term is borrowed from networking, where each router or gateway a packet passes through counts as a hop.)
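To make hop counts concrete, here is a minimal sketch that computes, for a toy graph, the fewest hops needed to reach each node from a starting node using breadth-first search. The graph, node names and the `hop_counts` helper are hypothetical illustrations, not TigerGraph code; the graph is assumed to be a simple adjacency dictionary.

```python
from collections import deque

def hop_counts(graph, start):
    """Return the fewest hops (edges) needed to reach each node from `start`.

    `graph` is an adjacency dict mapping each node to its neighboring nodes.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in hops:  # breadth-first: first visit uses the fewest hops
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

# Hypothetical chain: account -> payment -> account -> payment -> account
graph = {
    "acct_A": ["pay_1"],
    "pay_1": ["acct_B"],
    "acct_B": ["pay_2"],
    "pay_2": ["acct_C"],
}
print(hop_counts(graph, "acct_A"))
# {'acct_A': 0, 'pay_1': 1, 'acct_B': 2, 'pay_2': 3, 'acct_C': 4}
```

In this toy graph, reaching `acct_C` takes four hops, already beyond the two-to-three-level depth the article attributes to other graph databases.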
Determining Data Ownership in Financial Services
For example, determining ultimate beneficial ownership in banking and financial services means traversing from each subsidiary to its parent business unit all the way up to the corporate headquarters, looking up the key stakeholders for each organization and adding up the ownership portions for each stakeholder across the corporate structure. With every one of these hops, the amount of data in the search expands exponentially, requiring massively parallel computation to traverse it, TigerGraph said.
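The ownership arithmetic can be sketched as follows, assuming a hypothetical corporate structure in which each edge carries the fraction one entity owns of another. A person's effective stake in a company is the sum, over every ownership path, of the stakes multiplied along that path. This is a simple single-threaded illustration of the calculation, not TigerGraph's parallel implementation; the entity names and `effective_ownership` helper are made up for the example.

```python
# Hypothetical ownership graph: owner -> list of (owned entity, fraction owned)
OWNS = {
    "Alice": [("HoldCo", 0.60)],
    "Bob": [("HoldCo", 0.40), ("SubB", 0.20)],
    "HoldCo": [("SubA", 0.50), ("SubB", 0.80)],
    "SubA": [("OpCo", 0.70)],
    "SubB": [("OpCo", 0.30)],
}

def effective_ownership(owns, owner, target, max_hops=10):
    """Sum, over every ownership path from `owner` to `target`, the product of the stakes on that path."""
    total = 0.0

    def walk(entity, stake, hops):
        nonlocal total
        if entity == target:
            total += stake  # one complete path: add its multiplied-through stake
            return
        if hops == max_hops:  # cap the depth; also stops infinite loops if ownership is circular
            return
        for child, fraction in owns.get(entity, []):
            walk(child, stake * fraction, hops + 1)

    walk(owner, 1.0, 0)
    return total

for person in ("Alice", "Bob"):
    print(person, round(effective_ownership(OWNS, person, "OpCo"), 3))
# Alice 0.354
# Bob 0.296
```

Bob's stake reaches OpCo along three separate paths, which illustrates why the number of paths to examine grows so quickly as the hierarchy deepens.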
Each new hop opens up a new world of information, but up to now, graph databases have only been able to scratch the surface because of their inability to handle these increasingly complex computations.
AI and ML developers, for example, have long sought deeper analysis of interconnected data. The deeper the insights, the better the patterns and corresponding features, which leads to more accurate outcomes for business initiatives, TigerGraph said.
“Unlike other graph databases that delve two to three levels deep into the connected data, TigerGraph’s pattern analytics is tuned to be efficient and tractable with the ability to go 10 or more levels deep into the interconnected entities and calculate risk or similarity scores based on multi-dimensional criteria in real-time,” said Dr. Yu Xu, CEO and founder of TigerGraph. “Efficient graph analytics is more than just a great massively parallel processing engine; it’s understanding what users want to know and focusing on that, and pruning away the rest.”
Defined Starting Point
Standard pattern-matching solutions have a defined starting point, such as a specific customer account or payment, and a well-defined pattern with a fixed number of hops, such as traversal from a customer account to all the payments originating from the account to recipients of those payments, etc.
By contrast, discovering fraud or money-laundering loops is complex because such a pattern has no defined starting point: the payment may originate from any customer account. Nor does it have a defined number of hops, because fraudsters and money launderers often use 10 or more layers of synthetic accounts to hide their activities, TigerGraph said.
With its massively parallel processing (MPP) engine, TigerGraph 2.4 addresses both standard and complex pattern matching for datasets of all sizes, the company said.
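A brute-force version of loop discovery helps show why the problem is hard: with no defined starting point, every account must be tried as an origin, and each search may run many hops deep. The sketch below, a hypothetical illustration rather than TigerGraph's algorithm, enumerates simple payment cycles up to a chosen `max_hops` in a toy payment graph.

```python
def find_payment_loops(payments, max_hops=10):
    """Find payment loops (simple directed cycles) of up to `max_hops` payments.

    `payments` maps each sender account to the accounts it has paid.
    Each loop is reported once, starting from its smallest account ID.
    """
    loops = []

    def extend(start, path):
        for nxt in payments.get(path[-1], []):
            if nxt == start:
                loops.append(path + [start])  # money came back to where it started
            # Only visit accounts that sort after the origin, so each loop is found once.
            elif nxt > start and nxt not in path and len(path) < max_hops:
                extend(start, path + [nxt])

    # No defined starting point: every account is tried as a potential origin.
    for start in sorted(payments):
        extend(start, [start])
    return loops

# Hypothetical payments: A -> B -> C -> A is a loop; C -> D -> E is not.
payments = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"]}
print(find_payment_loops(payments))               # [['A', 'B', 'C', 'A']]
print(find_payment_loops(payments, max_hops=2))   # []
```

Because the search branches at every account and repeats from every origin, its cost grows rapidly with depth, so a production system would need to parallelize and prune it.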
TigerGraph’s GSQL (graph structured query language) pattern-matching support enables users to express multi-hop queries in a compact, easy-to-read format. Because an entire multi-hop pattern fits on one line, it is easier to see exactly which patterns feed an analysis or serve as features for machine learning.
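GSQL syntax itself is not reproduced here. As a language-neutral illustration of the idea, the hypothetical Python helper below accepts a whole multi-hop pattern as one compact sequence of edge types and returns every path that follows it. The edge types (`SENDS`, `TO`, `OWNED_BY`) and the `match_pattern` function are assumptions made for the example.

```python
def match_pattern(edges, start, pattern):
    """Return every path from `start` that follows the edge types in `pattern`, in order.

    `edges` maps each node to a list of (edge_type, target_node) pairs.
    `pattern` is a sequence of edge types, one per hop.
    """
    paths = [[start]]
    for edge_type in pattern:  # extend every partial path by one hop
        paths = [
            path + [target]
            for path in paths
            for etype, target in edges.get(path[-1], [])
            if etype == edge_type
        ]
    return paths

edges = {
    "acct_A": [("SENDS", "pay_1"), ("SENDS", "pay_2"), ("OWNED_BY", "cust_1")],
    "pay_1": [("TO", "acct_B")],
    "pay_2": [("TO", "acct_C")],
}
# The two-hop pattern "account -SENDS-> payment -TO-> account" fits on one line:
print(match_pattern(edges, "acct_A", ("SENDS", "TO")))
# [['acct_A', 'pay_1', 'acct_B'], ['acct_A', 'pay_2', 'acct_C']]
```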
Massively Parallel Graph Engine
In addition, TigerGraph says its MPP graph engine delivers scalable, efficient performance for graph analytics of any size by combining the newly added pattern-matching query syntax with a GSQL feature called accumulators. Accumulators allow data scientists and developers to define multi-dimensional criteria for computing a score or ranking that expresses how closely a candidate, such as a customer or a payment, matches a given pattern.
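The accumulator idea can be approximated in plain Python: keep running totals per account, let every payment edge that is visited add its contribution, then combine the totals into one score using weighted criteria. The sketch below is a hypothetical, single-threaded analogy, not GSQL; the payments, suspected accounts and weights are all made up, and it scores only sending accounts for brevity.

```python
from collections import defaultdict

# Hypothetical payments: (sender, receiver, amount in dollars)
PAYMENTS = [
    ("acct_A", "acct_X", 9_000),
    ("acct_A", "acct_X", 9_500),
    ("acct_A", "acct_B", 200),
    ("acct_C", "acct_B", 50),
]
SUSPECTED = {"acct_X"}  # accounts already suspected of fraud or money laundering
# Hypothetical weights for each scoring dimension (size, frequency, percentage)
WEIGHTS = {"size": 0.0001, "frequency": 0.5, "percentage": 2.0}

def risk_scores(payments, suspected, weights):
    # Per-account running totals, loosely analogous to accumulators attached to
    # each account vertex: every payment edge adds its contribution to the sender.
    sent = defaultdict(int)
    suspect_count = defaultdict(int)
    suspect_amount = defaultdict(float)
    for sender, receiver, amount in payments:
        sent[sender] += 1
        if receiver in suspected:
            suspect_count[sender] += 1
            suspect_amount[sender] += amount
    scores = {}
    for acct in sent:
        percentage = suspect_count[acct] / sent[acct]  # share of payments to suspects
        scores[acct] = (weights["size"] * suspect_amount[acct]
                        + weights["frequency"] * suspect_count[acct]
                        + weights["percentage"] * percentage)
    return scores

print(risk_scores(PAYMENTS, SUSPECTED, WEIGHTS))
```

Here `acct_A` scores high on all three dimensions while `acct_C` scores zero; keeping each dimension as a separate total also makes it easy to explain which criteria drove a score.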
Examples of how the accumulators work with the new pattern matching queries in GSQL include:
- Next-Generation Recommendation Engine: Traditional recommendation engines look at products purchased by a customer, find other “similar” customers who have purchased these items and consider other products or items purchased by these customers as recommendations for the customer in question. Accumulators can be used to define more comprehensive and discerning criteria for selecting “similar” customers, based on the demographics of the customers, recency of the shared or common purchased items with the customer in question, the total spend in the shared items category and a similarity or likeness score based on the characteristics of the purchased items.
- Fraud and Money-Laundering Detection: Fraud detection looks for transactional patterns similar to those of known cases of fraud or money laundering. Accumulators combined with the pattern matching in TigerGraph allow data scientists to define multi-dimensional criteria for fraud or money-laundering detection. As new payments come in every second, accumulators recalculate a fraud or money-laundering risk score for each payment, as well as for each account sending or receiving the payment, based on multi-dimensional scoring criteria such as the size, frequency and percentage of payments with other accounts suspected of being involved in fraud or money laundering.
This is combined with pattern matching as many as 10 levels deep in the payment and customer account graph to flag potentially fraudulent transactions and accounts that have crossed the threshold of acceptable risk and need to be investigated by the fraud or anti-money laundering analysts.
TigerGraph contends that only the speed of a graph database makes this much processing possible within microseconds.
- Powering AI and Machine Learning with Real-Time Deep Link-Pattern Analytics: Explainable AI demands traceability of every decision, whether it’s a recommendation of a particular product or service to a customer or flagging an account for being involved in fraud or money laundering. The accumulators “show the math” involved in arriving at every decision, allowing companies and government agencies to roll out explainable AI solutions to customer-facing employees as well as to end consumers. Graph-based features are computed for each pattern and fed into the machine learning solution to improve accuracy for multiple use cases, including recommendation engines, fraud and money-laundering detection, customer 360 and cybersecurity.
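The recommendation example above can be sketched in a way that also "shows the math": each candidate customer's similarity is kept as a per-criterion breakdown (shared items, recency of shared purchases, spend on shared items) before the criteria are summed for ranking. This is a hypothetical illustration with made-up purchases, an assumed recency decay and equal weights; it omits the demographic criteria the article mentions and does not reflect TigerGraph's actual scoring.

```python
from dataclasses import dataclass

@dataclass
class Purchase:
    customer: str
    item: str
    days_ago: int
    amount: float

# Hypothetical purchase history
PURCHASES = [
    Purchase("me", "tent", 10, 200.0),
    Purchase("me", "stove", 30, 80.0),
    Purchase("c1", "tent", 5, 210.0),
    Purchase("c1", "stove", 12, 75.0),
    Purchase("c1", "lantern", 3, 40.0),
    Purchase("c2", "tent", 300, 190.0),
    Purchase("c2", "kayak", 200, 900.0),
]

def similarity_breakdown(purchases, target, other):
    """Score how similar `other` is to `target`, keeping each criterion separate for explainability."""
    mine = {p.item for p in purchases if p.customer == target}
    theirs = [p for p in purchases if p.customer == other and p.item in mine]
    return {
        "shared_items": len({p.item for p in theirs}),
        # Recent shared purchases count more (hypothetical 1 / (1 + days/30) decay)
        "recency": sum(1 / (1 + p.days_ago / 30) for p in theirs),
        # Spend on the shared items, in hundreds of dollars
        "shared_spend": sum(p.amount for p in theirs) / 100,
    }

def recommend(purchases, target, top_n=1):
    customers = {p.customer for p in purchases} - {target}
    explained = {c: similarity_breakdown(purchases, target, c) for c in customers}
    # Equal weights: rank candidates by the plain sum of their criteria
    ranked = sorted(customers, key=lambda c: sum(explained[c].values()), reverse=True)
    owned = {p.item for p in purchases if p.customer == target}
    picks = {p.item for p in purchases if p.customer in ranked[:top_n] and p.item not in owned}
    return sorted(picks), explained

picks, explained = recommend(PURCHASES, "me")
print(picks)  # ['lantern']
print(explained["c1"])  # the per-criterion numbers behind the recommendation
```

Returning the breakdown alongside the recommendation is what lets a reviewer see why `c1` outranked `c2`, which is the kind of traceability explainable AI calls for.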
In addition to unveiling v2.4, TigerGraph announced that AWS users can now use their S3 (Simple Storage Service) data natively in GraphStudio, which the company says significantly improves efficiency for business users working in the AWS cloud.
GraphStudio can map data stored in local files into the graph schema using a drag-and-drop GUI. The same ease of use is now available to AWS users whose data resides in S3. Native S3 import from GraphStudio ties TigerGraph more closely to a popular cloud data store and simplifies data import, making it easier to run TigerGraph on AWS.
The company also announced the new TigerGraph JDBC connector, making it easier for Java developers to integrate TigerGraph into their applications.
TigerGraph 2.4 and the new TigerGraph JDBC connector are scheduled for release in Q2. For more information, email the company at info@tigergraph.com.