Just about every enterprise in the world makes use of a database, whether it’s a parallel, distributed, flat file, document-oriented, operational, embedded or graph database. In this article, we’re discussing graph databases, which are riding a sales updraft right about now.
There are several graph databases and graph query languages on the market, including Neo4j's Cypher, Cayley (originally released by Google), TIBCO Graph Database, Apache TinkerPop Gremlin, Amazon Neptune and TigerGraph's GSQL, to name a few. Companies around the world are testing and buying them, largely for their speed and ease of use.
Graph search, which builds on all the networking people around the world do online every day, is arguably the most far-reaching search technology to go mainstream since Google began storing and ranking websites in the late 1990s. In essence, a graph search database anonymously draws on the contacts in all the networks you belong to in order to help you find information.
Anything you touch, any service you use and anything people in your networks touch can eventually help route relevant information back to you, while irrelevant results are filtered out so they don't slow down the search.
Making a Decision on a Graph DB
When deciding which graph DB to invest in, admins must take into account the development language the enterprise is going to use. Database vendors differ in ways that matter for different use cases (some scale out better than others, for example), so organizations should scope their requirements ahead of time to make sure they select the most effective language and DB for the corporate purpose.
Instead of discussing which graph language is the best, or fusing the best aspects of each graph language into one new, unified option, it’s worth taking a step back to ask a more fundamental question: What are the prerequisites for a graph query language in the first place?
In a special feature for eWEEK, Mingxi Wu, vice president of engineering at TigerGraph, offers several key requirements drawn from years of experience with real-world graph management problems. They are worth weighing when evaluating any graph query language.
Data Point 1: Schema-Based with Capability of Dynamic Schema Change
The success of SQL for relational database programming is largely attributed to data independence. A user works with a logical data schema that is independent of the underlying storage mechanism, so changes beneath that schema are transparent to existing applications and don't break them. In other words, with data independence, lower-level changes are invisible to the upper level.
Applications written against a predefined schema (or data model) can be ported to any data management software and hardware that understands the schema. By the same token, a graph model and graph query language should embrace data independence. More precisely, a graph query language should support the definition of logical graph schemas. Because data models change over time in the real world, the language and system should also support dynamic schema changes.
Data Point 2: High-Level Control of Graph Traversal
SQL is a declarative language: users state what they want without worrying about how the database software processes the query, which relieves them of hand-coding a high-performance algorithm for each task. A graph query language can follow the same approach, so that simple queries, such as pattern-matching queries, are expressed declaratively.
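As a rough illustration of a declarative pattern match, the Python sketch below lets the caller describe a path pattern as alternating vertex and edge types, and leaves the enumeration strategy to `match`. The tiny in-memory graph, the pattern format and the function name are all hypothetical; a real engine would also choose a join order and use indexes, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical tiny graph: typed vertices and typed, directed edges.
vertex_type = {"alice": "Person", "bob": "Person", "book": "Product", "pen": "Product"}
edges = defaultdict(list)  # (source vertex, edge type) -> list of targets


def add_edge(src, etype, dst):
    edges[(src, etype)].append(dst)


add_edge("alice", "knows", "bob")
add_edge("bob", "bought", "book")
add_edge("bob", "bought", "pen")


def match(pattern):
    """Return every path matching [vtype0, etype1, vtype1, etype2, vtype2, ...].
    The caller says WHAT to find; how the paths are enumerated is up to match()."""
    paths = [[v] for v, t in vertex_type.items() if t == pattern[0]]
    for i in range(1, len(pattern), 2):
        etype, vtype = pattern[i], pattern[i + 1]
        # extend each partial path by one typed edge to a typed vertex
        paths = [p + [dst]
                 for p in paths
                 for dst in edges.get((p[-1], etype), [])
                 if vertex_type[dst] == vtype]
    return paths


# "Which products did the people a Person knows buy?"
print(match(["Person", "knows", "Person", "bought", "Product"]))
# [['alice', 'bob', 'book'], ['alice', 'bob', 'pen']]
```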
Data Point 3: Fine Control of Graph Traversal
In addition to high-level queries, we see large demand for coding sophisticated iterative graph mining and traversal algorithms. Users want to traverse the graph data, attach tags and side effects (runtime attributes) to it, and update them iteratively until a final query result is computed.
For example, consider algorithms for collaborative filtering recommendation, PageRank, community detection, label propagation and similarity-based ranking. Based on field experience, customers turn to graph management to design and execute complex algorithms that are difficult or impractical to express in SQL. Therefore, fine control of both the graph query and the graph model's elements (vertex and edge instances) is a must for a standard graph query language.
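PageRank, one of the algorithms named above, shows the pattern well: every vertex carries a runtime attribute (its score) plus a per-iteration accumulator, and both are updated repeatedly until the result settles. The sketch below is the textbook power-iteration form in plain Python, assuming a fixed iteration count and uniform redistribution of score from vertices with no out-edges; it is not how any specific graph database implements PageRank.

```python
def pagerank(out_edges, damping=0.85, iterations=50):
    """Iterative PageRank over `out_edges`: vertex -> list of out-neighbours."""
    vertices = set(out_edges) | {v for nbrs in out_edges.values() for v in nbrs}
    n = len(vertices)
    score = {v: 1.0 / n for v in vertices}  # runtime attribute on each vertex
    for _ in range(iterations):
        incoming = {v: 0.0 for v in vertices}  # per-vertex accumulator
        dangling = 0.0
        for v in vertices:
            nbrs = out_edges.get(v, [])
            if nbrs:
                share = score[v] / len(nbrs)
                for u in nbrs:
                    incoming[u] += share  # side effect on the neighbour
            else:
                dangling += score[v]  # no out-edges: spread evenly below
        score = {v: (1 - damping) / n + damping * (incoming[v] + dangling / n)
                 for v in vertices}
    return score


ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
print(max(ranks, key=ranks.get))  # c
```

Each iteration touches every edge once, and the accumulator-per-vertex shape is exactly what a fine-grained graph language has to let users express.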
Data Point 4: Built-in Parallel Semantics to Guarantee High Performance
Graph algorithms are expensive because the amount of data touched can grow exponentially with each hop. Starting from one vertex, the first hop can yield n neighbors; if each of those has about n neighbors of its own, the next hop can yield up to n² more, and so on. As a result, each traversal step in the graph should have built-in parallel processing semantics to ensure high performance.
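One way to picture per-hop parallel semantics is a frontier-based traversal: within a hop, every frontier vertex is expanded independently, and the results are merged before the next hop begins. The Python sketch below hands each hop's expansions to a `concurrent.futures.ThreadPoolExecutor`. It illustrates the structure only; the function name and adjacency-dict input are assumptions, and CPython's global interpreter lock limits any real speedup for a pure-Python loop like this.

```python
from concurrent.futures import ThreadPoolExecutor


def k_hop(adj, start, hops, workers=4):
    """Return all vertices reachable from `start` within `hops` hops.
    `adj` maps each vertex to a list of neighbours."""
    visited = {start}
    frontier = [start]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(hops):
            if not frontier:
                break
            # expand every frontier vertex in parallel
            neighbour_lists = pool.map(lambda v: adj.get(v, []), frontier)
            # merge step: acts as a barrier between hops
            next_frontier = set()
            for nbrs in neighbour_lists:
                next_frontier.update(nbrs)
            next_frontier -= visited
            visited |= next_frontier
            frontier = list(next_frontier)
    return visited


adj = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"]}
print(sorted(k_hop(adj, "a", 2)))  # ['a', 'b', 'c', 'd', 'e']
```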
Data Point 5: A Highly Expressive Loading Language
The world is a graph. Ingesting siloed data into a central graph schema requires an expressive and flexible schema-mapping loading language; otherwise, even integrating a couple of data sources into the logical graph schema becomes a daunting task. A comprehensive graph language needs to include easy-to-use loading functionality to onboard heterogeneous data quickly. This is one of the highest-priority requirements for a graph query language and its associated system.
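The sketch below shows what a declarative schema-mapping loader might look like in Python: each data source is paired with rules saying which columns become vertices (with attributes) and which column pairs become edges. The two CSV "silos", the `MAPPING` format and the `load` function are hypothetical, invented for illustration rather than taken from any product's loading language.

```python
import csv
import io

# Hypothetical data from two silos: a customer table and an order log.
CUSTOMERS = "customer_id,name\nc1,Alice\nc2,Bob\n"
ORDERS = "order_id,customer_id,product\no1,c1,book\no2,c2,pen\no3,c1,pen\n"

# Declarative mapping (format invented for this sketch):
#   vertices: (vertex type, id column, attribute columns)
#   edges:    (edge type, from-id column, to-id column)
MAPPING = [
    {"source": CUSTOMERS,
     "vertices": [("Customer", "customer_id", ["name"])],
     "edges": []},
    {"source": ORDERS,
     "vertices": [("Product", "product", [])],
     "edges": [("bought", "customer_id", "product")]},
]


def load(mapping):
    vertices = {}  # (vertex type, id) -> attributes; repeats are merged
    edges = set()  # (edge type, from id, to id)
    for job in mapping:
        for row in csv.DictReader(io.StringIO(job["source"])):
            for vtype, id_col, attr_cols in job["vertices"]:
                attrs = vertices.setdefault((vtype, row[id_col]), {})
                attrs.update({c: row[c] for c in attr_cols})
            for etype, from_col, to_col in job["edges"]:
                edges.add((etype, row[from_col], row[to_col]))
    return vertices, edges


vertices, edges = load(MAPPING)
print(len(vertices), len(edges))  # 4 3
```

Keeping the mapping as data rather than code is the design point: onboarding a new source means adding an entry, not writing a new loader.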
Data Point 6: Data Security and Privacy
Enterprise customers want to use graph data to collaborate. On one hand, they want a graph model that can share selected parts of the data among multiple departments and even with partners. On the other hand, they want to keep certain sensitive data private and accessible only to specific roles, departments and partners, based on their business needs. A MultiGraph with a role-based security model is required for a graph query language to succeed in large real-world enterprises.
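A toy Python model can show the access pattern described above: one store holds several named subgraphs, some shared and some sensitive, and each role sees only the union of the subgraphs it has been granted. The `SharedGraph` class, its methods and the subgraph and role names are hypothetical; they are not TigerGraph's MultiGraph API, and a real system would also enforce write permissions and authentication.

```python
from dataclasses import dataclass, field


@dataclass
class SharedGraph:
    """Hypothetical store: named subgraphs plus role -> subgraph grants."""
    subgraphs: dict = field(default_factory=dict)  # subgraph name -> set of edges
    grants: dict = field(default_factory=dict)     # role -> set of subgraph names

    def add_edge(self, subgraph, edge):
        self.subgraphs.setdefault(subgraph, set()).add(edge)

    def grant(self, role, subgraph):
        self.grants.setdefault(role, set()).add(subgraph)

    def visible_edges(self, role):
        # a role sees only the union of the subgraphs it has been granted
        allowed = self.grants.get(role, set())
        return set().union(*(self.subgraphs.get(s, set()) for s in allowed))


store = SharedGraph()
store.add_edge("shared", ("c1", "bought", "book"))
store.add_edge("fraud", ("c1", "flagged_by", "rule7"))  # sensitive subgraph
store.grant("analyst", "shared")
store.grant("investigator", "shared")
store.grant("investigator", "fraud")
print(len(store.visible_edges("analyst")), len(store.visible_edges("investigator")))  # 1 2
```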
Data Point 7: Support for Queries Calling Queries (Recursively)
This is a less obvious requirement for implementing a graph-based solution that generates business value. Quite often, customers want to ask the same query for every vertex in the graph, so-called batch processing. For example, in a product recommendation problem, they want recommendations for each and every customer. Writing a batch query like this as a single graph query is much harder than writing a query for one customer, because the data structures and algorithmic flow are much more complex. A query-calling-query feature solves this problem.
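The benefit is easiest to see with ordinary functions standing in for queries. In the Python sketch below, `recommend_for` plays the role of a single-customer recommendation query (a simple "customers who share a purchase also bought" count), and `recommend_all` is the batch query that just calls it once per customer. The data, function names and scoring rule are hypothetical; the point is that the batch logic stays trivial because it reuses the simpler query.

```python
from collections import Counter

# Hypothetical purchase data: customer -> set of products bought.
PURCHASES = {
    "alice": {"book", "pen"},
    "bob": {"book", "lamp"},
    "carol": {"pen", "mug"},
}


def recommend_for(customer, purchases, k=2):
    """Single-customer query: count products bought by customers who share
    at least one purchase with `customer`, excluding ones already owned."""
    mine = purchases[customer]
    scores = Counter()
    for other, theirs in purchases.items():
        if other != customer and mine & theirs:
            scores.update(theirs - mine)
    # highest score first, ties broken by name for deterministic output
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:k]]


def recommend_all(purchases, k=2):
    """Batch query: calls the single-customer query for every customer."""
    return {c: recommend_for(c, purchases, k) for c in purchases}


print(recommend_all(PURCHASES))
# {'alice': ['lamp', 'mug'], 'bob': ['pen'], 'carol': ['book']}
```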
Data Point 8: SQL User-Friendly
As we work in the field, we see that most customers have SQL developers. The graph query language, therefore, should stay close to SQL syntax and concepts as much as possible so that relational SQL developers can learn and implement graph applications with minimal ramp-up time.