Everyone in the enterprise is now talking about how to leverage big data, and a good chunk of that discussion includes the evolution of NoSQL database technologies. Experts are saying that 2012 is the year when IT departments start adopting NoSQL in earnest, but is the enterprise ready yet? What needs to happen in NoSQL's evolution to prepare it for highly complex data requirements? To find out, eWEEK asked Robert Greene, vice president of technology at Versant and a 20-year veteran of the database industry, to break down what needs to happen for NoSQL to become enterprise-ready. Versant is an object-oriented database provider, and the company is taking its own approach to the NoSQL movement. For instance, Greene said that NoSQL solutions need to leverage more of the classic database techniques for concurrency control and design their internals to take full advantage of modern multi-core hardware architecture. In addition, Greene said NoSQL is learning what the object database industry learned several years ago as it sought to deal with soft schema over a relational storage engine. However, enterprises will not change all their internal processes and replace existing systems for the sake of NoSQL. To evolve, NoSQL must interoperate with existing systems: it must couple to them through ETL, allow data to be manipulated with enterprise tools, and present itself as a well-defined resource to existing monitoring and management processes.
IT Pros Don’t Want to Learn Multiple New Programming Languages
If a shift away from traditional SQL-based RDBMSes is to occur, developers and DBAs will have to learn new design and programming techniques for working with the database. Right now, most NoSQL vendors build their solutions around a proprietary API, which creates adoption headaches and forces the enterprise to pay for retraining DBAs and programmers. Instead, NoSQL vendors should adopt widely known soft-schema relational database interfaces, such as the Java Persistence API (JPA). Because most developers already know these interfaces, enterprises can lower the cost and risk of adoption by leveraging existing employee skill sets and can switch to a new database platform more easily, while still capitalizing on the architectural scale-out advantages of a NoSQL solution.
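The skills argument rests on application code depending on a shared interface rather than on one vendor's API. A minimal sketch of that idea, in Python for illustration only: it is not JPA (a Java standard), and `Repository`, `RelationalBackend`, `DocumentBackend` and `register_customer` are hypothetical names with in-memory stand-ins for real storage engines.

```python
import json
from typing import Optional, Protocol


class Repository(Protocol):
    """Hypothetical standard persistence interface the application codes against."""

    def save(self, key: str, record: dict) -> None: ...

    def find(self, key: str) -> Optional[dict]: ...


class RelationalBackend:
    """Stand-in for an existing RDBMS-backed store (rows kept in memory here)."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def save(self, key: str, record: dict) -> None:
        self._rows[key] = dict(record)

    def find(self, key: str) -> Optional[dict]:
        row = self._rows.get(key)
        return dict(row) if row is not None else None


class DocumentBackend:
    """Stand-in for a NoSQL document store (records kept as JSON strings here)."""

    def __init__(self) -> None:
        self._docs: dict = {}

    def save(self, key: str, record: dict) -> None:
        self._docs[key] = json.dumps(record)

    def find(self, key: str) -> Optional[dict]:
        doc = self._docs.get(key)
        return json.loads(doc) if doc is not None else None


def register_customer(repo: Repository, customer_id: str, name: str) -> Optional[dict]:
    # Application logic touches only the shared interface, so the same code
    # (and the same developer knowledge) works against either backend.
    repo.save(customer_id, {"name": name})
    return repo.find(customer_id)


print(register_customer(RelationalBackend(), "c1", "Acme"))
print(register_customer(DocumentBackend(), "c1", "Acme"))
```

Swapping the backend changes one constructor call and nothing in `register_customer`, which is the kind of switching cost the paragraph says a standard interface keeps low.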
Ability to Efficiently Handle Complex, Linked Information Models
Early NoSQL technologies succeeded in ushering in an architectural shift that addressed the relational database's performance-at-scale problems. Unfortunately, the world, and enterprises even more than smaller businesses, cannot ignore the complexity of modern information models, and early NoSQL technologies scarcely address it. Consider how a relational database (RDBMS) handles linked data: the database engine itself performs the set-based operations that re-establish the links between records. The word “re-establish” matters, because in both an RDBMS and Hadoop this operation must be repeated every time the same question is asked. The “link” in the information model is not stored in the database; it is a calculated result. In essence, the RDBMS runs successive queries and applies set operators to find matching data, pipelining the set operation for each subsequent query into the results of the current one. Although constant JOIN recalculation makes this inefficient, that “linking” of data is critical to most enterprise application domains. Early NoSQL technologies need to expand to support such linking so they can accommodate true information model complexity. Document stores are headed in the right direction, but restricting data links to embedded information models will not suffice for the enterprise. NoSQL needs to solve the scale problem while also dealing with the reality of complex, structured data.
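To make the difference between a recalculated link and a stored link concrete, here is a minimal in-memory Python sketch. It uses plain lists and dataclasses, not any database's API, and the customer and order data are invented. The first function re-establishes the customer-to-order link by matching keys on every call, as a JOIN does; the second builds the links once and then simply follows stored references, which is closer to the linked model the paragraph argues NoSQL should support.

```python
from dataclasses import dataclass, field

# Relational style: only a foreign key is stored; the link itself is not.
CUSTOMERS = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
ORDERS = [
    {"id": 10, "customer_id": 1, "total": 250},
    {"id": 11, "customer_id": 1, "total": 75},
    {"id": 12, "customer_id": 2, "total": 40},
]


def order_ids_via_join(customer_name):
    """Re-establish the link on every call: select the customer, then match orders by key."""
    matching_ids = {c["id"] for c in CUSTOMERS if c["name"] == customer_name}
    return [o["id"] for o in ORDERS if o["customer_id"] in matching_ids]


# Linked style: the relationship is stored as a direct reference.
@dataclass
class Order:
    id: int
    total: int


@dataclass
class Customer:
    name: str
    orders: list = field(default_factory=list)  # stored links, not recomputed


def build_linked_model():
    """Match keys once, at load time, and keep the result as object references."""
    by_id = {c["id"]: Customer(c["name"]) for c in CUSTOMERS}
    for o in ORDERS:
        by_id[o["customer_id"]].orders.append(Order(o["id"], o["total"]))
    return {c.name: c for c in by_id.values()}


def order_ids_via_links(model, customer_name):
    """Navigate the stored references; no key matching happens at query time."""
    return [o.id for o in model[customer_name].orders]


model = build_linked_model()
print(order_ids_via_join("Acme"), order_ids_via_links(model, "Acme"))
```

Both calls return the same order IDs. The join version repeats the key matching every time the question is asked, while the linked version paid that cost once when the model was built.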
Concurrency Demands, Multi-core Architectures Need to Scale Up
Within the next 12 to 18 months, the multi-core node will be a commodity. To remain efficient, databases need to scale up before they scale out; otherwise the new generation of hardware will be wasted and unnecessary operational costs will creep into production deployments. Yet NoSQL has focused almost exclusively on scale-out. Some have argued that the computing cost in databases comes from dealing with concurrency, that is, from how implementations use locking, latching and buffer management to work around concurrency design issues. In short, the argument is that if you move to a single thread and only scale out, you can remove those parts of the implementation and still scale as needed. The reality is that today's computing does not work that way, and it will not in the future: a single-threaded engine leaves most of a multi-core node's cores idle, so every node in the cluster delivers only a fraction of the work it could.
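Classic concurrency control does not have to mean one global lock that serializes every thread. A minimal sketch of one common latch-partitioning technique, lock striping, follows: keys are spread across several independently locked stripes, so threads working on different stripes do not block one another. The class and function names are hypothetical. Note that CPython's global interpreter lock keeps this particular sketch from running CPU-bound work on several cores at once; it illustrates the locking structure a multi-core engine would use, not a measured speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StripedCounterStore:
    """Key -> counter map guarded by N independent locks (lock striping)."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [{} for _ in range(stripes)]

    def _stripe(self, key):
        # Map each key to one stripe; Python's % with a positive divisor is non-negative.
        return hash(key) % len(self._locks)

    def increment(self, key, amount=1):
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is latched, not the whole store
            bucket = self._buckets[i]
            bucket[key] = bucket.get(key, 0) + amount

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, 0)


def run_workload(store, workers=8, ops_per_worker=10_000, keys=100):
    """Have several threads increment a shared key space concurrently."""

    def worker(worker_id):
        for n in range(ops_per_worker):
            store.increment(f"k{(worker_id + n) % keys}")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(worker, range(workers)))  # list() surfaces any worker exception


store = StripedCounterStore()
run_workload(store)
print(sum(store.get(f"k{i}") for i in range(100)))
```

With the default parameters, every one of the 80,000 increments is counted, because each read-modify-write happens under its stripe's lock. A single global lock would give the same totals but would force all threads to queue behind one another.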
Production Management Requires Unification, Simplification
NoSQL’s inherent scale-out architecture leads to a new kind of production environment that is difficult to manage. Distributed file systems, distributed databases, distributed caches and distributed virtual machines combine to form an intensely complicated production runtime environment. NoSQL tooling needs to mature to enterprise-class standards: unified production-monitoring consoles and cluster-wide provisioning capabilities are needed to simplify the view and management of computational resources. It is not reasonable to expect system administrators to use six different tools for six different distributed technologies. Such tools already exist in the traditional enterprise space, and NoSQL technologies will need to adapt to their standards to integrate seamlessly into the larger enterprise production context.
Enterprises Don’t Want to Reinvent the Wheel
NoSQL was created to solve very specific point problems in highly competitive markets at companies such as Google, Yahoo, Amazon, Facebook and many others. It was an urgent attempt to get these businesses to a new level of scalability before they collapsed under the weight of their user growth and lost market share to the competition. However, these base technologies were never fully productized, and they lack many of the capabilities enterprises need to interface with existing tools and data sources. The NoSQL platforms are now evolving to address the realities of information model complexity. For instance, Hadoop is getting an HBase solution to give it SQL-like abilities, and Cassandra is evolving to use super columns to look more like a document store.
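A super column adds one more level of nesting to Cassandra's column-family model: instead of row key → column → value, a super column family maps row key → super column → column → value. The sketch models both shapes with plain Python dictionaries; it is not the Cassandra API, and the row key, column names and values are made up. It shows why that extra level lets a row behave like a document with named sub-documents.

```python
import json

# Standard column family: row key -> {column name: value}
standard_cf = {
    "user:42": {"name": "Ann", "email": "ann@example.com", "city": "Oslo"},
}

# Super column family: row key -> {super column name: {column name: value}}
super_cf = {
    "user:42": {
        "profile": {"name": "Ann", "email": "ann@example.com"},
        "address": {"city": "Oslo", "zip": "0150"},
    },
}


def read_sub_document(cf, row_key, super_column):
    """Return every column grouped under one super column, like a nested document field."""
    return dict(cf[row_key][super_column])


def row_as_document(cf, row_key):
    """Serialize a whole super-column row; the result reads like a JSON document."""
    return json.dumps({"_id": row_key, **cf[row_key]}, sort_keys=True)


print(read_sub_document(super_cf, "user:42", "address"))
print(row_as_document(super_cf, "user:42"))
```

In the flat family, related columns such as the address fields sit side by side with everything else. The super column groups them under one name, which is the document-like structure the paragraph refers to.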