Suppose I brought into work an Apple II computer from 1977. This one-time state-of-the-art machine sports a 1MHz processor, 4KB of RAM and a monochrome video controller. It comes from the same era as the Atari, the Commodore 64 and the TRS-80. Obviously, these computers (as well as their operating systems, floppy drives, keyboards and displays) are now nostalgic antiques, not business computers.
But before the first microprocessor emerged and made affordable home computers such as the Apple II possible, Edgar Frank Codd had already published his landmark 1970 paper, “A Relational Model of Data for Large Shared Data Banks.” That paper marked the invention of the rows-and-columns database known as “the classical relational model.”
The classical relational model, which predates the commercialization of the Internet, is today a crucial element in the mission-critical systems of nearly every Global 2000 company. But consider three important things that have transpired during the lifetime of this rows-and-columns paradigm:
1. The largest databases in the world have gone from virtually nothing to 100 terabytes to multiple petabytes.
2. Throughout the 1980s and 1990s, structured data dominated databases. But today, unstructured data is growing at nearly double the pace of structured data. Additionally, the way we use data is increasingly focused on analytics.
3. Computerized analytics, which did not exist at the time the rows-and-columns database was invented, has become a market of over $20 billion.
Despite the challenges the classical relational model imposes on scalability, performance and manageability in the face of modern data volumes and applications, it persists as the status quo. The rest of the technology ecosystem has developed at a torrid pace, but the rows-and-columns paradigm remains virtually unchanged after decades. It’s like using millions of floppy drives in an age of streaming high-definition movies or playing Pong in an era of supercomputers.
Limitations of Row-and-Column Database Systems
Allow me to review four reasons why rows and columns are as inappropriate in a modern IT environment as bringing an Apple II computer to work.
Reason No. 1: Rows and columns are not scalable for modern data volumes
As the amount of data in the enterprise continues to double each year, so do the sizes and numbers of tables. As the tables get larger, queries from analytics applications must scan through an increasing number of rows and columns to find the selected data. If current trends continue, this will become utterly crippling for IT in this decade because tables will be too large to search despite advances in hardware performance. These same trends will also make tables incredibly burdensome to manage.
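The scaling concern above can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions: the "orders" table, its `region` column and the `full_scan` helper are hypothetical, and real database engines use indexes, partitioning and other techniques that this deliberately ignores. It shows only that an unindexed scan examines every row, so doubling the table doubles the work.

```python
def full_scan(table, column, value):
    """Return the matching rows and the number of rows examined."""
    examined = 0
    matches = []
    for row in table:
        examined += 1  # every row is inspected, whether or not it matches
        if row[column] == value:
            matches.append(row)
    return matches, examined


def make_table(n):
    # Hypothetical "orders" table: one row in every hundred is in the west region.
    return [
        {"order_id": i, "region": "west" if i % 100 == 0 else "east"}
        for i in range(n)
    ]


small = make_table(1_000)
large = make_table(2_000)  # the same data, one doubling later

small_matches, small_examined = full_scan(small, "region", "west")
large_matches, large_examined = full_scan(large, "region", "west")

print(small_examined, large_examined)
```

In this sketch the query itself never changes; only the table grows, and the number of rows examined grows with it.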
Reason No. 2: Tables are a full-time job
Today we’ve come to accept that large enterprises need a team of IT staff managing tables: creating them, loading them, joining them, reading them into memory, scanning them, sorting them, storing them and reorganizing them. All this table housekeeping is becoming increasingly burdensome for three reasons.
First, large volumes of unstructured data must be loaded into tables and indexed. Second, as data volumes grow, so does the number of tables that must be maintained. And third, performance requirements push IT to constantly and manually structure and restructure tables to achieve acceptable performance.
Reason No. 3: Rows and columns were never designed for analytics
The row-and-column format was created before computerized business analytics existed and worked well for transaction processing. Information stored in rows and columns is not inherently useful for analytics and must be indexed. This indexing process creates delays between when data is ingested and when it will become available for query.
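The gap between ingestion and query availability can be sketched in a few lines. This is a simplified illustration, not a description of how any particular database maintains its indexes: the `build_index` helper, the sample rows and the batch-style rebuild are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(table, column):
    """Map each value in `column` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for position, row in enumerate(table):
        index[row[column]].append(position)
    return index


# Hypothetical sales rows, already loaded and indexed.
table = [
    {"customer": "acme", "amount": 120},
    {"customer": "globex", "amount": 75},
    {"customer": "acme", "amount": 40},
]
index = build_index(table, "customer")

# A new row is ingested, but the index has not been rebuilt yet.
table.append({"customer": "acme", "amount": 300})
stale_positions = list(index["acme"])  # the new row is invisible to the index

# Only after re-indexing does the new row become available to indexed queries.
index = build_index(table, "customer")
fresh_positions = list(index["acme"])

print(stale_positions, fresh_positions)
```

Until the rebuild runs, a query that relies on the index misses the newest row, which is the delay described above.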
Reason No. 4: Rows and columns are a rigid static structure
Row-and-column-based databases are built for specific applications, often before IT knows how those applications will be used. Then, as usage patterns change and the business needs to look at its data differently, the entire table structure must be manually optimized to achieve acceptable query performance. The next generation of advanced analytics will not be possible with rigid, static row-and-column structures because of the manual overhead of incorporating unstructured data or enabling ad hoc analytic queries.
Improvements to Row-and-Column Database Systems
These problems are hardly unknown. Employees who spend their working lives managing tables have surely thought, “There must be a better way to do this.” And technology and business executives who spend millions on servers, software and services have surely looked for a more cost-effective way to get what they need.
In fact, a slew of vendors have emerged to solve these problems with incremental improvements on row-and-column database systems. These include proprietary hardware and massively parallel clusters of computers working with column-oriented systems.
Many of these demonstrate significant performance improvements but are still limited by the rigidity of the classical relational model. Table upkeep, manual performance tuning, indexing and loading processes remain. Additionally, the expense in hardware, custom programming and software licenses means organizations are emptying their pockets for the analytics performance they sorely need.
Remove constraints imposed by classical relational model
To break free from the shackles of rigid, static rows and columns that are holding us back, we need to set aside incremental improvements on the classical relational model and create new models. Once data is no longer constrained by rigid, static row-and-column structures, computers can automatically restructure data to dynamically optimize performance and adjust to new queries with mathematical precision.
Additionally, by removing the constraints imposed by rigid, static data structures and enabling computer software to manipulate data in any structure, we can better analyze unstructured data and remove the scalability problems associated with antiquated data management systems based on the classical relational model.
To upgrade databases from a wagon to a jetpack, we need to remove ourselves entirely from the constraints imposed by the classical relational model and create a better way.
Charles H. Silver is CEO at Algebraix Data Corporation. Charles has more than 25 years of experience as a successful entrepreneur. Most recently, he sold new media company RealAge Inc. to Hearst Corporation in 2007. Charles founded RealAge in the late 1990s based on a ground-breaking business plan for building revenue and attracting customers. Prior to his nine years at RealAge, Charles built a series of profitable franchises in the retail and real estate markets. After graduating from the University of Michigan in 1981, Charles spent the first few years of his career as a staffer for the governor of Michigan and for a U.S. Congressman. He can be reached at CSilver@algebraixdata.com.