Hadoop Drives Down Costs, Drives Up Usability With SQL Convergence
Because they were bleeding money, the team wanted a cost-effective solution. "Our target was $500 per terabyte. We were at $100,000 per terabyte with the old system," Peterson said. "With our Hadoop cluster, we're now at $900 per terabyte." Beyond the major cost savings, Neustar's Hortonworks-based Hadoop system captures all the data, structured and unstructured, enables real data science, and feeds more data into the reporting and analytics layer, Peterson said.

Another Hortonworks Hadoop user, Luminar, is an analytics and modeling provider serving the U.S. Hispanic market, transforming Hispanic consumer data into insights and business intelligence. Franklin Rios, president of Luminar, told eWEEK the company has more than 140 million consumer records enriched with transactional data. Rather than use sample data to gauge Latino consumer sentiment, Luminar took a big data approach, he said. Like Neustar, Luminar found that its existing environment could not handle the growth of its data, and that scaling the environment as it stood would mean adding more expensive hardware, software and personnel. "But that was not feasible from a financial point of view," he said.

The Hadoop system "became like an 'easy' button for my team," Rios said, noting that to date Luminar holds 150 terabytes of data, 70 percent of it structured and 30 percent unstructured. Going in, Rios said, Luminar projected a 13 to 15 percent gain in cost efficiency; so far, by the company's calculations, that figure stands at 28 percent.

But it is not enough that Hadoop can save enterprises money and help them scale applications; it also must be accessible. As enterprises continue to adopt Hadoop, a key emerging trend is the convergence of Hadoop and SQL (Structured Query Language).
"This trend is important because most BI [business intelligence] tools want to speak SQL, and Hive, which allows that, works through MapReduce [and] is too slow," Andrew Brust, president of Blue Badge Insights and a big data guru, told eWEEK. "The various technologies involved tend to query Hadoop's Distributed File System (HDFS) data directly, bypassing MapReduce. This works better with BI data discovery tools because you can interactively query without waiting forever between queries. It is also important because SQL skills are widespread and MapReduce skills are most definitely not." Brust noted some technologies in this space, including Cloudera with Impala, Teradata Aster with SQL-H, Hadapt, ParAccel with ODI and Microsoft with PolyBase—a component of SQL Server Parallel Data Warehouse. And there are many others. "The excitement around adding the SQL layer is aimed at empowering all the people out there that are knowledgeable in SQL," Hortonworks' Connolly said. "We do see that as an important use case, and we're investing in Hive."
"At Luminar, we use analytical modeling, technology and data processing to help our clients fine-tune their marketing strategies [and] tailor their messages to Hispanic consumers," Rios said.