On Wednesday, Feb. 12, at 11 a.m. PST/2 p.m. EST/7 p.m. GMT, @eWEEKNews will host its 83rd monthly #eWEEKChat. The topic will be “Batch Goes Out the Window: The Dawn of Data Orchestration.” It will be moderated by Chris Preimesberger, eWEEK’s Editor in Chief.
Some quick facts:
Topic: #eWEEKchat Feb. 12: “Batch Goes Out the Window: The Dawn of Data Orchestration”
Date/time: Wednesday, Feb. 12, 11 a.m. PST/2 p.m. EST/7 p.m. GMT
Tweetchat handle: You can use #eWEEKChat to follow/participate via Twitter itself, but it’s easier and more efficient to use the real-time chat room link at CrowdChat. Instructions are on that page; log in at the top right, use your Twitter handle to register, and the chat begins promptly at 11am PT. The page will come alive at that time with the real-time discussion. You can join in or simply watch the discussion as it is created. Special thanks to John Furrier of SiliconAngle.com for developing the CrowdChat app.
Our in-chat experts will include: Eric Kavanagh, CEO of The Bloor Group and host of DM (Data Management) Radio; Wally MacDermid, VP of Cloud for Scality; Dipti Borkar, Vice President of Products at Alluxio; Christopher Merz, Principal Technologist at NetApp; Chris Oshiro, Field CTO of AtScale; Alex Ma, Director of Solutions Engineering for Alluxio; and Sam Lakkundi, BMC Vice-President of Product Management. Attendees can offer their own perspectives at any time.
Chat room real-time link: Use https://www.crowdchat.net/eweekchat. Sign in and use #eweekchat for the identifier.
Data Orchestration Discussion is Also on Radio
You can learn even more about this topic by listening in to DM (Data Management) Radio’s hourlong show Thursday, Feb. 13, at noon Pacific and 3 p.m. Eastern, with host Eric Kavanagh and co-host Chris Preimesberger of eWEEK. The title of the show is “Data Orchestration: Getting All the Instruments in Tune.” Guest experts will be Sean Knapp, CEO and founder of Ascend.io, and Haoyuan Li, founder and CEO of Alluxio.
The DM Radio network is carried live in 20 markets nationally and on internet radio at the link above.
Managing All That Data Efficiently
By Eric Kavanagh (excerpted from eWEEK, Dec. 3, 2019)
The standard mechanism for transporting data has long been the tried-and-relatively-true practice of extract, transform, load, a.k.a. ETL. That’s now finally changing.
Granted, there have been other ways of moving data: Change data capture (CDC), one of the leanest methods, has been around for decades and remains a very viable option; the old File Transfer Protocol (FTP) can’t be overlooked; nor can the seriously old-fashioned forklifting of DVDs.
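To make the contrast between a full batch extract and change data capture concrete, here is a minimal Python sketch. Every name in it (full_extract, capture_changes, apply_changes and the sample rows) is hypothetical and chosen only for illustration. It also assumes a simple snapshot comparison; production CDC tools typically read a database's transaction log instead.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Snapshot = Dict[int, Row]          # primary key -> row
Change = Tuple[str, int, Row]      # (operation, primary key, row)


def full_extract(source: Snapshot) -> List[Row]:
    """Batch-style extract: copy every row, whether or not it changed."""
    return [dict(source[key]) for key in sorted(source)]


def capture_changes(previous: Snapshot, current: Snapshot) -> List[Change]:
    """Snapshot-diff CDC (an assumption of this sketch): emit only the
    rows that were inserted, updated or deleted since the last run."""
    changes: List[Change] = []
    for key in sorted(current.keys() - previous.keys()):
        changes.append(("insert", key, current[key]))
    for key in sorted(current.keys() & previous.keys()):
        if current[key] != previous[key]:
            changes.append(("update", key, current[key]))
    for key in sorted(previous.keys() - current.keys()):
        changes.append(("delete", key, previous[key]))
    return changes


def apply_changes(target: Snapshot, changes: List[Change]) -> Snapshot:
    """Replay captured changes against a copy of the target."""
    result = {key: dict(row) for key, row in target.items()}
    for operation, key, row in changes:
        if operation == "delete":
            result.pop(key, None)
        else:
            result[key] = dict(row)
    return result


# Hypothetical sample data: one row updated, one deleted, one inserted.
yesterday: Snapshot = {
    1: {"name": "Ann", "total": 10},
    2: {"name": "Bob", "total": 20},
    3: {"name": "Cy", "total": 30},
}
today: Snapshot = {
    1: {"name": "Ann", "total": 10},
    2: {"name": "Bob", "total": 25},
    4: {"name": "Dee", "total": 40},
}

batch_rows = full_extract(today)
changes = capture_changes(yesterday, today)
synced = apply_changes(yesterday, changes)
```

In this toy case the batch extract moves all three current rows, while the change feed carries three small events and leaves the unchanged row untouched. On a large table where few rows change, that difference is what makes CDC lean.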
Data virtualization 1.0 brought a novel approach as well. This approach leveraged a fairly sophisticated system of strategic caching. High-value queries would be preprocessed, and certain VIP users would benefit from a combination of pre-aggregation and stored result sets.
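The caching idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular data virtualization product works: the CachedQueryLayer class, its method names and the sample sales table are all hypothetical, and an in-memory SQLite database stands in for the back-end source.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List


class CachedQueryLayer:
    """Hypothetical layer that stores result sets for high-value queries."""

    def __init__(self, run_query: Callable[[str], List[tuple]]) -> None:
        self._run_query = run_query
        self._cache: Dict[str, List[tuple]] = {}
        self.backend_calls = 0  # counts trips to the underlying source

    def _execute(self, sql: str) -> List[tuple]:
        self.backend_calls += 1
        return self._run_query(sql)

    def preprocess(self, high_value_queries: Iterable[str]) -> None:
        """Run selected queries ahead of time and keep their results."""
        for sql in high_value_queries:
            self._cache[sql] = self._execute(sql)

    def query(self, sql: str) -> List[tuple]:
        """Serve from the stored result set when one exists."""
        if sql in self._cache:
            return self._cache[sql]
        return self._execute(sql)


# Stand-in back end with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5)],
)

TOP_QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"

layer = CachedQueryLayer(lambda sql: conn.execute(sql).fetchall())
layer.preprocess([TOP_QUERY])              # pre-aggregate once
totals = layer.query(TOP_QUERY)            # answered from the cache
calls_after_cached_read = layer.backend_calls
row_count = layer.query("SELECT COUNT(*) FROM sales")  # falls through
```

The sketch leaves out cache invalidation, which is the hard part in practice: a stored result set goes stale as soon as the source changes.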
During the rise of the open-source Hadoop movement about a decade ago, some other curious innovations took place, notably the Apache Sqoop project. Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. Sqoop proved very effective at pulling data from relational sources and dropping it into HDFS. That paradigm has somewhat faded, however.
But a whole new class of technologies–scalable, dynamic, increasingly driven by artificial intelligence–now threatens the status quo. So significant is this change that we can reasonably anoint a new term in the lexicon of information management: data orchestration.
There are several reasons why this term makes sense. First and foremost, an orchestra comprises many different instruments, all woven together harmoniously. Likewise, today’s data world suddenly boasts many new sources, each with its own frequency, rhythm and nature.
Secondly, the concept of orchestration implies much more than integration, because the former connotes significantly more complexity and richness. That maps nicely to the data industry these days: The shape, size, speed and use of data all vary tremendously.
Thirdly, the category of data orchestration speaks volumes about the growing importance of information strategy, arguably among the most critical success factors for business today. It’s no longer enough to merely integrate it, transport it or change it; data must be leveraged strategically.
Down the Batch!
ETL has been the mainstay of data movement over the past 30 years. Initially, custom code was the way to go, but as Rick Sherman of Athena IT Solutions once noted: “Hand coding works well at first, but once the workloads grow in size, that’s when the problems begin.”
As the information age matured, a handful of vendors addressed this market in a meaningful way, including Informatica in 1993, Ab Initio (a company that openly eschews industry analysts) in 1995, then Informix spin-off Ascential (later bought by IBM) in 2000. That was the heyday of data warehousing, the primary driver for ETL.
Companies realized they could not effectively query their enterprise resource planning (ERP) systems to gauge business trajectory, so the data warehouse was created to enable enterprise-wide analysis.
The more people got access to the warehouse, the more they wanted. This resulted in batch windows stacking up to the ceiling. Batch windows are the time slots within which data engineers (formerly called ETL developers) had to squeeze in specific data ingestions.
Within a short span of years, data warehousing became so popular that a host of boutique ETL vendors cropped up. Then, around the early- to mid-2000s, the data warehouse appliance wave hit the market, with Teradata, Netezza, DATAllegro, Dataupia and others climbing on board.
This was a boon to the ETL business but also to the Data Virtualization 1.0 movement, primarily occupied by Composite Software (bought by Cisco, then spun out, then picked up by TIBCO) and Denodo Technologies. Both remain going concerns in the data world.
Big Data Boom
Then came big data. Vastly larger, much more unwieldy and in many cases faster than traditional data, this new resource upset the apple cart in disruptive ways. As mega-vendors such as Facebook, LinkedIn and others rolled their own software, the tech world changed dramatically. The proliferation of database technologies, fueled by open-source initiatives, widened the landscape and diversified the topography of data. These included Facebook’s Cassandra, 10gen’s MongoDB and MariaDB (spun out by MySQL founder Monty Widenius the day Oracle bought Sun Microsystems)–all of which are now pervasive solutions.
Let’s not forget about the MarTech 7,000. In 2011, it was the MarTech 150. By 2015, it was the MarTech 2,000. It’s now 7,000 companies offering some sort of sales or marketing automation software. All those tools have their own data models and their own APIs. Egad!
Add to the mix the whole world of streaming data. By open-sourcing Kafka through the Apache Software Foundation, LinkedIn let loose the gushing waters of data streams. These high-speed freeways of data largely circumvent traditional data management tooling, which can’t stand the pressure.
Doing the math, we see a vastly different scenario for today’s data, as compared to only a few years ago. Companies have gone from relying on five to 10 source systems for an enterprise data warehouse to now embracing dozens or more systems across various analytical platforms.
Meanwhile, the appetite for insights is greater than ever, as is the desire to dynamically link analytical systems with operational ones. The end result is a tremendous amount of energy focused on the need for … (wait for it!) … meaningful data orchestration.
For performance, governance, quality and a vast array of business needs, data orchestration is taking shape right now out of sheer necessity. The old highways for data have become too clogged and cannot support the necessary traffic. A whole new system is required.
Questions We’ll Discuss
That’s what we’re here to chat about on Feb. 12. Questions we’ll ask include:
- How is your company using data orchestration?
- What tools are you using to orchestrate data?
- What data orchestration providers do you see as leading this revolution?
- How important is Kubernetes in these new orchestration models? What alternatives are there?
- What should data orchestration be doing, or doing better, than it currently does for your company?
Join us Wednesday, Feb. 12 at 11am Pacific / 2pm Eastern for this, the 83rd monthly #eWEEKchat. Go here for CrowdChat information.
#eWEEKchat Tentative Schedule for 2020*
Jan. 8: Trends in New-Gen Data Security
Feb. 12: Batch Goes Out the Window: The Dawn of Data Orchestration
March 10: New Trends and Products in New-Gen Health Care IT
April 8: New Enterprise Collaboration Tools
May 13: Trends in New-Gen Mobile Apps, Devices
June 10: Storage and Data Protection Trends
July 8: New Advances in Networking
Aug. 12: TBA
Sept. 9: DataOps: The Data Management Platform of the Future?
Oct. 14: IBM, Dell, Oracle, Cisco, both HPs: How Legacy Companies Are Still Innovating
Nov. 11: Hot New Tech for 2021
Dec. 9: Predictions and Wild Guesses for IT in 2021
*All topics subject to change