Apache Kafka is a distributed event-streaming platform that enables companies to monitor and manage real-time data feeds. Initially developed at LinkedIn and open-sourced in 2011, the software had evolved into a full real-time event-streaming platform by 2015.
Kafka is not the only event-streaming technology; it competes in the marketplace with Amazon Kinesis. But Kafka has gained solid market share and is the basis for multiple implementations, including Red Hat AMQ Streams.
High-profile tech companies like LinkedIn, Netflix, and Uber proved the business case for combining Kafka, streaming data, data pipelines, and business analytics. In 2015, Kafka event-streaming was still a new approach to computing, one that made it easier to “ingest” large volumes of data into data lakes. That allowed customers to blend enterprise applications with the cloud’s scale-out, distributed computing and microservices.
Kafka uses a “publish-and-subscribe” model that links data sources (IoT sensors, factory-floor updates, retail sales events, media/entertainment data) to data receivers, organizing the data into named “topics.” The data, sorted by topic, flows in parallel streams that don’t interfere with one another. Software “connectors” link Kafka event streams to enterprise data stores and software products.
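The publish-and-subscribe flow described above can be pictured as a tiny in-memory broker. This is a simplified sketch for illustration only: real Kafka persists each topic as a partitioned, replicated log and serves producers and consumers over the network, none of which is modeled here.

```python
from collections import defaultdict

class MiniBroker:
    """Toy in-memory publish-subscribe broker (illustrative only)."""
    def __init__(self):
        self.topics = defaultdict(list)       # topic name -> list of events
        self.subscribers = defaultdict(list)  # topic name -> subscriber callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        self.topics[topic].append(event)      # append the event to the topic's log
        for callback in self.subscribers[topic]:
            callback(event)                   # fan out to every subscriber

broker = MiniBroker()
received = []
broker.subscribe("retail-sales", received.append)
broker.publish("retail-sales", {"store": 42, "amount": 19.99})
broker.publish("iot-sensors", {"sensor": "temp-1", "value": 21.5})  # separate stream
```

Note how the “iot-sensors” event never reaches the “retail-sales” subscriber: each topic is its own stream, which is what lets the parallel flows avoid interfering with one another.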
Addressing the Need for Constant Monitoring
Customers want to take the real-time “data temperature” of their business every day and around the clock. They’re increasingly asking Kafka software to help them do that job.
This IT strategy for event-streaming in a cloud-centric world is gaining traction. Kafka is often a key element in a business’s intelligent process automation initiatives, as implemented by many vendors’ software products. The business push to leverage data-in-motion is driving many customers to connect their cloud microservices with enterprise data sources, ranging from sensor data to enterprise databases.
Data-in-motion tells the business where the economic “action” is taking place in their organization. Applying event-streaming data—from the factory floor, local banks, retail stores, and sporting events—helps businesses adjust their daily processes to achieve better business outcomes.
Kafka is being used to identify recent supply-chain backups by tracking the real-time location of delayed cargo shipments. But the range of applications that work with Kafka is very broad, reaching across the enterprise and around the world. Examples include:
- Adaptive pricing optimization.
- Smart recommendation systems based on sales data.
- Anomaly-detection systems that identify fraud and theft in the data.
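The last item above is easy to sketch: a common starting point for anomaly detection is flagging values that sit far from the mean of the stream. This minimal example (the transaction figures and threshold are invented for illustration) uses only the Python standard library; production systems would use rolling windows and far more sophisticated models.

```python
import statistics

def flag_anomalies(amounts, threshold=2.0):
    """Flag amounts more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    return [a for a in amounts if abs(a - mean) > threshold * stdev]

# Mostly routine transactions, plus one outlier that could signal fraud
transactions = [20.0, 22.5, 19.0, 21.0, 18.5, 23.0, 950.0]
suspicious = flag_anomalies(transactions)  # only the 950.0 outlier is flagged
```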
Rapidly Growing Market for Event-Streaming Software
The worldwide market for messaging and event-streaming software is projected to grow from $1.6 billion in 2019 to $5.3 billion in 2025, a rapid 26.9% compound annual growth rate, according to IDC.
“While this market is growing rapidly, growth in event streaming is explosive,” Maureen Fleming, program vice president of intelligent process automation research at IDC, told eWeek.
Why is this happening now? What’s changed since Kafka’s earlier growth spurt in 2015, when it began appearing in software vendors’ products? Several recent shifts in enterprise computing are leveraging Kafka in new and important ways:
- Increased emphasis on real-time event streaming. Examples include identifying and applying changes in retail data and financial data, fine-tuning pricing optimization, and updating business decisions based on new sales patterns. Kafka connects data across the enterprise and cloud providers, harvesting the data and feeding it to other types of software that provide deep analytics of the distributed events.
- Proliferation of microservices. Application development is increasingly focused on building microservices for hybrid clouds and multiclouds. These cloud-native applications, leveraging containers and Kubernetes orchestration, work with data-in-motion before storing application results as data-at-rest in enterprise databases.
- Growing importance of multicloud deployments. Large companies with a presence across multiple geographic regions need to update database event logs for analytics, data-compliance rules, and faster response to changing business conditions. Kafka’s publish-and-subscribe model supports multi-cloud data replication, helping to ensure business continuity during outages and disaster recovery.
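At its core, the cross-cloud replication mentioned in the last item amounts to consuming a topic’s log in one region and re-appending it, in order, in another. A bare-bones sketch of that idea (real deployments use purpose-built tooling such as Kafka’s MirrorMaker; the logs and event records here are invented for illustration):

```python
# Source topic log in cloud region A, and its replica in region B
source_log = [{"id": 1, "event": "order"}, {"id": 2, "event": "ship"}]
replica_log = []

def replicate(source, replica):
    """Copy any events the replica hasn't seen yet, preserving order."""
    replica.extend(source[len(replica):])

replicate(source_log, replica_log)  # replica now mirrors the source log
```

Because the copy is append-only and order-preserving, re-running `replicate` after new events arrive at the source keeps the replica in sync, which is what makes the replica usable for failover during an outage.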
The accelerating pace of customer cloud migrations is giving customers a chance to re-think the way data is distributed across their enterprise—and to use it differently than before.
Event-streaming is allowing customers to move into new “patterns” of data management, including scaling capacity by doing data updates in parallel, supporting dynamic analytics across hybrid clouds and multiclouds, and speeding analytics results that improve business outcomes.
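The “parallel updates” pattern above typically rests on partitioning: events that share a key always land in the same partition, so independent workers can each consume one partition in parallel without coordinating, while per-key ordering is preserved. A minimal sketch of hash partitioning (real Kafka adds offsets, consumer groups, and rebalancing; the key names are invented):

```python
from collections import defaultdict

NUM_PARTITIONS = 4

def partition_for(key):
    """Route every event with the same key to the same partition."""
    return hash(key) % NUM_PARTITIONS

partitions = defaultdict(list)
events = [("store-7", "sale"), ("store-3", "sale"), ("store-7", "refund")]
for key, payload in events:
    partitions[partition_for(key)].append((key, payload))

# Both "store-7" events sit in one partition, in arrival order, while
# other partitions can be consumed by separate workers in parallel.
```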
Connecting Event-Streaming to Enterprise Databases
As cloud migrations accelerate, the business world realizes that the “front end” of their data landscape, associated with data-in-motion and cloud-native microservices, must now be linked to the “back end” enterprise data, which is data-at-rest stored in data centers, data lakes, and data warehouses.
“Large enterprise organizations are increasingly looking to become more event-driven,” Jeff Pollock, vice president of product development at Oracle Corp, told eWeek. “They want to take advantage of innovative opportunities to work with data—as the data is being born.
“Technologies like Kafka empower a lot of these cutting-edge use cases,” including the development of cloud-native microservices and new applications, he added.
One big change in the event-streaming world is that new roles, also known as “personas,” have emerged as the users of Kafka event-streaming systems. Now that the nuts-and-bolts of streaming data are well-understood, there’s a greater focus on applications and tools that can be built around the foundation of event-streaming software. That’s why the familiar SQL query language for enterprise data—widely used by data scientists—is being integrated into many application tools designed for use with Kafka and event-streaming.
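One way to picture that SQL integration: incoming events accumulate in a table-like view, and analysts query them with ordinary SQL. The toy sketch below stands in SQLite for a real streaming SQL engine, purely to show the familiar query shape; the table, columns, and figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_events (store TEXT, amount REAL)")

# Simulate a stream of incoming sale events being ingested
stream = [("north", 10.0), ("south", 25.0), ("north", 15.0)]
conn.executemany("INSERT INTO sales_events VALUES (?, ?)", stream)

# A familiar SQL aggregation over the ingested events
rows = conn.execute(
    "SELECT store, SUM(amount) FROM sales_events GROUP BY store ORDER BY store"
).fetchall()
# rows: [("north", 25.0), ("south", 25.0)]
```

Streaming SQL engines extend this idea with continuously updating queries over live topics, but the appeal is the same: data scientists can reuse the query language they already know.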
Building dynamic, cloud-native microservices will require user-friendly application toolkits.
“The interest we’re seeing in the last couple of years is not just coming from IT developers, but from business-driven use-cases,” George Vetticaden, vice president of product management for Cloudera’s Data-in-Motion business unit, told eWeek. “Application developers, data scientists, data engineers, all of these classes of developers now want to tap into Kafka.”
Broader Set of Uses for Event-Driven Software
This broader set of users is demanding a broader set of uses for event-driven software. Since 2016, the market has moved from “easier data ingestion” into enterprise data lakes to a generation of software tools and applications that harvest data for faster business decisions.
New patterns are emerging for event-streaming’s publish-and-subscribe model, which allows multiple, parallel data streams to move across the enterprise without slowing data updates from corporate data lakes and distributed data sources. From a business perspective, many customers are looking for increased scalability for large data volumes and better support for multi-cloud applications.
Vendors are Extending Kafka Functionality
It’s clear that no single vendor controls the Kafka software stack. However, many vendors provide Kafka-enabled software products and services that help customers transform traditional enterprise computing.
Oracle, Cloudera, and Confluent, to name three examples, extended Kafka-enabled functionality this year to address and simplify operational complexity for customers adopting hybrid cloud and multi-cloud. Let’s look at what each one added:
Oracle’s GoldenGate data-integration software was recently updated and released as a fully managed, automated cloud service on Oracle Cloud Infrastructure (OCI), Oracle’s second-generation public cloud. The latest release of GoldenGate supports Kafka event-streaming, dynamic real-time scalability, improved ease of use, and automated scaling for large data volumes.
Cloudera recently announced Cloudera DataFlow for the Public Cloud, a new cloud-native service leveraging Kafka that provides real-time data streaming on the Cloudera Data Platform (CDP). Cloudera DataFlow automates complex data-flow operations while auto-scaling the volume of streaming events across customers’ hybrid clouds.
Confluent Inc. announced an expanded multicloud strategy, with Confluent Cloud data-management software that runs across public clouds, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Confluent also announced a strategic partnership with IBM, which allows IBM to resell the Confluent Platform and IBM Cloud Pak software, with unified customer support from IBM and Confluent.
What’s Next for Kafka?
As we approach 2022, the drive to pull enterprise data and cloud data together is accelerating, and many customers are using Apache Kafka to do just that. They’re using Kafka to move event data throughout the enterprise and the cloud, speeding up data-based decisions with the fierce urgency of “now” that is being built into cloud services based on distributed data.
Real-time data, moving through the enterprise via a publish-and-subscribe model, is emerging as an important approach to transforming enterprise infrastructure for the age of microservices and multi-cloud deployments.
“Kafka is one of the main tools in the kit-bag that can help organizations adapt toward becoming a more real-time business,” said Pollock. “And I think that’s at the very heart of many of the digital transformation initiatives that CIOs are embarking on.”