Apache Kafka Survey Reveals Growing Importance of Streaming Data

Unlike traditional enterprise messaging software, Kafka is able to handle all the data flowing through a company, and do it in near real time. This is desperately needed as data volumes skyrocket.


As the overall volume of data from machines and human-operated devices (mostly from machines) continues to zoom up like an F-18 off an aircraft carrier, IT systems sometimes find themselves straining to keep that data moving from place to place at internet speed.

All those moving bits are becoming heavy loads for systems old and new. Server and storage software needs to be refreshed constantly to keep up, or else an enterprise can fall behind its competitors in delivering services to customers.

At the outset of the IoT (internet of things) and IIoT (industrial internet of things) age, data center operating systems such as Nutanix, OpenStack and Mesosphere are among the engines that can scale up in real time as workloads demand.

One thing they have in common is Apache Kafka, an open-source stream processing platform written in Scala and Java by Jay Kreps, Neha Narkhede and Jun Rao while they were working at LinkedIn. It was later open-sourced and contributed to the Apache Software Foundation. Kafka provides a unified, high-throughput, low-latency platform for handling real-time data feeds.

Kafka Handles Huge Amounts of Data

Unlike traditional enterprise messaging software, Kafka is able to handle all the data flowing through a company, and do it in near real time. Kafka is a high-performance publish/subscribe message bus designed for high availability and data durability, with minimal latency. It acts as a central data backbone and enables loose coupling between application services (built with Akka, for example), data processing (Spark) and data persistence (Cassandra).
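To make the "loose coupling" idea concrete, here is a minimal, hypothetical Python model; it is not Kafka's client API, and the class names Topic and Subscriber are invented for illustration. It assumes Kafka's basic design: a topic is a set of append-only partition logs, producers append keyed records, and each downstream service tracks its own read position, so an analytics job and a database writer can consume the same stream independently without knowing about each other.

```python
import zlib
from collections import defaultdict


class Topic:
    """Hypothetical stand-in for a Kafka topic: N append-only partition logs."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key: str, value: str) -> tuple:
        # Records with the same key always go to the same partition, which
        # preserves per-key ordering. Kafka's default partitioner uses a
        # murmur2 hash; crc32 is only a deterministic stand-in here.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class Subscriber:
    """Each subscriber keeps its own offsets; reading never removes data."""

    def __init__(self, topic: Topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition -> next offset to read

    def poll(self) -> list:
        batch = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return batch


orders = Topic(num_partitions=3)
analytics = Subscriber(orders)  # e.g. a Spark job (hypothetical)
archiver = Subscriber(orders)   # e.g. a Cassandra writer (hypothetical)
orders.publish("customer-42", "order placed")
orders.publish("customer-42", "order shipped")
print(analytics.poll())
print(archiver.poll())
```

Both subscribers receive both records in order, and a second `analytics.poll()` returns an empty list because that subscriber's offsets have advanced, while the records themselves stay in the log for any other reader.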

Managing data in enterprises is an exercise in complexity. As companies add new applications, integrate and modernize existing systems, move to a microservices architecture and embrace a multifold increase in data-producing endpoints, their infrastructure, and the engineers who support it, face an existential question: Should we pursue each new data project as a unique, discrete effort, or find an overall approach with greater scalability?

For many IT teams, Kafka has become the answer to this question. It connects an ever-increasing number of data producers and consumers through a simple, unified streaming platform at the center of the organization. It allows any team to join the platform, allows a central team to manage the service, and scales to trillions of messages per day while delivering messages in real time.
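Much of how one shared platform lets "any team join" and still scale comes from Kafka's consumer groups: each team reads a topic under its own group, and within a group the topic's partitions are split among the members, so adding instances spreads the load. The function below is a simplified, hypothetical sketch modeled loosely on Kafka's range assignment strategy for a single topic; the name `assign_partitions` is invented, and this is not Kafka's actual assignor code.

```python
def assign_partitions(num_partitions: int, members: list) -> dict:
    """Range-style split of one topic's partitions across a consumer group.

    Members are sorted; each gets a contiguous block of partitions, and the
    first (num_partitions % len(members)) members get one extra partition.
    Every partition goes to exactly one member of the group.
    """
    if not members:
        return {}
    members = sorted(members)
    per_member, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


# Team A runs three instances in one group; they share the 8 partitions.
print(assign_partitions(8, ["c2", "c1", "c3"]))
# Team B joins with its own group and independently reads all 8 partitions.
print(assign_partitions(8, ["fraud-1"]))
```

With eight partitions and three members, the sketch gives `c1` partitions 0 to 2, `c2` partitions 3 to 5 and `c3` partitions 6 and 7. If a group has more members than partitions, the extra members sit idle, which is one reason partition counts matter when planning for scale.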

Survey Encompassed 350 Organizations

To learn why and how companies are using streaming data and what impact it has on their business, data management platform maker Confluent recently surveyed 350 organizations from 47 countries across a wide variety of industries, seeking to understand Apache Kafka's evolving user base, use cases and deployments.

Following are some highlights from the survey and seven key data points that show where Kafka is headed in the enterprise.

Data Point 1: Kafka use is experiencing a surge.

Eighty-six percent of respondents reported that the number of their systems that use Kafka is increasing, and one-fifth reported that the number is "growing a lot!" A majority (52 percent) of organizations have at least six systems running Kafka, with 21 percent having more than 20. According to last year's report, only 41 percent of organizations had at least six systems running Kafka, and only one-tenth had more than 20.

Data Point 2: Kafka enables companies to create new market opportunities.

Because data is available, shared and immediate, companies can create new products and significantly transform existing ones. As Kafka is deployed in more mission-critical infrastructure, a majority (54 percent) of surveyed organizations say that their business can make more accurate and/or quicker decisions thanks to Kafka, and 28 percent of respondents were able to identify new business opportunities through its use.

Data Point 3: Kafka is being used widely in the cloud.

Kafka is used by organizations in some combination of virtual private clouds (34 percent), public clouds (52 percent) and on premises (57 percent). The survey also found that 60 percent of respondents use AWS as their public cloud, processing billions of messages per day.

Data Point 4: Kafka is being used beyond data pipelines in the enterprise.

Last year's survey reported a surge in companies adopting streaming platforms; this year, the rise of microservices is leading to more diverse use cases for Kafka. Two-thirds (66 percent) of respondents use Kafka for stream processing, three out of five (60 percent) use it for data integration, and half (50 percent) use it for messaging and log aggregation.

Data Point 5: Applications utilizing Kafka stretch far and wide.

As in last year's report, three-fourths (75 percent) of organizations have applications connected to their Kafka systems. These applications process data from sensors, websites, analytics and monitoring tools and share the information so the right teams can process the data they need to make decisions. The types of applications that a majority of respondents connect to Kafka are asynchronous applications (57 percent) and data warehouses (51 percent). Organizations also have application monitoring (41 percent), system monitoring (30 percent) and recommendation/decision engines (30 percent) powered by Kafka.

Data Point 6: The Kafka Connect API is bringing new data into the streaming era.

The popularity of the Kafka Connect API has grown significantly over the past year. The Kafka Connect API, included in Kafka, makes it easy to add new datastores to your data pipelines without having to write the interfaces from scratch. Use of the Kafka Connect API rose to 37 percent of organizations in 2017, from 12 percent in 2016. A majority (59 percent) of respondents have databases connected to their Kafka clusters, with 25 percent of respondents connecting websites and 15 percent connecting sensors and device data to their clusters.
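Adding a datastore through Kafka Connect is mostly a matter of configuration rather than code. As a rough illustration, the Python sketch below builds (but does not send) a request to a Connect worker's REST interface (`POST /connectors`) that registers the file source connector shipped with Apache Kafka. The worker address, connector name, file path and topic are hypothetical, and depending on the Kafka version the file connector may need to be added to the worker's plugin path before it can be used.

```python
import json
import urllib.request

# Hypothetical Connect worker address (8083 is Connect's default REST port).
CONNECT_URL = "http://localhost:8083/connectors"


def file_source_request(name: str, path: str, topic: str) -> urllib.request.Request:
    """Build a Connect REST request that registers a file source connector."""
    payload = {
        "name": name,
        "config": {
            # The connector plugin, not hand-written code, reads the file
            # and publishes each line as a record to the topic.
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": path,
            "topic": topic,
        },
    }
    return urllib.request.Request(
        CONNECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values for illustration only.
req = file_source_request("orders-file-source", "/tmp/orders.txt", "orders")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Against a running worker, `urllib.request.urlopen(req)` would submit the request; swapping in a different connector class and its settings is how other datastores, such as databases, would be attached without writing a custom integration.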

Data Point 7: There is a Kafka engineer shortage.

According to a Dice report, people with Kafka skills receive one of the highest salaries in the technology market. However, despite the salary and the growth of Kafka within organizations, the survey found that three-quarters (75 percent) of respondents find it at least somewhat difficult to find the right talent with Kafka skills.

Editor's note: This article was corrected to credit the creators of Kafka.

Chris Preimesberger

Chris J. Preimesberger is Editor-in-Chief of eWEEK and responsible for all the publication's coverage. In his 15 years and more than 4,000 articles at eWEEK, he has distinguished himself in reporting...