17 Companies Streaming Real-Time Data to Drive Innovation
Uber connects riders with driver partners for an average of one million trips a day, providing safe, reliable, convenient transportation at a variety of price points in more than 311 cities around the world. Key technologies: distributed logging via Apache Kafka and large-scale data processing via Apache Spark.
LinkedIn operates the largest professional network on the Internet with more than 380 million members in more than 200 countries and territories. Real-time technology has pushed the company to the forefront of modern distributed infrastructure, giving millions of LinkedIn users around the world a rich experience through quick insights and an immediacy that enhances engagement in the moment. Key technologies: Apache Kafka.
With more than 320 million monthly active users, Twitter is often the world’s source for breaking news. People turn to Twitter for information when natural disasters strike and election ballots roll in, and for second-by-second updates when countdown clocks begin on pivotal sporting events. With that success, data volumes and user engagement have grown. Recently, Twitter announced Heron, a newly deployed system purpose-built to address the speed and latency issues found in Storm. Key technologies: Apache Storm and Heron.
Tapjoy is a monetization and distribution services provider for mobile applications. It delivers optimized ads to more than 500 million global users at a rate of 1 million transactions per minute. Tapjoy implemented MemSQL as the database to power its Mobile Marketing Automation and Monetization Platform. For ad optimization to occur in real time, data needs to be made usable immediately. Key technologies: MemSQL, Apache Kafka and Apache Spark.
Spotify provides its user base of over 75 million listeners with the ability to stream music on demand. Next in popularity, Apple Music boasts 15 million users. Twenty-five percent of Spotify users pay a monthly subscription fee for the service. The other 75 percent can stream music for free thanks to Spotify’s ad-targeting technology, which streams and analyzes data in near real time to generate advertisements personalized to each user. Key technologies: Apache Kafka, Apache Storm and Hadoop.
In the last year, searches on Pinterest increased 81 percent. With this massive growth, a reliable data infrastructure for optimizing engagement became more important. Pinterest’s real-time data pipeline can group trending topics geographically to optimize results for both users and advertisers. Apache Kafka simplifies the collection of logs and the streaming of data to local disks. Fresh data is analyzed with Apache Storm, Apache Spark and Pinterest’s own custom-built logging tools. Pinterest then uses MemSQL, a SQL database, which lets anyone easily query Pinterest’s data in real time. Key technologies: MemSQL, Apache Kafka, Apache Spark and Apache Storm.
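To make the idea of grouping trending topics geographically concrete, here is a minimal sketch in plain Python. It is not Pinterest’s implementation: the function name `trending_by_region`, the `(region, term)` event shape and the `top_n` parameter are all assumptions chosen for illustration, and a production pipeline would run this kind of aggregation over a stream rather than an in-memory list.

```python
from collections import Counter, defaultdict


def trending_by_region(events, top_n=3):
    """Return the most frequent search terms per region.

    `events` is an iterable of (region, term) pairs. This event shape is
    hypothetical and used only for illustration.
    """
    counts = defaultdict(Counter)
    for region, term in events:
        counts[region][term] += 1
    # Keep only the top_n terms for each region, most frequent first.
    return {
        region: [term for term, _ in counter.most_common(top_n)]
        for region, counter in counts.items()
    }


if __name__ == "__main__":
    sample = [
        ("US", "tacos"), ("US", "tacos"), ("US", "diy shelves"),
        ("FR", "ratatouille"),
    ]
    print(trending_by_region(sample, top_n=1))
```

Counting per region first and ranking afterward keeps the grouping step independent of how many top terms a caller wants.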
Netflix can easily emit 80 billion events a day across log messages, server activity records and system operational data that it needs for business, product and operational analysis. Data is critical to Netflix’s operation, and data pipelines are paramount to the customer experience. Netflix developed Suro, a technology based on Apache Chukwa. Suro uses Apache Kafka when dispatching log events to a designated cluster under a mapped topic. Key technologies: Apache Chukwa, Apache Kafka and Suro.
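The dispatch step described above, sending each log event to a designated cluster under a mapped topic, can be sketched roughly as follows. This is an illustrative sketch only, not Suro’s actual API: the `ROUTES` table, the `routing_key` field and the cluster and topic names are hypothetical, and the code groups events in memory where a real dispatcher would hand them to a Kafka producer.

```python
from collections import defaultdict

# Hypothetical mapping from an event's routing key to (cluster, topic).
ROUTES = {
    "request_log": ("logs-cluster", "app.request"),
    "server_activity": ("ops-cluster", "ops.activity"),
}
# Fallback destination for events with no mapped routing key (an assumption).
DEFAULT_ROUTE = ("logs-cluster", "unrouted")


def route(event):
    """Look up the (cluster, topic) destination for one event dict."""
    return ROUTES.get(event.get("routing_key"), DEFAULT_ROUTE)


def dispatch(events):
    """Group events by destination.

    A real dispatcher would pass each group to a Kafka producer for the
    matching cluster; here the groups are simply returned.
    """
    outbox = defaultdict(list)
    for event in events:
        outbox[route(event)].append(event)
    return dict(outbox)


if __name__ == "__main__":
    batch = [
        {"routing_key": "request_log", "msg": "GET /title/1"},
        {"routing_key": "unknown", "msg": "???"},
    ]
    for destination, group in dispatch(batch).items():
        print(destination, len(group))
```

Keeping the routing table as data rather than code means destinations can change without touching the dispatch logic.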