How Apache Spark Helped Eight Companies Grow Their Businesses

 
 
By Darryl K. Taft  |  Posted 2015-09-07
 
 
 
 
 
 
 
 
 
    Spark is a powerful processing engine powering millions of real-time apps every day. Here is how eight companies are using Spark to support their businesses.

    Spark Helps Shopify Make Wise Store Selections

    Shopify needed to understand what types of products its customers were selling in order to select eligible stores for a business partnership. However, its data warehouse kept timing out when running data mining queries. With Spark, Shopify used distributed computing to process 67 million records in minutes. The company was able to categorize stores based on their products and produce the list of partnership-eligible stores.
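
To make the categorization step concrete, here is a minimal sketch of the map-then-aggregate pattern that this kind of job follows. It is plain Python rather than Spark, so it runs without a cluster, and it is not Shopify's actual method: the keyword table, the store IDs, the product titles and the "dominant category" eligibility rule are all assumptions made up for illustration. In Spark, the counting step would typically be a keyed aggregation such as `reduceByKey` spread across many machines.

```python
from collections import Counter, defaultdict

# Hypothetical keyword-to-category table; not Shopify's real taxonomy.
CATEGORY_KEYWORDS = {
    "shirt": "apparel",
    "dress": "apparel",
    "mug": "home",
    "lamp": "home",
}


def categorize_product(title):
    """Map step: return the category of the first known keyword, else "other"."""
    for word in title.lower().split():
        if word in CATEGORY_KEYWORDS:
            return CATEGORY_KEYWORDS[word]
    return "other"


def dominant_category_per_store(records):
    """Aggregate step: count categories per store and keep the most common one."""
    counts = defaultdict(Counter)
    for store_id, title in records:
        counts[store_id][categorize_product(title)] += 1
    return {store: c.most_common(1)[0][0] for store, c in counts.items()}


def eligible_stores(records, wanted_category):
    """Return the stores whose dominant category matches (assumed eligibility rule)."""
    dominant = dominant_category_per_store(records)
    return sorted(s for s, cat in dominant.items() if cat == wanted_category)


# Made-up sample data: (store_id, product_title) pairs.
records = [
    ("store-a", "Blue cotton shirt"),
    ("store-a", "Summer dress"),
    ("store-a", "Coffee mug"),
    ("store-b", "Desk lamp"),
    ("store-b", "Ceramic mug"),
]

print(eligible_stores(records, "apparel"))
```

The per-record map step is independent for every product, which is why this kind of workload parallelizes well on a distributed engine.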

    Spark Gives OpenTable a Tenfold Speed Boost

    OpenTable has more than 32,000 restaurants in its system worldwide, and each month it seats more than 16 million diners. It uses Spark both for training its recommenders and for the natural language processing of the reviews to generate topic models. Spark gave OpenTable a 10x speed improvement, reducing the algorithm run time from weeks to mere hours and allowing for dramatically higher team productivity.

    Spark Helps Pinterest Identify Trends

    Pinterest uses Spark to find patterns in high-value user engagement data. Using Spark, Pinterest is able to identify and react to developing trends as they happen. In turn, Pinterest and its partners can get a better understanding of user behavior and provide more value to the Pinterest community.

    Conviva Reduces Customer Churn, Thanks to Spark

    Conviva is one of the largest streaming video companies on the Internet, with about 4 billion video feeds per month, second only to YouTube. Conviva uses Spark to learn about network conditions in real time and to alleviate dreaded screen buffering, which helps it deliver its desired quality of service. Conviva feeds this information directly into the video player to optimize streams and manage live video traffic, ensuring maximum system play-through. By maintaining a consistently smooth viewing experience, Conviva is able to reduce customer churn.

    MyFitnessPal Counts on Spark for Better Diets

    MyFitnessPal aims to build the largest health and fitness community online by helping people achieve healthier lifestyles through better diet and more exercise. MyFitnessPal uses Spark to clean up user-entered food data using both explicit and implicit user signals with the final goal of identifying high-quality food items. With Spark, MyFitnessPal can comb through food calorie data crowdsourced from its 80 million users. Originally, the company tried to use Hadoop to process the 2.5 terabytes of data in its database, but it took days to churn through the data to identify errors, such as incorrect calorie and nutritional information.

    Spark Speeds Up TripAdvisor's Recommendations

    The TripAdvisor travel site helps travelers plan and book the perfect trip. TripAdvisor offers advice from millions of travelers, with links to booking tools that check hundreds of Websites to find the best hotel prices. Spark powers the algorithm that makes TripAdvisor recommendations for its customers. Reading the reviews and processing them into a usable format takes a large share of the run time, and Spark does that work once at the beginning of the process.

    Netflix Leans on Spark for Personalization Aid

    Netflix uses Spark to support real-time stream processing for online recommendations and data monitoring. Its streaming devices periodically send events that capture member activities, which play a key role in personalization. These events flow to its server-side applications and are routed to Apache Kafka. Netflix's Spark Streaming application consumes these events from Kafka and computes metrics.
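
The consume-and-compute flow described above can be sketched as a micro-batch loop: events are grouped into short time windows, and metrics are computed for each window. The sketch below is plain Python standing in for both Kafka and Spark so that it runs on its own. The event fields, the event types, the 10-second window and the two metrics are assumptions chosen for illustration, not details of Netflix's pipeline.

```python
from collections import Counter


def micro_batches(events, batch_seconds):
    """Group (timestamp, member_id, event_type) tuples into fixed time windows.

    Returns a list of (window_start, batch) pairs ordered by window start.
    """
    batches = {}
    for ts, member_id, event_type in events:
        window_start = ts - (ts % batch_seconds)
        batches.setdefault(window_start, []).append((member_id, event_type))
    return sorted(batches.items())


def batch_metrics(batch):
    """Compute per-window metrics: event counts by type and distinct members."""
    counts = Counter(event_type for _, event_type in batch)
    members = {member_id for member_id, _ in batch}
    return {"events_by_type": dict(counts), "active_members": len(members)}


# Made-up events: (timestamp in seconds, member ID, event type).
events = [
    (0, "m1", "play"),
    (3, "m2", "play"),
    (7, "m1", "pause"),
    (12, "m3", "play"),
]

for window_start, batch in micro_batches(events, batch_seconds=10):
    print(window_start, batch_metrics(batch))
```

In a real deployment the event list would be an unbounded stream read from Kafka, and each window would be processed as it closes rather than after all events have arrived.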

    Esri Uses Spark to Provide Real-Time Traffic Data

    Esri's mapping software is used by more than 350,000 organizations worldwide, including the 200 largest cities in the United States. Using Spark, Esri created a geo-location heat map that visualizes data intelligence such as the average speed of a taxi ride, where the worst traffic jams occur in NYC, and the flow of traffic during workdays and weekends. Esri uses open transportation data to derive actionable intelligence in real time. This kind of analysis can redefine the way urban developers resolve traffic congestion issues or help taxi businesses improve their efficiency.
 

As real-time applications become more mainstream and companies continue to collect massive amounts of data, users have embraced Apache Spark for its ability to do sophisticated analytics at scale. First developed in the AMPLab at UC Berkeley, Apache Spark is an open-source processing engine built around speed, ease of use and sophisticated analytics, and it powers millions of real-time applications every day. Spark lets developers quickly write applications in Java, Scala or Python, and it supports SQL queries, streaming data, machine learning and graph data processing. Developers can use these capabilities on their own or combine them in a single data pipeline. Spark has quickly become the largest open-source community in big data, with more than 750 contributors from 200-plus organizations. For this slide show, eWEEK combed through online archives and the Apache Spark Website and worked with in-memory database company MemSQL to develop a list of companies that are using Spark to support and grow their businesses.

 
 
 
 
 
Darryl K. Taft covers the development tools and developer-related issues beat from his office in Baltimore. He has more than 10 years of experience in the business and is always looking for the next scoop. Taft is a member of the Association for Computing Machinery (ACM) and was named 'one of the most active middleware reporters in the world' by The Middleware Co. He also has his own card in the 'Who's Who in Enterprise Java' deck.
 
 
 
 
 
 
