How Apache Spark Helped Eight Companies Grow Their Businesses
Spark Helps Shopify Make Wise Store Selections
Shopify needed to understand what types of products its customers were selling in order to select eligible stores for a business partnership. However, its data warehouse kept timing out when running data mining queries. Using Spark, Shopify was able to apply distributed computing to the problem and process 67 million records in minutes. The company successfully categorized stores based on their products and produced the list of partnership-eligible stores.
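To illustrate the kind of job described here, the following is a minimal sketch, not Shopify's actual code. The record layout (store ID, product category), the sample data, the eligibility rule and all function names are assumptions made up for this example. The single-machine function shows the logic; the second function expresses the same steps with PySpark's RDD API and assumes `pyspark` is installed.

```python
from collections import Counter, defaultdict

# Hypothetical records: (store_id, product_category). Not Shopify's real schema.
PRODUCTS = [
    ("store-a", "apparel"),
    ("store-a", "apparel"),
    ("store-a", "toys"),
    ("store-b", "electronics"),
    ("store-b", "electronics"),
    ("store-c", "toys"),
]

# Hypothetical eligibility rule, invented for illustration.
ELIGIBLE_CATEGORIES = {"apparel", "toys"}


def dominant_category(categories):
    """Return the most frequent category; ties go to the alphabetically first."""
    counts = Counter(categories)
    return min(counts, key=lambda c: (-counts[c], c))


def eligible_stores_local(records):
    """Single-machine version: group by store, then check the dominant category."""
    by_store = defaultdict(list)
    for store_id, category in records:
        by_store[store_id].append(category)
    return sorted(
        store
        for store, cats in by_store.items()
        if dominant_category(cats) in ELIGIBLE_CATEGORIES
    )


def eligible_stores_spark(records):
    """Same logic on a Spark RDD. Requires pyspark; not called by default."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("store-categories").getOrCreate()
    try:
        rdd = spark.sparkContext.parallelize(records)
        stores = (
            rdd.groupByKey()
            .mapValues(lambda cats: dominant_category(list(cats)))
            .filter(lambda kv: kv[1] in ELIGIBLE_CATEGORIES)
            .keys()
            .collect()
        )
        return sorted(stores)
    finally:
        spark.stop()


if __name__ == "__main__":
    print(eligible_stores_local(PRODUCTS))
```

On a cluster, the grouping and filtering steps run in parallel across partitions of the data, which is what lets a job of this shape scale to millions of records.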
Spark Gives OpenTable a Tenfold Speed Boost
OpenTable has more than 32,000 restaurants in its system worldwide, and each month it seats more than 16 million diners. It uses Spark both for training its recommenders and for the natural language processing of the reviews to generate topic models. Spark gave OpenTable a 10x speed improvement, reducing the algorithm run time from weeks to mere hours and allowing for dramatically higher team productivity.
Spark Helps Pinterest Identify Trends
Pinterest uses Spark to find patterns in high-value user engagement data. Using Spark, Pinterest is able to identify and react to developing trends as they happen. In turn, Pinterest and its partners can get a better understanding of user behavior and provide more value to the Pinterest community.
Conviva Reduces Customer Churn, Thanks to Spark
Conviva is one of the largest streaming video companies on the Internet, with about 4 billion video feeds per month, second only to YouTube. Conviva uses Spark to help it deliver its desired quality of service by learning about network conditions in real time and alleviating dreaded screen buffering. Conviva feeds this information directly into the video player to optimize streams and manage live video traffic, ensuring maximum system play-through. By maintaining a consistently smooth viewing experience with Apache Spark, Conviva is able to reduce customer churn.
MyFitnessPal Counts on Spark for Better Diets
MyFitnessPal aims to build the largest health and fitness community online by helping people achieve healthier lifestyles through better diet and more exercise. MyFitnessPal uses Spark to clean up user-entered food data using both explicit and implicit user signals with the final goal of identifying high-quality food items. With Spark, MyFitnessPal can comb through food calorie data crowdsourced from its 80 million users. Originally, the company tried to use Hadoop to process the 2.5 terabytes of data in its database, but it took days to churn through the data to identify errors, such as incorrect calorie and nutritional information.
Spark Speeds Up TripAdvisor’s Recommendations
The TripAdvisor travel site helps travelers plan and book the perfect trip. TripAdvisor offers advice from millions of travelers, with links to booking tools that check hundreds of Websites to find the best hotel prices. Spark powers the algorithm that makes TripAdvisor recommendations for its customers. Reading the reviews and processing them into a usable format accounts for a large share of the run time, and Spark does that work once, at the beginning of the process.
Netflix Leans on Spark for Personalization Aid
Netflix uses Spark to support real-time stream processing for online recommendations and data monitoring. Its streaming devices periodically send events that capture member activities, which play a key role in personalization. These events flow to its server-side applications and are routed to Apache Kafka. Netflix’s Spark streaming application consumes these events from Kafka and computes metrics.
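To make the consume-and-aggregate step concrete, here is a minimal sketch in plain Python. It is not Netflix's code: the event fields, the sample events, the 10-second window and the function name are all assumptions for illustration. It groups events into fixed time windows, roughly the way a streaming job handles events in small batches, and counts events per type. A real deployment would read the events from Kafka through Spark rather than from an in-memory list.

```python
from collections import Counter

# Hypothetical events: (timestamp_seconds, member_id, event_type).
EVENTS = [
    (0, "m1", "play"),
    (3, "m2", "play"),
    (7, "m1", "pause"),
    (12, "m3", "play"),
    (14, "m1", "play"),
    (21, "m2", "stop"),
]


def count_by_window(events, window_seconds=10):
    """Count events per (window_start, event_type) over fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, _member_id, event_type in events:
        # Integer division snaps each timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return dict(counts)


if __name__ == "__main__":
    for (window_start, event_type), n in sorted(count_by_window(EVENTS).items()):
        print(f"window starting at {window_start}s: {event_type} x {n}")
```

Counting per window rather than over the whole stream is what lets a metric be reported continuously while events are still arriving.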
Esri Uses Spark to Provide Real-Time Traffic Data
Esri’s mapping software is used by more than 350,000 organizations worldwide, including the 200 largest cities in the United States. Using Spark, Esri created a geo-location heat map that visualizes data intelligence such as the average speed of a taxi ride, the locations of the worst traffic jams in NYC, and the flow of traffic during workdays and weekends. Esri uses open transportation data to derive actionable intelligence in real time. This kind of analysis can redefine the way urban developers resolve traffic congestion issues or help taxi businesses improve their efficiency.
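A heat map of average taxi speed comes down to grouping trips by map cell and averaging within each cell. The sketch below shows that calculation in plain Python under stated assumptions: the trip fields, the sample trips, the 0.01-degree cell size and the function names are hypothetical and are not taken from Esri's software or data. The per-cell summing step is the part a Spark job would distribute, for example with the RDD method `reduceByKey`.

```python
import math
from collections import defaultdict

# Hypothetical trips: (pickup_lat, pickup_lon, distance_miles, duration_minutes).
TRIPS = [
    (40.752, -73.984, 2.0, 12.0),
    (40.758, -73.981, 1.0, 6.0),
    (40.641, -73.778, 15.0, 30.0),
]


def grid_cell(lat, lon, cell_degrees=0.01):
    """Map a coordinate to an integer grid cell (row, column)."""
    return (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))


def average_speed_by_cell(trips, cell_degrees=0.01):
    """Return average speed in miles per hour for each pickup cell."""
    totals = defaultdict(lambda: [0.0, 0.0])  # cell -> [miles, minutes]
    for lat, lon, miles, minutes in trips:
        cell = grid_cell(lat, lon, cell_degrees)
        totals[cell][0] += miles
        totals[cell][1] += minutes
    # Total distance over total time; cells with no recorded time are skipped.
    return {
        cell: miles * 60.0 / minutes
        for cell, (miles, minutes) in totals.items()
        if minutes > 0
    }


if __name__ == "__main__":
    for cell, mph in sorted(average_speed_by_cell(TRIPS).items()):
        print(cell, round(mph, 1), "mph")
```

Dividing total distance by total time, rather than averaging per-trip speeds, keeps long trips from being underweighted; either choice could be reasonable depending on what the map is meant to show.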