
    LinkedIn Open-Sources Spark-Based Machine Learning Library

    By
    Darryl K. Taft
    -
    June 10, 2016

      LinkedIn announced that it has open-sourced its machine learning library for Apache Spark, known as Photon ML.

      Apache Spark is an open-source cluster computing framework used for processing and analyzing big data. The open-source data processing engine was built for speed, ease of use and sophisticated analytics. Spark is designed to perform both batch processing and new workloads like streaming, interactive queries and machine learning.

      In a blog post about the open-sourcing of Photon ML, Paul Ogilvie, engineering manager of the LinkedIn Machine Learning Algorithms team, noted that machine learning is a key component of LinkedIn’s relevance-driven products, and the company uses machine learning to train the ranking algorithms for its feed, advertising, recommender systems—such as People You May Know—email optimization, search engines and more.

      “These algorithms play an important role in determining user experience for content-rich websites, so it’s critical that we provide our engineers with easy-to-use machine learning tools that create high-quality models that are fast and scale to large datasets,” Ogilvie said. “By combining the ability of Spark to quickly process massive datasets with powerful model training and diagnostic utilities, Photon ML allows research engineers to make more informed decisions about the algorithms they choose for the types of recommendation systems listed above.”

      Photon ML supports large-scale linear, logistic and Poisson regression. Poisson regression is a form of regression analysis used to model count data and contingency tables. Photon ML can optionally generate model diagnostics, producing charts and tables that help in diagnosing the model and its fit to an optimization problem, Ogilvie said. It also includes an experimental implementation of generalized additive mixed effect (GAME) models, the area in which LinkedIn hopes Photon ML will have a broader impact on how the industry builds and applies machine learning technology.
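
For readers unfamiliar with Poisson regression, the idea can be sketched in a few lines of plain Python. This is a generic illustration, not Photon ML's API: the function names `poisson_mean` and `poisson_log_likelihood` are hypothetical, and the sketch assumes the standard log link, where the expected count is the exponential of a weighted sum of features.

```python
import math


def poisson_mean(weights, features):
    """Expected count under a log link: exp(w . x)."""
    eta = sum(w * x for w, x in zip(weights, features))
    return math.exp(eta)


def poisson_log_likelihood(weights, rows, counts):
    """Sum of log P(y | x) over all rows for a Poisson model.

    For one row: y * eta - exp(eta) - log(y!), where eta = w . x.
    """
    total = 0.0
    for features, y in zip(rows, counts):
        eta = sum(w * x for w, x in zip(weights, features))
        # math.lgamma(y + 1) equals log(y!) for non-negative integers y.
        total += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return total
```

Training such a model amounts to choosing the weights that maximize this log-likelihood over the observed counts, typically with a regularization term added.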

      “Currently, the GAME implementation in Photon ML supports generalized linear mixed effect models (GLMix), a subset of the algorithms we intend to one day support in GAME,” Ogilvie said. A GLMix model consists of a fixed effect component and multiple random effects, he added.

      LinkedIn uses GLMix models to improve job recommendations by using a random effect for members and a random effect for jobs. “To be more precise, the random effect for members includes features from job descriptions, such as extracted skills or job titles,” Ogilvie said. “Modeling the random effect in this way allows us to better learn which jobs a highly-active member is interested in, with coefficients for job features specific to that member.”
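
The structure Ogilvie describes, a shared fixed effect plus per-member and per-job random effects, can be sketched as follows. This is a minimal illustration under stated assumptions, not Photon ML code: the name `glmix_score`, the dictionary-based feature layout and the logistic link are all hypothetical choices, and pairing the per-job coefficients with member features is an assumption the article does not confirm.

```python
import math


def dot(coeffs, features):
    """Sparse dot product over feature names; missing coefficients count as 0."""
    return sum(coeffs.get(name, 0.0) * value for name, value in features.items())


def glmix_score(fixed, per_member, per_job, member_id, job_id,
                shared_features, job_features, member_features):
    """Probability-like score from one fixed effect and two random effects."""
    # Fixed effect: one set of coefficients shared by every member and job.
    logit = dot(fixed, shared_features)
    # Member random effect: this member's own coefficients on job features.
    logit += dot(per_member.get(member_id, {}), job_features)
    # Job random effect (assumed): this job's own coefficients on member features.
    logit += dot(per_job.get(job_id, {}), member_features)
    # Logistic link (assumed) maps the summed logit into (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))
```

A member or job with no learned coefficients contributes nothing, so the score falls back to the shared fixed effect. Highly active members, who have more data, can receive coefficients that shift the score toward their own preferences.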

      Meanwhile, GAME models enable research engineers to train their algorithms using a more accurate picture of the underlying dataset that better reflects the experience of individual members, Ogilvie said. He noted that LinkedIn hopes that increased use of these techniques in the future will lead to better algorithms for recommendation systems in general.

      “Our own initial A/B tests have showed that GLMix models trained using Photon ML improved job recommendations by 15 to 30 percent in job applications, and improved email article recommendations by 10 to 20 percent,” Ogilvie said. “While these tests are still in their early stages, these results indicate that Photon can significantly improve recommendations for members.”

      Last month, LinkedIn open-sourced its Kafka Monitor, a framework for monitoring and testing Kafka deployments.

      LinkedIn originally developed what is now known as Apache Kafka, a standard messaging system for large-scale, streaming data. LinkedIn open-sourced Kafka in 2011.

      Although Kafka is a standard message broker, it can pose problems for Kafka operators and site reliability engineers (SREs). Reported metrics can be unreliable or inaccurate, which can be time-consuming for an SRE to investigate. Kafka is also prone to occasional bugs that don't manifest until it has been deployed in a real cluster for days or even weeks.

      That’s why LinkedIn built the Kafka Monitor, a framework for monitoring and testing Kafka deployments in real clusters. It reports critical health metrics and runs validation tests to capture bugs or regressions before they make their way into a deployed cluster.

      In a blog post, Dong Lin, a LinkedIn software engineer, said, “Kafka Monitor makes it easy to develop and execute long-running Kafka-specific system tests in real clusters and to monitor existing Kafka deployments’ SLAs provided by users.”

      Moreover, he said Kafka Monitor is potentially useful to other companies to validate their own client libraries and Kafka clusters.

      “Indeed, Microsoft has an open-source project on GitHub that also monitors availability and end-to-end latency for Kafka clusters,” Lin said.

      Similarly, Netflix has described in a blog post a monitoring service that sends continuous heartbeat messages and measures the latency of those messages, he noted.

      “Kafka Monitor differentiates itself by focusing on extensibility, modularity and support for custom client libraries and scenarios,” said Lin.
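
The heartbeat style of monitoring mentioned above can be sketched generically: send a timestamped message through the system and measure how long it takes to come back out. The sketch below is not Kafka Monitor's or Netflix's implementation. It uses an in-memory `queue.Queue` as a stand-in for a Kafka topic, and the function name `measure_heartbeat_latencies` is hypothetical.

```python
import queue
import time


def measure_heartbeat_latencies(channel, count):
    """Send `count` heartbeats through `channel` and return per-message latency in seconds."""
    latencies = []
    for seq in range(count):
        # Producer side: stamp each heartbeat with a sequence number and send time.
        channel.put((seq, time.monotonic()))
        # Consumer side: read the heartbeat back and compare against the send time.
        got_seq, sent_at = channel.get(timeout=1.0)
        if got_seq != seq:
            raise RuntimeError(f"expected heartbeat {seq}, got {got_seq}")
        latencies.append(time.monotonic() - sent_at)
    return latencies


if __name__ == "__main__":
    samples = measure_heartbeat_latencies(queue.Queue(), 5)
    print(f"max latency over {len(samples)} heartbeats: {max(samples):.6f}s")
```

In a real deployment the producer and consumer would run against an actual cluster, and a missing or out-of-order heartbeat would be reported as an availability problem rather than raised as an exception.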

      Darryl K. Taft
      Darryl K. Taft covers the development tools and developer-related issues beat from his office in Baltimore. He has more than 10 years of experience in the business and is always looking for the next scoop. Taft is a member of the Association for Computing Machinery (ACM) and was named 'one of the most active middleware reporters in the world' by The Middleware Co. He also has his own card in the 'Who's Who in Enterprise Java' deck.
