    LinkedIn Open-Sources Spark-Based Machine Learning Library

    Written by

    Darryl K. Taft
    Published June 10, 2016

      LinkedIn announced that it has open-sourced its machine learning library for Apache Spark, known as Photon ML.

      Apache Spark is an open-source cluster computing framework used for processing and analyzing big data. The open-source data processing engine was built for speed, ease of use and sophisticated analytics. Spark is designed to perform both batch processing and new workloads like streaming, interactive queries and machine learning.

      In a blog post about the open-sourcing of Photon ML, Paul Ogilvie, engineering manager of the LinkedIn Machine Learning Algorithms team, noted that machine learning is a key component of LinkedIn’s relevance-driven products, and the company uses machine learning to train the ranking algorithms for its feed, advertising, recommender systems—such as People You May Know—email optimization, search engines and more.

      “These algorithms play an important role in determining user experience for content-rich websites, so it’s critical that we provide our engineers with easy-to-use machine learning tools that create high-quality models that are fast and scale to large datasets,” Ogilvie said. “By combining the ability of Spark to quickly process massive datasets with powerful model training and diagnostic utilities, Photon ML allows research engineers to make more informed decisions about the algorithms they choose for the types of recommendation systems listed above.”

      Photon ML supports large-scale linear, logistic and Poisson regression. Poisson regression is a form of regression analysis used to model count data and contingency tables. Photon ML can optionally generate model diagnostics, producing charts and tables that help in diagnosing the model and its fit to an optimization problem, Ogilvie said. It also includes an experimental implementation of generalized additive mixed effect (GAME) models, which is where LinkedIn hopes Photon ML will have a broader impact on how the industry builds and applies machine learning technology.
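
      To make the Poisson case concrete, here is a minimal sketch in plain Python of how a Poisson regression model turns features into an expected count and how its fit to observed counts can be scored. It is not Photon ML code; the function names and the log link are assumptions chosen for illustration.

```python
import math


def poisson_mean(weights, features):
    """Expected count under a log link: exp(w . x). (Hypothetical helper.)"""
    return math.exp(sum(w * x for w, x in zip(weights, features)))


def poisson_nll(weights, rows, counts):
    """Negative log-likelihood of the observed counts under the model.

    For one observation with mean mu and count y:
        -log P(y) = mu - y * log(mu) + log(y!)
    """
    total = 0.0
    for features, count in zip(rows, counts):
        mu = poisson_mean(weights, features)
        total += mu - count * math.log(mu) + math.lgamma(count + 1)
    return total
```

      Training would search for the weights that minimize `poisson_nll`; a library such as Photon ML does this at scale on Spark, whereas this sketch only evaluates the objective.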

      “Currently, the GAME implementation in Photon ML supports generalized linear mixed effect models (GLMix), a subset of the algorithms we intend to one day support in GAME,” Ogilvie said. A GLMix model consists of a fixed effect component and multiple random effects, he added.

      LinkedIn uses GLMix models to improve job recommendations by using a random effect for members and a random effect for jobs. “To be more precise, the random effect for members includes features from job descriptions, such as extracted skills or job titles,” Ogilvie said. “Modeling the random effect in this way allows us to better learn which jobs a highly-active member is interested in, with coefficients for job features specific to that member.”
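
      The structure Ogilvie describes can be sketched roughly as follows: one set of global (fixed effect) coefficients plus per-member coefficients applied to job features and per-job coefficients. This is an illustrative sketch, not Photon ML's API. The function and feature names are hypothetical, and the logistic link and the use of member features in the per-job effect are assumptions.

```python
import math


def dot(coeffs, features):
    """Sparse dot product; missing coefficients count as zero."""
    return sum(coeffs.get(name, 0.0) * value for name, value in features.items())


def glmix_score(global_coeffs, member_coeffs, job_coeffs,
                member_id, job_id,
                pair_features, job_features, member_features):
    """Probability-like score for showing job_id to member_id."""
    # Fixed effect: one coefficient vector shared by every member and job.
    logit = dot(global_coeffs, pair_features)
    # Per-member random effect: this member's own coefficients on job features.
    logit += dot(member_coeffs.get(member_id, {}), job_features)
    # Per-job random effect: this job's own coefficients on member features
    # (assumed here by symmetry; the article does not spell this out).
    logit += dot(job_coeffs.get(job_id, {}), member_features)
    return 1.0 / (1.0 + math.exp(-logit))
```

      A member with no learned coefficients falls back to the fixed effect alone, while a highly active member's own coefficients shift the score for jobs whose features match their history.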

      Meanwhile, GAME models enable research engineers to train their algorithms using a more accurate picture of the underlying dataset that better reflects the experience of individual members, Ogilvie said. He noted that LinkedIn hopes that increased use of these techniques in the future will lead to better algorithms for recommendation systems in general.

      “Our own initial A/B tests have showed that GLMix models trained using Photon ML improved job recommendations by 15 to 30 percent in job applications, and improved email article recommendations by 10 to 20 percent,” Ogilvie said. “While these tests are still in their early stages, these results indicate that Photon can significantly improve recommendations for members.”

      Last month, LinkedIn open-sourced its Kafka Monitor, a framework for monitoring and testing Kafka deployments.

      LinkedIn originally developed what is now known as Apache Kafka, a standard messaging system for large-scale, streaming data. LinkedIn open-sourced Kafka in 2011.

      Although Kafka is a standard message broker, it can pose problems for Kafka operators and site reliability engineers (SREs). Reported metrics can be unreliable or inaccurate, which can be time-consuming for an SRE to investigate. Kafka is also prone to occasional bugs that don’t manifest until it has been deployed in a real cluster for days or even weeks.

      That’s why LinkedIn built the Kafka Monitor, a framework for monitoring and testing Kafka deployments in real clusters. It reports critical health metrics and runs validation tests to capture bugs or regressions before they make their way into a deployed cluster.

      In a blog post, Dong Lin, a LinkedIn software engineer, said, “Kafka Monitor makes it easy to develop and execute long-running Kafka-specific system tests in real clusters and to monitor existing Kafka deployments’ SLAs provided by users.”

      Moreover, he said Kafka Monitor is potentially useful to other companies to validate their own client libraries and Kafka clusters.

      “Indeed, Microsoft has an open-source project on GitHub that also monitors availability and end-to-end latency for Kafka clusters,” Lin said.

      Similarly, Netflix has described in a blog post a monitoring service that sends continuous heartbeat messages and measures their latency, he noted.

      “Kafka Monitor differentiates itself by focusing on extensibility, modularity and support for custom client libraries and scenarios,” said Lin.
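
      The heartbeat style of monitoring mentioned above can be illustrated with a small sketch. It does not use Kafka Monitor or any Kafka client; an in-memory `queue.Queue` stands in for a topic, timestamps are passed in explicitly, and all function names are hypothetical.

```python
import queue


def send_heartbeat(topic, seq, now):
    """Produce one heartbeat carrying its sequence number and send time."""
    topic.put({"seq": seq, "sent_at": now})


def drain_heartbeats(topic, now):
    """Consume all pending heartbeats; return (seq, latency) pairs."""
    latencies = []
    while True:
        try:
            message = topic.get_nowait()
        except queue.Empty:
            break
        latencies.append((message["seq"], now - message["sent_at"]))
    return latencies


def availability(sent_count, received):
    """Fraction of sent heartbeats that were seen by the consumer."""
    if sent_count == 0:
        return 1.0
    return len({seq for seq, _ in received}) / sent_count
```

      A real monitor would produce to and consume from an actual cluster on a schedule and export the latency and availability figures as metrics; this sketch only shows the bookkeeping.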

      Darryl K. Taft
      Darryl K. Taft covers the development tools and developer-related issues beat from his office in Baltimore. He has more than 10 years of experience in the business and is always looking for the next scoop. Taft is a member of the Association for Computing Machinery (ACM) and was named 'one of the most active middleware reporters in the world' by The Middleware Co. He also has his own card in the 'Who's Who in Enterprise Java' deck.
