    How AI Data Actually Moves from Collection to Algorithm

    Written by

    Chris Preimesberger
    Published April 4, 2019

      It seems as if we hear more talk each day about the high potential for artificial intelligence (AI) and the techniques, like machine learning (ML), used to achieve it. As AI grows in prominence, stories about current and potential use cases will become more common.

      Though excitement about AI and ML is legitimately growing, we hear little about how the data actually goes from collection to algorithm. By examining the process behind building hypothetical machine learning models, we can see which important steps are often glossed over in articles extolling the virtues of AI.

      In this eWEEK Data Points article, Kiran Vajapey, human-computer interaction developer at Figure Eight, offers five key insights about this data journey and how it works. Figure Eight Inc. has developed a human-in-the-loop AI software platform that trains, tests and tunes machine learning models for data science and machine learning teams. It supports text, image, audio and video data types.

      Data Point No. 1: Annotation

      If, for example, we take Google image searches of “city streets” and feed those into our autonomous car algorithm, the results it produces probably won’t be actionable. Instead, we’d need to have human annotators use tools to create bounding boxes or label the data before sending it through the model. Humans will need to put boxes around and label every curb, fire hydrant, telephone pole, and human being, among other items, in each photo presented to the model.

      To build an autonomous car model, an organization will likely want to go further than bounding boxes and labeled items in a photo. In this case, organizations can turn to what’s known as semantic segmentation, whereby every single pixel in an image receives a label. When the model’s results are doing something as important as directing a self-driving car, it’s crucial that the AI is as knowledgeable as possible about its surroundings.
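
The difference between the two annotation styles is easiest to see in the shape of the data. The following is a minimal sketch, not any particular tool's format: the `BoundingBox` class, the `boxes_to_mask` helper and the "unlabeled" background value are names invented here for illustration. Real segmentation masks are traced along object outlines by annotators; filling rectangles only shows that a box annotation is a handful of records per image, while a segmentation mask carries one label for every pixel.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """One box annotation: a label plus pixel corners (max edges exclusive)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def boxes_to_mask(width, height, boxes, background="unlabeled"):
    """Build a per-pixel label grid; later boxes overwrite earlier ones."""
    mask = [[background for _ in range(width)] for _ in range(height)]
    for box in boxes:
        for y in range(max(0, box.y_min), min(height, box.y_max)):
            for x in range(max(0, box.x_min), min(width, box.x_max)):
                mask[y][x] = box.label
    return mask


# A hypothetical 6x4 pixel "street photo" with two annotated objects.
boxes = [
    BoundingBox("curb", 0, 3, 6, 4),
    BoundingBox("fire hydrant", 1, 1, 3, 3),
]
mask = boxes_to_mask(width=6, height=4, boxes=boxes)
labeled_pixels = sum(label != "unlabeled" for row in mask for label in row)
```

Two box records become 24 per-pixel labels even in this tiny image, which hints at why pixel-level annotation takes so much more human effort.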

      The annotation process is especially crucial for ensuring data quality and accuracy. To get it right, make sure your annotation tools bring enough human intelligence into the process. Even before labeling data, organizations will want to consider how they collect data in the first place.

      Data Point No. 2: Data Augmentation

      If the perfect data set for your algorithm doesn’t exist, you can typically perform data augmentation to enhance the data set you do have. Consider a model for a speech-recognition system (such as Alexa or Siri). If you collect crisp sound bites from a recording studio, the algorithm may run into problems in the real world. Because the model is trained to recognize the clean sounds of a sterile environment, it may struggle when presented with voice commands littered with ambient noise or static. Luckily, to make the data more realistic, you can simulate background noise in the clean data via augmentation methods.
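
As a rough illustration of noise augmentation, the sketch below mixes Gaussian noise into a clean signal at a chosen signal-to-noise ratio (SNR). It makes several assumptions: audio is a plain list of float samples, a synthetic 440 Hz tone stands in for a studio recording, and `add_noise` is a helper name made up for this example. Production pipelines would more likely mix in recorded ambient sound than synthetic noise.

```python
import math
import random


def add_noise(clean, snr_db, rng=None):
    """Return a noisy copy of `clean` at roughly the requested SNR in dB."""
    rng = rng or random.Random()
    signal_power = sum(s * s for s in clean) / len(clean)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in clean]


# Stand-in for a studio recording: one second of a 440 Hz tone at 16 kHz.
SAMPLE_RATE = 16_000
clean = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

# Seeded generator so the augmented copy is reproducible.
noisy = add_noise(clean, snr_db=10, rng=random.Random(0))
```

Calling `add_noise` several times with different SNR values turns one clean clip into a family of progressively noisier training examples.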

      Data Point No. 3: Transfer Learning

      If you are attempting to build an ML algorithm for a commercial application, there is a good chance that the exact data set for your use case doesn’t exist. Consider a model to detect cancer in x-ray images. There likely isn’t a lot of publicly available data—x-ray images from cancer patients—for your use case. Transfer learning allows you to leverage existing models. In this instance, you may be able to use an available model that has learned rules about pixel-level edge detection and general image component identification from a previous data set.

      Rather than pre-training your model with millions of images, you can instead remove layers of this existing model until you have an appropriate starting point. Then, you can feed your specific data set into an algorithm that is already trained to identify certain pixels in images. As you work through your specific data set, you can retrain the model to better understand the nuances of x-ray images. In the process of retraining the existing algorithm with your data, you’ll develop a neural net suited to your use case.
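
The idea of removing layers and retraining can be shown in miniature. The sketch below is a toy under heavy assumptions: hand-written functions (`edge_layer`, `pool_layer`, `old_head`) stand in for a pretrained network, each "scan" is a four-pixel list, and the labels are invented. It shows only the simplest variant, in which the reused layers stay frozen and a new output layer is trained on the small, task-specific data set. Retraining the earlier layers as well is a further step not shown here.

```python
import math


def edge_layer(pixels):
    """'Pretrained' layer 1: absolute differences between neighboring pixels."""
    return [abs(b - a) for a, b in zip(pixels, pixels[1:])]


def pool_layer(edges):
    """'Pretrained' layer 2: summarize edges as [strongest, average]."""
    return [max(edges), sum(edges) / len(edges)]


def old_head(features):
    """Output layer for the original task; removed before reuse."""
    return "street" if features[0] > 0.5 else "sky"


PRETRAINED = [edge_layer, pool_layer, old_head]
frozen = PRETRAINED[:-1]  # drop the old task-specific layer, keep the rest


def extract_features(layers, pixels):
    """Run an input through the frozen layers without changing them."""
    out = pixels
    for layer in layers:
        out = layer(out)
    return out


def predict(weights, bias, features):
    """New head: logistic regression on the frozen features."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 / (1 + math.exp(-z))


def train_head(features, labels, epochs=1000, lr=0.5):
    """Fit only the new head with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Tiny invented data set: 1 = scan contains a sharp edge, 0 = scan is flat.
scans = [[0.1, 0.1, 0.1, 0.1], [0.2, 0.2, 0.3, 0.2],
         [0.1, 0.9, 0.1, 0.1], [0.0, 0.0, 1.0, 1.0]]
labels = [0, 0, 1, 1]

features = [extract_features(frozen, s) for s in scans]
weights, bias = train_head(features, labels)
```

Because the frozen layers already turn raw pixels into edge features, the new head has only three numbers to learn, which is why four examples are enough in this toy setting.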

      Data Point No. 4: Iterations

      While this may sound counter-intuitive, it’s easy for a team to collect too much data. When training a model, the soundest approach is usually to work iteratively. If you happen to have 1,000 images of x-ray data, use those images first. Once you train the model, you’ll gain a better understanding of whether that model works. Let’s say your goal is 85 percent accuracy. If those 1,000 images get you 85 percent accuracy, then you don’t need to collect more. If they only lead to a model that offers 67 percent accuracy, then you will have to invest in finding more images for your data set.
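
The stopping rule described above can be sketched as a simple loop. Everything here is hypothetical: `iterate_until_target` is a name chosen for this example, `train_and_evaluate` stands for whatever training and validation routine a team already has, and the accuracy figures in `FAKE_ACCURACY` are invented for illustration, not measured results.

```python
def iterate_until_target(train_and_evaluate, start_size, step, target, max_size):
    """Grow the training set in steps; stop once accuracy reaches `target`."""
    size = start_size
    history = []
    while True:
        accuracy = train_and_evaluate(size)
        history.append((size, accuracy))
        # Stop when the goal is met or when no further data can be added.
        if accuracy >= target or size + step > max_size:
            return history
        size += step


# Invented learning curve, for illustration only (not measured results).
FAKE_ACCURACY = {1000: 0.67, 2000: 0.78, 3000: 0.86, 4000: 0.88}

history = iterate_until_target(
    train_and_evaluate=lambda n: FAKE_ACCURACY[n],
    start_size=1000,
    step=1000,
    target=0.85,
    max_size=4000,
)
```

With this made-up curve, the loop stops at 3,000 images and never pays to collect the fourth batch.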

      Even when you do have access to a larger data set to begin with, working iteratively is likely the most efficient option for creating a model. Consider data that needs labels and bounding boxes. You can use existing labeled data to train a model that labels additional pieces of data on its own. As you run labeled data through the model, it will build up your neural net and eventually improve the confidence of your algorithm.

      The model may label one image with 20 percent confidence and another with 80 percent confidence. By routing images that fall below a certain confidence threshold to a human annotator, you can incorporate human intelligence into the process. This helps establish ground truth for the data about which the model is uncertain. Once humans annotate those data points, you can retrain the model with the correctly labeled data.
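
A minimal sketch of that routing step follows. It assumes each prediction is a dictionary with `image`, `label` and `confidence` keys and uses a threshold of 0.5. The record layout, the file names, the labels and the threshold are all assumptions made for this example.

```python
def route_by_confidence(predictions, threshold):
    """Split model predictions into auto-accepted and human-review queues."""
    auto_accepted, needs_human = [], []
    for item in predictions:
        if item["confidence"] >= threshold:
            auto_accepted.append(item)
        else:
            needs_human.append(item)
    return auto_accepted, needs_human


# Hypothetical model output for two images.
predictions = [
    {"image": "scan_001.png", "label": "tumor", "confidence": 0.20},
    {"image": "scan_002.png", "label": "clear", "confidence": 0.80},
]

auto_accepted, needs_human = route_by_confidence(predictions, threshold=0.5)
```

Here `scan_001.png` lands in the human-review queue and `scan_002.png` is accepted as labeled. Raising the threshold sends more images to people and lowers the risk of training on wrong labels.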

      Data Point No. 5: Using These Tools Improves Algorithms Without Exploding Costs

      The main data challenge companies run into is that they aren’t sure of the best way to use their data. We once worked with a company that tried to predict stock prices. When trying to predict Apple’s stock price, for example, we gathered all kinds of sentiment data about Apple. Eventually, we learned that we needed to incorporate data points that categorized entities other than Apple for a more accurate prediction. We realized that collecting different types of data yielded a more stable, long-term projection algorithm.

      Companies must first set a goal to understand what it is they’re trying to build with their data. Had we set that goal for ourselves ahead of time, we might have created a more accurate model from the get-go. By creating a goal, you will have a frame of reference as you develop strategies and build your AI initiatives.

      The specifics of your data and the given problem you’re trying to solve will change over time. But if you have a state you’d like to achieve, you can develop your tools and algorithms to reach that specific point. By leaning into these four tools while building your models, your projects are more likely to end up efficient, accurate and cost-effective.

      Chris Preimesberger
      https://www.eweek.com/author/cpreimesberger/
      Chris J. Preimesberger is Editor Emeritus of eWEEK. In his 16 years and more than 5,000 articles at eWEEK, he distinguished himself in reporting and analysis of the business use of new-gen IT in a variety of sectors, including cloud computing, data center systems, storage, edge systems, security and others. In February 2017 and September 2018, Chris was named among the 250 most influential business journalists in the world (https://richtopia.com/inspirational-people/top-250-business-journalists/) by Richtopia, a UK research firm that used analytics to compile the ranking. He has won several national and regional awards for his work, including a 2011 Folio Award for a profile (https://www.eweek.com/cloud/marc-benioff-trend-seer-and-business-socialist/) of Salesforce founder/CEO Marc Benioff--the only time he has entered the competition. Previously, Chris was a founding editor of both IT Manager's Journal and DevX.com and was managing editor of Software Development magazine. He has been a stringer for the Associated Press since 1983 and resides in Silicon Valley.