
    NVIDIA Unveils TensorRT 8 to Accelerate AI Inferencing

    The latest generation of NVIDIA AI software is aimed at improving chatbots, search and recommendations.

    By Zeus Kerravala - July 20, 2021

      On July 20, NVIDIA launched TensorRT 8, a software development kit (SDK) designed to help companies build smarter, more interactive language apps from cloud to edge. The latest version of the SDK is available for free to members of NVIDIA’s developer program. Plug-ins, parsers, and samples are also available to developers from the TensorRT GitHub repository.

      TensorRT 8 features the latest innovations in deep learning inference, the process of applying a trained neural network model to new data to produce a prediction or response. TensorRT 8 cuts inference time in half for language queries using two key features:

      • Sparsity is a new performance technique in NVIDIA Ampere architecture graphics processing units (GPUs) that increases efficiency by reducing the number of computational operations. Not all parts of a deep learning model are equally important, and some “weights,” or parameters within a neural network, can be set to zero so that no computation needs to be performed on them. Using sparsity within GPUs, NVIDIA is able to zero out nearly half of the weights on certain models, improving throughput and latency.
      • Quantization allows developers to run inference on trained models using eight-bit integer computations (known as INT8), which significantly reduces the compute and storage needed for inference on Tensor Cores. INT8 has grown in popularity for optimizing models in machine learning frameworks like TensorFlow and in NVIDIA’s TensorRT because it reduces memory and computing requirements. By applying this technique, NVIDIA is able to retain accuracy while offering high performance in TensorRT 8.
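
      The two techniques above can be illustrated with a minimal, framework-free sketch. It is not TensorRT code and does not reflect how NVIDIA implements either feature. It assumes a 2-out-of-4 pruning pattern (two weights kept in every group of four, which zeroes half of them) and simple symmetric per-tensor INT8 quantization. The function names `prune_2_of_4`, `quantize_int8`, and `dequantize`, and the sample weights, are hypothetical and chosen only for illustration.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four.

    Assumption: a 2-out-of-4 structured pattern; real pipelines usually
    fine-tune the model after pruning to recover accuracy.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127].

    Assumption: the scale comes from the largest absolute value; production
    tools typically choose it by calibrating on representative data.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 values back to approximate floating-point weights."""
    return [q * scale for q in quantized]


# Illustrative weights only; not taken from any real model.
weights = [0.9, -0.05, 0.02, -0.7, 0.1, 0.4, -0.3, 0.01]
sparse = prune_2_of_4(weights)          # half of the entries become zero
q, scale = quantize_int8(sparse)        # 8-bit integers plus one float scale
restored = dequantize(q, scale)         # approximation of the sparse weights

print("sparse:  ", sparse)
print("int8:    ", q)
print("restored:", [round(v, 4) for v in restored])
```

      In this sketch, the zeroed weights are the ones a sparsity-aware GPU could skip, and each remaining weight is stored in one byte instead of four. The price is a rounding error of at most half the scale per weight, which is why accuracy has to be checked after both steps.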

      TensorRT is widely deployed across many industries

      Over the past five years, developers in industries spanning healthcare, automotive, financial services, and retail have downloaded TensorRT nearly 2.5 million times.

      For example, GE Healthcare is using TensorRT to power its cardiovascular ultrasound systems. The digital diagnostics solutions provider implemented automated cardiac view detection on its Vivid E95 scanner, accelerated with TensorRT. With an improved view detection algorithm, cardiologists can make more accurate diagnoses and identify diseases in early stages. Other companies using TensorRT include Verizon, Ford, the US Postal Service, and American Express.

      TensorRT 8 also introduces a flexible set of compiler optimizations that provide twice the performance of TensorRT 7, irrespective of the transformer model a company is using. TensorRT 8 is able to run BERT-Large, a widely used transformer-based model, in 1.2 milliseconds, which means companies can double or triple their model size for greater accuracy.

      Numerous inference services use language models like BERT-Large behind the scenes. However, language-based apps typically don’t understand nuance or emotion, which creates a subpar experience across the board. With TensorRT 8, companies can now deploy an entire workflow within a millisecond. These advancements could enable a new generation of conversational AI apps that offer a smarter, lower-latency experience to users.

      “This is a huge improvement beyond what we have ever delivered in the past,” said NVIDIA’s Siddharth Sharma. “We look forward to seeing how developers are going to use TensorRT 8.”

      Real-Time Apps with AI

      Real-time applications that use artificial intelligence (AI), such as chatbots, are on the rise. But as AI gets smarter and better at delivering new kinds of services, its models become more complicated and more expensive to compute. This creates challenges for those building AI-based services.

      Today’s developers must make hard choices across different parameters when dealing with complex AI models. There could be hundreds of models served in the data center, all running together within just a few milliseconds.

      “This is one of the biggest challenges in deploying AI apps today. How do you maximize or retain the amount of accuracy that you train with and then offer it to your customers with the least amount of latency?” said Siddharth Sharma, NVIDIA’s head of product marketing for AI software, during a news briefing.

      AI has the potential to have the biggest transformative effect on society since the birth of the Internet. AI success depends on the quality of models and the speed of execution. NVIDIA’s GPUs are widely regarded as the best silicon for AI processing, but its software, such as TensorRT, is equally important in making AI mainstream and usable in everyday life.

      Zeus Kerravala
      https://zkresearch.com/
      Zeus Kerravala is an eWEEK regular contributor and the founder and principal analyst with ZK Research. He spent 10 years at Yankee Group and prior to that held a number of corporate IT positions. Kerravala is considered one of the top 10 IT analysts in the world by Apollo Research, which evaluated 3,960 technology analysts and their individual press coverage metrics.
