How Veritone's Cognitive Platform Analyzes Spoken-Word Content

The next frontier: How do you understand and act upon information that is not written? Veritone has an app for that.

What if somebody did for voice content what Google did for Web search, Salesforce did for CRM and SAS did for analytics, all at the same time?

Veritone is a startup with a rather daunting mission: It wants to record, store and analyze all the words spoken in all media. That's right: everything spoken on television, radio, online video, podcasts, you name it.

How is this possible? While staying well under the radar until now, the young company has been developing this functionality for two years, and it already has customers using it.

Veritone Media, the company's advertising arm, is the first part of Veritone to do commercial business. Veritone Media is all about putting exact numbers on the very inexact science of how media affects viewers, listeners and buyers and persuades them to do or buy something.

Monitoring Radio, Television, YouTube—You Name It

Veritone Media describes itself this way: It is an IT company that delivers effective creative campaign messaging for advertisers, broadcasters and publishers. It uses Veritone's patented cognitive engine to monitor broadcast and streaming media, including radio, television, podcasts and YouTube, and transcribes the audio into a digital format that can be searched in real time.

Veritone's frontline product, the Cognitive Media Platform (CMP), launched Sept. 1. It is a cloud-connected ecosystem of cognitive tools that ingest, store and analyze spoken communications on television, YouTube, radio and podcasts. It can also monitor and record private spoken-word communications such as meetings, speeches, presentations, voicemail and telephone calls, but there are clear restrictions on how those can be used. We'll explain in a minute.

At this opening stage, Veritone Media is using its intellectual property to manage, deliver, verify and quantify advertising campaigns. But it quickly becomes apparent that this Veritone ecosystem of software, tools and cloud services can serve many more use cases, and believe it: These guys are busy developing future functionality.

Veritone has found that equipping machines with the power to understand and respond to the natural human interface of audio and video content is the next evolutionary step in maximizing the value and potential of media. The result is its scalable platform that provides the ability to derive actionable intelligence, in near-real time, from the world's unstructured media.

And there is no data that is more unstructured than the spoken word.

"The spoken word has been around as long as man himself," Veritone CEO Chad Steelberg, a former Google executive, co-founder of AdForce, and co-founder with his brother, Ryan, of the Newport Beach, Calif., startup, told eWEEK. "Five hundred sixty-five years ago, Gutenberg and other individuals tried to capture the spoken word. Seventeen-plus years ago, we had companies like Salesforce and Google come out and try to organize all this information.

"The next frontier, from my perspective, is: How do you understand all that information and act upon it?" The other question: How do you act upon information that is not written?

New Cognitive Media Platform That Uses Multiple Factoring

The answer to both: By building, testing and deploying a new-generation cognitive media platform that uses multiple factoring and is easy enough for a line-of-business person to use.

"The answer is through the No. 1 cognitive media platform. Media is television, online video, radio, audio—all the public media sources that we consume. But it also extends now into the private sector," Steelberg said.

The private sector includes telephone calls, email, text messages and other communications, which are processed only when the client sanctions it, Steelberg said. Veritone can monitor phone calls and voicemail if a client chooses to have that type of media processed; however, all content remains the property of the client. The information is secure and accessible only to that client.

Recording the audio from hundreds of television and radio stations 24/7 is a daunting task, and scraping the audio from YouTube and from myriad Internet podcasts seems impossible. It all has to be stored in the cloud and then transcribed. Then it has to be made available for analytics on demand. Then it has to be distributed. How is this even remotely possible?

"A lot of servers," Steelberg said, with a smile and shake of his head. "A lot of servers."

As Veritone grows, it's going to need more engineers, too; as of September 2015, it has only about 30.

Taking the Cognitive Platform Into Uncharted Waters

CMP is currently monitoring all the major radio and television news networks, ingesting all this voice content, storing it, ripping it apart, and running it through all the appropriate engines, Steelberg said. The company will process about 10 million hours of voice data this year. "No one's done that before," he said.

"What if you wanted to take media into an infinite set of cognitive processes—where it's not just one transcription engine, but literally hundreds of transcription engines, pulling apart and piecing the different aspects of the conversation to get higher and higher accuracy levels," Steelberg said. "Just think of all the data you can obtain."

Veritone's cognitive engines currently include transcription, transcoding, image processing, image detection, facial recognition, sentiment extraction and others.
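Steelberg doesn't say how Veritone combines the output of many transcription engines to raise accuracy. One well-known approach is word-level voting, the idea behind NIST's ROVER system: line up the engines' transcripts and, for each word position, keep the word most engines agree on. The minimal sketch below assumes the transcripts have already been aligned into the same number of word slots (real systems must do that alignment first); `vote_transcripts` and the sample transcripts are hypothetical, and this is an illustration of the general technique, not Veritone's method.

```python
from collections import Counter
from typing import List


def vote_transcripts(transcripts: List[List[str]]) -> List[str]:
    """Combine pre-aligned transcripts by majority vote in each word slot.

    Assumption: every engine's output has already been aligned to the same
    number of slots, with "" marking a slot where that engine heard no word.
    """
    if not transcripts:
        return []
    width = len(transcripts[0])
    if any(len(t) != width for t in transcripts):
        raise ValueError("transcripts must be aligned to the same number of slots")
    combined = []
    for slot in zip(*transcripts):
        # most_common(1) breaks ties in favor of the word seen first (the first engine).
        word, _votes = Counter(slot).most_common(1)[0]
        if word:  # drop slots where the majority heard nothing
            combined.append(word)
    return combined


# Hypothetical outputs from three engines, each wrong in a different slot.
engines = [
    ["ben", "roethlisberger", "threw", "a", "ball"],
    ["ben", "rothlisberger", "threw", "the", "ball"],
    ["then", "roethlisberger", "threw", "the", "ball"],
]
print(" ".join(vote_transcripts(engines)))  # ben roethlisberger threw the ball
```

Because each engine errs in a different place, the vote recovers the whole sentence. If two engines made the same mistake, voting would lock it in, which is one reason more diverse engines help.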

For example, Steelberg said, one of the engines could be facial recognition in coordination with a conversation to determine whether a person is telling the truth or lying. Forget old-school lie-detector tests. Using this cognitive data platform, the user could run a profile of an interviewee on television, add sentiment extraction, turn on a facial characteristics engine and run Nielsen reports (as one of the engines) to show the size of the audience at that time. Veritone Media then layers all these sets of information, time-correlated against the original media file, to give the user a full, 360-degree report on the TV interview.

Too Much Information? Maybe, but Maybe Not

TMI? Entirely possible, but then again, in some use cases there is no such thing as too much information.

"So if someone is interviewing Chad, and he smiles at the word Veritone, there's an audience of 12,000—we'll find you all of the areas where that occurred," Steelberg said, with a smile. Veritone can also tell you whether Chad was really smiling or merely faking it. (A key clue: He probably isn't faking it.)

Veritone then layers all that data inside a temporal database for search and discovery, then wires it to what is essentially an open-ended action platform, or CMS [content management system], Steelberg said. "In the case of [Veritone] Media, we deliver ads. In the case of other platforms, we use webhooks to talk to CRM apps; in education, we notify students in the case of the education platform, and so on."
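The article doesn't describe the schema of Veritone's temporal database. As a rough illustration of the idea, assume each engine writes time-stamped annotations against the same media file; finding every moment Chad smiles at the word "Veritone" while the audience is at least 12,000 then becomes a search for overlapping time intervals. The `Annotation` fields, the engine names and the sample data below are hypothetical, and the brute-force overlap check stands in for the interval index a real temporal store would use.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    engine: str         # which engine produced it (hypothetical names below)
    label: str          # what was detected: a word, an expression, a metric name
    start: float        # seconds from the start of the media file
    end: float
    value: float = 0.0  # optional numeric payload, such as audience size


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the half-open time intervals [start, end) intersect."""
    return a.start < b.end and b.start < a.end


def find_moments(layers: List[Annotation], word: str, expression: str,
                 min_audience: float) -> List[Tuple[float, float]]:
    words = [x for x in layers if x.engine == "transcription" and x.label == word]
    faces = [x for x in layers if x.engine == "face" and x.label == expression]
    crowds = [x for x in layers if x.engine == "audience" and x.value >= min_audience]
    # Keep each spoken occurrence that coincides with the expression AND a large enough audience.
    return [(w.start, w.end) for w in words
            if any(overlaps(w, f) for f in faces) and any(overlaps(w, c) for c in crowds)]


# Made-up layers for one interview, for illustration only.
layers = [
    Annotation("transcription", "veritone", 12.0, 12.6),
    Annotation("transcription", "veritone", 95.2, 95.8),
    Annotation("face", "smile", 11.8, 13.0),
    Annotation("audience", "viewers", 0.0, 60.0, value=12000),
    Annotation("audience", "viewers", 60.0, 120.0, value=9000),
]
print(find_moments(layers, "veritone", "smile", 12000))  # [(12.0, 12.6)]
```

Keeping every engine's output as intervals on one shared timeline is what lets new layers, such as an extra engine, join the query without changing the others.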

As far as use cases are concerned, Veritone is only scratching the surface of what this cognitive media platform can do.

"While we do media [advertising] workloads every day, what we're now doing is building new divisions to tackle other verticals," Chad Steelberg said. "These include the enterprise space, non-profits, financial sector, and so on. When you look at the transcription market, there are about 45 different services out there, specializing in different languages, training sets, etc. And they all have similar software.

"But the actual accuracy rate in 2014 was [only] 63 percent. In 2015, it's 68 percent. Seven out of 10 words are good, but the three that aren't always seem to be the most important. Take Ben Roethlisberger [Pittsburgh Steelers quarterback]; his name comes out 'the hen rots in his burger.'"

Conventionally, these cognitive processes have all been done in silos, Steelberg said. This is where Veritone really departs from convention.
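The accuracy figures Steelberg quotes don't come with a definition. A common yardstick for speech recognition is word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn an engine's output into a reference transcript, divided by the number of reference words. Word accuracy is often reported as 1 - WER, so under that reading 68 percent accuracy means a WER of 0.32, or roughly three bad words in 10. The sketch below computes WER with a standard edit-distance table; it assumes whitespace-separated, punctuation-free text, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion: reference word missing
                         cur[j - 1] + 1,          # insertion: extra word in output
                         prev[j - 1] + (r != h))  # substitution, or free if the words match
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))  # about 0.167
print(word_error_rate("ben roethlisberger", "the hen rots in his burger"))  # 3.0
```

In the quarterback example, two reference words turn into six wrong ones (two substitutions plus four insertions), so WER is 3.0 and 1 - WER goes negative: insertions can push word accuracy below zero.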

Looking at the Cognitive Analytics Market

IBM, Google, Microsoft and Nuance (maker of Dragon transcription software) are the largest and best-known players here. Each develops its own software and holds it close to the vest. Google and Microsoft provide APIs for transcription, translation (Google only), image processing, transcoding and a few others. "Phenomenal sets of APIs, but all being developed in silos," Steelberg said.

IBM has a plethora of cognitive tools at work in Watson. "They're the closest thing to a CMS, but their engines are all closed," Steelberg said. "They developed them, and they're their own proprietary services."

Veritone puts all of its content into a primary CMS and opens up the entire set of cognitive engines, so that anybody in the world can develop an engine, Steelberg said. It's the open-source concept, only in a realm that hasn't been touched by it previously.

Veritone uses RESTful APIs to give the development community a way to embed this newfound intelligence into their applications and to develop and monetize their own cognitive engines. Veritone will make money by licensing its IP.
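The plug-in idea behind an open engine ecosystem can be sketched in a few lines: engines are published under a name, and clients route media through whichever engines they choose, so adding an engine never touches the routing code. The sketch below is a hypothetical, in-process model; `EngineRegistry`, its methods and the stand-in engine are illustrative names, not Veritone's API, and in a real deployment each engine would sit behind a REST endpoint rather than a local function.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical contract: an engine takes a media ID and returns
# (start_seconds, end_seconds, label) annotations for that media.
Segment = Tuple[float, float, str]
Engine = Callable[[str], List[Segment]]


class EngineRegistry:
    """Toy registry: engines are published by name and invoked by name."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}

    def publish(self, name: str, engine: Engine) -> None:
        if name in self._engines:
            raise ValueError(f"engine {name!r} is already published")
        self._engines[name] = engine

    def route(self, media_id: str, engine_names: List[str]) -> Dict[str, List[Segment]]:
        # This lookup loop stays the same no matter how many engines are published.
        return {name: self._engines[name](media_id) for name in engine_names}


def facial_characteristics(media_id: str) -> List[Segment]:
    # Stand-in for a third-party engine; returns fixed results for illustration.
    return [(3.0, 4.5, "smiling"), (9.0, 9.8, "looking left")]


registry = EngineRegistry()
registry.publish("facial-characteristics", facial_characteristics)
print(registry.route("interview-001", ["facial-characteristics"]))
```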

"For example, I could be a developer in Kiev interested in facial characteristics: smiling, frowning, looking to the left, etc. Three guys in Kiev could publish that API into our cloud, and any of the millions of hours of audio and video we're processing—our clients can route that video through that (Kiev) engine on the fly, without changing our core architecture of the CMP, and have all that information now accessible and searchable," Steelberg said.

The Kiev guys can set their own rate, charging, say, 23 cents per hour for others to use their engine. The engines can be open source or completely proprietary. Just as Apple does with iTunes, Veritone takes 33 percent of the revenue, and that's a big part of the company's business model.
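The marketplace arithmetic is easy to check. Assuming the 23-cents-per-hour example rate and a flat 33 percent platform share (the article gives no other fee details), 1,000 hours of media routed through the Kiev engine would gross $230, of which $75.90 goes to Veritone and $154.10 to the developers. The sketch uses `Decimal` to avoid floating-point rounding on money; `split_revenue` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Tuple

CENT = Decimal("0.01")


def split_revenue(hours: int, rate_per_hour: Decimal,
                  platform_share: Decimal = Decimal("0.33")) -> Tuple[Decimal, Decimal, Decimal]:
    """Return (gross, platform cut, developer payout), each rounded to the cent."""
    gross = (Decimal(hours) * rate_per_hour).quantize(CENT, rounding=ROUND_HALF_UP)
    platform = (gross * platform_share).quantize(CENT, rounding=ROUND_HALF_UP)
    developer = gross - platform  # the remainder goes to the engine's developers
    return gross, platform, developer


print(split_revenue(1000, Decimal("0.23")))
# (Decimal('230.00'), Decimal('75.90'), Decimal('154.10'))
```

Computing the developer payout as the remainder, rather than rounding it separately, guarantees the two shares always add back up to the gross.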

Building an Economy Around Cognition

"Suddenly, we have the ability to create an economy around cognition," Steelberg said. "The more engines—the more cognitive experts in various topics—that we can plug into the media platform, so that we no longer have to be the cognitive engine, then we can be the conductor."

If every cognitive engine is a musician, then Veritone is the conductor and the media is the score; "all we're doing is picking apart which instruments will be operated against the score," Steelberg said. "Any person in the world can participate in that economy."

Veritone's clients now range from political campaigns, to law enforcement, to law schools, from national radio broadcasters to single YouTube channels, from global ad agencies to the neighbor next door, Steelberg said. Through the automated analysis of audio and video media, the CMP creates new revenue opportunities and new research avenues.

Here's yet another example: "You put the microphone in front of a candidate for office at a stump speech in a local market. While he's still on the stage making a gaffe, you immediately have people detecting it, identifying what he/she talked about, and magnifying that gaffe into social media channels before he even gets off the stage," Steelberg said. "You can compare every piece of media (in seconds) to find out immediately if the candidate is flip-flopping on certain topics."

Putting the Power Into Everybody's Hands

Veritone has the capability to put this kind of power into the hands of everyone, he said.

One final IT Science use case: "Financials are one of my favorites. Imagine the capability of taking all the corporate conference calls, all the analyst calls, and being able to run those through a CMP, but then correlate the stock market and other information into the CMP. Then you could ask a question like: The Ukraine: 23 percent of oil and gas CEOs talked about Ukraine with a negative 6 sentiment in their last corporate con call, and the stock performance for those companies over the last six months has been X. Versus telecom, in which only 2 percent mentioned Ukraine with neutral sentiment, no impact.

"So if you're a financial trader, this becomes the gateway to processing human speech, our primary form of communication," Steelberg said.

In summary: "Humans have finite capacity to store and process information. We have become the limiting factor. For the past 17-plus years, companies have solved the storage problem by organizing and making accessible the world's information. The time is now for a platform that can understand media, thus impacting human interaction more than any other technology," Steelberg said.
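Steelberg's Ukraine example from the financial use case boils down to a group-by over call transcripts: for each sector, what share of companies mentioned a topic, and how negative the sentiment was when they did. The sketch below assumes upstream engines have already tagged each call with a topic flag and a sentiment score; the `CallMention` fields, the sentiment scale and the sample rows are all made up for illustration, and correlating the result with stock performance is left out.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CallMention:
    company: str
    sector: str
    mentions_topic: bool  # did the call mention the topic (e.g. Ukraine)?
    sentiment: float      # hypothetical score; negative means negative sentiment


def sector_summary(calls: List[CallMention], sector: str) -> Tuple[float, Optional[float]]:
    """Return (percent of companies mentioning the topic, mean sentiment of those mentions)."""
    in_sector = [c for c in calls if c.sector == sector]
    if not in_sector:
        raise ValueError(f"no calls for sector {sector!r}")
    mentioning = [c for c in in_sector if c.mentions_topic]
    share = 100.0 * len(mentioning) / len(in_sector)
    avg = mean(c.sentiment for c in mentioning) if mentioning else None
    return share, avg


# Made-up sample data, for illustration only.
calls = [
    CallMention("OilCo A", "oil and gas", True, -6.0),
    CallMention("OilCo B", "oil and gas", False, 0.0),
    CallMention("OilCo C", "oil and gas", False, 0.0),
    CallMention("OilCo D", "oil and gas", False, 0.0),
    CallMention("TelCo A", "telecom", False, 0.0),
    CallMention("TelCo B", "telecom", False, 0.0),
]
print(sector_summary(calls, "oil and gas"))  # (25.0, -6.0)
print(sector_summary(calls, "telecom"))      # (0.0, None)
```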

Veritone may indeed be onto something big. Stay tuned: We'll be following their progress here at eWEEK.


Chris Preimesberger


Chris J. Preimesberger is Editor-in-Chief of eWEEK and responsible for all the publication's coverage. In his 13 years and more than 4,000 articles at eWEEK, he has distinguished himself in reporting...