Enterprise Computer Vision: Pilots to Production at Scale

Episode summary

Enterprise computer vision is entering a new phase as organizations use visual data from cameras, sensors, machines, vehicles, facilities, and connected environments to support faster decisions. In this eSpeaks episode, Corey Noles speaks with Jason Nassar, Vertical Enablement Leader of the Automation Platform at Dell Technologies, about how computer vision is changing and what enterprises need to move from pilots to production.

Nassar explains that vision language models and agentic AI are changing what computer vision systems can do. Instead of only detecting predefined objects, newer systems can use embeddings and models to answer questions about images, video, receipts, people, conditions, and operations. The conversation also covers real-time inference, local compute, Dell AI Factory with NVIDIA, Dell Automation Platform, Dell Distributed Private Cloud, zero trust security, and the IT/OT collaboration needed to scale visual AI across enterprise environments.

Key takeaways

Vision language models are changing computer vision by making visual data easier to search, interpret, and connect to business workflows.
Manufacturing, retail, energy, large venues, and industrial operations are using computer vision for quality control, safety, process optimization, and customer experience.
Real-time computer vision often requires compute close to the operation, especially when latency, security, or air-gapped environments matter.
Dell AI Factory with NVIDIA can help enterprises choose the right infrastructure for production computer vision workloads.
Dell Automation Platform and Dell Distributed Private Cloud can support zero-touch deployment and orchestration across distributed edge locations.
Successful computer vision programs require IT, OT, security, compliance, and business teams to work together.

Enterprise computer vision use cases and infrastructure needs

Use case area	What it supports	Infrastructure need
Manufacturing quality control	Detects product defects and helps improve production quality	Local compute, accelerators or GPUs, model tuning, and integration with production systems
Process optimization	Helps analyze assembly-line activity, ergonomics, and operational flow	Edge infrastructure, real-time inference, and integration with industrial control environments
Retail experience	Supports cashierless checkout, product interaction tracking, and personalized offers	Computer vision software, store-level compute, data integration, and customer privacy controls
Venue and facility safety	Detects risks such as weapons or unsafe activity in large environments	Low-latency inference, secure video processing, governance, and alerting workflows
Energy and industrial operations	Supports visual AI in remote or air-gapped environments	Local AI factory infrastructure, zero trust security, and management across distributed sites
Enterprise edge deployment	Scales computer vision from one pilot site to many global locations	Dell Automation Platform, Dell Distributed Private Cloud, zero-touch provisioning, and centralized orchestration

FAQs

What is enterprise computer vision?

Enterprise computer vision uses AI to interpret images, video, and visual sensor data so organizations can make faster operational decisions. In this episode, Jason Nassar explains that newer vision language models allow enterprises to ask more complex questions about visual data, such as what happened in a scene, what an object was, or what action occurred.
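As an illustration of the embedding idea Nassar describes in the transcript (objects, shapes, and colors turned into numerical values), the following minimal sketch ranks video frames against a question by comparing vectors. The three-dimensional vectors, the frame names, and the `rank_frames` helper are all made up for illustration; a real vision language model would produce much larger embeddings, and nothing here reflects a specific Dell or NVIDIA API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; a real model would generate these from pixels.
frame_embeddings = {
    "frame_001": [0.9, 0.1, 0.0],
    "frame_002": [0.1, 0.8, 0.2],
    "frame_003": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text question such as "What was on the receipt?"
query_embedding = [0.2, 0.9, 0.1]


def rank_frames(query, frames):
    """Sort frames from most to least similar to the query embedding."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in frames.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


best_frame, best_score = rank_frames(query_embedding, frame_embeddings)[0]
print(best_frame, round(best_score, 3))
```

Because both the question and the frames live in the same vector space, "asking a question about video" reduces to a nearest-neighbor lookup, which is one reason embeddings make visual data easier to search.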

What infrastructure is needed for real-time computer vision?

Real-time computer vision often needs compute close to the operation. When use cases involve manufacturing, safety, quality control, or operational response, delayed insight may not be useful. Nassar explains that enterprises may need servers with accelerators or GPUs near the workload, especially when real-time inference or secure industrial environments are involved.

Where should enterprises deploy computer vision workloads?

Computer vision workloads should run where latency, security, and performance needs are best met. Real-time use cases often belong at the edge or near the operation. Less time-sensitive workloads may run in a data center or cloud environment. The episode explains that Dell and NVIDIA help organizations evaluate the use case, expected return, and infrastructure requirements before choosing where the workload should run.
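The rule of thumb in this answer can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Workload` fields and the `choose_placement` function are hypothetical names, and a real sizing exercise would also weigh expected return, data volume, and compliance requirements.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a computer vision use case."""
    name: str
    needs_real_time: bool
    air_gapped: bool = False


def choose_placement(workload):
    """Apply the episode's rule of thumb for where a workload should run."""
    # Air-gapped sites cannot reach a remote data center, so compute stays local.
    if workload.air_gapped:
        return "edge"
    # Real-time inference needs compute close to the operation.
    if workload.needs_real_time:
        return "edge"
    # Less time-sensitive analysis can run centrally.
    return "data center or cloud"


defect_detection = Workload("defect detection", needs_real_time=True)
people_counting = Workload("airport people counting", needs_real_time=False)
remote_energy_site = Workload("remote energy site", needs_real_time=False, air_gapped=True)

for w in (defect_detection, people_counting, remote_energy_site):
    print(w.name, "->", choose_placement(w))
```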

How does Dell AI Factory with NVIDIA support computer vision?

Dell AI Factory with NVIDIA can support computer vision by helping organizations match the right compute, software, and deployment model to the use case. In the episode, Nassar explains that Dell and NVIDIA provide AI factory architectures and products that help customers understand what infrastructure they need to achieve value quickly.

How can enterprises scale computer vision from pilot to production?

Scaling computer vision requires more than a working pilot. Enterprises need model training that adapts to real-world conditions, local compute for time-sensitive use cases, orchestration across distributed locations, and security controls from the start. Nassar points to Dell Automation Platform and Dell Distributed Private Cloud as technologies that can help deploy operating systems, software, integrations, and data connections with zero-touch provisioning across many edge locations.
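To make the blueprint idea concrete, here is a minimal sketch of one shared definition (operating system, software, integrations) being expanded into identical per-site deployment plans. The `Blueprint` class, the `plan_rollout` function, and every value shown are hypothetical; this is not the Dell Automation Platform or Dell Distributed Private Cloud API, only an illustration of the "define once, deploy everywhere" pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blueprint:
    """Hypothetical recipe agreed on up front by IT and OT teams."""
    operating_system: str
    software: tuple
    integrations: tuple


def plan_rollout(blueprint, sites):
    """Return one deployment plan per site, all derived from the same blueprint."""
    return {
        site: {
            "operating_system": blueprint.operating_system,
            "software": list(blueprint.software),
            "integrations": list(blueprint.integrations),
        }
        for site in sites
    }


# Placeholder values for illustration only.
blueprint = Blueprint(
    operating_system="example-linux",
    software=("vision-inference-app",),
    integrations=("plc-gateway", "alerting-webhook"),
)

plans = plan_rollout(blueprint, ["plant-01", "plant-02", "store-117"])
for site, plan in plans.items():
    print(site, plan)
```

Keeping the blueprint immutable (`frozen=True`) mirrors the goal described in the episode: individual sites receive the agreed configuration rather than drifting through manual changes.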

How should enterprises secure computer vision deployments?

Enterprises should build security into computer vision systems from the start. In the episode, Nassar recommends zero trust security so only authorized administrators can change endpoint software, operating systems, or integrations. He also explains that Dell Distributed Private Cloud can support secure edge deployments for environments such as manufacturing and energy.

The transcript below has been lightly edited for clarity.

Corey Noles: Hello and welcome to eSpeaks from eWeek. I'm your host, Corey Noles, and today we're talking about the next phase of enterprise computer vision: What's changed, what's still hard, and what it really takes to move from promising pilots to production systems that operate at scale. Computer vision has been a part of the enterprise conversation for years, but the landscape is shifting quickly. Organizations now have more visual data than ever before from cameras to sensors, devices, machines, facilities, vehicles, and even connected environments. At the same time, advances in AI infrastructure, edge computing, accelerated computing, and even model deployment are making it possible to turn that visual data into faster insights and more responsive operations. To help us unpack where the technology is headed, I'm joined today by Jason Nassar, vertical enablement leader of the automation platform at Dell Technologies. We're going to discuss the business problems computer vision is solving today, the infrastructure and data challenges that come with scaling it, and what enterprise leaders should be thinking about as visual AI becomes a more important part of their operation. Jason, thanks so much for joining us. Great to have you.

Jason Nassar: Thank you for having me. Appreciate it, Corey.

Corey Noles: Excellent. I guess to get started here, computer vision has been around for a little while now, but the enterprise conversation feels different today around ... from your perspective, what's changed the most here in the last few years?

Jason Nassar: I mean, most of the changes actually happened within the last year, and the biggest one is really architectural. We utilized a lot of machine learning, and it's not like that's not there still, but we're moving more over to an agentic feel. We actually are using VLMs, which are vision language models, and that's completely changing the game. What it does is change objects, shapes, and colors into numerical values, and these values are called embeddings. That has allowed enterprises to accelerate what used to take a very long time. You can actually ask questions about what's going on in the images. That's a huge change, right? We're all familiar with what chatbots have done, but agentic AI allows you to say, hey, what was this person wearing? What was this person doing? Can I read a receipt? What was on the receipt?

Corey Noles: What is the weather?

Jason Nassar: What is the weather, yeah. And this is really just changing the game when it comes to computer vision compared to the past. And, you know, we have some new models that have been coming out. Some of these VLMs are, you know, Gemini has their Gemini Pro, CLIP from OpenAI, Qwen from Alibaba, and there are a few others.

Corey Noles: Nice. So when enterprises come to Dell asking about computer vision, what business problems are they usually looking to solve first?

Jason Nassar: Well, it's usually very specific to the vertical. When I say vertical, I'm talking about manufacturing, energy, retail. That's the focus we take from a use case perspective. So for manufacturing, very specifically, they're always trying to improve product quality and improve their processes. They'll use computer vision to find out whether or not there are defects in a product that's coming through the line. They will also use computer vision to optimize process, whether it's process from an ergonomic perspective, where people are actually involved, or something on an assembly line that's utilizing programmable logic controllers.

Large venues are actually using computer vision for things like safety, to detect whether or not there are weapons at stadiums, for example. Or even at stadiums, and this is kind of a cross-retail use case, cashierless checkout. A lot of us have seen this before, right? You scan your credit card when you get into a line, then you just grab what you want and walk out, and the computer vision is actually looking at it. It knows what you took, and that's what's charged to your card.

Corey Noles: I saw that at the airport just this week, where you walk up and you just set your stuff down and it gets it. Or where you just walk out the door now.

Jason Nassar: Yeah, it's kind of amazing. I mean, there are even use cases in retail stores where, let's say, somebody's perusing and they pick up a product, a shampoo bottle, but they look at it, decide they're not really interested, and put it back. Well, the computer vision can track that person all the way over to the cashier and then, after they're done purchasing their items, print out a coupon.

So I mean, it's remarkable. You're targeting your audience, you're improving your sales, and of course increasing revenue. It's kind of amazing where we are at this point.

Corey Noles: Amazing. I know a lot of organizations have cameras, images, video feeds, and sensor data, but that doesn't automatically translate into business value. What does it take to turn that visual data into an actionable insight a company can use?

Jason Nassar: Well, I mean, first you need the compute local to the operation. It just depends on how intense your workload is going to be, whether or not you're processing video, and whether you need real-time data or it's something that can take a little bit longer. If you need something in a highly secured environment, such as manufacturing, you're probably gonna want servers with accelerators or GPUs very local to that operation so that they can actually handle that workload, the training and the inferencing. But if you don't need that kind of speed, you can actually send things to the data center or the cloud.

So there's that aspect of it. And then you also need highly competent software, software that can actually handle the use cases. Many of these software providers, these ISVs, have software that sometimes has hundreds of use cases baked in. That can make the job a little bit easier and a little bit faster, but you need to make sure that you're picking the right software for your use case as well as the correct compute.

Something minor, let's say detecting objects like fruits and vegetables and saying what the price is, you actually can do on an x86 processor. You might not need a GPU. However, if you're trying to do facial recognition and understand exactly what's going on, and maybe even the emotions of a person based on their body language, now you're talking about needing GPUs and accelerators local to that operation.

Corey Noles: OK, that makes sense. I can see where that difference would lie. So a lot of AI computer vision projects work well in a pilot, but often struggle when they move into real-world operations. What separates a successful pilot from a scalable enterprise deployment?

Jason Nassar: Yeah, so I mean, ultimately the biggest aspect of this is making sure that you're training the models correctly. In the past, a lot of these pilots really didn't launch at scale because there wasn't a means to teach these models as fast as you can teach them right now. I gave the example of having hundreds of use cases. So let's say you get a software package and it has a loss prevention use case, and it's really good. But no matter what, every single site is a snowflake.

In the past, that would require an engineer or somebody in IT to actually go through the video and train the model very specifically for their use case, their retail store, and the lighting in that location so that it would work correctly. This is very time consuming. And then when you deploy it to sites two, three, and four, those lighting conditions can be different, for example. So this was causing a lot of problems.

What I'm seeing right now, which is remarkable, is the ability to train fast, because instead of a person being in the loop, you can use these AI agents to do all of that work. Something that would take weeks or maybe even months from a pilot perspective can be done in almost hours. You basically just need to upload the video files, the examples of the exact location, and tell the agent, hey, listen, I need to train on use case A, B, or C. It's going to do this quicker than ever before, and your model is going to be tuned better than a human really can do it. So that's kind of the earth-shattering breakthrough that we've had that will enable companies to scale.

Corey Noles: OK, that makes sense. I hadn't considered what could be involved in training, and your point about how something as simple as lighting can be different in every building, and even parts of buildings, makes a lot of sense. I can see the challenge there. So with that, as enterprises move computer vision from pilots into production, the infrastructure demands can get complicated, as you mentioned a moment ago, especially around data, performance, and what kind of inference speed you're wanting. How do Dell and NVIDIA help organizations think through the infrastructure choices that matter the most to them?

Jason Nassar: Yeah, so first off, we have a lot of subject matter experts such as myself, and engineers, and Dell and NVIDIA can provide basically consulting services. We can actually look at your operation, specifically what your use case is and what you're trying to accomplish, and make sure that you're gonna receive a return on investment. But just as a general rule of thumb, if you need real-time processing, you're going to need your compute local to the operation. That means you're going to need to be under the firewall, not necessarily in the data center or the cloud, but really local to what's going on.

This is super important for the manufacturing vertical, where everything's kind of happening in real time, and maybe you want to make corrective decisions and send information back to these controllers based on what the computer vision is seeing. However, if you're running very large compute operations and real time isn't absolutely necessary, let's say you're just counting people, or you're trying to understand airport operations to optimize the flow of everyone, that can be done in your data center. And the best part about it is that Dell and NVIDIA have this AI factory architecture, these AI factory products that are very specific to it. We can help our customers and tell them very specifically what they need in order to receive a return on investment very quickly.

Corey Noles: That's awesome. So for a lot of computer vision use cases, timing matters. Much like you mentioned with regard to the guy with the shampoo bottle just a few minutes ago, a delayed insight might not be useful, for example, if you're dealing with manufacturing, safety, quality control, production issues, or even operational response. How are enterprises approaching real-time or near-real-time vision AI? And what role does accelerated computing play in making that scale possible?

Jason Nassar: Yeah, I mean, you need to have the compute local to the operation. That's what's key when you're dealing with real-time data, but you also have to have a way to manage and orchestrate that, right? What we're finding from a scale perspective at the edge is a difficulty, usually because of the OT persona, the operational technology persona, in understanding how to scale IT devices local to what's actually happening with computer vision.

So at Dell we have two platforms that really enable this to move quickly: the Dell Automation Platform and the Dell Distributed Private Cloud. Both of these technologies allow you to deploy operating systems and software solutions with zero-touch provisioning. So when you have an edge operation that requires real time, and you want to deploy that solution to hundreds, if not thousands, of global locations so that you can actually realize that use case, this is really the advancement that we've brought to our customers.

It brings you an experience very similar to the streaming TV devices that we have, like the Google Chromecast or the Amazon Fire Stick. You place the compute local to the operation where you need that real-time insight, you plug it in, and all of the software, operating systems, interconnections, and data integrations are deployed automatically at scale, across hundreds of devices. So you take an operation that would normally have taken months to accomplish and you've reduced it down to days, sometimes even hours, depending on what's going on.

Corey Noles: Wow. That's wild. Yeah, I guess in thinking about latency with regard to real time, you absolutely don't want to wait for your video files to go to AWS East and come back after they've been analyzed. You'd never be able to accomplish real time that way.

Jason Nassar: No, you wouldn't be able to accomplish real time, and you wouldn't be able to scale it. A lot of these companies are working in air-gapped environments. If you're in the energy industry or in manufacturing and you're trying to deploy software for computer vision, you follow this architecture called the Purdue model, and essentially you have this firewall that doesn't even allow you to connect to the data center. So yeah, you need that compute, you need these accelerator-intense, AI factory-based devices local to that operation in order to execute it.

Corey Noles: Makes sense. Makes sense. So computer vision raises a lot of important questions around things like privacy, security, bias, and governance. What should enterprises be doing to build trust into these systems from the beginning?

Jason Nassar: Well, I think zero trust security needs to be taken into account right from the beginning. You need to have management and orchestration in a way where you don't have the risk of somebody coming up with a USB drive, trying to make changes to that endpoint, and ultimately introducing a cybersecurity risk. I mean, that's going to cost companies millions, if not tens of millions, of dollars if that happens just one time.

So what is zero trust? It just basically means that the only people that can actually change what's on that device, that computer vision software, that operating system, those integrations, are administrators. What Dell has done is provide this Dell Distributed Private Cloud, where our endpoint devices come right out of the factory with zero trust security. So nobody's going to be able to tamper with them.

This adheres very specifically to these air-gapped environments, like I was telling you, that follow the Purdue model. It also adheres to NERC and FERC requirements, which are super critical in the energy industry. So for the very first time, straight from the Dell factory, you have devices that are highly capable of intense artificial intelligence compute and that also have zero trust security in mind, right from our factory to the location where the use case is going to be realized.

Corey Noles: OK, OK. So beyond the technology itself, when you want to do this, what teams need to be involved for computer vision to succeed at that enterprise scale?

Jason Nassar: OK, so one difficult challenge with OT environments is the IT and OT conflict. They don't generally like to work with each other, but the truth is, now with AI in mind, that absolutely has to happen, because companies that are applying artificial intelligence, agentic AI, and computer vision, if they're doing it the right way, are going to succeed, and they're gonna move far faster than companies that are not doing that. Most companies realize that right now. So your IT folks and your OT folks have to work together.

You're also going to need your security and compliance team as well as your line-of-business teams. The technologies that we provide with the AI factory kind of enable this. It's all designed so that the IT folks and OT folks can get together and develop the recipe, which is these blueprints, like I said before: your software, your operating system, your integrations. They develop these up front so that those end devices have everything loaded onto them with zero-touch provisioning. That's super important, but you need this all the way from the CIO level down to the factory worker that's a controls engineer.

Corey Noles: OK. So one last question before we go. Let's kinda gaze into the crystal ball a little here. Looking forward, where do you think enterprise computer vision is headed over, let's say, the next few years?

Jason Nassar: So right now we are moving by leaps and bounds, and what computer vision was even six months ago compared to what it is now is a very drastic change. All software is accelerating at a speed that has never happened before, because while humans are in the loop, it is the AI that's actually creating the software. And so that changes everything.

On top of that, we're adding agentic AI integrated directly with the VLMs, directly integrated with the machine learning. So we're accelerating from that perspective. You can ask questions now about what's going on. But if you want to think really far ahead, we're gonna get to a point where humans are not really gonna need to be in the loop. The computer vision is going to correct itself. It's going to recognize where the software needs to be fixed. And maybe there's just a human that says, yes, I agree with it, or maybe not at all. It's going to get to that point as we get closer to AGI.

Corey Noles: And it's coming. We're gonna get closer and closer. Same, same. Especially five years ago.

Jason Nassar: It is coming. Faster than I would have expected even a year ago. I mean, it's unbelievable. Yes.

Corey Noles: Well, Jason, thank you so much for joining us today. Where can viewers go to learn more about what you're doing, computer vision, and all of the things that Dell and NVIDIA have to offer here?

Jason Nassar: Yeah, so definitely go to Dell.com and look up the Dell AI Factory with NVIDIA so that you can understand a little bit more. Also look up DDPC, Dell Distributed Private Cloud, as well as the Dell Automation Platform. And then, you know, reach out to me. You can find me on LinkedIn, Jason Nassar, and I can help you guys out with whatever needs you have, with a specialized consulting experience directly from Dell so that we can size you up right.

Corey Noles: Excellent. So enterprise computer vision is evolving, and so is what organizations need to consider as they move from experimentation to deployment at scale. Is your organization ready? For more enterprise technology news, analysis, buyers guides, and expert interviews, visit eWeek.com. You can also connect with eWeek on LinkedIn, Facebook, and X. Thank you so much for listening to eSpeaks. We'll see you next time.

Jason Nassar: Thank you.

