What's the Difference Between Storage and File Virtualization?

Nik Simpson, Storage Analyst for the Burton Group, begins a series of questions and answers about virtualization and how it relates to storage.

Q: What's the difference between "storage virtualization" and "file virtualization"?
A: At one level, pretty much everything in storage these days is virtualized to some degree. Low-level standards like SCSI and RAID are forms of virtualization that abstract or hide the underlying physical attributes of disk drives from operating systems and applications. But that's not the usual meaning of virtualization as applied to storage today. In the modern sense there are really two major kinds of storage virtualization, one at the block level and one at the file level. Block-level virtualization is usually just called storage virtualization, and serves applications such as database software that need block-level access to data. The disks will typically (but not always) reside in arrays attached to a Storage Area Network (SAN). File virtualization is really something completely different. It serves applications that need to access data in the form of entire files rather than block by block. These files will typically reside in file systems located on Network Attached Storage (NAS) devices.

Q: Why is block-level storage virtualization needed?
A: The need for this kind of virtualization arose because SAN users found that a lot of important storage management services were restricted to the disks in a particular array and couldn't be extended beyond it. Once you filled up all the disks in that array, you had to get another array, and that meant a new thing to manage. If the new array was from a different vendor, you ran into the fact that each vendor has a closed architecture. You can't manage them from the same console, it's hard to replicate data between them, and so forth. Storage virtualization tries to remedy these problems by moving key management functions off the storage array's internal controller and out into the network. Once you have done this, it no longer matters what the maximum capacity of a given array is or what type of array is your replication target. You can just manage a bunch of discrete physical arrays as if they were one big virtual array. The first vendors to try this approach were companies like DataCore and FalconStor, starting in the late '90s or early 2000s, but today plenty of vendors offer it.
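
To make the "one big virtual array" idea concrete, here is a minimal Python sketch of the address mapping a virtualization layer might perform. It is an illustration only, not any vendor's design: the class names PhysicalArray and VirtualVolume, the simple end-to-end concatenation of arrays, and the in-memory block store are all assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class PhysicalArray:
    """Hypothetical stand-in for one vendor's array (blocks kept in memory)."""
    name: str
    num_blocks: int
    blocks: dict = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> None:
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"{self.name}: block {lba} out of range")
        self.blocks[lba] = data

    def read(self, lba: int) -> bytes:
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"{self.name}: block {lba} out of range")
        return self.blocks.get(lba, b"")


class VirtualVolume:
    """Presents several physical arrays as one contiguous block address space."""

    def __init__(self, arrays):
        self.arrays = []
        self.starts = []      # first virtual block number served by each array
        self.num_blocks = 0
        for array in arrays:
            self.add_array(array)

    def add_array(self, array: PhysicalArray) -> None:
        # Growing the pool only extends the virtual address space;
        # existing virtual block numbers keep their meaning.
        self.starts.append(self.num_blocks)
        self.arrays.append(array)
        self.num_blocks += array.num_blocks

    def locate(self, vlba: int):
        """Translate a virtual block number into (array, local block number)."""
        if not 0 <= vlba < self.num_blocks:
            raise IndexError(f"virtual block {vlba} out of range")
        i = bisect_right(self.starts, vlba) - 1
        return self.arrays[i], vlba - self.starts[i]

    def write(self, vlba: int, data: bytes) -> None:
        array, lba = self.locate(vlba)
        array.write(lba, data)

    def read(self, vlba: int) -> bytes:
        array, lba = self.locate(vlba)
        return array.read(lba)


# Two arrays from (hypothetically) different vendors, pooled into one volume.
vendor_a = PhysicalArray("vendor-a", num_blocks=100)
vendor_b = PhysicalArray("vendor-b", num_blocks=50)
volume = VirtualVolume([vendor_a, vendor_b])
volume.write(120, b"payload")  # lands on vendor-b, local block 20
```

The application only ever sees virtual block numbers from 0 up to the pooled capacity. Which array holds a given block, and how large any single array is, stays hidden behind locate(), which is the point the answer above makes about capacity limits no longer mattering.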