It has never been more difficult to manage a file storage environment than it is today. The volume of data and the complexity of our infrastructures make it incredibly hard to change or provision new storage without impacting users or applications. And, for most of us, maintenance downtime is just a fond memory.
Some time ago it became clear to me that the traditional way of managing file data wasn't cutting it anymore for us at Shopzilla. Every task took too much time to complete, and each one noticeably affected production workflow. On a day-to-day basis, we were spending a lot of time managing access to multiple file servers in order to balance capacity against the data generated by our applications and development staff. We were often exceeding the physical volume size limits set by our storage vendors. And, as you know, creating additional volumes also requires new mount points and access paths.
Additionally, planning and executing data migration for a platform upgrade would take months, require modifications to the application source code, and consume our IT staff in the process. As a result, we often delayed upgrades and were stuck paying exorbitant support costs for end-of-life gear.
How Does File Virtualization Help?
After researching our options, we decided that file virtualization was the best answer. File virtualization abstracts multiple physical storage devices into a single virtual pool. From this pool, you can more easily allocate or provision storage to end-users and applications. It can also be used to integrate storage equipment from multiple vendors into the pool (which, for me, is huge because I am a big believer in openness). I want a heterogeneous environment that lets me use the right tools for the job. I don't ever want to feel obligated to a single vendor when I make purchasing decisions.
The Global Namespace is the Key
The key to file virtualization is the Global Namespace. This is the logical representation of file content, irrespective of the physical file server or devices on which the content resides. When a file storage environment is virtualized, clients access the Global Namespace rather than the file servers directly. This has two important consequences.
First, it simplifies access to files so that the client no longer needs to have multiple mount points to access data located on different network drives. And second, the client retains the same logical drive mapping to the Global Namespace - regardless of the physical location of the file. This allows file movement to take place without affecting mount points or file paths.
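To make the idea concrete, here is a minimal sketch in Python of how a Global Namespace can keep logical paths stable while the physical location changes. It is an illustration only, not how any vendor's product works internally. The class names (Backend, GlobalNamespace), the server names (filer1, filer2) and all paths are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Backend:
    """A physical location: a file server and an exported directory (hypothetical)."""
    server: str
    export: str


class GlobalNamespace:
    """Toy mapping from logical path prefixes to physical backends."""

    def __init__(self):
        self._mounts = {}  # logical prefix (PurePosixPath) -> Backend

    def map(self, logical_prefix, backend):
        """Attach (or re-attach) a logical prefix to a physical backend."""
        self._mounts[PurePosixPath(logical_prefix)] = backend

    def resolve(self, logical_path):
        """Translate a logical path into 'server:/physical/path'."""
        path = PurePosixPath(logical_path)
        # Try the deepest prefixes first so the most specific mapping wins.
        by_depth = sorted(self._mounts, key=lambda p: len(p.parts), reverse=True)
        for prefix in by_depth:
            if path == prefix or prefix in path.parents:
                backend = self._mounts[prefix]
                relative = path.relative_to(prefix)
                return f"{backend.server}:{PurePosixPath(backend.export) / relative}"
        raise KeyError(f"no mapping covers {logical_path}")


if __name__ == "__main__":
    ns = GlobalNamespace()
    ns.map("/global/eng", Backend("filer1", "/vol/eng"))
    print(ns.resolve("/global/eng/build/a.log"))

    # Simulate a migration: the data moves to a new server, but the
    # logical path that clients use stays exactly the same.
    ns.map("/global/eng", Backend("filer2", "/vol/eng_new"))
    print(ns.resolve("/global/eng/build/a.log"))
```

The client only ever sees /global/eng/build/a.log. Re-mapping the prefix stands in for a data migration, which is why mount points and file paths do not have to change.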
Several vendors offer various flavors of file virtualization. Some provide the Global Namespace, while others require that you run a separate Global Namespace alongside their product. We chose F5 Networks' F5 Acopia solution. We did so because, among its other benefits, it supplies us with the Global Namespace.
Regardless of which vendor you choose, there are three specific steps you should take to make sure your implementation is successful. They are as follows:
1. Create an implementation team.
You can't implement file virtualization in a vacuum because it is something that touches many parts of the organization. Think about who will be impacted. Make sure they are represented on your team. We had our Network Administrator, the Storage Administrator, the Application Managers (who also represented production) and engineering users. The vendor's technical representatives were, of course, also on the team.
2. Design the Global Namespace.
We brought everyone together in a room for an afternoon and drew the new directory structure on a whiteboard. We started at the top, with the Global Namespace, and developed our internal plan. The plan determined where all the volumes would reside and who would have access. It was incredibly compelling to see so many mount points from eight different servers flow into one single tree. It was also important for the members of the team to clearly understand the new structure, and to see where their piece of the pie would be located.
3. Set a date for the cutover, but take your time.
One of our biggest concerns was that the cutover have absolutely no impact on production. While the vendor assured us we could cut over the entire Global Namespace at once, we opted to take a more cautious approach. We tested subsets of the new tree instead. As clients and users tested their portion of the tree, we ran our regular processes in parallel. Then, we cut over each subset individually to be sure each piece functioned properly before bringing in the next piece.
The phased-in approach worked for us because it meant we always had a back-out plan. As a 24x7, online operation, we had to know quickly if there were going to be any problems so we could immediately revert to the original configuration. Fortunately, that was not an issue for us because the implementation went very smoothly.
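The phased approach described in step 3 can be sketched as a simple loop: cut over one subset, verify it, and back out immediately if verification fails. The Python below is a rough sketch under stated assumptions, not the procedure we actually scripted. The function phased_cutover, the CutoverResult type and the three callbacks (cut_over, verify, revert) are hypothetical placeholders for whatever your environment and vendor tooling provide.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class CutoverResult:
    """Outcome of a phased cutover run (hypothetical helper type)."""
    completed: List[str] = field(default_factory=list)
    failed: Optional[str] = None  # subset that was reverted, if any


def phased_cutover(
    subsets: Iterable[str],
    cut_over: Callable[[str], None],
    verify: Callable[[str], bool],
    revert: Callable[[str], None],
) -> CutoverResult:
    """Cut over one subset of the tree at a time.

    Stops at the first subset that fails verification, reverts only that
    subset, and leaves the remaining subsets untouched.
    """
    result = CutoverResult()
    for subset in subsets:
        cut_over(subset)
        if not verify(subset):
            revert(subset)  # back-out plan: restore the original configuration
            result.failed = subset
            break
        result.completed.append(subset)
    return result


if __name__ == "__main__":
    # Toy state: which configuration each subset currently uses.
    state = {"/global/eng": "old", "/global/prod": "old", "/global/logs": "old"}

    def cut_over(subset):
        state[subset] = "new"

    def revert(subset):
        state[subset] = "old"

    def verify(subset):
        # Pretend that /global/prod fails its checks after cutover.
        return subset != "/global/prod"

    outcome = phased_cutover(list(state), cut_over, verify, revert)
    print(outcome)
    print(state)
```

Stopping at the first failure keeps the problem contained: subsets that were already verified stay on the new configuration, and subsets that were not yet attempted are never touched.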