We have this great, juicy story for Windows Vista: not just the general search architecture, but also how it's "pluggable" as we go forward.
How do search facilities change the overall environment of the Vista user?
Search is integrated all over the OS. A buzzword we're throwing around is "instant search." There is a search column in every Explorer window. There is searchability from the Start menu, where I can pull up any application [whose name] I start typing. You see search everywhere.
Will developers be able to extend these mechanisms to include application-specific knowledge of what and how to search?
There's a lot of extensibility built on top of the always-running indexer. We think it's a great thing for applications to be able to integrate with this indexer that's got all this great information. Part of the way we're doing that is by having a search protocol [that an application can invoke] to get results.
You'll see that Outlook 2007 is heavily integrated with the new indexer service, to the point where you're able to seamlessly search over all your e-mail. Then, if you're searching Outlook and you want to search over a file you saved off from a message, you can do it seamlessly.
Will Vista produce metadata as a byproduct of normal activity so that the index and search tools have more information to use? For example, if I insert a picture into a document, will that picture file be annotated with a piece of metadata saying, "This picture was associated with this document"? Could the picture then be surfaced in any search that returns that document?
I don't think we're 100 percent there yet for your scenario, but we've taken a ton of steps toward filling the system with metadata, so it is very easy to correlate across multiple file types [and] across multiple user scenarios.
One of the things you'll see is that, from the Start menu, if you're using Outlook, you'll be able to search over all your e-mail. I can put in a keyword, I can use a structured query language, or I can put in "From Bob." There's seamless integration between my e-mail store and my actual file system.
Will users get away from old restrictions on the scope of a search? For example, searching only within a single folder at a time in Outlook?
Absolutely. We've given you the ability to cross the physical boundaries when you search: When you search, you're searching the index, a huge database of all the information you've got that we understand.
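[Editor's sketch: The idea of querying one index rather than individual folders or stores can be illustrated roughly as follows. This is a minimal Python sketch under stated assumptions, not the Vista indexer or its query syntax; the `Item` class, the `search` function, and the `sender` property name are all hypothetical.]

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One indexed item, whichever store it came from (hypothetical model)."""
    store: str       # e.g. "mail" or "file"
    location: str    # folder or path inside that store
    text: str        # searchable content
    properties: dict = field(default_factory=dict)


def search(index, keyword=None, **property_filters):
    """Return items matching an optional keyword and exact property filters.

    The search never looks at `store` or `location`, so results can come
    from any folder of any store that has been indexed.
    """
    results = []
    for item in index:
        if keyword and keyword.lower() not in item.text.lower():
            continue
        mismatch = any(
            str(item.properties.get(name, "")).lower() != str(value).lower()
            for name, value in property_filters.items()
        )
        if mismatch:
            continue
        results.append(item)
    return results


# A toy index mixing two mail folders and one file (made-up sample data).
index = [
    Item("mail", "Inbox", "Budget review on Friday", {"sender": "Bob"}),
    Item("mail", "Archive/2006", "Lunch plans", {"sender": "Alice"}),
    Item("file", "C:/Docs/budget.txt", "Draft budget figures", {"author": "Bob"}),
]

hits = search(index, keyword="budget")   # keyword search across both stores
from_bob = search(index, sender="Bob")   # property search, like "From Bob"
```

Here the keyword query returns both a mail item and a file, while the property query returns only the message whose `sender` is Bob, regardless of which folder holds it.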
The next big piece is all the information out there that we don't understand. Right now, the indexer obviously understands the file system: text files, Word files, PowerPoint, JPEG, things like that. However, there are a lot of things out there that are big binary data blobs that we don't necessarily know how to infer the right data from.
A good example is the MAPI [Messaging API] store with Outlook. We wrote a protocol handler to crack that open; we can represent individual e-mails as data items in Windows. We think there's a huge amount of [potential] in that, expanding [the power to search across] the things on your system that aren't just normal file types that Windows understands.
It seems as if file formats are moving in two directions. There are verbose XML or other tagged files, where you really could use some help in distilling the relevant pieces of their meaning. In the other direction, you've got compression routines for rich-media data whose output looks like random noise. Is there a lot of opportunity (as well as need) for people to be clever about how they expose application-specific content to your search engine?
Our general property store is a huge collection of property types that you could [assign to] a file, but it's also extensible. One example we always give is that if you're [in] a law department, you could write property handlers for a new metadata type, like "legal review by," and be able to plumb that metadata into the system, and Windows will understand it.
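[Editor's sketch: An extensible property store of the kind described might be pictured like this. It is a simplified Python illustration, not the Windows property system API; the `PropertyStore` class, its methods, and the `legal_review_by` property name are assumptions made for the example.]

```python
class PropertyStore:
    """Toy property store: built-in property types plus registered ones."""

    def __init__(self):
        # Built-in property types (hypothetical starting schema).
        self._schema = {"title": str, "author": str}
        self._values = {}   # path -> {property name: value}

    def register_property(self, name, value_type):
        """Extend the schema with a new, application-defined property."""
        self._schema[name] = value_type

    def set(self, path, name, value):
        """Attach a property value to a file; unknown properties are rejected."""
        if name not in self._schema:
            raise KeyError(f"unknown property: {name}")
        if not isinstance(value, self._schema[name]):
            raise TypeError(f"{name} expects {self._schema[name].__name__}")
        self._values.setdefault(path, {})[name] = value

    def find(self, name, value):
        """Return the sorted paths whose property `name` equals `value`."""
        return sorted(
            path for path, props in self._values.items()
            if props.get(name) == value
        )


store = PropertyStore()
store.register_property("legal_review_by", str)   # the law-department example
store.set("contract.doc", "legal_review_by", "J. Smith")
store.set("memo.doc", "author", "J. Smith")

reviewed = store.find("legal_review_by", "J. Smith")
```

Once the new property is registered, it is stored and searched the same way as the built-in ones, which is the point of the law-department example.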
I'm wondering, and pardon my nasty, suspicious mind, how granular do the privileges associated with search need to be? For example, isn't the fact that a certain memo was sent from a certain person to a certain person by a certain time potentially sensitive information, even if I'm prevented from reading the memo?
Absolutely, those are great points, and we've thought about that in two ways. One, there's a natural trust boundary implementation in our indexer. From a file system point of view, you can't search on things you don't have permission to read. You'll have no [intimation] that they're there at all.
That [prevents] accessing data that you're not supposed to. The other part of this is [when you find yourself thinking], "I've put metadata into this thing that was really only for me to consume," where I tag something as, for example, "e-mail from my dumb boss." So we also have this built-in cleaning ability. Whenever you're sending a file out, you have the ability to go ahead and clean off all the metadata that's represented on it. This is also an extensible point. Imagine some pieces of metadata being for internal company use, and some pieces being for external [use]. You could write a cleaner that would automatically clean out the internal piece and just leave the external pieces of metadata there.
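[Editor's sketch: The two flavors of cleaning described here, stripping everything versus stripping only internal-use metadata, could look roughly like this in Python. This is an illustration under stated assumptions, not Vista's cleaning mechanism; the function names, the `INTERNAL_KEYS` set, and the sample property names are hypothetical.]

```python
# Property names treated as internal-only (an assumption for this example).
INTERNAL_KEYS = {"private_tag", "legal_review_by"}


def clean_all(metadata):
    """Strip every piece of metadata before a file is sent out."""
    return {}


def clean_internal(metadata, internal_keys=INTERNAL_KEYS):
    """Strip only internal-use metadata and keep the external pieces."""
    return {
        name: value
        for name, value in metadata.items()
        if name not in internal_keys
    }


metadata = {
    "title": "Q3 proposal",
    "author": "Pat",
    "private_tag": "e-mail from my dumb boss",
    "legal_review_by": "J. Smith",
}

outgoing = clean_internal(metadata)   # keeps only title and author
stripped = clean_all(metadata)        # keeps nothing
```

Both functions return a new dictionary and leave the original untouched, so the sender's own copy keeps its private tags.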
Would you then be able to associate that cleaner with the file type in a plug-in sort of way, so a user could just say "clean this" and it would offer both the internal and the external flavors of "clean"?
Are Vista's metadata tagging facilities specific to NTFS [NT File System] data structures?
It comes down to the property sets that the file itself supports. Almost every file type has a header built into the file for all the property types it supports. So we have that bridge infrastructure where you can search, you can consume [and] you can add new things that you can search over.