And yes, Blinkx could make scanned-text PDFs searchable with on-the-fly OCR, but that's not quite ready, yet, for prime time.
Right now, desktop searching is one of those emerging technologies that in theory--looking through both your hard drive and the Web at the same time for relevant hits--sounds life-changing but in practice still seems like MP3 players before the iPod came out: Yeah, they work, but no one uses them.
Microsoft and Apple are building desktop search features into future versions of their operating systems (code-named Longhorn and Tiger, respectively), and search engine giants Google and Yahoo have their own branded desktop search utilities. Even ScanSoft gets into the discussion with its PaperPort OCR/desktop organizer software, which doesn't search the Web but overlaps with many of the desktop search capabilities that the others offer.
And then there's Blinkx, a software company taking on all of the above. Some of this startup's competitors work only on one platform or, in the case of Google, don't search PDFs.
We caught up with Blinkx's founder and CTO, Suranga Chandratillake, to chat about PDFs, searching them, why he thinks PDF users will like Blinkx, and what on earth the company was thinking earlier this month when it issued a Blinkx Mac beta to coincide with Macworld and Steve Jobs' trumpeting of Spotlight, Tiger's desktop search package.
Is searching a PDF technically more challenging than other documents?
No, not really. If anything, they're slightly better. Indexing is pretty much the same thing, but once you've got a search, you can highlight text inside a PDF. The words that you search for are highlighted in a PDF, which you can't do in Word, for example.
Do we already have more PDFs than we can organize on our hard drives? Will we have more in the future?
It's an extremely popular format, particularly in the business context--everything from sales orders and proposal letters all the way to ad copy and brochures. ... Being able to index them and sort through them is critical. There's no way we could have launched a product without support for PDF.
PDF is definitely as significant as any Microsoft Office format. In the surveys and analyses that we've done, the biggest data type, by far, is e-mail--that can be up to 60 percent of the average person's data. ... The other 40 percent is split between the productivity formats. There are some exceptions. You do get designers, for example, who have a lot of CAD files, but for the average office worker we see a split--by file size--where 35 percent to 40 percent of what's left is PDF, and the rest is split among the Windows Office formats.
I think that PDFs are going to get extremely popular. Think of the things we buy on the Web and the services we pay for on the Web. I pay my phone bill, cable bill, power on the Web. All these people send me PDFs. People are just going to keep using it more and more and more. Sure, at the moment it's companies sending things off to individuals, but that inevitably leads to individuals sending things to each other. That all points to a massive increase. Finding the right PDF at the right time gets to be a bigger and bigger problem, and that's where Blinkx wants to step in.
Read the full interview on PDFzone: "Blinkx: Finding and Organizing Your PDFs."