Microsoft Azure Search Scours Unstructured Data

Search can now index Blob Storage, and a new Log Analytics feature in the Operations Management Suite offers new insights into Azure virtual machines.


After enabling Azure Search on cloud databases, Microsoft is now turning its attention to unstructured data.

Responding to customer demand, Microsoft released a preview version of its Search indexer for Azure Blob Storage, the company's cloud-based unstructured data storage service, Eugene Shvets, a Microsoft Azure Search senior software engineer, announced on Feb. 9. "Our indexers for Azure SQL Database and DocumentDB have been a hit with customers, and many of them have asked us to build similar magic for Azure Blob Storage."

The indexer is intended to spare customers the challenges of extracting text from "blobs," added Shvets. "Formats like PDF and DOC/XLS are binary and difficult to parse; content type detection and metadata extraction can be non-trivial tasks. Good tools exist, but integrating them into an indexing workflow still takes considerable effort and saddles customers with a bunch of code and infrastructure to maintain," he stated.

The Azure Search blob indexer can extract text and metadata from PDF files, along with several Office document file formats (DOCX/DOC, XLSX/XLS, PPTX/PPT and MSG). The indexer also works on HTML, XML, ZIP, EML and, of course, plain text files. Instructions on setting up blob indexing are available in this company blog post.
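
As a rough illustration of what that setup involves, the sketch below builds the two REST requests a blob indexer typically needs: one registering the blob container as a data source and one creating the indexer that feeds an existing search index. The service name, resource names, two-hour schedule and API version string are placeholders and assumptions rather than values from Microsoft's announcement, and the payload shapes should be checked against the current Azure Search REST documentation. The code only constructs the requests; it does not send them.

```python
import json

# Assumption: a preview API version string; check the Azure Search docs for the current one.
API_VERSION = "2015-02-28-Preview"


def build_blob_indexer_requests(service, container, connection_string, index_name):
    """Return (method, url, body) tuples for a blob data source and its indexer.

    All resource names are derived from the container name purely for illustration.
    """
    base = f"https://{service}.search.windows.net"
    data_source = {
        "name": f"{container}-datasource",
        "type": "azureblob",  # tells the service to read from Blob Storage
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }
    indexer = {
        "name": f"{container}-indexer",
        "dataSourceName": data_source["name"],
        "targetIndexName": index_name,  # the search index must already exist
        "schedule": {"interval": "PT2H"},  # ISO 8601 duration: run every two hours
    }
    return [
        ("POST", f"{base}/datasources?api-version={API_VERSION}", data_source),
        ("POST", f"{base}/indexers?api-version={API_VERSION}", indexer),
    ]


if __name__ == "__main__":
    # Placeholder values only; a real call would also send an "api-key" header.
    for method, url, body in build_blob_indexer_requests(
        "my-service", "documents", "<storage-connection-string>", "documents-index"
    ):
        print(method, url)
        print(json.dumps(body, indent=2))
```

Keeping request construction separate from sending makes the payloads easy to inspect before any credentials are involved.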

For administrators seeking more information about their Azure virtual machines (VMs), Microsoft also announced a new Log Analytics capability this week. "Log Analytics (OMS) brings the power of Microsoft's new cloud-based management solution, Operations Management Suite [OMS], right into the Azure portal allowing you to provision a brand new OMS workspace, link workspaces to Azure subscriptions, and on-board Azure VMs directly to the OMS service," blogged Anurag Gupta, a Microsoft Open Source Technology Center program manager.

Microsoft also issued two new Azure Resource Manager (ARM) templates, Gupta said. "These templates allow you to quickly deploy a brand-new Windows or Linux VM that instantly on-boards to the OMS service."

Finally, Microsoft published new documentation and a code sample on GitHub for developers kicking off Azure-powered Internet of Things (IoT) projects that involve pulling data in from public data feeds.
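
The general pull-then-push pattern behind such projects can be sketched in a few lines of Python. This is not the sample Microsoft published. It is a minimal illustration that assumes the public feed returns JSON, and the function names, the event envelope and the `send` callback (a stand-in for whatever client forwards events to a cloud ingestion service) are all hypothetical.

```python
import json
import urllib.request
from typing import Callable, Dict, List


def fetch_feed(url: str) -> bytes:
    """Download the raw contents of a public feed (assumed here to be JSON)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def feed_to_events(raw: bytes, source: str) -> List[Dict]:
    """Wrap each feed record in a small envelope that records where it came from."""
    records = json.loads(raw)
    if isinstance(records, dict):  # a single object is treated as a one-item feed
        records = [records]
    return [{"source": source, "payload": record} for record in records]


def pull_and_push(
    url: str,
    fetch: Callable[[str], bytes],
    send: Callable[[str], None],
) -> int:
    """Pull the feed once, push every record through `send`, return the count.

    `send` is a hypothetical hook: in a real project it would hand the
    serialized event to an ingestion service client.
    """
    events = feed_to_events(fetch(url), url)
    for event in events:
        send(json.dumps(event))
    return len(events)
```

Passing `fetch` and `send` in as arguments keeps the loop independent of any particular feed or cloud SDK, so it can be exercised with stand-ins before it touches a live service. In production, `fetch_feed` would be passed as `fetch` and the call would run on a timer.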

"There are a number of documentation articles and code samples on pushing data from devices you control to Azure and for analyzing in combination with other streaming or static data," said Spyros Sakellariadis, principal program manager at Microsoft Azure Machine Learning, in a Feb. 11 announcement. "What's not as well documented is how to pull data from a public Website you don't control, then push that data into an Azure Event Hub," Microsoft's telemetry-ingestion service.

"A recent article and code sample I produced with Dinar Gainitdinov shows how building a simple application with a few lines of C#, does this," continued Sakellariadis. A modified version of the code combines "real-time motor vehicle data with maintenance records in Microsoft Dynamics and another version to analyze how traffic in the Seattle region was affected by the weather," he said.

Pedro Hernandez


Pedro Hernandez is a contributor to eWEEK and the IT Business Edge Network, the network for technology professionals. Previously, he served as a managing editor for the network of...