Index Engines Adds Deduplication, Enhancements to Data Discovery Platform

The latest Index Engines appliance automatically restarts interrupted data collection and extraction jobs and adds distributed data deduplication.

On Oct. 5, Index Engines announced performance enhancements, including data deduplication, for its large-scale data discovery platform.

The release, which is generally available immediately, gives users more control over the indexing and data deduplication processes, the company said in a statement. IT managers can search and extract data more efficiently during data discovery for litigation or regulatory compliance. The new hardware platform also boasts 16 core processors and 72GB devoted to index storage.

"As enterprises become more litigation-ready, they are proactively processing larger volumes of ESI [electronically stored information]," said Jim McGann, vice president of information discovery at Index Engines.

The Index Engines platform now supports automatic network discovery, which makes the discovery process more complete. IT managers can automatically find all network locations and endpoints, including servers and desktops, rather than relying on memory to compile the list. Once a location has been discovered, the Index Engines appliance crawls its content to build the index.

On NFS and CIFS file systems, Index Engines crawls and processes content at up to 1TB per hour per node, the company claimed.
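To put the claimed rate in perspective, here is a rough estimate of crawl time T for a data volume D on n nodes at a per-node rate r. It assumes throughput scales linearly with the number of nodes (the company quoted only a per-node figure) and uses a hypothetical 50TB data set spread across five nodes:

```latex
T = \frac{D}{r\,n} = \frac{50\ \text{TB}}{(1\ \text{TB/h per node}) \times (5\ \text{nodes})} = 10\ \text{h}
```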

Index Engines 3.3 includes automated restart features for LAN indexing and backup tape extraction, ensuring uninterrupted data collection, the company said. This is particularly useful because faulty tape libraries or corrupt tapes can interrupt processing and cause data to be lost.

If the extraction or indexing processes are interrupted, the appliance auto-restarts and resumes the jobs, finishing the index without leaving any gaps, according to the company.
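Index Engines has not said how its restart logic works. A common way to build resumable jobs is checkpointing: after each unit of work the job records how far it has gotten, and a restarted job picks up from that record instead of starting over. The Python sketch below illustrates the idea under that assumption; `run_job`, `run_with_auto_restart`, `TransientError` and the JSON checkpoint file are hypothetical names, and the simulated failures stand in for something like a faulty tape drive.

```python
import json
import os
import tempfile


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable fault, e.g. a tape-drive hiccup."""


def run_job(items, process, checkpoint_path):
    """Process items in order, persisting the position of the next item to do.

    On restart, work resumes at the saved position, so no item is skipped.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next"]
    for i in range(start, len(items)):
        process(items[i])
        # Write the checkpoint atomically: write a temp file, then rename over the old one.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next": i + 1}, f)
        os.replace(tmp, checkpoint_path)


def run_with_auto_restart(items, process, checkpoint_path, max_restarts=5):
    """Re-launch the job after a transient failure, up to max_restarts times."""
    for attempt in range(max_restarts + 1):
        try:
            run_job(items, process, checkpoint_path)
            return attempt  # number of restarts that were needed
        except TransientError:
            continue  # the next attempt resumes from the saved checkpoint
    raise RuntimeError("job did not complete after repeated restarts")


# Demo: every multiple of 3 fails the first time it is seen.
indexed = []
failed_once = set()


def flaky_index(item):
    if item % 3 == 0 and item not in failed_once:
        failed_once.add(item)
        raise TransientError(item)
    indexed.append(item)


with tempfile.TemporaryDirectory() as d:
    restarts = run_with_auto_restart(list(range(10)), flaky_index, os.path.join(d, "ckpt.json"))
```

After four simulated interruptions, every item 0 through 9 is indexed exactly once. Because the checkpoint is saved after each item, an item that finished just before a crash (but before its checkpoint was written) would be processed again; the guarantee this sketch provides is "no gaps," not exactly-once processing.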

Typical large-scale deployments use multiple Index Engines appliances to process large network data environments and offline tape, according to the company. As a result, the same piece of data may be crawled and indexed by more than one appliance, which can cause confusion when extracting results during data recovery.

To prevent this, the release includes distributed deduplication so that the system does not save multiple copies of the same data. Multiple Index Engines agents analyze the data and coordinate content extraction so that only unique files and e-mails are saved. This streamlines the collection process and saves storage space.
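The company has not published how its agents coordinate. One common approach, assumed here for illustration, is to identify each file or e-mail by a cryptographic hash of its content and have agents consult a shared registry before extracting, so that only the first agent to see a given digest saves it. In this Python sketch, `DedupRegistry` and `agent` are hypothetical names, the agents are threads, and an in-process lock stands in for what would be a networked coordination service in a real multi-appliance deployment.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class DedupRegistry:
    """Hypothetical shared record of content digests that have already been claimed."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, digest):
        """Return True if this digest is new, atomically marking it as claimed."""
        with self._lock:
            if digest in self._seen:
                return False
            self._seen.add(digest)
            return True


def agent(name, documents, registry, store, store_lock):
    """Hash each document's content and extract it only if no other agent has."""
    extracted = 0
    for path, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        if registry.claim(digest):
            with store_lock:
                store[digest] = (name, path)
            extracted += 1
    return extracted


# Two agents crawl overlapping data: the same report exists on a file
# server and on a backup tape under different paths (hypothetical data).
file_server = [("/share/q3.pdf", b"Q3 report"), ("/share/memo.txt", b"memo v1")]
backup_tape = [("tape07/q3.pdf", b"Q3 report"), ("tape07/plan.doc", b"plan")]

registry = DedupRegistry()
store = {}
store_lock = threading.Lock()
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(agent, "agent-A", file_server, registry, store, store_lock),
        pool.submit(agent, "agent-B", backup_tape, registry, store, store_lock),
    ]
    counts = [f.result() for f in futures]
```

Four documents go in and three unique items are stored; which agent ends up extracting the shared report depends on thread scheduling. Hashing content rather than comparing paths is what lets the duplicate be caught even though it lives in two different locations.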

Other new features include auto-tagging electronic data based on stored queries, extracting e-mail to MSG format, and enhanced PDF support to discover and identify suspicious documents.

Companies need to retain all e-mails, documents and files for a specified period of time, but that information can become unwieldy and hard to manage. Being unable to find a specific file or a series of e-mail communications when needed can be disastrous during a lawsuit or regulatory hearing. A data discovery platform like Index Engines provides an easy-to-search index that makes data extraction and recovery painless, whether the data is online or stored in proprietary backup and transfer formats, according to the company.

Each appliance can index and search up to 1 billion data objects in a single box, the company claimed.

The base price for the unit is $85,000, and orders ship within four weeks of being placed.