SAN JOSE, Calif.—"Data lake" is the newest hot-button term in the IT business world, and vendors connected with the concept in any way, shape or form are quickly jumping in.
A data lake is a massive, easily accessible, centralized repository of large volumes of structured and unstructured data. With the deluge of data now pouring into enterprises and storage pricing continuing to drop, virtually every company connected with storage hardware or software is now figuring out how to work the term into its marketing messages.
According to Techopedia, the data lake architecture is a store-everything approach to big data in which no deduplication is used. Data is not classified when it is stored in the repository because its value is not clear at the outset. As a result, up-front data preparation is eliminated.
A data lake is thus less structured than a conventional data warehouse. Only when the data is accessed is it classified, organized or analyzed. With a data lake in the system, data silos become a thing of the past.
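The store-everything, classify-on-access idea described above can be illustrated with a minimal Python sketch. It is not taken from EMC or Techopedia: the in-memory `lake` list and the `store`, `classify` and `read_all` helpers are hypothetical names invented for illustration, and the classification rules are deliberately simplistic.

```python
import csv
import io
import json

# Hypothetical in-memory "lake": raw payloads are kept untouched, with no
# schema, classification or deduplication applied at write time.
lake = []


def store(raw: str) -> None:
    """Write path: append the raw payload exactly as received."""
    lake.append(raw)


def classify(raw: str) -> str:
    """Read path: decide what kind of data a payload is only when accessed."""
    try:
        json.loads(raw)
        return "json"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    if "," in raw and "\n" in raw:
        return "csv"
    return "text"


def read_all():
    """Classify and parse every payload at access time."""
    results = []
    for raw in lake:
        kind = classify(raw)
        if kind == "json":
            parsed = json.loads(raw)
        elif kind == "csv":
            parsed = list(csv.DictReader(io.StringIO(raw)))
        else:
            parsed = raw
        results.append((kind, parsed))
    return results


store('{"sensor": "a1", "temp": 21.5}')
store("id,name\n1,alice\n2,bob")
store("free-form log line with no structure")
store('{"sensor": "a1", "temp": 21.5}')  # duplicate is kept: no deduplication

for kind, parsed in read_all():
    print(kind, parsed)
```

In a conventional warehouse, by contrast, the parsing and classification in `read_all` would typically happen before the data is written, against a schema fixed in advance.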
EMC has had its own Data Lake Foundation strategy for several months, and on Feb. 19, at the O’Reilly Strata + Hadoop 2015 conference at the San Jose McEnery Convention Center here, the Hopkinton, Mass.-based giant made a few announcements about it:
—Isilon updated: The new Isilon HD400 allows customers to scale their data lakes to 50PB within a single cluster. It is aimed at customers who need a scalable, high-capacity platform to store between 2PB and 50PB of data. Isilon has had a strong presence in the digital film and television market for several years.
—New OneFS software: EMC announced a new version of its OneFS operating system, v7.2, which supports more recent versions of the Hadoop protocols, including HDFS 2.3 and HDFS 2.4.
—OpenStack Swift: New support for OpenStack Swift covers both file and object, the unstructured data types that are growing the fastest.
—Certifications: The key to realizing value from the data in an EMC data lake is using the rich analytics tools that ISV partners such as Cloudera, Hortonworks and Pivotal provide. They are all now certified for use in an EMC data lake.
In its Data Lake Foundation, EMC also offers something called In-Place Big Data Analytics, which uses shared storage and support for protocols such as HDFS to provide cost-efficient, in-place analytics with faster time to results.