EMC Adds New Options for Navigating Data Lakes
The data lake architecture is a store-everything approach to big data; forget about deduplication.
SAN JOSE, Calif. - Data lakes are the newest hot-button term in the IT business world, and vendors connected with them in any way, shape or form are quickly jumping on board. A data lake is a massive, easily accessible, centralized repository of large volumes of structured and unstructured data. With the deluge of data now pouring into enterprise coffers and storage pricing continuing to drop, basically all companies connected with storage hardware or software are now figuring out how to work the term into their marketing messages.
According to Techopedia, the data lake architecture is a store-everything approach to big data; no deduplication is used. Data is not classified when it is stored in the repository because its value is not clear at the outset. As a result, up-front data preparation is eliminated, and a data lake is less structured than a conventional data warehouse. Only when the data is accessed is it classified, organized or analyzed. Data silos are a thing of the past with a data lake in the system.
EMC has had its own Data Lake Foundation strategy for several months, and on Feb. 19, at the O'Reilly Strata + Hadoop 2015 conference at the San Jose McEnery Convention Center here, the Hopkinton, Mass.-based giant made a few announcements about it: