Pentaho announces it has open-sourced its Pentaho Kettle big data management and analytics tools under the Apache 2.0 license.
Pentaho announced that it has open-sourced
its big data analytics tools, known as the Pentaho Kettle project, under the
Apache 2.0 license.
The company said it has made freely available under open source all of its big data
capabilities in the new Pentaho Kettle 4.3 release, and has moved the entire
Pentaho Kettle project to the Apache License, version 2.0.
Because Apache is the license under which Hadoop and several of the leading NoSQL
databases are published, this move will further accelerate the rapid adoption
of Pentaho Kettle for Big Data
by developers, analysts and data scientists as the go-to tool for taking advantage
of big data, the company said.
Big data capabilities available in the open-source Pentaho Kettle 4.3 release include the ability
to input, output, manipulate and report on data using the following Hadoop and
NoSQL stores: Cassandra, Hadoop HDFS, Hadoop MapReduce, Hadapt, HBase, Hive,
HPCC Systems and MongoDB.
"In order to obtain broader market adoption of big data technology, including Hadoop
and NoSQL, Pentaho is open sourcing its data integration product under the free
Apache license," said Matt Casters, founder and chief architect of the
Pentaho Kettle Project, in a statement. "This will foster success and
productivity for developers, analysts and data scientists, giving them one tool
for data integration and access to discovery and visualization."
With regard to Hadoop, Pentaho Kettle makes available job orchestration steps for
Hadoop, Amazon Elastic MapReduce, Pentaho MapReduce, HDFS File Operations and
Pig scripts. All major Hadoop distributions are supported, including Amazon
Elastic MapReduce, Apache Hadoop, Cloudera's Distribution including Apache Hadoop
(CDH), Cloudera Enterprise, EMC Greenplum HD, Hortonworks Data Platform powered
by Apache Hadoop, and MapR's M3 Free and M5 Edition. Pentaho Kettle can execute
ETL transforms outside the Hadoop cluster or within the nodes of the cluster,
taking advantage of Hadoop's distributed processing and reliability.
Pentaho officials said Pentaho Kettle for Big Data delivers at least a tenfold boost in
productivity for developers through visual tools that eliminate the need to
write code such as Hadoop MapReduce Java programs, Pig scripts, Hive queries,
or NoSQL database queries and scripts.
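For a sense of what those visual tools replace, the word-count logic at the heart of the canonical hand-written Hadoop MapReduce Java job can be sketched as plain Java with no Hadoop dependency; the class and method names below are illustrative only and are not part of any Hadoop or Pentaho API.

```java
import java.util.*;
import java.util.stream.*;

// A minimal, Hadoop-free sketch of the word-count logic that a hand-written
// Hadoop MapReduce Java job implements -- the kind of code Pentaho says its
// visual tools let developers avoid writing. Names here are illustrative,
// not part of any Hadoop or Pentaho API.
public class WordCountSketch {

    // "Map" phase: emit a (word, 1) pair for each word in an input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1))
                     .collect(Collectors.toList());
    }

    // "Reduce" phase: sum the emitted counts for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        return totals;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data big tools", "data data");
        List<Map.Entry<String, Integer>> mapped = input.stream()
                .flatMap(l -> map(l).stream())
                .collect(Collectors.toList());
        System.out.println(reduce(mapped)); // prints {big=2, data=3, tools=1}
    }
}
```

A real Hadoop job would also carry Mapper/Reducer subclasses, Writable types and job configuration, which is the boilerplate a visual ETL designer hides.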
In addition, Pentaho officials said Pentaho Kettle also:
- Makes big data platforms usable by a far broader range of developers, whereas
previously big data platforms were usable only by the geekiest of geeks with
deep developer skills, such as the ability to write Java MapReduce jobs and Pig scripts;
- Enables easy visual orchestration of big data tasks such as Hadoop MapReduce
jobs, Pentaho MapReduce jobs, Pig scripts, Hive queries and HBase queries, as
well as traditional IT tasks such as data mart/warehouse loads and operational
data extract-transform-load jobs;
- Leverages the full capabilities of each big data platform through Pentaho
Kettle's native integration with each one, while enabling easy co-existence and
migration between big data platforms and traditional relational databases; and
- Provides an easy on-ramp to the full data discovery and visualization
capabilities of Pentaho Business Analytics, including reporting, dashboards,
interactive data analysis, data mining and predictive analysis.
"Pentaho Kettle's powerful ETL [extraction, transformation and loading] capabilities
enable developers and analysts to more quickly integrate MongoDB into their enterprise
environments by allowing them to transform and report on data they have stored
in MongoDB," said Erik Frieberg, vice president of marketing and alliances
at 10gen, in a statement. "Pentaho Kettle for Big Data is a great addition
to the MongoDB ecosystem, and 10gen looks forward to continuing to work with
Pentaho to further develop this open-source tool with the MongoDB community."
"The Pentaho and Cloudera partnership allows our joint customers to more quickly
integrate Hadoop within their enterprise data environments while also providing
exceptional analytical capabilities to a wider set of business users,"
said Ed Albanese, head of business development at Cloudera, also in a
statement. "We applaud Pentaho's decision to open source its big data capabilities
under the Apache License; the technology they are contributing is substantial
and is a big step forward in helping to accelerate adoption and make it easier
to use Hadoop for data transformation."
"[Hadapt] allows customers to analyze their structured and unstructured data together in
a single platform without ever having to move data outside of Hadoop,"
said Justin Borgman, CEO of Hadapt, in a statement. "Hadapt's
SQL-compliant query interface together with Pentaho Kettle for ETL allows
analysts to leverage their existing SQL skills for big data analytics on Hadoop."