Last month, open-source
data integration tool Talend Open Studio reached its version 5.0 milestone,
packing a set of new and updated components for accessing and manipulating data
stored in a broad range of formats, applications and repositories. What's more,
Talend has expanded its Open Studio brand to include data quality, master data
management and enterprise service bus components, each of which had previously
shipped as separate open-source tools.
For this
review, I focused on the data integration component of the product, which I
last reviewed about two years ago, in its version 3.1 incarnation. Since that
time, Talend has bolstered the tool with nearly 200 new data components, most
recently including elements for accessing .Net data structures, for working
with Hadoop interfaces and for mapping XML data sources.
Also in
version 5.0, the tool, which had previously enabled users to create data
integration projects in Perl or Java, does away with its Perl capabilities. In
my experience with the tool, I stuck to Java-based projects, as TOS
shipped with a broader range of Java-based data components.
Talend
Open Studio is built on the popular Eclipse platform, which should make the
tool familiar to anyone who's used Eclipse or other Eclipse-based development
tools. In addition, the Eclipse foundation provides TOS
with excellent cross-platform support: the download for the product contains
versions for Windows, Linux, OS X and Solaris.
Talend Open Studio for Data
Integration is licensed under the GPL and available for free download at
www.talend.com. For larger data integration projects, organizations can tap
fee-based enterprise editions of Talend's integration and data management tools
that include additional features aimed at supporting developer teams. Talend
has announced plans to follow the 5.0 editions of its open-source tools with
updated enterprise editions by the end of the year.
Talend Open Studio in the Lab
I did
most of my testing with Talend Open Studio 5.0 RC3 on the 64-bit edition of
Fedora 16. My test machines were equipped with 3GB and 4GB of RAM; I
don't recommend using any less, as TOS
consumes a good deal of memory.
On my
Linux systems, I encountered a problem starting up TOS: the
product requires Xulrunner from the Mozilla project, but the 2.0 version of
Xulrunner that ships with Fedora wasn't working. Mozilla offers 32-bit
Xulrunner runtimes, but not 64-bit ones, so I compiled a 64-bit version of Xulrunner
1.9.x and specified the path to the runtime in my TOS_DI-linux-gtk-x86_64.ini
file.
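For readers facing the same startup problem, the change might look roughly like the fragment below. This is a sketch under assumptions, not a record of the exact lines used in testing: it relies on the standard SWT system property org.eclipse.swt.browser.XULRunnerPath, and the directory shown is a hypothetical location for a locally built runtime. In Eclipse-style .ini files, -D options belong after the -vmargs line, which is usually already present.

```ini
-vmargs
-Dorg.eclipse.swt.browser.XULRunnerPath=/opt/xulrunner-1.9
```

After saving the file, restarting TOS should pick up the new runtime path.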
I installed and fired up TOS
on a separate machine running the 64-bit edition of Windows 7 and encountered
no such issues running the product.
I noticed that Talend has
somewhat streamlined the initial startup process for the product. In previous
versions, I'd had to create a repository and user account within which to build
my projects. In version 5.0, the tool did away with the repository and user
creation step, apparently taking care of this step automatically, behind the
scenes.
I was
instead prompted to create an account on Talend's Exchange, a community for
sharing custom components among Talend users. TOS
now builds a portal to the Exchange, which is also accessible through the
Web, into the product's main interface. I managed to find a useful component
for posting updates to Twitter in the Exchange, but I had to visit the Website
to download this component, possibly because the component was marked
as supporting only 4.x versions of TOS.
For a TOS
test case, I set out to automate the reposting of my public updates on Google+
to my Twitter stream, pairing Talend's JSON data components with the Google+ API,
and the Twitter posting component I mentioned above with my Twitter account. In
between, I used a Talend tMap component to combine multiple elements from the
Google+ stream into a single message for posting to Twitter.
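To give a sense of what the mapping step amounts to, here is a minimal Java sketch of joining a post's text and link into one Twitter-sized message. It is an illustration only, not the expression used in the test job: the class and method names are hypothetical, and the 140-character ceiling reflects Twitter's limit at the time of this review.

```java
final class TweetMapper {
    // Twitter's message limit at the time of this review (assumption stated above).
    static final int MAX_LENGTH = 140;

    /**
     * Joins a post's text and optional link into a single message.
     * The text is shortened if needed; the link is never cut, so an
     * unusually long link could still push the result past the limit.
     */
    static String buildMessage(String text, String url) {
        String body = text == null ? "" : text.trim();
        String link = (url == null || url.isEmpty()) ? "" : " " + url;
        int room = MAX_LENGTH - link.length();
        if (body.length() > room) {
            // Keep as much text as fits and mark the cut with "...".
            body = room > 3 ? body.substring(0, room - 3) + "..." : "";
        }
        return (body + link).trim();
    }

    public static void main(String[] args) {
        System.out.println(buildMessage("Trying out Talend Open Studio", "http://example.com"));
    }
}
```

Inside a tMap, the equivalent logic would normally be written as a Java expression over the incoming row's columns rather than as a standalone class.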
I found it easy to drag data
components from the tool's palettes to a design canvas on which I created data
integration jobs. The most challenging part of the process turned out to be
parsing the JSON data from Google+, particularly when neighboring posts in my
stream included different pieces of data. For instance, posts consisting of a
single chunk of text lacked the URL and image attachments of shared story
links.
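The missing-field problem can be sketched in plain Java as follows. This assumes each post has already been parsed into a Map, and the "attachments" and "url" key names are illustrative assumptions rather than a verified description of the Google+ response; the helper name is hypothetical as well.

```java
import java.util.List;
import java.util.Map;

final class PostFields {
    /**
     * Returns the URL of a post's first attachment, or an empty string
     * when the post is plain text and carries no attachments at all.
     */
    static String firstAttachmentUrl(Map<String, Object> post) {
        Object attachments = post.get("attachments");
        if (!(attachments instanceof List) || ((List<?>) attachments).isEmpty()) {
            return ""; // text-only post: nothing to link to
        }
        Object first = ((List<?>) attachments).get(0);
        if (!(first instanceof Map)) {
            return ""; // unexpected shape: treat it as missing
        }
        Object url = ((Map<?, ?>) first).get("url");
        return url instanceof String ? (String) url : "";
    }
}
```

Falling back to an empty string lets text-only posts and shared links flow through the same downstream steps without a separate branch for each.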
One of the newer feature
additions to Talend Open Studio is an XML mapping tool that helps users grab the
data they want from an XML source, much like the tools for sussing out the
structure of CSV or other delimited data
types that the product has long included.
In future
versions of Talend Open Studio, I'd like to see a similar tool aimed at
JSON-formatted input. In recent years, JSON has been overtaking XML in many of
the Web service APIs that I encounter, and a stronger set of tools around JSON
would be a welcome addition to the product.
During my
tests, I worked on my job at my home and office machines, and found that it was
easy to export my in-progress job to an archive file and import it onto my
active system. When the time comes to deploy my job, TOS
makes it easy to wrap up the job code and any dependencies into a WAR file for
deployment on a Java application server.
As Editor in Chief of eWEEK Labs, Jason Brooks manages the Labs team and is responsible for eWEEK's print edition. Brooks joined eWEEK in 1999, and has covered wireless networking, office productivity suites, mobile devices, Windows, virtualization, and desktops and notebooks. Jason's coverage is currently focused on Linux and Unix operating systems, open-source software and licensing, cloud computing and Software as a Service. Follow Jason on Twitter at jasonbrooks, or reach him by email at jbrooks@eweek.com.