    REVIEW: Talend Open Studio Makes Quick ETL Work of Large Data Sets

    Written by

    Jason Brooks
    Published September 28, 2009

      When I reviewed the iLuminate 4.0 data warehousing product in June, I spent a healthy share of my testing time getting my chosen test data, a set of campaign finance records from the OpenSecrets project, cleaned up and ready for loading into the iLuminate product. This involved writing shell scripts to target small import snags in the data set, as well as divvying up some of my comma-separated values files to duck Excel’s million-row-per-spreadsheet limit (a limit that, like the apocryphal 640K of PC memory, I never thought I’d hit).
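
      The file-splitting chore described above can be sketched in a few lines. This is a minimal illustration rather than the scripts used in testing: the function name, sample data and chunk size are all assumptions chosen for the example.

```python
import csv
import io

# Excel 2007 and later allow 1,048,576 rows per worksheet. A header row
# uses one of them, so each output file can hold one fewer data row.
EXCEL_MAX_ROWS = 1_048_576


def split_rows(rows, max_rows):
    """Yield lists holding at most max_rows rows each."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == max_rows:
            yield chunk
            chunk = []
    if chunk:  # emit the final, partially filled chunk
        yield chunk


# Hypothetical sample data standing in for a large CSV file.
sample = io.StringIO("id,amount\n1,250\n2,500\n3,1000\n4,75\n5,20\n")
reader = csv.reader(sample)
header = next(reader)

# A tiny chunk size keeps the demo readable; a real run would pass
# EXCEL_MAX_ROWS - 1 and write each chunk, plus the header, to its own file.
chunks = list(split_rows(reader, max_rows=2))
for number, chunk in enumerate(chunks, start=1):
    print(f"part {number}: {len(chunk)} data rows")
```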

      The next time I have to deal with large data sets during testing, I may well turn to Talend Open Studio, an open-source ETL (extraction, transformation and loading) product that makes it easy to round up data, tweak it en masse, and load it into target systems such as databases and enterprise applications.

      Talend Open Studio is built on top of the Eclipse platform, which made the product’s interface somewhat familiar to me right from the start.

      Licensed under the GPL and available for free download at www.talend.com, Talend Open Studio is a powerful tool in its own right. But for larger data integration projects, organizations can tap Talend Integration Suite, a subscription-based version of the product that adds technical support and additional capabilities intended to support large teams and deployments.

      Pricing for Talend Integration Suite starts at $4,000 per developer seat per year. Talend offers other editions that target improved parallel processing and real-time performance. (The full lineup of Talend data integration products is listed on Talend’s site.)

      Scratching the Surface

      In my tests of Talend Open Studio, I only scratched the surface of what the tool can do. The product ships with an impressive range of components for accessing data sources and targets, as well as components for manipulating and integrating data in a variety of ways. Also, the product, which hews to a code-generation strategy involving Java or Perl code, is quite extensible.

      I tested Talend Open Studio (TOS) Version 3.1 on a machine running 64-bit Ubuntu Linux 9.04. TOS is also available for PowerPC and x86 Linux, for Solaris, for 32- and 64-bit versions of Windows, and for Apple machines running OS X.

      I began my tests by firing up TOS and creating a new project. It was at this point that I could choose between creating a Java- or Perl-based project. For my tests, I stuck with Java-based projects. I set out to import the bulk campaign finance data into a MySQL database by first creating metadata elements for the comma-separated value text files in which the candidate and individual contributor data I wanted to work with were stored.

      TOS presented me with a very straightforward wizard that stepped me through singling out my data file and identifying the proper field and row separators and escape characters required to properly parse my file. As with other import tools I’ve used, the Talend tool included a preview window that made it easy to see that my file would be parsed as expected.
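
      The parsing choices the wizard asks for (field separator, quote character, escape character) and its preview step can be approximated with Python’s standard csv module. This is a sketch only; the sample rows and the preview function are hypothetical and are not part of Talend.

```python
import csv
import io
from itertools import islice


def preview(text, field_sep=",", quote='"', escape="\\", limit=3):
    """Parse delimited text and return the first `limit` rows for inspection."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=field_sep,
        quotechar=quote,
        escapechar=escape,
    )
    return list(islice(reader, limit))


# Hypothetical sample: one field contains the separator, another contains
# escaped quote characters.
sample = (
    'C001,"Smith, Jane",D\n'
    'C002,"Pat \\"PJ\\" Jones",R\n'
    'C003,Lee,I\n'
)

for row in preview(sample, limit=2):
    print(row)
```

      Checking a handful of parsed rows before a full import is a cheap way to catch a wrong separator or escape setting.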

      I also used this wizard to populate my column definitions with names and data types. For the date information in my data file, I specified the correct month-day-year format. Once I saved the metadata definition that the wizard helped me create, I could apply that definition to other, similarly formatted data files. So, for example, while I based my metadata definition on campaign data from the 2010 election cycle, I could use the same definition for subsequent imports of previous election cycles.
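
      A reusable column definition of this kind amounts to a list of names paired with type converters. The sketch below assumes three hypothetical columns and a slash-separated month/day/year date pattern; neither the column names nor the pattern comes from the data set used in testing.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_mdy(value):
    """Parse a month/day/year string such as 03/15/2009 (assumed pattern)."""
    return datetime.strptime(value, "%m/%d/%Y").date()


# Hypothetical schema: (column name, converter) in file order.
SCHEMA = [
    ("contributor", str),
    ("amount", Decimal),
    ("received", parse_mdy),
]


def apply_schema(row, schema=SCHEMA):
    """Convert one parsed row of strings into a typed record."""
    if len(row) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(row)}")
    return {name: convert(value) for (name, convert), value in zip(schema, row)}


record = apply_schema(["Smith, Jane", "250.00", "03/15/2009"])
print(record)
```

      Because the schema is plain data, the same definition can be applied to any file that shares the layout.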

      I created a separate metadata element for my MySQL database. This step involved filling out a field with the connection information for my database, just as I would configure a database query tool connection. I could choose from 31 different types of database connections, which included a full range of database products as well as generic ODBC and JDBC connections.

      With my source and target elements ready to go, I dragged each element onto the design canvas for the data integration job I’d created. I indicated that my delimited file would be an input element and that my database would accept output. From here, I modified a few settings for the database output element, indicating, for instance, that TOS should create a new table to receive the data.

      I ran my new job and watched as Talend’s rows, rows-per-second and elapsed-time indicators marked the flow of my test data into my MySQL database.
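
      The load step, with its row count, elapsed time and rows-per-second figures, can be imitated in a few lines. To keep the sketch self-contained it uses Python’s built-in sqlite3 module as a stand-in for MySQL, and the table and column names are hypothetical.

```python
import sqlite3
import time


def load_rows(conn, table, columns, rows):
    """Create the table if needed, insert all rows, and return load statistics."""
    rows = list(rows)
    # Identifiers are interpolated for the demo only; they must come from
    # trusted input. Row values go through ? placeholders.
    column_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_defs})')

    start = time.perf_counter()
    with conn:  # commit on success, roll back on error
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    elapsed = time.perf_counter() - start

    rate = len(rows) / elapsed if elapsed > 0 else float("inf")
    return {"rows": len(rows), "seconds": elapsed, "rows_per_second": rate}


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL target
stats = load_rows(
    conn,
    "contributions",
    ["contributor", "amount", "received"],
    [
        ("Smith, Jane", "250.00", "2009-03-15"),
        ("Lee, Sam", "500.00", "2009-04-01"),
        ("Jones, Pat", "75.00", "2009-04-02"),
    ],
)
print(f"{stats['rows']} rows in {stats['seconds']:.4f}s")
```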

      With this data in place, I built a new job, this time with my already-configured database element as the input source and a new element, for an instance of SugarCRM that I’ve been testing, to accept the output. Configuring the SugarCRM element was similar to setting up my database connection: I provided the SugarCRM Web services URL and my authentication information, and selected which table I wanted to use from the Sugar system.

      I also added a third element to my job design canvas: a tMap element, which enabled me to map particular columns from my MySQL source to my chosen SugarCRM table, as well as to transform the column values en route between the two stores. I used Talend’s expression builder, for instance, to extract the last names from a full name column in my source table using a function provided with the product.
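
      The mapping idea can be sketched as a table of target fields, each computed from the source row. This is not Talend’s tMap or its built-in function; the field names and the simple last-token rule for surnames are assumptions made for the example.

```python
def last_name(full_name):
    """Return the last whitespace-separated token of a name, or '' if blank."""
    parts = full_name.split()
    return parts[-1] if parts else ""


# Hypothetical mapping: target field name -> function of the source row.
MAPPING = {
    "last_name": lambda row: last_name(row["full_name"]),
    "description": lambda row: f"Contributed {row['amount']}",
}


def map_row(row, mapping=MAPPING):
    """Build one target record by applying every mapping expression to a row."""
    return {target: expression(row) for target, expression in mapping.items()}


source_row = {"full_name": "Jane Q. Smith", "amount": "250"}
print(map_row(source_row))
```

      The last-token rule mishandles suffixes such as “Jr.” and names stored surname-first, so real data would need a more careful expression.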

      Executive Editor Jason Brooks can be reached at [email protected].

      Jason Brooks
      As Editor in Chief of eWEEK Labs, Jason Brooks manages the Labs team and is responsible for eWEEK's print edition. Brooks joined eWEEK in 1999, and has covered wireless networking, office productivity suites, mobile devices, Windows, virtualization, and desktops and notebooks. Jason's coverage is currently focused on Linux and Unix operating systems, open-source software and licensing, cloud computing and Software as a Service.
