SnapLogic Takes Time to Learn

By Jeff Cogswell  |  Posted 2012-04-30

SnapLogic Brings Real-Time Data-Flow Analysis to Visual App Design Tool

The SnapLogic Cloud Integration Platform Spring 12 release adds real-time data-flow analysis, component validation and debug tracing to the data-connection platform. I recommend this tool if you need to do regular data processing and data conversion.

eWEEK Labs tests showed that the SnapLogic Spring 12 release can snap together on-premises or cloud-based applications and data sources using "snaps" that represent the different types of applications and data sources, such as MySQL tables, comma-separated values (CSV) files and even data-join operations.

IT managers controlling large data centers that need to process and move large amounts of data should add SnapLogic Spring 12 to their short list of products. SnapLogic is sold as a subscription and varies in cost, based on the amount of data throughput. Snaps range in price from no-cost to just under $10,000.

To try out the new features, I used the SnapLogic Designer to create a simple connection between a MySQL table and a CSV file. As I worked, the Designer automatically offered a choice of valid snaps, or "components" as they're also called, for each operation and for each table. For example, a table called Customer could have a component for reading from Customer, one for inserting data into Customer, one for deleting data from Customer, one for looking up data in Customer and one for updating Customer.

Finally, a component could perform what SnapLogic calls an "upsert" operation, which combines an update with an insert action. These components can be dropped on the canvas, setting the stage for operations on that particular table. It is possible to start with the lower-level database components for more complex operations.
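For readers unfamiliar with the term, the following is a minimal sketch of what an upsert does, written against Python's built-in sqlite3 module rather than MySQL or SnapLogic itself. The table layout and the `upsert_customer` helper are assumptions made for illustration only.

```python
import sqlite3

def upsert_customer(conn, customer_id, name):
    """Update the row if it exists; otherwise insert it (hypothetical helper)."""
    cur = conn.execute(
        "UPDATE customer SET name = ? WHERE id = ?", (name, customer_id)
    )
    if cur.rowcount == 0:
        # No existing row matched the id, so fall back to an insert.
        conn.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)", (customer_id, name)
        )
    conn.commit()

# In-memory database with an assumed two-column Customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")

upsert_customer(conn, 1, "Acme")       # no row yet, so this inserts
upsert_customer(conn, 1, "Acme Corp")  # row exists, so this updates

rows = conn.execute("SELECT id, name FROM customer").fetchall()
```

After both calls the table holds a single row for customer 1 with the updated name, which is the behavior the upsert component bundles into one step.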

I chose a Read operation on a particular MySQL table. I then added a CSV_Writer component, which saves data to a flat file in comma-separated format. I connected the two components so that I could read data from the MySQL table and push it into the CSV_Writer component, which would, in turn, save the data it receives to a file that I could later open in Excel.

That's the general approach to SnapLogic: The data flows from one component to the next, and each component processes the data in a way that you specify through the component's configuration. I chose the fields to read from the MySQL table, and then, when I created the connection, the CSV_Writer automatically picked up those fields and their names by default. I could then rename those fields so they would appear differently in the final file, but I decided to leave them the same.

Then, I ran it by clicking the Run button. But an error message popped up, telling me I forgot to give my CSV file a name. So I clicked on the component, and at the bottom of the screen in the properties, I typed in a file name. Then I ran it again. This time, the operation succeeded. Done deal, and I had a CSV file with the data in it.

To program this operation manually, I would have to write a script that connects to the table, grabs the data and writes it to a file. But it took only seconds to drag each component on the canvas and connect them, and then a few more seconds to verify the field names I wanted and set the file name.
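For comparison, a hand-written version of that script might look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL and an in-memory buffer as a stand-in for the output file; the table, its columns and the `export_table_to_csv` function are all assumptions for illustration, not anything SnapLogic generates.

```python
import csv
import io
import sqlite3

def export_table_to_csv(conn, table, columns, out):
    """Read the chosen columns from a table and write them as CSV with a header row."""
    # Table and column names are interpolated directly into the SQL,
    # so they must come from trusted code, never from user input.
    query = "SELECT {} FROM {}".format(", ".join(columns), table)
    writer = csv.writer(out)
    writer.writerow(columns)
    count = 0
    for row in conn.execute(query):
        writer.writerow(row)
        count += 1
    return count

# Assumed sample data standing in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Acme", "Reno"), (2, "Globex", "Boise")],
)

# io.StringIO stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
written = export_table_to_csv(conn, "customer", ["id", "name"], buffer)
```

Even this short script leaves out connection credentials, error handling and scheduling, which is the kind of plumbing the drag-and-drop pipeline takes care of.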

As the pipeline was running, I was able to test one of the new features, real-time data-flow analysis. I hovered the mouse over the components and saw a pop-up window open with statistics about the data as it flowed through that particular component. These statistics included the number of records coming through the component, the records processed per second, CPU utilization and wait time. The statistics are live, and I could see the numbers updating continuously as my pipeline ran.

I found this feature extremely useful because it allowed me to know exactly how far along the process was, and that it was, in fact, still running. Further, I did the same type of analysis on each component, as well as the connectors, and saw similar information.
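To give a sense of how per-component statistics such as record counts and records per second can be gathered, here is a small sketch in plain Python. The `StageStats` class is hypothetical and says nothing about how SnapLogic implements its own counters.

```python
import time

class StageStats:
    """Counts records passing through one stage and reports throughput."""

    def __init__(self, name):
        self.name = name
        self.records = 0
        self.started = None

    def watch(self, records):
        # Wrap any iterable of records; counting happens as records are pulled.
        self.started = time.monotonic()
        for record in records:
            self.records += 1
            yield record

    def records_per_second(self):
        if self.started is None:
            return 0.0
        elapsed = time.monotonic() - self.started
        return self.records / elapsed if elapsed > 0 else 0.0

stats = StageStats("reader")
processed = [record for record in stats.watch(range(1000))]
rate = stats.records_per_second()
```

Because the counter is updated as each record passes, another thread or a user interface could read `stats.records` while the pipeline is still running, which is the essence of a live display.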

I already mentioned that I received an error when I ran the pipeline. Among the new features, however, is one called validation, which means that you don't have to wait until you run the pipeline to catch your errors. Instead, you can validate your components as you create them.

To try this out, I added a filter (a special component that takes incoming data, filters the data based on parameters you provide and sends out filtered records to the next component). I connected the original MySQL_Read component into this filter so that the filter would receive the same data the CSV_Writer component receives, simultaneously. That alone is a cool feature, although not new. The data from a single component can be pushed into more than one component so you can do simultaneous processing.
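The fan-out idea can be sketched in a few lines of Python. Here each branch is simply a function that receives a record; the sample records, the `make_filter` helper and the city-based filter are assumptions chosen for illustration.

```python
def make_filter(predicate, downstream):
    """Return a receiver that forwards only the records matching predicate."""
    def receive(record):
        if predicate(record):
            downstream(record)
    return receive

all_rows = []   # stands in for the first CSV writer
reno_rows = []  # stands in for the writer behind the filter

# Both branches are fed from the same source.
branches = [
    all_rows.append,
    make_filter(lambda record: record["city"] == "Reno", reno_rows.append),
]

source = [
    {"name": "Acme", "city": "Reno"},
    {"name": "Globex", "city": "Boise"},
    {"name": "Initech", "city": "Reno"},
]

for record in source:
    for branch in branches:
        branch(record)
```

Every record reaches the unfiltered branch, while only the matching records reach the branch behind the filter.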

I then tied the output of this filter to a second CSV_Writer component. But this time, before running the entire pipeline, I clicked on the second CSV_Writer component, and clicked Validate, which resulted in the same message as before: "Property File Name requires a value." The idea here is that I can catch the error now before adding more components, and well before running the pipeline.
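A rough sketch of this validate-before-run idea follows. The `Component` and `CsvWriter` classes are hypothetical and only mimic the error message quoted above; they are not SnapLogic's API.

```python
class Component:
    """Hypothetical pipeline component with a list of required properties."""
    required = ()

    def __init__(self, name, properties=None):
        self.name = name
        self.properties = dict(properties or {})

    def validate(self):
        # Report every required property that is missing or empty.
        return [
            "Property {} requires a value.".format(key)
            for key in self.required
            if not self.properties.get(key)
        ]

class CsvWriter(Component):
    required = ("File Name",)

writer = CsvWriter("second_writer")
errors_before = writer.validate()

writer.properties["File Name"] = "filtered.csv"  # assumed file name
errors_after = writer.validate()
```

Checking each component on its own means a missing property surfaces as soon as the component is configured, without executing any part of the pipeline.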

The next new debugging feature is Data Tracing. This feature let me see the data as it flowed through each component, both before and after the component processed it. To test this out, I chose a run option called Trace All. This started the pipeline running, and I clicked on my first CSV_Writer component and then on the Trace tab in the properties window, which quickly filled with data records as they came in.

I was able to do this for all the components in my design, and see the data coming into that component and the data coming out. I checked the second CSV_Writer's data, both input and output, and could see that it was receiving a subset of the original data (since the data was coming in through a filter).

There's also a "copy to clipboard" button next to each trace output so you can immediately grab the output and paste it into another app, such as Excel. What's also cool about the tracing feature is if you have multiple, intermediate steps (such as filters and joining of different data components), you can see this intermediate data, not just the final data in your resulting tables or files. That way, if you have a problem such as a wrong filter set, you can check each component's input and output until you find exactly where the problem is.
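The value of seeing intermediate data can be illustrated with a small tracing wrapper in Python. This is only a sketch under assumed names (`traced`, `keep_reno`, `to_csv_line`); it shows the general technique of recording each stage's input and output, not SnapLogic's implementation.

```python
def traced(name, transform, trace):
    """Wrap a stage so each input and output record is appended to trace."""
    def run(records):
        for record in records:
            trace.append((name, "in", record))
            # A transform returns zero or more output records per input.
            for result in transform(record):
                trace.append((name, "out", result))
                yield result
    return run

def keep_reno(record):
    return [record] if record["city"] == "Reno" else []

def to_csv_line(record):
    return ["{},{}".format(record["name"], record["city"])]

trace = []
source = [
    {"name": "Acme", "city": "Reno"},
    {"name": "Globex", "city": "Boise"},
]

filter_stage = traced("filter", keep_reno, trace)
writer_stage = traced("writer", to_csv_line, trace)
lines = list(writer_stage(filter_stage(source)))

filter_inputs = [entry for entry in trace if entry[:2] == ("filter", "in")]
filter_outputs = [entry for entry in trace if entry[:2] == ("filter", "out")]
```

Comparing a stage's inputs with its outputs shows exactly where records are dropped or changed, so a misconfigured filter stands out without inspecting the final file.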

The only downside I could see with SnapLogic is that there's a learning curve, which depends on your level of experience with products similar to this. SnapLogic encourages its customers to get on-site training. In my case, that consisted of a member of SnapLogic's team taking me through the product, step-by-step, via an online meeting.

My training lasted about an hour, and even after that, I found myself a bit lost at the beginning. I had to dig through the window that holds all the components to figure out which one was which, and it wasn't immediately apparent how to get to the properties window or how to choose the run option to include tracing information. However, once I had found all that and was more comfortable, I was able to build what I needed in just a few minutes.
