SnapLogic: Making Data Integration a Snap

Opinion: SnapLogic is an open-source startup that takes a different approach to data integration.

SnapLogic is an open-source project that enables developers to easily and rapidly integrate applications and other data sources through Data Integration Networks.

I met with Chris Marino and Molly Morelock, SnapLogic's CEO and director of programs, respectively, about a month ago in San Francisco. They weren't ready to show their technology, but we had a good chat about what they were planning to deliver. This week's MySQL conference, from April 23 to April 26 in Santa Clara, Calif., will be the first public showing of their stuff.

Marino said the cost and complexity of traditional data integration solutions have put real integration beyond the reach of many organizations. So SnapLogic is embracing the simplicity and scalability of the Web to enable Data Integration Networks by providing a data services layer that includes integration and transformation services as part of the network infrastructure, he said.

Marino said there are already a number of commercial solutions trying to solve the integration problem, ranging from enterprise service buses to business process management engines, to ETL (extraction, transformation and loading) solutions, "and even the SOA [service-oriented architecture] guys are getting into the picture," he said.

Moreover, Marino noted the pace at which the market for data integration is growing. In 2003, the size of the data integration market was about $9.3 billion. By 2008, the market is projected to grow to $18.8 billion, he said.

Meanwhile, in addition to the limitations of commercial integration solutions, the unique requirements of each project make customization a necessity, which typically means hand-coding. And the popularity of scripting languages like Perl, Python, PHP and Ruby has simplified and popularized the custom coding approach, Marino said.

"The handcraftedness is not captured by any solutions," he said, saying a new approach is needed.

Marino said the approach he promotes is to integrate data the way the Web does, and his answer is REST. "We're of the opinion that these de facto standards have included the solutions to the problem," Marino said. "We're of the opinion that Web services standards will continue to languish and simple things like RSS will proliferate."

As SnapLogic, based in San Mateo, Calif., said in a backgrounder on its technology, "The Web is a collection of loosely coupled servers that rely on simple, stateless interactions between client and server. Standard access protocols (TCP/HTTP), stateless access methods (GET/PUT/POST) and simple data formats (HTML) make it possible for any client (i.e. browser) to render any Web page. Technically, this is known as a REpresentational State Transfer, or REST architecture."
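To make the "stateless interactions" in that description concrete, here is a minimal Python sketch. It is not SnapLogic code: the `handle` function, the in-memory `STORE` and the `/customers/42` path are all hypothetical, and the HTTP layer is replaced by a plain function call so the example stays self-contained. The point it tries to show is that the server remembers resources, not clients, so every request carries everything needed to answer it.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory resource store: path -> representation.
# The server holds resource state only; it keeps no per-client session.
STORE: Dict[str, str] = {}


def handle(method: str, path: str, body: Optional[str] = None) -> Tuple[int, str]:
    """Answer one request using only the request itself plus the store."""
    if method == "GET":
        if path in STORE:
            return 200, STORE[path]
        return 404, ""
    if method == "PUT":
        created = path not in STORE
        STORE[path] = body or ""
        # 201 Created for a new resource, 204 No Content for an update.
        return (201 if created else 204), ""
    # Any other method is rejected in this sketch.
    return 405, ""


# Any client can PUT a representation and any other client can GET it back.
put_status, _ = handle("PUT", "/customers/42", "name=Ada")
get_status, representation = handle("GET", "/customers/42")
```

Because no request depends on an earlier one from the same client, the two calls above could come from different programs on different machines, which is the property the backgrounder is pointing to.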


Moreover, "We recognized that the most successful APIs are not SOAP [Simple Object Access Protocol] APIs," Marino said. And interoperability standards such as SOAP-based Web services make no attempt to standardize or simplify the structure of the data, leaving it up to the user (or vendor) to build solutions that understand the data, Marino said.

Every data source should simply attach to the network where it can issue requests for the data it needs, he said.

Yet, for that to occur, a new approach to data integration is required that embraces the architecture of the Web itself. Integration capabilities should be built into a server infrastructure layer that supports Web-style or RESTful interactions for data and metadata, while supporting the execution of complex data transformation logic. And this new infrastructure layer would become the foundation of a new DIN (Data Integration Network) that provides RESTful access, transport and transformation services for data, Marino said.

"What we have done is taken the notion of a simple HTTP REST interface and used it for every step along the way for data integration," he said. A REST interface exists between each of the components, Marino said. "Our vision is to the point where this notion of distributed integration is that you enable the end points to be REST-enabled."

Marino said he hopes SnapLogic's approach can get enough adoption so that it becomes a new layer of integration infrastructure.

"We're taking the data and putting it into a form that looks like RSS," he said.

Meanwhile, the DIN connectivity strategy also requires a community of developers to collaborate, share and reuse each other's components and deploy them on a common infrastructure.

Marino said that as an open-source project, "we can allow the community to handle the adapters" required to talk to various platforms.


The SnapLogic server is a lightweight process that serves as the container and execution environment for configurable components. A catalog of standard SnapLogic Components provides base integration capabilities such as database queries, file read and write, aggregate, sort, filter, join, and others, the company said.

Data sources accessed by SnapLogic Resources present the data in a simplified record-oriented format that eliminates the complexity of application-specific data schemas, the company said. This enables Resources to interoperate more easily and facilitates reuse. Resources can be linked to other Resources through their REST interface. Linked Resources become transformation Pipelines. Pipelines can be assembled into hierarchies to implement complex logic and transform data for sophisticated integrations.
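The idea of linking record-oriented Resources into Pipelines, and Pipelines into hierarchies, can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the `filter_by`, `sort_by`, `project` and `pipeline` helpers and the sample employee records are hypothetical, and plain function calls stand in for the REST links between Resources. None of it is SnapLogic's actual API.

```python
from typing import Callable, Dict, Iterable

# A record is a flat mapping of field names to values,
# independent of any application-specific schema.
Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterable[Record]]


def filter_by(field: str, value: object) -> Stage:
    """Hypothetical component: keep records whose field equals value."""
    def stage(records: Iterable[Record]) -> Iterable[Record]:
        return (r for r in records if r.get(field) == value)
    return stage


def sort_by(field: str) -> Stage:
    """Hypothetical component: order records by one field."""
    def stage(records: Iterable[Record]) -> Iterable[Record]:
        return sorted(records, key=lambda r: r[field])
    return stage


def project(*fields: str) -> Stage:
    """Hypothetical component: keep only the named fields."""
    def stage(records: Iterable[Record]) -> Iterable[Record]:
        return ({f: r[f] for f in fields} for r in records)
    return stage


def pipeline(*stages: Stage) -> Stage:
    """Link stages so that each one consumes the previous one's output."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for stage in stages:
            records = stage(records)
        return records
    return run


employees = [
    {"name": "Ada", "dept": "eng", "salary": 120},
    {"name": "Bob", "dept": "ops", "salary": 90},
    {"name": "Cy", "dept": "eng", "salary": 100},
]

# A pipeline is itself a stage, so pipelines can be nested into hierarchies.
engineers = pipeline(filter_by("dept", "eng"), sort_by("salary"))
report = pipeline(engineers, project("name"))
names = list(report(employees))
```

Because every stage takes and returns the same simple record shape, `engineers` can be reused inside `report` without either knowing about the other's internals, which is the reuse argument the company is making.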

SnapLogic servers support a request authentication model similar to Apache's HTTP authorization model so that each Resource can only be accessed according to user-specified ACLs (Access Control Lists).
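A per-Resource ACL check of this kind might look roughly like the following Python sketch. The `ACLS` table, the resource paths, the user names and the `is_allowed` and `authorize` functions are hypothetical; this is neither SnapLogic's nor Apache's actual configuration format, and it assumes the user has already been authenticated by some earlier step.

```python
from typing import Dict, Set

# Hypothetical ACL table: resource path -> HTTP method -> users allowed.
ACLS: Dict[str, Dict[str, Set[str]]] = {
    "/resources/customers": {"GET": {"alice", "bob"}, "PUT": {"alice"}},
    "/resources/orders": {"GET": {"bob"}},
}


def is_allowed(user: str, method: str, resource: str) -> bool:
    """Return True only if the ACL explicitly lists the user for this method."""
    allowed_users = ACLS.get(resource, {}).get(method, set())
    return user in allowed_users


def authorize(user: str, method: str, resource: str) -> int:
    """Map the ACL decision to an HTTP status: 200 if allowed, 403 if not."""
    return 200 if is_allowed(user, method, resource) else 403


read_status = authorize("bob", "GET", "/resources/customers")
write_status = authorize("bob", "PUT", "/resources/customers")
```

The sketch denies by default: a Resource or method with no ACL entry is inaccessible to everyone, so a newly added Resource stays closed until someone grants access to it.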

Migrating SnapLogic to source and target servers turns them into Snap-enabled service endpoints supporting controlled access for all integration Pipelines that require their data, the company said.

"This is a solution that people can use to replace a lot of their hand-coded, crusty stuff," Marino said.

"We have a UI [user interface] that looks like Yahoo Pipes, except it's not just for RSS," Marino said. "It's more like Yahoo Pipes for the enterprise, or Yahoo Pipes for everybody's data."
