Google has pulled the covers off an experimental, cloud-based data management system designed to ease the process of data integration and collaboration.
Dubbed Fusion Tables, the technology is in the alpha phase and is focused on “fusing” the areas of data management and collaboration – merging multiple data sources, discussion of the data, querying, visualization, and Web publishing.
“If you think about databases, they’re very much focused on giving very high performance SQL [query] processing and high throughput transactions, which is great…what we’ve been trying to do is take a different angle on this,” said Alon Halevy, a software engineer at Google. “We try to support collaboration among people.”
Through Fusion Tables, users can insert tables from their databases and share them with other users, circumventing the semantic heterogeneity problems that occur when trying to merge data from different sources.
“Data integration has always been a challenge,” Halevy said. “Taking two tables from one database and joining them together versus taking two tables from different databases and joining them together are very, very different kinds of activities. The reason is you usually design a database in such a way that your tables will merge very easily…(but) when two people develop databases independently it’s a much bigger challenge.”
The technology allows users to filter and aggregate the data, and to visualize it on Google Maps or with other charts from the Google Visualization API. In addition, users can discuss the data with "collaborators" (the people they share it with) through a chat feature. If a collaborator with edit permission changes the data during the discussion, viewers will see the change as part of the discussion trail.
In the current version, users can upload tabular data sets of up to 100 MB each, with a maximum of 250 MB of data per user. The files can be spreadsheets (.xls files or Google Spreadsheets) or CSV files. Google has placed a few tables in a public gallery for everyone's use while they try the technology out.
"What our goal has been is (to) remove that boundary of a database," Halevy said. "If I develop a database table today and you develop a database table, and (then) we discover a few months later that they are actually related to each other, then let's make it very, very easy for us to fuse the data from these two tables."