After four months in beta, Google’s Cloud Dataflow services are now generally available to enterprise users, making it easier for them to create big data applications using simple programming languages and software development kits (SDKs).
The move, which was accompanied by the general availability of Google’s Cloud Pub/Sub services, was announced by Eric Schmidt, project manager of Cloud Dataflow, and Rohit Khare, project manager of Cloud Pub/Sub, in an Aug. 12 post on the Google Cloud Platform Blog.
The Cloud Dataflow and Cloud Pub/Sub products are part of the company’s wide-ranging Google Cloud Platform, which provides the infrastructure customers need to build applications that scale as a business grows while reducing data processing latency.
“Cloud Dataflow is specifically designed to remove the complexity of developing separate systems for batch and streaming data sources by providing a unified programming model,” wrote Schmidt and Khare. “Based on more than a decade of Google innovation, including MapReduce, FlumeJava, and Millwheel, Cloud Dataflow is built to free you from the operational overhead related to large scale cluster management and optimization.”
The generally available version of Cloud Dataflow gives customers a fully managed, fault-tolerant, highly available, SLA-backed service for batch and stream processing, they wrote, along with good performance and an extensible SDK to support customer needs. It also offers native Google Cloud Platform integration with Cloud Storage, Cloud Datastore, BigQuery, and Cloud Pub/Sub, including new full query support for BigQuery, they added.
Google’s Cloud Pub/Sub lets customers integrate applications and services reliably and analyze big data streams in real time, wrote Schmidt and Khare. “Traditional approaches require separate queueing, notification, and logging systems, each with their own APIs and tradeoffs between durability, availability, and scalability. Cloud Pub/Sub addresses a broad range of scenarios with a single API, a managed service that eliminates those tradeoffs, and remains cost-effective as you grow, with pricing as low as 5¢ per million message operations for sustained usage.”
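To make the publish/subscribe pattern concrete, here is a minimal in-memory sketch in Python. It is not the Cloud Pub/Sub API: the `Topic` and `Subscription` classes and all of their methods are hypothetical names invented for illustration, and the sketch ignores durability, networking, and scaling entirely. It only shows the idea of one publish call fanning out to independent subscriptions, each of which acknowledges messages on its own schedule.

```python
from collections import deque


class Subscription:
    """Hypothetical subscription: holds messages until they are acknowledged."""

    def __init__(self, name):
        self.name = name
        self._pending = deque()

    def deliver(self, message):
        self._pending.append(message)

    def pull(self):
        """Return the oldest unacknowledged message, or None if there is none."""
        return self._pending[0] if self._pending else None

    def ack(self):
        """Acknowledge (and drop) the oldest pending message, if any."""
        if self._pending:
            self._pending.popleft()


class Topic:
    """Hypothetical topic: fans each published message out to every subscription."""

    def __init__(self, name):
        self.name = name
        self._subscriptions = {}

    def subscribe(self, sub_name):
        sub = Subscription(sub_name)
        self._subscriptions[sub_name] = sub
        return sub

    def publish(self, message):
        for sub in self._subscriptions.values():
            sub.deliver(message)
        return len(self._subscriptions)  # number of subscriptions reached


# Usage: one topic serves both a work queue and an audit log.
orders = Topic("orders")
billing = orders.subscribe("billing")
audit = orders.subscribe("audit-log")

delivered_to = orders.publish("order-1001 created")

billing_message = billing.pull()
billing.ack()  # billing is done; the audit copy is unaffected
```

Because each subscription keeps its own pending queue, the billing consumer can acknowledge a message without removing the copy that the audit consumer has yet to read. That is one way a single publish interface can stand in for separate queueing and notification systems.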
Also announced alongside the updated services was the beta release of Google’s new Identity and Access Management (IAM) APIs and Permissions Editor in the Google Developers Console, according to the post.
“These improvements allow users to control access down to the level of particular operations on specific topics and subscriptions,” wrote Schmidt and Khare. The finer-grained controls make it easier to connect multiple Cloud Platform projects, whether within the same organization or with third-party services.
Google originally released the Cloud Dataflow beta in April, according to an earlier eWEEK report. Cloud Dataflow provides unified programming primitives for both batch and stream-based data analysis. The SDK makes the Cloud Dataflow programming model broadly available, so developers can write simple, extensible data processing pipelines that describe both stream and batch processing tasks.
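The idea of a single pipeline definition serving both batch and streaming input can be sketched in a few lines of plain Python. This is an illustration only and does not use the Cloud Dataflow SDK: `pipeline`, `split_words`, `drop_short`, and `line_stream` are hypothetical names, and a Python generator stands in for an unbounded stream.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator

# Hypothetical names throughout; this is not the Cloud Dataflow SDK.
Transform = Callable[[Iterable], Iterator]


def pipeline(*steps: Transform) -> Transform:
    """Chain transforms into one reusable pipeline description."""
    def run(source: Iterable) -> Iterator:
        data = iter(source)
        for step in steps:
            data = step(data)
        return data
    return run


def split_words(lines):
    """Emit lowercase words from each incoming line."""
    for line in lines:
        yield from line.lower().split()


def drop_short(words):
    """Keep only words longer than three characters."""
    return (w for w in words if len(w) > 3)


# The pipeline is described once...
word_pipeline = pipeline(split_words, drop_short)

# ...then run over a bounded, in-memory collection (batch)...
batch_input = [
    "Cloud Dataflow is now generally available",
    "Cloud Pub/Sub is too",
]
batch_counts = Counter(word_pipeline(batch_input))


# ...and over a source that yields lines one at a time (stream).
def line_stream():
    yield "Cloud Dataflow is now generally available"
    yield "Cloud Pub/Sub is too"


stream_counts = Counter()
for word in word_pipeline(line_stream()):
    stream_counts[word] += 1  # the tally updates as each element arrives
```

Because every transform consumes and produces lazy iterators, the same `word_pipeline` object works unchanged on both sources. A real service would also have to handle windowing, late data, and distribution across machines, none of which this sketch attempts.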
In June, Google announced that JDA Software Group would use Google’s Cloud Platform technologies, including App Engine and Cloud Dataflow, to deliver new supply chain management and data analytics services to its customers. JDA provides retail and supply chain planning and management software and services to about 4,000 customers worldwide. Its portfolio includes contract and demand management, customer engagement, factory planning and scheduling, inventory management and warehouse management services.