Google has released a new technical paper and reference implementation for organizations looking for help in extracting value from the voluminous log data generated by their cloud applications and services.
Such log data can give companies valuable insights into not just how their applications might be performing but also how customers are interacting with them, Google Solutions Architect Sandeep Parikh said in a Nov. 24 blog post.
But the massive size of the data sets involved and their growing complexity make it hard for businesses to process log data and extract value from it, Parikh said.
“As deployments grow more complex, gleaning insights from this data becomes more challenging,” he said. For one thing, logs come from multiple sources, making them hard to collate and query. Building an infrastructure to collect, collate and analyze log data in massive volumes can also be hard and requires expertise in running large storage systems and distributed servers.
The goal in releasing the technical paper is to give organizations an idea of how they can leverage Google Cloud Platform services and Google Cloud Dataflow to process logs, Parikh wrote.
Cloud Dataflow is a managed service that Google launched last year to help companies analyze massive data sets both in real-time streaming mode and in batch-processing mode. It can be used to read and process log data from Google Cloud Storage and other sources and to extract and transform metadata, he said. The output from Cloud Dataflow can then be sent to Google’s BigQuery analytics engine for review and analysis.
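To make the "extract and transform metadata" step more concrete, here is a minimal sketch in plain Python. It does not use the Dataflow or Apache Beam APIs; it only illustrates the kind of per-line transform such a pipeline would apply before loading rows into a table. The log format and the field names (`ts`, `method`, `path`, `status`, `latency_ms`) are hypothetical assumptions, not taken from Google's paper.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical access-log format, assumed purely for illustration:
#   <ISO timestamp> <HTTP method> <path> <status> <latency>ms
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+) (?P<method>[A-Z]+) (?P<path>\S+) "
    r"(?P<status>\d{3}) (?P<latency_ms>\d+)ms$"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a flat row dict, or None if it doesn't match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        # Malformed lines are skipped instead of failing the whole job.
        return None
    row = match.groupdict()
    # Convert numeric fields so downstream aggregation can do arithmetic on them.
    row["status"] = int(row["status"])
    row["latency_ms"] = int(row["latency_ms"])
    return row


def extract_rows(lines: Iterable[str]) -> List[dict]:
    """Parse every line and keep only the ones that matched the pattern."""
    return [row for row in map(parse_log_line, lines) if row is not None]


if __name__ == "__main__":
    sample = [
        "2015-11-24T10:15:30Z GET /browse 200 35ms",
        "2015-11-24T10:15:31Z GET /locate 404 12ms",
        "not a log line",
    ]
    for row in extract_rows(sample):
        print(row)
```

In a managed pipeline, a function like `parse_log_line` would typically run once per element across many workers, which is why it is written as a pure function with no shared state.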
Similarly, enterprises can use Google’s Cloud Logging service to collect logs from cloud services and applications and store the data in Cloud Storage, where Cloud Dataflow and other services can access it.
The technical paper offers a scenario in which businesses use these services to configure applications and services, collect and capture log files, store log data, process it and extract information from it, and derive persistent insights from the data.
The paper provides an example of a shopping app used by an online retailer that lets users browse for products online and then locate them at local brick-and-mortar stores. It walks administrators through configuring the environment to collect the log data generated by the app, aggregate it, load it into BigQuery and query it.
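The "aggregate it ... and query it" step usually comes down to grouping log rows by some key and summarizing them. The plain-Python sketch below mimics what a GROUP BY query over a hypothetical logs table might return. The rows, the `path`/`status`/`latency_ms` columns and the query in the docstring are illustrative assumptions, not the schema or queries used in Google's paper.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Hypothetical parsed log rows, shaped like the output of a log-parsing step.
ROWS = [
    {"path": "/browse", "status": 200, "latency_ms": 35},
    {"path": "/browse", "status": 200, "latency_ms": 45},
    {"path": "/locate", "status": 200, "latency_ms": 20},
    {"path": "/locate", "status": 500, "latency_ms": 80},
]


def summarize_by_path(rows: Iterable[dict]) -> Dict[str, dict]:
    """Group rows by path; compute request count, server-error count and mean latency.

    Roughly the result of a query along the lines of (hypothetical table/columns):
      SELECT path, COUNT(*), SUM(IF(status >= 500, 1, 0)), AVG(latency_ms)
      FROM logs GROUP BY path
    """
    totals = defaultdict(lambda: {"requests": 0, "errors": 0, "latency_sum": 0})
    for row in rows:
        t = totals[row["path"]]
        t["requests"] += 1
        # Count only 5xx responses as errors in this sketch.
        t["errors"] += 1 if row["status"] >= 500 else 0
        t["latency_sum"] += row["latency_ms"]
    return {
        path: {
            "requests": t["requests"],
            "errors": t["errors"],
            "avg_latency_ms": t["latency_sum"] / t["requests"],
        }
        for path, t in totals.items()
    }


if __name__ == "__main__":
    for path, stats in summarize_by_path(ROWS).items():
        print(path, stats)
```

At real log volumes this grouping would be pushed down to BigQuery rather than done in application code; the sketch only shows the shape of the result.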
The paper also details how to reconfigure Cloud Logging to stream data into Dataflow so that logs can be processed in near-real time.
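Near-real-time processing of a log stream typically involves windowing: slicing an unbounded stream into fixed time intervals and computing results per interval. The sketch below imitates fixed (tumbling) windows over an in-memory list of `(timestamp_seconds, path)` events. It is a simplified, hypothetical illustration in plain Python. It ignores late-arriving data, watermarks and triggers, which a real streaming engine such as Dataflow has to handle.

```python
from collections import Counter
from typing import Iterable, Tuple


def window_start(ts_seconds: int, window_seconds: int = 60) -> int:
    """Align an event timestamp to the start of its fixed (tumbling) window."""
    return ts_seconds - (ts_seconds % window_seconds)


def count_per_window(
    events: Iterable[Tuple[int, str]], window_seconds: int = 60
) -> Counter:
    """Count events per (window start, path) key, processing events one at a time."""
    counts: Counter = Counter()
    for ts, path in events:
        counts[(window_start(ts, window_seconds), path)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical events: seconds since some epoch, plus the requested path.
    events = [(0, "/browse"), (30, "/browse"), (61, "/browse"), (65, "/locate")]
    for (start, path), n in sorted(count_per_window(events).items()):
        print(f"window starting at {start}s, {path}: {n}")
```

Because each event updates only the counter for its own window, results for a window can be emitted as soon as that window is considered complete, which is what makes per-window aggregation a good fit for streaming.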
Analyzing logs at scale has become an increasingly important task for organizations. The proliferation of cloud services, mobile devices and the growing interconnectedness of enterprise networks have caused a data deluge at many companies.
Many vendors, including Splunk, Amazon, Loggr, Loggly and PaperTrail, have emerged in recent years offering a range of log management services designed to help companies deal with the problem.