Here is the latest article in an eWEEK feature series called IT Science, in which we look at what actually happens at the intersection of new-gen IT and legacy systems. This one is about how to achieve a high level of observability.
Unless they're brand new and right off the assembly line, the servers, storage and networking inside every IT system can be considered “legacy.” This is because hardware and software products are being iterated faster all the time. It’s not unusual for an app-maker, for example, to update an application or patch it for security a few times a month, or even a few times a week. Some apps are updated daily! Hardware moves a little slower, but manufacturing cycles are also speeding up.
These articles describe new-gen industry solutions. The idea is to look at real-world examples of how next-gen IT products and services are making a difference in production each day. Most of them are success stories, but there will also be others about projects that blew up. We’ll have IT integrators, system consultants, analysts and other experts helping us with these as needed.
Today’s Topic: Committing to data-driven engineering and reaching a high level of observability
Name the problem to be solved: Armis is a leading agentless device security platform, and its systems generate massive amounts of data. With no unified logging solution, whenever the engineering teams had to troubleshoot an issue, they needed direct server access to each system.
Not only did this create permissions issues, it also failed to give the teams a comprehensive understanding of the data across multiple systems. Additionally, the sheer volume of data and the natural unpredictability of data flows often resulted in exceeded quotas and billing overages.
What the team really needed was a self-serve solution that would give developers access to all of the relevant system logs for real-time monitoring and alerting at scale, with integrations into their workflow and management tooling.
Describe the strategy that went into finding the solution:
When Roi joined Armis as the new head of DevOps, he immediately saw the need for a solution that would optimize workflows and scale monitoring coverage. He had been a happy Coralogix customer at his previous company, and he worked to implement the platform at Armis as well.
List the key components in the solution:
The solution enabled data-driven engineering with features such as data prioritization, filtering and normalization. Prioritization means that only critical logs are sent to hot storage, while the rest are monitored in real time using the Coralogix Streama service and then routed to an Amazon S3 bucket. Normalization is also an important component, because the data sources span development and testing as well as production. It standardizes log templates so that fields are unified across logs written by different developers in different systems.
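To make the prioritization and normalization ideas more concrete, here is a minimal sketch of the general technique. It is not Coralogix's implementation or API: the alias table, the set of "critical" severities and the in-memory lists standing in for hot storage and an S3 archive are all hypothetical choices made for illustration. Every record is normalized and counted in real time, and only critical records go to the (expensive) hot tier.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Any

# Hypothetical alias table: different developers used different field names
# for the same concept; normalization maps them onto one unified template.
FIELD_ALIASES = {
    "msg": "message",
    "text": "message",
    "lvl": "severity",
    "level": "severity",
    "svc": "service",
    "app": "service",
}

# Hypothetical policy: only these severities count as critical and are kept
# in hot storage; everything else is archived.
HOT_SEVERITIES = {"ERROR", "CRITICAL"}


def normalize(record: dict[str, Any]) -> dict[str, Any]:
    """Rename aliased fields and upper-case severity so all logs share one schema."""
    out = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    if "severity" in out:
        out["severity"] = str(out["severity"]).upper()
    return out


@dataclass
class Router:
    hot: list = field(default_factory=list)       # stand-in for indexed hot storage
    archive: list = field(default_factory=list)   # stand-in for an S3 archive bucket
    seen: Counter = field(default_factory=Counter)  # real-time counts for every record

    def route(self, record: dict[str, Any]) -> dict[str, Any]:
        rec = normalize(record)
        # Every record is observed in real time, regardless of where it is stored.
        self.seen[rec.get("severity", "UNKNOWN")] += 1
        if rec.get("severity") in HOT_SEVERITIES:
            self.hot.append(rec)
        else:
            self.archive.append(rec)
        return rec


if __name__ == "__main__":
    router = Router()
    router.route({"lvl": "error", "msg": "db timeout", "svc": "api"})
    router.route({"level": "info", "text": "request ok", "app": "api"})
    print(router.hot, router.archive, dict(router.seen))
```

The design point is that the cost saving comes from where data is stored, not from whether it is seen: low-priority logs still feed real-time counts (and could feed alerts) without being indexed in the hot tier.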
The team at Armis also saw immediate value in Coralogix’s Live Tail feature, which gives a centralized view of all system logs in real time, as well as the Coralogix CLI, which allows the developers to access logs in the dev stage without using the browser.
Dynamic alerting and error ratio alerts are the cherry on top, along with version tagging and additional integrations with CI/CD tools, which help accelerate version delivery and time to market while improving stability and quality.
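The article doesn't say how Coralogix evaluates error ratio alerts internally. As a rough illustration of the general idea, the sketch below fires when the share of error logs within a sliding time window exceeds a threshold. The class name, window length, threshold and minimum-volume guard are all hypothetical parameters, not product settings.

```python
from collections import deque


class ErrorRatioAlert:
    """Hypothetical sketch: alert when errors / total in a sliding window exceeds a threshold."""

    def __init__(self, window_seconds: float = 60.0, threshold: float = 0.05, min_events: int = 20):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_events = min_events  # avoid alerting on very small samples
        self.events = deque()         # (timestamp, is_error) pairs, oldest first
        self.errors = 0               # number of error events currently in the window

    def observe(self, timestamp: float, is_error: bool) -> bool:
        """Record one log event; return True if the alert condition currently holds."""
        self.events.append((timestamp, is_error))
        self.errors += int(is_error)
        # Evict events that have fallen out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            _, old_is_error = self.events.popleft()
            self.errors -= int(old_is_error)
        total = len(self.events)
        return total >= self.min_events and self.errors / total > self.threshold


if __name__ == "__main__":
    alert = ErrorRatioAlert(window_seconds=10, threshold=0.2, min_events=5)
    for t in range(5):
        alert.observe(t, t == 0)
    print(alert.observe(5, True))
```

A ratio, rather than a raw error count, keeps the alert meaningful as traffic grows or shrinks, and the minimum-volume guard stops a single error during a quiet period from paging anyone.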
Describe how the deployment went, how long it took, and if it came off as planned:
The initial integration took just a few hours, and then the Coralogix Support Team helped complete the setup for data input, parsing, dashboards and anything else needed.
Unlike traditional tools, the platform is very intuitive and requires very little training to use, which has led to widespread adoption across the entire engineering department.
Describe the result, new efficiencies gained, and what was learned from the project:
As a result, the team has achieved a high level of observability and is able to provide customers with a smooth and highly performant experience.
Some of the most impressive improvements behind the gain in observability were standardizing all log data, including legacy logs; cutting data usage in half, which made monitoring more economical and allowed broader coverage without cost constraints; and improving the CI/CD pipeline, from deploy time and automated tests to quick impact analysis of new versions.
Today, the team has implemented a data-oriented development environment across the organization.
Describe ROI, carbon footprint savings, and staff time savings:
At the same time that the team is getting more impactful insights from the data, it has also saved almost $200,000 a year on storage costs, thanks to the data prioritization done by the Coralogix platform. The team is using Coralogix to achieve broad coverage of areas that weren’t monitored at all in the past and to successfully serve some of the largest companies in the world under top-tier SLAs.
If you have a suggestion for an eWEEK IT Science article, email cpreimesberger@eweek.com.