Here is the latest article in a new eWEEK feature series called IT Science, in which we look at what actually happens at the intersection of new-gen IT and legacy systems.
Unless it’s brand new and right off the assembly line, the servers, storage and networking inside every IT system can be considered “legacy.” This is because the iteration of both hardware and software products is speeding up all the time. It’s not unusual for an app-maker, for example, to update or patch an application for security purposes a few times a month, or even a few times a week. Some apps are updated daily! Hardware moves a little slower, but manufacturing cycles are also speeding up.
These articles describe new-gen industry solutions. The idea is to look at real-world examples of how new-gen IT products and services are making a difference in production each day. Most of them are success stories, but there will also be others about projects that blew up. We’ll have IT integrators, system consultants, analysts and other experts helping us with these as needed.
Today’s Topic: Building a High-Frequency Digital Banking Platform
Name the problem to be solved: Q2 eBanking provides a digital banking platform for banks and credit unions, managing more than 250 million customer interactions per month.
Virtually every application that made up the Q2 platform was critical to a customer’s and Q2’s own success. But troubleshooting the highly distributed environment was challenging without a unified application performance management (APM) solution. Q2 had been using a patchwork of tools to monitor an IIS and C# backend paired with a Node.js and Docker front end, with a heavy reliance on logs.
However, neither the logs nor the tools were of much use if the problem turned out to be on the customer’s end or with one of the many third-party vendors upon which banks and credit unions rely. Q2 is also on its way to fully embracing a multi-cloud strategy. Its current environment resides in data centers; some acquired technologies run in private and public clouds.
The need for end-to-end visibility prompted Q2 to look into an APM solution.
Describe the strategy that went into finding the solution: Q2 initially tried monitoring applications by creating health rules based on a certain percent of errors, but that approach ended up being too noisy. To reduce false alerts, Q2 switched to dynamic baselines—these are automatically created by the AppDynamics controller using machine-learning algorithms to account for changes to baseline performance over time.
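To illustrate why a fixed error-percentage rule can be noisy while a baseline-relative rule is quieter, here is a minimal Python sketch. It is not AppDynamics’ algorithm, which the article does not describe; it assumes a simple stand-in in which the baseline is the rolling mean of recent samples and an alert fires only when a sample exceeds that mean by several standard deviations. The function names, the 5 percent threshold, the window size and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def static_alerts(error_rates, threshold=5.0):
    """Fixed health rule: flag every sample above a fixed error percentage."""
    return [i for i, rate in enumerate(error_rates) if rate > threshold]


def baseline_alerts(error_rates, window=6, num_deviations=3.0):
    """Baseline-relative rule (illustrative stand-in, not the vendor's method).

    A sample is flagged only when it exceeds the mean of the previous
    `window` samples by more than `num_deviations` standard deviations.
    """
    alerts = []
    for i in range(window, len(error_rates)):
        history = error_rates[i - window:i]
        limit = mean(history) + num_deviations * stdev(history)
        if error_rates[i] > limit:
            alerts.append(i)
    return alerts


# Hypothetical hourly error percentages: normally between 5 and 6 percent,
# with one genuine spike at index 9.
rates = [5.2, 5.8, 5.1, 6.0, 5.5, 5.9, 5.4, 6.1, 5.3, 14.0, 5.6, 5.7]

print(static_alerts(rates))    # every sample is above 5.0, so all 12 are flagged
print(baseline_alerts(rates))  # only the spike at index 9 is flagged
```

In this made-up data set, the fixed rule fires on every sample because normal traffic already sits above the threshold, while the baseline-relative rule fires once, on the spike. A production system would likely also account for time-of-day and day-of-week patterns, which this sketch ignores.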
To test possible solutions, Q2 ran a proof of concept in which engineers used AppDynamics to accelerate root-cause analysis for problems facing one of Q2’s biggest customers. The product sold itself once the engineers were able to troubleshoot SQL calls that were more expensive than they should have been and were conflicting or locking, and then improve the underlying stored procedures.
List the key components in the solution:
- AppDynamics End User Monitoring (EUM)
- Application performance baselines
- Application diagnostics
- Server visibility
Describe how the deployment went, perhaps how long it took, and if it came off as planned: During the proof of concept, engineers were able to accelerate root-cause analysis for problems that were facing one of Q2’s biggest customers. Q2 was able to troubleshoot SQL calls that were more expensive than they should have been and were conflicting or locking. As the use of Q2’s platform climbed past 250 million customer interactions per month, the company’s dependence on AppDynamics increased as well; Q2 now instruments 380 applications with about 1,500 agents.
Describe the result, new efficiencies gained, and what was learned from the project: One of the major wins from adopting AppDynamics was a newfound ability to view the performance of Q2’s platform from the point of view of Q2’s customers. Before adding AppDynamics End User Monitoring (EUM), user-reported issues were often the most difficult ones to diagnose. Operations engineers would try to recreate a problem if they could, but many times they remained in the dark. With EUM, they were able to easily get a user’s perspective.
APM reduced troubleshooting times by hours and, in some cases, days. Rather than assembling personnel from the networking, storage, VMware and app teams in an IT war room, where they would typically end up pointing fingers at each other, Q2 could rely on AppDynamics’ cohesive platform to better align business and IT.
Like many organizations, Q2 is transitioning to DevOps and has already begun to increase release velocity. As teams become more agile, AppDynamics is giving them confidence that new releases will not introduce regressions. Q2 has seen daily error rates drop using AppDynamics and experienced an increase in overall code quality.
Describe ROI, carbon footprint savings, and staff time savings, if any: Q2 has now instrumented 380 applications with about 1,500 agents. It doesn’t take Q2 more than 10 minutes to instrument an application these days.
Q2 reduced troubleshooting times by hours, and in some cases days, by pinpointing the location of a problem. “It really helps to have a giant, blinking red dot to tell people where to start looking when you have applications spread over 100 different servers and a variety of different applications,” said Jacob Ramsey, an AppDynamics administrator at Q2.
By incorporating dynamic baselines, Q2 was able to drive down the number of false alerts by more than 83 percent.
If you have a suggestion for an eWEEK IT Science article, email: cpreimesberger@eweek.com.