    Best Practices for Fixing Software Problems

    These best practices enable software engineers and team managers to focus on what matters most: learning from problems and making things better for the future.

    By eWEEK EDITORS, February 16, 2022

      There’s no such thing as a perfect software product. No matter how stable your application is, there are bound to be occasions when things go wrong in production. To make the most of each incident and learn from it, it’s crucial that engineering teams commit to running a post-mortem investigation every time.

      This is especially important as companies grow and teams increasingly transition to a remote working environment. Even something that seems small can be analyzed and learned from in order to prevent future, and potentially more serious, vulnerabilities.

      Technology providers cannot afford to overlook best practices for conducting a post-mortem investigation of a software incident.

      Fixing Software Problems: Key Steps 

      While there’s no one-size-fits-all solution for every team, several fundamental steps make the post-mortem an effective process and help ensure that incidents remain rare.

      • Collect data during the incident. It’s important to collect as much data as you can in a single location, as the incident goes on. This includes server graphs, snippets from logs, and screenshots showing what was going on at each point in the incident. It doesn’t all end up being useful, but it’s good to have everything collected when you start going through the investigation in detail.
      • Start the investigation right away. Get one of the developers/managers involved to take on the role of lead investigator, which means they’re in charge of making sure the investigation gets done, the post-mortem document gets filled in, and the debrief gets held. Starting it right away makes sure nothing gets lost.
      • Review the results within a week. While things are still fresh, hold a debrief to review the post-mortem document as a group, discuss the action items, and make any edits needed. This can be a 30-60 minute video session with the team involved in the incident, as well as representatives from other departments (primarily the customer support team, but any impacted department should attend).
      • Share the results. As soon as the debrief is done, everyone should get a chance to learn from it. Post it where the whole company has access to it for transparency – incidents shouldn’t be hidden away.
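As a minimal sketch of the "collect data in a single location" step, the Python below keeps every graph, log snippet, and screenshot reference in one timestamped log that can later be sorted into a timeline. The class names, fields, and the idea of storing a link per artifact are illustrative assumptions, not something the article prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Evidence:
    """One piece of data captured during the incident."""
    captured_at: datetime
    kind: str       # e.g. "graph", "log", "screenshot"
    note: str       # what this item shows
    location: str   # hypothetical path or link to the stored artifact


@dataclass
class IncidentLog:
    """A single place for everything collected while the incident is ongoing."""
    title: str
    items: list[Evidence] = field(default_factory=list)

    def add(self, kind: str, note: str, location: str,
            captured_at: Optional[datetime] = None) -> None:
        # Default to the current time in UTC so entries from people in
        # different time zones sort consistently.
        when = captured_at or datetime.now(timezone.utc)
        self.items.append(Evidence(when, kind, note, location))

    def timeline(self) -> list[str]:
        # Oldest first, ready to paste into the post-mortem document.
        ordered = sorted(self.items, key=lambda e: e.captured_at)
        return [
            f"{e.captured_at:%H:%M} UTC [{e.kind}] {e.note} ({e.location})"
            for e in ordered
        ]
```

Items can be added in any order while the incident is still running; calling `timeline()` afterwards gives the lead investigator a chronological starting point.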

      Additional Measures for Efficient Software Fixes 

      These best practices will set up teams for success, but as the future of work evolves, there are new challenges in following them.

      For instance, companies are now facing employees working in all sorts of time zones, and a mix of remote and hybrid teams makes scheduling and coordination much more complicated. There are several additional measures that can help ensure that a post-mortem investigation remains effective, regardless of the environment:

      • Assume async. Scheduling the debrief sooner makes it harder to find a spot in everyone’s calendars. Rather than pushing the meeting further and further out, do more of the work asynchronously. Make sure the document can stand on its own, and use the quickest communication channels to ask people for their contributions. Also consider recording the debrief (easy with Zoom) so that anyone who couldn’t attend can watch it later.
      • Complete the investigation quickly. It’s important to shorten the timeline expectations on the investigation. Collecting the data early avoids having multiple ongoing investigations, and allows everyone involved to get back to their sprint work sooner.
      • Simplify the incident document template. Consider simplifying the template so that there are fewer sections to worry about, and make each section as easy as possible to fill in. To still be complete, the document should include sections for:
          • Impact and Scope
          • Trigger (what started the incident)
          • Resolution (what ended up fixing it)
          • Timeline of events
          • Root Cause
          • What went well
          • What didn’t go well
          • Action items
          • Data & Analysis (all the charts)
      • Ask for input from customer-facing teams right away. A customer success team always has great input and is able to help fill in gaps in the timeline. Reach out to them early so there’s time for their input to be added into the post-mortem document before the debrief. Waiting for the debrief is too late!
      • Track action items in backlogs. Why track action item progress in an incident document when there is already a standard tool for tracking work? As soon as you can, move all action items from the post-mortem into team backlogs so they can be assigned and don’t get lost. It’s also beneficial to set up automated reports that list outstanding post-mortem actions, driven by a post-mortem label on the items.
      • Have a section for “things we should do if we have time.” Realistically, not all action items are actually actionable—some are more aspirational or something everyone should keep in mind. In order to keep the action items clearer, include this section as a spot to put the things you think are important but you couldn’t turn into assignable/trackable work. It’s better to have a smaller set of action items that you actually do than a giant list of things you would like to do given infinite time.
      • Keep it Blameless. This one isn’t actually new, but it’s well worth repeating! Be interested in what happened and what you’re going to do to fix it going forward, not in pointing fingers.
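Two of the measures above lend themselves to a small sketch: generating the simplified template, and reporting outstanding action items by label. The Python below is one possible shape under stated assumptions. `BacklogItem` is a hypothetical stand-in for whatever your work tracker exports, the `post-mortem` label name is an assumption, and the position of the "if we have time" section within the template is a guess.

```python
from __future__ import annotations

from dataclasses import dataclass

# Section names follow the template bullets above. The "if we have time"
# section comes from a later bullet; placing it after "Action items" is
# an assumption.
SECTIONS = [
    "Impact and Scope",
    "Trigger",
    "Resolution",
    "Timeline of events",
    "Root Cause",
    "What went well",
    "What didn't go well",
    "Action items",
    "Things we should do if we have time",
    "Data & Analysis",
]


def render_template(incident_title: str) -> str:
    """Return an empty plain-text post-mortem document with every section."""
    lines = [f"Post-mortem: {incident_title}", ""]
    for name in SECTIONS:
        lines += [name, "-" * len(name), "(to be filled in)", ""]
    return "\n".join(lines)


@dataclass
class BacklogItem:
    """Hypothetical backlog entry; real trackers will have more fields."""
    title: str
    labels: frozenset[str]
    done: bool = False


def outstanding_postmortem_actions(
    items: list[BacklogItem], label: str = "post-mortem"
) -> list[BacklogItem]:
    """Return items that carry the label and are not finished yet."""
    return [item for item in items if label in item.labels and not item.done]
```

Keeping the report driven purely by a label means the incident document never needs to track progress itself; it only has to link to the labeled items.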

      Remote work and fast-paced development don’t have to make incidents complicated. By following these best practices, software engineers and team managers can make the most of an incident post-mortem and focus on what matters most: learning from it and making things better for the future.

      About the Author: 

      Jesse van Herk, Senior Manager of Product Engineering, Jobber
