
    Monitor Everything

    By Darryl K. Taft, March 4, 2013

      1. Monitor Everything

      You can probably name your core gear off the top of your head; maybe not all of your lower-profile equipment, but certainly the critical devices. Minor problems can still escalate, though: in GoDaddy's case, a performance cascade turned a minor problem into a major outage. Use your network monitor's discovery engine to get broad coverage with little manual configuration, and schedule discovery runs so that new devices are detected automatically and assessed for how critical they are.
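
      A scheduled discovery job boils down to comparing what the latest scan found against what you already track, then deciding how much each newcomer matters. The sketch below assumes the scan results are already available as a list of hypothetical `Device` records (an address plus a role); the role-to-criticality table is an illustrative assumption, not a feature of any particular monitoring product.

```python
from dataclasses import dataclass


# Hypothetical record for one device returned by a discovery scan.
@dataclass(frozen=True)
class Device:
    address: str
    role: str  # e.g. "core-router", "switch", "printer"


# Assumed criticality tiers by role; roles not listed default to "low".
CRITICALITY = {"core-router": "high", "firewall": "high", "switch": "medium"}


def new_devices(known: set[str], discovered: list[Device]) -> list[Device]:
    """Return devices from the latest scan that are not yet in the inventory."""
    return [d for d in discovered if d.address not in known]


def triage(devices: list[Device]) -> dict[str, list[str]]:
    """Group device addresses by criticality tier."""
    tiers: dict[str, list[str]] = {}
    for d in devices:
        tier = CRITICALITY.get(d.role, "low")
        tiers.setdefault(tier, []).append(d.address)
    return tiers


if __name__ == "__main__":
    inventory = {"10.0.0.1", "10.0.0.2"}
    scan = [
        Device("10.0.0.1", "core-router"),
        Device("10.0.0.7", "firewall"),
        Device("10.0.0.9", "printer"),
    ]
    print(triage(new_devices(inventory, scan)))
    # {'high': ['10.0.0.7'], 'low': ['10.0.0.9']}
```

      Keeping the criticality rules in a plain table makes them easy to review and extend as the scan turns up device types you had not planned for.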

      2. Detailed and Effective Alerting With Escalation

      Out-of-the-box alerting is generally configured to send an email for every alert. It's good to get a heads-up that a printer is down, but with a little investment you can create specific alerts for critical infrastructure elements and make sure they rise above the noise. Targeted alerting also lets you configure aggressive escalation notifications so that those alerts are addressed quickly, before critical services are affected.
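
      An escalation policy can be modeled as a list of (delay, recipient) steps per severity: the longer an alert goes unacknowledged, the more people hear about it. This is a minimal sketch under assumed names; the severities, delays and recipients in `ESCALATION` are hypothetical placeholders, and real monitoring tools express the same idea in their own configuration formats.

```python
from dataclasses import dataclass

# Hypothetical escalation policies: (minutes unacknowledged, who gets notified).
# Critical-infrastructure alerts escalate aggressively; everything else only emails.
ESCALATION = {
    "critical": [(0, "on-call engineer"), (5, "network lead"), (15, "IT manager")],
    "info": [(0, "team mailbox")],
}


@dataclass
class Alert:
    source: str
    severity: str         # "critical" or "info" in this sketch
    minutes_unacked: int  # minutes since the alert fired without acknowledgment


def recipients(alert: Alert) -> list[str]:
    """Return everyone who should have been notified by now for this alert."""
    # Unknown severities fall back to the low-noise "info" policy.
    policy = ESCALATION.get(alert.severity, ESCALATION["info"])
    return [who for after, who in policy if alert.minutes_unacked >= after]


if __name__ == "__main__":
    print(recipients(Alert("core-rtr-1", "critical", 6)))
    # ['on-call engineer', 'network lead']
    print(recipients(Alert("printer-3", "info", 120)))
    # ['team mailbox']
```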

      3. Check the Charts Every Morning

      Even with the most sophisticated alerting and reporting approach, the mind of an experienced network engineer is still the best network management tool ever invented, especially when the data is consolidated in one view. Regularly reviewing the historical performance charts of device memory, CPU and interface utilization lets the network support team learn the normal bounds of even complex operating models. Performance charting also helps administrators set alerting thresholds tuned to ensure problems are resolved proactively, before users are affected.
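
      One common heuristic for turning historical charts into an alert threshold is "mean plus k standard deviations" of past readings. The sketch below uses that heuristic as an assumption, not as the method any particular tool uses; `k` and the 100 percent ceiling are illustrative parameters, and percentile-based thresholds are an equally reasonable alternative.

```python
import statistics


def baseline_threshold(samples: list[float], k: float = 3.0, ceiling: float = 100.0) -> float:
    """Threshold = mean + k * sample standard deviation, capped at `ceiling`.

    `samples` are historical readings (e.g. CPU %); a larger `k` means fewer alerts.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples)
    return min(mean + k * spread, ceiling)


def breaches(current: float, samples: list[float], k: float = 3.0) -> bool:
    """True when the current reading exceeds the threshold learned from history."""
    return current > baseline_threshold(samples, k)


if __name__ == "__main__":
    cpu_history = [40, 42, 38, 41, 39]  # hypothetical morning CPU readings, in %
    print(round(baseline_threshold(cpu_history), 2))  # 44.74
    print(breaches(50, cpu_history))  # True
```

      The human review the slide describes still matters: a threshold learned from a week that already contained an incident will be too lax, and only someone who knows the network will notice.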

      4. Create Targeted Map, NOC Views

      The detailed data collected by monitoring your critical network devices has countless uses, but there's no substitute for a bright-red alert on a big screen. Create maps that show specific components of your critical network devices along with their overall status and related top-level metrics. For example, mount a 60-inch LED display on the wall showing a geographic map of your core network devices with their up/down status, plus the primary network links between them and each link's utilization. Your network operations center (NOC) team will always appreciate being the first to identify developing issues before users are affected.
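
      Behind any such map is a small data model: devices with an up/down state and links with a utilization figure, each reduced to a color. The sketch below is a text stand-in for a graphical map; the device names and the 70/90 percent color thresholds are hypothetical, and a link touching a down device is shown red as an assumed convention.

```python
from dataclasses import dataclass


@dataclass
class Link:
    a: str
    b: str
    utilization: float  # percent of link capacity, 0-100


# Assumed color thresholds for the wall display; tune them to your own baselines.
WARN, CRIT = 70.0, 90.0


def link_color(util: float) -> str:
    """Map a link's utilization to a status color for the NOC map."""
    if util >= CRIT:
        return "red"
    if util >= WARN:
        return "yellow"
    return "green"


def map_view(devices: dict[str, bool], links: list[Link]) -> list[str]:
    """Build one text row per element: devices with up/down, links with utilization."""
    rows = [f"{name}: {'UP' if up else 'DOWN'}" for name, up in sorted(devices.items())]
    for link in links:
        # A link touching a down (or unknown) device is red regardless of utilization.
        down = not (devices.get(link.a, False) and devices.get(link.b, False))
        color = "red" if down else link_color(link.utilization)
        rows.append(f"{link.a} <-> {link.b}: {link.utilization:.0f}% [{color}]")
    return rows


if __name__ == "__main__":
    core = {"nyc-core": True, "chi-core": True, "dal-core": False}
    wan = [Link("nyc-core", "chi-core", 75.0), Link("chi-core", "dal-core", 10.0)]
    for row in map_view(core, wan):
        print(row)
```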

      5. Publish Rich Reports

      Reports help because some managers think of the network like the phone system: it's in the wall, you can't see it, and it even uses the same sort of connector. They don't think about capacity planning until it becomes a problem. By publishing utilization reports regularly, you draw attention to the users who are consuming the capacity of your mission-critical (and often expensive and complex to upgrade) network hardware. That visibility makes expansion a regular topic of conversation outside your group and makes purchasing requests more likely to be approved.
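
      A utilization report can be as simple as totaling traffic per user over a reporting window and expressing the top consumers as a share of what the link could carry. This sketch assumes you already have (user, bytes) flow records for the window, for example exported from flow data or interface counters; the function name and record format are hypothetical.

```python
from collections import Counter


def utilization_report(flows: list[tuple[str, int]], link_capacity_bytes: int, top: int = 3) -> list[str]:
    """Summarize which users consume the most of a link's capacity.

    `flows` holds (user, bytes transferred) records for the reporting window;
    `link_capacity_bytes` is what the link could carry over that same window.
    """
    totals: Counter = Counter()
    for user, nbytes in flows:
        totals[user] += nbytes
    lines = []
    for user, nbytes in totals.most_common(top):
        share = 100 * nbytes / link_capacity_bytes
        lines.append(f"{user}: {nbytes} bytes ({share:.1f}% of capacity)")
    overall = 100 * sum(totals.values()) / link_capacity_bytes
    lines.append(f"total: {overall:.1f}% of capacity")
    return lines


if __name__ == "__main__":
    week = [("backup-server", 400), ("alice", 100), ("backup-server", 200), ("bob", 50)]
    print("\n".join(utilization_report(week, link_capacity_bytes=1000, top=2)))
```

      Publishing the same report on a fixed schedule matters more than its format; the trend line across weeks is what makes the capacity conversation concrete.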

      6. Limit Outages Caused by Human Error

      Some of the worst outages of our careers have been caused by human error, and such errors are especially common with networking issues. Enter arcane command-line interface (CLI) commands hundreds of times at all hours of the day, and sooner or later you'll cause an accidental disaster. Having multiple engineers log in to local network gear complicates the process, and misconfiguration issues can be difficult to troubleshoot. Make nightly backups of your device configurations, ideally with a system that facilitates change detection.
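
      Change detection on nightly backups can be done by diffing each night's saved configuration against the previous one. The sketch below uses Python's standard `difflib.unified_diff`; how the configurations are pulled from the devices is left out, and the device name and sample configuration lines are hypothetical.

```python
import difflib


def config_changes(previous: str, current: str, device: str) -> list[str]:
    """Return unified-diff lines between last night's backup and tonight's.

    An empty list means the configuration did not change.
    """
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile=f"{device}/last-night",
            tofile=f"{device}/tonight",
            lineterm="",  # splitlines() already dropped the newlines
        )
    )


if __name__ == "__main__":
    last_night = "hostname core1\ninterface Gi0/1\n shutdown\n"
    tonight = "hostname core1\ninterface Gi0/1\n no shutdown\n"
    for line in config_changes(last_night, tonight, "core1"):
        print(line)
```

      A non-empty diff for a device nobody scheduled work on is exactly the kind of unexpected change worth alerting on.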

      7. Create an Internal Communications Plan

      You don't need process flow charts for every possible permutation of issues, but a concise spreadsheet of reasonably likely ones can help the team get you back online quickly. Identify risk areas, team member responsibilities and initial troubleshooting steps, and include your team's emergency contact information. It's far better to debug an outage quickly over the VPN at 2 a.m. than to explain it to your manager when you arrive at the office at 8 a.m.

      8. Create an External Communications Plan

      Work with your marketing communications or public relations team to create a communications plan in case a network outage affects customers and partners, as happened with AWS, Azure and GoDaddy. The plan should include guidelines for when your department is expected to inform them of an outage, as well as the kind of information you'll need to provide. Your team might be expected to provide a brief summary of the issue, the time the outage occurred and an estimate of when it will be resolved. Be sure to discuss expectations for how much detail can and should be given.

      9. Prevent Issues From Growing Bigger

      Some outages happen outside your control, even after you implement the previous tips. To keep issues from growing bigger than they need to be, you need an alert management system that notifies the right team at the right time. For example, if a file is deleted by mistake, production IT would be notified that an unexpected change was made. A proper alert management system also quickly engages the triage/production IT team responsible for the affected system or zone as soon as performance degrades beyond acceptable limits, not after customers start complaining.
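
      The routing logic described above can be sketched as a lookup from zone to owning team, triggered either by an unplanned change or by a metric crossing its limit. Everything here is an assumption for illustration: the zone names, the `TEAM_BY_ZONE` table, the fallback to a "production-it" team and the two event kinds are hypothetical, not taken from any specific alert management product.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ownership map: which triage/production IT team owns each zone.
TEAM_BY_ZONE = {"web": "web-ops", "storage": "storage-team", "core-network": "netops"}
DEFAULT_TEAM = "production-it"


@dataclass
class Event:
    zone: str
    kind: str              # "metric" or "change" in this sketch
    value: float = 0.0     # e.g. response time in ms, for metric events
    limit: float = 0.0     # acceptable upper bound for that metric
    expected: bool = True  # for change events: was the change planned?


def route(event: Event) -> Optional[str]:
    """Return the team to notify now, or None if the event needs no alert."""
    team = TEAM_BY_ZONE.get(event.zone, DEFAULT_TEAM)
    if event.kind == "change" and not event.expected:
        return team  # e.g. a file deleted outside any change window
    if event.kind == "metric" and event.value > event.limit:
        return team  # degradation past the limit, before customers complain
    return None


if __name__ == "__main__":
    print(route(Event("web", "metric", value=850, limit=500)))  # web-ops
    print(route(Event("file-share", "change", expected=False)))  # production-it
```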

      Property of TechnologyAdvice.
      © 2024 TechnologyAdvice. All Rights Reserved
