
    Blackouts Begin at Home

    Written by

    Peter Coffee
    Published December 2, 2003

      No, I did not get an advance copy of the Interim Report of the U.S.-Canada Power System Outage Task Force, released last week. Any resemblance between its conclusions and my column of two weeks ago is pure coincidence, and I'm more worried than gratified by this prompt confirmation of the problem that I described.

      The task force report found three groups of causes for the August 14th blackout. Assuming that the pruning of tree limbs near wires doesn't need to be on your Web services agenda, the first and third groups still have high relevance for application developers as they move into broadly distributed systems with high-availability requirements. The report labels Cause 1 as “Inadequate Situational Awareness,” and Cause 3 as “Failure of the interconnected grid's reliability organizations to provide effective diagnostic support.” The crucial event under the Cause 1 heading was the failure of an alarm system: “…[A]larm and logging software failed sometime shortly after 14:14 EDT… operators were working under a significant handicap without these tools. However, they were in further jeopardy because they did not know that they were operating without alarms…”

      As I said above, it's pure coincidence that two weeks earlier, I had written that “the scene is set for…catastrophe when something appears to be working, and isn't.”
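
One common way to notice that a monitoring component has gone quiet is a heartbeat watchdog: the component checks in each time it does real work, and an independent observer raises a flag when the check-ins stop. The sketch below is illustrative only. The class name, the injectable clock, and any threshold passed to it are assumptions made for the example, not details from the task force report.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Reports a monitored component as silent once its heartbeats stop.

    Hypothetical helper for illustration; not taken from the report.
    """

    def __init__(self, max_silence_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_silence_s = max_silence_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_beat = clock()

    def beat(self) -> None:
        """Called by the monitored component each time it does real work."""
        self._last_beat = self._clock()

    def is_silent(self) -> bool:
        """True when no heartbeat has arrived within the allowed window."""
        return self._clock() - self._last_beat > self.max_silence_s
```

A watchdog like this is only useful if it runs outside the component it watches and if its result is shown to the people who depend on that component, so that silence is visible instead of being mistaken for calm.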

      But wait, there's more. “At 14:41 EDT, the primary server hosting the [Energy Management System] alarm processing application failed, due either to the stalling of the alarm application, queuing to the remote terminals, or some combination of the two. Following preprogrammed instructions, the alarm system application and all other EMS software running on the first server automatically transferred (failed-over) onto the backup server. However, because the alarm application moved intact onto the backup while still stalled and ineffective, the backup server failed 13 minutes later, at 14:54 EDT. Accordingly, all of the EMS applications on these two servers stopped running.”

      In short words: a fail-over mechanism only made provision for a failure of the application platform; it did not correctly deal with failure at the level of the application itself, but instead merely duplicated that failure on the backup server.
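
The distinction drawn above can be made explicit in a fail-over plan. The following is a minimal sketch under stated assumptions: the `FailureKind` categories and the step names are hypothetical labels chosen for this example and do not describe the actual EMS software.

```python
from enum import Enum
from typing import List


class FailureKind(Enum):
    PLATFORM = "platform"        # hardware, OS, or network fault on the host
    APPLICATION = "application"  # the program itself is stalled or wrong


def plan_failover(kind: FailureKind) -> List[str]:
    """Return the ordered recovery steps for the backup server."""
    if kind is FailureKind.PLATFORM:
        # The application was healthy, so its live state is worth carrying over.
        return ["transfer_live_state", "start_on_backup", "verify_output"]
    # The application itself was the problem: moving it intact would
    # move the fault too, so start from a known-good state instead.
    return ["discard_live_state", "start_clean_on_backup",
            "verify_output", "notify_operators"]
```

The point of the design is that the recovery path branches on what failed. A single "copy everything to the backup" path treats every outage as a platform outage.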

      Even with both servers down, the system did not suffer hard failure; it did, however, slow down to the point that screen updates took almost a minute instead of the usual 1 to 3 seconds. Ordinary movements from one top-level screen to a lower-level detail screen, and back again, took minutes to perform. The report does not state this conclusion, but I will: interactive speed is part of correct application performance, and anything that slows application response needs to be treated as a form of failure—not just an annoyance.
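
Treating slowness as failure can be expressed directly in a health check. In this sketch, the 3-second target comes from the "usual 1 to 3 seconds" cited above; the 15-second failure threshold and both function names are assumptions chosen for illustration.

```python
import time
from typing import Callable, Tuple


def classify_latency(elapsed_s: float, target_s: float = 3.0,
                     failure_s: float = 15.0) -> str:
    """Map a response time to a status instead of a bare pass/fail."""
    if elapsed_s <= target_s:
        return "ok"
    if elapsed_s < failure_s:
        return "degraded"
    # Slow enough that operators cannot act on what they see.
    return "failed"


def timed_status(action: Callable[[], object]) -> Tuple[str, float]:
    """Run an action, measure it, and report status plus elapsed seconds."""
    start = time.perf_counter()
    action()
    elapsed = time.perf_counter() - start
    return classify_latency(elapsed), elapsed
```

Under these assumed thresholds, a screen update that takes almost a minute is reported as `"failed"`, the same status a crashed server would earn.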

      The servers' failures triggered pager alerts to IT staff, who rebooted them. “At 15:08 EDT, IT staffers completed a warm reboot (restart) of the primary server. Startup diagnostics monitored during that reboot verified that the computer and all expected processes were running; accordingly, IT staff believed that they had successfully restarted the node and all the processes it was hosting. However, although the server and its applications were again running, the alarm system remained frozen and non-functional, even on the restarted computer. The IT staff did not confirm that the alarm system was again working properly with the control room operators.”

      As with the initial fail-over, the problem here was an ineffective definition of what it means to be “up and running.” The process was running, but the application represented by that process was not doing what it should—and it was not part of the problem resolution procedure to confirm that it was.
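
One way to tighten the definition of "up and running" is an end-to-end self-test: push a marked test alarm in at one end and confirm that it shows up where the operators would see it. The sketch below assumes two hypothetical hooks, `raise_alarm` and `read_console`, that stand in for whatever interfaces a real system exposes; neither is drawn from the report.

```python
import uuid
from typing import Callable, List


def alarm_path_works(raise_alarm: Callable[[str], None],
                     read_console: Callable[[], List[str]]) -> bool:
    """Inject a uniquely marked test alarm and look for it on the console."""
    marker = "SELF-TEST-" + uuid.uuid4().hex
    raise_alarm(marker)
    return any(marker in line for line in read_console())


def restart_confirmed(process_running: bool,
                      raise_alarm: Callable[[str], None],
                      read_console: Callable[[], List[str]]) -> bool:
    """A restart counts only if the process runs AND alarms get through."""
    return process_running and alarm_path_works(raise_alarm, read_console)
```

For example, a stand-in alarm system that accepts input but displays nothing fails the check even though its process is reported as running:

```python
console: List[str] = []
print(restart_confirmed(True, console.append, lambda: console))

frozen_console: List[str] = []
print(restart_confirmed(True, lambda text: None, lambda: frozen_console))
```

This version checks the console immediately; a real alarm path is likely asynchronous, so a production check would need to wait or poll for a bounded time before declaring failure.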

      I've previously mentioned application assurance tools like those from TeaLeaf Technology Inc. At eWEEK Labs, we've also seen significant improvement of late in application security analysis products like Sanctum Inc.'s AppScan. The tools are there.

      What's also needed, though, as the East Coast blackout report clearly shows, is a culture of responsibility for making sure that the system is meeting the enterprise need, and not just “working.”

      Tell me if your own system is a blackout waiting to happen.

      Peter Coffee
      Peter Coffee is Director of Platform Research at salesforce.com, where he serves as a liaison with the developer community to define the opportunity and clarify developers' technical requirements on the company's evolving Apex Platform. Peter previously spent 18 years with eWEEK (formerly PC Week), the national news magazine of enterprise technology practice, where he reviewed software development tools and methods and wrote regular columns on emerging technologies and professional community issues. Before he began writing full-time in 1989, Peter spent eleven years in technical and management positions at Exxon and The Aerospace Corporation, including management of the latter company's first desktop computing planning team and applied research in applications of artificial intelligence techniques. He holds an engineering degree from MIT and an MBA from Pepperdine University, and he has held teaching appointments in computer science, business analytics and information systems management at Pepperdine, UCLA, and Chapman College.
