Eight Factors to Consider for Database Capacity Planning

 
 
By Darryl K. Taft  |  Posted 2015-07-13
 
 
 
 
 
 
 
 
 
  • Previous
    Eight Factors to Consider for Database Capacity Planning
    Next

    Eight Factors to Consider for Database Capacity Planning

    By Darryl K. Taft
  • Previous
    Working Set Size
    Next

    Working Set Size

    This is the set of data a system needs to address during normal operation. A complex system will have many distinct working sets, but one or two usually dominate. In stream-like apps such as email or a news feed, the working set can be much smaller than the total set. People rarely access messages more than a few weeks old; they might as well be considered a different system. It's most useful to think in probability bands: Over a given period of time, what is the probability of various pieces of data being used? For the initial analysis, you can focus on the rough size of the working set, as opposed to the detailed characteristics. However, those details often come back to bite you.
  • Previous
    Average Transaction Size
    Next

    Average Transaction Size

    This can be thought of as the working set of a single transaction performed by the system. How much data does the system have to touch in order to serve a transaction? Downloading a photo and running a Web search involve similar-sized answers sent to the client. However, the amounts of data touched in the background are very different. Note that we're using the word "transaction" to mean a distinct piece of work. This idea equally applies to big analytical jobs.
  • Previous
    Request Rate
    Next

    Request Rate

    How many transactions are expected per hour/minute/second? Is there a peak hour, or is demand steady? In a search engine, you may have five to 10 queries per user over a period of minutes. An online ebook reader might see constant but low volumes of traffic. A game may require multiple transactions per second per user. In short, consider the expected throughput. The combination of throughput and transaction size governs most of the total data flow of the system.
  • Previous
    Update Rate
    Next

    Update Rate

    This is a measure of how often data is added, deleted and edited. An email system has a high add rate, a low deletion rate and an almost-zero edit rate. An ad auction use case has ridiculously high rates for all three. A useful way to gauge how much to worry about the update rate is to compare it to the read throughput. The growth rate of the data also ties into the working set size or retention policy. A 0.1 percent growth rate implies a three-year retention (365 times 3 is about 1,000), and vice-versa. A 1 percent rate implies 100 days.
  • Previous
    Consistency
    Next

    Consistency

    How quickly does an update have to spread through the system? For a keyword advertising bid, a few minutes might be acceptable. Stock trading systems have to reconcile in milliseconds. A comments system is generally expected to show new comments within a second or two, with frantic work backstage to provide the illusion of immediacy to the commenter. Consistency is a critical factor if the update rate is a significant portion of the request rate. It is also critical if propagating updates is especially important to the business, e.g., account sign-ups or price and inventory changes.
  • Previous
    Locality
    Next

    Locality

    What portion of the working set does one request need access to? How is that portion defined? What is the overlap between requests? On one extreme you have search engines: A user might want to query bits from anywhere in your system. In an email application, the user is guaranteed to access their inbox only, a tiny well-defined slice of the whole. In another instance, you may have a deduplicated storage for email attachments, leaving you prey to hot spots.
  • Previous
    Computation
    Next

    Computation

    What math do you need to run on the data? Can it be pre-computed and cached? Are you doing intersections of large arrays? Are you bringing the computation to the data, or the other way around? Why?
  • Previous
    Latency
    Next

    Latency

    How quickly are transactions supposed to return success or failure? Users seem to be okay with a flight search or a credit card transaction taking several seconds. A Web search has to return within a few hundred milliseconds. An API that outside systems depend on should return in 100 milliseconds or less. It's also important to think about the variance. It's arguably worse to answer 90 percent of queries in 0.1 seconds and the rest in 2 seconds, rather than all requests in 0.2 seconds.
 

Read data in, write data out. In its purest form, this is what computers accomplish. Building a high-performance data processing system requires accounting for how much data must move, where it must move and the computational tasks needed. Meanwhile, our appetite for, and consumption of, data continues to increase, with seemingly no end in sight. However, the trick is to establish the size and heft of your data and focus on its flow. Capacity planning is the key to figuring out how much memory and storage you need, rather than frantically trying to fix problems as they arise. Identifying and correcting bottlenecks in the data flow will help you build a low-latency system that scales over time. Based on conversations with Eric Frenkiel, co-founder and CEO at distributed in-memory database provider MemSQL, this slide show offers guidelines on how database administrators can build more efficient databases that keep up with the ever-growing amount of data their companies store and process.

 
 
 
 
 
Darryl K. Taft covers the development tools and developer-related issues beat from his office in Baltimore. He has more than 10 years of experience in the business and is always looking for the next scoop. Taft is a member of the Association for Computing Machinery (ACM) and was named 'one of the most active middleware reporters in the world' by The Middleware Co. He also has his own card in the 'Who's Who in Enterprise Java' deck.
 
 
 
 
 
 

Submit a Comment

Loading Comments...
 
Manage your Newsletters: Login   Register My Newsletters























 
 
 
 
 
 
 
 
 
Rocket Fuel