How to Integrate Large-Scale Databases with Perl

 
 
By Jeff Hobbs  |  Posted 2010-09-16
 
 
 

How to Integrate Large-Scale Databases with Perl


It's all too tempting today to always look to the latest technologies as a way to solve problems, without looking at tried-and-true methods that have been working for years. There's nothing wrong with adopting new tools, of course, but there's a tendency to throw the technological baby out with the bathwater because of the perception that new equals better. Sometimes it does, but when working with mission-critical systems and data, it's quite likely that the mature solutions are going to do better by you.

What I have learned is that Perl is one of the best ways to tackle integration of large-scale databases. Perl, while no longer the hot new thing, is mature and still thriving. Perl 5 has a long history of successfully working with open-source and commercial relational DBMS (RDBMS) such as MySQL, PostgreSQL, Oracle, SQL Server and many others. Thanks to that long history, I have learned a few things you ought to know.

Use the right tools

A common mistake is for developers to reinvent the wheel when working with databases. Don't! Use the standard DBI driver modules that come with Perl to connect to your database. They've been tested hard, probably by companies with even larger data sets and more traffic to and from the database.

Perl's DBD::* modules provide a standard database interface that defines methods, variables and conventions that is consistent. This means that working with databases is not only well-documented and tested, it also gives great flexibility down the road. You might build your application on MySQL for testing but deploy on PostgreSQL or Oracle. You might need to migrate away from SQL Server at some point. Perl's DBI module lets you avoid lock-in on the application side.

Another tip for using the right tools is to use package managers-rather than the Comprehensive Perl Archive Network (CPAN) module-to manage your Perl modules. There are package managers available that offer a good way to manage binary modules without having to build from CPAN for updates. If you're using Perl from a Linux distribution, the best bet is to use the packaged Perl modules from the distribution so you're getting testing and updates from the Linux vendor.

Never Trust the Client


Never trust the client

The customer is always right, but client input should always be assumed to be wrong. Data can be malformed accidentally or maliciously. But either way, it has the potential to cause problems.

Perl provides some excellent tools to sanitize external input data. Make sure that you're stripping "special" characters from input, avoid stored HTML and be careful where you're storing user-supplied data. Use of Perl's "taint" mode will also ensure data generated outside your program as tainted so it cannot accidentally be used as a file name or subprocess command.

"Don't trust data supplied by the browser" should be the foremost rule of thumb.

Your data is yours

You should be very conservative about data that's accepted as input, and even more conservative about data that is sent out. Make use of security features that are available in connecting to your database. Most databases can work with SSL or have other features to ensure that communication between an application server and a database server are encrypted. It's also a good idea to store data in an encrypted state should an attacker actually get so far as gaining access to your data store.

Legacy systems or applications may be constructed in such a way that a native encrypted connector is not possible. That's suboptimal, but not impossible to fix. Use a Secure Shell (SSH) tunnel between systems when SSL is not natively supported by the database connector.

Ensure that session data is encrypted. Any session exchanging personal data between your application and the user over a network should be encrypted, but also look to encrypting session state information when storing session data in a URL. The Crypt::* modules will provide the proper tools to do this and also look to the CGI::EncryptForm module.

Performance Improvements


Performance improvements

The first thing some programmers do is commit premature optimization. That is, worrying about getting the tightest code but failing to optimize the way the database is used. Many times, the performance bottleneck is the database-so figure out how to optimize its performance before worrying whether you've got the very best algorithms.

Naturally, you're going to have a beefy database server but put it to good use. Don't hit the database server unless it's absolutely necessary. Cache results so that you're not making multiple (and unnecessary) calls for data you've already fetched once.

But you need to get the data, so how can you avoid it? A couple of ways: You can use an intermediate local data store to cache data between the main RDBMS and your application, such as memcached or Berkeley DB. Another tip is to avoid sprinkling unnecessary SQL queries throughout your code. Use object-relational mapping (ORM) to convert data between incompatible type systems. Perl's DBIx::Class module can speak with all kinds of traditional RDBMS to handle just about any type of work you're doing.

Summary

No matter what RDBMS you're using, it goes well with Perl. These aren't comprehensive guidelines but a good starting point to ensure that your application is going to be successful. Take the advice here and you'll be well on the way to a successful deployment or revision.

Jeff Hobbs is Director of Engineering at ActiveState, overseeing the development of all ActiveState products. He can be reached at jeffh@activestate.com.

Rocket Fuel