Thanks to the marriage of open source and Java in Sleepycat's new pure-Java version of Berkeley DB, many may reconsider whether they need to use a relational database where simple data structures will suffice, according to Alex Karasulu, technical lead for the Apache Directory Project. The project is an attempt to centralize naming and directory needs at the Apache Software Foundation.
Sleepycat Software Inc.'s recent release of a pure-Java version of its Berkeley DB open-source database is a win for Java developers such as Karasulu, who's thrilled with the open-sourcing of this general-purpose, typically embedded software.
As Sleepycat has noted, Berkeley DB Java Edition is the companys response to customers who were looking for portability and development ease.
In the case of the Apache Directory Project, Karasulu says the project members are so convinced that Sleepycat's pure-Java database will become the "de facto standard Java API for manipulating B-tree databases" that they're basing the project's pure-Java LDAP directory server, called Eve, on back-ends built using Berkeley DB Java Edition.
eWEEK.com Associate Editor Lisa Vaas conducted an e-mail interview with Karasulu to find out how he's going to leverage Sleepycat's marriage of open source with Java.
Who's building applications in Java? Does it span the spectrum?
I think so. People can use a B-tree anywhere a fast, relatively constant time lookup is needed, regardless of the size of the data. This happens all over the place.
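Strictly speaking, a B-tree lookup is logarithmic rather than constant time, but the base of the logarithm is the node fanout b, which is large, so the tree height h (the number of nodes a lookup visits) grows very slowly with the number of keys N. Assuming, for illustration only, a fanout of b = 100:

```latex
h \approx \left\lceil \log_b N \right\rceil, \qquad
\log_{100} 10^{6} = 3, \qquad
\log_{100} 10^{9} = 4.5 \;\Rightarrow\; h \approx 5
```

Growing the data a thousandfold, from a million to a billion keys, adds only about two node visits per lookup, which is why the cost feels nearly constant in practice.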
We're actually in the process of building an embeddable, pure-Java LDAP server called Eve. She's a beaut, introducing triggers and stored procedures to the world of LDAP.
We intend to have it [match or surpass the functionality of] Microsoft's ADAM [Active Directory Application Mode] server.
Plus, we expect Eve to be able to integrate into Geronimo [the J2EE server project of the Apache Software Foundation] and other J2EE application servers.
LDAP and other directory servers have a soft spot for APIs like JE. These servers are protocol servers but need their own specialized back-end databases for managing entries in hierarchies within some namespace.
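One way a directory back-end can keep a hierarchy in flat B-trees is to store each entry under its distinguished name written root-first, so that an entire subtree occupies one contiguous key range and a subtree search becomes a single range scan. The sketch below assumes that layout purely for illustration (it is not Eve's actual schema), uses java.util.TreeMap, a sorted map, as a stand-in for a JE B-tree, and ignores escaped commas inside DNs; the DirectoryTreeSketch class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a sorted map stands in for a B-tree back-end.
// Not Eve's or JE's actual design.
class DirectoryTreeSketch {
    // Keys are DNs written root-first, so every subtree is a contiguous key range.
    private final NavigableMap<String, String> entries = new TreeMap<>();

    // "uid=alex,ou=people,dc=apache,dc=org" -> "dc=org,dc=apache,ou=people,uid=alex"
    // Simplification: assumes no escaped commas inside RDN values.
    static String toKey(String dn) {
        List<String> rdns = new ArrayList<>(Arrays.asList(dn.split(",")));
        Collections.reverse(rdns);
        return String.join(",", rdns);
    }

    void add(String dn, String entry) {
        entries.put(toKey(dn), entry);
    }

    // Subtree search: the base entry plus every key that starts with base + ",".
    // Since ',' (0x2C) sorts just before '-' (0x2D), those keys form one range.
    List<String> subtree(String baseDn) {
        String base = toKey(baseDn);
        List<String> result = new ArrayList<>();
        if (entries.containsKey(base)) {
            result.add(entries.get(base));
        }
        result.addAll(entries.subMap(base + ",", true, base + "-", false).values());
        return result;
    }

    public static void main(String[] args) {
        DirectoryTreeSketch dir = new DirectoryTreeSketch();
        dir.add("dc=apache,dc=org", "root");
        dir.add("ou=people,dc=apache,dc=org", "people");
        dir.add("uid=alex,ou=people,dc=apache,dc=org", "alex");
        dir.add("ou=groups,dc=apache,dc=org", "groups");
        System.out.println(dir.subtree("ou=people,dc=apache,dc=org")); // [people, alex]
    }
}
```

The design choice here is to let key order do the work: no separate parent/child table is needed to answer a subtree query.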
For which database applications is Java particularly well-suited?
JE is particularly well-suited for use in embedded environments where you need to store data and look it up fast without a heavyweight SQL database engine consuming resources you don't have.
Namely, this is great for handheld devices, PDAs and other gizmos. With the craze for putting JVMs [Java virtual machines] on most of these devices, JE fills a serious need.
What makes you convinced that Berkeley DB Java Edition will become the de facto standard Java API?
A B-tree is a complex data structure to implement, but its exposed API is small and simple. When the API and implementation are done right, with a good balance between design and speed, there is very little incentive to rewrite this piece of general-purpose software.
People will opt to use rather than rewrite, since the difficulty in rewriting far outweighs any advances you could possibly make. Furthermore, when made open as in this case, there is even less incentive to reimplement the code.
If changes can be made to the open code and patches submitted, programmers are satisfied and less likely to rewrite JE in its entirety when they can patch it. Sleepycat has covered these bases well.
What do you think of Berkeley DB Java Edition's performance capabilities?
It uses NIO [new input/output], which should have a considerable effect on performance because of the way memory is accessed. The NIO packages are new APIs that give Java the power of C where I/O is concerned. These APIs are new in JDK 1.4 and make Java-based servers much more effective. Furthermore, a binding API makes mapping objects to records very intuitive while avoiding the overhead of serialization.
Other implementations often leave this up to users, who often use serialization and wind up paying for it with a massive performance hit.
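The binding idea can be sketched with plain java.nio: a hand-written pair of methods maps an object's fields straight to a compact byte record and back, whereas default Java serialization also writes class metadata. This is an illustration under assumptions, not JE's binding API; the BindingSketch and Employee names and the record layout are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BindingSketch {
    // Hypothetical record type used only for this illustration.
    static final class Employee implements Serializable {
        final int id;
        final String name;
        Employee(int id, String name) { this.id = id; this.name = name; }
    }

    // Hand-written binding, object -> bytes: 4-byte id, 4-byte name length, UTF-8 name.
    static byte[] toBytes(Employee e) {
        byte[] name = e.name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + name.length);
        buf.putInt(e.id).putInt(name.length).put(name);
        return buf.array();
    }

    // Inverse binding, bytes -> object: read the fields back in the same order.
    static Employee fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int id = buf.getInt();
        byte[] name = new byte[buf.getInt()];
        buf.get(name);
        return new Employee(id, new String(name, StandardCharsets.UTF_8));
    }

    // Default Java serialization of the same object, for a size comparison.
    static byte[] serialize(Employee e) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Employee alex = new Employee(42, "Alex");
        System.out.println("binding bytes:    " + toBytes(alex).length); // 12
        System.out.println("serialized bytes: " + serialize(alex).length);
        System.out.println(fromBytes(toBytes(alex)).name); // Alex
    }
}
```

The bound record is 12 bytes (two 4-byte ints plus four UTF-8 bytes); the serialized form of the same object is considerably larger because it carries a stream header and class descriptor.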
For more on binding APIs, check out the Microsoft Developer Network site.
What were you using to manipulate B-tree databases before this pure-Java play?
I've hand-rolled a couple of databases that I had in an RDBMS [relational DBMS]. Some were just to play with; for others, I knew the schema would not change, so I decided they could be optimized using nonrelational B-tree-based storage.
Some databases, like those used for LDAP servers, are unique. LDAPd, which I wrote years ago, used the Berkeley DB C edition through JNI [Java Native Interface].
It was way slower than using the C version directly due to copy overheads in JNI. So, I kind of gave up on it. Then, when JE appeared, I started planning an overhaul using JE instead. I'm still at the planning phase and intend to retrofit the Eve Directory Server [LDAPd 2] with JE.
In this case, the speed advantage, ironically, comes from not going through JNI.
Berkeley DB's Rivals
What's the difference between what Sleepycat has done with Berkeley DB and what relational database vendors such as IBM have done with Java in DB2, for example?
We're talking about a completely different category of database here. JE and Berkeley DB provide the building blocks of relational databases. Of course, there is more to JE, which makes it easier to use while performing common operations such as joins. The API is much more than a simple B-tree data structure in this way.
However, fundamentally, at their core both of these APIs are B-tree implementations. Simple databases that do not need the robust features of a relational database system can be hand-rolled using JE or Berkeley DB. This makes their footprint significantly smaller and removes the need for a client-server tier.
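For the joins mentioned above, a common approach over hand-rolled B-trees is to keep one secondary index per attribute, each mapping a value to the sorted set of primary keys that carry it; an equality join on two attributes is then the intersection of two such sets. This sketch uses TreeMap and TreeSet in place of JE databases and does not show JE's own join support; the JoinSketch class and its methods are hypothetical.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: sorted collections stand in for JE secondary databases.
class JoinSketch {
    // Each secondary index maps an attribute value to the primary keys that carry it.
    private final Map<String, SortedSet<Integer>> byColor = new TreeMap<>();
    private final Map<String, SortedSet<Integer>> bySize = new TreeMap<>();

    void add(int primaryKey, String color, String size) {
        byColor.computeIfAbsent(color, k -> new TreeSet<>()).add(primaryKey);
        bySize.computeIfAbsent(size, k -> new TreeSet<>()).add(primaryKey);
    }

    // Equality join: keys matching color AND size are the intersection of both sets.
    SortedSet<Integer> join(String color, String size) {
        SortedSet<Integer> result = new TreeSet<>(byColor.getOrDefault(color, new TreeSet<>()));
        result.retainAll(bySize.getOrDefault(size, new TreeSet<>()));
        return result;
    }

    public static void main(String[] args) {
        JoinSketch items = new JoinSketch();
        items.add(1, "red", "small");
        items.add(2, "red", "large");
        items.add(3, "blue", "small");
        items.add(4, "red", "small");
        System.out.println(items.join("red", "small")); // [1, 4]
    }
}
```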
RDBMSes like DB2 and Oracle [Corp.'s databases] are intended more for enterprise-scale applications. Plus, they were designed for use out of the box. You can install Oracle, pump data into her, and slice and dice with SQL.
The SQL machinery rather than an API is used to intuitively interface with the database and ask questions. These databases possess their own client-server communication mechanisms, like Oracle's SQL*Net. JE does not.
JE is nonrelational and supports only two-column tables, that is, key/data pairs, while DB2 and Oracle store data in multicolumn tables. In JE, you must build tables by hand, using multiple B-trees. When databases are simple in nature yet large in size, JE is an excellent API for hand-rolling a database.
Basically, when all of the searches are handled using canned queries, you might want to think about hand-rolling your database to make it fast or embed it within another application.
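Building a "table" by hand from two-column B-trees typically means one primary store keyed by ID plus one extra key/value store per searchable column, with the application keeping them in sync on every write. Below is a minimal sketch under that assumption, with TreeMap standing in for each JE database; the HandRolledTable class is hypothetical, and its canned query is "find a user by e-mail".

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: each TreeMap stands in for one two-column JE database (B-tree).
class HandRolledTable {
    // Row payload stored in the primary "table".
    static final class User {
        final String email;
        final String name;
        User(String email, String name) { this.email = email; this.name = name; }
    }

    // Primary B-tree: user ID -> row.
    private final NavigableMap<Integer, User> primary = new TreeMap<>();
    // Secondary B-tree: e-mail -> user ID, maintained by hand on every write.
    private final NavigableMap<String, Integer> byEmail = new TreeMap<>();

    void insert(int id, String email, String name) {
        User old = primary.put(id, new User(email, name));
        if (old != null) {
            byEmail.remove(old.email); // drop the stale index entry on update
        }
        byEmail.put(email, id);
    }

    // Canned query: one probe of the secondary B-tree, then one of the primary.
    User findByEmail(String email) {
        Integer id = byEmail.get(email);
        return id == null ? null : primary.get(id);
    }

    public static void main(String[] args) {
        HandRolledTable users = new HandRolledTable();
        users.insert(1, "alex@example.org", "Alex");
        users.insert(2, "lisa@example.org", "Lisa");
        System.out.println(users.findByEmail("lisa@example.org").name); // Lisa
    }
}
```

Because the query is fixed in advance, there is nothing to parse or plan at run time; the cost is that every new access path means another index the application must maintain itself.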
How does Berkeley DB work with those relational databases? How will a pure-Java version work better with them?
Berkeley DB can be used to work with relational databases, sure. It can also be used to build one. I think MySQL, as one option, uses the Berkeley DB C edition as its underlying record-backing store. One could build a pure-Java relational database using JE pretty rapidly, I think. It's probably just a matter of time now.
Sleepycat's Impact on Java
Will this pure-Java, open-source database herald the start of more Java development?
Sure, this puts Berkeley DB into the hands of Java developers. It probably will make people reconsider the use of a relational database where simple data structures suffice. I'm sure many projects have gotten unnecessarily complex by using an RDBMS when they could have gotten away with using Berkeley DB JE.
Tell me some of the benefits of switching to a pure-Java database from a relational database. How do setup, data import speed, performance and disk storage compare?
More than going Java, it's hand-rolling a database rather than using a relational one that makes the difference. Java is just a language, but for those writing Java applications, it's better to use than the JNI interfaces to C-based APIs.
I see going the JE route, if you can, [to be] advantageous, in that it reduces the need for an extra tier while keeping access to large sets of data fast. So, if you have a small database design with just a few tables, with only canned queries, it might be best to go JE.
JE, being pure Java, is better to use than the C-based version with its JNI interface, because of the performance hit incurred while copying data between Java and C.
1) Setup: It's easier to set up JE than an RDBMS but harder to build the database, so there is a clear tradeoff here.
2) Data import speed: This is totally dependent on your database design and the nature of your data but should be similar across JE and relational databases.
3) Performance: JE will be much faster because it has no SQL compiler, interpreter or optimizer to run. Basically, an RDBMS analyzes every SQL statement to optimize it and compiles it into another format that is faster for conducting a search. Then, it actually conducts the search.
A JE application avoids this overhead, which speeds it up considerably. Furthermore, relational databases are usually accessed client-server, so network latency and the like rob them of performance.
4) Storage: An RDBMS is far better suited to storing data, and storing it more cheaply, meaning more efficiently, than something like JE. Relational databases have an entire additional layer to handle storage-related issues, whereas JE is very primitive, using a log file.
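The log-file approach can be illustrated with a toy append-only store: every write, including an update, appends a new record to the end of the log, and an in-memory index remembers where the newest version of each key lives. This is a simplified sketch, not JE's on-disk format; a ByteArrayOutputStream stands in for the log file, and the AppendOnlyLog class is hypothetical. It also shows one reason such a store can use more space: superseded versions remain in the log until they are reclaimed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of log-structured storage; not JE's real log format.
class AppendOnlyLog {
    private final ByteArrayOutputStream log = new ByteArrayOutputStream(); // stands in for a log file
    private final Map<String, Integer> latestOffset = new HashMap<>();      // key -> offset of newest version

    void put(String key, String value) {
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        latestOffset.put(key, log.size());                        // new version starts at the end
        log.write(ByteBuffer.allocate(4).putInt(v.length).array(), 0, 4); // 4-byte length prefix
        log.write(v, 0, v.length);
        // Any previous version stays in the log as dead space until cleaned.
    }

    String get(String key) {
        Integer offset = latestOffset.get(key);
        if (offset == null) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(log.toByteArray());
        buf.position(offset);
        byte[] v = new byte[buf.getInt()];
        buf.get(v);
        return new String(v, StandardCharsets.UTF_8);
    }

    int logSize() {
        return log.size();
    }

    public static void main(String[] args) {
        AppendOnlyLog store = new AppendOnlyLog();
        store.put("k", "v1");
        store.put("k", "v2");
        System.out.println(store.get("k"));      // v2
        System.out.println(store.logSize());     // 12
    }
}
```

After the two writes, the log holds 12 bytes for 6 bytes of live data; a real log-based store needs a cleaning pass to reclaim that space, which an RDBMS storage layer handles differently.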
Editor's Note: In the initial version of this story, Alex Karasulu incorrectly mentioned what Berkeley DB Java Edition supports. It in fact supports only J2SE and J2EE.