Rethinking the DBMS

2009/03/02

Not another article about the doomed relational model

Filed under: Uncategorized — Tags: , — Ben Samuel @ 10:18

This one is pretty bad. The author, Mr. Bain, is, according to the bio, the founder of “a company that makes investments in early stage software development.” Well, his article is hype about some new technologies and hype is a necessary reality of life, but if he represents investors he ought to give them better advice than he’s giving in this article.
And from the first line, it doesn’t make sense:

Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud. One key message this sends is, “if you want vast, on-demand scalability, you need a non-relational database”.


After that complete non sequitur:

During this time, several so-called revolutions flared up briefly, all of which were supposed to spell the end of the relational database. All of those revolutions fizzled out, of course, and none even made a dent in the dominance of relational databases.


If he’s just saying that all the silly articles predicting that the relational model was doomed were completely wrong, well, that seems vaguely familiar. But if he’s saying that non-relational technologies were attempted and then abandoned, I’d have to take issue with that. OODBMSs should count as one of the “so-called revolutions” and they’re still around and are, in fact, incorporated in many mainstream SQL DBMSs. Most of the other trendy technologies I can think of, such as XML or some of the data warehousing tech, were never really considered direct competitors to the relational model.
The lesson here is that the marketplace of human ingenuity is always bigger than your imagination and there’s plenty of space for all these various technologies. And, to be clear, I’m not pooh-poohing these technologies; rather, I’d like to follow them more closely because I think they’re extremely interesting. My argument is that they aren’t going to displace relational DBMS technology because they don’t compete with it.
What exactly is it that the relational database dominates, and how does it dominate? The what of the answer isn’t obscure, and it surprises me that a business guy like Mr. Bain missed it.
Here’s the basic business reason for a SQL DBMS: your business has many processes that involve many (metaphorical) moving parts and you need to capture that logic, known as business logic, in a set of rules. You need your data to conform, at all times, to those rules.
This isn’t to say that there isn’t a whole lot of other data that can be stored in other ways, or even must be stored in other ways, but here are a few of the things that businesses have to worry about:

  • Tracking employees.
  • Paying taxes.
  • Government regulations.
  • Tracking customers.
  • Tracking shipments.
  • Managing vendors.

This stuff goes back to before the Roman Empire, and, if anything, it’s only going to get more regulated and more complex, which means the need to track business logic, and thus the case for the relational DBMS, will only grow.
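To make "data that conforms, at all times, to the rules" concrete, here is a minimal sketch using SQLite through Python's standard sqlite3 module. The customer and shipment tables, their columns, and the try_insert helper are hypothetical names chosen for illustration; the point is only that the rules live in the schema, so the DBMS rejects a bad row no matter which application submits it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default

# Hypothetical schema: two business rules are declared alongside the data.
conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE shipment (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),  -- must be a known customer
    weight_kg   REAL NOT NULL CHECK (weight_kg > 0)        -- must have positive weight
);
""")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO shipment (customer_id, weight_kg) VALUES (1, 12.5)")
conn.commit()


def try_insert(customer_id, weight_kg):
    """Return True if the DBMS accepts the row, False if a rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO shipment (customer_id, weight_kg) VALUES (?, ?)",
                (customer_id, weight_kg),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted_valid = try_insert(1, 3.0)
rejected_unknown_customer = not try_insert(99, 3.0)  # no customer 99
rejected_bad_weight = not try_insert(1, -1.0)        # violates the CHECK
shipment_count = conn.execute("SELECT COUNT(*) FROM shipment").fetchone()[0]
```

Only the two valid shipments end up stored; the two rejected rows never reach the table.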
The second half of the answer, the how, is that the relational model is well defined. DBMSs using SQL (which is pretty much all of them) don’t use the relational model but what is often referred to as the SQL model. The differences between the relational model and the SQL model are a little arcane (and I’ll probably touch on them in this blog) but they are significant enough that it’s simply wrong to conflate the two.
Many of the competitors to the relational model, such as the old Pick systems (still around, incidentally), used a model that was not well defined. So what’s wrong with loosely defined systems? There are a few problems, but I’d argue the complete and total lack of interoperability is the main one. At first blush XML refutes this, but really, it proves it.
What did we have pre-XML? A complete mess of binary formats, and the n-squared problem of writing translators for all of them. Did you ever try these translators? Even simply opening a Word document in a later version tends to destroy the underlying data.
Along comes the web and the successful tag structure of HTML is generalized to arbitrary data. LISP advocates complain that it’s just symbolic expressions with angle brackets.
So now we have a post-XML world. DOC has become DOCX and what’s changed? Well, loading and saving documents is somewhat more robust because some of the logic has been offloaded to reusable libraries. And now you can use XSLT to transform documents.
But the original problem remains: you have a whole mess of incompatible file formats and n-squared translators that don’t actually work.
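The "n-squared" figure can be made concrete with a little arithmetic. The sketch below counts one-way translators for n formats when every format must convert directly to every other, and, purely as a hypothetical point of comparison, the count if every format only had to read and write a single shared format. The function names are mine, and the second count describes an ideal that, as argued above, XML has not actually delivered.

```python
def pairwise_translators(n: int) -> int:
    """One-way translators needed so each of n formats converts to every other."""
    return n * (n - 1)


def hub_translators(n: int) -> int:
    """One-way translators needed if each format only reads and writes one shared format."""
    return 2 * n


# (number of formats, pairwise count, shared-format count)
counts = [(n, pairwise_translators(n), hub_translators(n)) for n in (3, 10, 50)]
for row in counts:
    print(row)
```

The pairwise count grows quadratically while the shared-format count grows linearly, which is why a common syntax without a common model leaves the original problem in place.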
The various key-value stores are very new, and it remains to be seen how well they will work together, but I haven’t seen any evidence that they even attempt to address the issue.
(Updates: 4 Mar: formatting.)

5 Comments »

  1. I am afraid I must disagree with you on:

    “Tracking employees.
    Paying taxes.
    Government regulations.
    Tracking customers.
    Tracking shipments.
    Managing vendors.
    This stuff goes back to before the Roman Empire, and, if anything, it’s only going to get more regulated and more complex, which means the need to track business logic, and thus the case for the relational DBMS, will only grow.”

    There are 250 million people in the IT business, way too many, and a significant reason for this is Relational Data Base Systems and the complexity they introduce. As the complexity grows (as you suggest), more and more hard work will have to happen in these systems. The problem is that RDBMSs are hopeless at generalizing and expanding. The structure is solidified upfront, and as new generalisations need to be made, people avoid reengineering the existing DB structure because this would disturb the existing code that operates on it (too much risk, especially for a risk-averse corporate CIO). What develops consequently is a DB that consists of a large number of ad hoc additions, i.e. spaghetti: this table for this, that table for that. The problem is that only trivial relations are really reflected by so-called relational DBMSs, and other, more complex relations are embedded in code that is always proprietary and unique. In fact, you complain about free formats that are arbitrary and incompatible. If you look at the systems using RDBMSs, the DB schemas are almost always different, with no compatibility, so the same problem exists. We do a lot of conversions from RDBMS systems to our system and we see a total lack of standards in DB structure. So, oddly, we would argue against your point: the RDBMS is too free and still not flexible enough. Also, I do not suggest that the contenders you mention do not suffer from various problems. Finally, I believe that if the Romans had used an RDBMS they would not have managed to venture out of Rome; they would still be working on generalizing their DB schema :-) In fact, they used their brains, which are entirely semantically based, something quite different from an RDBMS.

    Due to these (actually severe) deficiencies in RDBMS systems, there will be systems that supersede the RDBMS. My belief is that totally semantically based systems are the ones that will ultimately destroy RDBMS dominance. We at thoughtexpress have developed and operate such a system to run enterprises. The advantage over RDBMS systems that we and our customers experience is significant.

    Comment by pawel lubczonok — 2009/03/04 @ 01:24

    • Thanks for the comment. One point up front: let’s not confuse SQL DBMSs and RDBMSs. The SQL model is the result of a compromise years ago and I’m not calling this blog “rethinking databases” to advocate maintaining that compromise. So there are certainly issues with SQL systems, but I disagree with you about exactly what they are.

      You’re saying that SQL DBMSs cause complexity, but I think you’re falling for a common fallacy. It’s similar to politicians who claim they can “trim the fat” from the government and save vast amounts of money, or to businesses that promise to “cut out the middleman.” Or, closer to home, just look at every other area of IT and how users complain that software is “too bloated,” but then when a developer drops a single feature they howl in protest.

      They howl because they really need all the features to cope with the job at hand. That’s because life in a modern, post-industrial society really is complex. And while it’s tempting to try to shortcut that complexity, oftentimes you simply move it elsewhere, or push it down the road. The fallacy is in believing that the complexity isn’t real; basically, it’s a fallacy of wishful thinking.

      This isn’t to say that there’s never any fat to trim, just that oftentimes once you get done trimming it you find that the savings aren’t nearly what you expected.

      I don’t know what you mean by saying “the structure is solidified upfront.” This simply isn’t true in any SQL DBMS as you can change the schema at any time. If existing applications depend on legacy schema, you can use views and triggers to replicate the old tables. Now, it is true that SQL systems often aren’t flexible enough in allowing you to represent those logical changes within the schema and developers often put logic in the application to bridge the gap. I think that’s an area where the SQL standard is deficient and where future technology could improve.
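      A minimal sketch of the views-and-triggers approach, assuming SQLite through Python's standard sqlite3 module. The employee and staff names are hypothetical: employee stands for the changed schema, and staff for the table that legacy applications still expect. Other SQL DBMSs offer similar features with somewhat different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- New schema: the table was renamed and gained a column.
CREATE TABLE employee (
    id        INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL,
    hired_on  TEXT            -- stays NULL for rows written by legacy code
);

-- Legacy applications still expect a table called staff(id, name).
CREATE VIEW staff AS
    SELECT id, full_name AS name FROM employee;

-- Route legacy inserts against the view into the new table.
CREATE TRIGGER staff_insert INSTEAD OF INSERT ON staff
BEGIN
    INSERT INTO employee (id, full_name) VALUES (NEW.id, NEW.name);
END;
""")

# A new-style client writes to the new table; a legacy client writes to the view.
conn.execute("INSERT INTO employee VALUES (1, 'Ada Lovelace', '2009-03-02')")
conn.execute("INSERT INTO staff (id, name) VALUES (2, 'Alan Turing')")

legacy_rows = conn.execute("SELECT id, name FROM staff ORDER BY id").fetchall()
new_rows = conn.execute(
    "SELECT id, full_name, hired_on FROM employee ORDER BY id"
).fetchall()
```

      Both clients see both rows, each through the shape it expects. Updates and deletes through the view would need their own triggers, which is part of the bridging work described above.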

      I understand that it’s painful to work with spaghetti tables, but let’s be realistic here: a real life operation has a lot of moving parts. You have to represent absolutely all of them or your system simply won’t do what it’s supposed to. When the operation changes, you now have to represent the old operation and the new one simultaneously until you can upgrade every client. That complexity is, again, inherent. What we need are bigger, better tools to manage that complexity, not to pretend it’s not there.

      As to the lack of standards in database structure, let me expand on what I was saying about file formats. Look at a piece of paper in front of you. If you’re in the US, it’s probably 8 1/2″ by 11″. The writing on it probably has 26 letters, 10 digits, the writing is laid out just so. And you can look at 10 million pieces of paper, and they’re all pretty much the same thing.

      So why is it that for 20 years people have been writing word processors and no one can agree on how to lay out text on a page? It’s ridiculous: they’re all exactly the same problem. But I’m looking for a job and I have 8 versions of the same resume, a one page document with minimal formatting. Even so, every time I try to upload it to a website it comes out looking like garbage.

      The reason I think SQL DBMSs are more compatible is that they pretty much agree on the data types. They pretty much agree on how a query should work. So if I want to send an ad hoc query to any SQL DBMS, it’s not a trivial undertaking, but it’s not absolutely hopeless. And in terms of maintenance, a person familiar with ANSI SQL could read code for any commercial SQL DBMS and have a rough grasp of what it was doing.

      Thanks again for the comment, and please spread the word.

      Comment by Ben Samuel — 2009/03/06 @ 23:07

  2. 1. COMPLEXITY

    Your response on complexity basically states that the complexity comes from the problem being solved itself and that:

    a) The problem complexity is not reducible
    b) The tools used are not adding to the complexity.

    I must strongly disagree. If we were to program everything in assembler rather than in higher-level, more abstract structures, the complexity would be significantly higher. I am sure you would agree. So I believe that with current DBMSs and the like we are stuck in a particular paradigm, one that persists because so many people depend on it for their living and so many large corporates (MS, Oracle, etc.) rely on it to maintain their business models. It is about time to move beyond the simplistic paradigm of the way we create systems. There are more abstract, better ways of doing this stuff.

    It is also true that as the tools become more advanced there may be a way of reducing the complexity of the original problem itself. This happens all the time, for example with communications. First there was the telephone, then the tape recorder, television, etc. They all had different protocols and were initially perceived as different things. Once we got CPUs and the internet and networks spread across the world, our tool became much more abstract and powerful, and we finally realized that all these things, such as voice and video, are just information that can be digitized and sent across the networks. So we have a new tool that enables us to think of all this stuff in a uniform and much simplified way. Complexity has been encapsulated. The jump from the DBMS to the next level is more difficult, as it requires realisations about thinking itself, the nature of knowledge, etc.

    2. Regarding solidification.

    The RDBMS is solidified because it needs the intervention of a developer to change its structure. In the brain the organization happens on the fly, as and when needed.

    Also, just consider some problems that people have noted recently. Take a vast real-time DB where you need to add a field in real time without stopping operations. What happens then?

    You explain the lack of standards among word processing packages. Standardization is possible, however. Many complex fields have achieved it. Mathematics and its language are extremely standardized.

    Agreement on data types is just the lowest level of agreement possible. We must also agree on structuring higher order stuff. The problem is that the structure of RDBMS is such that it is not very good for expressing more complex structures.

    “Agree how query should work”? The paradigm of what a query can return is also simplistic. What if you asked a question like this: what will be the profit margin in 2 years’ time under specified assumptions for a large business? Unless it is already stored in the DB, a normal query cannot answer such a question, as you would have to write programs that do the necessary calculations to get to 1 year’s time in the future. So, even on a simplistic level, there are two deficiencies:

    1. Lack of temporality
    2. Can only query if stored in db.

    When people had the tape recorder they did not imagine a different way of storing sound: digital and read with a laser! I think it is time to break out of the thinking that the RDBMS is the thing forever and nothing better will be found.

    Good things come from opposing views :-)

    Comment by PAWEL LUBCZONOK — 2009/03/07 @ 09:35

    • I’ll respond only briefly, as the point-counterpoint format becomes hard to follow, and some of your points deserve a more complete response.

      Regarding complexity, you can automate many things, but they’re still there. C helps with managing the call stack and manipulating registers. But if you’ve ever written a shared library, you run into the minutiae of linking and how memory is laid out, all things that higher-level languages supposedly free you from. Higher-level languages work by making the common case fast to write and by offering a toolset to deal with more complexity, but the task is still complex.

      And you’re right that in certain situations we can eliminate some complexity, but it generally requires massive action. For example, the road system we’re familiar with today used to be a mess of different private authorities, confusing signage and no standards. The government came in and set out standards and it’s now far simpler to decide how a road is laid out and you can drive from one state to another without worrying about learning the local customs.

      But there was no new paradigm! Roads are still roads, signs are still signs, the building techniques are improved, but engineers still follow the same fundamental principles. All we did was, at massive taxpayer expense, force people to adopt a single set of standards.

      Regarding solidification again: You’ll have to explain what structures the relational model can’t express. Keep in mind that SQL puts a lot of arbitrary constraints in place. There’s no reason why, for example, the relational model can’t handle lambda expressions, parameterized views, what-if contexts, etc, and I’ll show in future articles how those can be implemented.

      “What will be the profit margin in 2 years time under specified assumptions for a large business?”

      A proper relational DBMS should allow what-if contexts directly in ad hoc queries, I agree. And most SQL systems don’t have a rich set of mathematical functions to do projections. But this isn’t hard to do in existing SQL DBMSs; in most you can set up a different domain and create views that link back to the actual data.
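      As a rough sketch of that idea, assuming SQLite through Python's standard sqlite3 module: the assumptions live in their own table, and a view links them back to the actual data. The financials and scenario tables, the growth-rate columns, and the figures are all hypothetical, and compounding a fixed annual growth rate for two years is just one simple projection chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Actual data.
CREATE TABLE financials (year INTEGER PRIMARY KEY, revenue REAL, cost REAL);

-- What-if assumptions, kept apart from the actual data.
CREATE TABLE scenario (
    name           TEXT PRIMARY KEY,
    revenue_growth REAL,   -- assumed annual growth, e.g. 0.10 for 10%
    cost_growth    REAL
);

-- Project the latest year two years ahead under each scenario.
CREATE VIEW margin_in_two_years AS
SELECT s.name     AS scenario,
       f.year + 2 AS projected_year,
       1.0 - (f.cost * (1 + s.cost_growth) * (1 + s.cost_growth))
           / (f.revenue * (1 + s.revenue_growth) * (1 + s.revenue_growth)) AS margin
FROM financials AS f CROSS JOIN scenario AS s
WHERE f.year = (SELECT MAX(year) FROM financials);
""")

conn.execute("INSERT INTO financials VALUES (2009, 1000.0, 800.0)")
conn.executemany(
    "INSERT INTO scenario VALUES (?, ?, ?)",
    [("flat", 0.0, 0.0), ("squeeze", 0.0, 0.10)],
)

rows = conn.execute(
    "SELECT scenario, projected_year, margin FROM margin_in_two_years ORDER BY scenario"
).fetchall()
margins = {name: round(margin, 3) for name, _, margin in rows}
years = {year for _, year, _ in rows}
```

      Adding a row to scenario is then an ad hoc what-if question, answered by an ordinary query against the view rather than by a separate program.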

      Comment by Ben Samuel — 2009/03/07 @ 17:41

  3. Ben,

    Simple example of something that would not work is the following:

    1. STORE in TABLE person Date of Birth.
    2. Ask SQL to give you AGE in 4.5 years time.

    or

    1. Have an insurance co running on the RDBMS.
    2. Ask SQL to give you the value of unit accounts if mortality charges had increased by 5% from 4 years ago, and what the effect on these unit accounts would be in 2 years’ time.

    In terms of removing complexity think of how changing the way one thinks about sound, video etc. from analog to digital removed complexity and made it all just information.

    There are branches of mathematics where initially difficult problems became simple by thinking differently about them or immersing them in different theory.

    Waiting for longer response :-)

    Comment by PAWEL LUBCZONOK — 2009/03/09 @ 12:26

