Multivalue Integrity
I recently saw an article on integrity in relational databases vs. multivalue databases.
I would agree with it except for two things:
I do believe that people will get attached to complex systems — but I don’t believe that the majority of people will be completely unwilling to apply other systems in new situations. The article makes it sound as if most readers will fall on one side or the other of the “MV vs. relational database” line and adhere to it. I believe that people have a more open mind than that, but that’s just me, and that’s where I disagree.
The other point at which I disagree is the strength of the typing and rules applied to the databases. My point of view is this:
Data stored in any database model has rules attached to it only by the accessing program. Only the accessing program will choose whether to read rule metadata stored with the actual data.
A database holds data. That data is either a number, some text, or both. The database can be in the form of a relational database with its atomic values, a paginated book full of written tables that have both text and numbers, or a multivalued database that holds just basic types. The same type of data, however, can be stored in all of these. Data in a database is *transformable*. It can be stored in one type of way or another, but it is still the same data.
The point of all that: a piece of data (alphanumeric/numeric or int/char/text/float…) must be accessed somehow. In the example of the book, if the book is closed, we can’t get to the data, but the data is still there. The data means nothing until we open the book (access the data) and start reading it (and interpreting it after we read it). The whole of a type of data in the table can be gathered by scanning the data — we see a bunch of small numbers in the book, we know they’re integers. We see decimaled numbers, we know they’re float. We see two letters in one field, it’s automatically some type of text that is supposed to be readable, or which has some implied meaning. If we see instructions above a table in a book that say that we need to change a column or a column’s interpreted amount each time something specific happens, we will decide to apply that transformation to keep data integrity, or we will decide not to apply the transformation for any number of reasons.
The accessing of the data, the method used to access that data, the way the accessed data is then read — these are the things that ultimately determine type. Data will lend itself to one type or another, even without our designation. We can store data in a relational database as text, but read it back as numbers instead, since text is ultimately stored as numbers in a computer. Although we can store data in one model or another, the accessing method which returns the data to us can completely mask data types.
So…we now have data stored in one form or another, we know that the data can have rules attached to it no matter what form the data is in, and we know that the data will lend itself to a particular type whether the database stores strong typing information or not.
There is nothing that prevents us from adding information to either a multivalue dictionary or data file and then applying referential integrity rules to data values in the file based upon that information. It is the accessor programs/methods that determine the application of the rules. Where my thought processes differ from the author is that I believe that multivalued databases are completely capable of self-maintenance and integrity.
July 24th, 2006 at 11:12 pm
Some commercial relational databases (Sybase, Oracle) allow you to associate SQL code in the database as stored procedures and triggers. Triggers “fire” on insert, update, or delete of data and can be used to enforce data integrity rules. “Newer” iterations of ANSI SQL include standard SQL syntax for specify data integrity rules (i.e., primary keys, foreign key references). Triggers can enforce non-relational-ish rules, like data formats, data ranges, or data integrity rules which are more complex than foreign key references (i.e., involving joins or aggregation).
Whether it’s better to enforce the data integrity rules in the database engine or in the client software is a debatable point and depends, in part, on the application.
In my experience, people can get hung up on the solutions they know - you know the old adage, “If the only tool you have is a hammer, everything looks like a nail”. I think we’re all like that. I mean, the ideas we come up with are the end result of what we know and how we combine what we know to come up with something new and different. The trap to avoid is getting smug and assuming there’s nothing else important to learn, being to arrogant to consider an idea from a co-worker, or to lazy to be curious.
July 25th, 2006 at 8:45 am
That Marc guy is pretty smart!