Software & Databases: You're All Missing The Point
SQL. NoSQL. Java. Rails. Oracle. MongoDB. C++. perl/python/ruby/scala/... Etc., etc., etc.
Do you really think the choice of query syntax, ACID compliance, and language is
the reason why software design succeeds or fails? It isn't.
Software engineering, and ultimately system flexibility and performance,
suffer because we routinely and inappropriately allow databases and
database APIs to nudge our designs into
mixing three things together that need to be kept very separate: data, key, and domain.
To begin, let's start with objects, not the database. If you start a design in
terms of tables (or key/value or document collection; this isn't a SQL v. noSQL
issue) then you're already going down the wrong path.
Let's make a simple class:
class MyClass {
    private String name;
    private int age;

    public MyClass() { /* things.... */ }
    public MyClass(MyClass a) { /* the copy constructor! */ }

    // usual barrage of get/set, etc...
}
Now, before we do anything about databases and such, we should be looking
at this class and asking things like:
- Is this class going to observe value semantics? (A sketch of one possible value-semantics form follows this list.)
- Is this class immutable?
- What are the space and performance considerations for this class?
- What are the logical and physical dependencies for this class? (Get ready for the next
rant on open source version management....)
- How can this class adapt and change over time?
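To put some flesh on the first two questions, here is a minimal sketch of what a value-semantics,
immutable form of MyClass could look like. This isn't necessarily the MyClass we'll carry through the
rest of the examples (that one keeps its setters), and the field choices are for illustration only.
The point is that these decisions are made on the object's own terms, long before any database enters
the picture:
    import java.util.Objects;

    final class MyClass {
        private final String name;
        private final int age;

        public MyClass(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public MyClass(MyClass a) { // the copy constructor!
            this(a.name, a.age);
        }

        public String getName() { return name; }
        public int getAge() { return age; }

        // Value semantics: two instances carrying the same data are equal.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof MyClass)) return false;
            MyClass other = (MyClass) o;
            return age == other.age && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() { return Objects.hash(name, age); }
    }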
If we wanted stricter adherence to the philosophy of design-by-contract,
then our simple class
would have a subtle difference in its initial documentation:
class MyClass {
    // various internal state

    public MyClass() { /* things.... */ }
    public MyClass(MyClass a) { /* the copy constructor! */ }

    public String getName() { /* ... */ }
    public void setName(String a) { /* ... */ }
}
Very often we too closely couple the internal state variables with what
we think we're going to set up as columns in an RDBMS and that's wrong ...
but I am getting ahead of myself.
All this has to be considered before we think about the database. And
as we flesh out MyClass a bit, we do so without concerning ourselves
about a key or a clustered index. There is no persistor. There is no
O-R framework. There is just this object.
So having defined a tidy MyClass, we can create instances of it,
put them in Lists, etc.
MyClass o = new MyClass();
o.setName("Bob");
// perform operations on o

// Perhaps if modeled as immutable, no setters; data at construction only:
MyClass o2 = new MyClass("Bob", 26);
// perform operations on o2

// Put them in containers:
List<MyClass> list = new ArrayList<MyClass>();
Set<MyClass> set = new HashSet<MyClass>();
for(int i = 0; i < 8; i++) {
    list.add(new MyClass());
    set.add(new MyClass());
}
All this is great -- but rather academic. The real-life question is: once we create an object,
how do we find it again (or otherwise get at it) when we
are outside the local scope of the method that constructed it?
Almost always the answer
is to construct a key that is associated with the object and place both in a domain for
lookup. A HashMap provides a convenient implementation of a domain:
Map<String,MyClass> m = new HashMap<String,MyClass>();
MyClass o = new MyClass();
m.put("K1", o);
At this point, object o can be found in domain m via key K1. Three separate components (key,
domain, object) operating in harmony. So far, so good.
The really important points of the exercise are:
- Key "K1", from an information architecture perspective, has nothing to do with
object o. K1 is a key in domain m to find object o; that's all. Under no
circumstances can object o rely on the fact that string K1 is being used to index it
in domain m.
- Key "K1" is unique only in the domain of the HashMap.
This is a fairly straightforward implementation, and just about all programmers are familiar
with HashMaps and key/value entries.
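To make the second bullet concrete, here is a tiny sketch (the variable names are invented for
illustration): the same key string can happily index two completely different objects in two
different domains, and the objects themselves neither know nor care.
    Map<String,MyClass> employees = new HashMap<String,MyClass>();
    Map<String,MyClass> customers = new HashMap<String,MyClass>();

    MyClass e = new MyClass();
    MyClass c = new MyClass();

    // "K1" is unique within each domain, but means something different in each:
    employees.put("K1", e);
    customers.put("K1", c);

    // The objects carry no knowledge of "K1" at all:
    employees.get("K1").getName();    // fine
    // employees.get("K1").getKey();  // no such method -- and that's the point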
The Problem
Now let's see what typically happens when we switch the implementation of
domain from a HashMap
to a database. We will use RDBMS semantics here but the issue is not isolated to
RDBMS nor is it related to SQL:
Statement px = conn.createStatement();
ResultSet resultSet = px.executeQuery("select id, name, age from people");
while(resultSet.next()) {
    MyClass o = mapResultSetToMyClass(resultSet);
    // etc...
    o.getName(); // OK
    o.getKey();  // ?!? BAD! What is a key doing in MyClass?
}
resultSet.close();
px.close();
The subtle issue here is that the ResultSet is carrying not only important data
for the object (name, age) but also the key (id). All too often, a
naive implementation of mapResultSetToMyClass() will "influence" the definition
of MyClass by trying to shove the key into it, i.e.:
MyClass mapResultSetToMyClass(ResultSet r) throws SQLException {
    MyClass mc = new MyClass();
    String key = r.getString(1);  // 1 is position of id in result
    mc.setKey(key);
    String name = r.getString(2); // 2 is position of name in result
    mc.setName(name);
    //...
    return mc;
}
MyClass is no longer pure! Instead of the key being associated with the object and
managed by a domain, the key is now IN the object. How did this happen?
Sloppiness. Because the ResultSet record has the key to the record right there
with the data, we take the shortcut and make the key a peer to the rest of the data.
It isn't. Why is this bad? Leaving aside the academics around key/data separation,
for starters:
MyClass o = mapResultSetToMyClass(resultSet);
MyClass o2 = new MyClass(o);
o2.getKey(); // ...so now BOTH o and o2 have the same key?
Because the domain key is now intrinsic to the object, copy construction -- and essentially
any other write operation on the object -- puts the object into a state that is inconsistent
with the persistor. It's not a divide-by-zero exception, no, but it greatly
confuses the design because the presence of the key in the object implies
that the object represents exactly what is in the persistor. Within a few revs of the
software, the binding between key and data in the object becomes inseparable as
utils and other functions come to rely on the presence of the key to craft
operations against the persistor(s). The object loses its identity as a heap-bound
thing and becomes a flashier version of ResultSet, which means in practice it
becomes little more than a flashier HashMap. Do you really want to go
through all the
trouble of managing and documenting an object when it is no more than a HashMap,
where the behaviors and types are driven not from the topside interface but
rather from the bottomside data supplier?
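If that sounds abstract, a short sketch (the data values are invented for illustration) shows how
quickly the trouble starts once the key rides along inside the object:
    MyClass o  = mapResultSetToMyClass(resultSet); // row with id "K1", name "Bob"
    MyClass o2 = new MyClass(o);                   // copy constructor copies the key too
    o2.setName("Robert");                          // mutate the copy

    // Both o and o2 now claim key "K1", yet they disagree with each other, and
    // at most one of them agrees with the row in the persistor. Which one is
    // "the" K1 object? The design can no longer say.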
Depurifying the object by embedding a key also means you cannot create and
manipulate the object independently of the domain. Technically, you can,
but what does this mean:
MyClass o2 = new MyClass();
o2.getKey(); // I guess the key has to be ... null?
// and later...
o2.setKey(someThing); // ? Huh?
The Solution
The path to goodness is actually pretty simple but requires diligence in design
and code factoring. When reading and writing a database, one has to keep the objects
and keys separate but associated -- just like any other domain:
Map<String,MyClass> m = new HashMap<String,MyClass>();
while(resultSet.next()) {
    String key = resultSet.getString(1); // same as above
    MyClass o = mapNonKeyPartsOfResultSetToMyClass(resultSet);
    m.put(key, o);
}
Well that seems barely more complicated than the HashMap example! So why
don't we always do it? Because Hibernate and other O-R frameworks,
all-too-helpful IDEs with code generators, and an industry plague of convenience
over good engineering drive developers to point tools directly at
the database and let the code be built up from there. When this happens,
inevitably, key and data are bundled together as peers in the to/from DB code.
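For contrast with the naive mapper earlier, here is a sketch of what mapNonKeyPartsOfResultSetToMyClass()
might look like, assuming the usual setters (setName(), setAge()) exist and the columns follow the
select id, name, age query above. The important part is the deliberate absence of anything key-related:
    MyClass mapNonKeyPartsOfResultSetToMyClass(ResultSet r) throws SQLException {
        MyClass mc = new MyClass();
        mc.setName(r.getString(2)); // 2 is position of name in result
        mc.setAge(r.getInt(3));     // 3 is position of age in result
        // Note: column 1 (id) is never touched here. The caller owns the key.
        return mc;
    }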
Sometimes, though, we can't use a HashMap, which means we also lose the component
that is acting as the collection for the key space. Let's say we need to use
a List of data instead. No problem:
List<MyPair<String,MyClass>> list = new ArrayList<MyPair<String,MyClass>>();
while(resultSet.next()) {
    String key = resultSet.getString(1); // same as above
    MyClass o = mapNonKeyPartsOfResultSetToMyClass(resultSet);
    // NO SHORTCUTS! Use a Pair class to associate key and data and
    // store that in the list!
    MyPair<String,MyClass> p = new MyPair<String,MyClass>(key, o);
    list.add(p);
}
Seems like a lot of extra work, right? It isn't. Why do we want to do this?
How about some pro-forma, pre-save analytics:
// Get everything from the persistor:
while(resultSet.next()) {
    String key = resultSet.getString(1); // same as above
    MyClass o = mapNonKeyPartsOfResultSetToMyClass(resultSet);
    MyPair<String,MyClass> p = new MyPair<String,MyClass>(key, o);
    list.add(p);
}
// Now add in things from the GUI:
while( /* grab things from GUI as a Map or JSON or XML */ ) {
    MyClass o3 = /* convert things to a MyClass */ ;
    MyPair<String,MyClass> p = new MyPair<String,MyClass>(null, o3); // ah HA!  No key yet!
    list.add(p);
}
analyze_everything(list);
The analyze_everything function is interested only in performing operations on MyClass.
It will iterate over the list, plucking out the MyPair objects and then MyClass. It
leaves the key alone. Later, if we want to take action on a particular MyClass and
it has a persistor key, we can use it to update the persistor. If the key is null, we
can ask the persistor to vend a new key (or we can invent one) which will be associated
with MyClass and the whole thing written to the persistor.
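Here is a rough sketch of that save-back step. The Persistor interface and its method names are
hypothetical -- stand-ins for whatever your DAO layer actually exposes -- and MyPair is assumed to
have getLeft()/getRight() accessors (a minimal version is sketched below):
    // Hypothetical persistor interface; stands in for your real DAO layer:
    interface Persistor {
        void updateByKey(String key, MyClass obj);  // overwrite the data for key
        String insertWithNewKey(MyClass obj);       // vend a new key and insert
    }

    void saveAll(List<MyPair<String,MyClass>> list, Persistor persistor) {
        for(MyPair<String,MyClass> p : list) {
            String key  = p.getLeft();
            MyClass obj = p.getRight();
            if(key != null) {
                // Already known to the persistor; the key targets the update.
                persistor.updateByKey(key, obj);
            } else {
                // Brand new object: ask the persistor to vend a key (or invent one);
                // from here on that key is associated with obj in the domain,
                // e.g. by replacing the entry with new MyPair<String,MyClass>(newKey, obj).
                String newKey = persistor.insertWithNewKey(obj);
            }
        }
    }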
There is a nice Pair class available at commons.apache.org.
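And if you would rather roll your own, a minimal MyPair like the one used in the examples above needs
nothing more than this (an immutable two-field holder):
    public class MyPair<L,R> {
        private final L left;
        private final R right;

        public MyPair(L left, R right) {
            this.left = left;
            this.right = right;
        }

        public L getLeft()  { return left; }
        public R getRight() { return right; }
    }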
Ultimately, this is just another example of the importance of factoring and separation
of concerns. Objects need to be defined independently of the persistence and keying
frameworks that manage them.
Like this? Dislike this? Let me know