21-Oct-2013 | Like this? Dislike this? Let me know |
A tad over 20 years ago, I invented the DBI/DBD concept for Perl. You can google for that if you feel so inclined. It wasn't called DBI/DBD initially, and heaven knows that really dedicated folks like Tim Bunce and Alligator Descartes carried the torch. The motivation at the time was to standardize the way that connections and SQL commands could be passed to DB engines and the results consumed. Lack of standardization in SQL notwithstanding, for medium-duty tasks the concept worked -- and continues to work -- perfectly fine. But it always bothered me that SQL on the way in and a ResultSet on the way out was somewhat restrictive in terms of the shapes of data that could be manipulated through the interface.
In my recent design & development efforts, I have been refining an ecosystem for the manipulation of Map-based data. Bespoke objects like class Trade and class RateCurve and class UserProfile are great for doing bespoke things, but often when you want to just combine data -- not behavior -- and externalize it in some way (putting it on a screen, passing it in a message, writing it to a file, etc.), the so-called "map of maps" or MOM design pattern becomes easier to work with. MOM is a way to manage rich nested structures of data that contain other Maps and Lists, and a small set of well-understood types like String, Date, BigDecimal, Integer, Double, and byte[] as the catch-all for other content. 99.9% of all interesting data structures can be expressed in this way.
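To make the MOM idea concrete, here is a minimal sketch in plain Java collections. The class name `MomSketch`, the field names, and the dotted-path walker are all hypothetical illustrations, not part of any MOM library. The sketch shows how a nested structure built only from Maps, Lists, and simple leaf types can be traversed generically without knowing its shape in advance.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; the names and fields here are illustrative assumptions.
class MomSketch {

    // Build a small nested "trade" using only Maps, Lists, and simple leaf types.
    static Map<String, Object> sampleTrade() {
        Map<String, Object> counterparty = new LinkedHashMap<>();
        counterparty.put("name", "ACME");
        counterparty.put("rating", 3);

        Map<String, Object> leg = new LinkedHashMap<>();
        leg.put("notional", new BigDecimal("1000000.00"));
        leg.put("rate", 0.0325);
        List<Object> legs = new ArrayList<>();
        legs.add(leg);

        Map<String, Object> trade = new LinkedHashMap<>();
        trade.put("id", "T-1");
        trade.put("tradeDate", new Date(0L));
        trade.put("counterparty", counterparty);
        trade.put("legs", legs);
        return trade;
    }

    // Join a parent path and a child key with a dot, avoiding a leading dot at the root.
    static String join(String path, Object key) {
        return path.isEmpty() ? String.valueOf(key) : path + "." + key;
    }

    // Generic walk: record "dotted.path:LeafType" for every leaf, whatever the shape.
    static void leafPaths(Object node, String path, List<String> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                leafPaths(e.getValue(), join(path, e.getKey()), out);
            }
        } else if (node instanceof List) {
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                leafPaths(list.get(i), join(path, i), out);
            }
        } else {
            String type = (node == null) ? "null" : node.getClass().getSimpleName();
            out.add(path + ":" + type);
        }
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        leafPaths(sampleTrade(), "", paths);
        paths.forEach(System.out::println);
    }
}
```

The same `leafPaths` walker works unchanged on a trade, a user profile, or a router configuration, which is the point of keeping the data as Maps and Lists rather than bespoke classes.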
For the MOM ecosystem to be generally useful, there must be a set of reasonably high performance (faster than 1 million/sec) core capabilities that allow the consumer to generically manipulate the content no matter if it is a derivatives trade or a list of router configurations:
The core capabilities above provide the foundation for 2 broad classes of utilities:
Hello world examples for non-trivial interfaces are always a little difficult to create, but the example below should illuminate the purpose of MBI:
// For Postgres:
MBClient client = new PGImpl(url, userID, userPassword);
// OR, for MongoDB:
// MBClient client = new MongoDBImpl(machine, port, otherArgs);

//
// From here down, we are vendor neutral.
//
Database a = client.getDatabase("mydb");
Domain d = a.getDomain("things");

// Declared outside the block below so it is still in scope for d.query().
Map query = new HashMap();
{
    // This PQL is equivalent to:
    //   select * from things where dat1 >= TO_DATE(now - 4days) and lname = 'moschetti'
    List l2 = new ArrayList();
    {
        // dat1 >= (now - 4 days)
        Map m3 = new HashMap();
        Map m2 = new HashMap();
        m2.put("dat1", new java.util.Date(now - (4*DAYS)));
        m3.put("gte", m2);
        l2.add(m3);
    }
    {
        // lname = 'moschetti'
        Map m3 = new HashMap();
        Map m2 = new HashMap();
        m2.put("lname", "moschetti");
        m3.put("eq", m2);
        l2.add(m3);
    }
    query.put("and", l2);
}

// Clearly, variants of query() exist for projections, preferences, etc.
Cursor c = d.query(query);
while((item = c.next()) != null) {
    Map m = item.getData();
    Date dt = (Date) m.get("createdOn");
}
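Because the query itself is just a Map of Maps, it can be interpreted by ordinary code. The sketch below is a hypothetical illustration, not the MBI/MBD implementation: it evaluates a query of the shape used above against a single data Map in application space. It assumes only the three operators that appear in the example ("and", "eq", "gte"), and assumes that "gte" compares values through `Comparable`; the class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; operator semantics here are assumptions.
class PqlEvalSketch {

    // True when every operator in the query holds for the given data Map.
    @SuppressWarnings("unchecked")
    static boolean matches(Map<String, Object> query, Map<String, Object> data) {
        for (Map.Entry<String, Object> e : query.entrySet()) {
            String op = e.getKey();
            if (op.equals("and")) {
                // "and" carries a List of sub-queries; all of them must match.
                for (Object sub : (List<Object>) e.getValue()) {
                    if (!matches((Map<String, Object>) sub, data)) return false;
                }
            } else if (op.equals("eq") || op.equals("gte")) {
                // "eq" and "gte" carry a Map of field -> expected value.
                Map<String, Object> fields = (Map<String, Object>) e.getValue();
                for (Map.Entry<String, Object> f : fields.entrySet()) {
                    if (!compare(op, data.get(f.getKey()), f.getValue())) return false;
                }
            } else {
                throw new IllegalArgumentException("unsupported operator: " + op);
            }
        }
        return true;
    }

    // Compare one field; a missing field never matches.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean compare(String op, Object actual, Object expected) {
        if (actual == null) return false;
        if (op.equals("eq")) return actual.equals(expected);
        return ((Comparable) actual).compareTo(expected) >= 0; // "gte"
    }

    // Convenience builder for a single term of the form {op: {field: value}}.
    static Map<String, Object> term(String op, String field, Object value) {
        Map<String, Object> inner = new HashMap<>();
        inner.put(field, value);
        Map<String, Object> outer = new HashMap<>();
        outer.put(op, inner);
        return outer;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long day = 24L * 60 * 60 * 1000;

        List<Object> terms = new ArrayList<>();
        terms.add(term("gte", "dat1", new Date(now - 4 * day)));
        terms.add(term("eq", "lname", "moschetti"));
        Map<String, Object> query = new HashMap<>();
        query.put("and", terms);

        Map<String, Object> row = new HashMap<>();
        row.put("lname", "moschetti");
        row.put("dat1", new Date(now - day));
        System.out.println(matches(query, row));
    }
}
```

The small `term` helper also hints at how the verbose nested-block construction above could be shortened in practice.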
Traditionally, it has been "easy" to save rich data in all sorts of persistors and drag it all out into the application layer to perform filtering. Easy -- but at times horribly slow/expensive. So as part of the MBI/MBD design, a basic framework for SQL rewrite is also offered. This enables a PQL statement to be converted by the MBD implementation into some amount of SQL that can be used to filter content at the database level before doing the final filtering in the application space.
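The rewrite idea can be sketched in a few lines. What follows is a hypothetical illustration, not the actual MBD rewrite framework: it turns a query Map using the "and", "eq", and "gte" operators into a parameterized SQL WHERE fragment. It assumes that each queried field is backed by a column of the same name, and all class and method names are invented for the sketch. Values are emitted as `?` placeholders and collected separately so they can be bound through a prepared statement.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; assumes one column per queried field.
class PqlToSqlSketch {

    // Field names become column names, so restrict them to plain identifiers.
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns a WHERE fragment with ? placeholders; bind values are appended to params.
    @SuppressWarnings("unchecked")
    static String where(Map<String, Object> query, List<Object> params) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Object> e : query.entrySet()) {
            String op = e.getKey();
            if (op.equals("and")) {
                List<String> subs = new ArrayList<>();
                for (Object sub : (List<Object>) e.getValue()) {
                    subs.add(where((Map<String, Object>) sub, params));
                }
                if (subs.isEmpty()) throw new IllegalArgumentException("empty 'and'");
                parts.add("(" + String.join(" AND ", subs) + ")");
            } else if (op.equals("eq")) {
                parts.add(fields((Map<String, Object>) e.getValue(), "=", params));
            } else if (op.equals("gte")) {
                parts.add(fields((Map<String, Object>) e.getValue(), ">=", params));
            } else {
                throw new IllegalArgumentException("unsupported operator: " + op);
            }
        }
        return String.join(" AND ", parts);
    }

    // Emit "column op ?" for each field and record its value as a bind parameter.
    static String fields(Map<String, Object> fields, String sqlOp, List<Object> params) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Object> f : fields.entrySet()) {
            if (!IDENT.matcher(f.getKey()).matches()) {
                throw new IllegalArgumentException("bad field name: " + f.getKey());
            }
            parts.add(f.getKey() + " " + sqlOp + " ?");
            params.add(f.getValue());
        }
        return String.join(" AND ", parts);
    }

    // Convenience builder for a single term of the form {op: {field: value}}.
    static Map<String, Object> term(String op, String field, Object value) {
        Map<String, Object> inner = new HashMap<>();
        inner.put(field, value);
        Map<String, Object> outer = new HashMap<>();
        outer.put(op, inner);
        return outer;
    }

    public static void main(String[] args) {
        List<Object> terms = new ArrayList<>();
        terms.add(term("gte", "dat1", new Date(0L)));
        terms.add(term("eq", "lname", "moschetti"));
        Map<String, Object> query = new HashMap<>();
        query.put("and", terms);

        List<Object> params = new ArrayList<>();
        System.out.println("select * from things where " + where(query, params));
    }
}
```

A real implementation would also have to decide which operators can be pushed down at all and leave the remainder for filtering in application space; this sketch simply rejects anything it does not recognize.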
As an example of this, MBD reference implementations have been created for MongoDB, Oracle, Postgres, and Cassandra. Those familiar with MongoDB will appreciate that the MBD implementation is relatively lightweight since MongoDB "speaks" rich MOM for basic i/o. Oracle and Postgres implementations use a more sophisticated arrangement of raw content plus "helper columns" in combination with SQL rewrite to achieve acceptable performance. The Cassandra implementation over CQL3 is fairly similar to that of Oracle and Postgres, but somewhat restricted due to the simpler feature set of CQL. An MBD implementation for Ehcache would be even more straightforward than MongoDB. Hybridized MBD implementations combining a cache and a to-disk persistor are also fairly straightforward. The logic for query/index optimization when using traditional RDBMS is the tough part and that has already been created.