MongoDB 3.4 and Money: Hello BigDecimal!

3-Jan-2017

This rant was published three years ago, 30-Dec-2013. At the time, support for an arbitrary precision decimal type was only on the roadmap. Last month, with the release of MongoDB 3.4, decimal128 (a.k.a. BigDecimal) was formally introduced into the type system of MongoDB with support in all drivers and utilities like mongoimport.

The concepts introduced here, however, are as appropriate now as they were back then and now even easier to implement thanks to decimal128 support.

Contemporary financial systems data designs need more than just the amount of money involved; they need the currency code (see Let's Talk About Money). Money (and price) without a currency code is not money; it is just a number. This tight pairing is actually more easily managed in MongoDB than in traditional RDBMS because the amount and currency code -- and indeed, potentially more day 2 attributes -- can be well-managed together in a subdocument.

The Traditional Way in RDBMS

Consider a data design that requires us to capture transaction amount, tax, and rebate amount. Let's start with BigDecimals in an RDBMS:

    create table Trade (
        amount   numeric,
        tax      numeric,
        rebate   numeric
    );

So right off the bat, we have a problem: there are only "magnitude" columns with no associated currency codes. And we don't want to simply assume that everything is in USD (this is 2017). So perhaps we try this:

    create table Trade (
        amount   numeric,
        tax      numeric,
        rebate   numeric,
        ccode    text
    );

This is a step in the right direction but there is a serious consequence: all numerics "localized" in this table are assumed to have the same currency code. This may not always be the case! Even if it is 95% of the time, the hacked-up logic required to deal with the 5% tail will increase the brittleness of the otherwise adequate 95%. In addition, as the schema grows over time, the relationship between the ccode column and the magnitude columns becomes less clear. Other than direct examination of the way the column values are actually processed in the data access layer, there are no structural items or even hints that these fields are closely related.

The next level of completeness would be to provide a ccode for each numeric:

    create table Trade (
        amount   numeric,
        amount_ccode    text,
        tax      numeric,
        tax_ccode    text,
        rebate   numeric,
        rebate_ccode    text
    );

The data access layer now has everything it needs to make a Money objects for amount, tax, and rebate. But let's examine some subtle issues:

Suppose, from a business standpoint, rebate is permitted to be unset. In this physical implementation, which field should be the indicator of being set, rebate or rebate_ccode? Let's assume the numeric will be the indicator and we will take advantage of getBigDecimal() in the JDBC API to return null or an allocated object instance.

* Side note: The other JDBC calls that return native types (e.g. getDouble(), getInt(), getLong(), etc. will return 0 if the column is NULL. No kidding. To accurately determine if the column was NULL, you need to call ResultSet.wasNull() after one of the getter methods.
If we have an existing rebate and we wish to make it unset, do we null BOTH the numeric and the text currency code?
What does it mean if one side of the pair is set (or unset) without the other?
How strongly followed will the convention be for the naming of the pair so that given a request for money field foo, both foo and foo_ccode will be fetched? How do we know that these are in fact a pair?

The last point above becomes more important as the table grows and the field names do not necessarily "line up" and make it clear that they are in fact a pair. Arguably, the more robust way to name the fields is as follows:

    create table Trade (
        amount_val   numeric,
        amount_ccode    text,
        tax_val      numeric,
        tax_ccode    text,
        rebate_val   numeric,
        rebate_ccode    text
    );

We finally have data structures (peered magnitude and currency code) and naming conventions (_val and _ccode) that provide a robust, longer-lived solution than the first attempt -- and it is somewhat more involved than simply "adding a BigDecimal column." Below is a representative example of the code that would be used to produce the Money object in a data access layer.

    // Let's not dwell on how to construct the query.  Assume:
    // 1.  We are looking for one and only one trade:
    // 2.  We want to have some semblance of logic to drive pairing
    //     based on "basename" of the money field, e.g. 'tax'.
    //     In other words, we don't want 'tax_val' and 'tax_ccode' field names
    //     as hardcoded strings floating around.
    //

    String[] mflds = new String[] { "amount", "tax", "rebate" };



    //  Build a dynamic SQL string along the lines of:
    //      select amount_val, amount_ccode, tax_val, tax_ccode, 
    //          rebate_val, rebate_ccode from Trade where key = 'AA123'
    //

    StringBuffer sb = new StringBuffer();
    sb.append("select ");
    for(int i = 0; i < mflds.length; i++) {
        if(i > 0) { sb.append(","); } 		       
        sb.append(mfld);
        sb.append("_val");
        sb.append(",");
        sb.append(mfld);
        sb.append("_ccode");
    }
    sb.append(" from Trade where key = '");
    sb.append(bizKey);
    sb.append("'");

    ResultSet rs = stmt.executeQuery(sb.toString());
    if(rs.next()) {
        // We are using getBigDecimal() to indicate null or not:
        for(String mfld : mflds) {
            BigDecimal v = rs.getBigDecimal(mfld + "_val");
            if(v != null) {
                String ccode = rs.getString(mfld + "_ccode");
                Money m = new Money(v, ccode);
                someMap.put(mfld, m); // e.g. "amount" -> Money()
            }
        }
    }

    return someMap;

The MongoDB 3.4 Way

In MongoDB, we would take advantage of rich shape modeling to well-capture the relationship between val and ccode. There is no explicit schema (the data itself is the schema) so we will use an insert() function from the command line interface to demonstrate:

    db.trade.insert( {
        "amount": {
           "val": NumberDecimal("77.09"),
           "ccode": "USD"
         },
        "tax": {
           "val": NumberDecimal(".87"),
           "ccode": "USD"
         },
        "rebate": {
           "val": NumberDecimal("2000"),
           "ccode": "JPY"
         }
    });

Note the constructor for the new NumberDecimal type. We use string arguments specifically to avoid problems with literal numbers being captured as 32 bit integers or floats. It is important to recognize two use cases here that are no different than those experienced with a traditional RDBMS:

Within currency, all numeric comparisons and sorting work as expected.
Cross-currency operations are not possible without a currency conversion table and such operations are outside the domain of the persistor anyway.

Here is a representative example of the code that would be used to produce the Money object in a data access layer. Note that the logic is simplified because we can treat amount, tax, rebate as "objects":

    String[] mflds = new String[] { "amount", "tax", "rebate" };
	    
    // Construction of predicate and projection is simpler because
    // of the map-of-map structural approach instead of creating a SQL
    // string with required whitespace, commas, quoted strings, etc.
    DBObject predicate = new BasicDBObject("key", key);
    DBObject projection = new BasicDBObject();
    for(String mfld : mflds) {
       projection.put(mfld, 1); // subdoc will include val and ccode children
    }

    DBObject v = coll.find_one(predicate, projection);

    if(v != null) {
        for(String mfld : mflds) {
            Map m2;
            if((m2 = v.get(mfld)) != null) {
                BigDecimal a = ((org.bson.types.Decimal128)m2.get("val")).bigDecimalValue();
                String s = m2.get("ccode");
                Money m = new Money(a, s);
                someMap.put(mfld, m);   // e.g. "amount" -> Money()
            }
        }
    }
    return someMap;

The object-oriented approach also makes it much more straightforward to both interrogate for money "setness" and to make money "unset" in MongoDB:

    db.trade.find( { "rebate": {"$exists": 1 }});

    db.trade.update( { "bizKey": "A2"}, {"$unset": {"rebate": ""} } );

It is particularly useful that unsetting a field with a subdocument removes all the fields at once. In traditional RDBMS design, data can become logically "broken" when one or more of the peered fields is removed but one or more are not.

Day 2 Advantage of MongoDB

In certain scenarios, it may be paramount to efficiently perform numeric comparisons or sorting cross-currency within the database. An option is to use a floating point approximation of the value pre-converted to USD dollars and cents. We cannot outright substitute the precise version in BigDecimal with the floating point approximation; we must make it a peer. Both implementations require some safe practices around CRUD (i.e. if the precise value changes, so must the approximation). In MongoDB, the obvious way to implement this is to make the approximation a peer within the subdocument:

    db.trade.insert( {
        "amount": {
           "val": NumberDecimal("77.09"),
           "ccode": "USD",
           "approxUSD": 77.09
         },
        "tax": {
           "val": NumberDecimal("87"),
           "ccode": "USD",
           "approxUSD": .87
         },
        "rebate": {
           "val": NumberDecimal("2000"),
           "ccode": "JPY",
           "approxUSD": 18.97
         }
    });

The RDBMS implementation, however, is becoming less straightforward because the intrinsic bundling of the components of money are being flattened out, heavily relying on the naming convention alone to infer structure:

    create table Trade (
        amount_val   numeric,
        amount_ccode    text,
        amount_approxUSD    numeric,
        tax_val      numeric,
        tax_ccode    text,
        tax_approxUSD    numeric,
        rebate_val   numeric,
        rebate_ccode    text,
        rebate_approxUSD    numeric
    );

This means a non-trivial ALTER TABLE activity to add the new peer field _approxUSD everywhere you use money. But even more significant (and daunting) is that any code that might require _approxUSD must have the query changed to fetch _approxUSD, resulting in a non-trivial effort to modify all the SELECT statements to fetch the new peer field.
In comparison, in MongoDB the existing find() calls do not need to change. The new peer fields, if they exist, will be automatically fetched as part of the "object."

The naming convention must also "reserve" the suffixes so that they are not used by other columns to capture non-money data. Failure to do so results in confusion and possible misinterpretation of the new field as money -- by itself, a "broken" money field (having no _ccode and _approxUSD peers).

It therefore makes sense to create as explicit and informative a suffix as possible, ideally using special characters. The tradeoff is that typically fields so named must be wrapped with quotes. This is not a "problem" per se but it does complicate SQL statement construction because typical simple string substitution of column names may break (e.g. need to escape the extra quotes) and certain tools may react negatively to both the special characters and the quotes:

    create table Trade (
        "amount:val"   numeric,
        "amount:ccode"    text,
        "amount:approxUSD"    numeric,
        "someother_val"   numeric,
        "tax:val"      numeric,
        "tax:ccode"    text,
        "tax:approxUSD"    numeric,
        "rebate:val"   numeric,
        "rebate:ccode"    text,
        "rebate:approxUSD"    numeric
    );

It should be clear that when consistent and proper future-proofing techniques are applied to modeling money in a persistor, MongoDB is both easier and more robust than traditional RDBMS.

Like this? Dislike this? Let me know