Good Identity Architecture

24-Sep-2013

Practically every website these days uses an email address as the login name. This is a very clever and useful construct for many reasons, including:

In one shot, the site captures both a string identifier for an actor PLUS an email address at which to contact that actor. This immediately enables two-step sign-up (i.e. emailing a message to the effect of "You have asked to sign up at site X; please click the link below if you really wish to do so or delete this email.") and also provides an off-site communications channel to the user for important news.
With plain ol' names, a user can experience irritation that their favorite login name has already been taken, and this irritation is directed at the site. With email addresses, however, the user knows a priori that they are the only person who would use that login, especially if the site has a two-step setup based on material sent to the email address.
The knowledge and confidence in email address uniqueness makes it very easy for a user to use the same login name on many sites. Gone are the days of trying buzz1, buzz2, buzz123, ...

But...

email addresses and logins are things that a user experiences at the interface to the system, not the internal design. I recently came across a design where the object capturing User exactly mirrored the database table User (already a dead end...) and was constructed as follows. I am showing only the private fields; the get/set methods did nothing except get and set these fields, so in this context they are only a distraction:

    class User {
        private String emailAddr;
        private String cryptPW;
        private String prevCryptPW;
        private int wantsEmailNotify;
        private Date createDate;
        private String whoAmI;
        private String avatarKey;
        ...
    }

I give the developer +1 for at least encrypting the password (or so the field claims), but that's about it. The single User entity above is heavily overloaded and does not provide the necessary separation of concerns. An email address is not a suitable internal identifier for identity, and it is not the basis for an identity architecture.
Systems must strongly define and implement the following five concepts:

Digital Identity
Identity Attributes
Credentials
Identity-based Application Attributes
Authentication

Before we show the reengineered version of the above, let's explore the concepts in greater detail.

Digital Identity
Digital Identity (DID) is an abstract identifier for an actor, which is a person or a system process (a.k.a. a functional ID or system ID).
- It is an immutable, non-transferable key that can be used anywhere and everywhere in the system.
- As a piece of internal information architecture, it should not (and need not) be known and/or managed by the actor.
- DIDs are not secret.
- A healthy DID design contains a reference to another DID that authorized the creation of this DID. Note that this is not the same as entitlements management, although entitlements management is in fact just a (much) broader application of the same privilege-delegation chain model.
- An example DID might be A456. As identifiers, DIDs should not be smart; keys should not encode aspects of themselves. Integer keys are technically feasible but are discouraged: on a large system, in practice, converting keys to and from strings and comparing them invariably becomes brittle as the String and int types get interchanged (a key stored as the string "0457" comes back as 457 after a round trip through an int). It is much safer to stay with Strings exclusively.
Digital Identity carries no other information, not even a name or an SSN. We'll see why in a moment.
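As a rough illustration of the DID rules above, here is a minimal Java sketch, assuming Java 16+ records. The names `DigitalIdentity` and `KeyRoundTrip` are hypothetical. The DID is an opaque String that cannot change after construction and carries only the DID that authorized it; the helper shows the leading-zero loss that makes int keys brittle.

```java
import java.util.Objects;

/**
 * Hypothetical DID value type: an opaque String key plus the DID that authorized its creation.
 * A record has no setters, so neither field can change after construction.
 */
record DigitalIdentity(String did, String managingDid) {
    DigitalIdentity {
        // The key must exist, but nothing is ever parsed out of it: DIDs are not smart.
        Objects.requireNonNull(did, "did");
        Objects.requireNonNull(managingDid, "managingDid");
        if (did.isBlank()) {
            throw new IllegalArgumentException("DID must not be blank");
        }
    }
}

/** Illustrates the brittleness of letting a key pass through an int. */
final class KeyRoundTrip {
    private KeyRoundTrip() {}

    static String viaInt(String key) {
        // Integer.parseInt drops leading zeros, so "0457" comes back as "457".
        return String.valueOf(Integer.parseInt(key));
    }
}
```

Nothing in `DigitalIdentity` interprets the key, so "A456" and a random UUID string are equally acceptable; `KeyRoundTrip.viaInt("0457")` returns "457", which will never match the stored key again.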
Identity Attributes
Identity Attributes are data associated with a DID that map one-to-one with the DID. These attributes can include name, address, SSN, email address(es), etc. They do not include application-level information or, more generally, things associated with the DID as opposed to things that are fundamental to the actor behind the DID. The reason these attributes are separate from the DID is that although there is only a single domain of DIDs, there may be more than one locus of information for identity attributes. For example, a system design may choose to manage SSN and other private data in objects and databases separate from other, less sensitive data. These other objects and databases are keyed by DID just like email and name, and thus are peers.
In practice, though, it has become common to embed simple identity attributes like name and email address directly into the DID structure. One can pragmatically draw a line between sensitive and non-sensitive information and let the non-sensitive information live in the DID structure. Further, co-locating critical info like actor name with the DID may make processing simpler and therefore more reliable than if these attributes lived elsewhere. For the purposes of this article, however, we are showing what the high-bar design should be.
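A minimal sketch of such peer stores keyed by DID, assuming Java 16+ and in-memory maps standing in for separate databases. `DidKeyed`, `IdentityAttributes`, `SensitiveAttributes` and `DidKeyedStore` are hypothetical names; in a real deployment the sensitive store would live in a different database with its own access controls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Anything stored per actor exposes the DID it hangs off. */
interface DidKeyed {
    String did();
}

/** Hypothetical less-sensitive attributes, one row per DID. */
record IdentityAttributes(String did, String fname, String lname, String primaryEmail) implements DidKeyed {}

/** Hypothetical sensitive attributes, held in a separate store. */
record SensitiveAttributes(String did, String ssn) implements DidKeyed {}

/** In-memory stand-in for a table keyed by DID; put() replaces, keeping the mapping one-to-one. */
final class DidKeyedStore<T extends DidKeyed> {
    private final Map<String, T> byDid = new HashMap<>();

    void put(T row) {
        byDid.put(row.did(), row);
    }

    Optional<T> get(String did) {
        return Optional.ofNullable(byDid.get(did));
    }
}
```

Both stores answer to the same DID, yet neither knows the other exists, which is what lets them live in different places.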
Credentials
A credential contains three things:
1. A DID
2. Any amount of material that is required by the authentication engine for that kind of credential
3. Relative strength
The most common form of credential is the LoginPassword credential. For DID A456, for example, it would contain DID A456, login "person@provider.com", and an encrypted password. Very important concepts are present here:
- Credentials cannot be created without a valid DID.
- Credentials need to be carefully managed and protected.
- All credentials must be unique within a domain for that kind of credential -- but multiple, discrete credentials that point to the same DID are permitted in the model.
- The login in a LoginPassword credential has nothing to do with a name or an email address or anything else. It is independent of these things; the login is simply a string that will be passed to the authenticator. There is essentially no restriction on what the login string could be because it matters only to the person logging in, not the system. What you type into a screen for login has almost zero information architecture impact on the system. It's the DID behind the scenes that links the login to an actual actor, not the login.
- There is an interesting aspect to the uniqueness of the material within a domain of login+password credentials. Two LoginPassword credentials with the same login but different passwords are technically unique, but this is typically too confusing to manage, so we enforce uniqueness at the login and permit passwords to be non-unique. This is one reason the email-as-login convention has become so popular: it effectively and easily guarantees uniqueness. We'll see later how we turn an email+password experience for the user into the proper set of data in the proper set of entities.
- A comprehensive design foray into credential strength is not necessary here, but in summary, some credentials may be "harder to prove/validate" than others and therefore may permit access to more sensitive features. The credential strength model could be a simple integer or a more complex set of structured data.
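The credential rules in the list above can be sketched as a small in-memory store. This is a hypothetical Java 16+ sketch: `Credential`, `LoginPasswordCredential` and `LoginPasswordCredentialStore` are illustrative names, the set of known DIDs stands in for a lookup against the DID table, and `epw` holds whatever protected password material the authenticator expects. It enforces login-level uniqueness, the simpler of the two rules described above, and lets several credentials point at the same DID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Every credential points at a DID and carries a relative strength. */
interface Credential {
    String did();

    int strength();
}

/** The login is any string at all; epw holds the protected password material (the EPW column). */
record LoginPasswordCredential(String did, String login, String epw, int strength) implements Credential {}

/** Hypothetical in-memory store enforcing two of the rules listed above. */
final class LoginPasswordCredentialStore {
    private final Set<String> knownDids; // stand-in for a lookup against the DID table
    private final Map<String, LoginPasswordCredential> byLogin = new HashMap<>();

    LoginPasswordCredentialStore(Set<String> knownDids) {
        this.knownDids = Set.copyOf(knownDids);
    }

    void add(LoginPasswordCredential c) {
        // Rule: a credential cannot be created without a valid DID.
        if (!knownDids.contains(c.did())) {
            throw new IllegalArgumentException("unknown DID: " + c.did());
        }
        // Rule: logins are unique within this credential domain.
        if (byLogin.putIfAbsent(c.login(), c) != null) {
            throw new IllegalStateException("login already in use: " + c.login());
        }
    }

    Optional<LoginPasswordCredential> findByLogin(String login) {
        return Optional.ofNullable(byLogin.get(login));
    }
}
```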
Authentication
Authentication is the process of taking a credential appropriate for consumption by a particular authenticator process and basically returning yea or nay. Of course, more sophisticated things like tokens can be returned too, but the main purpose is simply to yield true or false.
Authenticators are not necessarily credential stores, although very often they play this role because of the close association between a credential and the logic necessary to validate it. Conceptually, however, credentials need to be managed independently of the tools that consume them, especially simple credential types like LoginPassword.
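To make the yea-or-nay contract concrete, here is a hypothetical login+password authenticator in Java. It is a sketch under stated assumptions: it stores a salted PBKDF2 hash (`PBKDF2WithHmacSHA256` from the standard JDK providers) rather than a reversibly encrypted password, the iteration count is illustrative, and it doubles as its own credential store only to keep the example short. A match yields the DID behind the credential; anything else yields an empty result.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Stored credential material: the DID plus a salt and a PBKDF2-derived hash. */
record StoredLoginPassword(String did, byte[] salt, byte[] hash) {}

/** Hypothetical authenticator; authenticate() answers yea (the DID) or nay (empty). */
final class LoginPasswordAuthenticator {
    private static final int ITERATIONS = 100_000; // illustrative; tune for your hardware
    private static final int KEY_BITS = 256;
    private final Map<String, StoredLoginPassword> byLogin = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    private static byte[] derive(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2WithHmacSHA256 unavailable", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    void register(String did, String login, char[] password) {
        byte[] salt = new byte[16];
        random.nextBytes(salt);
        byLogin.put(login, new StoredLoginPassword(did, salt, derive(password, salt)));
    }

    Optional<String> authenticate(String login, char[] password) {
        StoredLoginPassword stored = byLogin.get(login);
        if (stored == null) {
            return Optional.empty();
        }
        // MessageDigest.isEqual avoids an early exit on the first differing byte.
        boolean match = MessageDigest.isEqual(derive(password, stored.salt()), stored.hash());
        return match ? Optional.of(stored.did()) : Optional.empty();
    }
}
```

Returning the DID rather than the login keeps everything downstream keyed by DID, in line with the rest of the design.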
Identity-Based Application Attributes
This closely aligns with entitlements management and represents information that is bespoke to an actor and its interaction with a particular program and/or system. Things like "maximum comments to make," "dollar limit of purchases", "default screen color," etc., are well away from the core of DID management and authentication and need to be managed in a separate set of objects and persistors. They are of course still keyed by DID.
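A minimal sketch of application attributes living in their own store, assuming Java 16+. `ApplicationAttributes`, `ApplicationAttributeStore` and `mayPurchase` are hypothetical names; the point is that application logic needs only a DID, never a login or password.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-application settings, keyed by DID and stored apart from identity data. */
record ApplicationAttributes(String did, String defaultScreenColor, BigDecimal dollarLimit) {}

final class ApplicationAttributeStore {
    private final Map<String, ApplicationAttributes> byDid = new HashMap<>();

    void put(ApplicationAttributes attrs) {
        byDid.put(attrs.did(), attrs);
    }

    /** Application logic consults only this store; credentials never enter the picture. */
    boolean mayPurchase(String did, BigDecimal amount) {
        ApplicationAttributes attrs = byDid.get(did);
        // compareTo (not equals) so that 10000 and 10000.00 are treated as the same amount.
        return attrs != null && amount.compareTo(attrs.dollarLimit()) <= 0;
    }
}
```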

Overall, this design leads to increased flexibility and security, and it allows data to change without disrupting the key structure of the information architecture:

Different kinds of credentials like FingerprintCredential or ThirdPartyTokenMatcherCredential are easily constructed and integrated into the system because in the end, it all comes back to the DID and everything else in the system hangs off the DID.
Users can change email addresses, phone numbers, screen names, or pretty much anything else without disrupting the key/data structure of the system.
Multi-factor authentication schemes (soft/hard tokens, geographic, time, etc.) are simply additional input material for a credential crafted to handle them.
Login and security management can be well-factored away from per-application and, indeed, per-system code. This reduces risk in a number of obvious ways, including easing the testing of application logic.

`User`, revisited

To make this all very concrete, let's go beyond the classes and recast the User design above as a set of RDBMS tables. Assume a new user setup where the user enters bob@provider.com as an email address, plus a password. Upon completion, this is what we might see. Essential bookkeeping fields like createdOn, modifiedOn, and checksum have been omitted in places for clarity:

DID

    DID    managingDID  createDate               provenance
    A457   X111         2013-08-23T13:12:11.987  I2

The setup process confirmed that the login+password was unique in the domain and a new identity (DID) could be created.
The system process that created the DID is itself identified as X111.
It is important to understand the level of vetting and authenticity of a particular DID. Since this one was created anonymously over the internet, it was assigned provenance code I2. A DID for an employee of the site or a B2B partner might carry a different code indicating a stronger level of provenance.

IdentityAttributes

    DID    fname   lname   primaryEmail
    A457                   bob@provider.com

The email address was captured, but for an anonymous internet login, no first or last name is known. A "premium" user, or one who elects to add the information, would populate these fields at a later time.

LoginPasswordCredentials

    DID    login             EPW       prevEPW   strength
    A457   bob@provider.com  sdfklj27            1

This is the initial login+password credential set up for the user; note that the previous encrypted password (prevEPW) is blank.
This kind of credential carries strength 1. Credential strength can be used by consuming application/entitlements logic to drive features and capabilities. Other kinds of credentials may be stronger, meaning they are more securely linked to the actor to whom they were issued.
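One way consuming logic might use credential strength, assuming (as in the tables) a simple integer where higher means stronger. `AuthenticatedActor`, the `Feature` values and their thresholds are invented for illustration; they are not part of the design described here.

```java
/** Hypothetical result of a successful authentication: who, and how strongly proven. */
record AuthenticatedActor(String did, int credentialStrength) {}

/** Hypothetical feature catalogue: each feature names the minimum credential strength it needs. */
enum Feature {
    VIEW_PROFILE(1),
    CHANGE_EMAIL(2),
    TRANSFER_FUNDS(4);

    final int minStrength;

    Feature(int minStrength) {
        this.minStrength = minStrength;
    }

    /** The decision depends on the credential used, not on which actor the DID names. */
    boolean allowedFor(AuthenticatedActor actor) {
        return actor.credentialStrength() >= minStrength;
    }
}
```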

ApplicationAttributes

    DID    defaultScreenColor  dollarLimit
    A457   blue                10000.00

All application level attributes live in a table (or tables) well-removed from the core identity and credential information.
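The sign-up flow behind the tables above, turning one email+password form into rows in three separate tables, might look like the following hypothetical Java 16+ sketch. The record and service names are illustrative, maps and a list stand in for the RDBMS tables, the password hasher is injected because the hashing scheme is out of scope here, and a random UUID stands in for whatever opaque DID generator a real system uses. It checks uniqueness on login only, and ApplicationAttributes are left out.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.UnaryOperator;

record DidRow(String did, String managingDid, LocalDateTime createDate, String provenance) {}

record IdentityAttributesRow(String did, String fname, String lname, String primaryEmail) {}

record LoginPasswordRow(String did, String login, String epw, String prevEpw, int strength) {}

/** Hypothetical sign-up service: one email+password form becomes rows in three tables. */
final class SignUpService {
    static final String SIGNUP_PROCESS_DID = "X111"; // DID of the system process creating DIDs
    static final String ANONYMOUS_INTERNET = "I2";   // provenance code for anonymous web sign-up

    final Map<String, DidRow> didTable = new HashMap<>();
    final Map<String, IdentityAttributesRow> identityAttributes = new HashMap<>();
    final List<LoginPasswordRow> loginPasswordCredentials = new ArrayList<>();
    private final UnaryOperator<String> passwordHasher;

    SignUpService(UnaryOperator<String> passwordHasher) {
        this.passwordHasher = passwordHasher;
    }

    String signUp(String email, String password) {
        boolean taken = loginPasswordCredentials.stream().anyMatch(c -> c.login().equals(email));
        if (taken) {
            throw new IllegalStateException("login already in use");
        }
        String did = UUID.randomUUID().toString(); // opaque: encodes nothing about the actor
        didTable.put(did, new DidRow(did, SIGNUP_PROCESS_DID, LocalDateTime.now(), ANONYMOUS_INTERNET));
        // Names are unknown for an anonymous sign-up; the email is recorded as a contact attribute...
        identityAttributes.put(did, new IdentityAttributesRow(did, null, null, email));
        // ...and, independently, the same string becomes the login of a strength-1 credential.
        loginPasswordCredentials.add(new LoginPasswordRow(did, email, passwordHasher.apply(password), null, 1));
        return did;
    }
}
```

The same string, bob@provider.com, ends up in two unrelated roles, primaryEmail and login, and neither one is used as a key.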

To demonstrate the capability of this design, assume that Bob's engagement with the site expands over time. We might see the following:

DID

    DID    managingDID  createDate               provenance
    A457   X111         2013-08-23T13:12:11.987  I4

As a result of further vetting and engagement, the authenticity of and connection between Bob and his DID have improved to provenance level I4.

IdentityAttributes

    DID    fname   lname   primaryEmail
    A457   Robert  Jones   bob@provider.com

More detailed information about Bob is known.

LoginPasswordCredentials

    DID    login             EPW       prevEPW   strength
    A457   bob@provider.com  j3jif387  sdfklj27  1
    A457   bob@provider.com  uf8273hj            2

In this system, login+password combined is used as the test for uniqueness of credential material, not just the login. Thus, Bob can have a second credential with the same login (bob@provider.com) and a different password. This second login+password credential has strength 2. The idea here is that Bob has one login for low-security interaction and one for higher-security interaction. Bob has also changed the password on his first credential, so its previous encrypted password, sdfklj27, now sits in prevEPW.
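If stored passwords are salted, as is common practice, two credentials with the same password do not share an EPW value, so login+password uniqueness cannot be checked with a simple equality query on EPW; the candidate password has to be verified against every credential sharing the login. The Java sketch below illustrates that, assuming PBKDF2 (`PBKDF2WithHmacSHA256`) as the hashing scheme and an in-memory list as the table. `CombinedUniquenessStore` is a hypothetical name, and its rule that all credentials sharing a login must belong to the same DID is an added assumption meant to avoid the confusion mentioned earlier.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** One LoginPasswordCredentials row; salt+hash stand in for the EPW column. */
record LoginPassword(String did, String login, byte[] salt, byte[] hash, int strength) {}

/** Hypothetical store that tests uniqueness on login+password together, not on login alone. */
final class CombinedUniquenessStore {
    private final List<LoginPassword> rows = new ArrayList<>();
    private final SecureRandom random = new SecureRandom();

    private static byte[] derive(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 256); // illustrative cost
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2WithHmacSHA256 unavailable", e);
        } finally {
            spec.clearPassword();
        }
    }

    /** Salted hashes differ even for equal passwords, so each same-login row is re-derived and compared. */
    private Optional<LoginPassword> match(String login, char[] password) {
        for (LoginPassword row : rows) {
            if (row.login().equals(login)
                    && MessageDigest.isEqual(derive(password, row.salt()), row.hash())) {
                return Optional.of(row);
            }
        }
        return Optional.empty();
    }

    void add(String did, String login, char[] password, int strength) {
        // Added assumption: every credential sharing a login must belong to the same DID.
        boolean otherOwner = rows.stream().anyMatch(r -> r.login().equals(login) && !r.did().equals(did));
        if (otherOwner) {
            throw new IllegalStateException("login belongs to another DID");
        }
        if (match(login, password).isPresent()) {
            throw new IllegalStateException("login+password already in use");
        }
        byte[] salt = new byte[16];
        random.nextBytes(salt);
        rows.add(new LoginPassword(did, login, salt, derive(password, salt), strength));
    }

    /** Strength of the matching credential, or empty if authentication fails. */
    Optional<Integer> authenticate(String login, char[] password) {
        return match(login, password).map(LoginPassword::strength);
    }
}
```

The strength returned by `authenticate` tells consuming logic which of Bob's two credentials was actually presented.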

FingerprintCredentials

    DID    FPEngine  ID    FP              strength
    A457   Finger1   4552  (binary data)   4

Fingerprint credentials are also supported. Some fingerprint authenticators may not be able to match based on just the fingerprint data and may require an ID (i.e. ID is quickly looked up first, then fingerprint data is matched). We show such a setup here where Bob has been assigned ID 4552 in the Finger1 authentication engine.
Although we show the fingerprint data captured here in the FP column, more likely the Finger1 authentication engine would domicile that data, in which case the ID becomes the critical link between our system's DID-based world and the identity domain in Finger1. In the most extreme vendor-specific case, the authentication engine owns all the data and will have to capture the DID as well. Remember that a credential cannot be created without a valid DID.
The strength of this credential is 4, stronger than login+password.
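Since every credential kind resolves to a DID, supporting fingerprints amounts to adding one more credential type. A hypothetical sketch, assuming Java 17 for sealed interfaces: the record names mirror the tables, `fpData` may be null when the engine keeps the fingerprint data itself, and `strongestFor` is an invented helper showing code that works across credential kinds without caring which is which.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/** Every credential kind resolves to a DID and carries a relative strength. */
sealed interface Credential permits LoginPasswordCredential, FingerprintCredential {
    String did();

    int strength();
}

record LoginPasswordCredential(String did, String login, String epw, int strength) implements Credential {
    LoginPasswordCredential {
        Objects.requireNonNull(did, "a credential needs a DID");
    }
}

/** engineId is the key inside the vendor engine (e.g. Finger1); fpData may be null. */
record FingerprintCredential(String did, String fpEngine, String engineId, byte[] fpData, int strength)
        implements Credential {
    FingerprintCredential {
        Objects.requireNonNull(did, "a credential needs a DID");
        Objects.requireNonNull(engineId, "the engine ID links our DID to the engine's domain");
    }
}

final class CredentialQueries {
    private CredentialQueries() {}

    /** Strongest credential held by a DID, whatever its kind. */
    static Optional<Credential> strongestFor(String did, List<Credential> all) {
        return all.stream()
                .filter(c -> c.did().equals(did))
                .max(Comparator.comparingInt(Credential::strength));
    }
}
```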
