Deriving Win/Lose Streaks with MongoDB

29-Dec-2021 Like this? Dislike this? Let me know

A popular data design for outcomes of games might look like this, leaving out game-specific details and showing just 2 players:

    {"pid":"P1", "gid": 1, "date": new ISODate("2021-12-18"), "outcome":"WIN"}
    {"pid":"P1", "gid": 2, "date": new ISODate("2021-12-19"), "outcome":"WIN"}
    {"pid":"P1", "gid": 3, "date": new ISODate("2021-12-20"), "outcome":"LOSE"}
    {"pid":"P1", "gid": 4, "date": new ISODate("2021-12-20"), "outcome":"TIE"}
    {"pid":"P2", "gid": 5, "date": new ISODate("2021-12-20"), "outcome":"LOSE"}
    {"pid":"P2", "gid": 6, "date": new ISODate("2021-12-20"), "outcome":"LOSE"}
    {"pid":"P1", "gid": 7, "date": new ISODate("2021-12-21"), "outcome":"WIN"}
We wish to find the total win, total lose, and win streak and lose streak for 1 or more players. Getting the totals is easy with the $group operator:
    {$group: {_id: "$pid",
	      wins: {$sum: {$cond:{if:{$eq:["$outcome","WIN"]},then: 1, else: 0}}},
	      loses: {$sum: {$cond:{if:{$eq:["$outcome","LOSE"]},then: 1, else: 0}}}
             }}
but how do we "walk" a series of games and keep track of the number of consecutive WIN or LOSE and furthermore "hang onto" that number while hunting for longer streaks later on? This is not about using $filter; this requires carefully walking a set of outcomes and applying conditional logic similar to the following:
    current = 0
    max = 0
    for each game in array of games:
        if game is a win:
            current = current + 1
            if current > max:
                max = current
        else:  # not a win, so cannot be consecutive so reset to zero
            current = 0
The challenge is that
  1. We don't have an array of games, we have documents of games. None of the more fancy MongoDB array operators work across documents, only within documents.
  2. We cannot easily create "helper" variables
...or can we?

The Power of $reduce

The most traditional use of $reduce is to operate on an array of numeric scalar data to yield a single numeric result. The $sum function could be implemented with $reduce thusly:
    {$reduce: {input:[1,3,4,5], initialValue:0,
    // The purpose of "in" is to "return" a new single value that will be
    // assigned to $$value for the next go around:
        in: {$add:["$$value","$$this"]} }}
The $reduce operator can also use arrays of objects. The machinery works the same way; it is only the field addressing (basically, the dotpaths) that makes it more complex:
    # Notice the initial value is not 0; it is a doc with qq = 0
    {$reduce: {input:[{qq:1},{qq:3},{qq:4},{qq:5}], initialValue:{qq:0},
    #  "in" must now return a new single object with qq set to a new value:
        in: {qq: {$add:["$$value.qq","$$this.qq"]}} }}
But there is nothing that demands that $reduce use or not use fields inside the object. We exploit this to combine both "working" and "state" variables in the object. The only tricky part is that the in section must always "fill out" the complete object for delivery to the next iteration of the logic; otherwise, the state is "dropped."

The Solution

db.foo.aggregate([
    //  Filter for anything or nothing here.
    //  BEWARE of filtering for nothing because that will drag the ENTIRE
    //  DB out...
    {$match: {pid: 0}},
    
    //  With $sort, you get the start of the MOST RECENT streak where a higher
    //  gid implicitly means more recent.  You could sort by gameDate or similar
    //  too for same effect.
    //  Comment this out to get the end of the EARLIEST streak.
    {$sort: {gid: -1}}

    //  Construct array of data for $reduce.  Note the objects we $push have
    //  both "working variables" (gid and s, which are the actual data) and
    //  our "state variables" (max and curr)
    ,{$group: {_id: "$pid", outcomes: {$push: {gid: "$gid", s: "$outcome", max:0, curr:0 }} }}
    
    ,{$project: { 
	//  Easy:  Get total won and lost by simple filter on field 's':
	won: {$size: {$filter: {input: "$outcomes", as: "zz", cond: {$eq: ["$$zz.s","WIN"]}} }},
	lost: {$size: {$filter: {input: "$outcomes", as: "zz", cond: {$eq: ["$$zz.s","LOSE"]}} }},

	//  This is the MQL implementation of the pseudocode above.
        //  Also note that although s (the outcome status) is NOT necessary
        //  to pass back in the output of in, the game id, max, and curr must
        //  always be passed back, even if the value has not changed!
	winStreak: {$reduce: {
	    input: "$outcomes",
	    initialValue: {gid:0,max:0,curr:0},
	    in: {$cond: {
		if: {$eq:["$$this.s","WIN"]},
		then: {$cond: {
		    if: {$eq:["$$value.max","$$value.curr"]},
		    then: {gid: "$$this.gid", max: {$add:["$$value.max",1]}, curr: {$add:["$$value.curr",1]}},
		    else: {gid:"$$value.gid", max: "$$value.max", curr: {$add:["$$value.curr",1]}}
		}},
		else: {gid:"$$value.gid", max:"$$value.max", curr:0} 
	    }}
	}},

	loseStreak: {$reduce: {
	    input: "$outcomes",
	    initialValue: {gid:0,max:0,curr:0},
	    in: {$cond: {
		if: {$eq:["$$this.s","LOSE"]},
		then: {$cond: {
		    if: {$eq:["$$value.max","$$value.curr"]},
		    then: {gid: "$$this.gid", max: {$add:["$$value.max",1]}, curr: {$add:["$$value.curr",1]}},
		    else: {gid:"$$value.gid", max: "$$value.max", curr: {$add:["$$value.curr",1]}}
		}},
		else: {gid:"$$value.gid", max:"$$value.max", curr:0}
	    }}
	}}	
    }}

    // Who had best streak?  Get the top 3:
    ,{$sort: {"winStreak.max":-1}}
    ,{$limit: 3}
	
], {allowDiskUse:true});  // You will very likely need allowDiskUse if trying to analyze
                          // 1000s of pids with 1000s of games.

{
	"_id" : 378,
	"won" : 516,
	"lost" : 239,
	"winStreak" : {
		"gid" : 378695,
		"max" : 21,
		"curr" : 0
	},
	"loseStreak" : {
		"gid" : 378784,
		"max" : 4,
		"curr" : 0
	}
}
{
	"_id" : 844,
	"won" : 497,
	"lost" : 244,
	"winStreak" : {
		"gid" : 844367,
		"max" : 20,
		"curr" : 0
	},
	"loseStreak" : {
		"gid" : 844263,
		"max" : 6,
		"curr" : 0
	}
}
{
	"_id" : 306,
	"won" : 507,
	"lost" : 242,
	"winStreak" : {
		"gid" : 306350,
		"max" : 18,
		"curr" : 5
	},
	"loseStreak" : {
		"gid" : 306869,
		"max" : 5,
		"curr" : 0
	}
}
found 3

The interesting output fields are the player id (_id), that players longest winning streak (winStreak.max), amd the game where the streak started (winStreak.gid). winStreak.curr is a leftover from the process and is of less interest.

Like this? Dislike this? Let me know


Site copyright © 2013-2024 Buzz Moschetti. All rights reserved