Incomplete Information: A "Gamey" Discussion

Pocket aces, article reading histories, app metrics, thunderstorms before a flight - what do they all have in common? None of them tells the complete story. Each only hints at one factor of the scenario we are trying to understand. Despite our data-laden world, we are still often forced to make inferences to guide important decisions where informational symmetry is lacking.

My name is Tim and I am an engineer at Canopy working on the various technologies we are harnessing to get it right this time. The TL;DR is that we are inverting the traditional internet business model from one that relies on extracting every bit of user information into one that provides people with recourse and complete ownership over their data. You can read more about that here, here and here.

Today, we are in the final stages of launching our first consumer product that is powered by this technology. We want to give our users personalized experiences while keeping their data safe and secure on their devices. It is an incredible challenge, but I am lucky enough to work with some of the smartest people in the business to take it on.

In addition to saving the internet, one of my side-passions is poker - specifically Texas Hold’em. I recently final tabled the World Series of Poker Main Event and finished 8th out of a starting field of 8,569 players.

I have been enamored with the game of poker for many years. It has allowed me to meet fascinating people and taught me general frameworks that are applicable to other facets of life. One such framework derives assumptions about hidden state from direct observations and extrapolations. Another calculates the expected values of all possible future actions and outcomes of an event, using the output of the previous framework as its input.

For example, here is an abridged version of an in-game decision: whether to call a bet of $100 into a pot of $200 on the river (where no further cards or actions follow mine). The first framework helps derive the range (a frequency-weighted distribution of possible poker hands) that my opponent shows up with. Then, if the hand I am calling with has 60% equity versus his range, the second framework determines that the expected value of the call is +$140. That is the result of me losing $100 40% of the time and winning $300 (the pot of $200 in addition to his $100 bet) 60% of the time: 0.6 × $300 - 0.4 × $100 = $140. These concepts can be applied to any decision point in poker. Taking this one step back, if the decision to call a $100 bet comes on the turn (where there is one more community card to come), I would need to factor in all possible future outcomes to determine the expected value of a call. Given the information we do know, which is my two cards along with the four exposed community cards in a deck of fifty-two, there are forty-six possible river cards. For each distinct card, we would need to determine our expected value given our opponent’s potential strategy. Those expected values can then be averaged to determine the “actual” value of my turn call, or of whichever strategy is chosen.
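The arithmetic above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names are made up for this example, `pot` is taken to exclude the opponent's bet (as in the $200 pot above), and the per-river values in the turn example are hypothetical placeholders rather than real solver output.

```python
def river_call_ev(equity, pot, bet):
    """EV of calling `bet` on the river; `pot` excludes the opponent's bet."""
    win_amount = pot + bet   # we collect the pot plus the opponent's bet
    lose_amount = bet        # we lose only the chips we put in to call
    return equity * win_amount - (1 - equity) * lose_amount


def break_even_equity(pot, bet):
    """Equity at which the call has an expected value of exactly zero."""
    return bet / (pot + 2 * bet)


def turn_call_ev(river_evs):
    """Average the EV over every unseen river card, each equally likely."""
    return sum(river_evs) / len(river_evs)


# River example from the text: 60% equity, $200 pot, $100 bet.
print(round(river_call_ev(0.60, 200, 100), 2))

# Turn example with hypothetical per-river EVs: 10 good cards, 36 bad ones.
hypothetical_river_evs = [250.0] * 10 + [-100.0] * 36
print(round(turn_call_ev(hypothetical_river_evs), 2))
```

With a $100 bet into a $200 pot, `break_even_equity` comes out to 25%, so any hand with more equity than that against the assumed range is a profitable call. In a real turn decision, each per-river value would itself depend on the betting that follows, which this sketch ignores.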

Care must be taken to avoid badly misaligned assumptions, which brings up my favorite adage: "garbage in, garbage out". As one traverses further up the game tree, the number of required assumptions and possible decision points grows exponentially, so poor range assumptions can induce massive errors that suggest incorrect play once information becomes known as the game progresses.

Given the initial example, one might ask: how do I know I will beat 60% of his range? Well...frankly I do not know in any absolute sense. The practice of assigning ranges in any poker situation is primarily based on observed data points, or in machine learning terms, features (distinct, measurable properties). In poker, these can be anything from game-external features such as what the opponent is wearing and their posture, to game-internal features like the frequency at which they raise preflop (that is, before any community cards come out). Using all the data I have, such as information I have read, players I have faced in the past, and all of my hand histories, I am able to correlate and identify patterns among groups of players to assess the hidden information: their potential range of hands. As in machine learning, though, a feature's correlation with the output can be anywhere from extremely high to almost nonexistent.
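One simple way to picture how an observation reshapes a range is a Bayes' rule update. The sketch below is an assumption made for illustration, not a description of how ranges are actually built at the table: the three hand categories, the prior weights, the likelihoods of a river bet, and the equities are all hypothetical numbers.

```python
def update_range(prior, likelihood):
    """Reweight a range by how likely each hand type is to take the observed action."""
    unnormalized = {hand: prior[hand] * likelihood[hand] for hand in prior}
    total = sum(unnormalized.values())
    return {hand: weight / total for hand, weight in unnormalized.items()}


def equity_vs_range(range_weights, equity_vs_hand):
    """Frequency-weighted average of our equity against each hand type."""
    return sum(range_weights[hand] * equity_vs_hand[hand] for hand in range_weights)


# Hypothetical prior range before the opponent bets the river.
prior = {"value": 0.30, "medium": 0.40, "bluff": 0.30}
# Hypothetical chance that each hand type makes this bet.
bet_likelihood = {"value": 0.90, "medium": 0.10, "bluff": 0.60}
# Hypothetical equity of our hand against each hand type.
our_equity = {"value": 0.0, "medium": 1.0, "bluff": 1.0}

posterior = update_range(prior, bet_likelihood)
print({hand: round(weight, 3) for hand, weight in posterior.items()})
print(round(equity_vs_range(posterior, our_equity), 3))
```

The resulting equity is only as good as the prior and the likelihoods fed in, which is the "garbage in, garbage out" problem in miniature.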

To apply this concept to what we are working on at Canopy, consider that the primary purpose of a good recommender system is to delight the user with something new. Instead of maximizing the currency or chips won in poker, we can apply the same frameworks to user delight. Given values for various degrees of delight (quantifying delight is a whole topic for another blog post), we can then use features that we believe are relevant, like article genre, read time, sentiment, publication format, etc., to make assumptions about unread articles, similar to how I do with a poker player’s range. Those assumptions can then be fit into a model used to calculate and maximize expected delight.
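As a toy illustration of "expected delight", the sketch below ranks unread articles by a probability-weighted delight score. Everything in it is assumed for the example: the outcome labels, the delight values, the article names and the predicted probabilities are hypothetical, and it is not a description of Canopy's actual recommender.

```python
# Hypothetical value assigned to each possible reaction to an article.
DELIGHT_VALUES = {"skipped": -1.0, "skimmed": 0.5, "loved": 3.0}


def expected_delight(outcome_probs, values=DELIGHT_VALUES):
    """Probability-weighted delight for one article."""
    return sum(prob * values[outcome] for outcome, prob in outcome_probs.items())


def rank_articles(predictions):
    """Order article names from highest to lowest expected delight."""
    return sorted(
        predictions,
        key=lambda name: expected_delight(predictions[name]),
        reverse=True,
    )


# Hypothetical model output: the chance of each reaction per unread article.
predictions = {
    "daily news brief": {"skipped": 0.2, "skimmed": 0.7, "loved": 0.1},
    "long-read on cryptography": {"skipped": 0.5, "skimmed": 0.1, "loved": 0.4},
}

for name in rank_articles(predictions):
    print(name, round(expected_delight(predictions[name]), 2))
```

Note that the riskier article ranks first here even though it is more likely to be skipped, for the same reason a call can be correct when it loses 40% of the time: the payoff when it lands outweighs the misses.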

As the classic finance adage goes: past performance is not indicative of future results. Metrics we collect to drive business decisions only measure what has happened, not what will happen. For a startup like Canopy, it is common to encounter forks in the road where we have to put forth all effort towards a specific project or goal. With the finite information we have, we need to predict the value and likelihood of success for the range of all possible projects we could be working on. The frameworks from poker are highly applicable, as they can help evaluate which projects deliver the highest return on investment. However, delivering an internet experience that the world deserves has some value that approaches infinity for us. That means as long as the possibility that we succeed is greater than 0%, projects that align with a better internet will always be our top priority.

Going one step further, I have always wondered about taking these frameworks to the extreme and optimizing life around them. For example, if I had a flight at 9pm, when should I leave for the airport? Well, I would need to establish a baseline value for my time, then observe features like weather and time of day to gauge the hidden elements: the possibility that there is traffic, the expected length of the security line, etc. On the other side, I would need to determine the loss in value from missing the flight, whether that is re-booking fees or more of my personal time. I would then tweak that model until each extra minute I leave later no longer gains me value, because of the higher chance of the negative outcome of missing my flight. This subculture of hyperrational optimization is endless. Other potential routine optimizations range from the popular "should I speed on the highway to save time" (risk of monetary loss from tickets and a higher accident rate vs. value gained from time saved) to the traditionally emotionally driven "what presents should I get for my significant other" (risk of an upset partner vs. value from saving more money), but I digress.
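The airport question can be sketched as a small cost minimization. The model below rests on loud assumptions: door-to-gate time is treated as normally distributed, and the value of a minute, the cost of a missed flight, and the mean and spread of the trip are hypothetical inputs that weather and time of day would shift in practice.

```python
import math


def miss_probability(buffer_min, mean_min, sd_min):
    """P(trip takes longer than the buffer), assuming normally distributed trip time."""
    z = (buffer_min - mean_min) / sd_min
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - cdf


def expected_cost(buffer_min, value_per_min, miss_cost, mean_min, sd_min):
    """Cost of the time spent on the buffer plus the expected cost of missing the flight."""
    time_cost = buffer_min * value_per_min
    return time_cost + miss_probability(buffer_min, mean_min, sd_min) * miss_cost


def best_buffer(value_per_min, miss_cost, mean_min, sd_min, max_buffer=300):
    """Whole-minute buffer before the gate cutoff that minimizes expected cost."""
    return min(
        range(max_buffer + 1),
        key=lambda b: expected_cost(b, value_per_min, miss_cost, mean_min, sd_min),
    )


# Hypothetical inputs: time worth $0.50/min, $400 to rebook, 90 +/- 20 minute trip.
buffer_clear_night = best_buffer(0.5, 400.0, 90.0, 20.0)
# A thunderstorm might raise both the mean and the spread of the trip time.
buffer_stormy_night = best_buffer(0.5, 400.0, 110.0, 35.0)
print(buffer_clear_night, buffer_stormy_night)
```

The minimum sits where one more minute of buffer costs as much in time as it saves in miss risk, which is the stopping point described above.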

For more reading there is a bunch of fascinating research regarding work done on poker and video games.

I would like to thank all the excellent folks here, not only for allowing me to extend my 1.5 week vacation into a 3 week one and supporting my (degenerate) endeavors, but also for all their hard work in making the internet a better place. I am excited for the future, where we can all expect a "private-by-default" attitude from all the services we (as consumers) engage with. Then, for once, we will have complete information about what our personal data is being used for.

Published by:
Timothy Su
Date:
August 21, 2019