How Data Card Quality Impacts AdOps


One of the great strengths of NextMark is that it allows media planners and AdOps teams to develop strategies based on a large database of crowdsourced Data Cards.  The crowdsourced Data Cards create a competition among list brokers, data compilers, and list managers to present the most complete Data Cards, as is evidenced by the Data Card Quality ratings that NextMark maintains.  Data Card Quality ratings provide a simple scoring system that allows a media planner or list buyer to see the relative investment of time each list manager or data compiler has put into the crowdsourced platform.

[Figure: NextMark Data Card Summary.  NextMark’s summary of how Data Cards are rated.]


Data Cards are used by media planners in selecting list vendors and developing target audiences for a given campaign.  They have become an essential tool for database marketing in that they underpin cost and response projections.  The challenge, unfortunately, is that the quality of the Data Card (which is essentially an advertisement for a product) does not reflect the quality of the product being advertised.  As NextMark points out in a blog post, it is easier to maintain a database of Data Cards than it is to maintain (for example) a compiled database of responsive contacts.

That gap between what Data Cards present and what the companies that post Data Cards can deliver is a classic problem of the commons.  Nothing in the Data Card itself gives the media planner or list buyer an objective way to know what methods of data modeling or data compilation a given list manager uses to support the list description and universe quantities the card presents.  In other words, if a list manager presents a “compiled” or “modeled” list of beanie baby enthusiasts consisting of 12 million postal addresses, phones, and email addresses, there is nothing within the information presented by the card that tells anyone how accurately the list is compiled or how well it is modeled.  Since the competition to capture sales among list managers is very intense, the incentive to oversell one’s data modeling practice or data compilation methods can be very strong.  This leads to lost ROI for media planners and list buyers, even though the Data Card itself seems to have met all of the formal requirements of NextMark’s 13-point Data Card quality scoring methodology.

Maximizing Cash Flow In Decisions Under Risk

In a series of blog posts I want to demonstrate using RapidMiner vs RStudio to explore the topic of maximizing cash flow in decisions under risk.  RapidMiner and RStudio are tools that an analyst might use as alternatives to Excel.

The type of problem I will be considering is whether or not you should play a certain type of game, and more generally how to answer questions of that kind (should I play this game, i.e. risk money to win money?).  Such questions can generally be formulated as follows:

An urn is filled with 99 black balls and 1 white ball.  If you draw a black ball out of the urn you will lose $10000 and if you draw a white ball out of the urn you will win $1000000.

Thus your decision table looks like this:

Outcome       Probability    Play          Don't play
Black ball    99/100         -$10000       $0
White ball    1/100          +$1000000     $0

Your expected return on a game such as this is going to be:

(1/100)*1000000 - (99/100)*10000 = 10000 - 9900 = 100

In other words, you can expect to end up $100 richer by playing the game, so from the point of view of net cash flow you should certainly choose to play.  Most people, on the other hand, would see the 99% chance of losing $10000 as a strong reason not to play.
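The arithmetic above can be checked with a few lines of Python.  This is only a minimal sketch: the names urn_game and expected_value are my own labels for illustration, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

# Outcomes of the urn game from the text: (probability, payoff in dollars).
urn_game = [
    (Fraction(99, 100), -10_000),     # black ball: lose $10000
    (Fraction(1, 100), 1_000_000),    # white ball: win $1000000
]

def expected_value(outcomes):
    """Probability-weighted average payoff of (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)

ev = expected_value(urn_game)
print(ev)  # 100
```

The same function works for any game that can be written as a list of probabilities and payoffs, as long as the probabilities sum to 1.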

It makes sense, therefore, to introduce what is known as a “utility function” that addresses the attractiveness of the $100 given the risk of $10000 in the preceding example.  There are two important aspects of utility functions in a context such as this:

1.  More money is always better than less money
2.  The more money you already have, the less attractive small increases that require large risks seem

In other words, if you already have $1000 then risking it all for $100 is not as attractive as the scenario where you only have $0.50 and you risk it all for a $0.05 payout.

That sort of reasoning is how the concept of risk can be introduced into a mathematical discussion of decision making.  It is easy to see that there is a certain point where you would risk $1000 for a chance to gain, for example, $1100 more over and above the $1000 (so that you end up with $2100) or even that you would risk $1000 for a chance to gain just $500 more (so that you end up with $1500).

Risk tolerance is determined empirically: you find the value of the probability p at which you are indifferent between the following two options, and then calculate the expected return of the game at that value of p:

1.  Certainly keep your $100 (don’t play)
2.  Buy a lottery ticket where you have a probability p of winning $1000000 and a probability 1-p of losing $10000 (i.e. the lottery ticket costs you $10000).

The value of that probability p is different for different people.  Someone whose value of p makes the expected return from option 2 less than $100 is said to be “risk seeking”.  Someone whose value of p makes the expected return of option 2 higher than $100 is said to be “risk averse”.  To make the same point again in concrete terms: if you are indifferent between keeping your $100 and buying a lottery ticket whose expected value is only $90, you would be considered a risk seeker.  If you would only buy the lottery ticket once its expected value reaches $110, you are indifferent only when the gamble pays more on average than the sure amount, and thus you are considered risk averse.
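To make the $90 and $110 cases concrete, here is a small Python sketch that converts an expected value into the win probability p that produces it, and then labels the decision maker.  The function names are hypothetical, and the “risk neutral” label for the exact break-even case is my addition (the text above only names the other two cases).

```python
WIN = 1_000_000   # prize if the ticket wins (from the text)
LOSS = 10_000     # ticket price, lost if it does not win (from the text)
CERTAIN = 100     # sure amount kept under option 1 (from the text)

def lottery_expected_value(p):
    """Expected dollar return of option 2 for win probability p."""
    return p * WIN - (1 - p) * LOSS

def probability_for_expected_value(target):
    """Win probability at which option 2 has the given expected value."""
    return (target + LOSS) / (WIN + LOSS)

def classify(indifference_p, tol=1e-6):
    """Label a decision maker by the expected value at their indifference p."""
    ev = lottery_expected_value(indifference_p)
    if ev < CERTAIN - tol:
        return "risk seeking"
    if ev > CERTAIN + tol:
        return "risk averse"
    return "risk neutral"  # label added here; the text does not name this case

for target in (90, 100, 110):
    p = probability_for_expected_value(target)
    print(f"expected value ${target}: p = {p:.6f} -> {classify(p)}")
```

Note how close together the three probabilities are: with a prize this large, a tiny shift in p moves the expected value across the $100 line.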

A major issue for many businesses is whether or not the manager calculating the utility function is using her own personal value of the probability p as opposed to using a value for p that is set by the business.  In many cases she will have a higher tolerance for risk than the business does.  Another problem with such calculations is that a business owner will often have a very high tolerance for risk personally and incentivize his employees to also have a high tolerance for risk when in fact the business cannot afford it.  Because of these pitfalls, many analysts use standardized functions for calculating utility.  In particular, many analysts use what is called the “exponential utility function”:

U(x) = 1 - e^(-x/r)

In this function x is the dollar amount won or lost and r is the risk tolerance.  The more risk averse the manager or business using the function is, the smaller the value for r should be.
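A quick way to get a feel for this function is to evaluate it at a few dollar amounts.  The sketch below is only illustrative: the value r = 1000 is an arbitrary assumption chosen to make the numbers easy to read, not a recommended setting.

```python
import math

def exponential_utility(x, r):
    """U(x) = 1 - e^(-x/r) for a dollar amount x and a positive parameter r."""
    if r <= 0:
        raise ValueError("r must be positive")
    # -expm1(t) equals 1 - e^t and stays accurate when x/r is very small.
    return -math.expm1(-x / r)

r = 1_000  # arbitrary illustrative value (an assumption, not from the text)

# Property 1: more money is always better (utility rises with x).
for x in (-100, 0, 100, 200):
    print(x, round(exponential_utility(x, r), 5))

# Property 2: the same $100 gain adds less utility the more you already have.
first_gain = exponential_utility(100, r) - exponential_utility(0, r)
second_gain = exponential_utility(200, r) - exponential_utility(100, r)
print(first_gain > second_gain)  # True
```

The function also treats a $100 loss as larger in magnitude than a $100 gain, which is how it encodes reluctance to gamble.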

The value of r can be set using a lottery comparison like the one above: find the value of r at which you or the business is indifferent between these two options:

  1. Receive $0
  2. Accept a 50-50 chance of winning r dollars vs losing r/2 dollars
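It is worth checking how well the two options above line up under the exponential utility function itself.  The sketch below computes the expected utility of the 50-50 gamble and compares it with the utility of receiving $0.  The r values are arbitrary assumptions; the result is the same for any positive r because only the ratios x/r enter the formula.

```python
import math

def exponential_utility(x, r):
    """U(x) = 1 - e^(-x/r)."""
    return -math.expm1(-x / r)

def gamble_expected_utility(r):
    """Expected utility of a 50-50 chance of winning r versus losing r/2."""
    return 0.5 * exponential_utility(r, r) + 0.5 * exponential_utility(-r / 2, r)

for r in (1_000, 640_000):  # illustrative values of r (assumptions)
    gamble = gamble_expected_utility(r)
    sure_thing = exponential_utility(0, r)
    print(f"r = {r}: gamble = {gamble:.4f}, receive $0 = {sure_thing:.4f}")
```

The gamble comes out slightly below zero (roughly -0.008), so the indifference is approximate rather than exact, which is fine for a rule of thumb used to elicit r.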

The value for r is generally set empirically for types of businesses or estimated for a particular business.  For example, according to the book Data Analysis and Decision Making, a widely accepted value is 6.4% of net sales.  So if you have net sales of $10000000 then your r value would be $640000, and it is easy to calculate that U(100) is approximately .00016 and U(25000) is approximately .0383.
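The figures in this example can be reproduced as follows.  The constant and function names are my own; the 6.4% rule and the $10000000 net sales figure come from the paragraph above.

```python
import math

RISK_TOLERANCE_SHARE = 0.064  # 6.4% of net sales, as cited in the text

def risk_tolerance(net_sales):
    """Rule-of-thumb r for a business with the given net sales."""
    return RISK_TOLERANCE_SHARE * net_sales

def exponential_utility(x, r):
    """U(x) = 1 - e^(-x/r)."""
    return -math.expm1(-x / r)

r = risk_tolerance(10_000_000)          # about 640000
u_100 = exponential_utility(100, r)     # about 0.000156
u_25000 = exponential_utility(25_000, r)  # about 0.0383

print(f"r = {r:.0f}")
print(f"U(100) = {u_100:.6f}")
print(f"U(25000) = {u_25000:.4f}")
```

Because both amounts are small relative to r, the utilities are close to x/r, which is why $25000 is worth nearly (but not quite) 250 times as much utility as $100.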

My interest in the next few blog posts is to use RapidMiner and RStudio to explore cases where the decision that maximizes cash flow is or is not optimal from the point of view of the exponential utility function, and to look at how one might know that the decision that maximizes cash flow isn’t necessarily the rational decision to make.
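As a preview of that kind of comparison, here is a Python sketch that evaluates the urn game from earlier both ways.  It assumes r = 640000, i.e. the hypothetical business with $10000000 in net sales from the example above; a different r could change the conclusion.

```python
import math

R = 640_000  # assumed r: 6.4% of $10000000 net sales, as in the example above

def exponential_utility(x, r=R):
    """U(x) = 1 - e^(-x/r)."""
    return -math.expm1(-x / r)

# The urn game from the text: (probability, payoff in dollars).
urn_game = [(0.99, -10_000), (0.01, 1_000_000)]

expected_cash = sum(p * x for p, x in urn_game)
expected_utility = sum(p * exponential_utility(x) for p, x in urn_game)
utility_of_not_playing = exponential_utility(0)

print(f"expected cash flow: {expected_cash:.2f}")
print(f"expected utility of playing: {expected_utility:.4f}")
print(f"utility of not playing: {utility_of_not_playing:.4f}")
print("play" if expected_utility > utility_of_not_playing else "do not play")
```

Under this assumed r the expected cash flow is positive while the expected utility of playing is negative (roughly -0.0077), so the cash-flow criterion says to play and the utility criterion says not to.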