Eric Tang's Blog

Startups, Software, Everything Wildcard

Intent and the Internet

I’ve been thinking about intent lately. In the world of “big data”, “sentiment analysis”, and “behavioral marketing” (blah blah blah…), “we use intent to drive user behavior change” is the party line. It’s a shame that “driving intent” is this black box that “our data scientists created” and no one else understands. What does it really mean in the context of the internet we live in today?

To Lay the Groundwork

On the most fundamental level, every small action we take on a webpage is triggered by an intent. These are not “want to buy a car” or “want babies” intents. These are more like “click on name link”, “navigate to home page”, “sign up”, etc. Micro-view, super short-term, knee-jerk reaction type of stuff. From that we can define an “Intent” as “the desire to perform an action”, which really is the action itself plus some metadata (like time, location, etc.).
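To make that concrete, here’s a minimal sketch of what a simple-intent record could look like. The field names and values are illustrative, not from any real system:

```python
# A minimal sketch of a "simple-intent" record: the action itself plus
# metadata. Field names and values here are invented for illustration.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SimpleIntent:
    action: str          # e.g. "click_name_link", "navigate_home", "sign_up"
    timestamp: datetime  # when the action happened
    location: str        # coarse location, e.g. "San Francisco, CA"
    page_url: str        # the page the action occurred on

event = SimpleIntent(
    action="sign_up",
    timestamp=datetime(2014, 1, 15, 9, 30),
    location="San Francisco, CA",
    page_url="https://example.com/signup",
)
```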

Now that’s quite a simplistic, lower-value view of Intents. What people really want is to derive “insights” from “intents”: the more “meta” intents people talk about in the advertising world (“want to buy a BMW”, “want to go to a Daft Punk concert”, etc.).

Let’s call the more insightful intents “meta-intents” and the lower-level action representations “simple-intents”.

The Problem Of Predicting Intent

So what does it take to arrive at “meta-intents” from a bunch of recorded “simple-intents”? Translated into “data science” terms, the question becomes: “Given a series of prior simple actions, how likely is a person to take a particular action I care about in the immediate future?”

Statisticians have long studied this problem. Countless research studies have been conducted, ranging from brand loyalty and purchase behavior to World of Warcraft and condom usage. Many regressions, latent models, and collaborative filters later, they all have one thing in common: the data has to be high-signal and low-noise. Put another way, we have to know all of the relevant actions and, at the same time, filter out actions that are not relevant.
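As a toy illustration of that framing (not anyone’s production pipeline), you could treat each user’s recent simple-intents as a bag-of-actions feature vector and fit a simple classifier against whether they took the target action. All the action names and data below are made up:

```python
# A toy framing of "given prior simple actions, how likely is the target
# action?" as binary classification. Everything here is illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each user's history, as counts of simple-intents observed so far.
histories = [
    {"view_car_review": 3, "search_dealer": 1},
    {"click_name_link": 2, "navigate_home": 5},
    {"view_car_review": 1, "search_dealer": 2, "request_quote": 1},
]
# Whether each user then took the action we care about (e.g. bought a BMW).
took_action = [1, 0, 1]

vectorizer = DictVectorizer()
X = vectorizer.fit_transform(histories)
model = LogisticRegression().fit(X, took_action)

# Probability that a new user with this history takes the action next.
new_user = vectorizer.transform([{"view_car_review": 2, "navigate_home": 1}])
print(model.predict_proba(new_user)[0, 1])
```

With enough labeled histories, the learned coefficients hint at which simple-intents actually carry signal for the target action, which is exactly where the high-signal, low-noise requirement bites.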

To put it in an example: in a cookie-centric world, an advertiser typically has about 10%–25% of the data. This means if I visited 100 sites this morning, they know about 10 to 25 of them. How high of a signal is that? How accurately can they predict what I’m trying to do?
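And it gets worse when a meta-intent only reveals itself as a chain of actions. A back-of-the-envelope calculation (assuming, simplistically, that each action is observed independently) shows the odds of seeing the whole chain shrink geometrically with coverage:

```python
# Back-of-the-envelope: chance of observing an entire k-step path when
# each step is seen independently with probability `coverage`.
# The independence assumption is a deliberate simplification.
for coverage in (0.10, 0.25):
    for k in (1, 3, 5):
        print(f"coverage={coverage:.0%}, path length={k}: "
              f"{coverage ** k:.3%} chance of seeing the full path")
```

At 10% coverage, a three-step path is observed in full only about 0.1% of the time.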

In Real Life

Simple-intents and meta-intents inform one another. We can derive simple-intents from meta-intents to guide users down specific paths, and we can observe users’ simple-intents to derive meta-intents. But these two operations require different amounts of effort. “Meta-to-simple” can be achieved almost automatically (mostly driven by algorithms); “simple-to-meta” is much more manual (data scientists or analysts validating assumptions on top of Hadoop clusters). Oh, and by the way, it’s not obvious what those meta-intents are, so the assumptions are hard to make. Think of the book Freakonomics.
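Here’s a hypothetical sketch of the asymmetry: “meta-to-simple” can be as crude as a hand-authored mapping from a meta-intent to the simple-intents you’d expect it to produce, while running it in reverse with a naive scorer barely scratches the surface of real “simple-to-meta” work. The intents and weights below are invented:

```python
# A hand-authored meta-to-simple mapping, plus a naive scorer that runs
# it in reverse. The intents and weights are invented for illustration.
META_TO_SIMPLE = {
    "wants_to_buy_bmw": {"view_car_review": 2.0, "search_dealer": 3.0,
                         "request_quote": 5.0},
    "wants_daft_punk_tickets": {"view_concert_page": 2.0,
                                "search_ticket_prices": 4.0},
}

def score_meta_intents(observed_actions):
    """Naive simple-to-meta inference: sum the weights of observed actions.

    Real simple-to-meta work is far messier -- it's the part that takes
    analysts and Hadoop clusters, not a ten-line function.
    """
    scores = {}
    for meta, simples in META_TO_SIMPLE.items():
        scores[meta] = sum(w for action, w in simples.items()
                           if action in observed_actions)
    return scores

print(score_meta_intents({"view_car_review", "search_dealer"}))
# {'wants_to_buy_bmw': 5.0, 'wants_daft_punk_tickets': 0}
```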

From a cold start, you don’t have enough data to study any patterns of simple-intents, so “meta-to-simple” is the only approach. There might still be room to use public data sets to generate meta-intents, but that’s a limited window, since data science becomes commoditized over time. A data-driven product needs enough data science DNA in-house to continually experiment with new assumptions; that’s how you build true advantages that no one else can copy.

There are a lot more topics to think about, like personal profiles and short-term vs. long-term intent. We’ll talk about those in another post.