UI Design: Using Machine Learning Thoughtfully

There are vast amounts of data available right now, and if you have any clickstream tracking on your website or app, chances are you are using analysis, data science, or machine learning to try to make sense of it all. This is great! Whatever level of analysis you are using, moving to a data-informed culture is a great way to improve your product and the experience for your users.

That said, it is important to highlight some potential pitfalls of leaning too heavily on machine learning to decide how to design your product, and ways to work around them.

Machine learning algorithms are black boxes, which may lead to non-intuitive UIs

One potential risk of using a machine learning algorithm as the sole input to a UI design is ending up with a non-intuitive UI. Users have learned how to use technology to do various tasks. For example, when you see a blue underlined piece of text, you expect it to be a link leading somewhere (and are frustrated when you click it and realize it is not!). Much of this learning has happened in the background over our lives on computers and online, and it has left us with mental models of how we expect certain systems to work.

If machine learning algorithms are used as the sole input to a UI design, the result may not fit the user’s model of how the system should work. In a 14-person diary study, Nielsen Norman Group found that when people interacted with systems built on machine learning algorithms, they had weak mental models and difficulty making the UI do what they wanted.

Part of this is because most machine learning algorithms are black box models: we put data in, we get outputs out… and we don’t necessarily know what happened in the middle to produce those outputs. Further, in most cases, users do not necessarily know which pieces of their data are being used to personalize their experience. Give their full article a read; I found it had some extremely valuable ideas.

Even with this in mind, I think that machine learning can still provide valuable information to help inform UI decisions. The trick is to use machine learning to inform the decisions and then add a layer of human experience. For example, supplement machine learning analyses with user interaction, like Nielsen Norman Group did in their diary study, or do user testing where possible. And before doing anything radical that provides a vastly different UI experience based on machine learning findings, make sure you know why the rules were in place in the first place, so you know the new alternative still provides what users expect that experience to provide. “Learn the rules like a pro, so you can break them like an artist.”

Data does not make us mind readers: We do not know why users did what they did or even what they were trying to do

When we do any type of analysis looking at user behavioural data, we see what they did through their clickstreams. We do not see what they were trying to do.

So while we may notice that a particular element in a mobile app doesn’t get engagement, or that another element like sharing gets lots of clicks but no completions, we don’t know what the user was trying to do. Were they expecting clicking share to do something else? Are they not engaging with the element because they don’t know what it means or that the app can even do that?

This is where user testing can reveal how real-life users intend to use your website or app. Another thing you might consider is sifting through reviews of your product: if someone is really unhappy and couldn’t make it do what they expected it to do, you can bet someone somewhere is complaining about it.

The model must be trained on historical data, which tends to perpetuate the status quo

A machine learning model must be trained, and to train it you need data. That data is inherently going to be historical in nature (oh, what I wouldn’t do to have a DeLorean sometimes). It will be a snapshot of your current UI, and if you look solely at what is happening on it, you may find yourself perpetuating the status quo. In less extreme cases this just means leaving some money on the table, but in the worst cases it can mean getting stuck in unfortunate discriminatory equilibria we as a society may be trying to get out of (which I won’t get into here and will save for another blog post).

Let’s say that you have a web page with a giant green “Order Now” button in the middle of the screen. This accounts for 25% of order funnel starts, which is higher than any of the other entry points. Therefore, some might be hesitant to change anything about this: why risk breaking a good thing?

However, what if, unbeknownst to your team, some completely different page layout that renames the button, changes its colour, and moves its location could lead to an even higher click rate and order completion rate?

Of course, it’s probably not going to surprise some of you that I’m now going to bring up A/B testing. A way around this pitfall of historical data is to start doing A/B or multivariate testing: create an alternative web page layout with incremental changes, show it to a randomized percentage of your customers, and compare the results of A vs. B.

A/B testing allows us to think “What if” and see what might happen. It helps us get to causality.
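
To make that comparison concrete, here is a minimal sketch of how the results of A vs. B might be checked for statistical significance, using a standard two-proportion z-test. The visitor and conversion counts are purely hypothetical.

```python
# Minimal A/B comparison sketch: two-proportion z-test on funnel-start rates.
# The visitor and conversion counts below are hypothetical.
from math import sqrt
from statistics import NormalDist

visitors_a, starts_a = 10_000, 2_500   # current layout (25% of visitors start the funnel)
visitors_b, starts_b = 10_000, 2_650   # alternative layout

rate_a = starts_a / visitors_a
rate_b = starts_b / visitors_b

# Pooled rate under the null hypothesis that the two layouts perform the same.
pooled = (starts_a + starts_b) / (visitors_a + visitors_b)
se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))

z = (rate_b - rate_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.3f}")
```

If the p-value is small, the difference between the layouts is unlikely to be random noise, and it is the randomization itself that lets us read that difference causally.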


So while machine learning might unveil interesting insights and recommendations, sometimes you will want to use A/B testing to confirm that your causal model of what is happening is really what is happening, or to ensure that you are not stuck implementing only the changes that historical data can suggest. Shake it up!

Conclusion

Machine learning of clickstream data is just one tool you have at your disposal to inform your design.

Other possibilities include:

  • User testing
  • A/B or multivariate testing
  • Reading negative reviews
  • Interviewing product users
  • Examining competitor products
  • Thinking through mental models of how systems work

Good Data Communication and Tom Clancy’s Jack Ryan

[Note: This post contains spoilers for the first episode of Tom Clancy’s Jack Ryan]

A TV show where the lead character is a SQL-using analyst and economist? Count me in! But it looks like it will mostly be a wartime-heavy action political thriller, so I probably won’t be as into it as Moneyball. I watched the first episode, though, and I think it’s a decent demonstration of some good and some bad data communication.

JACK: Uh, Jack Ryan. I work Yemen. I’ve been monitoring SWIFT network transactions in and around Aden.

GREER: And?

JACK: And Um Well, actually, in the last few months, I have red-flagged several of these transactions as potentially suspicious.

GREER: Suspicious? How so?

JACK: It’s anomalous to see large, one-off SWIFT transactions to individuals, especially in Yemen. And normal SWIFT transactions usually occur in patterns.

GREER: What’s your concern? What’s at issue?

JACK: Um, it is my theory that the individual behind these transactions could be a high-level target.

GREER: Hmm. Which one?

JACK: Well, now, that I’m not [CHUCKLES] – necessarily clear –

GREER: Who is he?

JACK: I believe his name is Suleiman. It means “man of peace.” He just popped up on CTC Yemen’s radar. The RH assets have mentioned him as well

GREER:  That’s it? They heard a name. What else have they said?

RYAN:  It’s not about what they’re saying, sir, it’s about how they’re saying it. I mean, they’re talking about this guy with real reverence, and I’m not talking sectarianism. I mean, he appeals to Shia and Sunni.

GREER: Wow. A brand-new bin Laden on my first day. Who would have thunk it? [LAUGHTER] So how come you’re the only one that knows about this mystery man?

RYAN: Well, one of the difficulties in cobbling together intel is dealing with two databases – that aren’t meant to talk to one another.

GREER: Mm-hmm.

RYAN: That’s why I’ve actually written a custom SQL query…

GREER: Next.


[ELEVATOR BELL CHIMES]

PATRICK: SQL query. Seriously? You went there, Ryan?

TAREK: Ops guys like maps and graphics. You should try putting stick figures in your reports next time.

Source: https://www.springfieldspringfield.co.uk/view_episode_scripts.php?tv-show=tom-clancys-jack-ryan-2018&episode=s01e01

 

Working in data can be hard: how do you get people to see and trust the data the way you, the analyst, have intimately come to trust it?

Jack Ryan is a financial analyst working for the CIA, using complex SQL to make “two databases – that aren’t meant to talk to one another”, well, talk to one another.

Jack Ryan’s colleagues critique how he introduces himself and the data he has been working on to their new group chief, James Greer. And while it’s true that maps and graphics can go part of the way toward getting people’s attention, Jack Ryan actually did something right that shows how to get people listening when sharing your data analysis.

What did Jack Ryan lead with? He led with the ‘why’: the big clincher of what his SQL and his monitoring of SWIFT network transactions all meant, a potential high-level target, ‘Suleiman’. This kind of thing gets people listening. All too often, after working on the detailed nitty-gritty of the data for days, weeks, or even months, we as analysts forget to step back and see the big picture: how does this fit into the company’s goals?

Additionally, Fishtown Analytics suggests that “In order to disseminate factual knowledge, it is insufficient to simply disseminate data. Factual knowledge must include the data themselves as well as the knowledge about how those data were produced.” Jack Ryan includes how he got the data (“monitoring SWIFT network transactions in and around Aden”) and even provides perhaps too much detail (“I’ve actually written a custom SQL query”). Know your audience: give them the high-level background of how your data was obtained so they are more likely to trust it, and give details if they are interested and technically minded enough to understand them.

Another part of the ‘why’ is a recommendation. Recommendations can get the conversation rolling and get people thinking about how they can use the data at their disposal. Jack Ryan does this later in Greer’s office by recommending they freeze Suleiman’s account. Greer disagrees. He doesn’t communicate well about what he is doing with the insight, though, which leads to Jack Ryan going behind his back and freezing the account anyway.

Communication and trust in a team are important. Maybe the decision maker doesn’t do exactly what you suggest, but at least after you share the ‘why’, the data details, and a potential recommendation, they are listening and looking at the data you provide. Jack Ryan was presumptuous and very confident in his abilities, but he hadn’t yet gained his higher-up’s trust that he wasn’t just BSing his way through the analyses. I think it takes time, experience, and proven wins to build that trust. It’s easiest to start small, since small wins tend to carry lower risk. If Jack Ryan had suggested monitoring in advance instead of assuming his analysis was correct, and the monitoring had then yielded more confirmation, it’s likely Greer would have trusted Jack Ryan’s abilities over the long run. Further, if Greer had communicated that he disagreed with Jack Ryan’s recommendations and discussed alternatives with him and the team, the whole debacle could have been avoided. Of course, this is TV, and TV drama often stems from miscommunication, so that was never going to happen.

While Jack Ryan went into perhaps a tad too much detail, gave an unpalatable recommendation, and went at things behind Greer’s back, he clearly provided enough detail and interest to get Greer listening. At first, it seems like Greer ignored what he said, but it turns out he had been listening (just communicating terribly).

GREER: What’s the matter? You don’t like flying?

JACK: What the hell’s going on?

GREER: That account you froze. S.A.D. and Yemeni PSO picked up somebody.

JACK: Suleiman?

GREER: No. A couple of couriers, they think.

JACK: Wait, you said S.A.D. – but I didn’t order any surveillance.

GREER: I did.

JACK: I thought you said I wasn’t “there yet.”

GREER: You weren’t. But that doesn’t mean you were wrong.

JACK: Well, how come you couldn’t have said that, instead of throwing me out of your office?

GREER: Because I don’t know you. And I don’t answer to you.

Greer listened to Jack Ryan’s data. I think Jack Ryan demonstrated good data communication in this episode, apart from going behind Greer’s back to carry out his own recommendation anyway, and Greer failing to communicate his tentative belief in Jack Ryan’s findings. Because Jack Ryan started the conversation with the ‘why’, gave data details to back up the findings, and made concrete recommendations, he got Greer listening. And while the implementation of what to do next was handled poorly so that we could watch an action political thriller, we can still take away lessons in good data communication from both what went right and what went wrong.

Goodhart’s Law: Additional Taxonomy, Applied to Digital Marketing

One of my favourite concepts is Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.” Scott Garrabrant on LessWrong came up with a framework for four different ways proxy measures can fail when you are optimizing for them. To reinforce my learning, I wanted to try finding examples in digital marketing and website optimization that apply to each of the categories.

“Adversarial Goodhart – When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.”

We observe that online sales are correlated with display campaigns being the last touch before conversion (that is, under a ‘last-touch attribution’ model). Our proxy for success is more conversions attributed to our display campaigns as the last touch, but the true goal is increased online sales profit.

A competing display agency also serving ads for the company has the goal of increasing the amount of money the company spends on its ads. This works against the company’s true goal, since the more the company spends on ads, the higher its costs (unless, of course, those ads have an amazing ROI). The competing agency, knowing that the proxy is last-touch attribution, tries to get its ads to show up right before purchase, primarily by running retargeting ads and swamping all users with ads so that it is the last touch before as many sales as possible.
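
To see how the proxy rewards this behaviour, here is a small sketch (the conversion paths are made up) of last-touch attribution: an agency that simply wedges a retargeting ad in as the final touch on every path collects nearly all of the credit, even though the underlying sales have not changed.

```python
# Last-touch attribution sketch: all credit for a conversion goes to the final
# touchpoint in the path. The conversion paths below are hypothetical.
from collections import Counter

conversion_paths = [
    ["organic search", "email", "paid search"],
    ["display (agency A)", "organic search"],
    ["paid search", "email"],
]

# The competing agency swamps users with retargeting ads, so its ad becomes the
# final touch on every path without changing anyone's decision to buy.
retargeted_paths = [path + ["retargeting (agency B)"] for path in conversion_paths]

def last_touch_credit(paths):
    """Count conversions credited to each channel under last-touch attribution."""
    return Counter(path[-1] for path in paths)

print("Before the retargeting blitz:", last_touch_credit(conversion_paths))
print("After the retargeting blitz: ", last_touch_credit(retargeted_paths))
# Agency B now "wins" every conversion under the proxy, while total sales
# (and the company's profit) are exactly what they were before.
```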

I wrote more on this problem here.

“Extremal Goodhart – Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.”

We have correlated paid search advertising spend with online sales: the more we spend in a month on paid search, the more profit our company sees (which is, of course, our true goal). Our monthly spend has historically ranged from $25K to $100K, but is usually around $75K.

The CMO sees this data point at a company presentation and is so excited that they free up $1M to spend on paid search advertising. This is more than we ever expected to see our company have in paid search budget! But, according to our model, this is great: we should see so much more profit!

So, we take this money and spend it. Unfortunately, we do not see an amazing increase in profit because of this! Why?

The world where our proxy takes on an extreme value is very different from the world where the correlation between the proxy and the goal was originally observed. The paid search keywords being bought with the additional spend are probably very different from the keywords under which we observed success, because the keywords where we saw success are likely already saturated. Much of the new spend probably goes to keywords only loosely related to what our company sells, so users are not clicking through and purchasing from our website at nearly the same rates.

Here, what I’m talking about is basically the law of diminishing returns.
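
A rough way to picture this: fit a straight line on the historical $25K–$100K range and extrapolate to $1M, then compare it with a saturating (diminishing-returns) response. The response curve below is invented purely for illustration.

```python
# Extrapolation sketch: a linear fit looks fine inside the $25K-$100K range
# but badly overestimates profit at $1M if the true response saturates.
# The "true" saturating curve below is hypothetical.
import numpy as np

def true_profit(spend):
    # Diminishing returns: profit flattens as the productive keywords saturate.
    return 300_000 * (1 - np.exp(-spend / 60_000))

historical_spend = np.linspace(25_000, 100_000, 20)
observed_profit = true_profit(historical_spend)

# Linear model fit only on the historical range.
slope, intercept = np.polyfit(historical_spend, observed_profit, 1)

new_spend = 1_000_000
predicted = slope * new_spend + intercept
actual = true_profit(new_spend)

print(f"Linear model predicts ~${predicted:,.0f} profit at $1M spend")
print(f"The saturating 'reality' yields ~${actual:,.0f}")
```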

“Causal Goodhart – When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.”

Every day we observe a spike in visits to the website, and subsequent sales, around 9:30 PM. So, we conclude, more users on the site at exactly 9:30 PM (our proxy) cause sales (our goal).

We maximize users on the site at exactly 9:30 PM by blocking users from accessing the site at other times with a popup saying “Come back at 9:30 PM!”, and we blanket the internet with ads for our company that only display around 9:30 PM. This will of course increase sales, because 9:30 PM visits cause sales! We expect our company to rake in additional millions!

But… it turns out that we are getting so many users (and sales) on our site at precisely 9:30 PM every day because someone in the TV advertising department (whom we of course never talk to, because large companies are frequently siloed) has purchased ads during a popular news program and is offering a promo code for users who ‘act fast’ and buy within the next 15 minutes.

In this case, there was a non-causal connection between our proxy and our true goal. There was a distinct other event (the TV promotion) that was causing both our proxy and our true goal to increase.
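
A quick simulation (all numbers invented) shows the pattern: the TV promotion drives both the 9:30 PM visit spike and the sales, so the proxy and the goal are strongly correlated, yet the sales never depend on the visit count at all.

```python
# Confounding sketch: a TV promo (the hidden common cause) drives both the
# 9:30 PM visit spike and the sales. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
days = 365

tv_promo = rng.binomial(1, 0.5, days)                  # promo airs on some days
visits_930 = 200 + 800 * tv_promo + rng.normal(0, 50, days)
sales = 50 + 300 * tv_promo + rng.normal(0, 20, days)  # note: no visits term here

# The proxy looks wonderfully predictive of the goal...
print("Corr(9:30 PM visits, sales):", round(np.corrcoef(visits_930, sales)[0, 1], 2))

# ...but because sales depend only on the promo, pushing visits up with popups
# and blanket 9:30 PM ads (an intervention on the proxy) would leave sales
# exactly where they are. Only the TV promotion moves them.
promo_days = tv_promo == 1
print("Mean sales on promo days:    ", round(sales[promo_days].mean(), 1))
print("Mean sales on non-promo days:", round(sales[~promo_days].mean(), 1))
```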

“Regressional Goodhart – When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.”

Clicks are correlated with sales conversions. Therefore, to optimize for your goal of getting users with the highest conversion interest to the site, you decide to invest in the campaigns that get the most clicks and divest from the campaigns that get fewer. Here, your proxy is Clicks and your true goal is Converting Visitors.

However, as anyone who has encountered a clickbaity title can tell you, what makes someone click on something isn’t just conversion interest; it is also entertainment or shock value. Here we are not just selecting for our true goal but also for the difference between the proxy and the goal, where that difference is entertainment (or shock value, or some other factor we hadn’t thought of).
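
A small simulation (all numbers invented) makes the regression effect visible: the campaigns with the most clicks tend to be those whose clicks were inflated by entertainment or shock value, so their true conversion interest is lower than the click counts suggest.

```python
# Regressional Goodhart sketch: clicks = conversion interest + "clickbait"
# appeal, so selecting on clicks also selects on the clickbait component.
# All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n_campaigns = 1_000

conversion_interest = rng.normal(100, 15, n_campaigns)  # the true goal
entertainment = rng.normal(0, 15, n_campaigns)          # shock / entertainment factor
clicks = conversion_interest + entertainment            # the proxy we observe

# Invest only in the top 10% of campaigns by clicks.
top = clicks >= np.quantile(clicks, 0.9)

print("Mean clicks of selected campaigns:             ", round(clicks[top].mean(), 1))
print("Mean conversion interest of selected campaigns:", round(conversion_interest[top].mean(), 1))
print("Mean conversion interest across all campaigns: ", round(conversion_interest.mean(), 1))
# The selected campaigns are above average on the true goal, but well below what
# their click counts imply: part of what we selected for was pure entertainment.
```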

Conclusion

Some of these examples might feel a little ridiculous, but I think they are things that could happen if you don’t try to mitigate the ways your proxy measures can fail. To do that:

  • Choose proxies that are the most tightly aligned with your goals
  • Assess whether the relationship between your proxy and your goal is truly causal (or driven by other events)
  • Have better company communication to ensure you don’t miss some other potential causal factor
  • Don’t fall victim to extrapolation

There are probably many other ways that optimizing for proxies can cause failure: can you think of other types of failure? Or can you think of other examples of each type of failure in digital marketing?

Women’s Day Special – Remembering Last Year

This Women’s Day, I was reminded of a blog post I was a part of, highlighting some of the great stuff we all do. Read it here on Cardinal Path’s blog.

Danika Law, Staff Consultant, Data Science

What Women’s Day means to Danika:

I work with and am friends with so many women who work so hard, do such amazing things, and are super talented. Acknowledging this is part of what Women’s Day is about: we’ve come so far and should recognize our accomplishments with pride, while recognizing how much further we have to go.

Proud moment:

I’ve gained a lot of knowledge and confidence this past year: there are a lot of things I’m proud of. I’ve been encouraged and supported in so many ways to learn about interesting technologies, develop new product, and more. Having people ask me hard statistics and math questions and trusting my advice, and being given the platform to speak at company meetings and on the blog, are so valuable. I’m excited and proud of everything I’ve done with the data science department this year, and excited for what we will do next.

To learn more about some other amazing women, read on here.

 

Webinars – Data Science and Content Attribution

I have been on two webinars during my time at Cardinal Path.

Content Attribution

One of the projects I’m super proud of having worked on at Cardinal Path is Content Attribution. It is a code base I developed and then helped deploy to measure content performance.

Do you want to learn what content attribution is? Check out this webinar, in which I was one of the speakers. The other speakers were Charlotte Bourne from Cardinal Path and Erika Akella from Intel.

In this webinar, you’ll discover ways to apply both strategy and measurement to your content including:

  • Setting baseline content KPIs & aligning to audience
  • Identifying the value of your content to better inform your team
  • Understanding what a successful content path looks like
  • Identifying value of always-on content vs. campaign content

You can also read the Q&A blog post to see what questions were asked after the webinar. Some really detailed questions were asked so there’s certainly a learning opportunity for anyone who wants to optimise their content measurement!

Data Science for Marketing Lift

One of the first webinars I was on highlighted data science capabilities. You can find the recording here on Cardinal Path’s site.

In this on-demand webinar, our experts go over a broad range of data science techniques and expose how major global brands are using them for valuable business insights including:

  • Customer lifetime value for customer segmentation and activation
  • Forecasting and predictive analytics with machine learning
  • Natural language processing for digital marketing optimization

The Q&A portion of the webinar can be found here.

Check them out if you are interested in learning more about either topic!

 

What Attribution Doesn’t Measure

Attribution isn’t the only way to analyze media, and in some cases, it isn’t the most appropriate tool. Consider media mix modeling when you want to measure offline and branding initiatives more than the direct response to media:

Attribution, whether rules-based or data-driven, assigns credit towards a conversion or purchase.

So what doesn’t get credit? Attribution models cannot measure things like brand awareness, recognition, and loyalty. Even if all advertising were turned off, some customers would still purchase from a brand because of these. Such feelings can’t necessarily be measured in a hard and fast way using attribution pathing. Further, branding initiatives on offline channels such as billboards or TV ads may not get credit, since attribution pathing can rarely connect a user in the offline world to their online behaviour, and all of these pieces must come together.

This is where media mix modeling can complement your attribution. In media mix modeling, weekly data rather than user-level data is analyzed to find out which weekly media volumes contributed the most to weekly sales. One of the benefits of looking at this more aggregated data is that we can measure the impact of the base (the volume of revenue that was not due to media effects). Further, seasonality, weather, offline spend, and other exogenous factors can be put into the model and get credit towards sales, provided the data exists.
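
As a minimal sketch of what a weekly-level model can look like, here is an ordinary regression of weekly sales on weekly media spend plus an exogenous factor. The column names and figures are invented, and real media mix models also add adstock and saturation effects, which are omitted here.

```python
# Minimal media-mix-style sketch: regress weekly sales on weekly media spend
# and an exogenous holiday indicator. All column names and values are made up.
import pandas as pd
import statsmodels.api as sm

weekly_df = pd.DataFrame({
    "sales":        [120, 135, 150, 160, 155, 180, 210, 190],
    "tv_spend":     [10, 12, 15, 18, 16, 20, 30, 25],
    "search_spend": [5, 5, 6, 7, 7, 8, 9, 8],
    "holiday_week": [0, 0, 0, 0, 0, 1, 1, 0],
})

X = sm.add_constant(weekly_df[["tv_spend", "search_spend", "holiday_week"]])
model = sm.OLS(weekly_df["sales"], X).fit()

print(model.params)
# The intercept ("const") estimates the base: the sales volume we would expect
# with no media running at all. The remaining coefficients estimate the
# incremental weekly sales per unit of spend, or per holiday week.
```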

Read my post on the Cardinal Path blog here.

Adding Context to Attribution Data

As with all things, attribution data is hard to interpret without context. I discuss on Cardinal Path’s blog why it’s so important to make sure your attribution analysis controls for the cost of the media spend.

As marketers continue to grapple with assessing every channel and marketing touchpoint, it’s important to note that attribution data needs context to be fully actionable. Whether you’re using rules-based attribution, where all credit is assigned to a channel depending on its position in the conversion path, or data-driven attribution, where credit is attributed to channels depending on how much each one increased the probability of conversion, cost data adds this necessary context.

This is best explained through an example.
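
The full example is in the post, but as a rough illustration of the kind of context cost adds, here is a tiny sketch with made-up numbers that turns attributed revenue into return on ad spend (ROAS) per channel. A channel that looks best on attributed revenue alone can look very different once its cost is included.

```python
# Cost-context sketch: attributed revenue alone vs. return on ad spend (ROAS).
# All figures are hypothetical.
channels = {
    # channel: (attributed_revenue, media_cost)
    "paid search": (500_000, 100_000),
    "display":     (450_000, 300_000),
    "email":       (200_000, 10_000),
}

for channel, (revenue, cost) in channels.items():
    roas = revenue / cost
    print(f"{channel:12s} attributed revenue ${revenue:>9,}  ROAS {roas:5.1f}x")

# Display earns almost as much attributed revenue as paid search, but its ROAS
# is far lower once cost is considered; email looks modest on attributed
# revenue yet is by far the most efficient spend.
```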

Read my post on the Cardinal Path blog here.

Getting as Complex as Necessary: Attribution Modelling

Attribution is a hot topic, but it can be daunting to know where to begin. Read my post on Cardinal Path’s blog to find out where you should begin.

What does getting as complicated as necessary look like for you? That depends on where you are on the path to attribution. For example, for a Google Analytics user it might look something like this:

  • Starting out in attribution? Use the rules-based attribution reports available in the free version of Google Analytics.
  • Explored rules-based attribution and need to go a bit deeper, but have a low budget? Customize those rules-based approaches. Further, Google’s announcement of free Google Attribution can also help get you started.
  • Ready to start getting serious about attribution? Google Analytics 360 will allow you to use Data-Driven attribution (on top of all the other great reasons to use Google Analytics 360).
  • Explored all the previous options, ready to get more serious about attribution in particular, and have a larger budget? Time for an attribution-specific vendor (for example, Google Attribution 360).

Get more details here.

Is your website helping drive conversions? Use Content Attribution to find out!

Read my post on Cardinal Path’s blog to learn more about content attribution.

Attribution is used to determine which marketing events contribute the most (and most often) to sales conversions. But in almost all cases, customers will see more than just advertising before they convert. Oftentimes, they will encounter content across your website that educates, informs, and ‘warms them up’, gradually bolstering their willingness to convert.

Rather than thinking about which marketing channels lead to the highest lift in conversion rate, what if you asked which content types are leading to the highest lift in conversion rate? Here are some questions that you should ask:

Learn more here.

Uplift Modeling: measuring true campaign impact

How can you tell if your campaign truly worked? Read this post I wrote on Cardinal Path’s blog to learn how uplift modeling can help you.

You’ve just finished running a new advertising campaign. Now you want to know the answer to your obvious and most pressing question: did it work? That is, did the campaign create uplift, causing more customers to purchase your product than would have without the campaign?

Naïve and incorrect approach

The conversion rate after receiving the offer was 7%, and so you conclude that 7% of customers bought your product because they saw the ad.

This is incorrect. What if those same customers were going to purchase your product anyways? Without comparing the conversion rate of those who saw the ad to those that didn’t, you aren’t getting a true measure of how much ‘lift’ your campaign actually caused.

In other words, you should not use the approach outlined above.
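
To make the comparison concrete, here is a minimal sketch (with invented counts) of measuring uplift against a randomized holdout group that did not receive the offer. The post goes into the proper approach in more depth.

```python
# Uplift sketch: compare conversion in a treated group (saw the offer) to a
# randomized control group (did not). The counts below are made up.
treated_customers, treated_conversions = 50_000, 3_500   # 7.0% conversion
control_customers, control_conversions = 50_000, 2_750   # 5.5% conversion

treated_rate = treated_conversions / treated_customers
control_rate = control_conversions / control_customers
uplift = treated_rate - control_rate

print(f"Treated conversion rate: {treated_rate:.1%}")
print(f"Control conversion rate: {control_rate:.1%}")
print(f"Uplift attributable to the campaign: {uplift:.1%}")
# Only about 1.5 points of the 7% conversion would not have happened without
# the campaign; the other 5.5 points were customers likely to buy anyway.
```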

Go here if you want to learn what approach you should use!