AI-Powered Insights: The Liberation From Spurious Correlations

January 26, 2024

Chapter 1

The Paradigm That Holds Us Back

Dr. Frank Buckler

Founder, Success Drivers

It was the summer of 1998 and I was about to graduate with a degree in electrical engineering and marketing. I received an invitation in the post from McKinsey and it made my heart happy. It was for a one-day talent event at which we would get to know the company through case studies. At the last minute, I bought an overpriced yellow tie that would go well with my green suit. It must have been my choice of color that disqualified me from McKinsey. But what I still remember well from this event are the case studies.

The room was full of “high performers” from all disciplines. Everyone was given a case study description and we had 15 minutes to solve it. It stated very specifically how many man-years (abbreviated “MJ” in the case study) the consultancy had available to help a company.

Then came the bang. Mark was the first to present his result. He was a budding physicist and began with his solution: “According to my calculations, we will have gained 23 million watts”. We all sat open-mouthed, including the McKinsey consultant. “What is he talking about?” was written all over our faces.

It turned out that the case study abbreviated “man-years” to MJ and that MJ in physics stands for megajoule. The eager Mark had taken the DATA for what he thought it was … and so produced nonsense. He himself seemed to think his result was plausible, as he had been the first to present it.

“Data is gold” is a metaphor that is often used. Mark couldn’t do anything with the “gold”. Why? Because the metaphor is misleading. 

Data is more like atoms. Humans consist of 99% carbon, hydrogen and oxygen atoms. If these atoms were neatly separated “in three piles”, it would be impossible to build a human being from them. 

Data are building blocks, but the magic lies in the context, not in the data. The magic lies in the MEANING of the data. That is what this book is about, along with what it means for the marketing management of companies.

The magic of data analysis lies in the context of meaning. However, the prevailing understanding in marketing management is more like this: 

  • “Collect lots of data” – this is based on the belief that the more “data” you have, the more “insights” you can gain from it.

  • Give the data to the geeks and the Ph.Ds. Those are the ones with the glasses. Down there in the basement where they sit, they use it to calculate and produce insights.

  • Then look at the results and if they are not plausible, send the geeks back to the basement. 

This understanding is not only very one-sided, but also dangerous. With this book, I would like to contribute to a more enlightened and effective use of data for business decisions.

Architects Not Just Construction Workers

Let me illustrate the point with a concrete example. Many years ago, a coffee brand that sells its products through direct sales invited me to join them. They were looking for a data analytics consultant to help them with customer churn. The Head of Customer Relationship Management got straight to the point with his first slide: “Here’s our data. Your job is to determine for us who will churn.”

It became apparent at five points that the task had been formulated rather short-sightedly.

The first question the data could not answer was: “What is a churned customer?” The brand did not have a subscription model; customers simply showed irregular ordering behavior. So we decided to predict not “churn” but the time until the next order.

The next question to be answered is which data should be used as input. Sure, you could simply take “all” the data that belongs to a customer. However, a customer sometimes has a long history. 

How relevant is the data on purchases from 3 years ago for customer churn today? It turns out that “common sense” helps to achieve better results faster.

Then there is data that describes the context or situation and does not directly relate to the customer. Seasonality, the weather, advertising activities and much more can have an influence on customer behavior. Much of this data can be obtained with limited effort, but is not included in the “huge” data set.

Again, it is the understanding of the business domain, not the analytical competence, that leads to a successful analysis.

But the story continues. With the “right” input data and the appropriate target variable (time until the next order), we were now able to train a cutting-edge machine learning model that estimated the order time with astonishing precision.

Nevertheless, it turned out that this approach was still missing the point to a large extent. For the coffee brand, it was of course much more important not to lose customers who brought in a lot of sales. In this respect, the objective function “predict the time until the next order” was not aligned with the actual objective “avoid losing sales”.

One way of dealing with this is to weight the data records by their customer value in the machine learning process. This requires a machine learning method that supports such weights.
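To make the weighting idea tangible, here is a minimal sketch using weighted least squares on synthetic data. It is not the model used in the coffee project; the data, the two customer segments, the customer values and the function name fit_weighted_linear are all invented for illustration.

```python
import numpy as np

def fit_weighted_linear(x, y, weights):
    """Weighted least squares: minimizes sum_i w_i * (y_i - b0 - b1 * x_i)^2."""
    design = np.column_stack([np.ones(len(x)), x])
    sqrt_w = np.sqrt(weights)
    # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one.
    beta, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta

def mean_abs_error(beta, x, y):
    return np.mean(np.abs(beta[0] + beta[1] * x - y))

# Synthetic customers (assumption): x = days since last order,
# y = days until next order, with high-value customers behaving differently.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 60, n)
high_value = rng.random(n) < 0.2
y = np.where(high_value, 10 + 0.5 * x, 30 + 0.1 * x) + rng.normal(0, 2, n)
customer_value = np.where(high_value, 500.0, 50.0)  # hypothetical euro values

beta_plain = fit_weighted_linear(x, y, np.ones(n))
beta_weighted = fit_weighted_linear(x, y, customer_value)

mae_plain_high = mean_abs_error(beta_plain, x[high_value], y[high_value])
mae_weighted_high = mean_abs_error(beta_weighted, x[high_value], y[high_value])
mae_plain_low = mean_abs_error(beta_plain, x[~high_value], y[~high_value])
mae_weighted_low = mean_abs_error(beta_weighted, x[~high_value], y[~high_value])

print("high-value customers:", mae_plain_high, "->", mae_weighted_high)
print("low-value customers: ", mae_plain_low, "->", mae_weighted_low)
```

In this toy setting the weighted fit is more accurate for the high-value customers and less accurate for the low-value ones, which is exactly the trade-off the weighting is meant to buy.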

Again, an understanding of the content of the actual objectives was required in order to align the analysis method with them. This understanding cannot be found in the data.

This case study is still not finished here. 

The machine learning model now predicts the time until the next order, whereby the forecasts for customers with lower sales are expected to have a higher error than those for customers with high sales, as the analysis worked with weighted data. 

What do we do now with this forecast? We can filter out customers whose forecast value is very high, that is, whose next order is predicted to be a long way off, and make them happy with customer loyalty measures. But what threshold value should we use? That is another question the data does not answer.

Looking at the two errors (so-called alpha and beta errors) quickly makes this clear. Figure 1 illustrates this. No matter which threshold value we use, it will happen that the forecast selects a customer, but this person actually has a lower value. This is a clear false alarm. The cost of this error is that the customer retention campaign would not have been necessary. For the coffee brand, this was 20 euros.

The second error is that a customer is not selected in the forecast, even though he is a churned customer. The cost of this error lies in the customer value of the lost customer multiplied by the probability of the measure being effective. For the coffee brand, this was on average 240 euros times 30% = 72 euros.

When I talk to marketing managers about churn, I often hear statements like “We have a hit rate of 90%. That’s not bad, is it?”. In fact, the hit rate is a number without any meaning. 

Why? It’s not about how many churners and non-churners are detected, it’s about avoiding the expensive mistakes. In our case, not detecting a churner was roughly four times as expensive as a false alarm. The threshold value was therefore selected to minimize the total cost of both errors, which meant deliberately sending more customer loyalty campaigns than a pure hit-rate criterion would suggest.

Let me summarize. It was essential to incorporate management context knowledge in at least 5 places:
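The following sketch shows how a cost-minimizing threshold can differ from the one that maximizes the hit rate. The churn scores are simulated and the 30% churn share is an assumption; only the campaign cost, the customer value and the effectiveness rate are taken from the coffee example, and the function names are hypothetical.

```python
import numpy as np

def total_cost(scores, is_churner, threshold, cost_false_alarm, cost_miss):
    """Cost of both error types when every customer scoring >= threshold gets a campaign."""
    selected = scores >= threshold
    false_alarms = np.sum(selected & ~is_churner)  # campaign sent, but not needed
    misses = np.sum(~selected & is_churner)        # churner who received no campaign
    return false_alarms * cost_false_alarm + misses * cost_miss

def hit_rate(scores, is_churner, threshold):
    """Share of customers classified correctly at this threshold."""
    return np.mean((scores >= threshold) == is_churner)

# Simulated scores (assumption): churners tend to score higher than non-churners.
rng = np.random.default_rng(1)
n = 20_000
is_churner = rng.random(n) < 0.3
scores = np.where(is_churner, rng.normal(0.65, 0.15, n), rng.normal(0.35, 0.15, n))

cost_false_alarm = 20.0      # euros per unnecessary campaign
cost_miss = 240.0 * 0.30     # customer value times assumed effectiveness of the measure

thresholds = np.linspace(0.0, 1.0, 101)
costs = np.array([total_cost(scores, is_churner, t, cost_false_alarm, cost_miss)
                  for t in thresholds])
hit_rates = np.array([hit_rate(scores, is_churner, t) for t in thresholds])

threshold_min_cost = thresholds[np.argmin(costs)]
threshold_max_hit = thresholds[np.argmax(hit_rates)]

print("threshold with the best hit rate:", threshold_max_hit)
print("threshold with the lowest cost:  ", threshold_min_cost)
```

Because a missed churner costs more than a false alarm, the cost-minimizing threshold in this simulation sits below the hit-rate-maximizing one: more customers are selected, and a lower hit rate is accepted in exchange for a lower total cost.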

  1. The task was already set incorrectly. The task should not be to predict which customer will churn. Rather, the task is to control customer retention campaigns in an ROI-optimized way. That is a huge difference.

  2. It was a question of content as to which customer is actually considered to have churned. We opted for a non-binary target. In other words, we decided against categorizing customers as “churned”, as this is not actually known.

  3. Even the selection of data, which is necessary due to long customer histories, was a question of content. Here it turned out that central situational information that influences success still had to be added to the data set.

  4. It turned out that it was necessary to weight the analysis with customer value to ensure that the machine learning algorithms were pursuing the same goals as marketing management.

  5. Last but not least, the threshold for a “churn classification” and customer retention campaign was chosen to optimize ROI, not hit rate.

Hardly any data science knowledge was required for the steering decisions that made the results successful. But those decisions were crucial for the meaningfulness of the whole process.

Sometimes a picture is worth a thousand words. Do you know this too? You go to the garage because something is squeaking on your car. The mechanic thinks that the shock absorbers need to be replaced, the tires should also be replaced, he says, and the wiper fluid with antifreeze too.

Nothing he recommends is wrong per se. But you were only concerned about driving safety and not the squeaking. You don’t need anti-freeze in your region because you don’t plan to drive in the mountains. You only drive your car occasionally and the tires would certainly last another two years.

It’s your car, and only you know what you use it for and what you want. A car mechanic can’t decide that for you.

A more controversial but all the more fitting image is that of a doctor. Some people go to the doctor and blindly follow their recommendations. They thoughtlessly take symptom-relieving medication, unaware that this is usually counterproductive for the healing process.

Every medical treatment has an alpha and beta error. Every medical treatment has the risk of side effects (analogous to the cost of a churn measure) as well as the risk of disease consequences if not treated. 

Who should weigh up these two errors? The doctor or the patient? 

Experts tend to put up a smokescreen with their in-depth specialist knowledge. They have an unconscious interest in underpinning their raison d’être. They know a thousand problems and reasons why simple strategies are problematic.

Data scientists are no different. Marketing management that takes responsibility is therefore crucial for success.

What do you think? Would an engineer 100 years ago ever have come up with the idea of offering a Model T that the customer couldn’t configure? 

“You can have any color as long as it is black”, Henry Ford is quoted as saying.

Only standardization made it possible to produce a car that everyone could now afford. I am sure that engineers must have dismissed this bizarre idea as nonsensical at the time. 

Your data science engineers need your guidance, just as Henry Ford’s engineers needed his. Data is not gold. It is a set of building blocks. It takes architects, not just builders, to build a great house.

Marketing Managers Must Take Responsibility

In addition to the myth that “data is gold”, I am experiencing another unhelpful school of thought at management level. 

The principle of “management by plausibility” works like this: look at the results of the data scientists and if they are not plausible, send the “geeks back to the basement”. 

I had my own personal “aha” moment almost twenty years ago when I was in charge of a sales team in Corporate America. I sat down with the sales managers every month to look at the figures in the performance review. 

“Why have sales in this region slumped?” I asked Joachim. He elaborated and gave me three very plausible details about the customers who were probably the main contributors.

Then I realized that the data filter had accidentally been set to the previous year. I changed it, and now the figures showed an increase in sales. Joachim took a quick breath and began describing three very clear developments that explained these figures.

I realized that the main part of management’s job was to listen to subordinates’ stories, check them for plausibility and provide further impetus for action.

The more I thought about it, the more I realized that plausibility is not a particularly good indicator of truth. It merely says whether a story is consistent with previous beliefs. Plausible stories may be true. But there is a high probability that they are absolute nonsense. Non-plausible stories can also be true. In fact, the most groundbreaking findings are “implausible” because they contradict current false beliefs.

The same applies to data science. Marketing management should not see itself as a reactive controller. Rather, the domain knowledge of the marketing department is the decisive input for successful data analysis, not merely a checkpoint applied afterwards.

Data alone is “nothing”. Without data, everything is nothing. Artificial intelligence can create value from data, but only if human expertise steers it in the right direction.

To use a metaphor, management by plausibility is like a mute restaurant guest. Someone chooses the food for him and the only thing left for him to do is to spit it out if he doesn’t like it.

The new model is a restaurant guest who chooses their own food. It is not necessary for them to be able to cook themselves. Knowledge of ingredients is nevertheless beneficial. This way, the guest can not only be sure that the food will taste good. They can also ensure that it is healthy and easily digestible and thus meets their expectations.

Of course, a restaurant guest with little education who knows nothing but fast food will have little control. But that’s the way it is in life. You don’t have to be an expert in everything. But it is helpful if you take responsibility for your life, your body and your relationships and acquire a little wisdom about what is useful.

This is how marketeers should behave. Take responsibility and ensure that data science is moving in the right direction.

Data Scientists Are Not The Problem

I don’t want to say that data science is going in the wrong direction or that we data scientists are simple-minded idiots. On the contrary. Many are doing a great job. Many make up for what management fails to do.

But relying on this is negligent. It’s like getting into a cab and trusting that the driver will know where you want to go. When in doubt, the cab driver will go where he wants to go.

From Correlation To Causality

The scandals about discriminatory machine learning models are another example of how important it is that you, as management, set the substantive framework.

A prominent example of discrimination through machine learning is a lending system used by a large bank. Research showed that this system systematically disadvantaged applicants from certain ethnic minorities by offering them less favorable credit terms or rejecting their applications more frequently, even when their financial circumstances were comparable to those of preferred groups. 

Another case concerns software used in the US justice system to predict criminal recidivism. Studies showed that this system falsely classified black people as future recidivists significantly more often than white people, even when their personal histories were otherwise identical.

The reason for the discrimination that arises is a misunderstanding that even many experienced data scientists still fall prey to. They train a machine learning model and use as many descriptive properties (e.g. skin color) as possible to then predict a target property (e.g. credit default). The problem is that the descriptive characteristics are often related and therefore correlated. For example, people with white skin color tend to have a higher salary. 

Even if salary is the true cause of creditworthiness, many machine learning algorithms also use information on skin color. “The main thing is that the prediction error in the data is small” is the algorithms’ motto. 

But the motto should be “the main thing is that the information used is causally correct”. This motto will accompany us throughout this book. It is the guiding principle of all Causal AI methods. We will see how it leads not only to non-discriminatory models, but also to more stable, better models. 
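A small simulation can illustrate why a correlated attribute ends up being used by a model. It is a deliberately simplified linear sketch on invented data: a neutral group flag stands in for the sensitive attribute, salary is the only true cause of the risk score, and the helper ols_slopes is hypothetical. It is not one of the Causal AI methods discussed in this book.

```python
import numpy as np

def ols_slopes(features, target):
    """Ordinary least squares with intercept; returns the slope coefficients only."""
    design = np.column_stack([np.ones(len(target)), features])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta[1:]

rng = np.random.default_rng(42)
n = 20_000

# Invented data: group membership shifts salary, and ONLY salary drives risk.
group = rng.integers(0, 2, n).astype(float)
salary = 40 + 10 * group + rng.normal(0, 8, n)
default_risk = 100 - 1.5 * salary + rng.normal(0, 5, n)

# In practice the causal variable is often measured imperfectly (assumption).
salary_measured = salary + rng.normal(0, 8, n)

# 1) Group alone looks highly predictive, although it has no causal effect.
coef_group_alone = ols_slopes(group[:, None], default_risk)[0]

# 2) With the true cause in the model, the group effect disappears.
coef_group_exact, coef_salary_exact = ols_slopes(
    np.column_stack([group, salary]), default_risk)

# 3) With a noisy measurement of the cause, error minimization falls back
#    on the group flag as a proxy.
coef_group_noisy, coef_salary_noisy = ols_slopes(
    np.column_stack([group, salary_measured]), default_risk)

print("group only:           ", coef_group_alone)
print("group + exact salary: ", coef_group_exact)
print("group + noisy salary: ", coef_group_noisy)
```

The third fit is the critical one: a model that only minimizes prediction error keeps leaning on the group flag because it carries information about the imperfectly measured cause.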

Based on their data analysis, lottery companies are usually convinced that their target group is older people. This conclusion comes about because we tend to equate correlation and causality. Conventional machine learning approaches do the same – only in a multidimensional space.

Wise managers question this. Causal AI methods on lottery company data revealed a different picture. All other things being equal, older people are less likely to start or increase their lottery play. Nevertheless, it remains true that older people play the lottery more.

Causal AI clarified the paradox. Over time (i.e. years of life), playing the lottery becomes a habit. It also increases the chance of winning something. This winning experience in turn increases customer loyalty. Similar to the discriminatory models, it is not a personal characteristic (age), but certain experiences that customers have that are the cause of their behavior. 
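The paradox can be reproduced with a few lines of simulation. All numbers below are invented, and ordinary regression is used only as a stand-in for the Causal AI analysis mentioned above. Age has a negative direct effect on playing, but older people have had more years in which to build the habit.

```python
import numpy as np

def ols_slopes(features, target):
    """Ordinary least squares with intercept; returns the slope coefficients only."""
    design = np.column_stack([np.ones(len(target)), features])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return beta[1:]

rng = np.random.default_rng(7)
n = 20_000

# Invented data: older people can have more years of playing behind them.
age = rng.uniform(20, 80, n)
years_playing = rng.uniform(0, 1, n) * (age - 18)

# Assumed mechanism: habit raises play, age itself lowers it.
plays_per_month = 5 - 0.03 * age + 0.2 * years_playing + rng.normal(0, 1, n)

# Naive view: age alone appears to increase lottery play.
slope_age_naive = ols_slopes(age[:, None], plays_per_month)[0]

# Holding the habit (years of playing) constant, the age effect turns negative.
slope_age_adjusted, slope_habit = ols_slopes(
    np.column_stack([age, years_playing]), plays_per_month)

print("age effect, naive:   ", slope_age_naive)
print("age effect, adjusted:", slope_age_adjusted)
print("habit effect:        ", slope_habit)
```

The naive slope is positive and the adjusted slope is negative, which mirrors the finding that experience, not age, drives the behavior.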

The consequences of this realization for management could not be greater. The target group is the young, not the old. The goal is habitualization, not increasing casual play by promoting jackpots. Frequent winning experiences keep players happy, while the promotion of jackpots leads to customers playing less often and waiting until the money pot is full again.

The other day I wanted to play football outside with my two children. I took a look at the terrace. It was wet. “Boys, we can’t play football, it’s raining,” I said. My boys were not happy with my announcement and stormed out onto the terrace to look at the sky. “It’s not raining,” they said.

Indeed. I had behaved like many data scientists and many managers. I had confused correlation and causality. The terrace was still wet from the morning’s rain. Not only had it stopped raining, the clouds were even breaking up here and there.

In the same way, conventional machine learning algorithms like to use substitute information to predict target values. When used in practice, this approach often goes wrong.

What do we learn from this?

On my way to the office in the morning, I usually stop at an Italian coffee cart and get caught up in a “chat” with other guests. When asked “what do you do for a living”, I usually answer “my company gains insights from data with artificial intelligence”. Most people then say that I work in the “IT industry”, which always confuses me.

The situation is similar with data analysis in companies. When it comes to analyzing data, people think of either IT specialists or data scientists. But let’s be honest. What job today doesn’t involve data? What job doesn’t use a computer? If my grandpa were still alive, he would describe my job as “something to do with computers”.

My appeal in this book is that marketing managers understand their profession as including responsibility for analyzing their data and for steering that analysis in terms of content. To this end, I would like to show what needs to be considered in order to make effective, data-based decisions.

You don’t need a medical degree to become an empowered patient. It’s enough to internalize a few guidelines and ask the right questions. For example, I ask doctors when I meet them:

  • Does this medicine treat the cause or the symptom?

  • How do the side effects and their probability compare with the possible consequences and their probability if I do not take the medication?

  • What are the consequences if I postpone the decision (wait & see strategy)?

  • If you were in my shoes, what would you decide?

I inform the doctors about my goals (e.g. avoiding suffering or accepting suffering if it benefits my health in the long term). It turns out that some doctors don’t really know how to deal with empowered patients.

This can also happen with data scientists. “If you have a hammer, every problem looks like a nail”. If your requirements do not fit into the toolbox of your in-house data scientist, you may need to talk to them.

Some doctors then use lots of technical terms and a lot of Latin. A data scientist could try to rebuff you with similar eloquence. That’s why rule number 1 applies: if something is not understandable, just ask. 

Another analogy from the medical world should encourage you to do so. The study and practice of medicine is 99% about curing disease and serious illness. However, this significant expertise has limited use for the questions of how to maintain health and achieve physical resilience. Every specialty has its blind spots. 

Therefore, don’t assume that a data scientist will get the best possible insights from your data just because they have a Ph.D. in it.

So let me repeat it once again: for you as a marketing manager, data analysis is YOUR job. Because data analysis today means nothing more than “learning from experience”. What you learn about your field should be important to you. You should not leave this to a black box.

Learning #1:

Marketing expertise is the key design component of data analysis.

Learning #2:

Equating correlation and causality is the cardinal error that unites management and data science to this day.

IMPRINT
Success Drivers GmbH
Johann-Heinrich-Platz 4
50935 Cologne
Germany

Success Drivers GmbH holds the copyright for any material, including translations, published at www.success-drivers.de