Star Rating or Satisfaction Score: What does a number mean?
A fictional story: imagine you run a transportation service for individuals with disabilities. It is a shared service with limited seating and a limited pool of drivers, and there is an app that lets users schedule and monitor vehicles. Because you are data-driven, after each ride the app encourages users to rate the driver on a 1-5 scale based on their satisfaction. You build a dashboard and monitor it over time. The average is 4.67. You had set 4.3 as the overall minimum target and 4.6 as the stretch target, so you beat your stretch goal! Yay. Simple: a 4.67 satisfaction score is pretty good, so everything is running smoothly, right? Right?
It depends.
Well, as the saying goes, “the devil is in the details”… Humans are complicated. They can see the same question, the same context, the same everything, and still interpret it differently. Not to mention artificial intelligence (AI). All the ingredients are there, and still something’s off…
So, is the 4.67 satisfaction score good? Those who work with me in data (particularly in research and evaluation) have probably already heard the answer:
It depends on how you interpret the outcome and what you are going to do about it.
If you’re not trying to take any action, it’s pretty good. But then why collect data in the first place?
What does a star rating of 4.67 mean?
I hope you don’t plan your actions based on a single metric (not to mention averages magically created from stars), but let’s assume the number means a lot to your organization. First, let’s look at the pros of a single-question approach:
You care.
You show your customers that you care and want to ensure that all drivers operate according to the strict standards you set.
Collect data.
Data collection is scalable, consistent, and “reliable” as long as the app works. A long survey will not overwhelm your customers. It’s a single question, asked just after the ride finishes, always the same, always the last, always at the same time. Consistency is important.
Monitor your data.
It’s no longer a single snapshot; it becomes a trend over time. A good start!
Segment data.
There is a plan to act immediately if something stands out: a vehicle, a route, a driver. It’s good that you have a data strategy.
Make decisions and act on the outcome.
In the long run, you wouldn’t believe how many dashboards die without a single meaningful decision based on them.
What is the problem with this approach?
Ah, more details… Before we get into them, let’s start with an experiment. Wherever you are reading this article, say the word “fine” out loud. Just say the word. Hopefully you didn’t raise any concerns around you. Now imagine the following scenarios, where the answer is the same word, “fine”. You don’t need to say it out loud unless you really want to entertain the people around you.
a) The worried mother scenario
After three missed calls from your mother, you finally pick up the phone and she asks, “How are you?” – “Fine.”
b) The scary manager scenario
Your manager unexpectedly asks you to come to the office (or schedules a sudden one-on-one virtual call) and opens with, “How are you?” – “Fine” (?)
c) The “your call is important to us” scenario
After three transfers and 45 minutes with customer service, an agent from a fourth department finally answers the call. With all the enthusiasm you have left, the agent opens with the combo: “How are you?” – “Fine!”
Context and Perception Issues
What does this experiment have to do with the satisfaction survey? Context and perception! Who asks you the question, when they ask, how they ask, how often they ask… All the details matter.
Your answer may be the same word, but what it means may not be. When you are in a direct conversation with someone, they can read your tone, your body language, etc. A survey question is different: you lose the context. Are you sure you’re measuring what you think you’re measuring? Are you sure your data is reliable? Are you sure your “insight” is correct? Bamm! There are many things to consider.
In my Data Literacy Workshop, I collectively call these potential issues BAMM (bias, assumptions, myths, misunderstandings).
Here is what can go wrong from end to end once BAMM kicks in:
Lack of context
You have an agenda and goals in mind. However, it takes too long to explain the context, so you summarize it into a single question. All the context stays in your mind; on paper, there is a single sentence open to interpretation.
Selection bias
You need to decide on your audience. Everyone? Every time? A sample? Anonymous, pseudo-anonymous, or tracked by user ID? This brings data privacy and data security into the mix.
Wording misunderstandings
Then you need to nail down the exact words you are using. Every. Single. Word. Matters. (Have you ever tried to get consensus on a simple research question across marketing, legal, product, HR, etc.?)
Data type misunderstandings
You need to decide what data you are collecting. The type of question you ask determines the data type (we won’t go into full data classification here). True/false? Likert scale? Slider? Single select? Multi select? Matrix? Open text? A combination?
Timing of the survey
Finally, we land on the question and its delivery: who will ask this question? When? How?
Validity issues
In our story, they decided to focus on the driver and pop the question in the app right after the ride finishes. Data that is valid for one purpose may not be valid for another. For example, it’s fine to use the stars to talk about preferences, but don’t use them to punish people for their work.
Interpretation and context
The customers receive the question. Remember the “fine” experiment? The context in which the customer answers matters, but all you get back is stars. The stars can capture emotions unrelated to what you actually want to measure.
Bias
Conscious and unconscious factors can interfere with customer responses. Road rage, for example, is often an impulsive response shaped by past experiences.
Loaded questions
Every. Single. Word. Matters. Loaded questions get loaded answers. For example, you could phrase the question with positive words: “Please tell us how good your customer service representative was…”
Ambiguity
What is the difference between one star and two stars? The customer selects a number of stars, while in your mind there is a context attached to each star. One star is a showstopper that requires immediate intervention; five stars is a great experience. Well, again, that context lives only in your head. I know people who never give one or five stars; they reserve them for extreme events.
Data manipulation
You receive the data, but we’re not talking about stars anymore. The five-star ratings are converted to numbers, assuming a smooth 1-5 scale. Is going from 3 to 4 really the same as going from 4 to 5? Technically, we also introduced rounding: if you treat the data as a continuous 1-5 range but only let the customer choose whole stars, the results are rounded to integers.
Using rounded values
You calculate the average. Rounding is fine, but be aware that you are carrying rounded values into further calculations. Essentially, you forced your customers to choose an integer, yet you argue that the second decimal of the average is meaningful? And should it even be an average? A median? Do you want to see the distribution? The outliers? The shape of your data? Or just a simple single number?
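To make the last two points concrete, here is a minimal sketch (the ratings list is invented for illustration) showing how one very unhappy customer disappears inside a healthy-looking average:

```python
from statistics import mean, median
from collections import Counter

# Hypothetical star ratings collected by the app. Customers can only
# pick whole stars, so treating this as a continuous 1-5 scale is
# already an assumption.
ratings = [5, 5, 5, 4, 5, 5, 5, 5, 1, 5, 5, 5, 5, 5, 5]

print(f"mean:   {mean(ratings):.2f}")            # → mean:   4.67
print(f"median: {median(ratings)}")              # → median: 5
print(f"distribution: {sorted(Counter(ratings).items())}")
# → distribution: [(1, 1), (4, 1), (5, 13)]
# The single one-star ride (a potential showstopper) is invisible
# in the 4.67 average; only the distribution reveals it.
```

The mean hits the stretch target, the median says everything is perfect, and only the distribution shows the ride that may need immediate intervention.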
And the list could go on…
What other biases could interfere?
Your app pops the satisfaction question at the end of the ride. This can lead to survivorship bias, since you only get feedback when a ride actually happens. What about cancellations? Don’t you want to know how the customer feels when they have to cancel their ride?
In general, people tend to report a more positive satisfaction score than they actually feel. This may be a combination of factors: social expectations, wanting to keep the service alive because there is no alternative, or choosing the answer they think is expected rather than how they feel. If you have multiple questions, the order of questions can interfere; the first answer might “anchor” the rest. The order of the answer options can also be an issue. There are ways to mitigate these biases, but only if you recognize their potential and plan ahead.
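One simple mitigation for option-order effects is to vary the direction of the scale across respondents and compare the halves. A minimal sketch, with invented labels (nothing here comes from the story’s app):

```python
def scale_for(respondent_id: int) -> list:
    """Present the satisfaction scale in alternating direction.

    Showing half the respondents a reversed scale lets you detect
    (and partially average out) order effects in the answers.
    """
    options = ["Very dissatisfied", "Dissatisfied", "Neutral",
               "Satisfied", "Very satisfied"]
    # Even IDs see the scale low-to-high, odd IDs high-to-low.
    return options if respondent_id % 2 == 0 else list(reversed(options))
```

If the two halves produce systematically different averages, the order itself is influencing the answers.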
How can I improve my question to alleviate bias?
One approach is to offer a conditional open-text follow-up when the answer is low. With a single question, this helps your customers explain their choice on the scale. Make sure it is optional. Now you have both quantitative and qualitative data to work with. That’s more nuanced.
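A minimal sketch of that conditional follow-up logic; the threshold and the prompt text are assumptions for illustration, not taken from the story’s app:

```python
from typing import Optional

def follow_up_prompt(stars: int, threshold: int = 3) -> Optional[str]:
    """Return an optional open-text prompt for low ratings, else None."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    if stars <= threshold:
        # Keep it optional: the customer can always skip the text box.
        return "Sorry to hear that. What could we have done better? (optional)"
    return None
```

High ratings end the survey immediately, so only the customers with something to explain see the extra field.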
However, if you use the same trick for multiple questions within a survey, it may be perceived as annoying because you are extending the survey time. People already hate surveys, and when they feel the stated length was “cheating,” it gets ugly.
Final Words About 4.67
Let’s go back to our story. Interpreting 4.67 as the overall satisfaction score for the ride can be misleading. Always make sure you measure what you intend to measure, so the data provides actionable insights for the purpose it was created for. If you ask about the driver, the data is about the driver, not about the ride itself. Personally, for learning research, I found that Will Thalheimer’s approach addresses many of the factors mentioned above and provides more practical and meaningful data [1].
References:
[1] Will Thalheimer, on learner surveys and learning effectiveness.
Originally published on www.linkedin.com