Clients who are kicking off a market research effort often ask how to get “statistically significant” results. Some vendors may be quick to provide a number to move a project forward. At Coleman, we pride ourselves on being transparent thought partners. Our main goal is to deliver actionable insights that help clients make decisions with more confidence. It’s important, therefore, to break down the complexities around statistical significance in market research and explain why a number from a quick calculation rarely provides the answer clients are looking for.
Many of us who have taken an introductory course in Statistics (and even those who haven’t) are familiar with the scientific-sounding terminology. There are user-friendly equations and tools sprinkled all over the internet to “calculate” margin of error, confidence levels, and statistical significance. Simply plugging numbers into an equation may result in an answer, but what does that answer mean?
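For reference, the calculation behind many of these online tools is short. The sketch below uses the textbook margin-of-error formula for a proportion. The function name and the example inputs (a 25% result from 400 respondents) are our own illustration, not taken from any particular vendor's tool.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Textbook margin of error for a sample proportion.

    Hypothetical helper for illustration. The formula assumes the
    n respondents were drawn with probability sampling.
    """
    # z-score for a two-sided interval at the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Illustrative inputs only: 25% of 400 respondents gave a certain answer.
moe = margin_of_error(0.25, 400)
print(f"{moe:.1%}")  # prints 4.2%
```

The function returns a number for any inputs you give it. Nothing in the arithmetic checks how the 400 respondents were selected, which is exactly why the output needs interpretation.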
The assumption that underpins these statistical equations is probability sampling, which requires that each member of the population of interest has a known, non-zero chance of being included in the study sample. (In a simple random sample, that chance is also equal for everyone.) What’s an easy way to know if our study meets this requirement? Think about the target audience you are trying to reach in your survey. Do you have a list of every single possible participant that meets your desired qualifications for the study? For the majority of market research efforts, the answer is a resounding “no.”
If you aren’t starting with a list of all potential participants, also known as a sample frame, then your study is not built on a probability sample. Instead, it relies on one of the non-probability sampling methodologies that dominate the landscape of market research: convenience sampling, snowball sampling, quota sampling, river/intercept sampling, and the like.
If you are not conducting a survey with probability sampling techniques (simple random sample, stratified sample, etc.), then you cannot mathematically quantify the risk of making inferences about the total population based on the results acquired through a smaller sample of that population. This includes buzzword terms like margin of error, confidence level, and statistical significance. Those numbers don’t mean anything unless you are using probability sampling. This is true for face-to-face, phone, and online surveys alike. This is true for studies with a sample size of n = 100 and those that collect information from thousands of respondents. This is true for B2B and B2C studies in all geographies. This is true for every market research sample provider who does not begin with a sample frame for your unique study.
Now that we know that most market research is conducted with non-probability sampling techniques, what does that imply about the validity of the results? Because we cannot mathematically calculate the probability that the responses we collect do or do not represent everyone outside of the sample, extreme caution should be used when attempting to make any inferences about the larger population. There is no way to guarantee that the respondents who took your survey are representative of the population as a whole. There is also no way to mathematically prove they are not, either – we just don’t know! It is, however, valid and potentially valuable to use descriptive statistics to analyze the data from the respondents in your study.
For example, suppose 25% of men 18-24 years old in your study use Product A. You can gather information about why they chose Product A, how much they pay for Product A, what they like and dislike about Product A, and whether they would recommend Product A to someone else. This can provide a lot of valuable information. It would be unwise, however, to suggest that 25% of all men 18-24 years old use Product A. It would also be incorrect to assume the other young men not included in the study would pay the same for Product A, like or dislike the same features, and so on.
Sure, the more data points you gather from a particular target audience, the more you may feel like the results should mirror the total population. However, without probability sampling, we have no way to know or quantify how likely the sample results are to be representative. And surveying all or most members of a population is usually cost- or time-prohibitive, which is the whole reason we use smaller samples for research in the first place.
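A small simulation can make this concrete. The sketch below is a toy model under loudly stated assumptions: we invent a population in which 25% of people use a product, and we assume (purely for illustration) that users are three times as likely as non-users to end up in a convenience sample. In a real study that weight is unknown, which is the whole problem. All names and parameters here are hypothetical.

```python
import random


def simulate(n, true_rate=0.25, user_weight=3.0,
             population_size=100_000, seed=0):
    """Compare a simple random sample with a skewed convenience sample.

    Toy model: `user_weight` is an invented assumption about how much
    more likely product users are to respond. It is not measurable in
    a real non-probability study.
    """
    rng = random.Random(seed)
    # True = uses the product, False = does not
    population = [rng.random() < true_rate for _ in range(population_size)]
    actual = sum(population) / population_size

    # Probability sample: every member has the same chance of selection
    srs = rng.sample(population, n)

    # Convenience sample: users are over-represented
    # (drawn with replacement, which is fine for a toy model)
    weights = [user_weight if uses else 1.0 for uses in population]
    convenience = rng.choices(population, weights=weights, k=n)

    return actual, sum(srs) / n, sum(convenience) / n


for n in (100, 1000, 5000):
    actual, srs_est, conv_est = simulate(n)
    print(f"n={n:>5}  actual={actual:.1%}  "
          f"random sample={srs_est:.1%}  convenience sample={conv_est:.1%}")
```

Under these assumptions, the simple random sample estimate settles near the true rate as n grows, while the convenience sample estimate settles near 50% (by construction, users make up about half of the respondents). Collecting more responses makes the convenience estimate more stable, but not more correct.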
In summary, be wary of vendors who “plug-and-chug” statistical equations to provide magic numbers that make your sample “significant.” Odds are that they are using non-probability sampling techniques, in which case the equations used to calculate such numbers do not apply. Ultimately, the results of a market research study conducted with non-probability sampling, regardless of sample size, cannot be mathematically projected onto the population as a whole, but rather should be used to gauge the experiences and sentiments of the participants in the study alone.