Way back in 1954, journalist Darrell Huff wrote this quirky little book as a ‘primer’ on how to lie with statistics and how to spot the tricks. It’s only 124 pages and filled with cartoons. You don’t need to know much about statistics and could finish it in a couple of hours. Despite some rather dated examples, it’s as relevant today as it was just over 60 years ago. You’d be surprised how many of the anecdotes have modern-day equivalents.
“The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalise, inflate, confuse and oversimplify.” While statistical methods and reporting are necessary, without writers who use them honestly and readers who know what they mean, the “result can only be semantic nonsense”.
Huff wryly notes that although one does not always intend to deceive, “the fact is that statistics is as much an art as it is a science…often the statistician must choose between methods, a subjective process…in commercial practice he is about as unlikely to select an unfavourable method as a copywriter is to call his sponsor’s product flimsy and cheap when he might well say light and economical”.
Surveys and statistical claims are everywhere, and it makes sense to have a closer look at the numbers. By the same token, it makes no sense to reject all statistics. To do so is like refusing to read because writers sometimes use words to hide facts.
The ‘tour’ on how charlatans lie with statistics and how to spot it includes:
Use a sample with a built-in bias. If your sample isn’t representative and random, or is too small, then your study may be less reliable than an intelligent guess. Bias creeps in very easily, and even if the rest of the study is letter-perfect, the result can be rubbish. Huff recalls the Literary Digest poll on the 1936 US presidential election, which predicted Landon would beat Roosevelt by a landslide. Despite a huge sample, the researchers forgot that telephones and magazine subscriptions were not the norm, so their sample skewed towards the well-off, and come voting day Roosevelt trounced Landon.
Even back then, truly random samples were expensive and difficult, so stratified random samples were popular. That’s where the problems start, since the choice of strata is often subjective. Bias in selection is everywhere, down to the unconscious bias of face-to-face fieldworkers in deciding whom to approach. A study showed that simply changing the demographic of the interviewer produced wildly different results.
One should take special care with surveys that have low response rates. There is nothing to say that the opinions of those who respond reflect those of the ones who don’t. Say 80% like your product, but only 5% responded to the invite: be careful what you read into the numbers.
In a survey there can be 3 layers of sampling: the population sample, which is often far from random; the questionnaire, which is just a sample of the questions you could ask; and the answers, which may only be a sample of the respondent’s possible attitudes.
Being unclear about which average you use. There are 3 types of averages: the mean (the sum of all values divided by their number), the median (the midpoint value) and the mode (the most frequent value). Beware the average that doesn’t say which average it is. Like the estate agent who quoted the average income in the area as $10,000 using the mean, propped up by a few high-flying millionaires, and then quoted the same area’s average income as $2,000 using the median when trying to keep the rates down.
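As a quick illustration (the figures here are invented, not Huff’s), Python’s statistics module shows how far the three ‘averages’ can drift apart on a skewed income list:

```python
# Illustrative incomes for a small area (figures invented): a few
# millionaires drag the mean far above the median and mode.
from statistics import mean, median, mode

incomes = [2_000, 2_000, 2_000, 2_500, 3_000, 3_000, 1_000_000, 2_000_000]

print(mean(incomes))    # 376812.5 — the 'average' for impressing buyers
print(median(incomes))  # 2750 — the 'average' for keeping rates down
print(mode(incomes))    # 2000 — the most common income
```

All three are legitimately ‘the average’; the lie is in not saying which one you picked.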
Leaving out the little figures. Be wary of reports and infographics that tell you nothing about the sample or how it was derived. Small sample sizes are wonderful for producing misleading statistics. Take Doakes, who wanted to show that their toothpaste reduced cavities, and so ran a number of experiments with tiny samples until, by chance, one popped up with a juicy 23% reduction.
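The trick can be sketched in a few lines of Python; the trial setup and figures here are my own invention, not Huff’s:

```python
# Sketch of the Doakes trick (figures invented): run many tiny trials of a
# toothpaste that does nothing, then publish only the luckiest result.
import random

random.seed(1)

def cavity_trial(n=12):
    """Cavity counts before and after for n people; no real effect exists."""
    before = [random.randint(1, 5) for _ in range(n)]
    after = [random.randint(1, 5) for _ in range(n)]
    return 1 - sum(after) / sum(before)  # the 'reduction' is pure noise

reductions = [cavity_trial() for _ in range(50)]
best = max(reductions)
# Quietly discard the other 49 trials and advertise the best one.
```

With samples this small, noise alone will eventually hand you a headline number.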
It’s not only the numbers that mislead, words can do the job too. Like saying 75% of farms had electricity available with no explanation of what ‘available’ meant.
Make a fuss about nothing. Statistics have a margin of error: your answer lies within a range, and making a big deal of an insignificant difference can lead to silly behaviour. And no, a 2% difference on a sample of 100 is not significant. Don’t pretend it is.
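A rough sketch of why, assuming a simple random sample and the standard normal-approximation formula for a proportion:

```python
# Back-of-envelope 95% margin of error for a sampled proportion,
# using the textbook formula z * sqrt(p * (1 - p) / n) with z = 1.96.
import math

def margin_of_error(p, n, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(0.5, 100)
print(moe)  # roughly 0.098 — about ±10 points, swamping a 2-point gap
```

A 48% vs 50% split on 100 people is well inside the noise.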
Huff notes that marginal differences in rankings are also open to abuse. Someone has to be top and someone has to be last, no matter how small the difference. Like the time Gold cigarettes advertised that they had the least poison in their product, except the difference was minuscule and the advertising board made them withdraw the claim.
The gee-whizz graph and distorted pictures. People are terrified of numbers, and when words won’t do, we use pictures. There are loads of ways to deceive here, and you don’t even need to lie. Take a line graph showing, say, a small percentage change in revenue from last year. If your vertical axis runs from 0 to 100, the line may look pretty flat. Run it from 0 to 10 and all of a sudden the same change looks big. It’s the visual equivalent of adding the adjective ENORMOUS in front of an innocuous figure.
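A tiny sketch of the effect, with made-up numbers: the same change fills a very different share of the chart depending on the axis range you choose.

```python
# Fraction of the chart's height a given change occupies, for two axis
# ranges (illustrative figures, not from the book).
def visual_share(change, axis_min, axis_max):
    """How much of the vertical axis a change of this size takes up."""
    return change / (axis_max - axis_min)

change = 3  # e.g. a figure moving from 4% to 7%
print(visual_share(change, 0, 100))  # 0.03 — barely a ripple
print(visual_share(change, 0, 10))   # 0.3 — the same data looks dramatic
```

Nothing about the data changed; only the frame around it did.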
Infographics are also fertile ground for visual deception. Huff uses the example of wanting to show that income in one area was double that of another. Instead of quoting numbers, he drew 2 bags of money, one twice as tall as the other; but doubling the height while keeping the proportions makes the bigger bag 4 times the area. Even though the underlying number was accurate, the visual effectively exaggerates the difference.
Like the map that showed the share of US income that went to the government. By highlighting big, sparsely populated states, it made it look as though most of the country’s money went to tax; you could just as well have highlighted New York and Pennsylvania for a very different visual effect.
The semi-attached figure. Attach two numbers that look linked but aren’t really, and you can produce a statistical sleight of hand, a bit like a politician answering a difficult question by talking about a completely different topic. Take the advert for a juice extractor that claimed to extract 26% more juice. 26% more than what, one might ask? Turns out it was 26% more than squeezing by hand, not more than competing extractors.
Watch out for tricks like the one Governor Dewey used in a New York election campaign. He claimed credit for lifting teachers’ pay, noting that some teachers in rural New York State districts had earned less than $900 the previous year, while this year, in New York City, the minimum wage was $2,500. Sounds impressive, even though the two numbers have nothing to do with each other.
Correlation is not causation. Confusing the two can lead to totally unwarranted conclusions. The link may be pure chance, the two numbers may both be driven by some third factor, or there may be a genuine link whose cause and effect you can’t untangle. Correlations are a lot easier to find than causes, but don’t necessarily mean anything. I personally think this is one of the bigger sources of confusion in studies today.
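A small simulation (my own, not from the book) makes the ‘pure chance’ case concrete: comb through enough unrelated series and some pair will correlate strongly for no reason at all.

```python
# Among many pairs of purely random series, some pair will show a strong
# 'correlation' by chance alone (illustrative simulation, figures invented).
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(0)
series = [[random.random() for _ in range(10)] for _ in range(40)]

best = max(
    abs(pearson(a, b)) for i, a in enumerate(series) for b in series[i + 1:]
)
# 40 series give 780 pairs of pure noise; the strongest pairing looks
# impressive and means nothing.
```

This is exactly how dredging a big dataset for ‘relationships’ manufactures headlines.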
If all else fails, make your work sound very scientific. Add a decimal place and your answers appear certain: 49.6% sounds so precise, much better than saying ‘around half’. Except most of the time you can’t be sure of the 49, much less the .6.
Huff finishes off with 5 questions we can use to check the veracity of a statistic. As researchers, all our reports should answer these clearly.
Who says so? It is useful to know who sponsored the project. Also look out for the ‘OK name’. Just because Mr. X from Harvard says so does not mean Harvard the institution says so…be sure the name stands behind, and not merely alongside, the assertion.
How does he know? It also helps to know the response rate. There are numerous examples showing that you cannot extrapolate the answers of those who respond to those who don’t. Look out for samples that are small, biased, or unrepresentative.
What’s missing? Be wary of statistics with missing data: the sample size, the response rate, or the context. Take the company that reported sales up 25% on last March, and forgot to mention that Easter fell in March this year and in April last year.
Did somebody change the subject? Watch out when the writer suddenly switches subject. Like saying it is more expensive to keep someone in prison than in a hotel, when one figure is the total cost per inmate and the other is just the room rate.
Does it make sense? There is an old saying that common sense isn’t so common. If the extrapolations sound a little wild, apply some common sense before just believing what you read.
I think the book is a worthwhile read. Numerous sites offer PDF copies of the book for download, and you can also get a paperback version on Amazon if you prefer that format.