So much for expert judgement
In a test of scientists' abilities, the same data were sent to 27 teams of researchers in cognitive psychology. The idea was to test the theoretical inferences they drew. But those expert teams drew conclusions from identical data that varied, oh boy, all the way from "zero to 100 percent." One member of the research team described it as a "jaw-dropping" result: only one-third of the experts made the correct inferences about what the data meant. Two-thirds were either totally wrong or operating just "a bit better than pure guessing."
What are we teaching at universities?
Researchers test expert inferences against known data, find inconsistency
What they found was "enormous variability between researchers in what they inferred from the same sets of data," Starns says. "For most data sets, the answers ranged from 0 to 100 percent across the 27 responders," he adds. "That was the most shocking."
Rotello reports that about one-third of responders “seemed to be doing OK,” one-third did a bit better than pure guessing, and one-third “made misleading conclusions.” She adds, “Our jaws dropped when we saw that. How is it that researchers who have used these tools for years could come to completely different conclusions about what’s going on?”
Starns notes, "Some people made a lot more incorrect calls than they should have. Some incorrect conclusions are unavoidable with noisy data, but they made those incorrect inferences with way too much confidence."
To determine whether researchers can use these tools to accurately distinguish memory and bias, the UMass researchers created seven two-condition data sets and sent them to contributors without labels, asking them to indicate whether the conditions came from the same or different levels of the memory-strength or response-bias manipulations. Rotello explains, "These are the same sort of data they'd be confronted with in an experiment in their own labs, but in this case we knew the answers. We asked, 'Did we vary memory strength, response bias, both or neither?'"
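The standard tool for separating memory strength from response bias in this literature is signal detection theory, where sensitivity (d′) indexes how well old and new items are discriminated and the criterion (c) indexes the bias toward saying "old." As a minimal sketch (the specific hit and false-alarm rates below are hypothetical, not from the study), here is how the two measures are computed and why two conditions can differ in bias without differing much in memory strength:

```python
from statistics import NormalDist

def sdt_measures(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Classic equal-variance signal detection measures.

    d' (sensitivity) indexes memory strength; c (criterion) indexes
    response bias. Rates are assumed to be already corrected away
    from exactly 0 or 1, where the z-transform is undefined.
    """
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# Two hypothetical conditions: the second has more hits AND more
# false alarms, which mostly reflects a more liberal bias (c shifts
# negative) rather than a large change in memory strength (d').
d1, c1 = sdt_measures(0.80, 0.20)  # d' ~ 1.68, c ~ 0 (neutral bias)
d2, c2 = sdt_measures(0.90, 0.40)  # d' ~ 1.53, c ~ -0.51 (liberal)
print(d1, c1)
print(d2, c2)
```

The inferential difficulty the study probes is exactly this: raw accuracy differences between conditions can be produced by either quantity, so the analyst must correctly attribute them.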
The volunteer cognitive psychology researchers could use any analyses they thought were appropriate, Starns adds, and “some applied multiple techniques, or very complex, cutting-edge techniques. We wanted to see if they could make accurate inferences and whether they could accurately gauge uncertainty. Could they say, ‘I think there’s a 20 percent chance that you only manipulated memory in this experiment,’ for example.”
Starns, Rotello and Cataldo were mainly interested in the reported probability that memory strength was manipulated between the two conditions.
Not just psychologists
“We’d be stunned if the inference problems that we observed are unique. We assume that other disciplines and research areas are at risk for this problem.”
Jeffrey J. Starns et al. (2019). Assessing Theoretical Conclusions With Blinded Inference to Investigate a Potential Inference Crisis. Advances in Methods and Practices in Psychological Science. DOI: 10.1177/2515245919869583