High-Stakes Tests Are Misleading

Jonah Lehrer, writing last month in the Wall Street Journal, makes the point that the kind of high-stakes testing popular in education is not very useful.

He gives the example of supermarket cashiers who were given brief time trials to see how fast they worked. Careful analysis of electronic scanning records (which the cashiers did not know were being used to assess them) showed very little correlation between long-term performance and the brief time trials. Conclusion: the time-trial results said little about actual day-to-day performance. “Maximal performance” is not equal to “typical performance.”

Lehrer goes on to say that we are obsessed with maximal performance because measuring it is quick and yields a number (or a few numbers), which lets us rank people easily using minimal resources. But the results of high-stakes tests (LSAT, SAT, NFL Combine, for example) have, he argues, very little correlation with career success.

High-stakes tests are convenient for the testers, and as comforting as a warm bath. However, they are not useful, says Lehrer, because success depends on character traits, such as persistence and self-control, which cannot be measured in brief tests; such traits become apparent only after careful observation of typical performance over a long period of time. (But see another Lehrer article, this one from the New Yorker, which discusses psychological studies of self-control in young children and their correlation with future success.)

Lehrer:

Do people persevere, even in the face of difficulty? How do they act when no one else is watching? Such traits often matter more than raw talent. We hear about them in letters of recommendation, but hard numbers take priority.

The larger lesson is that we’ve built our society around tests of performance that fail to predict what really matters: what happens once the test is over.

Similar points about the inability of brief testing to predict future performance are discussed by Malcolm Gladwell in his book What the Dog Saw (see the chapter entitled Most Likely To Succeed: How do we hire when we don’t know who’s right for the job?, pages 314 ff).

Gladwell compares the difficulty of predicting who will be a good teacher with that of predicting who will be a good NFL quarterback. Both jobs are extremely complex, and it is difficult to identify just which combination of skills makes one good at either. Gladwell notes that the perceived need to improve the quality of teaching usually results in calls for more rigorous standards in teacher training, particularly in its academic and cognitive aspects. However, research cited by Gladwell shows that neither teacher certification nor graduate degrees (both of which are expensive and time-consuming) make “a difference in the classroom.”

Gladwell:

Test scores, graduate degrees, and certifications — as much as they appear related to teaching prowess — turn out to be about as useful in predicting success as having a quarterback throw footballs into a bunch of garbage cans.

Gladwell suggests that standards for incoming teachers be lowered, as is done in the financial industry, so that many more candidates are considered. After a period of apprenticeship, during which they actually teach for relatively low salaries, the good candidates can be made permanent employees and the others released. However, this would require rejecting about three-quarters of the apprentices, something Gladwell doubts unions will ever agree to. It would also require raising the salaries of experienced teachers, to make it worthwhile for apprentices to embark on the journey, and taxpayers may not accept this. But Gladwell says that the financial industry has been doing this successfully for years.

What does it say about a society that it devotes more care and patience to the selection of those who handle its money than of those who handle its children?

This is the kind of fresh thinking that we sorely need, but instead our politicians fall back on the old reflex of blaming teachers or firing them.

I would add that, besides finding innovative ways of selecting good teachers, we should support the improvement of existing teachers through collegial development programs, as is done in some countries (Liping Ma’s description of how it’s done in China comes to mind).

(This post first appeared at my other (now deleted) blog, and was transferred to this blog on 22 January 2021.)