Monday, July 5, 2010

Are student evaluations an appropriate measure of teaching performance?

This question has been raised (again) by two recent news stories. The first was a study published in the Journal of Political Economy, by Scott E. Carrell and James E. West. Carrell and West used data from the USAF Academy, where students are randomly assigned to required courses. To measure how well professors taught students in those required courses, the study looked at student performance in higher-level classes that required those courses as a prerequisite. They refer to this as a "value-added measure." (How much value was added to the students' ability to learn, in this case, mathematics?) The value-added measure was compared to student evaluations. If student evaluations were a good way to measure teaching performance, then students would give high marks to the teachers who prepared them well for the higher-level courses.

Carrell and West discovered, however, that just the opposite is true. Students thought that the best teachers were the ones who did the worst at preparing them for the higher-level classes. Hence, Carrell and West conclude, "students appear to reward higher grades in the introductory course, but punish professors who increase deep learning."

In the second story, the Texas educational system has taken the notion that students are effective evaluators of teacher performance to a new extreme by letting students decide which professors get teaching rewards of up to $10,000. Dr. Stanley Fish railed against this scheme in a recent column for The New York Times. Fish writes,
Once this gets going (and Texas A&M is already pushing it), you can expect professors to advertise: “Come to my college, sign up for my class, and I can guarantee you a fun-filled time and you won’t have to break a sweat.” If there ever was a recipe for non-risk-taking, entirely formulaic, dumbed-down teaching, this is it.
Part of the problem of using students to evaluate teaching, Fish argues, has to do with "deferred judgment." Time, sometimes years, is often required to fully understand what you have learned from a class, especially a class well taught.
And that is why student evaluations (against which I have inveighed since I first saw them in the ’60s) are all wrong as a way of assessing teaching performance: they measure present satisfaction in relation to a set of expectations that may have little to do with the deep efficacy of learning. Students tend to like everything neatly laid out; they want to know exactly where they are; they don’t welcome the introduction of multiple perspectives, especially when no master perspective reconciles them; they want the answers.

But sometimes (although not always) effective teaching involves the deliberate inducing of confusion, the withholding of clarity, the refusal to provide answers; sometimes a class or an entire semester is spent being taken down various garden paths leading to dead ends that require inquiry to begin all over again, with the same discombobulating result; sometimes your expectations have been systematically disappointed. And sometimes that disappointment, while extremely annoying at the moment, is the sign that you’ve just been the beneficiary of a great course, although you may not realize it for decades.
Needless to say, that kind of teaching is unlikely to receive high marks on a questionnaire that rewards the linear delivery of information and penalizes a pedagogy that probes, discomforts and fails to provide closure. Student evaluations, by their very nature, can only recognize, and by recognizing encourage, assembly-line teaching that delivers a nicely packaged product that can be assessed as easily and immediately as one assesses the quality of a hamburger.
The problems associated with student evaluations have been known for decades. In the many studies I have seen on the topic, not one has shown student evaluations to be an effective measure of teaching performance. (If you find one, please let me know.) Nonetheless, student evaluations are used as the main source, or the only source, of information to evaluate teaching performance. This incentive structure is known to lead to teaching that tries to be entertaining, lighter student workloads, and grade inflation (which has continued to rise since the introduction of student evaluations in the 1960s).

In one now well-known experiment, two people were asked to teach the same class on the same topic. Both lectured on the topic and had time at the end for some Q&A with the students. One was a professor and expert on the topic. The other was an actor. The actor was charming, charismatic, and funny, and he bluffed his way through the whole lecture and Q&A. (You can probably guess where this is going.) By a wide margin, the students thought the actor was the better teacher and (here's the kicker) more knowledgeable on the topic.
The obvious question then becomes: if we know that student evaluations are poor measures of teaching performance, and have known this for decades, why do colleges continue to use them to measure teaching performance? The answer, I believe, has to do with a problem common in the social sciences: the data most likely to be used is the data that is easiest to collect. Deans and hiring and promotion committees rely upon this data because that is the data they have. They look at all those means and standard errors nicely laid out in those neat little tables, and it becomes easy to assume those numbers mean something. The situation reminds me of a joke told by economist Ken Rogoff to explain why economists failed to foresee the current economic crisis.
A drunk on his way home from a bar one night realizes that he has dropped his keys. He gets down on his hands and knees and starts groping around beneath a lamppost. A policeman asks what he’s doing.
“I lost my keys in the park,” says the drunk.
“Then why are you looking for them under the lamppost?” asks the puzzled cop.
“Because,” says the drunk, “that’s where the light is.”
In the same way that economists failed to understand the economic collapse because they failed to collect the right data, college administrators fail to effectively evaluate teaching because they are collecting the wrong data. 

Related Articles:

Full version of the Carrell and West study in PDF.
Cowen, Tyler. "Does professor quality matter?"
Fish, Stanley. "Student Evaluations, Part Two"
Douthat, Ross. "In Defense of Student Evaluations."
Douthat, Ross. "Now, The Case Against Student Evaluations."
Jacobs, Alan. "Stanley Fish is right again."


Stephen Maynard Caliendo said...

Great blog, as usual. The other obvious question, of course, is what data would be better. One answer would be to track students years after. This has a number of benefits to address Fish's concerns, as well as several drawbacks, both with respect to logistics (mortality - in the scientific, not literal, sense - will be very high, and low response rates will compromise generalizability) and internal validity (will the students remember the course accurately, give "credit" to this course when material was learned in a number of courses or via the broader curriculum, etc.?). Our institution recently went through the accreditation process, and the site visit team had a number of recommendations with respect to assessing students years after they leave the College. It makes a lot of sense in many ways, but there will be problems with such methods, as well. Still, this is important to discuss, so thanks for doing such a nice job with this.

Napp Nazworth said...

Thanks, Dr. Caliendo. You're right, it is important to think about the alternatives. I would approach that question like I approach public policy questions. For me to support an alternative, it doesn't have to be perfect; it just has to be better than the status quo. Also, like many difficult social science questions, it is best tackled using a variety of methods. Each method will have its own weaknesses, but together a variety of methods will help form a fuller picture. One suggestion I read about was to randomly select students from classes to be interviewed by a committee of senior professors. This is a way to get student feedback, but it also gives the opportunity to explore more deeply why students like or don't like certain professors.