Wednesday, September 23, 2009

Blog Entry #4

In this week's entry, I'll be writing in response to chapter 11 of McKeachie's Teaching Tips and chapter 8 of Curzan and Damour's First Day to Final Grade. The subject matter of these chapters is grading. I’ll also respond to chapter 10 of Teaching Tips, which is all about the issue of cheating.

Last week I wrote that First Day... is the more engaging of the two texts, but this week I feel the opposite. I found Teaching Tips to be much more thoughtful and thought-provoking in its discussion on grading. First Day... was a disappointment, as you'll see from my entry.

As the majority of the reading focused on grading, I'll write about that first.

Grading

Grading, measuring, and evaluating students is difficult work and, as Teaching Tips appears to recognize more than First Day..., controversial. To help me delve into the topic, I need to introduce readers to Dr. Fred O. Brooks, who wrote the book (well, a book) on the subject of grading. He taught the classroom evaluation and measurement course I took as an undergraduate teacher-in-training. Dr. Brooks' textbook is called Principles and Practices in Classroom Evaluation. Things he and his textbook said have stuck with me since I took his course in 1993 and, in fact, I still have the book. My own views about grading were heavily influenced by Dr. Brooks.

Among Dr. Brooks' uncompromising stances on classroom evaluation is that norm-based grading is never appropriate. His first argument against norm-based grading (or grading "on a curve") was that no classroom is varied or large enough for a teacher to reasonably expect it to conform to a universal "norm." Curves are statistically meaningful only when looking at very large populations of students. In practice, he argued, as McKeachie acknowledges as well, teachers create a curve by comparing students within the class to each other, which is unreliable, especially if a class happens to be made up mostly of gifted students or mostly of struggling ones.
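
To make the statistical objection concrete, here is a minimal sketch (in Python; the grade cutoffs and the scores are my own inventions, not Dr. Brooks' or McKeachie's) of a strict curve that assigns letter grades by z-score. Feed it a small class of uniformly strong students and it dutifully manufactures D's:

    import statistics

    def curve_grades(scores):
        """Assign letter grades by z-score -- a strict norm-based curve.
        The cutoffs here are illustrative only."""
        mean = statistics.mean(scores)
        spread = statistics.pstdev(scores)
        def letter(z):
            if z >= 1.5: return "A"
            if z >= 0.5: return "B"
            if z >= -0.5: return "C"
            if z >= -1.5: return "D"
            return "F"
        return [letter((s - mean) / spread) for s in scores]

    # Five capable students, all scoring between 85 and 95 -- yet the
    # curve forces the two lowest of them down to D's:
    print(curve_grades([85, 88, 90, 92, 95]))  # ['D', 'D', 'C', 'B', 'B']

The class learned the material; the curve punishes some of them anyway, which is exactly Dr. Brooks' point.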

His second argument against norm-based grading was that students ought to be given every chance to succeed and that random factors, such as the comparative brilliance or dimness of their classmates, ought to be eliminated as much as possible. He felt that too many teachers taught their classes as if they were "in the business of keeping secrets from [their] students." He believed teachers should make classes so inviting, "so clear, and so obvious to students that they can't help but learn." He recognized that this might be unrealistic, but even so, it is "a worthwhile goal." For Dr. Brooks, being inviting and obvious means telling students exactly what they’re expected to know and do. In other words, teachers should establish clear competencies that all students can (potentially) demonstrate.

In response to concerns about grade inflation, Dr. Brooks felt that teachers should want all their students to do very well. Isn't that why we teach? Moreover, isn’t that why we strive to be good teachers? Isn't widespread student success a product of good teaching? This is not to say that Dr. Brooks thought learning should not be challenging. On the contrary, he felt teachers should have high expectations and demand the best of students. But if, in turn, students gave their best and met those expectations, should they not get a good grade? On this question, posed in this context, there was little disagreement in Dr. Brooks' class.

Rather, disagreement, or at least lack of total agreement, came when we began to talk about the details of applying the concepts we were learning to the actual construction of exams. To understand why, it must be clear that Dr. Brooks was a zealot for exam validity and reliability.

Validity, as Teaching Tips defines it, is whether a test (or any evaluated assignment, for that matter) measures what the teacher thinks it measures. As an example, let's look at a word problem on a math exam. The teacher's intent is to see how well students can multiply, and the word problem asks the students to convert knots to miles per hour. Unless the relationship between knots and miles per hour was specifically part of the instruction prior to the exam, the question would be invalid in most classrooms, as all but naval academy cadets would be unlikely to know the ratio. To restore the question's validity, the teacher could simply provide the conversion (1 knot = 1.1507794 statute miles per hour) in the question itself. The student would then only need to perform the multiplication correctly, which is what the teacher was trying to measure in the first place.
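
The repaired question then reduces to exactly the operation the teacher wanted to measure. A quick sketch (the speed of 20 knots is a number I made up for illustration):

    KNOTS_TO_MPH = 1.1507794  # the conversion factor, now given in the question

    def knots_to_mph(knots):
        """The multiplication the question is actually meant to measure."""
        return knots * KNOTS_TO_MPH

    print(knots_to_mph(20))  # 23.015588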

Reliability is the quality of consistency. More precisely, reliable test items or assignments are those that would be graded the same by anyone at any time. A multiple choice question asking "Pure water is best described by which of the following formulae?" should find "H2O" the response that any chemistry teacher, year after year, would agree is correct (disclaimer: despite owning a chemistry set as a child and taking chemistry classes in high school and college, I don't know that there isn't some other formula that better describes pure water, but I think you understand the point I'm making). On the other hand, an essay question asking students to "analyze the role that racism plays in anti-Barack Obama protests," for example, is not so cut-and-dried, and would likely elicit a variety of critiques from the political science, sociology, or communication department teachers we can imagine might ask such a question. As you may have guessed, the reliability of a test question depends on the degree of the question’s objectivity or subjectivity.

Validity is something that most people can and do agree on when they talk about tests and test items. Reliability, on the other hand, is not seen by many as a crucial criterion of tests. But for Dr. Brooks, the two characteristics were equally essential. As a result, subjective test questions, like the essay question I offered in the preceding paragraph, were considered dangerous, if not entirely inappropriate. Dr. Brooks, for one, certainly never used them. His exams consisted entirely of multiple choice, true-false, matching, and fill-in-the-blank items.

It's here that I'll reveal that Dr. Brooks had been a high school math teacher before he became an educator of educators. This is relevant because, as the course went on, it seemed that he believed multiple choice, true-false, and short completion items were all any teacher needed to measure whether students were learning. He would probably argue that I'm overstating his position but, I wager, not by much. In his own classrooms, Dr. Brooks probably was able to evaluate his students' mastery of math concepts and applications through these kinds of test items. Correct answers to math questions are usually much more discernible and not as open to interpretation as, say, an answer to a typical essay question.

As an English and communication major, I was among several students who failed to see how these kinds of questions could sufficiently capture students' learning in our disciplines, especially higher-level learning (in terms of Bloom's taxonomy). Even for lower-level learning, such as knowledge and comprehension, there are limitations to how well certain test items can measure success. To see this, we must distinguish questions that require students to recognize the correct answer from questions that require students to recall the correct answer.

Recognition requires what I call “lineup” knowledge. To recognize the correct answer, students need only to have seen or heard it enough times (perhaps only once in a class lecture) that it makes a connection in their minds when they see it on the exam. Victims of muggings are often unable to describe their attackers accurately from memory, but if the police present four or five suspects in a lineup, the victim is much more likely to be able to pinpoint which of the suspects was the mugger.

Recall, on the other hand, requires deeper knowledge. To recall a correct answer, students need to be familiar enough with it that they can recreate it. To continue with the metaphor above, recall is like asking the mugging victim not only to describe the mugger to a police sketch artist, but to draw the mugger him- or herself.

In many disciplines, multiple choice questions require students only to recognize correct answers, not recall them. Let's look at this history question as an example:

Which naval battle of WWII is considered the turning point for the US in its war against Japan?
a) Coral Sea
b) Midway
c) Pearl Harbor
d) Sea of Japan

On the other hand, math and physics are just two disciplines that can use multiple choice questions with the expectation that students must still know how to perform certain operations, memorize certain formulae, etc. For example:

What is the value of 58 to the power of 3?
a) 195,112
b) 11,316,496
c) 3,364
d) 7.61577311

No one, not even a savant, has "memory" of this kind of information. It must always be calculated, however quickly some may be able to do it.
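
For the record, the calculation the question demands is two multiplications:

    58 × 58 = 3,364
    3,364 × 58 = 195,112

so (a) is the correct response. The distractors are traps for students who miscount the operations: (c) is 58 squared, (b) is 58 to the fourth power, and (d) is the square root of 58.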

We could change the history question above from multiple choice to a fill-in-the-blank format to make it more difficult and require students to know the material more deeply. Even so, the question would still be only a knowledge-level item and would not tell the teacher whether the student comprehends why the Battle of Midway was considered the turning point. To bring the matter home to my own teaching assignments this semester, can I evaluate students’ ability to apply good public speaking techniques and deliver quality speeches by having them take multiple choice exams? The answer is clearly “no,” as McKeachie writes and even Dr. Brooks acknowledges in his book. Students must be required to apply the speaking skills. Grading speeches, as we know, is an inherently subjective exercise.

While it’s clear that the use of subjective measures is very often appropriate and completely valid, I believe that reliability must not be sacrificed. In fact, reliability in grading is completely compatible with the grading philosophies of teachers who grade on a curve, since many who grade on a curve are concerned about what they perceive as deteriorating standards in education. Insisting on reliable measures is to insist that there are agreed-upon standards (of essay writing, public speaking, etc.) that all teachers more or less adhere to. Without reliable measures, exams and grades become little more than the personal opinions of the teachers who give them (see the section in Teaching Tips, “Can we trust grades?”).

To achieve this in a class like COMM 110, I think the Communication Department needs to collectively identify and define standards of good public speaking and then employ rubrics that illustrate these standards. I know that the classes currently employ what is being called a rubric, but it’s not a rubric. It’s a list of words like “attention getter” and “eye contact” with points assigned to them. An effective rubric clarifies what proper use of “eye contact” is and describes what different point values mean. For example*:

5 points = eye contact, interaction with aids, and physical gestures demonstrate the speaker’s energy and interest, guiding the listener through the presentation.

3 points = eye contact, interaction with aids, and physical gestures are natural and fluid.

1 point = eye contact with the audience is lacking. Gestures are missing or awkward. The speaker depends heavily on the written speech or notes.

*The complete version of this rubric can be accessed through the link found at the bottom of this entry. It comes from Tusculum College in Greeneville, Tennessee.
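
If it helps to see the difference in concrete terms, here is a rough sketch (mine, not the department's; the criteria and descriptors are invented for illustration) of a real rubric as a simple data structure: each point value is tied to a description of performance, and a speech's total is just the sum of the levels awarded:

    # Illustrative rubric: every point value has a descriptor attached.
    RUBRIC = {
        "delivery": {
            5: "Eye contact and gestures show energy and interest, guiding the listener.",
            3: "Eye contact and gestures are natural and fluid.",
            1: "Eye contact is lacking; the speaker depends heavily on notes.",
        },
        "attention getter": {
            5: "The opening immediately engages the audience and previews the topic.",
            3: "The opening is relevant but routine.",
            1: "The speech begins with no attempt to engage the audience.",
        },
    }

    def score_speech(marks):
        """Total a speech from a dict of criterion -> points awarded."""
        for criterion, points in marks.items():
            if points not in RUBRIC[criterion]:
                raise ValueError(f"{points} is not a defined level for {criterion!r}")
        return sum(marks.values())

    print(score_speech({"delivery": 3, "attention getter": 5}))  # 8

The point of the structure is that a score of 3 for "delivery" means something specific and shared, not whatever each instructor happens to have in mind.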

Can such a rubric actually promote and enforce reliable standards across a few dozen instructors of public speaking? Yes, and I speak from personal experience. Every year, Texas high school students take a standardized exam intended to measure their mastery of secondary education-level competencies. At least one essay question is part of this exam and must be evaluated by human beings. When I lived in Texas, I was twice employed temporarily (along with about a hundred others) to grade these essays. To ensure that all the evaluators were assigning points reliably, detailed rubrics were provided, and we spent close to a full day honing our evaluative instincts and skills to match the rubric. The training was over only once all the evaluators were able to assign the same number of points to the same essays.
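
That stopping rule, everyone assigns the same points, amounts to checking simple percent agreement between raters (a blunter measure than a statistic like Cohen's kappa, but it captures the idea). A sketch with invented scores:

    def percent_agreement(rater_a, rater_b):
        """Fraction of essays to which two raters gave identical points."""
        assert len(rater_a) == len(rater_b)
        matches = sum(a == b for a, b in zip(rater_a, rater_b))
        return matches / len(rater_a)

    # Two raters' points on the same ten essays (numbers invented):
    print(percent_agreement([4, 3, 2, 4, 1, 3, 3, 2, 4, 3],
                            [4, 3, 3, 4, 1, 3, 2, 2, 4, 3]))  # 0.8 -- keep training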

It appears that the department prefers instead to give instructors the flexibility to set their own standards and create their own rubrics. This would ordinarily not be much cause for alarm if public speaking were taught by experienced teachers. As it is, most public speaking instructors at NDSU (and at many larger universities and colleges in the US) are people who’ve never taught before (a few are even new to the communication discipline). The risk here is not just that a lack of evaluative standards may lead to unreliable grading, but that some instructors may not even be capable of setting their own standards.

In this debate, there is a great difference between the two books we’ve read for this week. Teaching Tips coolly offers sensible reasons for employing both norm-based and competency-based grading, though McKeachie does say he believes norm-based grading “is educationally dysfunctional.” First Day…, on the other hand, simply provides a short section designed to help teaching assistants “[find their] grading curve” and seems to feel that getting into this debate is beyond its scope or, perhaps, graduate teaching assistants’ capabilities.

Cheating

Chapter 11 of Teaching Tips warned about students who are “performance oriented,” or working primarily for a grade, as opposed to students who seek learning for its own sake. I believe that most students have a healthy mix of the two orientations. McKeachie writes that students who tend to achieve the most in terms of learning have moderate grade motivation and high intrinsic motivation. Students who enjoy learning for its own sake, after all, probably tend to receive good grades. Such students would probably be disappointed on occasions when they don’t receive good grades, as well. I’ve always been someone who genuinely enjoys learning, but I also strive for and recognize the value of high grades. I don’t believe the two goals are mutually exclusive.

The issue of grade motivation is important because cheating, McKeachie says in chapter 10, is often committed because of students’ fixation on high grades at any cost. Statistics showing the prevalence of cheating are always a bit depressing, but it’s important that we be aware of how common cheating can be.

Next semester I will take over COMM 112 and teach it on my own. The class has around 130 students, so monitoring them during exams will not be easy. As such, I read McKeachie’s list of cheating methods with some alarm. Students now have so many more methods at their disposal, but the use of foot tapping and hand codes is particularly frightening because it can be almost impossible to detect. After all, plenty of students tap their feet or make other noises out of sheer anxiety. But in a multiple choice exam, such simple signals could be used to cheat effectively.

So far in COMM 112, we have used two different forms of the first exam in an effort to prevent cheating, but McKeachie cited research showing that scrambling the order of items alone did not reduce cheating. So for the next test, I’ll suggest we also scramble the responses.
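
Producing those forms is mechanical. A sketch of the idea (my own, not McKeachie’s): shuffle the order of the items and the order of each item’s responses, and keep track of where the correct answers land:

    import random

    def make_form(questions, seed):
        """Build one exam form by shuffling item order AND response order.
        `questions` is a list of (stem, choices, correct_choice) tuples;
        returns the shuffled form and its answer key."""
        rng = random.Random(seed)  # a fixed seed makes each form reproducible
        items = questions[:]
        rng.shuffle(items)  # scramble the items
        form, key = [], []
        for stem, choices, correct in items:
            options = choices[:]
            rng.shuffle(options)  # scramble the responses too
            form.append((stem, options))
            key.append("abcd"[options.index(correct)])
        return form, key

    questions = [
        ("Which naval battle of WWII was the turning point against Japan?",
         ["Coral Sea", "Midway", "Pearl Harbor", "Sea of Japan"], "Midway"),
        ("What is the value of 58 to the power of 3?",
         ["195,112", "11,316,496", "3,364", "7.61577311"], "195,112"),
    ]
    form_a, key_a = make_form(questions, seed=1)
    form_b, key_b = make_form(questions, seed=2)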

I like the idea of trying to prevent cheating before it starts. McKeachie outlined an “honor system,” wherein classes are invited to vote on whether they’d like to adopt such a system. He says few classes actually vote unanimously to adopt an honor system, but he believes the discussion of academic dishonesty is itself useful. I doubt I’ll try that, but having students sign a pledge of academic integrity prior to each exam seems a good idea. Teachers could place the statement on the exam itself, so that when students write their names on the exam they are also signing the pledge. The downside to this approach, I suppose, is that it takes away the sense that students are signing voluntarily. It’s sort of like having to agree to a computer application’s usage terms before it can be installed. But these kinds of approaches can be more effective, and at least feel less draconian, than the “Big Brother Is Watching You” style messages that appear in many of the syllabi I’ve seen at NDSU.

I thought McKeachie made very good suggestions for handling suspected cheating, as well. He gave an example of behavior—seeing a student glance around—that may or may not be cheating. His suggestion of quietly insisting that the student change seats if the wandering looks continue was a good blend of subtlety and effectiveness. However, I wish McKeachie had described what he would do if confronted with a clear indication of cheating, like finding a crib sheet or seeing students pass notes during an exam. Would he then also have sought to be as discreet?

In any event, I think I’d prefer the discreet route as much as possible. Students caught cheating will face plenty of severe consequences without having to be paraded in chains, as it were, before the whole class. On the other hand, it is important that the rest of the class know the teacher is paying attention and will take swift, firm action against cheaters. McKeachie rightly noted this, as well. The Chinese say that sometimes you have to kill a chicken to frighten the monkeys. Of course, this only works if the monkeys see the chicken get killed, or at least hear about it. Knowing that a teacher will punish students for cheating can be an effective deterrent, at least in that class.

Some links to sample public speaking rubrics can be found below:

http://www.tusculum.edu/research/documents/PublicSpeakingCompetencyRubric.pdf

http://www.awrsd.org/oak/Academics/Rubrics/Public%20Speaking%20Rubric.htm

http://www.oaklandcc.edu/assessment/geassessment/outcomes/geoutcome_communicate_effectively_speaking/Public%20Speaking%20Rubric%20May%202009.pdf
