by Justin Baeder, PhD

Evaluating teachers through observation has always been tricky. Teaching is hard to observe because it's long-term, cognitive work.
And yet all of the alternatives are worse:
- Student test scores don't produce reliable teacher ratings—they're very volatile and reflect nonrandom student assignment
- Student and peer ratings become unreliable once they're high-stakes (though they're usually not far off when the stakes are low)
- Evaluating lesson plans is no good: there's no reason for teachers to write individual lesson plans, and the plan may not reflect the lesson as taught
So observation is still our best bet for collecting evidence…but we still get it wrong. We think about it wrong.
We treat it like a discrete, observable performance.
Consider a “judged” athletic event like figure skating. Judging of this type is only possible when there are established criteria.
Each Olympic sport has detailed scoring criteria, of course. And we have them in K-12 education, too. Or at least, we think we do.
I don't think this gets enough attention, but most teacher evaluation criteria are written to address teachers' overall responsibilities.
An evaluation rubric is NOT an observation rubric.
The Danielson Framework was not originally designed as a lesson observation rubric.
(Indeed, 2 of the 4 domains specifically address outside-of-class-time factors.)
In fact, none of the popular teacher evaluation rubrics are fully scorable based on observation alone.
So what would a real observation rubric focus on?
NOT on some “ideal” lesson. Total dead end.
We get this wrong, usually based on the subjects we ourselves taught:
- Former math teachers think every lesson should look like a math lesson
- Old science teachers like me think every lesson should be like a lab
- Nobody thinks about how band, yearbook, and PE are different
Frameworks like Danielson's are popular because they cover all K-12 subjects pretty well.
But they do it by being broad.
They aren't specific to what a 3rd grade teacher is doing on day 2 of an ELA unit…
…or how a high school geometry teacher is reviewing for a test.
They can't be that specific.
And no district wants to adopt 72 different teacher observation rubrics.
So let me suggest a different approach.
Instructional Purpose, Not Domain of Responsibility
What if we focused on instructional purpose rather than domain of responsibility?
Given whatever the teacher is trying to do, how well are they doing it?
What does it look like to accomplish this specific purpose well?
This lets us treat very different instructional tasks fairly.
If you're passing back papers and going over test answers, that's probably a good use of class time for learning, but evaluation rubrics say nothing about it.
If you're doing FASE reading, a rubric specific to that strategy is going to be a lot more helpful than an “Instruction” rubric with generic criteria.
One way to think about instructional purpose is by timeframe:
- Where are students in the unit and lesson? Kicking off a new unit is very different from the middle of a unit or the home stretch.
- Where are students in this assignment? Are they learning from worked example problems? Are they revising an earlier draft of an essay? It matters.
- What portion of the class period or instructional block are we seeing? Routines determine a lot of what happens at any given time.
For example, at Wahitis Elementary School in Othello, Washington, teachers begin every math lesson with an entry task to review prior learning. Lessons also have fact fluency, problem-solving, and conceptual development components. See: Principal Justin Johnson on Tier 3 Systems for Learning on The Eduleadership Show.
Each purpose lends itself to different observation criteria, which will be far more specific than overall evaluation criteria.
What instructional purposes should we make rubrics for? Leave a comment and let me know what you think.
