Justin Baeder, Author at The Principal Center


All Posts by Justin Baeder

The Evidence-Driven Leadership Manifesto

K-12 leaders must abandon the delusion of “data-driven” decision-making, and instead embark on a serious evidence-driven overhaul of learning, teaching, and leadership. 

With learning in particular, we have replaced too much content with skills that aren’t really skills. We’ve tried to get students working on higher-order cognitive tasks without giving them the knowledge they need to do those tasks. 

As a result, we don’t really have much evidence of learning. And without evidence of learning, we have a limited ability to connect teacher practice to its impact on student learning. When we don’t know how our leadership is impacting teacher practice, we don’t have evidence of improvement. We have only data—and data can’t tell us very much. 

The Data Delusion

Beginning in earnest with the passage of No Child Left Behind, our profession embarked on a decades-long crusade to make K-12 education “data-driven,” a shift that had already been underway in other fields such as business and public health. 

To be sure, bringing in data—especially disaggregated data that helps us see beyond averages that mask inequities—was an overdue and helpful step. But we went too far in suggesting that data should actually drive decisions about policy and practice. Should data inform our decisions? Absolutely. But the idea that data should drive them is absurd.

Imagine making family decisions “driven” by data. Telling your spouse “We need to make a data-driven decision about which grandparents to visit for the holidays” is unworkable on its face, and it misses the point of the decision. Data might play a role—how much will plane tickets cost? How long a drive is it? How many days do we have off from school?—but letting data drive the decision would be wrong. 

And we’d never advise our own teen, “Honey, just make a spreadsheet to decide who to take to prom.” I’m sure we all know someone who did that, but it’s not a great way to capture what really matters. 

We know intuitively that we make informed judgments holistically, based on far more than mere data. Yet it’s a truth we seem to forget every time we advocate for data-driven decision-making in K-12 education. 

Knowing Where We’re Going: The Curriculum Gap

The idea of collecting data becomes even more ridiculous when we consider the yawning gap in many schools: a lack of a guaranteed and viable curriculum. When we aren’t clear on what teachers are supposed to be teaching—and what students are supposed to be learning—teaching is reduced to abstract skills that can supposedly be assessed by anyone with a clipboard. 

No less a figure than Robert Marzano has stated unequivocally: “The number one factor affecting student achievement is a guaranteed and viable curriculum” (What Works in Schools, 2003). Historically, having a guaranteed and viable curriculum has meant that educators within a school would generally have a shared understanding of what content students would be taught and expected to learn. 

For some reason, though, the idea of content has fallen out of fashion. We’ve started to view teaching as a skill that involves teaching skills to students, rather than a body of professional knowledge that involves teaching students a body of knowledge, and layering higher-order intellectual work on top of that foundation of knowledge. 

I attribute this fad to a popular misconception about Bloom’s Taxonomy (or if you prefer, Webb’s Depth of Knowledge): the idea that higher-order cognitive tasks are actually better, and shouldn’t be just the logical extension of more foundational tasks like knowing and comprehending, but should actually replace them. 

This misconception has spread like wildfire through the education profession because next-generation assessments—like those developed by the PARCC and SBAC consortia to help states assess learning according to the Common Core State Standards—require students to do precisely this type of higher-order intellectual work. 

There’s nothing wrong with requiring students to do higher-order thinking—after all, if high-stakes tests don’t require it, it’s likely to get swept aside in favor of whatever the tests do require (as we’ve seen with science and social studies, which have been de-emphasized in favor of math and reading). 

The problem is that we’re no longer clear about what knowledge we want students to do their higher-order thinking on—largely because the tests themselves aim to be content-neutral when assessing these higher-order skills. 

Starting in the 1950s, Benjamin Bloom convened several panels of experts to develop the first version of his eponymous taxonomy.

It’s no accident that Bloom’s model is often depicted as a pyramid, with the higher levels resting on the foundation provided by the lower levels. Each layer of the taxonomy provides the raw material for the cognitive operation performed at the next level. 

“Reading comprehension” is not a skill that can be exercised in the abstract, because one must have knowledge to comprehend; you can’t comprehend nothing. That’s why, as Daniel Willingham notes, reading tests are really “knowledge tests in disguise” (Wexler, 2019, The Knowledge Gap, p. 55).

The preference for “skills” over knowledge is explored in depth in one of the best books I’ve read this year, Natalie Wexler’s The Knowledge Gap: The Hidden Cause of America's Broken Education System—and How to Fix It. She explains:

[S]kipping the step of building knowledge doesn’t work. The ability to think critically—like the ability to understand what you read—can’t be taught directly and in the abstract. It’s inextricably linked to how much knowledge you have about the situation at hand. 

p. 39

Wexler argues that we’ve started to treat as “skills” things that are actually knowledge, and as a result, we’re teaching unproven “strategies”—in the name of building students’ skills—rather than actually teaching the content we want students to master. Wexler isn’t arguing for direct instruction, but rather the intentional teaching of specific content—using a variety of effective methods—rather than attempting to teach “skills” that aren’t really skills. 

For example, most educators over the age of 30 mastered the “skill” of reading comprehension by learning vocabulary and, well, reading increasingly sophisticated texts—with virtually no “skill-and-strategy” instruction like we see in today’s classrooms. Somehow, the idea that we should explicitly teach students words they’ll need to know has become unpalatable, even regressive in some circles. 

In a recent Facebook discussion, one administrator wondered in a principals’ group “Why are students still asked to write their spelling words 5 times each during seat work??” Dozens of replies poured in, criticizing this practice as archaic at best—if not outright malpractice. Clearly, learning the correct spelling of common words is a lower-level cognitive task, but it’s one that is absolutely foundational to literacy and success with higher-order tasks, like constructing a persuasive argument. 

This aversion to purposefully teaching students what we want them to know is driven by fads among educators, not actual research. Wexler writes:

[T]here’s no evidence at all behind most of the “skills” teachers spend time on. While teachers generally use the terms skills and strategies interchangeably, reading researchers define skills as the kinds of things that students practice in an effort to render them automatic: find the main idea of this passage, identify the supporting details, etc. But strategies are techniques that students will always use consciously, to make themselves aware of their own thinking and therefore better able to control it: asking questions about a text, pausing periodically to summarize what they’ve read, generally monitoring their comprehension.

Instruction in reading skills has been around since the 1950s, but—according to one reading expert—it’s useless, “like pushing the elevator button twice. It makes you feel better, perhaps, but the elevator doesn’t come any more quickly.” And even researchers who endorse strategy instruction don’t advocate putting it in the foreground, as most teachers and textbook publishers do. The focus should be on the content of the text.

pp. 56-57

Part of the problem may be that the Common Core State Standards in English Language Arts mainly emphasize skills, while remaining agnostic about the specific content used to teach those skills. This gives teachers flexibility in, say, which specific novels they use in 10th grade English, so it isn’t necessarily a flaw—unless we make the mistake of omitting content entirely, in favor of teaching content-free skills. 

(The Common Core Math Standards, in contrast, make no attempt to separate content from skills, and it’s obvious from reading the Standards that the vocabulary and concepts are inseparable from the skills.)

Yet separating content and skills is precisely what we’ve done in far too many schools—and not just in language arts. Seeking to mirror the standardized test items students will face at the end of the year, we’ve replaced a substantive, content-rich curriculum with out-of-context, skill-and-strategy exercises that contain virtually no content. We once derided these exercises as drill-and-kill test prep, yet somehow they’ve replaced actual content.

Even more perversely, teaching actual content has become unfashionable to the point that content itself has become the target of the “drill-and-kill” epithet.

As a result of these fads, many schools today simply lack a guaranteed and viable curriculum in most subjects, with the notable exception of math. 

Is Teaching A Skill?

For administrators, the view that students should be taught skills rather than content is paralleled by a growing belief that teaching is a set of “skills” that can be assessed through brief observations. 

This hypothesis was put to the test by the Gates-funded Measures of Effective Teaching project, which spent $45 million over a period of three years recording some 20,000 lessons in approximately 3,000 classrooms. Nice-looking reports and upbeat press releases have been written to mask the glaring fact that the project was an abject failure—we are no closer to being able to conduct valid, stable assessments of teacher skill than before. 

Why did MET fail to yield great insights about teaching? Because it misconstrued teaching as a set of abstract skills rather than a body of professional practice that produces context-specific accomplishments. Every principal knows that there’s an integral relationship between the teacher, the students, and the content that “data” (such as state test scores) fail to capture. 

We cannot “measure” teaching as an abstract skill, because it’s not an abstract skill. Teachers always teach specific content to specific students—and the specifics are everything. Yes, there are “best practices,” but best practices must be used on specific content, with specific students—just as reading comprehension strategies must be used on a specific text, using one’s knowledge of vocabulary, along with other background knowledge about the subject matter. 

Teaching is not an abstract skill in the sense that, say, the high dive is a skill. It can’t be rated with a single score the way a high dive can. Involving more “judges” doesn’t improve the quality of any such ratings we might want to create. 

A given teacher’s teaching doesn’t always look the same from one day to the next, or from one class to the next, and it can’t be assessed as if there existed a “platonic ideal” of a lesson. 

To understand the root of the “guaranteed and viable curriculum” problem as well as the teacher appraisal problem, we don’t have to dig very far—Bloom’s Taxonomy provides a robust explanation. 

Bloom’s Taxonomy and the “Data-Driven Decision-Making” Problem

Neither Bloom’s Taxonomy nor Wexler’s Knowledge Gap focuses specifically on teacher evaluation, but the parallels are clear. Principals who regularly spend time in classrooms, building rich, firsthand knowledge of teacher practice, are in a far better position to do the higher-order instructional leadership work that follows. Knowledge—the foundation of the pyramid—that has been comprehended can then be applied to different situations, and principals who repeatedly discuss and analyze instruction in post-conferences with teachers will be far more prepared to make sound evaluation decisions at the end of the year. 

On the other hand, it’s impossible to fairly analyze and evaluate a teacher’s practice based on just one or two observations or video clips, because such a limited foundation of knowledge affords observers very little opportunity to truly comprehend a teacher’s practice.

Using Bloom’s Taxonomy to understand the failure of the MET project is straightforward, because the resulting diagram is decidedly non-pyramidal: an enormous amount of effort went into the analysis, synthesis, and evaluation of a very small amount of knowledge of teacher practice, with very few efforts to comprehend or apply insights about the specific instructional situation of each filmed lesson. It’s more of a mushroom than a pyramid. 

By treating teaching as an abstract skill that can be filmed and evaluated—apart from even the most basic awareness of the purpose of the lesson, its place within the broader curriculum, students’ prior knowledge and formative assessment results, and their unique learning needs—the MET project perpetuated the myth that education can be “data-driven.”

It’s time to call an end to the “data-driven” delusion. It’s time to take seriously our duty to ground professional practice in evidence, not just data. It’s time to ensure that all students have equitable access to a guaranteed and viable curriculum. It’s time to treat student learning and teacher practice as the primary forms of evidence about whether a school is improving—and reduce standardized tests to their proper role as merely a data-provider, and not a “driver” of education. 

As leaders, we need clear, shared expectations for student learning and teacher practice. We need direct, firsthand evidence. Only then can we make the right decisions on behalf of students. 

Jimmy Casas, Jeffrey Zoul & Todd Whitaker—10 Perspectives on Innovation in Education

Interview Notes, Resources, & Links

About The Authors

Jimmy Casas is an educator, bestselling author, and speaker with 22 years of school leadership experience. 

Jeff Zoul is a lifelong teacher, leader, and learner. After many years of public school service, Jeff now focuses on writing, speaking, consulting, and organizing What Great Educators Do Differently events.

Todd Whitaker is a professor of educational leadership at the University of Missouri. He is a leading presenter in the field of education and is the bestselling author of more than 50 books.

Rubrics as Growth Pathways for Instructional Practice

Who's the best person to decide what instructional practices to use in a lesson?

Obviously, the teacher who planned the lesson, and who is responsible for teaching it and ensuring that students learn what they're supposed to learn. 

Yet too often, we second-guess our teachers. 

We do it to be helpful—to provide feedback to help teachers grow—but I'd suggest it's often not the best way to help teachers grow.

Over the past couple of days, I've been arguing that we're facing a crisis of credibility in our profession.

Too often, we adopt reductive definitions of teacher practice, because so much of teacher practice can't be seen in a brief observation. 

It's either beneath the surface—the invisible thinking and decision-making that teachers do—or it takes place over too long a span of time. 

We've been calling these two issues “visibility” and “zoom.”

Sometimes, when we second-guess teachers, we tell them they should have used other practices:

“Did you think about doing a jigsaw?”

“Did you think about using small groups for that part of the lesson?”

And hey, this can be helpful. Every day, administrators are giving teachers thousands of good ideas.

But sometimes we're making these suggestions without a clear sense of the teacher's instructional purpose.

The practices must match the purpose, and a quick visit may not give us enough information to make truly useful suggestions.

The remedy to most of this is simply to have a conversation with the teacher—to treat feedback as a two-way street rather than a one-way transfer of ideas from leader to teacher. 

But we shouldn't enter into these conversations alone. 

There aren't just two parties involved when a leader speaks with a teacher.

The third party in every conversation should be the instructional framework—the set of shared expectations for practice. 

Why? 

Because a framework serves as an objective standard—an arbiter. 

It turns a conversation from a clash of opinions into a process of triangulation.

A more formal definition:

An instructional framework is a set of shared expectations serving as the basis for conversations about professional practice.

The best frameworks aren't just descriptions—they're leveled descriptions…

Or what we typically call rubrics. 

When you have a rubric, you have a growth pathway.

When teachers can see where their practice currently is—on a rubric, based on evidence—they can get a clear next step.

How?

By simply looking at the next level in the rubric.

If you're at a 3, look at level 4. 

If you're at a 1, look at level 2.
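To make the growth-pathway idea concrete, here's a minimal sketch in Python. It's purely illustrative: the component name and level descriptors are hypothetical, not drawn from any published rubric. The point is simply that once a rubric is leveled, the next level's descriptor is the next step.

```python
# Illustrative sketch only: a leveled rubric treated as a growth pathway.
# The component name and level descriptors below are hypothetical examples,
# not taken from any published framework.

rubric = {
    "Checking for understanding": {
        1: "Rarely checks whether students are following the lesson.",
        2: "Checks for understanding at the end of the lesson.",
        3: "Checks for understanding at key points and adjusts pacing.",
        4: "Uses ongoing checks to adjust instruction for individuals and groups.",
    }
}

def next_step(component: str, current_level: int) -> str:
    """Return the descriptor for the next level up, i.e. the growth pathway."""
    levels = rubric[component]
    if current_level >= max(levels):
        return "Already at the top level; sustain and share the practice."
    return levels[current_level + 1]

# A teacher whose evidence places them at level 3 sees the level-4
# descriptor as their next step.
print(next_step("Checking for understanding", 3))
```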

Now, we usually have rubrics for our evaluation criteria.

But what about the instructional practices that teachers are using every day?

Do we have leveled rubrics describing those practices?

Often, we don't bother creating them, because they're so specific to each subject and grade. 

They don't apply to all teachers in all departments, and we prefer to focus on things that we can use with our entire staff. 

So we miss out on one of the highest-leverage opportunities we have in our profession:

The opportunity to create clear descriptions of instructional practice, with subject-specific details that provide every teacher with pathways for growth. 

We can do it. In fact, teachers can do it mostly on their own, with just a bit of guidance. 

So let me ask you: 

What areas of instructional practice could your teachers focus on?

Where would it be helpful to have them develop leveled rubrics?

I'm sure it's specific to your school, and you wouldn't want to just download a rubric from the internet. You'd want teachers to have ownership. 

So what would it be?

Visibility & Zoom: The Evidence of Practice Grid

Is teacher practice always something we can actually see in an observation?

Sometimes, the answer is clearly yes. But as I've argued over the past few emails, it's not always so simple. 

I thought it might be helpful to plot this visually, along two axes: visibility and zoom. Let's call this the Evidence of Practice Grid.

If a teaching practice falls in the top-left quadrant, it's probably something you can directly observe, in the moment. 

There's still an “observer effect”—teachers can easily put on a song and dance to show you what you want to see—but at least the practice itself is fundamentally see-able.

If it's in the top-right quadrant, a practice may be visible, but not on the time scale of a typical classroom visit. It might take weeks or months for the practice to play out—for example, building relationships with students. 

The bottom two quadrants include what Charlotte Danielson calls the “cognitive” work of teaching—the thinking and decision-making that depend on teachers' professional judgment. 

These “beneath the surface” aspects of practice are huge, but we can't observe them directly. We must talk with teachers to get at them. 

So, for any given practice, we can figure out how visible it is, and how long it takes to play out, using this grid. 

That's the Evidence of Practice Grid.

The horizontal axis in our diagram is zoom—the “grain size” or time scale of the practice.

The vertical axis in our diagram is visibility—how directly observable the practice is.
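For readers who like to see the classification spelled out, here's a rough sketch in Python of how the two axes sort practices into quadrants. The example practices and their placements are just illustrative guesses, not a definitive taxonomy.

```python
# Illustrative sketch of the Evidence of Practice Grid described above.
# Each practice is classified by visibility (directly observable or not) and
# zoom (plays out within a single visit vs. over weeks or months).
# The example practices and their placements are hypothetical assumptions.

from dataclasses import dataclass

@dataclass
class Practice:
    name: str
    visible: bool      # can it be seen directly during a classroom visit?
    tight_zoom: bool   # does it play out within a single visit?

    def quadrant(self) -> str:
        if self.visible and self.tight_zoom:
            return "Top-left: directly observable in the moment"
        if self.visible and not self.tight_zoom:
            return "Top-right: visible, but only over weeks or months"
        if not self.visible and self.tight_zoom:
            return "Bottom-left: in-the-moment thinking; ask the teacher"
        return "Bottom-right: long-range judgment; ask the teacher"

practices = [
    Practice("Using wait time after questions", visible=True, tight_zoom=True),
    Practice("Building relationships with students", visible=True, tight_zoom=False),
    Practice("Deciding when to reteach a concept", visible=False, tight_zoom=True),
    Practice("Sequencing units across the year", visible=False, tight_zoom=False),
]

for p in practices:
    print(f"{p.name}: {p.quadrant()}")
```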

So how can this grid be useful?

If you're focusing on an area of practice that's on the bottom or to the right, the grid can help you realize that it's something that's hard to directly observe. 

With this knowledge, you can stop yourself and say “Wait…did I actually see conclusive evidence for this practice, or just one brief moment that may or may not be part of a pattern?”

Conversely, when you know you're looking at a tight-zoom, highly visible practice, you don't have to shy away from giving immediate feedback. 

And in all cases, if you want to know more than observation alone can tell you…

You can ask. You can get the teacher talking. 

Conversation makes the invisible visible—and therefore, useful for growth and evaluation. 

Hope this is helpful!

As you gather evidence of teacher practice, and use it to provide feedback or make evaluation decisions…

Make sure you're aware of the zoom level and visibility of the practice you're focusing on.

Make sense?

Give it a try now:

Plot a given practice on this grid—mentally—and think about its visibility and zoom level. 

Where does it fall?

What comes up when you try to observe for or give feedback on this area of teacher practice?

Instructional Purpose: The Right Practice for the Right Circumstances

When should teachers use any given instructional practice?

If we're going to give feedback about teachers' instructional practices, it's worth asking:

When is it appropriate to use a given practice? Under what circumstances?

I'm using “practice” to mean professional practice—as in, exercising professional judgment and skill—as well as to mean teaching technique.

So some practices are in use all the time—for example, monitoring student comprehension as you teach, or maintaining a learning-focused classroom environment. 

Other practices are more specific to a particular instructional purpose.

For example, if a teacher is trying to help students think critically about a historical event, she might use higher-order questioning techniques, with plenty of wait time. 

If a teacher is trying to review factual information to prepare students for a test, he might pepper them with lower-level questions, with less wait time. 

If we're going to use instructional practices for the right instructional purposes, we have to be OK with not seeing them on command.

If we insist on seeing the practices we want to see, when we want to see them…we'll get what we want.

But it won't be what we really want. It'll be what I call hoop-jumping.

Have you ever seen a dog jumping through a hoop?

My 5-year-old saw one at a high school talent show the other day, and it blew her mind. 

The human holds up the hoop, and the dog knows what to do.

Cute, but a terrible metaphor for instructional leadership, right?

Teachers aren't trained animals doing tricks. 

Yet too often, we treat them that way.

“Hey everyone, this week I'm going to be visiting classrooms and giving feedback on rigor. I'll be looking for higher-order questions, which—as we learned in our last PD session—are more rigorous.”

We show up, ready to “inspect what we expect.”

Only, if we haven't thought deeply enough about what it is that we expect, or whether it's appropriate for that moment and the teacher's instructional purpose, or whether it's even observable…

Teaching is reduced to jumping through a hoop.

Dutifully, most teachers will do it. 

We'll show up, and teachers will see the hoop.

They'll know they need to ask some higher-order questions while we're in the room, because that's how we've (reductively) defined rigor. 

They know what we're hoping to see, so they'll use our pet strategy (see what I did there?). 

We'll have something to write down and give feedback on, and we'll go away happy—satisfied that we've instructional-leaded* for the day. 

Yet in reality, we've made things worse. 

We've wasted teachers' time playing a dumb game—a game in which we pretend to give feedback, and teachers pretend to value it, and we all pretend it's beneficial for student learning. 

*And no, “instructional-leaded” is not a grammatically correct term. I really hope it doesn't catch on.

But when I see dumb practices masquerading as instructional leadership, I feel compelled to give them a conspicuously dumb label. I'm not grumpy—I'm just passionate about this 🙂 

All of this foolishness is avoidable, if we're willing to think a little harder. 

Last week, I shared some thoughts on observability bias—the idea that instructional leaders tend to oversimplify what teachers are really doing, in order to make it easier to observe and document. 

We adopt reductive definitions of teacher practice in order to make our lives easier, even if it means giving bad feedback, like “You shouldn't ask so many lower-level questions, because higher-order questions are more rigorous.”

So far, we've identified a couple of different factors to consider when observing a practice in the classroom:

1. Zoom—is it something you can observe in a moment, or does it play out over days, weeks, or the entire year?

2. Visibility—is it an observable behavior you can see, or is it really invisible thinking and decision-making?

Together, these two factors form what we're calling the Evidence of Practice Grid.

And now we can add a third factor:

3. Instructional Purpose—under what circumstances is the practice relevant?

If we ignore instructional purpose, and just expect teachers to use a practice every time we visit because we value it, we'll see “hoop-jumping” behavior.

We'll walk into a classroom and immediately see the practice we're focusing on—not because it fits the instructional purpose, but because teachers know we want to see it. 

So if you're seeing this kind of behavior, it's worth asking yourself—when should teachers be using this practice, under what circumstances, and what would be good evidence*** that they're using it appropriately and well?

***P.S. And if you're thinking “Well, I'd really have to talk with the teacher to know” then I think we're on the same page 🙂

Lecturing from the Back of the Room: The Data Conspiracy

Earlier this week, I asked for examples of oversimplified expectations—when administrators reduce teaching to whatever is easiest to observe and document…

…even if that means lower-quality instruction for students…

…and downright absurd expectations for teachers. 

And wow, did people deliver. My favorite example so far:

The main push this year is “where is the teacher standing?” (with the implication that “at the front” = bad).

teachers now lecture from the back of the room (with the projection up front), which is resulting in a diminished learning environment for the students, even while earning more “points” for the teacher from the roaming administrators.

Students have even complained that they have to turn around to even listen well…

…the teachers miss out on many interactions with the students because they can't see the students' faces and reactions to the (poor) lectures.

You can't make this stuff up!

But here's the kicker: at least this school is trying!

The administrators are getting into classrooms, and emphasizing something they think will be better for students. 

That's more than most schools are doing! But we can do better.

Having clear expectations is great.

Getting into classrooms to support those expectations is great. 

Giving teachers feedback on how they're doing relative to shared expectations is great. 

But the “how” matters. It matters enormously. 

So why are schools taking such a reductive, dumbed-down approach to shared expectations? 

I have a one-word answer and explanation: data.

I blame the desire for data. 

To collect data, you MUST define whatever you're measuring reductively. 

If your goal is to have a rich, nuanced conversation, you don't have to resort to crude oversimplifications.

If you talk with teachers in depth about lecturing less and getting around the classroom more as you teach, the possibilities are endless.

But if your goal is to fill out a form or a spreadsheet—well, then you have to be reductive.

In order to produce a check mark or score from the complex realities of teaching and learning…oversimplifying is the only option. 

So here's my question—and I'd love to have your thoughts on this:

What if we stopped trying to collect data?

What if we said, as a profession, that it's not our job as instructional leaders to collect data?

As a principal and teacher in Seattle Public Schools, I interacted with many university-trained researchers who visited schools to collect data. 

I myself was trained in both qualitative and quantitative research methods as part of my PhD program as well as earlier graduate training. 

I knew how to collect data about classroom practice…

But as a principal, I realized that I was the worst person in the world to actually do this data collection in my school.

Why? Because of what scholars have identified as one of the biggest threats to quality data collection:

Observer effects.

When the principal shows up, teachers behave differently.

When teachers know what the observer wants to see, the song-and-dance commences. 

You want to see students talking with each other? OK, I'll have them “turn and talk” every time you walk into the room, Justin. Write that down on your little clipboard.

You don't want me to lecture from the Smartboard all day? OK, I'll stand at the back, and lecture from there, Colleague.

The late, great Rick DuFour—godfather of Professional Learning Communities—used to tell the story of how he'd prepare his students for formal observations when he was a teacher.

I'm paraphrasing, but it went something like this:

OK, kids—the principal is coming for my observation today, so whenever I ask a question, you all have to raise your hands.

If you know the answer, raise your right hand. If you don't know the answer, raise your left hand, and I won't call on you.

The principal needed “data” on whether students were engaged and understanding the lesson…so the teacher and students obliged with their song-and-dance routine.

Across our profession, in tens of thousands of schools, we're engaged in a conspiracy to manufacture data about classroom practice.

It's not a sinister conspiracy. No one is trying to do anything bad. 

We're all behaving rationally and ethically:

—We've been told we need data about teacher practice
—We have a limited number of chances to collect that data from classroom visits
—Teachers know they'll be judged by the data we collect

So they show us what we want to see…

…even if it results in absurd practices like lecturing from the back of the room. 

So here's my suggestion: let's stop collecting data from classroom visits.

We already get plenty of quantitative data from assessments, surveys, and other administrative sources. 

We already have enough hats to wear as instructional leaders. We don't need to be clipboard-toting researchers on top of everything else. 

Instead, let's focus on understanding what's happening in classrooms. 

Let's gather evidence in the form of rich, descriptive notes, not oversimplified marks on a form.

Let's talk with teachers about what they're doing, and why, and how it's working. 

Let's stop trying to reduce it all to a score or a check mark. 

The Observability Bias: A Crisis in Instructional Leadership

Our profession is facing a crisis of credibility:

We often don't know good practice when we see it.

Two observers can see the same lesson, and draw very different conclusions. Yet we mischaracterize the nature of the problem.

We think this is a problem of inter-rater reliability. We define it as a calibration issue.

But it's not.

Calibration training—getting administrators to rate the same video clip the same way—won't fix this problem. The crisis runs deeper, because it's a fundamental misunderstanding of the nature of teaching. 

See, we have an “observability bias” crisis in our profession.

I don't mean that observers are biased. I mean that we've warped our understanding of teacher practice, so that we pay a great deal of attention to those aspects of teaching that are easily observed and assessed…

…while undervaluing and overlooking the harder-to-observe aspects of teacher practice, like exercising professional judgment. 

We pay a great deal of attention to surface-level features of teaching, like whether the objective is written on the board… Yet we don't even bother to ask deeper questions, like “How is this lesson based on what the teacher discovered from students' work yesterday?”

The Danielson Framework is easily the best rubric for understanding teacher practice, because it avoids this bias toward the observable, and doesn't shy away from prioritizing hard-to-observe aspects of practice. 

Charlotte Danielson writes:

“Teaching entails expertise; like other professions, professionalism in teaching requires complex decision making in conditions of uncertainty.

If one acknowledges, as one must, the cognitive nature of teaching, then conversations about teaching must be about the cognition.”

Talk About Teaching, pp. 6-7, emphasis in original

When we forget that teaching is, fundamentally, cognition—not a song and dance at the front of the room—we can distort teaching by emphasizing the wrong “look-fors” in our instructional leadership work. 

It's exceptionally easy to see this problem in the case of questioning strategies, vis-à-vis Bloom's Taxonomy and Webb's Depth of Knowledge (DoK). 

I like Bloom's Taxonomy and DoK. They're great ways to think about the variety of questions we're asking, and to make sure we're asking students to do the right type of intellectual work given our instructional purpose. 

But the pervasive bias toward the easily observable has resulted in what we might call “Rigor for Dummies.”

Rigor for Dummies works like this:

If you're asking higher-order questions, you're providing rigorous instruction.
If you're asking factual recall or other lower-level questions, that's not rigorous. 


Now, to some readers, this will sound too stupid to be true, but I promise, this is what administrators are telling teachers.

Observability bias at work. It's happening every day, all around the US: Administrators are giving teachers feedback that they need to make their questioning more “rigorous” by asking more higher-order questions, and avoiding DoK-1 questions. 

Never mind that neither Bloom nor Webb ever said we should avoid factual-level questions. Never mind that no rigor expert believes factual knowledge is unimportant. 

We want rigor, so we ask ourselves “What does rigor look like?” Then, we come up with the most reductive, oversimplified definition of rigor, so we can assess it without ever talking to the teacher. 

My friend, this will never work. 

We simply cannot understand a teacher's practice without talking with the teacher. Observation alone can't give us true insight into teacher practice.

Why?

Back to Danielson: Because teaching is cognitive work.

It's not just behavior.

It can't be reduced to “look-fors” that you can assess in a drive-by observation and check off on a feedback form. 

The Danielson Framework gives us another great example.

Domain 1, Component C, is “Setting Instructional Outcomes.”

(This is a teacher evaluation criterion for at least 40% of teachers in the US.)

How well a teacher sets instructional outcomes is fairly hard to assess based on a single direct observation. 

Danielson describes “Proficient” practice in this area as follows:

“Most outcomes represent rigorous and important learning in the discipline and are clear, are written in the form of student learning, and suggest viable methods of assessment. Outcomes reflect several different types of learning and opportunities for coordination, and they are differentiated, in whatever way is needed, for different groups of students.” (Danielson, Framework for Teaching, 2013)

Is that a great definition? Yes!

But it's hard to observe, so we reduce it to something that's easier to document. We reduce it to “Is the learning target written on the board?”

(And if we're really serious, we might also ask that the teacher cite the standards the lesson addresses, and word the objective in student-friendly “I can…” or “We will…” language.)

Don't get me wrong—clarity is great. Letting teachers know exactly what good practice looks like is incredibly helpful—especially if they're struggling.

And for solid teachers to move from good to great, they need a clearly defined growth pathway, describing the next level of excellence.

But let's not be reductive. Let's not squeeze out all the critical cognitive aspects of teaching, just because they're harder for us to observe. 

Let's embrace the fact that teaching is complex intellectual work.

Let's accept the reality that to give teachers useful feedback, we can't just observe and fill out a form.

We must have a conversation. We must listen. We must inquire about teachers' invisible thinking, not just their observable behavior.  

What do you think?

Are you seeing the same reductive “observability bias” at work in instructional leadership practice?

In what areas of teacher practice? Leave a comment and let me know.

Peter DeWitt—Coach It Further: Using the Art of Coaching to Improve School Leadership

Interview Notes, Resources, & Links

About Dr. Peter DeWitt

Dr. Peter DeWitt is an education consultant focusing on collaborative leadership and fostering inclusive school climates. Within North America, his work has been adopted at the university and state level, and he works with numerous districts, school boards, and regional and state organizations, where he trains leadership teams and coaches building leaders. He's the author of five books, including his new book Coach It Further.
