(Note: This post is a follow-up to my previous post, Misplaced Priorities.)
When you teach a unit on, say, multiplication, what are you hoping your students will score on an end-of-unit assessment? If you’re like me, you’re probably hoping that most, if not all, of your students will score between 90% and 100%. Considering all the backward designing, the intentional lesson planning, and the re-teaching and support provided to students, it’s not unreasonable to expect that everyone should succeed on that final assessment, right?
So what message does it send to teachers and parents in Texas that STAAR has the following passing standards as an end-of-year assessment?
- 3rd grade – Students only need to score 50% to pass
- 4th grade – Students only need to score 50% to pass
- 5th grade – Students only need to score approximately 47% to pass
Wow! We’ve got really low expectations for Texas students! They can earn an F and still pass the test. How terrible!
Comments like this are what I often hear from teachers, parents, administrators, and other curriculum specialists. I used to believe the same thing and echo these sentiments myself, but not anymore.
Last year, our district’s Teaching & Learning department attended a provocative session hosted by Dr. Kevin Barlow, Executive Director of Research and Accountability in Arlington ISD. He challenged our assumptions about how we interpret passing standards and changed the way I analyze assessments, including STAAR.
The first thing he challenged is this grading scheme as the universal default in schools:
- A = 90% and up
- B = 80-89%
- C = 70-79%
- D = 60-69%
- F = Below 60%
The question he posed to us is, “Who decided 70% is passing? Where did that come from?” He admitted that he’s looked into it, and yet he hasn’t found any evidence for why 70% is the universal benchmark for passing in schools. According to Dr. Barlow, percentages are relative to a given situation and our desired outcome(s):
- Let’s say you’re evaluating an airline pilot. What percentage of flights would you expect the pilot to land safely to be considered a good pilot? Hopefully something in the high 90s like 99.99%!
- Let’s say you’re evaluating a baseball player. In what percentage of at-bats would you expect a batter to get a hit to be considered a great baseball player? According to current MLB batting stats, we’re looking at around 34%.
It’s all relative.
Let’s say you’re a 5th grade teacher and your goal, according to state standards, is to ensure your students can multiply up to a three-digit number by a two-digit number. And here’s the assessment you’ve been given for your students to take. How many questions on this assessment would you expect your students to answer correctly to meet the goal you have for them?
- 2 × 3
- 8 × 4
- 23 × 5
- 59 × 37
- 481 × 26
- 195 × 148
- 2,843 × 183
- 7,395 × 6,929
- 23,948 × 8,321
- 93,872 × 93,842
If my students could answer questions 1 through 5 correctly, I would say they’ve met the goal. They have demonstrated they can multiply up to a three-digit number by a two-digit number.
- 2 × 3 (Meets my goal)
- 8 × 4 (Meets my goal)
- 23 × 5 (Meets my goal)
- 59 × 37 (Meets my goal)
- 481 × 26 (Meets my goal)
- 195 × 148 (Beyond my goal)
- 2,843 × 183 (Beyond my goal)
- 7,395 × 6,929 (Beyond my goal)
- 23,948 × 8,321 (Beyond my goal)
- 93,872 × 93,842 (Beyond my goal)
Questions 6 through 10 might be possible for some of my students, but I wouldn’t want to require students to get those questions correct. As a result, my passing standard on this assessment is only 50%. Shouldn’t I think that’s terrible? Isn’t 70% the magic number for passing? But given the assessment, I’m perfectly happy with saying 50% is passing. Expecting an arbitrary 70% on this assessment would mean expecting students to demonstrate proficiency above grade level. That’s not fair to my students.
Some of you might be thinking, “I would never give my students this assessment. Questions 6 through 10 are a waste of time because they’re above grade level.” In that case, your assessment might look like this instead:
- 2 × 3
- 8 × 4
- 23 × 5
- 59 × 37
- 481 × 26
It hasn’t changed the expectation of what students have to do to demonstrate proficiency, and yet, to pass this assessment, I would expect students to earn a score of 100%, rather than 50%. Again, I would be unhappy with the arbitrary passing standard of 70%. That would mean it’s okay for students to miss questions that I think they should be able to answer. On this assessment, requiring a score of 100% makes sense because I would expect 5th graders to get all of these problems correct. If they don’t, then they aren’t meeting the goal I’ve set for them.
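To make the arithmetic behind these two passing standards concrete, here is a minimal sketch. The function name `passing_standard` is my own label for illustration, not a term from Dr. Barlow's session; it simply divides the number of questions aligned with the goal by the total number of questions.

```python
def passing_standard(items_within_goal: int, total_items: int) -> float:
    """Share of questions a student must answer correctly to show the goal was met."""
    return items_within_goal / total_items


# Same goal (questions 1 through 5), two different assessments.
ten_item = passing_standard(5, 10)   # questions 6-10 go beyond the goal
five_item = passing_standard(5, 5)   # every question is within the goal

print(f"10-item assessment: {ten_item:.0%}")   # 50%
print(f"5-item assessment: {five_item:.0%}")   # 100%
```

The expectation is identical in both cases. Only the denominator changes, which is why the percentage alone says little about how demanding the standard is.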
So why not just give the second assessment where students should all earn 100%? If that’s the expectation, then why bother with the extra questions?
This is exactly the issue going on with STAAR and its perceived low passing standard.
When you have an assessment where 100% of students can answer 100% of the questions correctly, all you learn is that everyone can get all the questions right. It masks the fact that some students actually know more than their peers. In terms of uncovering what our learners actually know, it’s just not very useful data.
More useful (and interesting) is an assessment where we can tell who knows more (or less) and by how much.
STAAR is designed to do this. The assessment is constructed in such a way that we can differentiate between learners to get a better sense of what they know relative to one another. In order to do this, however, it requires constructing an assessment similar to that 10-item multiplication assessment.
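Here is a rough sketch of why the longer assessment separates students while the shorter one doesn't. The three students and their numbers are hypothetical, and I'm assuming (as a simplification) that the questions get steadily harder, so each student answers every question up to a certain point and none after it.

```python
# Hypothetical students: the number is the hardest question (1-10) each can answer.
students = {"Ana": 5, "Ben": 7, "Cam": 10}


def score(hardest_solved: int, test_length: int) -> int:
    """Questions answered correctly on a test containing questions 1..test_length."""
    return min(hardest_solved, test_length)


five_item_scores = {name: score(level, 5) for name, level in students.items()}
ten_item_scores = {name: score(level, 10) for name, level in students.items()}

print(five_item_scores)  # everyone earns 5 out of 5
print(ten_item_scores)   # Ana 5, Ben 7, Cam 10
```

On the 5-item version all three students look identical. On the 10-item version all three still meet the goal, but now we can see who went beyond it and by how much.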
Just like how questions 1 through 5 on the multiplication assessment were aligned with the goal for multiplication in 5th grade, about half the questions on STAAR (16 or 17 questions, depending on the grade level) are aligned with Texas’ base level expectations of what students in 3rd, 4th, and 5th grade should be able to do. That half of the assessment is what we expect all of our students to answer correctly, just like we would expect all 5th graders to answer questions 1 through 5 correctly on the 10-item multiplication assessment.
So how do Texas students fare in reality? Here are the numbers of students at each grade level who answered at least half of the questions correctly on STAAR in spring 2018:
- Grade 3 – 77% passed with at least half the questions correct (299,275 students out of 386,467 total students)
- Grade 4 – 78% passed with at least half the questions correct (308,760 students out of 397,924 total students)
- Grade 5 – 84% passed with about half of the questions correct (337,891 students out of 400,664 total students)
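The percentages above can be checked directly from the student counts. This is just a quick sketch using the spring 2018 numbers listed in this post; the variable and function names are my own.

```python
# (students who passed, total students) from the spring 2018 figures above
staar_2018 = {
    "Grade 3": (299_275, 386_467),
    "Grade 4": (308_760, 397_924),
    "Grade 5": (337_891, 400_664),
}


def pass_rate(passed: int, total: int) -> float:
    """Fraction of tested students who met the passing standard."""
    return passed / total


for grade, (passed, total) in staar_2018.items():
    print(f"{grade}: {pass_rate(passed, total):.0%}")
# Grade 3: 77%
# Grade 4: 78%
# Grade 5: 84%
```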
Not bad! More than three quarters of the students at each grade level demonstrated that they can answer at least half of the questions correctly. These students are meeting, if not exceeding, the base level expectations of their respective grade levels. (Side note: Texas actually says students earning a 50% are Approaching grade level and a higher percentage is called Meets grade level. I’m not going to play with the semantics here. For all intents and purposes, earning a 50% means a student has passed regardless of what you want to call it.) But we’re left with some questions:
- How many of these roughly 300,000 students at each grade level performed just barely above the base level expectations?
- How deep is any given student’s understanding?
- How many of these students exhibited mastery of all the content assessed?
Good news! Because of how the assessment is designed, we have another set of 16 or 17 questions to help us differentiate further among the nearly 300,000 students at each grade level who passed. This other half of the questions on STAAR incrementally ramps up the difficulty beyond that base level of understanding. The more questions students get correct beyond that first half of the assessment, the better we’re able to distinguish not only who knows more but also by how much.
Since percents are relative and 70% is our culturally accepted passing standard, why isn’t the STAAR designed to use that passing standard instead? It would definitely remove the criticisms people have about how students in Texas pass with an F.
Here are two rough draft graphs I created to attempt to illustrate the issue. Both graphs represent the 3rd grade STAAR, which has a total of 32 questions. The top graph shows a hypothetical passing standard of 70%, and the bottom graph shows the actual passing standard of 50%.
The first graph represents a 3rd grade STAAR where 70% is designed to be the passing standard. This means 22 questions are needed to represent the base level of understanding (assuming this assessment also has a total of 32 items). Since we’re not changing the level of understanding required to pass, presumably 300,000 students would pass this version of the assessment as well. That leaves only 10 questions to help us differentiate among those 300,000 students who passed to see by how much they’ve exceeded the base level. That’s not a lot of wiggle room.
The second graph represents the current 3rd grade STAAR where 50% is designed to be the passing standard. This means 16 questions are needed to represent the base level of understanding, but now we have another 16 questions to help us differentiate among the 300,000 students who passed. Because there are a number of high performing students in our state, this still won’t let us differentiate completely, but there’s definitely more room for it with the 50% passing standard than the 70% passing standard.
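The comparison between the two graphs comes down to a little arithmetic on a 32-question assessment. In this sketch I'm assuming the number of base level questions is rounded down (70% of 32 is 22.4, which gives the 22 questions described above); `split_items` is a name I made up for illustration.

```python
def split_items(total_items: int, passing_standard: float) -> tuple[int, int]:
    """Return (base level questions, questions left over for differentiating)."""
    base = int(total_items * passing_standard)  # assumption: round down
    return base, total_items - base


for standard in (0.70, 0.50):
    base, beyond = split_items(32, standard)
    print(f"{standard:.0%} standard: {base} base level, {beyond} to differentiate")
# 70% standard: 22 base level, 10 to differentiate
# 50% standard: 16 base level, 16 to differentiate
```

Moving from a 70% standard to a 50% standard doesn't change what students need to know to pass. It shifts 6 questions from confirming the base level to differentiating above it.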
Some points I want to make clear at this point in the post:
- There are definitely issues with an assessment where half of it is by design more difficult than the expectations of the grade level. We have roughly a quarter of students in Texas who can’t even answer the base level questions correctly (half the assessment). Unfortunately, they’re subjected to the full assessment and the base level questions are interspersed throughout. There are a lot of issues around working memory, motivation, and identity that could be considered and discussed here. That’s not what I’m trying to do in this post, however. As I mentioned in my previous post, regardless of how I feel, this is the reality for our teachers and students. I want to understand that reality as best I can because I still have to live and work in it. I can simultaneously try to effect changes around it, but at the end of the day my job requires supporting the teachers and students in my district with STAAR as it is currently designed.
- STAAR is trying to provide a mechanism for differentiating among students…in general. However, having analyzed this data at the campus level (and thanks to the expertise of my colleague Dr. David Osman), it’s clear that STAAR is too difficult for some campuses and too easy for others. At those extremes, it’s not helping campuses differentiate very well because either too many students are getting the questions wrong or too many are getting them right.
- This post is specifically about how STAAR is designed. I can’t make any claims about assessments in other states. However, I hope this post might inspire you to dig more deeply into how your state assessment is constructed.
- I’m not trying to claim that every assessment should be designed this way. I’m sharing what I learned specifically about how STAAR is designed. Teachers and school districts have to make their own decisions about how they want to design their unit assessments and benchmark assessments based around their own goals.
In my next post I’m going to dive into the ways I’ve been analyzing assessment data differently this past year. I wanted to write this post first because the information I learned from Dr. Barlow has completely re-framed how I think about STAAR. I no longer believe that the 50% passing standard is a sign that we have low expectations for Texas students. Rather, STAAR is an assessment that is designed to tell us not only who has met grade level expectations, but also by how much many of our students have exceeded them. With that in mind, we can start to look at our data in interesting and productive ways.