After a brief interlude, it’s time to get back to the blog series I started recently about analyzing assessments.
- In the first post, I shared the importance of digging into the questions, not just the standards they’re correlated to.
- In the second post, I talked about how understanding how a test is designed can help us better understand the results we get.
- In the third post, I shared how I learned to organize assessment data by item difficulty and the implications for supporting our students.
- In this post, I’d like to talk about another way to look at assessment data to uncover areas of celebration and areas of exploration.
Let’s get started!
In my previous post I shared the order of questions based on item difficulty for the 2018 5th grade STAAR for the entire state of Texas. Here it is again:
According to this ordering, question 9 was the most difficult item on the test, followed by question 18, question 8, and so on down to question 10 as the least difficult item (tied with questions 2 and 4).
Here’s my question: What is the likelihood that any given campus across the state would have the exact same order if they analyzed the item difficulty just for their students?
Hopefully you’re like me and you’re thinking, “Not very likely.” Let’s check to see. Here’s the item difficulty of the state of Texas compared to the item difficulty at just one campus with about 80 students. What do you notice? What do you wonder?
Some of my noticings:
- Questions 8, 9, 18, and 21 were some of the most difficult items for both the state and for this particular campus.
- Question 5 was not particularly difficult for the state of Texas as a whole (it’s about midway down the list), but it was surprisingly difficult for this particular campus.
- Question 22 was one of the most difficult items for the state of Texas as a whole, but it was not particularly difficult for this campus (it’s almost halfway down the list).
- Questions 1, 2, 10, 25, and 36 were some of the least difficult items for both the state and for this particular campus.
- Question 4 was tied with questions 2 and 10 for being the least difficult item for the state, but for this particular campus it didn’t crack the top 5 list of least difficult items.
- There were more questions tied for being the most difficult items for the state and more questions tied for being the least difficult items for this particular campus.
What is difficult for the state as a whole might not be difficult for the students at a particular school. Likewise, what is not very difficult for the state as a whole might have been more difficult than expected for the students at a particular school.
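If you'd like to build ranked lists like these yourself, here is a minimal Python sketch. It assumes item difficulty is measured by the percent of students who answered each question correctly, and that tied questions share a rank; the post doesn't specify how ties were handled, so that rule is a guess. The function name `rank_by_difficulty` and the percentages are made up for illustration and are not real STAAR results.

```python
def rank_by_difficulty(percent_correct):
    """Return {question: rank}, where rank 1 is the most difficult item.

    Questions with the same percent correct share a rank. This tie rule
    is an assumption, not something taken from the STAAR data.
    """
    # A lower percent correct means a more difficult item, so sort ascending.
    ordered = sorted(percent_correct.items(), key=lambda kv: kv[1])
    ranks = {}
    for position, (question, pct) in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        if previous is not None and pct == previous[1]:
            # Same score as the previous item: reuse its rank.
            ranks[question] = ranks[previous[0]]
        else:
            ranks[question] = position
    return ranks


# Made-up percentages for illustration only.
example_pct = {"Q1": 85, "Q2": 40, "Q3": 62, "Q4": 62}
example_ranks = rank_by_difficulty(example_pct)
print(example_ranks)
```

Running the same function once on state results and once on campus results would give the two lists being compared here.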
But is there an easier way to identify these differences than looking at an item on one list and then hunting it down on the second list? There is!
This image shows the item difficulty rank for each question for Texas and for the campus. The final column shows the difference between these rankings.
Just in case you’re having trouble making sense of it, let’s just look at question 9.
As you can see, this was the number 1 most difficult item for the state of Texas, but it was number 3 on the same list for this campus. As a result, the rank difference is 2 because this question was 2 questions less difficult for the campus. However, that’s a pretty small difference, which I interpret to mean that this question was generally about as difficult for this campus as it was for the state as a whole. What I’m curious about and interested in finding are the notable differences.
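The arithmetic behind that final column can be sketched in a couple of lines of Python. The function name `rank_difference` is hypothetical; the sign convention (campus rank minus state rank) is inferred from the question 9 example, where a state rank of 1 and a campus rank of 3 give a difference of 2.

```python
def rank_difference(state_rank, campus_rank):
    """Campus rank minus state rank, where rank 1 is the most difficult item.

    Positive: the item was relatively less difficult for the campus.
    Negative: the item was relatively more difficult for the campus.
    """
    return campus_rank - state_rank


# Question 9: number 1 for Texas, number 3 for the campus.
q9_difference = rank_difference(state_rank=1, campus_rank=3)
print(q9_difference)  # 2
```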
Let’s look at another example, question 5.
This is interesting! This question was number 18 in the item difficulty ranking for Texas, where 1 is the most difficult and 36 is the least difficult. However, this same question was number 5 in the list of questions for the campus. The rank difference is -13 because this question was 13 questions more difficult for the campus. That’s a huge difference! I call questions like this areas of exploration. These questions are worth exploring because they buck the trend. If instruction at the campus were like the rest of Texas, this question should have been just as difficult for the campus as for the rest of the state…but it wasn’t. That’s a big red flag, and it makes me want to start digging to uncover why this question was so much more difficult. There are lots of reasons this could be the case, such as:
- It includes a model the teachers never introduced their students to.
- Teacher(s) at the campus didn’t know how to teach this particular concept well.
- The question included terminology the students hadn’t been exposed to.
- Teacher(s) at the campus skipped this content for one reason or another, or they quickly glossed over it.
In case you’re curious, here’s question 5 so you can see for yourself. Since you weren’t at the school that got this data, your guesses are even more hypothetical than theirs, but it is interesting to wonder.
Let me be clear. Exploring this question isn’t about placing blame. It’s about uncovering, learning what can be learned, and making a plan for future instruction so students at this campus hopefully don’t find questions like this so difficult in the future.
Let’s look at one more question from the rank order list, question 22.
This is sort of the reverse of the previous question. Question 22 was much more difficult for the state as a whole than it was for this campus. So much so that it was 7 questions less difficult for this campus than it was for the state. Whereas question 5 is an area of exploration, I consider question 22 an area of celebration! Something going on at that campus made it so that this particular question was a lot less difficult for the students there.
- Maybe the teachers taught that unit really well and student understanding was solid.
- Maybe the students had encountered some problems very similar to question 22.
- Maybe students were very familiar with the context of the problem.
- Maybe the teachers were especially comfortable with the content from this question.
Again, in case you’re curious, here’s question 22 to get you wondering.
In Texas this is called a griddable question. Rather than being multiple choice, students have to grid their answer like this on their answer sheet:
Griddable items are usually some of the most difficult items on STAAR because of their demand for accuracy. That makes it even more interesting that this item was less difficult at this particular campus.
We can never know exactly why a question was significantly more or less difficult at a particular campus, but analyzing and comparing the rank orders of item difficulty does bring to the surface unexpected, and sometimes tantalizing, differences that are well worth exploring and celebrating.
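To pull the notable differences out of a full list, one option is to sort the rank differences and label the extremes. The Python sketch below uses the three differences discussed in this post (question 9: 2, question 5: -13, question 22: 7). The cutoff of 5 and the names `THRESHOLD` and `classify` are assumptions made for illustration; the post doesn't define how big a difference has to be to count as notable.

```python
# Hypothetical cutoff for a "notable" rank difference; not from the post.
THRESHOLD = 5


def classify(difference, threshold=THRESHOLD):
    """Label a rank difference (campus rank minus state rank)."""
    if difference <= -threshold:
        return "area of exploration"   # much more difficult for the campus
    if difference >= threshold:
        return "area of celebration"   # much less difficult for the campus
    return "about the same"


# Rank differences for the three questions discussed in this post.
rank_differences = {9: 2, 5: -13, 22: 7}
labels = {q: classify(d) for q, d in rank_differences.items()}

# List questions from most negative to most positive difference.
for question, difference in sorted(rank_differences.items(), key=lambda kv: kv[1]):
    print(f"Question {question}: {difference:+d} ({labels[question]})")
```

Whatever cutoff you pick, the labels are only a starting point for conversation with the teachers who know what happened in those classrooms.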
Just this week I met with teams at a campus in my district to go over their own campus rank order data compared to our district data. They very quickly generated thoughtful hypotheses about why certain questions were more difficult and others were less so based on their memories of last year’s instruction. In meeting with their 5th grade team, for example, we were surprised to find that many of the questions that were much more difficult for their students involved incorrect answers that were most likely caused by calculation errors, especially if decimals were involved. That was very eye-opening and got us brainstorming ideas of what we can work on together this year.
This post wraps up my series on analyzing assessment data. I might follow up with some posts specifically about the 2018 STAAR for grades 3-5 to share my analysis of questions from those assessments. At this point, however, I’ve shared the big lessons I’ve learned about how to look at assessments in new ways, particularly with regard to test design and item difficulty.
Before I go, I owe a big thank you to Dr. David Osman, Director of Research and Evaluation at Round Rock ISD, for his help and support with this work. And I also want to thank you for reading. I hope you’ve come away with some new ideas you can try in your own work!