# Difficult

This post is the third in a series where I’m sharing how I’ve changed the ways that I look at assessments and assessment data.

• In the first post, I shared the importance of digging into the questions, not just the standards they’re correlated to.
• In the second post, I talked about how understanding how a test is designed can help us better understand the results we get.
• In this post, I’d like to share one of the ways I’ve learned to analyze assessment results.

Let’s get started!

Do you know what the most difficult item on an assessment is?

• Is it the one with a pictograph with a scaled interval that involves combining the values from several categories?
• Is it the multi-step story problem involving addition, subtraction, and multiplication?
• Is it the one about matching a set of disorganized data with the correct dot plot out of four possible answer choices?

Here’s the thing I learned from Dr. Kevin Barlow, Executive Director of Research and Accountability in Arlington ISD: no matter how much time and effort someone spends designing an item, from crafting the wording to choosing just the right numbers, the only way to determine the difficulty of an item is to put it in front of students on an assessment. After students are finished, take a look at the results and find the question that the most students answered incorrectly.

You found it! That’s the most difficult item on the assessment.

Through their responses, our students will tell us every single time which question(s) were the most difficult for them. It’s our responsibility to analyze those questions to determine what made them so challenging.

Fortunately, the Texas Education Agency provides this information to us in Statewide Item Analysis Reports. Unfortunately, it starts out looking like this:

This is a great first step, but it’s not terribly useful in this format. You can’t glance at it and pick out anything meaningful. However, if I copy this data into a spreadsheet and sort it, it becomes so much more useful and meaningful:

Now I’ve sorted the questions based on how students performed, from the item the most students answered incorrectly (#9 was the most difficult item on this test) to the items the fewest students answered incorrectly (#2, #4, and #10 were tied for being the least difficult items on this test). It’s interesting to think that #9 and #10, back to back, turned out to be the most and least difficult for 5th graders across the state of Texas!
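If you’d rather script this step than use a spreadsheet, here is a minimal Python sketch of the same sort. The percent-correct values are made up for illustration (they are not the actual STAAR results), and the function name `rank_by_difficulty` is my own; only the idea of ordering items by the share of students who answered incorrectly comes from the approach described above.

```python
# Hypothetical percent-correct values for a 10-item test. These numbers are
# made up for illustration and are NOT the actual STAAR results.
percent_correct = {
    1: 72, 2: 86, 3: 65, 4: 86, 5: 70,
    6: 58, 7: 77, 8: 61, 9: 41, 10: 86,
}


def rank_by_difficulty(percent_correct):
    """Return (item, percent_incorrect) pairs, most difficult first."""
    incorrect = {item: 100 - pct for item, pct in percent_correct.items()}
    # Sort by percent incorrect, descending; ties are broken by item number.
    return sorted(incorrect.items(), key=lambda pair: (-pair[1], pair[0]))


ranked = rank_by_difficulty(percent_correct)

# The first entry is the item the most students answered incorrectly.
most_difficult_item = ranked[0][0]

# Several items can tie for least difficult, so collect all of them.
lowest_pct_incorrect = ranked[-1][1]
least_difficult_items = [item for item, pct in ranked if pct == lowest_pct_incorrect]

print("Most difficult item:", most_difficult_item)
print("Least difficult items:", least_difficult_items)
```

Collecting every item that shares the lowest percent incorrect keeps ties visible instead of arbitrarily picking one item as the “least difficult.”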

The items highlighted in red were the most difficult items for 5th graders. Remember, it doesn’t matter how the questions were designed. These items were the most difficult because the fewest students answered them correctly.

The items highlighted in blue, on the other hand, were the least difficult items for 5th graders in Texas. I’m intentional about calling them the least difficult items. We might be inclined to call them the easiest items, but that obscures the fact that these questions were still difficult enough that 14-17% of all Texas 5th graders answered them incorrectly. To put some real numbers with that, anywhere from 56,000 to 68,000 students answered these “easy” items incorrectly. These items were clearly difficult for these students, but they were the least difficult for the population of 5th graders as a whole.

Now what?

We might be inclined to go to the items in red and start analyzing those first. Great idea! But for whom?

Well, since they were the most difficult items, meaning the most students missed them, we should use these items to teach all of our students, right? Clearly everyone had issues with them!

I’m going to disagree with that.

These items were difficult even for some of our strongest students. If they struggled, then the last thing I want to do is bring this level of challenge to all of my students, especially those who struggled throughout the test. Rather, I’ll analyze the most difficult items to get ideas to provide challenge to my higher performing students. These kinds of questions are clearly structured in a way that gets them thinking, challenges them, and perhaps even confuses them. That’s good information to know!

(Please don’t misinterpret this as me saying that I don’t want to challenge all students. Rather, I want to ensure all students are appropriately challenged, and that’s what I’m trying to identify through this kind of analysis. Read on to see what I mean.)

But what about students who struggled throughout the test? For those students, I’m going to analyze the least difficult items. In this case, 14-17% of students in Texas answered even these items incorrectly. These items posed a challenge for quite a number of students, and I want to analyze the items to figure out what made them challenging for these students.

Let’s pretend that this is school data instead of Texas data, and let’s pretend we’re a team of 6th grade teachers analyzing 5th grade data for our 200 6th graders. That would mean roughly 28 to 34 students in our 6th grade (14-17% of 200) did not do well on these least difficult items when they took the 5th grade STAAR last spring. That’s a pretty significant number of kids! They could for sure benefit from some form of intervention based on what we learn from analyzing these items.
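The arithmetic behind both sets of numbers in this post is the same: multiply the percent incorrect by the number of students. Here is a small Python sketch of that conversion. The helper name `students_missing` is hypothetical, and the statewide figure of 400,000 is an assumption implied by the numbers above (56,000 is 14% of 400,000), not an official enrollment count.

```python
def students_missing(population, pct_low, pct_high):
    """Convert a percent-incorrect range into a range of student counts."""
    low = round(population * pct_low / 100)
    high = round(population * pct_high / 100)
    return low, high


# Statewide: 400,000 is an assumed population implied by the post's figures,
# not an official count of Texas 5th graders.
statewide = students_missing(400_000, 14, 17)
print(statewide)  # (56000, 68000)

# School level: the pretend cohort of 200 6th graders.
school = students_missing(200, 14, 17)
print(school)  # (28, 34)
```

Swapping in your own campus enrollment gives a quick sense of how many students a “least difficult” item may still have tripped up.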

And that’s where I’m going to leave this in your hands! Here is a document where I’ve collected the most difficult and least difficult items from the 2018 5th grade STAAR. These are the actual test questions along with the percentage of students who selected each answer choice. Spend a little time analyzing them. Here are some questions to guide you:

• What are the features of each question? (How is the question constructed? What are its components and how are they put together in the question?)
• Why do you suppose the features of a given question made it more/less difficult for students?
• What mathematical knowledge and skills are required to be successful with each question?
• What non-mathematical knowledge and skills are required to be successful with each question?
• What can you learn from analyzing the distractors? What do they tell you about the kinds of mistakes students made or the misunderstandings they might have had?
• What lessons can we learn from these questions to guide us in how we support our students? (We don’t want to teach our students these exact questions. That’s not terribly useful since they won’t be taking this exact test again. Rather, seek out general themes or trends that you observe in the questions that can guide your classroom instruction and/or intervention.)
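For the distractor question in the list above, one simple starting point is to rank the incorrect answer choices by how often students selected them. The sketch below assumes a single item with four choices; the percentages, the correct answer, and the function name `rank_distractors` are all made up for illustration and are not taken from the actual STAAR report.

```python
# Hypothetical answer-choice percentages for one item. These values are
# made up for illustration and are NOT from the actual STAAR report.
choice_percents = {"A": 12, "B": 48, "C": 31, "D": 9}
correct_choice = "B"


def rank_distractors(choice_percents, correct_choice):
    """Return the incorrect choices, most frequently selected first."""
    distractors = {
        choice: pct
        for choice, pct in choice_percents.items()
        if choice != correct_choice
    }
    return sorted(distractors.items(), key=lambda pair: pair[1], reverse=True)


ranked_distractors = rank_distractors(choice_percents, correct_choice)
top_choice, top_pct = ranked_distractors[0]
print(f"Most popular distractor: {top_choice} ({top_pct}% of students)")
```

A distractor that draws far more students than the others is usually the one worth studying first, since it may point to a shared misunderstanding rather than random guessing.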

I’ve opened up the document so that anyone can comment. If you’d like to share your thoughts on any of the questions, please do! I look forward to reading your thoughts about the least and most difficult items on the 2018 5th grade STAAR.

I’m giving you a very small set of questions to analyze right now. You may or may not be able to generalize much from them depending on your own experiences analyzing assessment items. However, it’s worth doing regardless of your experience, because now the repertoire of items you’ve analyzed will be that much larger.

As for myself, I’ve been analyzing assessment items like this for several years. What I’d like to do in my next post is share some of the lessons I’ve learned from this analysis across multiple years. I do feel like there are consistent trends (and a few surprises) that can inform our work in ways that simultaneously align with high-quality math instruction (because ultimately this is what I care much more about than testing) while also ensuring students are given the supports they need to succeed on mandatory high stakes tests (because they are a fact of life and it’s our responsibility to ensure students, especially those who are relying on school for this support, are prepared for them).