The tl;dr version is that I concluded the series in May 2018 with these parting thoughts:

…what started as a blog series where I was planning to reflect on the changes I might make for next year has instead reaffirmed that the work I’ve done with my teachers over the past three years has resulted in six scope and sequences that make sense and don’t actually require much tweaking at all. I’m proud of what we’ve accomplished. Are they perfect? Probably not. But they appear to be working for our teachers and students, and at the end of the day that’s what matters.

Source

Fast forward to this post I wrote while reeling from my experiences at the Math Perspectives Leadership Institute in late June:

There is a HUGE disconnect between what [Kathy Richardson’s] experience says students are ready to learn in grades K-2 and what our state standards expect students to learn in those grades. I’ve been trying to reconcile this disconnect ever since, and I can tell it’s not going to be easy… I’m very conflicted right now. I’ve got two very different trajectories in front of me… Kathy Richardson is all about insight and understanding. Students are not ready to see…until they are. “We’re not in control of student learning. All we can do is stimulate learning.” Our standards on the other hand are all about getting answers and going at a pace that is likely too fast for many of our students. We end up with classrooms where many students are just imitating procedures or saying words they do not really understand. How long before these students find themselves in intervention? We blame the students (and they likely blame themselves) and put the burden on teachers down the road to try to build the foundation because we never gave it the time it deserved.

Source

What a difference a month makes.

In May I was feeling proud and confident of the work I’d accomplished developing and revising our elementary scope and sequence documents. A month later I’m calling everything into question and having a crisis of conscience about whether the scope and sequences I’ve planned are actually creating some of the struggles I was trying to prevent.

Back in July I closed my post with no answers:

But how to provide that time? That’s the question I need to explore going forward. If you were hoping for any answers in this post, I don’t have them. Rather, if you have any advice or insights, I’d love to hear them, and if I learn anything interesting along the way, I’ll be sure to share on my blog.

Source

This big question of how to reconcile the pace of learning for our youngest students with the pace of the state standards has been on my mind for months. Throughout the fall semester, I had countless conversations with colleagues in and out of my district. These conversations culminated in my taking a stab at revising our scope and sequences in grades K and 1 as well as proposing a new instructional model in grades K and 1. (Ultimately I made revisions to the scope and sequence documents for grades K-4, but I’m going to focus on K and 1 in this post.)

I’ve been sharing, talking about, and revising these documents with teachers, instructional coaches, and curriculum specialists in my district for a couple of months now, and I feel like they’re finally in a shape where I’m ready to share them here so you can see where all of this thinking has taken me since I last wrote about this in July.

As a point of reference, here are the Kindergarten and 1st grade units for the 2018-19 school year.

**Kindergarten 2018-19**

**1st Grade 2018-19**

Our curriculum is now open to the public, so if you’re interested in visiting any of these units to see unit rationales, standards, lessons, etc., you can do that here.

Contrast that with these proposed units for the 2019-20 school year:

*Proposed* Kindergarten 2019-20

**Fall Semester**

- Unit 1 – I Am a Mathematician! (21 days)
- Unit 2 – Beginning Number Concepts (30 days)
- Unit 3 – Sorting and Classifying (30 days)

**Spring Semester**

- Unit 4 – The Concepts of More, Less, and the Same (30 days)
- Unit 5 – Joining and Separating Quantities (30 days)
- Unit 6 – Building Number Concepts (30 days)

*Proposed* 1st Grade 2019-20

**Fall Semester**

- Unit 1 – I Am a Mathematician! (15 days)
- Unit 2 – Adding and Subtracting (30 days)
- Unit 3 – Exploring Shapes and Fair Shares (27 days)
- Unit 4 – Understanding Money (10 days)

**Spring Semester**

- Unit 5 – More Adding and Subtracting (20 days)
- Unit 6 – Collecting and Analyzing Data (10 days)
- Unit 7 – Introducing Unitizing (15 days)
- Unit 8 – Exploring the Place Value System (24 days)

Here are some of the changes and my rationale for them:

- In Kindergarten we drastically reduced the number of units. Instead of 10 units, we’re down to 6. On top of that, the first unit has shifted from counting concepts to “I Am a Mathematician!” What does that mean? Here are the notes I took to describe this unit:
  - Exploring manipulatives
  - Exploring patterns
  - Reading books about counting, shapes, and patterns
  - Setting norms and expectations for engaging in a community of mathematicians
  - Establishing routines
  - Getting to know students’ strengths and areas of growth

- I made the names of the units more vague. Rather than stress teachers out that their students should be counting to 5, then 10, then 20 in lockstep, I’m providing space for students to engage in number concepts in general. Teachers can differentiate as needed so students who need to work within 5 can continue to do that while other students are exploring 8 or 12 or 14.
- I made the units in Kindergarten longer to give students time to “live” in the landscape of these concepts. This goes hand-in-hand with the new instructional model I’m proposing based on the work of Kathy Richardson. Now a typical day will include a short opening activity that’s done together as a whole class. The bulk of math time will be spent in an explore time where students self-select activities that are variations on the core concept of the unit. During this explore time, the teacher’s primary role is to confer with students and continually nudge them along in their understanding. Each day there is a short lesson close to help students reflect on their learning. Here’s a link to a sample suggested unit plan to help teachers envision what a unit might look like in grades K and 1. (Note: If you encounter a link you can’t access in the document it’s likely due to copyright that we don’t control.)
- In 1st grade I reduced the number of units focusing on addition and subtraction. Similar to number concepts in 1st grade, I want to give students an extended amount of time to “live” in these concepts.
- In 1st grade I moved place value to the very end of the year. According to Kathy Richardson, unitizing and place value topics are challenging for 1st graders. However, I have to include them because our state standards require it. In order to reconcile this, I want to give students as much of the year as possible for their brains to develop so they are working with the most up-to-date hardware when they start learning these critical concepts. Putting it at the end of the year also creates more proximity to when students will continue learning about place value in 2nd grade. I’ve even added a 2-digit place value unit to our 2nd grade scope and sequence to create a bridge and continue the learning.
- In 1st grade, I created a unit just on unitizing and followed that up with a unit on place value. Using activities from Kathy Richardson’s Developing Number Concepts series, students will spend three weeks making, naming, and describing groups of 4, groups of 5, groups of 6, and eventually groups of 10. Then they’ll spend almost five weeks extending this as they learn how our place value system is built on groups of 10.

The units are just the tip of the iceberg. The math block in our district is 80 minutes and broken up across three components:

- Focus Instruction (50 minutes)
- Numeracy (10 minutes) – This used to be named Computational Fluency but I’m re-branding it because the names imply different goals.
- Spiral Review (20 minutes)

So when I revised the scope and sequence documents, I also revised the learning across all three components.

**Draft Kindergarten At-A-Glance 2019-20**

**Draft 1st Grade At-A-Glance 2019-20**

Things to point out:

- I’ve settled on a few anchor instructional routines across all grade levels – number talks, choral counting, and counting collections. That’s not to say that teachers can’t use other routines – I encourage them to – but my goal is to ensure that these three powerful, versatile routines are in everyone’s toolbox.
- Kindergarten only has 60 minutes of math instruction in the fall semester so they don’t start spiral review until the spring semester.
- In 1st grade the numeracy topics are fairly consistent across the year – skip counting, subitizing, making 10, and developing strategies for adding and subtracting within 20. My hope is that the consistency of topics across the year paired with the anchor instructional routines will allow the numeracy work to feel more like an ongoing *conversation* across the year.
- In 1st grade, creating, solving, and representing addition and subtraction problems appears as a spiral review topic over and over again. I want to ensure students have lots and lots of opportunities to engage with problems involving joining, separating, and comparing quantities.

**Parting Thoughts**

Now that I’ve started to get a plan in place, I have a lot of work ahead of me to create all the associated unit documents. I’m also going to be working on gathering teachers who want to pilot these new units. I’m wary of just dumping them on our teachers because they’ve already put so much work into learning the old units, and there are some heavy instructional shifts that might need to be made to make these units work. Thankfully I don’t think it will be too hard to find volunteers. Teachers who’ve looked at these plans and talked about them with me or their instructional coach have been really excited for the changes, so much so that I have an entire Kindergarten team trying out one of the new units right now!

While there are still a lot of unknowns and a lot of work ahead to support teachers, I do feel like all of the reflecting, conversations, and attempts at making a new plan over the past six months have brought me to a place where I feel like I’m moving in a good direction that I’m happy to follow for the time being.

Here’s to the path ahead.

They reached out to me for support, and I thought I’d share with you what I shared with them in case it’s helpful to anyone else creating their own numberless word problems.

First things first, start with the problem you want to transform into a numberless word problem. Here’s the problem I started with for this example:

I type the problem on a slide, either in Powerpoint or Google Slides. You can create your problem on chart paper or on strips of paper if you’re working with a small group. I’m partial to digital slides because of some other features you’ll see later in the post.

From here I create a copy of this slide and remove some of the information. Usually I start by removing the question.

Next I copy this new slide and again decide what information to remove. In this case I decided to remove the entire last sentence. That sentence dramatically changes our understanding of the situation. If you look at the slide below you’ll see that we know the total number of kids eating ice cream and the number of kids eating chocolate ice cream.

The situation is very open right now. The rest of the kids could be eating a variety of different flavors – vanilla, strawberry, chocolate chip. When I reveal the sentence that the rest of the students are eating vanilla ice cream, there’s a nice element of surprise because you aren’t necessarily expecting that the kids are only eating just two different flavors.

My next step is to remove one of the numbers. In this case I’ll take away the number of children eating chocolate ice cream.

Finally, I’ll remove the number in the first sentence to get me to the beginning of this problem. This is the first text students will read.

I structure my slides to minimize changes. I don’t want to overwhelm the students by revealing too much all at once. I will add a new sentence, but I avoid changing language that’s already on the slide, if possible. More often than not I’m only changing a word like “some” into a specific quantity. There are rare instances where I’ll have to adjust a sentence as new information is added, but I try not to do that. I want the sentence structure to stay the same so that when the numbers are added that’s the only real change.

You might have noticed that I don’t include pictures on the slides with the text. This is intentional. I used to include pictures with the intent that they would support visualizing, but a colleague shared how distracting the pictures were for her students. Students were looking for meaning in them when they were essentially there as decoration. The pictures ended up confusing her students rather than helping because the students kept trying to make connections between the pictures and the text. Since then I’ve avoided pictures on the text slides unless the picture is absolutely necessary.

The first step was to work backward to plan out each slide so that information is slowly revealed on each slide. Now it’s time to plan the questions I’m going to ask the students at each step along the way. I have two primary goals that I strive for in my questioning:

- I want students to **visualize** what the story is about as it unfolds. If they’re not “seeing” it, then they’re likely not making much sense of it.
- I want students to make **guesses and estimates about quantities** in the story using what they know about the situation and the relationships provided. I want them **reasoning** all along the way so that by the time they get to answering the question they are holding themselves accountable if their answer doesn’t make sense.

So now I go back through the slides in the order they will be presented and add the questions I plan to ask along the way.

When a new slide is presented, I always ask a question to get students to state the new information. I’ve also worded this as, “What changed? What do we know now that we didn’t know before?”

Something I’ve been doing for the past year with numberless word problems is bookending them with visuals to add a little more texture to the experience.

The first thing I do is find a high quality image or two to show the students and have them chat about before we dive into reading any text. My go-to website for images is Pixabay.

I type in a word or phrase related to the story problem, like **ice cream**, and more often than not I hit the jackpot:

I look for a photo that I think will capture kids’ attention and activate their prior knowledge of the context. It allows students who may be less familiar with a situation to hear the relevant language, such as ice cream, chocolate, vanilla, and cone, before we dive into reading the text.

Here’s the picture I ultimately chose to engage students at the start of this problem, along with some notes of how I’d facilitate the opening discussion with the students.

When I paste the picture on a slide, I always go into the Notes section of the slide and paste the source of the picture(s), usually the URL where I found it. On Pixabay, more often than not the photos have licenses allowing reuse. You can find the license information to the right of each photo. I know in the privacy of your own classroom it feels easy to get away with grabbing whatever picture you can find on Google Images, but it’s a good habit to pull legal photos to avoid unforeseen issues down the road. And with amazing sites like Pixabay and Wikimedia Commons available, there’s no reason not to at least start by looking for freely available photos.

I’ve been making it a habit to close each numberless word problem with a short video. This serves two goals:

- It further builds students’ knowledge of the situation discussed. In the case of the problem I shared in this post, it was about kids eating ice cream so I found a short video of a kid making ice cream. Even if you can only find longer videos, you don’t have to show the whole thing. You could just watch the first minute (or whichever section is most relevant or interesting).
- It serves as a pay off for all of the hard work students just did to make sense of and solve the problem.

I’m sure you can guess where I go to find videos. YouTube has such an endless supply of videos that I haven’t yet encountered a situation where I couldn’t find a video worth sharing. Sometimes it’s the first video and sometimes it’s the tenth, but it’s always there waiting to be discovered.

Now that you’ve seen me put together this numberless word problem in pieces, here’s your chance to see the finished product. This link will take you to the complete slideshow.

In the Notes section on some of the slides, you’ll see references to students sketching in boxes. I created a recording sheet to try out when I modeled a different problem recently. If you want to check out the recording sheet, here’s the link. I don’t have a lot of experience using it yet so I don’t want to say more about it right now, but I do want to share in case it’s helpful.

If you have any questions, don’t hesitate to reach out in the comments or tweet me @bstockus. And if you create your own problem, please share it with us on Twitter using the #numberlesswp hashtag.

It’s a talk I’m proud of because in five minutes I was able to share why it’s so urgent to me that we ensure sense making is the focus of our work with students. Students deserve to develop positive relationships with what they’re learning **now**. It’s a disservice to assume they’ll learn to like it later. We never know what doors are closed to students because they learned to hate a subject or grow up thinking they’re not smart enough.

If you’re looking for some more mathematical inspiration as Halloween approaches, check out these three blog posts I wrote which include lots of photos and ideas for how to use them to spark mathematical conversations with your students.

Enjoy!

- In the first post, I shared the importance of digging into the questions, not just the standards they’re correlated to.
- In the second post, I talked about how understanding how a test is designed can help us better understand the results we get.
- In the third post, I shared how I learned to organize assessment data by item difficulty and the implications for supporting our students.
- In this post, I’d like to talk about another way to look at assessment data to uncover areas of celebration and areas of exploration.

Let’s get started!

In my previous post I shared the order of questions based on item difficulty for the 2018 5th grade STAAR for the entire state of Texas. Here it is again:

According to this ordering, question 9 was the most difficult item on the test, followed by question 18, question 8, and so on down to question 10 as the least difficult item (tied with questions 2 and 4).

Here’s my question: What is the likelihood that any given campus across the state would have the exact same order if they analyzed the item difficulty just for their students?

Hopefully you’re like me and you’re thinking, “Not very likely.” Let’s check to see. Here’s the item difficulty of the state of Texas compared to the item difficulty at just one campus with about 80 students. What do you notice? What do you wonder?

Some of my noticings:

- Questions 8, 9, 18, and 21 were some of the most difficult items for *both* the state and for this particular campus.
- Question 5 was not particularly difficult for the state of Texas as a whole (it’s about midway down the list), but it was *surprisingly difficult* for this particular campus.
- Question 22 was one of the most difficult items for the state of Texas as a whole, but it was *not particularly difficult* for this campus (it’s almost halfway down the list).
- Questions 1, 2, 10, 25, and 36 were some of the least difficult items for *both* the state and for this particular campus.
- Question 4 was tied with questions 2 and 10 for being the least difficult item for the state, but for this particular campus it didn’t crack the top 5 list of least difficult items.
- There were more questions tied for being the most difficult items for the state and more questions tied for being the least difficult items for this particular campus.

My takeaway?

What is difficult for the state as a whole might not be difficult for the students at a particular school. Likewise, what is not very difficult for the state as a whole might have been more difficult than expected for the students at a particular school.

But is there an easier way to identify these differences than looking at an item on one list and then hunting it down on the second list? There is!

This image shows the item difficulty rank for each question for Texas and for the campus. The final column shows the difference between these rankings.

In case you’re having trouble making sense of it, let’s look at just question 9.

As you can see, this was the number 1 most difficult item for the state of Texas, but it was number 3 on the same list for this campus. As a result, the rank difference is 2 because this question was 2 questions less difficult for the campus. However that’s a pretty small difference, which I interpret to mean that this question was generally about as difficult for this campus as it was for the state as a whole. What I’m curious about and interested in finding are the *notable* differences.
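The rank-difference arithmetic can be computed mechanically. Here’s a quick Python sketch (not from the original post) using the two pairs of ranks quoted in this post: question 9 ranked 1 for the state and 3 for the campus, and question 5, discussed next, ranked 18 for the state and 5 for the campus.

```python
# Item-difficulty ranks (1 = most difficult) quoted in the post for two questions.
state_rank = {9: 1, 5: 18}
campus_rank = {9: 3, 5: 5}

# Campus rank minus state rank:
#   positive -> the question was less difficult for the campus than for the state
#   negative -> the question was more difficult for the campus than for the state
rank_diff = {q: campus_rank[q] - state_rank[q] for q in state_rank}

print(rank_diff)
```

Question 9 comes out to a small difference of 2, while question 5 comes out to -13, the kind of large negative difference worth digging into.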

Let’s look at another example, question 5.

This is interesting! This question was number 18 in the item difficulty for Texas, where 1 is the most difficult and 36 is the least difficult. However, this same question was number 5 in the list of questions for the campus. The rank difference is -13 because this question was 13 questions *more difficult* for the campus. That’s a huge difference! I call questions like this **areas of exploration**. These questions are worth exploring because they buck the trend. If instruction at the campus were like the rest of Texas, this question should have been just as difficult for the campus as for the rest of the state…but it wasn’t. That’s a big red flag that I want to start digging to uncover why this question was so much more difficult. There are lots of reasons this could be the case, such as:

- It includes a model the teachers never introduced their students to.
- Teacher(s) at the campus didn’t know how to teach this particular concept well.
- The question included terminology the students hadn’t been exposed to.
- Teacher(s) at the campus skipped this content for one reason or another, or they quickly glossed over it.

In case you’re curious, here’s question 5 so you can see for yourself. Since you weren’t at the school that got this data, your guesses are even more hypothetical than theirs, but it is interesting to wonder.

Let me be clear. Exploring this question isn’t about placing blame. It’s about uncovering, learning what can be learned, and making a plan for future instruction so students at this campus hopefully don’t find questions like this so difficult in the future.

Let’s look at one more question from the rank order list, question 22.

This is sort of the reverse of the previous question. Question 22 was much more difficult for the state as a whole than it was for this campus. So much so that it was 7 questions *less difficult* for this campus than it was for the state. Whereas question 5 is an area of exploration, I consider question 22 an **area of celebration**! Something going on at that campus made it so that this particular question was a lot less difficult for the students there.

- Maybe the teachers taught that unit really well and student understanding was solid.
- Maybe the students had encountered some problems very similar to question 22.
- Maybe students were very familiar with the context of the problem.
- Maybe the teachers were especially comfortable with the content from this question.

Again, in case you’re curious, here’s question 22 to get you wondering.

In Texas this is called a griddable question. Rather than being multiple choice, students have to grid their answer like this on their answer sheet:

Griddable items are usually some of the most difficult items on STAAR because of their demand for accuracy. That makes it even more interesting that this item was less difficult at this particular campus.

We can never know exactly why a question was significantly more or less difficult at a particular campus, but analyzing and comparing the rank orders of item difficulty does bring to the surface unexpected, and sometimes tantalizing, differences that are well worth exploring and celebrating.

Just this week I met with teams at a campus in my district to go over their own campus rank order data compared to our district data. They very quickly generated thoughtful hypotheses about why certain questions were more difficult and others were less so based on their memories of last year’s instruction. In meeting with their 5th grade team, for example, we were surprised to find that many of the questions that were much more difficult for their students involved incorrect answers that were most likely caused by calculation errors, especially if decimals were involved. That was very eye opening and got us brainstorming ideas of what we can work on together this year.

This post wraps up my series on analyzing assessment data. I might follow up with some posts specifically about the 2018 STAAR for grades 3-5 to share my analysis of questions from those assessments. At this point, however, I’ve shared the big lessons I’ve learned about how to look at assessments in new ways, particularly with regards to test design and item difficulty.

Before I go, I owe a big thank you to Dr. David Osman, Director of Research and Evaluation at Round Rock ISD, for his help and support with this work. And I also want to thank *you* for reading. I hope you’ve come away with some new ideas you can try in your own work!

My daughter has a board game called Unicorn Glitterluck.

It’s super cute, but not the most engrossing game. She and I especially like the purple cloud crystals, so this weekend I started brainstorming a math game I could make for us to play together. I know number combinations is an important idea she’ll be working on in 1st grade, so I thought about how to build a game around that while also incorporating the crystals.

Introducing…**Crystal Capture**!

Knowing that certain totals have greater probabilities of appearing than others, I created a game board that takes advantage of this. Totals like 6, 7, and 8 get rolled fairly frequently, so those spaces only get 1 crystal each. Totals like 2, 3, 11, and 12, on the other hand, have less chance of being rolled, so I only put 1 space above each of these numbers, but that space has 3 crystals.
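The probabilities behind the board layout are easy to verify by enumerating the 36 equally likely rolls of two dice. Here’s a quick sketch in Python (not part of the original post):

```python
from fractions import Fraction
from collections import Counter

# Each ordered pair of faces (1-6, 1-6) is equally likely, giving 36 outcomes.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))

for total in range(2, 13):
    p = Fraction(counts[total], 36)
    print(f"{total:2d}: {counts[total]} ways (p = {p})")
```

The middle totals 6, 7, and 8 account for 16 of the 36 outcomes, while 2, 3, 11, and 12 together account for only 6, which is why the rare totals earn the three-crystal spaces.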

I mocked up a game board and we did a little play testing. I quickly learned a few things:

I originally thought we would play until the board was cleared. Everything was going so well until all we had left was the one space above 12. We spent a good 15 minutes rolling and re-rolling. We just couldn’t roll a 12!! That was getting boring fast which led me to introduce a **special move** when you roll a double. That at least gave us something to do while we waited to finally roll a 12.

That evening I made a fancier game board in Powerpoint and we played the game again this morning:

Since clearing the board can potentially take a long time, which sucks the life out of the game, I changed the end condition. Now, if all nine of the spaces above 6, 7, and 8 are empty, the game ends. Since these numbers get rolled more frequently, the game has a much greater chance of ending without dragging on too long.

I did keep the special move when you roll doubles though. This adds a little strategic element. When you roll a double, you can replenish the crystals in any one space on the board. Will you refill a space above 6, 7, or 8 to keep the game going just a little bit longer? Or will you replenish one of the three-crystal spaces in hopes of rolling that number and claiming the crystals for yourself?

All in all, my daughter and I had a good time playing the game, and I learned a lot about where she’s at in her thinking about number combinations. Some observations:

- She is very comfortable using her fingers to find totals.
- Even though she knows each hand has 5 fingers, she’ll still count all 5 fingers one-at-a-time about 75% of the time.
- She is pretty comfortable with most of her doubles. She knows double 5 is 10, for example. She gets confused whether double 3 or double 4 is 8. We rarely rolled double 6, so I have no idea what she knows about that one.
- In the context of this game at least, she is not thinking about counting on from the larger number…yet. She doesn’t have a repertoire of strategies to help her even if she did stop and analyze the two dice. If she sees 1 and 5, she’ll put 1 finger up on one hand and 5 on the other, then she’ll count all.
- I did see hints of some combinations slowly sinking in. That’s one benefit to dice games like this. As students continue to roll the same combinations over and over, they’ll start to internalize them.

Several folks on Twitter expressed interest in the game, so I wanted to write up this post and share the materials in case anyone out there wants to play it with their own children or students.

You’ll have to scrounge up your own crystals to put in the spaces, but even if you don’t have fancy purple ones like we do, small objects like buttons, along with a little imagination, work just as well. Oh, and if you can get your hands on sparkly dice, that helps, too. My daughter loves the sparkly dice I found in a bag of dice I had lying around.

Have fun!

- In the first post, I shared the importance of digging into the questions, not just the standards they’re correlated to.
- In the second post, I talked about how understanding how a test is designed can help us better understand the results we get.
- In this post, I’d like to share one of the ways I’ve learned how to analyze assessment results.

Let’s get started!

Do you know what the most difficult item on an assessment is?

- Is it the one with a pictograph with a scaled interval that involves combining the values from several categories?
- Is it the multi-step story problem involving addition, subtraction, and multiplication?
- Is it the one about matching a set of disorganized data with the correct dot plot out of four possible answer choices?

Here’s the thing I learned from Dr. Kevin Barlow, Executive Director of Research and Accountability in Arlington ISD: no matter how much time and effort someone spends designing an item, from crafting the wording to choosing just the right numbers, the only way to determine the difficulty of an item is to put it in front of students on an assessment. After students are finished, take a look at the results and *find the question where the most students were incorrect*.

You found it! That’s the most difficult item on the assessment.

Through their responses, our students will tell us every single time which question(s) were the most difficult for them. It’s our responsibility to analyze those questions to determine what made them so challenging.

Fortunately, the Texas Education Agency provides this information to us in Statewide Item Analysis Reports. Unfortunately, it starts out looking like this:

This is a great first step, but it’s not terribly useful in this format. You can’t glance at it and pick out anything meaningful. However, if I copy this data into a spreadsheet and sort it, it becomes so much more useful and meaningful:

Now I’ve sorted the questions based on how students performed, from the item the most students answered incorrectly (#9 was the most difficult item on this test) to the items the fewest students answered incorrectly (#2, #4, and #10 tied for being the least difficult items on this test). It’s interesting to think that #9 and #10, back to back, turned out to be the most and least difficult items for 5th graders across the state of Texas!
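If your item analysis data comes as a raw export rather than a spreadsheet, the same sort takes only a few lines of code. This is just a sketch; the item numbers and percentages below are invented for illustration (the real values would come from the report itself):

```python
# Hypothetical item-analysis data: item number -> percent of students correct.
# These values are invented; a real analysis would use the report's numbers.
percent_correct = {9: 43, 2: 86, 4: 86, 10: 86, 1: 71, 7: 58}

# Sort ascending by percent correct so the most difficult items come first.
ranked = sorted(percent_correct.items(), key=lambda item: item[1])

for number, pct in ranked:
    print(f"Item #{number}: {pct}% correct ({100 - pct}% incorrect)")
```

Sorting ascending puts the most-missed items at the top, which mirrors picking out the red (most difficult) and blue (least difficult) bands in the spreadsheet.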

The items highlighted in red were the **most difficult** items for 5th graders. Remember, it doesn’t matter how the questions were designed. These items were the most difficult because the fewest students answered them correctly.

The items highlighted in blue, on the other hand, were the **least difficult** items for 5th graders in Texas. I’m intentional about calling them the least difficult items. We might be inclined to call them the easiest items, but that obscures the fact that these questions were still *difficult enough* that 14-17% of all Texas 5th graders answered them incorrectly. To put some real numbers with that, anywhere from 56,000 to 68,000 students answered these “easy” items incorrectly. These items were clearly difficult for these students, but they were the least difficult for the population of 5th graders as a whole.

Now what?

We might be inclined to go to the items in red and start analyzing those first. Great idea! But for whom?

Well, since they were the most difficult items, meaning the most students missed them, we should use these items to teach all of our students, right? Clearly everyone had issues with them!

I’m going to disagree with that.

These items were difficult *even for some of our strongest students*. If they struggled, then the last thing I want to do is bring this level of challenge to all of my students, especially those who struggled throughout the test. Rather, I’ll analyze the most difficult items to get ideas to provide challenge to my higher performing students. These kinds of questions are clearly structured in a way that gets them thinking, challenges them, and perhaps even confuses them. That’s good information to know!

(Please don’t misinterpret this as me saying that I don’t want to challenge all students. Rather, I want to ensure all students are *appropriately* challenged, and that’s what I’m trying to identify through this kind of analysis. Read on to see what I mean.)

But what about students who struggled throughout the test? For those students, I’m going to analyze the **least difficult** items. In this case, 14-17% of students in Texas answered even these items incorrectly. These items posed a challenge for quite a number of students, and I want to analyze the items to figure out what made them challenging for these students.

Let’s pretend that this is school data instead of Texas data, and let’s pretend we’re a team of 6th grade teachers analyzing 5th grade data for our 200 6th graders. That would mean somewhere between 28 and 34 of our 6th graders did not do well on these least difficult items when they took the 5th grade STAAR last spring. That’s a pretty significant number of kids! They could for sure benefit from some form of intervention based on what we learn from analyzing these items.
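The estimate above is plain percent-of-total arithmetic. A quick sketch with the numbers from this hypothetical campus of 200 students:

```python
# 14-17% of students statewide missed the "least difficult" items.
# Scale those rates to a hypothetical campus of 200 sixth graders.
total_students = 200
miss_rates = (0.14, 0.17)

counts = {rate: round(total_students * rate) for rate in miss_rates}
for rate, students in counts.items():
    print(f"{rate:.0%} miss rate -> about {students} of {total_students} students")
```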

And that’s where I’m going to leave this in your hands! Here is a document where I’ve collected the most difficult and least difficult items from the 2018 5th grade STAAR. These are the actual test questions along with the percentage of students who selected each answer choice. Spend a little time analyzing them. Here are some questions to guide you:

- What are the features of each question? (How is the question constructed? What are its components and how are they put together in the question?)
- Why do you suppose the features of a given question made it more/less difficult for students?
- What mathematical knowledge and skills are required to be successful with each question?
- What non-mathematical knowledge and skills are required to be successful with each question?
- What can you learn from analyzing the distractors? What do they tell you about the kinds of mistakes students made or the misunderstandings they might have had?
- What lessons can we learn from these questions to guide us in how we support our students? (We don’t want to teach our students *these exact questions*. That’s not terribly useful since they won’t be taking this exact test again. Rather, seek out general themes or trends that you observe in the questions that can guide your classroom instruction and/or intervention.)

I’ve opened up the document so that anyone can comment. If you’d like to share your thoughts on any of the questions, please do! I look forward to reading your thoughts about the least and most difficult items on the 2018 5th grade STAAR.

I’m giving you a very small set of questions to analyze right now. You may or may not be able to generalize much from them depending on your own experiences analyzing assessment items. However, it’s worth doing regardless of your experience, because now the repertoire of items you’ve analyzed will be that much larger.

As for myself, I’ve been analyzing assessment items like this for several years. What I’d like to do in my next post is share some of the lessons I’ve learned from this analysis across multiple years. I do feel like there are consistent trends (and a few surprises) that can inform our work in ways that simultaneously align with high-quality math instruction (because ultimately this is what I care much more about than testing) while also ensuring students are given the supports they need to succeed on mandatory high stakes tests (because they are a fact of life and it’s our responsibility to ensure students, especially those who are relying on school for this support, are prepared for them).


When you teach a unit on, say, multiplication, what are you hoping your students will score on an end-of-unit assessment? If you’re like me, you’re probably hoping that most, if not all, of your students will score between 90% and 100%. Considering all the backward designing, the intentional lesson planning, and the re-teaching and support provided to students, it’s not unreasonable to expect that *everyone* should succeed on that final assessment, right?

So what message does it send to teachers and parents in Texas that STAAR has the following passing rates as an end-of-year assessment?

- 3rd grade – Students only need to score 50% to pass
- 4th grade – Students only need to score 50% to pass
- 5th grade – Students only need to score approximately 47% to pass

Wow! We’ve got really low expectations for Texas students! They can earn an F and still pass the test. How terrible!

Comments like this are what I often hear from teachers, parents, administrators, and other curriculum specialists. I used to believe the same thing and echo these sentiments myself, but not anymore.

Last year, our district’s Teaching & Learning department attended a provocative session hosted by Dr. Kevin Barlow, Executive Director of Research and Accountability in Arlington ISD. He challenged our assumptions about how we interpret passing standards and changed the way I analyze assessments, including STAAR.

The first thing he challenged is this grading scheme as the universal default in schools:

- A = 90% and up
- B = 80-89%
- C = 70-79%
- D = 60-69%
- F = Below 60%

The question he posed to us is, “Who decided 70% is passing? Where did that come from?” He admitted that he’s looked into it, and yet he hasn’t found any evidence for why 70% is the universal benchmark for passing in schools. According to Dr. Barlow, percentages are relative to a given situation and our desired outcome(s):

- Let’s say you’re evaluating an airline pilot. What percentage of flights would you expect the pilot to land safely to be considered a good pilot? Hopefully something in the high 90s like 99.99%!
- Let’s say you’re evaluating a baseball player. What percentage of pitches would you expect a batter to successfully hit to be considered a great baseball player? According to current MLB batting stats, we’re looking at around 34%.

It’s all relative.

Let’s say you’re a 5th grade teacher and your goal, according to state standards, is to ensure your students can multiply up to a three-digit number by a two-digit number. And here’s the assessment you’ve been given for your students to take. How many questions on this assessment would you expect your students to answer correctly to meet the goal you have for them?

- 2 × 3
- 8 × 4
- 23 × 5
- 59 × 37
- 481 × 26
- 195 × 148
- 2,843 × 183
- 7,395 × 6,929
- 23,948 × 8,321
- 93,872 × 93,842

If my students could answer questions 1 through 5 correctly, I would say they’ve met the goal. They have demonstrated they can multiply up to a three-digit number by a two-digit number.

- **2 × 3 (Meets my goal)**
- **8 × 4 (Meets my goal)**
- **23 × 5 (Meets my goal)**
- **59 × 37 (Meets my goal)**
- **481 × 26 (Meets my goal)**
- 195 × 148 (Beyond my goal)
- 2,843 × 183 (Beyond my goal)
- 7,395 × 6,929 (Beyond my goal)
- 23,948 × 8,321 (Beyond my goal)
- 93,872 × 93,842 (Beyond my goal)

Questions 6 through 10 might be *possible* for some of my students, but I wouldn’t want to *require* students to get those questions correct. As a result, my passing rate on this assessment is only 50%. Shouldn’t I think that’s terrible? Isn’t 70% the magic number for passing? But given the assessment, I’m perfectly happy with saying 50% is passing. Expecting an arbitrary 70% on this assessment would mean expecting students to demonstrate proficiency above grade level. That’s not fair to my students.

Some of you might be thinking, “I would never give my students this assessment because questions 6 through 10 are a waste of time because they’re above grade level.” In that case, your assessment might look like this instead:

- 2 × 3
- 8 × 4
- 23 × 5
- 59 × 37
- 481 × 26

It hasn’t changed the expectation of what students have to do to demonstrate proficiency, and yet, to pass this assessment, I would expect students to earn a score of 100%, rather than 50%. Again, I would be unhappy with the arbitrary passing standard of 70%. That would mean it’s okay for students to miss questions that I think they should be able to answer. On this assessment, requiring a score of 100% makes sense because I would expect 5th graders to get all of these problems correct. If they don’t, then they aren’t meeting the goal I’ve set for them.

So why not just give the second assessment where students should all earn 100%? If that’s the expectation, then why bother with the extra questions?

This is exactly the issue going on with STAAR and its *perceived* low passing rate.

When you have an assessment where 100% of students can answer 100% of the questions correctly, all you learn is that everyone can get all the questions right. It masks the fact that some students actually know more than their peers. In terms of uncovering what our learners actually know, it’s just not very useful data.

More useful (and interesting) is an assessment where we can tell who knows more (or less) and *by how much*.

STAAR is designed to do this. The assessment is constructed in such a way that we can differentiate between learners to get a better sense of what they know relative to one another. In order to do this, however, it requires constructing an assessment similar to that 10-item multiplication assessment.

Just like how questions 1 through 5 on the multiplication assessment were aligned with the goal for multiplication in 5th grade, about half the questions on STAAR (16 or 17 questions, depending on the grade level) are aligned with Texas’ base level expectations of what students in 3rd, 4th, and 5th grade should be able to do. That half of the assessment is what we expect *all* of our students to answer correctly, just like we would expect all 5th graders to answer questions 1 through 5 correctly on the 10-item multiplication assessment.

So how do Texas students fare in reality? Here are the numbers of students at each grade level who answered at least half of the questions correctly on STAAR in spring 2018:

- Grade 3 – 77% passed with at least half the questions correct (299,275 students out of 386,467 total students)
- Grade 4 – 78% passed with at least half the questions correct (308,760 students out of 397,924 total students)
- Grade 5 – 84% passed with about half of the questions correct (337,891 students out of 400,664 total students)

Not bad! More than three quarters of the students at each grade level demonstrated that they can answer at least half of the questions correctly. These students are meeting, if not exceeding, the base level expectations of their respective grade levels. (Side note: Texas actually says students earning a 50% are Approaching grade level and a higher percentage is called Meets grade level. I’m not going to play with the semantics here. For all intents and purposes, earning a 50% means a student has passed regardless of what you want to call it.) But we’re left with some questions:

- How many of these roughly 300,000 students at each grade level performed just barely above the base level expectations?
- How deep is any given student’s understanding?
- How many of these students exhibited mastery of all the content assessed?

Good news! Because of how the assessment is designed, we have another set of 16 or 17 questions to help us differentiate further among the nearly 300,000 students at each grade level who passed. This other half of the questions on STAAR incrementally ramps up the difficulty beyond that base level of understanding. The more questions students get correct beyond that first half of the assessment, the better we’re able to distinguish not only who knows *more* but also by *how much*.

Since percents are relative and 70% is our culturally accepted passing standard, why isn’t the STAAR designed to use that passing standard instead? It would definitely remove the criticisms people have about how students in Texas pass with an F.

Here are two rough draft graphs I created to attempt to illustrate the issue. Both graphs represent the 3rd grade STAAR, which has a total of 32 questions. The top graph shows a hypothetical passing standard of 70%, and the bottom graph shows the actual passing standard of 50%.

The first graph represents a 3rd grade STAAR where 70% is designed to be the passing standard. This means 22 questions are needed to represent the base level of understanding (assuming this assessment also has a total of 32 items). Since we’re not changing the level of understanding required to pass, presumably 300,000 students would pass this version of the assessment as well. That leaves only 10 questions to help us differentiate among those 300,000 students who passed to see by how much they’ve exceeded the base level. That’s not a lot of wiggle room.

The second graph represents the current 3rd grade STAAR where 50% is designed to be the passing standard. This means 16 questions are needed to represent the base level of understanding, but now we have *another* 16 questions to help us differentiate among the 300,000 students who passed. Because there are a number of high performing students in our state, this still won’t let us differentiate completely, but there’s definitely more room for it with the 50% passing standard than the 70% passing standard.
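The question counts in both graphs fall out of simple arithmetic on a 32-item test. A small sketch (rounding to whole questions):

```python
# For a 32-question test, split the items into "base level" questions
# (at or below the passing standard) and the questions left over to
# differentiate among students who passed.
total_questions = 32

splits = {}
for passing_standard in (0.70, 0.50):
    base_level = round(total_questions * passing_standard)
    splits[passing_standard] = (base_level, total_questions - base_level)
    print(f"{passing_standard:.0%} standard: {base_level} base-level questions, "
          f"{total_questions - base_level} for differentiation")
```

With a 70% standard only 10 questions remain to differentiate among the students who passed; with a 50% standard there are 16.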

Some points I want to make clear at this point in the post:

- There are definitely issues with an assessment where half of it is *by design* more difficult than the expectations of the grade level. We have roughly a quarter of students in Texas who can’t even answer the base level questions correctly (half the assessment). Unfortunately, they’re subjected to the full assessment, and the base level questions are interspersed throughout. There are a lot of issues around working memory, motivation, and identity that could be considered and discussed here. That’s not what I’m trying to do in this post, however. As I mentioned in my previous post, regardless of how I feel, this is the reality for our teachers and students. I want to understand that reality as best I can because I still have to live and work in it. I can simultaneously try to effect changes around it, but at the end of the day my job requires supporting the teachers and students in my district with STAAR as it is currently designed.
- STAAR is trying to provide a mechanism for differentiating among students…in general. However, having analyzed this data at the campus level (and thanks to the expertise of my colleague Dr. David Osman), it’s clear that STAAR is too difficult for some campuses and too easy for others. In those extremes, it’s not helping those campuses differentiate very well because too many students are getting questions either wrong or right.
- This post is specifically about how STAAR is designed. I can’t make any claims about assessments in other states. However, I hope this post might inspire you to dig more deeply into how your state assessment is constructed.
- I’m not trying to claim that every assessment should be designed this way. I’m sharing what I learned specifically about how STAAR is designed. Teachers and school districts have to make their own decisions about how they want to design their unit assessments and benchmark assessments based around their own goals.

In my next post I’m going to dive into the ways I’ve been analyzing assessment data differently this past year. I wanted to write this post first because this information I learned from Dr. Barlow has completely re-framed how I think about STAAR. I no longer believe that the 50% passing rate is a sign that we have low expectations for Texas students. Rather, STAAR is an assessment that is designed to not only tell us who has met grade level expectations, but also *by how much* many of our students have exceeded them. With that in mind, we can start to look at our data in interesting and productive ways.

But what if I told you this well-intentioned practice may be sending us in unproductive directions? Rather than focusing on what our students really need, we may be spending time on topics and/or skills that are not the priority.

Let me illustrate what I mean with a story. I was working with a 4th grade team after a district benchmark we call STAAR Ready. Every spring in my district we give our students a released STAAR to gauge readiness for the actual STAAR coming up in May. Afterward, teams analyze the data to determine which topics to revisit and which students to put into intervention groups.

As I met with this 4th grade team, they showed me a list of the low-performing TEKS (Side note: this is what we call our standards in Texas – the Texas Essential Knowledge and Skills, TEKS for short) they had identified after analyzing the STAAR Ready data. One of the TEKS jumped out at me immediately because I was familiar with the test:

**TEKS 4.4A add and subtract whole numbers and decimals to the hundredths place using the standard algorithm;**

I asked them to tell me more, and the team told me they had identified students who performed poorly on the questions correlated to this standard. They created an intervention group with these students to work on adding and subtracting whole numbers and decimals to make sure they could do these computations accurately.

I followed up with a question, “Have you looked at the actual questions correlated to these TEKS?” Because they were looking at so much data and so many standards, they hadn’t gotten back into the test. Instead they’d just been identifying high-priority TEKS based on student performance on the questions.

I pulled up the test and showed them this question that had immediately come to mind when they told me they were making a group focused on TEKS 4.4A:

Take a moment and analyze the question.

- Can you see how it involves adding and/or subtracting with whole numbers and/or decimals?
- But what other skills are involved in answering this question correctly?
- What features of the problem might have made it more difficult for the students to answer correctly?

As it turns out, this was an incredibly difficult problem for students! When it was given to students on the actual STAAR in spring 2016, only 43% of students across the state of Texas were able to answer correctly. That means 57% of Texas 4th graders, or roughly 209,390 students, couldn’t find the total cost of three items in a shopping basket. That’s…concerning.

In my own school district, we used the 2016 released STAAR as our STAAR Ready in spring 2017. This allowed me to collect data Texas doesn’t make available to everyone. When we gave the test in spring 2017, the problem was nearly as difficult for our students. About 48% of students in my district answered it correctly. I was also able to determine this was the 6th most difficult item on the entire test of 48 questions!

What’s going on? A lot actually, for such a short question. For starters, key information is spread across two sentences. The first sentence of the problem indicates the quantities of items purchased – 1 hat and 2 skirts. The second sentence indicates their prices. This is subtle, but separating that information across two sentences upped the level of difficulty significantly for 9 and 10 year olds. Students who are not reading closely can quickly jump to the conclusion that they only need to add the two prices shown without realizing that one of those prices needs to be used twice.

The second feature of this problem that ups the difficulty is the fact that it is an open response question, not multiple choice. On this kind of question, a student’s answer has to be absolutely 100% accurate. If they’re off by even 1 penny, the answer is marked wrong. No pressure, kids!

I was curious which feature made the problem more difficult for the students in my district, so I dove into the data. One thing I had available that Texas doesn’t release is the actual answers every student submitted for this problem. I was able to analyze roughly 3,600 answers to see what students were doing. Here’s what I found out.

While only 48% of students got this question correct, there was a chunk of students whose answers were in the ballpark. These are kids who likely made a small calculation error. Unfortunately, if I calculate the percent of students who got it right or reasonably close, that only brings it up to 51% of our 4th graders. That’s not terribly impressive.

So what was everyone else doing? Here’s where it gets interesting. I predicted that these students only found the cost of 1 hat and 1 skirt, and it turns out that’s exactly what 33% of students in my district did. Nearly 1,200 students failed to comprehend that the total cost is composed of a hat, a skirt, and another skirt.
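Here’s a sketch of the kind of bucketing I did with those roughly 3,600 responses. The prices, tolerance, and sample answers are all invented for illustration (the actual item’s values aren’t reproduced here):

```python
from collections import Counter

# Invented prices for a "1 hat and 2 skirts" item; the real STAAR values differ.
hat, skirt = 12.75, 19.50
correct = hat + 2 * skirt      # intended total: uses the skirt price twice
one_of_each = hat + skirt      # predicted error: only 1 hat + 1 skirt

def classify(answer, tolerance=0.50):
    """Bucket a free-response answer: exact, near miss, one-of-each error, or other."""
    if abs(answer - correct) < 0.005:
        return "correct"
    if abs(answer - correct) <= tolerance:
        return "near miss"     # likely a small calculation error
    if abs(answer - one_of_each) <= tolerance:
        return "one of each"   # failed to use the skirt price twice
    return "other"

# A handful of invented student answers standing in for the real response data.
responses = [51.75, 51.25, 32.25, 19.50, 51.75]
buckets = Counter(classify(r) for r in responses)
print(buckets)
```

Tallying predicted-error buckets like this is what surfaced the 33% of students who only found the cost of one hat and one skirt.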

Going back to the team I was working with, I asked, “So now that we’ve analyzed this question, do you think the issue is that your students are struggling with adding and subtracting whole numbers and decimals?” We talked about it and they agreed that the bigger issue is how their students read and comprehend word problems.

Looking just at the standards is a very limiting view of analyzing data. There are often many different ways to assess a standard, and if we don’t take the time to look at the exact questions our students interact with, we might be missing critical information. Had this team done an intervention on pure addition and subtraction of whole numbers and decimals their kids would have gotten better at those skills for sure. But is that really what they needed?

Over the past year, I’ve been analyzing assessment data differently than in the past. In follow-up posts I’d like to share some of that with you. In the meantime, please dive into your assessments and analyze those questions, not just the standards. You’ll hopefully come away with a truer picture of what’s challenging your students so that you can more accurately determine what support they need and how to provide it.


I devoured it in a couple of days.

Since then I’ve purchased multiple copies for all 34 elementary campuses, led campus and district PD sessions on the critical learning phases, and led a book study with over a hundred math interventionists. The book is so eye opening because it makes tangible and explicit just how rigorous it is for young children to grapple with and learn counting concepts that are second nature to us as adults.

I was so excited for the opportunity to learn from Kathy Richardson in person this summer, and she didn’t disappoint. If you’d like to see what I learned from the institute, check out this collection of tweets I put together. It’s a gold mine, full of nuggets of wisdom. I’ll probably be referring back to it regularly going forward.

As happy as I am for the opportunity I had to learn with her, I also left the institute in a bit of a crisis. There is a HUGE disconnect between what her experience says students are ready to learn in grades K-2 and what our state standards expect students to learn in those grades. I’ve been trying to reconcile this disconnect ever since, and I can tell it’s not going to be easy. I wanted to share about it in this blog post, and I’ll also be thinking about it and talking to folks a lot about it throughout our next school year.

So what’s the disconnect?

Here’s a (very) basic K-2 trajectory laid out by Kathy Richardson:

**Kindergarten**

- Throughout the year, students learn to count increasingly larger collections of objects. Students might start the year counting collections less than 10 and end the year counting collections of 30 or more.
- Students work on learning that there are numbers within numbers. Depending on their readiness and the experiences they’re provided, they may get this insight in Kindergarten or they might not. If students don’t have this idea by the end of Kindergarten, it needs to be developed immediately in 1st grade because this is a necessary idea before students can start working on number relationships, addition, and subtraction.

**1st Grade**

- Students begin to develop an understanding of number relationships. After a year of work, Kathy Richardson says that typical 1st graders end the year having internalized number combinations for numbers up to around 6 or 7. For example, the number combinations for 6 are 1 & 5, 2 & 4, 3 & 3, 4 & 2, and 5 & 1. Students can solve addition and subtraction problems beyond this, but they will most likely be counting all or counting on to find these sums or differences rather than having internalized them.
- Students can just begin building the idea of unitizing as they work with teen numbers. Students can begin to see teen numbers as composed of 1 group of ten and some ones, extending the idea that teen numbers are composed of 10 and some more.

**2nd Grade**

- Students are finally ready to learn about place value, specifically unitizing groups of ten to make 2-digit numbers. Kathy Richardson says teachers should spend as much time as possible on 2-digit place value throughout 2nd grade.
- Students apply what they learn about place value to add and subtract 2-digit numbers. By the end of the year, students typically are at a point where they need to practice this skill – which needs to happen in 3rd grade. It is typically not mastered by the end of 2nd grade.

And here’s what’s expected by the Texas math standards:

**Kindergarten**

- Lots of number concepts within 20. Most of these aren’t too bad. The biggest offender that Kathy Richardson doesn’t think typical Kindergarten students are ready for is K.2I *compose and decompose numbers up to 10 with objects and pictures*. If students don’t yet grasp that there are numbers within numbers, then they are not ready for this standard.
- One way to tell if a student is ready is to ask them to change one number into another and see how they react. For example, put 5 cubes in front of a student and say, “Change this to 8 cubes.” If the student is able to add on more cubes to make it 8, then they demonstrate an understanding that there are numbers within numbers. If, on the other hand, the student removes all 5 cubes and counts out 8 more, or if the student just adds 8 more cubes to the pile of 5, then they do not yet see that there are numbers within numbers.
- My biggest revelation with the Kindergarten standards is that students are going to be all over the map regarding what they’re ready to learn and what they actually learn during the year. Age is a huge factor at the primary grades. A Kindergarten student with a birthday in September is going to be in a much different place than a Kindergarten student with a birthday in May. It’s only a difference of 8 months, but when you’ve only been alive 60 months and you’re going through a period of life involving lots of growth and development, that difference matters. It makes me want to gather some data on what our Kindergarten students truly understand at the end of Kindergarten compared to what our standards expect them to learn.
**1st Grade**

- Our standards want students to do a lot of adding and subtracting within 20. Kathy Richardson believes this is possible. Students can get answers to addition and subtraction problems within 20, but this doesn’t tell us what they understand about number relationships. If we have students adding and subtracting before they understand that there are numbers within numbers, then it’s likely to be just a counting exercise to them. These students are not going to be anywhere near ready to develop strategies related to addition and subtraction. And then there’s that typical threshold where most 1st graders don’t internalize number combinations past 6 or 7. So despite working on combinations to 20 all year, many students aren’t even internalizing combinations for half the numbers required by the standards.
- The bigger issue is place value. The 1st grade standards require students to learn 2-digit place value, something Kathy Richardson says students aren’t really ready for until 2nd grade. And yet our standards want students to:
- compose and decompose numbers to 120 in more than one way as so many hundreds, so many tens, and so many ones;
- use objects, pictures, and expanded and standard forms to represent numbers up to 120;
- generate a number that is greater than or less than a given whole number up to 120;
- use place value to compare whole numbers up to 120 using comparing language; and
- order whole numbers up to 120 using place value and open number lines.

- I’m at a loss for how to reconcile her experience that students in 1st grade are ready to start putting their toes into the water of unitizing as they work with teen numbers and our Texas standards that expect not only facility with 2-digit place value but also numbers up to 120.

**2nd Grade**

- And then there’s second grade, where students have to do all of the same things they did in 1st grade, but now with numbers up to *1,200*! Thankfully 2-digit addition and subtraction isn’t introduced until 2nd grade, which is where Kathy Richardson said students should work on it, but they also have to add and subtract 3-digit numbers according to our standards. Kathy Richardson brought up numerous times how 2nd grade is the year students are ready to *begin* learning about place value with 2-digit numbers, and she kept emphasizing that she felt like as much of the year as possible should be spent on 2-digit place value. If the disconnect in 1st grade was difficult to reconcile, the disconnect in 2nd grade feels downright impossible to bridge.

I’m very conflicted right now. I’ve got two very different trajectories in front of me. One is based on years upon years of experience of a woman working with actual young children and the other is based on a set of standards created by committee to create a direct path from Kindergarten to College and Career Ready. Why are they so different, especially the pacing of what students are expected to learn each year? It’s one thing to demand high expectations and it’s another to provide reasonable expectations.

And what do these different trajectories imply about what it means to learn mathematics? Kathy Richardson is all about insight and understanding. Students are not ready to see…until they are. “We’re not in control of student learning. All we can do is stimulate learning.”

Our standards on the other hand are all about getting answers and going at a pace that is likely too fast for many of our students. We end up with classrooms where many students are just imitating procedures or saying words they do not really understand. How long before these students find themselves in intervention? We blame the students (and they likely blame themselves) and put the burden on teachers down the road to try to build the foundation because we never gave it the time it deserved.

But how to provide that time? That’s the question I need to explore going forward. If you were hoping for any answers in this post, I don’t have them. Rather, if you have any advice or insights, I’d love to hear them, and if I learn anything interesting along the way, I’ll be sure to share on my blog.
