Part 2: 3 Small Tweaks to Set Students Up for More Success with Data in the New Year
Yeah, the next small adjustment to your teaching with data for the new year is here!
As a reminder, we want to share three small tweaks that you can use NEXT WEEK regardless of which curriculum you are using AND regardless of how your students are graphing (as a note, see the “How Are/Should We Make the Graph?” blog post on ways to graph and the “Benefits & Limitations of Different Graphing Tools” blog post for our perspective on graphing tools).
In this series we are discussing three tweaks and what each could look like in your classroom. To see the first tweak, check out “Tweak #1: Move away from one-and-done graphing” in our 12/1/22 blog post.
Tweak #2: Don’t average your data BEFORE you graph it
So often we ask kids to take 3–5 measurements when they are collecting data (here Sponge Number 13 —>).
Why? Because replicates!
That is HUGELY important and a key part of doing science. We recognize that the more data values we have (aka the larger our sample size), the better.
Whoot! Whoot!
But then a funny thing happens on the way to graphing our hard-collected data...
So often our students take all of those beautiful replicates and AVERAGE them (<— and make a bar chart of the averages)!
Dun, Dun, Dun.
What?! We just averaged away the nuance and richness of the dataset. Students lose the ability to understand what is in the dataset before they even have a chance to look at the data.
We spend SO much precious time having students collect those data, but then we gain none of the benefits of that data when it comes time for them to make sense of it. Let’s explore why this is an issue with another example…
In this example we are going to pretend that we have gone on a road trip to Yellowstone National Park. But despite our best efforts we arrived just after Old Faithful erupted. Ugh!
Fortunately there is a helpful Park Ranger who shares some data with us so we can figure out when the next eruption may be.
Great, data! But wow this is a weird looking data table.
A bit of orientation: each row is a day, and each value in that row is the number of minutes between eruptions that day.
The Park Ranger reminds us that Old Faithful has its name because…it is very faithful. So although these data are from 1990, we can still use them to figure out “When will the next eruption be?”
Ok, so how could we use these data to answer our question?
>>> PAUSE to consider <<<
Or, maybe we take an average of all the values (either of the whole dataset, or an average of the daily averages; more on that below), which gives us an average of 72 minutes.
So, maybe we should head to another part of the park and come back in 70 minutes to see the next eruption…
But when we plot all the eruption time data we can quickly see that Old Faithful is bimodal. There is one set of eruptions around 50 minutes and another set of eruptions around 78 minutes.
So, if we aim for the average we will either miss the first set (which would be a bummer) or have to wait around for the second set (not the worst, but it takes time away from something else).
**Note, the fact that Old Faithful is bimodal is interesting geologically. There are activities about predicting timing here: https://www.nps.gov/yell/learn/photosmultimedia/indepthpredictingoldfaithful.htm.
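If you (or students who code) want to play with this idea, here is a minimal Python sketch. The interval values below are made up to mimic Old Faithful’s bimodal pattern; they are NOT the Ranger’s actual 1990 data. The point is just to show that the mean lands between the two clusters, right where eruptions rarely happen:

```python
from collections import Counter

# Illustrative eruption intervals (minutes): invented numbers that
# mimic Old Faithful's bimodal shape, not real measurements.
short_intervals = [47, 49, 50, 50, 51, 53]                      # cluster near ~50 min
long_intervals = [75, 76, 77, 77, 78, 78, 78, 79, 79, 80, 80, 81]  # cluster near ~78 min
intervals = short_intervals + long_intervals

# The mean sits between the clusters, where almost no eruptions occur.
mean_interval = sum(intervals) / len(intervals)
print(f"mean interval: {mean_interval:.1f} min")

# A quick text "dot plot": bin the values into 5-minute bins
# so the two clusters become visible.
bins = Counter((v // 5) * 5 for v in intervals)
for start in sorted(bins):
    print(f"{start:>3}-{start + 4} min | {'*' * bins[start]}")
```

Planning your visit around that mean would have you showing up in the gap between the two clusters, which is exactly the trap the Park Ranger’s table helps us avoid once we plot it.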
In this example, our use of the average being off isn’t the end of the world…but it highlights why taking an average of our data BEFORE we visually look at our data can be problematic.
Here are some of the hiccups…

Just looking at the average of Old Faithful eruptions means that we miss the nuance and richness of the dataset: it erupts around two time frames, neither of which is the average. This is interesting to think about (not to mention it could influence your planning at the park ;)) and could lead to lots of questions and things to explore within and from the data. In other words, by just looking at the average we prematurely cut the data exploration, and the learning about the phenomenon, short.

An average only provides an accurate summary of a variable if the values of that variable are normally distributed around one mode (aka in a “bell curve”). That is why, technically, before we take an average we are supposed to check for normality. Now, we DO NOT need to teach our students how to calculate this. BUT we can teach them to plot the data and visually inspect it. As soon as we plotted all the eruption times for Old Faithful, it quickly became apparent that the eruption times are not normally distributed. Great, then we should use the mode (or in this case modes) to summarize the data instead of the mean (average). Here the modes are the more accurate summary, while the mean is misleading. Also, this is a great way to help students better understand, in application, why we learn “median, mode, and mean” in math classes :)
Why did I want to share this example?
Because think about how often it is that the FIRST thing that we do with lots of data is take the average.
SO. MANY. TIMES.
Ok, but why all the capitalized words and periods? Well, because as we have explored with this example of Old Faithful, 1) we take away the richness of a dataset if we average all the data away, which limits the kinds of questions and exploration our students can do right out of the gate with data, and 2) it is actually misleading to take an average when the data values are not normally distributed (so we are actually teaching our students bad habits).
As in, we expect students to gain an understanding of the process of working with data and the process of doing science, but we are not actually setting them up for success when we average the data away without letting them look at all the data they have, or without having them consider the values before making calculations from the data.
Now, I am NOT suggesting that your students need to dive into calculating tests for normality.
But, I AM suggesting that your students could graph the data values for each variable individually (as a note, this skill and these dot/line plots are a Common Core 6th-grade math standard), talk about it with a partner, and then decide whether they should calculate an average to summarize the data and help make sense of it. Even those two small extra steps, 1) talking to someone else while you are in the middle of working with the data, and 2) reviewing the data values before taking an average, would be HUGE in terms of helping students understand far better what goes into working with data and doing science.
So my next challenge to you (should you choose to accept it :)) is: Where next week/month/calendar year can you have your students visualize their data values BEFORE they calculate an average to explore it?
Share your thoughts, comments, wins, and flops! We would love to hear.