Hello James,
Stem-and-leaf plots are used to provide visually powerful representations of value frequencies in a collection of data, most often as a supplement to some other form of data reporting. I have known many people who found themselves confused by stem-and-leaf plots, but once you understand how they work, they are very useful. The most important thing to keep in mind is that their usefulness is restricted to situations where the frequency of occurrence of values in the data set is relevant and/or important.
Say you're teaching a class, and you administer a test. Afterward, you want to analyze the performance of your students to judge the difficulty of the test, in case it warrants a curve. The grades on the test are as follows (we're assuming no partial points here):
75, 35, 60, 85, 83, 64, 99, 45, 79, 81, 82, 50, 89, 67, 91, 94, 97, 87, 86, 71, 75, 54, 69
You could base your analysis on the common summary statistics (mean, median, and mode), but in a case like this, the mean (about 74.7, which rounds to 75) is slightly skewed by the two outlying and extremely low grades, the median (79) would give a fair representation but fail to show an important detail (see below), and the mode (75) simply wouldn't be representative at all. You could also plot the data in a histogram, but that would be unduly time-consuming and complex. The stem-and-leaf plot (at least in cases such as this) corrects for all of these shortcomings.
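If you'd like to check those three figures yourself, here is a minimal sketch using Python's standard `statistics` module. The variable names are my own; the grades are the ones listed above.

```python
import statistics

grades = [75, 35, 60, 85, 83, 64, 99, 45, 79, 81, 82, 50,
          89, 67, 91, 94, 97, 87, 86, 71, 75, 54, 69]

mean_grade = statistics.mean(grades)      # sum of the grades divided by how many there are
median_grade = statistics.median(grades)  # middle value once the grades are sorted
mode_grade = statistics.mode(grades)      # the value that occurs most often

print(f"mean   = {mean_grade:.2f}")
print(f"median = {median_grade}")
print(f"mode   = {mode_grade}")
```

Notice that none of the three numbers tells you how the grades are spread out, which is the gap the stem-and-leaf plot fills.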
For our data the resulting stem-and-leaf plot looks like this:
stem | leaf
3 | 5
4 | 5
5 | 0 4
6 | 0 4 7 9
7 | 1 5 5 9
8 | 1 2 3 5 6 7 9
9 | 1 4 7 9
key: 7 | 5 means 75
[Herein lies the power of the stem-and-leaf: Just by looking at the plot we can be certain that the test in question was not "too hard", as most of the grades hover around the 'C' mark. We can also see (what the median would fail to show) that the highest concentration is in the 'B' range. If anything, it might make sense to make the test a little more difficult.]
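The "concentration" you read off the plot is really just a count of leaves per stem. As a rough sketch (the names here are my own, not any standard convention), you could tally those counts in Python like this:

```python
from collections import Counter

grades = [75, 35, 60, 85, 83, 64, 99, 45, 79, 81, 82, 50,
          89, 67, 91, 94, 97, 87, 86, 71, 75, 54, 69]

# Count how many grades share each 'tens' digit (that is, each stem).
counts = Counter(grade // 10 for grade in grades)

for stem in sorted(counts):
    # One '*' per grade, so each row is as long as the leaf row in the plot.
    print(f"{stem}0s | {'*' * counts[stem]} ({counts[stem]})")

# The stem with the most leaves, and how many leaves it has.
busiest_stem, busiest_count = counts.most_common(1)[0]
print(f"Most grades fall in the {busiest_stem}0s ({busiest_count} of {len(grades)}).")
```

The stem-and-leaf plot gives you the same row lengths at a glance, while still showing every individual grade.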
If you're having trouble constructing a stem-and-leaf plot from a data set, do the following:
- Take all the data and arrange them in ascending order (lowest to highest)
- Split each two-digit number into the digit in the 'tens' position and the digit in the 'ones' position
- Populate the stem column with the 'tens', starting with the first relevant 'ten' (in our case 3, from 35; since there were no grades below 35 we need not include 0-2 in the stem)
- Populate the leaf column with the 'ones' belonging to each 'ten', again in ascending order, making certain to include every repeated value. (I have found this last point tends to be a stumbling block for many.)
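The four steps above can also be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `stem_and_leaf` is made up, it expects a non-empty list of non-negative whole numbers, and it prints an empty row for any stem between the lowest and highest that has no leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot as a list of strings."""
    rows = defaultdict(list)
    for value in sorted(values):        # step 1: lowest to highest
        stem, leaf = divmod(value, 10)  # step 2: split 'tens' from 'ones'
        rows[stem].append(leaf)         # step 4: every repeat gets its own leaf
    lines = []
    # Step 3: stems start at the first 'ten' that actually occurs.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

grades = [75, 35, 60, 85, 83, 64, 99, 45, 79, 81, 82, 50,
          89, 67, 91, 94, 97, 87, 86, 71, 75, 54, 69]

plot = stem_and_leaf(grades)
print("\n".join(plot))
```

Because the data are sorted before they are split, the leaves in each row come out in ascending order automatically, and the two 75s both show up as leaves on the 7 stem.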
Though you can perform most of these operations in your mind, if you don't feel comfortable doing so you can process the data on paper (until you do feel comfortable) like this:
First, put the data in ascending order, in this case:
75, 35, 60, 85, 83, 64, 99, 45, 79, 81, 82, 50, 89, 67, 91, 94, 97, 87, 86, 71, 75, 54, 69
turns into:
35, 45, 50, 54, 60, 64, 67, 69, 71, 75, 75, 79, 81, 82, 83, 85, 86, 87, 89, 91, 94, 97, 99
Then group the data with regard to the digits in the 'tens' position:
[35] [45] [50, 54] [60, 64, 67, 69] [71, 75, 75, 79] [81, 82, 83, 85, 86, 87, 89] [91, 94, 97, 99]
Then split each number into its respective 'tens' and 'ones' digits:
[3|5] [4|5] [5|0, 5|4] [6|0, 6|4, 6|7, 6|9] [7|1, 7|5, 7|5, 7|9] [8|1, 8|2, 8|3, 8|5, 8|6, 8|7, 8|9] [9|1, 9|4, 9|7, 9|9]
Now cross out the extraneous (repeated) 'tens' values, keeping only the first one in each group:
[3|5] [4|5] [5|0, |4] [6|0, |4, |7, |9] [7|1, |5, |5, |9] [8|1, |2, |3, |5, |6, |7, |9] [9|1, |4, |7, |9]
Then create your stem from the remaining 'tens' values:
3, 4, 5, 6, 7, 8, 9
And populate the leaf for each stem with its respective 'ones' values:
[5] [5] [0, 4] [0, 4, 7, 9] [1, 5, 5, 9] [1, 2, 3, 5, 6, 7, 9] [1, 4, 7, 9]
And poof, now Bob's yer uncle.
A few notes:
- It is technically correct to provide a key at the bottom of the stem-and-leaf, as in the initial presentation above. However, this is generally considered necessary only when the data has some ambiguity which might lead to its being misread or misinterpreted, say when you are plotting decimals, or plotting 'hundreds' in the stem and 'tens' in the leaf because all of the 'ones' positions are populated by zeroes.
- When using a stem-and-leaf to plot decimals, you simply make the decimal point the point of division (when reasonable, you round to the nearest tenth to make life easier).
- You can also plot two leaves off of the same stem, which can allow for interesting correlational analysis.
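To make the last two notes a bit more concrete, here is a rough Python sketch. Everything in it is my own illustration: the helper names `split_decimal` and `back_to_back` are made up, the sample numbers are invented purely for demonstration, and the layout (left-hand leaves growing outward from a shared stem) is one common convention rather than the only one.

```python
def split_decimal(value):
    """Round to the nearest tenth, then split at the decimal point."""
    tenths = round(value * 10)   # e.g. 2.48 becomes 25 tenths
    return divmod(tenths, 10)    # (whole-number stem, tenths leaf)

def leaves_by_stem(values):
    """Map each 'tens' stem to its sorted 'ones' leaves (as strings)."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows.setdefault(stem, []).append(str(leaf))
    return rows

def back_to_back(left, right):
    """Rows for two non-empty sets of whole numbers sharing one stem column."""
    l_rows, r_rows = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(l_rows), min(r_rows)),
                  max(max(l_rows), max(r_rows)) + 1)
    # Pad the left column so the stems line up.
    width = max(len(" ".join(leaves)) for leaves in l_rows.values())
    lines = []
    for stem in stems:
        # Left-hand leaves are reversed so they grow outward from the stem.
        left_part = " ".join(reversed(l_rows.get(stem, [])))
        right_part = " ".join(r_rows.get(stem, []))
        lines.append(f"{left_part:>{width}} | {stem} | {right_part}".rstrip())
    return lines

# Invented sample data, for illustration only.
print(split_decimal(2.48))
print("\n".join(back_to_back([71, 75, 82], [68, 74, 79, 85])))
```

With decimals split this way, a key such as "2 | 5 means 2.5" is exactly the kind of clarification the first note is talking about.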
I hope I have answered your question completely.
-Dennis