
How to make a box plot to graphically depict quartile values in a data set.

The general form for a box-and-whisker plot is really easy. Let's take a simple data set.
 
8.2, 15.9, 12.8, 7.4, 24.7, 23.2, 9.6, 7.9, 8.3, 10.2
 
First, we need to take those data and put them in numerical order.  When we do that, this is what the data set looks like:

7.4, 7.9, 8.2, 8.3, 9.6, 10.2, 12.8, 15.9, 23.2, 24.7
 
[Note: Most spreadsheet and statistical analysis programs will accept the data in any order. Ordering the data is only necessary when doing this process by hand.]
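If you are working in a program instead of on paper, the ordering step is a single call. Here is a minimal Python sketch; the variable names `data` and `ordered` are just for illustration.

```python
# The ten values in the order they were originally listed.
data = [8.2, 15.9, 12.8, 7.4, 24.7, 23.2, 9.6, 7.9, 8.3, 10.2]

# sorted() returns a new list in ascending order and leaves `data` unchanged.
ordered = sorted(data)
print(ordered)
# [7.4, 7.9, 8.2, 8.3, 9.6, 10.2, 12.8, 15.9, 23.2, 24.7]
```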

Once ordered, we need to find the median of the set. The median means the "middle" value. In this case, the set has 10 values, so there's no singular "middle" value of the set when ordered least to greatest. To create one, we'll take the two middle values and average them.

(9.6 + 10.2)/2 = 9.9

[The only reason we took an average is that there is no "middle" value in a set with an even number of values. If the set had 5 values, for example, the median would be the third value.]

The median of the set is 9.9; this value is also called the 50th percentile.
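The odd/even rule for the median fits in a few lines of code. This is a minimal Python sketch; `median` is a hypothetical helper written for this example (Python's standard `statistics` module also has a `median` function).

```python
def median(values):
    """Return the middle value, averaging the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value (the 3rd of 5, for example).
        return ordered[mid]
    # Even count: average the two values on either side of the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [8.2, 15.9, 12.8, 7.4, 24.7, 23.2, 9.6, 7.9, 8.3, 10.2]
print(round(median(data), 2))              # 9.9
print(median([7.4, 7.9, 8.2, 8.3, 9.6]))   # 8.2
```

The `round` call only tidies up tiny floating-point noise that can appear when the computer adds 9.6 and 10.2.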

Next, we're going to divide the set into two halves: upper and lower. Names will vary a bit (you may see "lower and higher," "lesser and greater," or something similar), so remember to be flexible. The lower set is all the values less than the median, and the upper set is all the values greater than the median. We want to find the median of each half.

Lower: 7.4, 7.9, 8.2, 8.3, 9.6

Upper: 10.2, 12.8, 15.9, 23.2, 24.7

The median of the lower subset is called the 25th percentile, and the median of the upper subset is called the 75th percentile. Again, the median is the "middle" value: in a data set of five values, it is the third value when the values are ordered by size. Here, the median of the lower subset is 8.2 and the median of the upper subset is 15.9.
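The split-in-half method can be sketched in Python as follows. `quartiles` is a hypothetical helper, and one assumption is worth stating: when the count is odd, the sketch leaves the overall median out of both halves, which matches the "less than" and "greater than" wording above.

```python
def median(values):
    """Middle value of a list; average of the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (25th, 50th, 75th percentile) by taking the median of each half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median
    # With an odd count, skip the middle value so it belongs to neither half.
    upper = ordered[half + n % 2:]    # values above the median
    return median(lower), median(ordered), median(upper)

data = [8.2, 15.9, 12.8, 7.4, 24.7, 23.2, 9.6, 7.9, 8.3, 10.2]
q1, q2, q3 = quartiles(data)
print(q1, round(q2, 2), q3)  # 8.2 9.9 15.9
```

Software does not always use this method. Python's `statistics.quantiles(data, n=4)`, for example, interpolates between neighboring values by default and gives 8.125 and 17.725 (up to floating-point rounding) for the 25th and 75th percentiles of this data set, so check which method your class expects.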

At this point, get out a sheet of graph paper and set it up with a scale appropriate to the data set. Because box plots are usually used for categorical or nominal independent variables, you only need to set up one axis. For our data set, which runs from 7.4 to 24.7, a scale of 0 to 30 would be appropriate.

So far, we've identified three values: the 25th, 50th, and 75th percentiles (8.2, 9.9, and 15.9, respectively). Mark these three points beside the labeled axis (this is a one-dimensional graph, so data should be marked next to the axis, not directly on it). Next, draw a rectangle that runs from the 25th to the 50th percentile (8.2 to 9.9) and another rectangle that runs from the 50th to the 75th percentile (9.9 to 15.9). These rectangles will naturally share a common side. The ranges of values inside these boxes are the second and third quartiles, respectively. Together, the two boxes span the interquartile range, which runs from 8.2 to 15.9 and is often reported as a single number: 15.9 - 8.2 = 7.7.
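Because the boxes sit on a number line, their sizes are just differences. This short Python sketch uses the walkthrough's percentiles (the variable names are illustrative) to show that the two box widths add up to the interquartile range.

```python
# Percentiles found in the walkthrough.
q1, q2, q3 = 8.2, 9.9, 15.9

lower_box = q2 - q1   # width of the 25th-to-50th percentile box
upper_box = q3 - q2   # width of the 50th-to-75th percentile box
iqr = q3 - q1         # both boxes together: the interquartile range

print(round(lower_box, 2), round(upper_box, 2), round(iqr, 2))  # 1.7 6.0 7.7
```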

At this point you've made the "box" portion of your box-and-whiskers plot. Make a dash-mark at the lowest and highest values in the data set: 7.4 and 24.7. Connect each dash to the boxes with a line. These lines are called the whiskers, and they mark the ranges of the 1st and 4th quartiles.
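To see how all five marks line up on the 0 to 30 scale, here is a rough text-only sketch in Python. `text_box_plot` is a hypothetical helper, not a library function, and each character column stands for 0.5 units, so positions are approximate. Vertical bars mark the two end dashes and the three box edges, `=` fills the boxes, and `-` draws the whiskers.

```python
def text_box_plot(minimum, q1, q2, q3, maximum, lo=0, hi=30, width=61):
    """Return a text box plot (plot row plus a numeric scale) on a lo..hi axis."""
    def col(x):
        # Map a data value to a character column; with the defaults, one column = 0.5 units.
        return round((x - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        row[c] = "-"                # whiskers: lowest value to highest value
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                # the two boxes: 25th to 75th percentile
    for v in (minimum, q1, q2, q3, maximum):
        row[col(v)] = "|"           # end dashes and box edges

    scale = [" "] * (width + 2)     # extra room for the last label
    for tick in range(lo, hi + 1, 5):
        label = str(tick)
        scale[col(tick):col(tick) + len(label)] = label
    return "".join(row).rstrip() + "\n" + "".join(scale).rstrip()

print(text_box_plot(7.4, 8.2, 9.9, 15.9, 24.7))
```

At this resolution the lower whisker (7.4 to 8.2) is so short that its end dash sits right beside the box, while the upper whisker stretches all the way to 24.7.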

For the purposes of most HS and intro college classes, you're done. However, there are some advanced techniques you should be aware of.
  • Some graphs calculate the mean (average) of the data set and mark that point with an ×. The mean usually falls within the interquartile range, but not always.
  • Some graphs run the "whiskers" not to the lowest and highest values, but to some other values. Data points outside the whiskers are then marked with dots and considered outliers, meaning values not representative of the rest of the data set. In this case, the description of the graph should specify what values the whiskers indicate. For the purposes of an exam, always run the whiskers to the lowest and highest values in the data set unless you're told otherwise.
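Both advanced features can be computed directly. The Python sketch below first finds the mean for the × mark, then applies one common whisker rule, Tukey's 1.5 × IQR fences: each whisker stops at the most extreme value within 1.5 IQRs of its box, and anything farther out is an outlier. That rule is an assumption here; it is only one of the "other values" a graph might use. `fence_whiskers` and `k` are illustrative names.

```python
data = [8.2, 15.9, 12.8, 7.4, 24.7, 23.2, 9.6, 7.9, 8.3, 10.2]
q1, q3 = 8.2, 15.9   # 25th and 75th percentiles from the walkthrough

# Optional x mark: the mean, and whether it lands inside the boxes.
mean = sum(data) / len(data)
print(round(mean, 2), q1 <= mean <= q3)   # 12.82 True

def fence_whiskers(values, q1, q3, k=1.5):
    """Tukey-style whiskers: return (low end, high end, outliers) using q1 - k*IQR and q3 + k*IQR fences."""
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    # Each whisker stops at the most extreme value still inside its fence.
    return min(inside), max(inside), outliers

print(fence_whiskers(data, q1, q3))   # (7.4, 24.7, [])
```

For this data set the mean sits inside the boxes, though well above the median of 9.9 because the two large values, 23.2 and 24.7, pull it upward. The fences fall at about -3.35 and 27.45, so no value is flagged and the whiskers still reach 7.4 and 24.7.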

You can see an example of a box and whiskers plot with all of these features labeled here.

Not shown are outlier values (dots, as previously mentioned) or the axis, which would typically be located to the left of the plot. Keep in mind that there is only one axis because box plots show data ranges within distinct categories. The quartiles are also not properly marked. Remember that the box plot indicates four ranges: from the lowest value to the 25th percentile, from the 25th to the 50th percentile, from the 50th to the 75th percentile, and from the 75th percentile to the highest value. The 1st through 4th quartiles refer to these four ranges in order.
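Listing the four ranges for our data set makes the pattern easier to read. This Python sketch pairs each quartile label with consecutive values from the five-number summary (the names `summary`, `labels`, and `ranges` are illustrative).

```python
# Five-number summary: lowest value, 25th, 50th, 75th percentiles, highest value.
summary = [7.4, 8.2, 9.9, 15.9, 24.7]
labels = ["1st quartile", "2nd quartile", "3rd quartile", "4th quartile"]

# Pair each label with consecutive boundaries: (7.4, 8.2), (8.2, 9.9), ...
ranges = list(zip(labels, summary, summary[1:]))
for label, start, end in ranges:
    print(f"{label}: {start} to {end} (width {end - start:.1f})")
```

The widths come out as 0.8, 1.7, 6.0, and 8.8. Each range holds roughly a quarter of the data, so the narrow 1st and 2nd quartiles show that the smaller values are bunched together, while the wide 4th quartile shows that the largest values are spread out.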


Martin S.

Academic Coaching from a Certified HS Teacher
