# Central Tendency – Which Measure is Best?

### Written by tutor Yvonne W.

To understand a set of data, it is helpful to organize it and provide summary
descriptions of the set. Central tendency measures are used to describe the
middle value of a data set. There are (at least) three different ways to describe
the middle value: mean, median and mode. Which method you use depends on
the characteristics of the data set and how you plan to use the information. Let us
explore this a bit more. Before we get started, please refer to Table 1 for a review
of the definitions for mean, median and mode.

Table 1 : Review of Central Tendency Measures

Data set: weight in lbs. of bags of sand: 10, 8, 4, 9, 5, 5, 10, 3, 10, 2, 5

| Measurement | Definition | Example |
| --- | --- | --- |
| Mean | Sum of values / total number of items in the data set | 71/11 = 6.4545 lbs. |
| Median | The middle value in a set when the numbers are arranged in order. If the set has an even number of items, then the median is equal to the mean of the middle two items. | 2, 3, 4, 5, 5, **5**, 8, 9, 10, 10, 10, so the median is 5 lbs. |
| Mode | The most frequently occurring value in the set. It is possible to have no mode, one mode, or more than one mode. | 5 lbs. and 10 lbs. |
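As a quick check of the definitions in Table 1, the three measures can be computed with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative and not part of the original lesson.

```python
from statistics import mean, median, multimode

# Weights in lbs. of the bags of sand from Table 1
weights = [10, 8, 4, 9, 5, 5, 10, 3, 10, 2, 5]

avg = mean(weights)                 # sum of values / number of values = 71 / 11
mid = median(weights)               # middle value once the list is sorted
modes = sorted(multimode(weights))  # every value tied for most frequent

print(f"mean   = {avg:.4f} lbs.")
print(f"median = {mid} lbs.")
print(f"modes  = {modes}")
```

`multimode` is used rather than `mode` because this data set has two values tied for most frequent.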

### Criterion 1: Choice driven by intended use of measurement

Let’s consider a candy shop that sells mints, chocolate and taffy. In this case, what you
want to know about your sales will influence the measure you select to describe
your data. For example, you would use the mode if you wanted to know the most
popular item sold. The mode is generally used to describe the most common or
most popular item in the data set. You would also choose this measure if you
wanted to know the maximum number of customers waiting for service per day or
the day of the month on which the most product went stale.

The mean would be selected if you wanted to know how much money your shop
collected per customer this week. The mean is used when you want to know the
average value in a set of values. This number is the single value that produces
the lowest overall error (the smallest sum of squared differences) against all the
values in the data set each time you take the measure, run the test or ask the
question. Other examples for using the mean
include the average number of boxes of chocolate sold each year around Valentine’s
Day or the usual number of hours an employee works in the month of December.
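The claim that the mean produces the lowest error can be illustrated numerically. The sketch below assumes that "error" means the sum of squared differences between a single guess and every value, which is the usual sense in which the mean is optimal. The helper `sum_squared_error` is a hypothetical name, and the data is the bag-of-sand set from Table 1.

```python
from statistics import mean

def sum_squared_error(values, guess):
    """Add up the squared distance between one guess and every value."""
    return sum((v - guess) ** 2 for v in values)

# Weights in lbs. of the bags of sand from Table 1
weights = [10, 8, 4, 9, 5, 5, 10, 3, 10, 2, 5]
avg = mean(weights)

# Compare the mean against a few other candidate "typical" values
for guess in (5, 6, avg, 7, 10):
    error = sum_squared_error(weights, guess)
    print(f"guess {guess:7.4f} -> squared error {error:.2f}")
```

Under this assumption, no other guess gives a smaller total than the mean does.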

If you wanted to describe how much money a typical customer spent at the
candy shop, you would use the median. The median is chosen when you want to be sure
that the number represents the midpoint in a list of values. This measure is used
often in survey research. Let’s say that you run a customer satisfaction survey to
determine how successful you are in generating repeat business. You already know
of 10 for total satisfaction. If you surveyed 10 customers, you would want to be
sure that at least 50% of your customers gave you a rating of 4 or higher. To be
sure that you have an accurate picture of customers’ opinions, you would want to
know the median satisfaction rate of the customers surveyed.

### Criterion 2: Choice driven by characteristics of the data collected

Characteristics of the data being measured will sometimes drive your choice of
measurement. These characteristics are summarized in Table 2 below.

Table 2 : Data Characteristics

| Category | Characteristic | Definition | Examples |
| --- | --- | --- | --- |
| Numerical values | Discrete | Describes data with a finite set of values or data which can only take certain values. This is data which can be counted. | Number of coins in a purse (you can’t have half a coin); number of customers in a store (can’t have half a person); values of 0, 1, 2, 3, 4, 5 for a satisfaction survey (finite set of values); number of putts it takes to score a hole-in-one (the possible values are infinite, but you can only putt 1 time or 2 times, etc.) |
| Numerical values | Continuous | Describes any data that can take on any value in a range. This is data that is measured. | Time to drive home; height of a tree; a person’s weight |
| Sample distribution | Normal spread | Data is evenly (or almost evenly) distributed about a central value. Distribution looks like a bell curve. | Height of people; SAT scores; annual average temperatures |
| Sample distribution | Skewed spread | Data has more higher values than lower values or vice versa, more lower values than higher values. Or the data can be mixed up. | Employees’ salaries; opinion surveys; age distribution of respondents to a day-time residential phone survey; home values |
| Data type | Nominal | Categorical data | Eye color (brown, blue, green, etc.); tastes (bitter, sweet, sour, salty, etc.) |
| Data type | Ordinal | Data where order has meaning, but the interval doesn’t matter | Finish positions in a race (doesn’t matter if you are first “by a nose” or “distanced”); your position in a line (wait time for each person can vary) |
| Data type | Interval | Ordinal data where the intervals between each value are equally split; zero does not mean none | Temperature (zero does not mean no temperature); longitude and latitude |
| Data type | Ratio | Interval data with zero meaning none | Time; weight; distance |

## Mode

Mode is best used with categorical (nominal) or discrete data. It is
difficult to use it with continuous data because often a single value is not repeated
exactly. There often are one or two distinct favorites in categorical or discrete
data. The drawback of the mode is that it may not measure centrality at all if the most
common value lies far from the rest of the data set.

## Mean

The mean is most often chosen when the data is continuous and
symmetrical (normal). If the data has outliers or is skewed, then the mean is pulled
toward the extreme values and gives a distorted view of centrality. The mean
should be used carefully with ordinal data.
For instance, the mean placement of all the runners in an eight-person race will
always be 4.5, so it does not deliver meaningful information. The mean
is best used with interval data or ratio data. It is chosen when it is important to
reduce the amount of error in a prediction.
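The eight-person race example can be checked directly: whatever order the runners finish in, the placements are always the numbers 1 through 8, so their mean never changes. A minimal sketch, with the shuffled orders standing in for hypothetical race results:

```python
import random
from statistics import mean

# Finish positions in an eight-person race: each of 1..8 is used exactly once
placements = list(range(1, 9))

for _ in range(3):
    random.shuffle(placements)  # a different finishing order each time
    print(placements, "-> mean placement:", mean(placements))
```

Every line prints the same mean, which is why the figure says nothing about how any particular race went.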

## Median

The median is especially useful with skewed distributions as it draws the
line right in the middle of your data set. It provides a better measure of centrality
because half of your data lies above the median and half below it. The median can
be used with interval or ratio data, and it is usually the preferred measurement to
use with ordinal data.
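To see why the median holds up better on skewed data, consider a small salary list with one very large value. The figures below are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical annual salaries (illustrative only): six staff and one owner
salaries = [32_000, 35_000, 36_000, 38_000, 40_000, 42_000, 250_000]

avg = mean(salaries)
mid = median(salaries)
below_avg = sum(1 for s in salaries if s < avg)  # how many earn less than the mean

print(f"mean   = {avg:,.0f}")
print(f"median = {mid:,}")
print(f"{below_avg} of {len(salaries)} salaries fall below the mean")
```

In this made-up set the single large salary pulls the mean above what almost everyone earns, while the median stays in the middle of the group.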

## Representing Data Practice Quiz

Problem 1: Use the data below to answer the following questions.
During the past week the candy shop sold 25 boxes of chocolate, 18 boxes of
mints and 40 boxes of taffy. There were five customers—Customer A spent
\$93, Customer B spent \$152, Customer C spent \$219, Customer D spent
\$108 and Customer E spent \$123.

What was the most popular item?

A. Chocolate
B. Mints
C. Taffy

The correct answer here would be C. Mode: chocolate=25, mints=18, taffy=40.

How much did a typical customer spend?

A. \$93
B. \$123
C. \$219

The correct answer here would be B. Median: \$93, \$108, \$123, \$152, \$219

What was the average amount collected per customer?

A. \$93
B. \$152
C. \$139

The correct answer here would be C. Mean: (\$93 + \$108 + \$123 + \$152 + \$219) / 5 = \$139
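The three answers to Problem 1 can be reproduced in a few lines. This is a sketch using the figures from the problem; the dictionary and variable names are illustrative.

```python
from statistics import mean, median

# Boxes sold during the week and amount spent by each customer (Problem 1)
boxes_sold = {"chocolate": 25, "mints": 18, "taffy": 40}
spending = {"A": 93, "B": 152, "C": 219, "D": 108, "E": 123}

amounts = list(spending.values())

most_popular = max(boxes_sold, key=boxes_sold.get)  # mode: item with the most boxes sold
typical = median(amounts)                           # median: middle amount spent
average = mean(amounts)                             # mean: total spent / number of customers

print("Most popular item:", most_popular)
print("Typical customer spent:", typical)
print("Average per customer:", average)
```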

Problem 2:

Select the measure you would use to discover the most frequently purchased sandwich.

A. Mean
B. Median
C. Mode

The correct answer here would be C. Mode: the data is categorical, and I want to know the most frequently
purchased sandwich.

Problem 3: Use the data below to answer the following questions.
There are 3 bags containing seven checks each.
Bag A has \$5, \$10, \$20, \$20, \$50, \$50, \$125
Bag B has \$5, \$10, \$20, \$20, \$75, \$75, \$75
Bag C has \$10, \$10, \$10, \$50, \$50, \$50, \$100

Choose the best bag using the mean, median or mode to make your choice.

Which bag has the most total cash?

A. Bag A
B. Bag B
C. Bag C
D. All of the bags have the same total cash

The correct answer here would be D. Each bag totals \$280.

If you removed the highest check from each bag and kept the remaining
checks, which bag would have the most cash?

A. Bag A
B. Bag B
C. Bag C
D. All of the bags would still have the same total cash

The correct answer here would be B. Bag B would have the most money because its mean is the highest.

| Choices | Mean of 6 Items |
| --- | --- |
| Bag A | \$25.83 |
| Bag B | \$34.17 |
| Bag C | \$30.00 |
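The totals and the six-item means for Problem 3 can be verified with a short script. This is a sketch only; the `bags` dictionary simply restates the check values from the problem.

```python
from statistics import mean

# Check values in each bag (Problem 3)
bags = {
    "A": [5, 10, 20, 20, 50, 50, 125],
    "B": [5, 10, 20, 20, 75, 75, 75],
    "C": [10, 10, 10, 50, 50, 50, 100],
}

totals = {name: sum(checks) for name, checks in bags.items()}

# Drop one copy of the highest check from each bag and keep the other six
remaining = {name: sorted(checks)[:-1] for name, checks in bags.items()}

for name in bags:
    print(f"Bag {name}: total ${totals[name]}, "
          f"six remaining checks ${sum(remaining[name])} "
          f"(mean ${mean(remaining[name]):.2f})")
```

Because each bag still holds six checks afterwards, ranking the bags by mean and ranking them by total cash give the same order.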

Critical Thinking: You can draw one check from the bag of your choice to keep. Which bag
would you choose to draw from?

It depends on which measure you use.
If you chose to use the mode, the answer is Bag B. This bag has
more \$75 checks than any other check in that bag, so you are more likely to draw
\$75 than any other check value. For Bag A, you would have an equal chance of
drawing a \$20 or a \$50 check, both of which are lower than the most frequently
occurring check in Bag B. Likewise, for Bag C, you would most likely draw a \$10 or
a \$50 check.

If you chose to use the median, the answer is Bag C because you
would have at least a 50% chance of drawing a check of \$50 or more, whereas for
Bags A and B you would have at least a 50% chance of drawing a check of \$20 or
more. Bag C has more checks greater than \$20 than either of the other two
bags.
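The modes and medians behind this reasoning can be listed for each bag as follows. This is a minimal sketch; `summary` is an illustrative name, and the check values are those given in Problem 3.

```python
from statistics import median, multimode

# Check values in each bag (Problem 3)
bags = {
    "A": [5, 10, 20, 20, 50, 50, 125],
    "B": [5, 10, 20, 20, 75, 75, 75],
    "C": [10, 10, 10, 50, 50, 50, 100],
}

# For each bag: the most frequent check value(s) and the middle check value
summary = {name: (multimode(checks), median(checks)) for name, checks in bags.items()}

for name, (modes, mid) in summary.items():
    print(f"Bag {name}: mode(s) {modes}, median {mid}")
```

Bags A and C each have two modes, which is why the mode-based argument describes an equal chance between two values for those bags.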
