Data
Data can be defined as groups of information that represent the qualitative or
quantitative attributes of a variable or set of variables, which is the same as
saying that data can be any set of information that describes a given entity. Data
in statistics can be classified into grouped data and ungrouped data.
Any data that you first gather is ungrouped data. Ungrouped data is data in its
raw form. An example of ungrouped data is any list of numbers that you can think of.
Grouped Data
Grouped data is data that has been organized into groups known as classes. Grouped
data has been ‘classified’ and thus some level of data analysis has taken place,
which means that the data is no longer raw.
A data class is a group of data related by some user-defined property. For
example, if you were collecting the ages of the people you met as you walked down
the street, you could group them into classes as those in their teens, twenties,
thirties, forties and so on. Each of those groups is called a class.
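As a rough illustration of the age example, the sketch below sorts a list of ages into ten-year classes and counts how many fall in each. The `ages` list and the `decade_class` helper are invented for illustration; they are not part of the original example.

```python
from collections import Counter

def decade_class(age):
    """Return the (lower, upper) limits of the ten-year class containing age."""
    lower = (age // 10) * 10
    return (lower, lower + 9)

# Hypothetical ages, invented for illustration only.
ages = [13, 17, 21, 25, 29, 34, 38, 41, 45, 19]

# Count how many ages fall into each class.
frequency = Counter(decade_class(age) for age in ages)

for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower} - {upper}: {count}")
```

Each key of `frequency` is one class, and its value is the number of observations in that class.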
Each class has a certain width, referred to as the Class Interval or Class Size.
The class interval is very important when it comes to drawing histograms and
frequency diagrams. All the classes may have the same class size, or they may have
different class sizes, depending on how you group your data. In the examples here,
the class interval is always a whole number.
Below is an example of grouped data where the classes have the same class interval.
Age (years) | Frequency |
---|---|
0 – 9 | 12 |
10 – 19 | 30 |
20 – 29 | 18 |
30 – 39 | 12 |
40 – 49 | 9 |
50 – 59 | 6 |
60 – 69 | 0 |
Below is an example of grouped data where the classes have different class intervals.
Age (years) | Frequency | Class Interval |
---|---|---|
0 – 9 | 15 | 10 |
10 – 19 | 18 | 10 |
20 – 29 | 17 | 10 |
30 – 49 | 35 | 20 |
50 – 79 | 20 | 30 |
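The Class Interval column above can be checked with a short sketch. It assumes the ages are recorded as whole numbers, so both limits belong to the class and the width is the upper limit minus the lower limit plus one. The `class_interval` helper is a hypothetical name used only for this illustration.

```python
# Rows copied from the table above: (lower limit, upper limit, frequency).
classes = [(0, 9, 15), (10, 19, 18), (20, 29, 17), (30, 49, 35), (50, 79, 20)]

def class_interval(lower_limit, upper_limit):
    # Assumes whole-number data, so both limits are counted in the class.
    return upper_limit - lower_limit + 1

intervals = [class_interval(lo, hi) for lo, hi, _ in classes]
print(intervals)  # [10, 10, 10, 20, 30]
```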
Calculating Class Interval
Given a set of raw or ungrouped data, how would you group that data into suitable
classes that are easy to work with and at the same time meaningful?
The first step is to determine how many classes you want to have. Next, subtract
the lowest value in the data set from the highest value, and then divide the result
by the number of classes that you want to have:
Class interval = (highest value − lowest value) / number of classes
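The calculation just described can be sketched in a few lines. The function name `raw_class_interval` and the sample values are assumptions made for illustration and do not come from the original text.

```python
def raw_class_interval(values, number_of_classes):
    """Range of the data divided by the desired number of classes."""
    return (max(values) - min(values)) / number_of_classes

# Hypothetical data set, invented for illustration only.
values = [4, 9, 15, 22, 8, 31, 44, 27, 12, 38]

print(raw_class_interval(values, 5))  # (44 - 4) / 5 = 8.0
```

The result is not always a whole number, so it may still need rounding before it is used as a class width.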
Example 1:
Group the following raw data into ten classes.
Solution:
The first step is to identify the highest and lowest numbers in the data set.
Subtracting the lowest from the highest and dividing by ten, the number of classes,
gives 2.8.
The class interval should be a whole number, yet in this case we have a decimal
number. The solution to this problem is to round to the nearest whole number.
In this example, 2.8 gets rounded up to 3. So our class width will be 3, meaning
that we group the above data into groups of 3, as in the table below.
Number | Frequency |
---|---|
1 – 3 | 7 |
4 – 6 | 6 |
7 – 9 | 4 |
10 – 12 | 2 |
13 – 15 | 2 |
16 – 18 | 8 |
19 – 21 | 1 |
22 – 24 | 2 |
25 – 27 | 3 |
28 – 30 | 2 |
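The whole procedure (compute the interval, round it, then count the values in each class) might be sketched as follows. The data set here is hypothetical and is not the raw data of Example 1; it was chosen only so that the lowest value is 1 and the unrounded interval is again 2.8. The helper names are also assumptions.

```python
import math

def round_half_up(x):
    """Round to the nearest whole number, with halves rounded up."""
    return math.floor(x + 0.5)

def frequency_table(values, number_of_classes):
    """Return the class width and a list of (lower, upper, frequency) rows."""
    lowest, highest = min(values), max(values)
    # A width below 1 would make no sense for whole-number data.
    width = max(1, round_half_up((highest - lowest) / number_of_classes))
    table = []
    lower = lowest
    while lower <= highest:
        upper = lower + width - 1
        count = sum(1 for v in values if lower <= v <= upper)
        table.append((lower, upper, count))
        lower = upper + 1
    return width, table

# Hypothetical whole-number data, invented for illustration only.
values = [1, 2, 5, 7, 11, 14, 16, 17, 20, 23, 26, 29]

width, table = frequency_table(values, 10)
print("class width:", width)
for lower, upper, count in table:
    print(f"{lower} - {upper}: {count}")
```

Because the width is rounded, the number of classes produced can differ slightly from the number requested for other data sets; in this sketch the classes simply continue until the highest value is covered.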
Class Limits and Class Boundaries
Class limits are the actual values that you see in the table. In the table above,
for example, 1 and 3 are the class limits of the first class. Class limits are
divided into two categories: the lower class limit and the upper class limit. For
the first class in the table above, 1 is the lower class limit while 3 is the
upper class limit.
On the other hand, class boundaries are not always observed in the frequency table.
Class boundaries give the true class interval, and similar to class limits, are
also divided into lower and upper class boundaries.
The relationship between the class boundaries and the class interval is given as
follows:
Class interval = upper class boundary − lower class boundary
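Class boundaries are related to class limits by the following relationships, which
apply when the class limits are whole numbers and consecutive classes are one unit
apart:
Lower class boundary = lower class limit − 0.5
Upper class boundary = upper class limit + 0.5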
As a result of the above, the lower class boundary of one class is equal to the
upper class boundary of the previous class.
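A short sketch can make this concrete for the Example 1 table. It assumes the common convention that a boundary lies halfway between the upper limit of one class and the lower limit of the next, which for whole-number limits means an adjustment of 0.5. The `class_boundaries` helper and its `gap` parameter are hypothetical names used only here.

```python
# Class limits copied from the Example 1 table: (lower limit, upper limit).
limits = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15),
          (16, 18), (19, 21), (22, 24), (25, 27), (28, 30)]

def class_boundaries(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary) for one class.

    gap is the distance between the upper limit of one class and the
    lower limit of the next (assumed to be 1 for whole-number data).
    """
    half = gap / 2
    return (lower_limit - half, upper_limit + half)

boundaries = [class_boundaries(lo, hi) for lo, hi in limits]

# The upper boundary of each class should equal the lower boundary of the next.
adjacent_match = all(
    upper == next_lower
    for (_, upper), (next_lower, _) in zip(boundaries, boundaries[1:])
)

# The class interval is the difference between the two boundaries.
widths = [upper - lower for lower, upper in boundaries]

print(boundaries[0])   # (0.5, 3.5)
print(adjacent_match)  # True
print(widths[0])       # 3.0
```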
Class limits and class boundaries play separate roles when it comes to representing
statistical data diagrammatically as we shall see in a moment.