Web Lesson 13: Measures of Spread I
Measures of Spread
There are three different measures of
spread:
Range
(often used together with the mode)
The range tells us the difference between the highest and
lowest values in the data
|
Inter-quartile Range (often
used together with the median)
The inter-quartile range tells us the range for the
central 50% of the data. This is more useful as it excludes
the extreme values which make up the range
|
Standard Deviation (often
used together with the mean)
The standard deviation is more complicated. It is used a
lot in 'A' level statistics. As a rough guide, when the data are
approximately normally distributed, about 68% of the values lie
within one standard deviation of the mean
|
Each one is worked out in a different way:
Finding the Range
1. If you have a list of data:
-
Find the highest value in the data
-
Find the lowest
value in the data
-
The range is found by subtracting
these
e.g. Find the range of this data: 2, 4, 6, 5, 3, 5, 3, 2, 7, 3, 8, 4, 3, 6, 3, 4, 2, 3
Here we have a "list of data"
We can see that the highest value is '8'
And the lowest value is '2'
So the range is 8 - 2 = 6
Note: The range can also be written as '2 - 8', meaning the values lie between 2 & 8
|
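The three steps above can be sketched in Python as a quick check. The function name `data_range` is only an illustrative choice, not part of the lesson:

```python
def data_range(values):
    """Return (lowest, highest, range) for a list of numbers."""
    lowest = min(values)    # lowest value in the data
    highest = max(values)   # highest value in the data
    return lowest, highest, highest - lowest


data = [2, 4, 6, 5, 3, 5, 3, 2, 7, 3, 8, 4, 3, 6, 3, 4, 2, 3]
low, high, spread = data_range(data)
print(low, high, spread)  # 2 8 6
```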
2. If you have a table of ungrouped
data:
-
Label the rows as x (for the values) and
f (for the frequencies)
-
Subtract the lowest value of 'x' from the highest
value of 'x'
(Note: First cross out any classes
where the frequency is zero)
e.g. Find the range of this data:
No of Wins | 0 | 1 | 2 | 3 | 4
frequency | 12 | 27 | 25 | 13 | 2
We have a "table of ungrouped data"
The highest value is '4'
The lowest value is '0'
So, the range is 4 - 0 = 4
Note: The range can also be written as '0 - 4', meaning the values lie between 0 & 4
|
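For a frequency table, the same idea might be sketched as follows, assuming the table is held as two Python lists (the names `table_range`, `x_values` and `frequencies` are hypothetical). Values whose frequency is zero are skipped, which mirrors the "cross out" step in the note above:

```python
def table_range(x_values, frequencies):
    """Range of ungrouped table data, ignoring values that never occur."""
    present = [x for x, f in zip(x_values, frequencies) if f > 0]
    return max(present) - min(present)


wins = [0, 1, 2, 3, 4]
frequency = [12, 27, 25, 13, 2]
print(table_range(wins, frequency))  # 4
```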
3. If you have a table of grouped
data:
-
Once data is grouped we can not find the range any
more!
e.g. Find the range of this data:
Voltage | 5.6 - 5.8 | 5.8 - 5.9 | 5.9 - 6.0 | 6.0 - 6.1 | 6.1 - 6.4
frequency | 20 | 20 | 80 | 50 | 30
We have a "table of grouped data"
It is not possible to find the range because, in the 1st class which is '5.6-5.8',
we don't know what the exact value of the smallest number was
We might instead use the 10th percentile (P10) and the 90th percentile (P90)
as alternatives
The 10th percentile (P10) is the (n/10)th value
The 90th percentile (P90) is the (9n/10)th value
These can be found using the same interpolation method as you learnt to find the median
|
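One possible sketch of that interpolation in Python is shown here. It assumes the values are spread evenly inside each class and uses the (n/10)th and (9n/10)th positions described above. The function name `grouped_percentile` and its arguments are hypothetical:

```python
def grouped_percentile(boundaries, frequencies, percent):
    """Estimate a percentile of grouped data by linear interpolation.

    boundaries  -- class boundaries, e.g. [5.6, 5.8, 5.9, ...]
    frequencies -- one frequency per class (len(boundaries) - 1 items)
    percent     -- e.g. 10 for P10, 90 for P90
    """
    n = sum(frequencies)
    position = percent * n / 100          # the (percent/100 x n)th value
    running = 0                           # cumulative frequency so far
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        if f > 0 and running + f >= position:
            # share of the class width needed to reach the position
            return lower + (upper - lower) * (position - running) / f
        running += f
    raise ValueError("percent must be between 0 and 100")


voltage_boundaries = [5.6, 5.8, 5.9, 6.0, 6.1, 6.4]
voltage_frequencies = [20, 20, 80, 50, 30]
p10 = grouped_percentile(voltage_boundaries, voltage_frequencies, 10)
p90 = grouped_percentile(voltage_boundaries, voltage_frequencies, 90)
print(round(p10, 2), round(p90, 2))
```

The results are estimates only, because the exact values inside each class are not known.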
Question 1: A class of 12 Math'scool students was asked
how many questions they think should be set for
homework:
23, 28, 21, 20, 31, 29, 26, 31, 33, 37, 48, 34
Find the range for these data:
Clues: Sorry - no clues for this rather easy question
Question 2: The number of 'sick days' taken by 200
employees in 2001 and in 2002 are shown in the table:
No of sick days | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
frequency (in 2001) | 5 | 12 | 13 | 45 | 52 | 30 | 13 | 15 | 10 | 5
frequency (in 2002) | 0 | 15 | 22 | 23 | 35 | 35 | 30 | 20 | 10 | 10
(a): Find the range for the number of sick
days in 2001
(b): Find the range for the number of sick
days in 2002
Clues: Sorry - you still can't have any help
Question 3: The salaries of the 200 workers were also
recorded:
Salary (£1000s) | 10-15 | 15-20 | 20-25 | 25-30 | 30-40 | 40-60
frequency | 10 | 55 | 68 | 38 | 17 | 2
Explain why it is not possible to find the range for these
data
Clues: Again, sorry - but no-can-do
Finding the Inter-quartile Range
1. If you have a list of data:
e.g. Find the quartiles of this data and find the I.Q.R: 1.3, 1.2, 1.4, 1.5, 1.2, 1.6, 1.5, 1.8, 1.5, 2.0, 1.7
-
Re-write all of the values, in numerical order and
count how many there are:
1.2, 1.2, 1.3, 1.4, 1.5, 1.5, 1.5, 1.6, 1.7, 1.8, 2.0
└ 11 values in the data ┘
n=11
-
The lower quartile (Q1) is the (¼n+½)th value; so work out (¼n+½)
and then count across your list to find that value:
Sometimes, we need to find a value that ends in '½'; such as the 12½th value
In this case, use: ½(12th value + 13th value)
But if we need to find a value that ends in '¼'; such as the 3¼th value
In that case, we round DOWN and find the 3rd value instead
|
So Q1 is the (¼(11)+½)th value = 3¼th value (we round this down to the 3rd value):
┌ 3rd value
▼
1.2, 1.2, 1.3, 1.4, 1.5, 1.5, 1.5, 1.6, 1.7, 1.8, 2.0
↓
Q1 is 1.3
-
The upper quartile (Q3) is the (¾n+½)th value; so work out (¾n+½)
and then count across your list to find that value:
Sometimes, we need to find a value that ends in '½'; such as the 12½th value
In this case, use: ½(12th value + 13th value)
But if we need to find a value that ends in '¾'; such as the 8¾th value
In that case, we round UP and find the 9th value instead
|
So Q3 is the (¾(11)+½)th value = 8¾th value (we round this up to the 9th value):
┌ 9th value
▼
1.2, 1.2, 1.3, 1.4, 1.5, 1.5, 1.5, 1.6, 1.7, 1.8, 2.0
↓
Q3 is 1.7
-
I.Q.R. = Q3 - Q1
So the Inter-quartile range: I.Q.R = 1.7 - 1.3 = 0.4
|
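The counting rules above could be sketched in Python like this. The helper names are hypothetical. The sketch also assumes that any position ending in ¼ or ¾ is rounded to the nearest whole position, which is a slight generalisation of the "round DOWN for ¼, round UP for ¾" rules given in the lesson:

```python
import math


def value_at(sorted_values, position):
    """Value at a 1-based position that may end in .25, .5 or .75."""
    whole = math.floor(position)
    fraction = position - whole
    if fraction == 0.5:
        # e.g. the 12.5th value is halfway between the 12th and 13th
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # assumption: .25 rounds down, .75 rounds up (nearest whole position)
    nearest = whole if fraction < 0.5 else whole + 1
    return sorted_values[nearest - 1]


def quartiles(values):
    """Return (Q1, Q3, IQR) using the (n/4 + 1/2) and (3n/4 + 1/2) positions."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at(ordered, n / 4 + 0.5)
    q3 = value_at(ordered, 3 * n / 4 + 0.5)
    return q1, q3, q3 - q1


data = [1.3, 1.2, 1.4, 1.5, 1.2, 1.6, 1.5, 1.8, 1.5, 2.0, 1.7]
q1, q3, iqr = quartiles(data)
print(q1, q3, round(iqr, 1))  # 1.3 1.7 0.4
```

Other textbooks and software packages use slightly different quartile conventions, so their answers may not always match this method.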
2. If you have a table of ungrouped
data:
e.g. Find the quartiles and the inter-quartile range of this
data:
No of Wins
|
0
|
1
|
2
|
3
|
4
|
frequency
|
12
|
27
|
25
|
13
|
2
|
-
Write a Cumulative Frequency Table. The last number in
the F row is called
n
Upper Boundary | up to 0 | up to 1 | up to 2 | up to 3 | up to 4
Cumulative Frequency | 12 | 12+27 = 39 | 12+27+25 = 64 | 12+27+25+13 = 77 | 12+27+25+13+2 = 79
-
The lower quartile (Q1) is the (¼n+½)th value; so work out (¼n+½)
and then look along the cumulative frequencies for this number (or above)
Read across to the value of x. This is Q1
Upper Boundary | up to 0 | up to 1 | up to 2 | up to 3 | up to 4
Cumulative Frequency | 12 | 39 | 64 | 77 | 79
Since we know n = 79, to find Q1 we look up the 20th value
i.e. the (¼(79)+½)th value = 20¼th value ≈ 20th value
Since 20 is NOT THERE, we must look up the next number ABOVE 20 in the
cumulative frequencies (which is 39, in the 'up to 1' column)
So, Q1 is '1'
-
The upper quartile (Q3) is the (¾n+½)th value; so work out (¾n+½)
and then look along the cumulative frequencies for this number (or above)
Read across to the value of x. This is Q3
Upper Boundary | up to 0 | up to 1 | up to 2 | up to 3 | up to 4
Cumulative Frequency | 12 | 39 | 64 | 77 | 79
Since we know n = 79, to find Q3 we look up the 60th value
i.e. the (¾(79)+½)th value = 59¾th value ≈ 60th value
Since 60 is NOT THERE, we must look up the next number ABOVE 60 in the
cumulative frequencies (which is 64, in the 'up to 2' column)
So, Q3 is '2'
-
I.Q.R. = Q3 - Q1
So the Inter-quartile range: I.Q.R = 2 - 1 = 1
|
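The cumulative-frequency lookup can also be sketched in Python. The function names are hypothetical. As an assumption, positions ending in ¼ or ¾ are rounded to the nearest whole position, and positions ending in ½ are averaged:

```python
import math
from itertools import accumulate


def value_at_position(x_values, frequencies, position):
    """x value whose cumulative frequency first reaches `position`."""
    for x, running_total in zip(x_values, accumulate(frequencies)):
        if running_total >= position:
            return x
    raise ValueError("position is larger than n")


def table_quartile(x_values, frequencies, fraction):
    """fraction = 0.25 for Q1, 0.75 for Q3."""
    n = sum(frequencies)              # last cumulative frequency
    position = fraction * n + 0.5     # e.g. n/4 + 1/2 for Q1
    whole = math.floor(position)
    if position - whole == 0.5:
        low = value_at_position(x_values, frequencies, whole)
        high = value_at_position(x_values, frequencies, whole + 1)
        return (low + high) / 2
    return value_at_position(x_values, frequencies, round(position))


wins = [0, 1, 2, 3, 4]
frequency = [12, 27, 25, 13, 2]
q1 = table_quartile(wins, frequency, 0.25)
q3 = table_quartile(wins, frequency, 0.75)
print(q1, q3, q3 - q1)  # 1 2 1
```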
Note: Strictly speaking, if n = 79, then Q1 is the (¼(79)+½)th
value (as we did above) - but in practice, when n is
large (bigger than 30)
then the difference between using ¼n+½
and just ¼n
isn't really worth bothering with...
Similarly, for Q3, if n is bigger than 30, then just use: ¾n
|
|
3. If you have a table of grouped data:
e.g. What with all this extra tuition, the amount of stuff a
student has to carry around is too much!
To investigate this, the weights of the school bags of a class
of students were measured:
Mass | 5 - 15 | 15 - 25 | 25 - 30 | 30 - 40 | 40 - 50
frequency | 8 | 10 | 9 | 10 | 3
Estimate the I.Q.R. of this data
Rather than use a
"Cumulative Frequency Curve" (as we did at G.C.S.E. level) we can use
the same Interpolation
method that we used to find the median:
-
Write a Cumulative Frequency Table (remember to use the upper class boundary of each
class). The last number in this row is called
n
Mass (U.C.B.) | 15 | 25 | 30 | 40 | 50
cumulative frequency | 8 | 18 | 27 | 37 | 40
-
The lower quartile (Q1) is the (¼n+½)th value; so we work out what (¼n+½)
gives:
n = 40
If n is larger than 30 then we can choose whether we want to use
the (¼n+½)th value or just the ¼n th value
In this case, it was easier to use ¼n:
¼n = ¼(40) = 10th value
-
Squeeze an extra column
into our table to help us find this value:
Mass (U.C.B.) | 15 | Q1 | 25 | 30 | 40 | 50
cumulative frequency | 8 | 10 | 18 | 27 | 37 | 40
● We need to find some differences using our
table: 'Δ1', 'Δ2', 'D1' & 'D2' need to be found
Mass (U.C.B.) | 15 | Q1 | 25 | 30 | 40 | 50
cumulative frequency | 8 | 10 | 18 | 27 | 37 | 40
D1 = Q1 - 15 = ???
D2 = 25 - 15 = 10
Δ1 = 10 - 8 = 2
Δ2 = 18 - 8 = 10
● Among these differences, only D1 is unknown -
but it can be found using:
D1 / D2 = Δ1 / Δ2
=> D1 / 10 = 2 / 10
=> D1 = 2
● D1 tells us what to add to
the class boundary to the left of Q1 to estimate
Q1:
15 + 2 = 17
Mass (U.C.B.) | 15 | 17 | 25 | 30 | 40 | 50
cumulative frequency | 8 | 10 | 18 | 27 | 37 | 40
So: Q1 (estimate) = 15 + 2 = 17
● If the data is discrete, then round the answer
-
Find Q3 in a similar way
To find Q3, we need to locate the: ¾n th value = 30th value
Start by inserting a column at F = 30:
Mass (U.C.B.) | 15 | 25 | 30 | Q3 | 40 | 50
cumulative frequency | 8 | 18 | 27 | 30 | 37 | 40
Δ1 = 30 - 27 = 3
Δ2 = 37 - 27 = 10
D1 = ???
D2 = 40 - 30 = 10
D1 / D2 = Δ1 / Δ2 => D1 / 10 = 3 / 10
=> D1 = 3
So: Q3 (estimate) = 30 + 3 = 33
-
I.Q.R. = Q3 - Q1
So the Inter-quartile range: I.Q.R = 33 - 17 = 16
|
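The interpolation steps for Q1 and Q3 might be sketched in Python as follows. The names `interpolate`, `boundaries` and `cumulative` are hypothetical, and the sketch assumes the values are spread evenly within each class and that n is large enough to use the ¼n and ¾n positions:

```python
def interpolate(boundaries, cumulative, position):
    """Estimate the value at `position` in grouped data.

    boundaries -- class boundaries, starting with the lowest boundary
    cumulative -- cumulative frequency at each boundary (starts at 0)
    """
    for i in range(1, len(boundaries)):
        if cumulative[i] >= position:
            delta1 = position - cumulative[i - 1]        # Δ1
            delta2 = cumulative[i] - cumulative[i - 1]   # Δ2
            d2 = boundaries[i] - boundaries[i - 1]       # D2 (class width)
            d1 = d2 * delta1 / delta2                    # from D1/D2 = Δ1/Δ2
            return boundaries[i - 1] + d1
    raise ValueError("position is larger than n")


boundaries = [5, 15, 25, 30, 40, 50]
cumulative = [0, 8, 18, 27, 37, 40]
n = cumulative[-1]
q1 = interpolate(boundaries, cumulative, n / 4)
q3 = interpolate(boundaries, cumulative, 3 * n / 4)
print(q1, q3, q3 - q1)  # 17.0 33.0 16.0
```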
Question 4: Find the quartiles and I.Q.R for these data: 23,
30, 29, 20, 22, 29, 27, 23,
27, 24
Clues: This is a "LIST OF DATA"
Start by writing the list in order:
20, 22, 23, 23, 24, 27, 27, 29, 29, 30
└ 10 values in the data ┘
n=10
So Q1 is the (¼(10)+½)th value = 3rd value:
┌ 3rd value
▼
20, 22, 23, 23, 24, 27, 27, 29, 29, 30
↓
Q1 is 23
So Q3 is the (¾(10)+½)th value = 8th value:
Counting across the list, the 8th value is ..., so Q3 is ...
So the Inter-quartile range: I.Q.R = ... - 23 = ...
Question 5: A class of 12 Math'scool students was asked
how many questions they think should be set for
homework:
23, 28, 21, 20, 31, 29, 26, 31, 33, 37, 48, 34
Find the quartiles and the I.Q.R. for these
data:
Clues: This is a "LIST OF DATA"
Firstly, re-write the numbers in numerical order:
20, 21, 23, 26, 28, 29, 31, ..., ..., ..., ..., ...
n = 12
Since n = 12, Q1 is the (¼(12)+½)th value, which is the 3½th value
The 3rd value is 23 and the 4th value is 26; the 3½th value would be between these two
So Q1 = ½(23+26) = ...
Q3 is the (¾(12)+½)th value, which is the 9½th value
Now the 9th value is ... and the 10th value is ..., so Q3 = ½(...+...) = ...
Finally: I.Q.R. = Q3 - Q1 = ... - ... = ...
Question 6: A group of old people were asked to count how
many grey hairs they have:
192, 180, 131, 187, 200, 210, 194, 199, 204, 203, 200
Find the quartiles and the I.Q.R. for these
data:
Clues: This is "A LIST OF DATA"
Start by putting the data in numerical order
n = ...
Q1 is the (¼(...)+½)th value, which is the ...th value
If we need to find a value that ends in '¼'; such as the 3¼th value
In that case, we round DOWN and find the 3rd value instead
|
Similarly, to find Q3, we will need to round up after working out ¾n+½
Question 7: I'm fed up that some students don't hand in all
their corrections. I decided to investigate the number of
outstanding corrections for each statistics student before deciding upon
a plan of action!
No of outstanding corrections | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
No of students | 2 | 3 | 3 | 4 | 4 | 3 | 3 | 2 | 2
(a): Determine the quartiles and the I.Q.R. of these data
(b): I decided that students whose number of outstanding
corrections EXCEEDS the 3rd quartile will be expelled. How many
students will be expelled?
Clues: This is "A TABLE OF UNGROUPED DATA"
We need to start with a table of cumulative frequencies:
No of outstanding corrections | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
F | 2 | 5 | 8 | 12 | ... | ... | ... | ... | 26
n = 26
Q1 is the (¼(26)+½)th value, which is the 7th value
But, looking along the cumulative frequencies, 7 is NOT THERE!
In that case, we must use the next cumulative frequency that is ABOVE 7
No of outstanding corrections | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
F | 2 | 5 | 8 | 12 | ... | ... | ... | ... | 26
Since 7 is NOT THERE, we must look up the next number ABOVE 7 in the
cumulative frequencies (which is 8)
So, Q1 is 2
We find Q3 in the same way
Question 8: The number of 'sick days' taken by 200
employees in 2001 and in 2002 are shown in the table:
No of sick days | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
frequency (in 2001) | 5 | 12 | 13 | 45 | 52 | 30 | 13 | 15 | 10 | 5
frequency (in 2002) | 0 | 15 | 22 | 23 | 35 | 35 | 30 | 20 | 10 | 10
(a): Find the I.Q.R. for the number of sick
days in 2001
(b): Find the I.Q.R. for the number of sick days
in 2002
Clues: This is "A TABLE OF UNGROUPED DATA"
Firstly deal with the data for 2001:
No of sick days | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
frequency (in 2001) | 5 | 12 | 13 | 45 | 52 | 30 | 13 | 15 | 10 | 5
Write a table of cumulative frequencies and then proceed in a similar way to question 7
Then do the data for 2002
Question 9: I asked last year's students how long it took
them to do this question
Time taken (mins) | 4-5 | 5-6 | 6-7 | 7-8 | 8-10 | 10-15
Number of students | 2 | 5 | 7 | 2 | 4 | 2
(a): Determine the interquartile range of
the time taken to do this question
(b): How long did it take you to do that?
Clues: We have a "TABLE OF GROUPED DATA"
So this time, to form our cumulative frequency table, we need to use the UPPER BOUNDARY of each class:
U.C.B. | 5 | 6 | 7 | 8 | 10 | 15
F | 2 | 7 | 14 | 16 | 20 | 22
Q1 is the 6th value
Let's make a little space between F = 2 and F = 7
U.C.B. | 5 | Q1 | 6 | 7 | 8 | 10 | 15
F | 2 | 6 | 7 | 14 | 16 | 20 | 22
● We need to find some differences using our table: 'Δ1', 'Δ2', 'D1' & 'D2' need to be found
U.C.B. | 5 | Q1 | 6 | 7 | 8 | 10 | 15
F | 2 | 6 | 7 | 14 | 16 | 20 | 22
D1 = Q1 - 5 = ???
D2 = 6 - 5 = 1
Δ1 = 6 - 2 = 4
Δ2 = 7 - 2 = 5
● Among these differences, only D1 is unknown - but it can be found using:
D1 / D2 = Δ1 / Δ2
=> D1 / 1 = 4 / 5
=> D1 = 0.8
● D1 tells us what to add to the class boundary to the left of Q1 to estimate Q1:
5 + ... = ...
U.C.B. | 5 | ... | 6 | 7 | 8 | 10 | 15
F | 2 | 6 | 7 | 14 | 16 | 20 | 22
So, that's Q1 found!
Now to find Q3:
Q3 is the ...th value, so let's make a little space between the column where F = 16
and the column where F = 20
U.C.B. | 5 | 6 | 7 | 8 | Q3 | 10 | 15
F | 2 | 7 | 14 | 16 | ... | 20 | 22
Then work out D1 in the same way
and add it to 8 to get Q3
Question 10: The salaries of the 200 workers were also
recorded:
Salary (£1000s) | 10-15 | 15-20 | 20-25 | 25-30 | 30-40 | 40-60
frequency | 10 | 55 | 68 | 38 | 17 | 12
Find the median and quartiles of the salaries
Clues: This time, as n is large, we don't HAVE to use the first formula in each row; we can just use the second:
 | We don't HAVE to use | We can just use
Q1 | ¼n+½ | ¼n
Median | ½(n+1) | ½n
Q3 | ¾n+½ | ¾n
The pass mark (to avoid additional homework
on this topic) is 8/10
Hand in your workings and answers
|