[This handbook has been prepared by Ian Johnston of Malaspina University-College, Nanaimo, BC, for students in Liberal Studies. The text is in the public domain, released May 2000]
In most of the examples we have been dealing with
so far, our statistical analysis has involved a complete set of
information about all the items we wished to study (e.g., all the students in a
class). In other words, we have been
dealing with populations (i.e., we had data for all the items in which we were
interested).
When our analysis is based upon an entire
population (i.e., all the members of the group under study, each of whom is
taken into account in the analysis), we are interested in data on each member
of the group, and we do not extend our conclusions beyond that particular
group.
In most statistical studies, however, the
population we are interested in is far too large for us to measure each and
every one of the members of it (e.g., all students at Malaspina
University-College, all Canadian voters, all cars made in Detroit, all children
in Nanaimo, and so on). In such cases,
we confine our analysis to a relatively small selection taken from the total
population. Such a selection is called
a sample.
The purpose of dealing with a sample is
straightforward: it enables us to study a large population and to learn things
about it, so that we can draw important inferences, without having to go to the
trouble of collecting data from every member of the entire population.
A very important part of statistics is the study
of the sorts of conclusions we can make about an entire population on the basis
of a relatively small sample. For
instance, if we have measured data on, say, voting patterns for 1000 people,
are we entitled to make any conclusions based on that information about the
voting patterns of the population in general?
And if so, what are the limits to the sorts of generalizations we can
make? What are we not entitled to
conclude about the wider population?
How does my ability to make conclusions about the wider population
change as the size of my sample increases?
How do I test claims made about entire populations on the basis of an
analysis of a single sample? And so on.
In other words, to use statistical information
properly we need to understand something about the relationship between the
information we have collected from a representative group of the entire
population (the sample) and the total population itself, from which the sample
is taken and for which we can never conduct complete measurements, since
obtaining the information would be too time consuming, if not impossible.
Obviously, one important point in working with
samples is the selection of a truly representative sample—a collection of
individual items for observation which accurately represents the larger
population. It is beyond the scope of
this module to explore the various methods statisticians use to make sure their
sampling techniques do not introduce major errors into the calculations (a
complex subject); however, it is appropriate to say a few things about the main
methods.
There are a number of common procedures for
selecting a sample, some simple and some more complicated. Haphazard (or Opportunity) sampling, for
example, relies upon the convenience of the sampler or the self-selection of
the sample (e.g., volunteers who respond to a mailed-out questionnaire or who
are picked at random from a crowd) (1). Quota Sampling sets quotas for various
categories in the sample (so many men, so many women, so many over age 45, so
many under age 45, and so on), so as to achieve a representation of the major
divisions in the larger population.
Random Sampling picks members of the sample according to a random
process, thus giving each member of the large population an equal opportunity
of being selected.
In general, of the methods mentioned above, Random
Sampling is the preferred method, with the least built-in bias.[1] However, in order for random sampling to be
possible, there must be available a list of everyone in the population to be
sampled (for reasons explained below).
Where that requirement cannot be conveniently met (e.g., in a survey of
all Canadians or all residents of BC), then the simple method of random
sampling outlined below is not appropriate.
In a simple random sample, with a list of the
entire population under investigation, the sampler then assigns a number to
each item in the list and selects the sample by consulting a random number
generator or a table of random numbers.
The process works as outlined below.
Suppose we wish to investigate all the workers in
a particular factory, but we do not have the time or the resources to deal with
them all. So we decide to work with a
sample of 30 workers out of a total factory population of 450. We begin by assigning each member of the
total population a number. Since the
largest number we require (450) has three digits, we give everyone a
three-digit number, starting with 001, 002, 003, 004, and so on, up to 450.
We then consult a list of random numbers. The list of random numbers looks like this
(a portion of a page).
5551 5412 3765 4953 0455 9710 2164 8634
7361 5427 2956 7405 3914 1084 4300 1221
2605 0815 8612 8995 7925 1856 3096 6139
3666 5516 9467 2205 2370 0047 1760 7761
To begin the selection we blindly point to some
number in the table (say, for example, 2956, the third figure in the second row above). Then, reading across the table we take
three-digit numbers. If they fit
someone in the general population, that person is selected; if they do not,
then we move on to the next three-digit number.
So, starting with 2956, the first three-digit
number is 295. Since we have someone in
our list of 450 with that number, we select that person. The next three-digit number (continuing to
read to the right) is 674. That does
not fit (since we have only 450 in the total population we are studying), so we
move on. The next three-digit number is
053. This number fits, so the person
with this number is selected for the sample.
We continue this process, moving through the table
of random numbers, until we have the number we need for the sample. To complete the selection of the sample of
30, we would obviously need a bigger list of random numbers than the partial
list given above.
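The table-walking procedure described above is easy to sketch in code. The following Python fragment (the fixed seed is illustrative, chosen only so the sketch is repeatable) draws three-digit random numbers and keeps just those that match someone on the list of 450 workers, exactly as the manual method does:

```python
import random

sample_size = 30
random.seed(42)  # fixed seed so the sketch is repeatable

# Mimic the random-number-table method: draw three-digit numbers
# and keep those that fit someone in the population of 450,
# skipping duplicates and numbers outside 001-450.
selected = set()
while len(selected) < sample_size:
    candidate = random.randint(0, 999)  # a three-digit random number
    if 1 <= candidate <= 450:           # fits someone on the list?
        selected.add(candidate)

sample = sorted(selected)
print(len(sample))  # 30 distinct workers
```

In practice one would simply call `random.sample(range(1, 451), 30)`, which performs the same unbiased selection directly; the loop above just makes the table procedure explicit.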
There is less bias in this selection because
everyone in the total population has an equal chance of being included in the
sample. We have made no attempt to
organize the population into different sections or proportions. If we were sampling merchandise
or specimens for an experiment, we would proceed in the same way, first assigning a
number to each item in the larger population and then consulting a list of
random numbers to select the items for our sample.
For some opinion polls, a variation of this method
of random sampling can be useful: random digit dialling for a telephone survey
(although such a method is biased in favour of those with telephones or more
than one telephone number or who spend a lot of time at home).
Another important factor in any sample is the
size. The most appropriate size will
depend upon the accuracy we wish to achieve and upon the size of the general population we
are sampling. We shall be dealing with
this question later in this section.
Let us assume we have properly identified our
sample from the large population we are interested in. On the basis of the measurements I have made
of the sample I have collected, I have a group of numbers. Thus, I can calculate the mean of this
sample (remember that the mean is the arithmetical average) in the usual way
(adding up all the values and dividing by the total number in the sample or by
entering the measurements on an Excel worksheet and getting Excel to make the
calculation for me). This figure is
called the Sample Mean.
Suppose, now, I conduct another similar sample of
the same general population (not including in the second sample anyone who was
part of the first sample). I will
obtain a second set of measurements from my new sample, and I can calculate the
mean of that collection of numbers. Now
I have a second Sample Mean. If I have
done my sampling without major bias, the second Sample Mean should be close to
the first Sample Mean (since I am sampling the same general population). But the value for the second Sample Mean
will almost certainly be somewhat different from the first (even if the
difference is quite small).
For example, suppose I am investigating the body
length of an adult male lizard. I
collect my first sample of, say, thirty lizards, measure the body length, enter
the data on a worksheet, and obtain a mean value for that sample. Suppose this value is 6.56 inches. I then collect a second sample for the same
animal, measure the body lengths, enter the data on a worksheet, and obtain a
mean value for that sample of 6.43 inches.
These two figures are both sample means for the same general population
(all the adult male lizards): Sample Mean 1 and Sample Mean 2.
Suppose I continue in this fashion, making a
number of different samples and calculating the mean of each. Gradually I will collect a list of Sample
Means, one for each of the samples I have collected. I will create a list of numbers, each representing a separate
Sample Mean. These will probably be
quite close to each other in value, but there will be differences. In other words, the value of the Sample
Means will be distributed; we can think of the values we obtain for the different
Sample Means as having a frequency distribution, just like any other list of
numbers.
Make sure you understand this point. The collection of means from different
samples will provide a list of numbers which, like any such list (of the sort
we have been examining) will have a frequency distribution (with a mean value,
a median, a variance, and a standard deviation).
In order to reinforce this last point, let us
continue to work through our example with the adult male lizards. I continue my sampling, measuring, and
calculating, and produce the following results (let us assume for the sake of
argument that each sample contains 30 male lizards):
Sample 1: S-Mean 1: 6.56 in
Sample 2: S-Mean 2: 6.43 in
Sample 3: S-Mean 3: 6.48 in
Sample 4: S-Mean 4: 6.51 in
Sample 5: S-Mean 5: 6.40 in
Sample 6: S-Mean 6: 6.52 in
Sample 7: S-Mean 7: 6.54 in
Sample 8: S-Mean 8: 6.47 in
Sample 9: S-Mean 9: 6.49 in
Sample 10: S-Mean 10: 6.53 in
Remember that each of these S-means is the average
for a sample of 30 adult male lizards.
This list of numbers also has a mean value (6.493 in) and a Standard
Deviation (0.0499 in). These, you will recall,
we can have Excel calculate for us (just as for any list of numbers).
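The behaviour of these Sample Means can also be simulated. In the Python sketch below, the "true" population of lizard body lengths is invented (mean 6.5 in, standard deviation 0.27 in, parameters chosen only so that the results resemble the figures in the text); repeated samples of 30 produce a list of Sample Means whose spread is far narrower than that of the raw measurements:

```python
import random
import statistics

random.seed(1)
# A hypothetical population of adult male lizard body lengths (inches).
# The parameters (6.5, 0.27) are invented for illustration.
population = [random.gauss(6.5, 0.27) for _ in range(5000)]

# Draw 200 samples of 30 lizards each and record each Sample Mean.
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(200)]

# The Sample Means cluster tightly around the population mean:
print(round(statistics.mean(sample_means), 2))   # close to 6.5
print(round(statistics.stdev(sample_means), 3))  # far smaller than 0.27
```

Note that the standard deviation of the list of Sample Means comes out near 0.05 here, much the same as the 0.0499 figure in the text: averaging thirty lizards at a time washes out most of the individual variation.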
You will remember from the previous chapter that
the standard deviation is a measure of the distribution of the frequencies in
the probable results. A small standard
deviation (as in the above example) means that most of the values will lie
close to the overall mean of the numbers in the list.
For reasons which lie outside the scope of this
report, the values of the S-Means will have a frequency distribution
represented by the normal curve (that is, the probabilities that particular
S-means will have certain values will follow the pattern of a normal
distribution, which we discussed in the previous section). Thus, the various probabilistic
characteristics of the normal curve, which we have studied in an earlier
module, will apply to the collection of samples we have made (2). Please make sure you
understand this very important point; everything we do in the rest of this
chapter depends upon it.
We also know from mathematical studies that in
such a normal distribution of all the S-means for a particular population, the
mean value (the midpoint, the highest part of the normal curve of S-means)
will be the same as the average for the entire population. We cannot measure all the population and
then calculate the mean, but we can theoretically establish that if we did so,
the mean for the entire population would be the same as the average of all the
means of all the samples of that population we could collect (since if our
sampling was complete we would have measured each member of the population).
This point is obvious enough if you think about
it. If I kept collecting samples like
the 10 listed above, eventually I would have sampled the entire population
(assuming no two lizards were in more than one sample). The average of all my samples would then be
the average of the entire population, because all my samples would be the same
as the entire population.
Any particular sample we take of 30 adult male
lizards might be truly representative of the total population (in which case
the mean of the sample would coincide with the mean for the entire population),
or it might misrepresent somewhat the population under study (that is, the
sample mean may be displaced from the population mean). We have no way of directly knowing that
unless we can measure every member of the population.
The more samples we collect and the larger those
samples, the closer the value obtained by averaging the means of all
the samples will be to the average for the entire population. If I kept sampling until I had sampled every
adult male lizard, then the average of all the sample means would be the same
as the average for the total population.
Now, in practice we usually do not have time (or
money) to carry out enough measurements of separate samples to calculate the
mean of all the Sample Means (we do not want to carry out a very large number
of samples, find the average of each sample, and then treat those averages as
a distribution, calculating the mean of the S-Means and the standard deviation,
as we theorized above). Besides, in
many cases (as in the male lizard example) we may never know whether we have
sampled every single member of the population.
In most cases, we are interested in making some
judgement about the entire population on the basis of a single sample (of, say,
50). So what is of immediate interest
is this question: If I use the S-Mean from a single sample of observations to make an estimate about the mean
for the entire population, how likely am I to make a serious mistake?
Note the importance of this question. It poses a vital statistical enquiry: On
the basis of a single sample, what am I entitled to conclude about the entire
population? For example, if I have
randomly selected adult male lizards for a measurement of their body length,
what legitimate conclusions can I draw from this small sample about all the
adult male lizards? How certain can I
be of any such inferences?
It turns out that the error in basing a conclusion
about the entire population on a small sample is likely to be quite small. This vital conclusion follows from the
important fact that the distribution of all possible Sample Means is a normal
curve and that the normal curve has important characteristics (as we have seen
in the previous section).
For we know that in any normal curve, the further
any value falls from the mean, the less likely it is to occur. You will recall that there is approximately
a .68 probability that any value will fall within 1 Standard Deviation on both
sides of the mean, and approximately a .95 probability that any value will fall
within 2 Standard Deviations on both sides of the mean. Thus, from the properties of all normal
distributions, we know that there is only a .05 probability that any value will
lie more than 2 Standard Deviations from the mean. Hence, the more poorly a sample represents the entire
population, the less likely such a sample is to occur.
Since the Sample Means are normally distributed
around the value of the mean of the entire population, the further the mean of
any one sample is from this mean of the entire population, the less likely it
is to occur. As one moves from the
mid-point of the distribution in either direction, the number of samples which
produce a Sample Mean much smaller or larger than the mean of the population
gets smaller and smaller (since the means of those samples would have to fit
into the extremes of the normal curve).
What this implies is that if we could ascertain
the Standard Deviation for the distribution of sample means, we would know the
probabilities that any particular sample mean would be close to or far away
from the mean for the entire population.
Remember that we are conceptualizing a normal
distribution curve which represents all the frequencies of all the mean values
for all the samples we might make of a large population. We have ascertained that the mean value of
such a curve will be the same as the mean value for the entire population we are
studying. If we could find out the Standard
Deviation of this normal curve, then we would know how the various values of
the sample means are distributed in relation to the mean of the normal curve.
The Standard Deviation of this normal distribution
of Sample Means is called the Standard
Error or the Standard Error of the
Means. If we had a way of
ascertaining its value, then we could describe the probabilities of the entire
curve, just as we can for any normally distributed value.
Make sure you understand the difference between
the terms Standard Error and Standard Deviation. The standard error is the name of a very
particular standard deviation, the standard deviation of the means of all the
samples we could take of a particular population (e.g., the population of adult
male lizards in the example we have been considering).
To clarify this issue, if it still needs
clarification, let me list here once more some summary points:
1. When we
collect a sample or deal with the entire population in our measurements, we can
list all the numerical results and then calculate the mean and the standard
deviation of that list by the methods we have already discussed (usually
getting Excel’s Descriptive Statistics function to do the work for us).
2. When we
are dealing with a very large population, we will take a small sample picked so
as to avoid bias. The larger total
population has a mean and a standard deviation, but we do not have the time or
the resources to measure all the cases (even if we could locate them), and
therefore we do not know what these figures are directly. The only direct observations we have are
from the sample we have taken.
3. However,
the Standard Error, which we are able to calculate from our sample (see below),
will give us the Standard Deviation of all the different averages from all the
samples we could make of the general population (or a figure close enough to
the Standard Deviation of the entire population to use as a substitute for it).
4. We use
the term Standard Deviation to remind ourselves that the figure we are dealing
with refers to a sample or to an entire population. We use the term Standard Error to remind ourselves that we are
dealing with the distribution of the averages from all possible samples (even
though we have undertaken to measure only a single sample).
In our discussion above, we outlined one method
for calculating the Standard Error.
That was to collect all the possible samples of a population, calculate
the mean, and then calculate the Standard Deviation of the frequency
distribution of Sample Means.
Theoretically, that is fine, but in practice, we simply cannot carry out
sampling until we have included the entire population of our study.
Fortunately, there is another way of calculating
the Standard Error. Mathematicians have
demonstrated that the Standard Error (which tells us the Standard Deviation
in the normal curve of all the possible Sample Means) can be derived from a
single sample (or, more precisely, a value so close to the Standard Deviation of that curve
that for practical purposes we can treat it as the Standard Error). The value is equal to the Standard Deviation
of the sample divided by the square root of the number of items in the sample.
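Expressed as a formula, Standard Error = s / √n, where s is the sample's standard deviation and n the number of observations in it. A minimal Python version of this calculation:

```python
import math

def standard_error(sample_sd: float, n: int) -> float:
    """Standard Error of the mean: the sample's standard deviation
    divided by the square root of the number of items in the sample."""
    return sample_sd / math.sqrt(n)

# For example, a sample of 100 items with a standard deviation of 20:
print(standard_error(20, 100))  # 2.0
```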
Now, this information, as we shall see, turns out
to be a very powerful piece of information.
From a single sample, we can calculate the standard deviation of the
normal curve depicting the means of all possible samples. Make sure you understand this point; much of
what we do from here on depends upon grasping this idea that from one
relatively small sample of a large population we can draw conclusions about the
distribution of the averages from all possible samples of that same population.
Minimum Sample Size
For the
mathematics we have been discussing to work effectively, the sample we select
must not be too small. The minimum
permissible size is 30 observations.
And remember that when we are dealing with samples (as opposed to
total populations), to derive the standard deviation of the sample, we divide
the sum of the squared differences between the mean and the observation by
one less than the number in the sample.
If this is a puzzle to you, do not worry about it, since Excel does
the calculations anyway. But this
practice of dividing by one less than the number in the sample is the reason
why Excel’s calculation of the standard deviation of a list of numbers is
always slightly higher than the result produced by a manual calculation
which divides by the full number in the sample. Excel treats every list of numbers as a
sample, not as the total population. In
calculating the standard error, however, we do not follow the same principle
of using one less than the number in the sample. As the formula above indicates, we divide the standard
deviation by the square root of the total number of items in the sample. As you
may have already observed, Excel calculates the standard error for any list
of data and includes the figure in the Descriptive Statistics box.
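Python's statistics module makes the same distinction described here for Excel: stdev divides the summed squared deviations by n - 1 (treating the list as a sample, as Excel's default calculation does), while pstdev divides by n (treating the list as a whole population). Running both on the ten lizard Sample Means listed earlier shows the sample version coming out slightly larger:

```python
import statistics

# The ten lizard Sample Means from the text (inches).
means = [6.56, 6.43, 6.48, 6.51, 6.40, 6.52, 6.54, 6.47, 6.49, 6.53]

print(round(statistics.stdev(means), 4))   # 0.0499  (divides by n - 1)
print(round(statistics.pstdev(means), 4))  # 0.0473  (divides by n)
```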
The fact that we can calculate the standard error
of the means from a single sample of a population turns out to be
extraordinarily useful. For on the
basis of a single sample (provided it contains at least 30 observations and is free from bias), we
can derive the standard deviation of the normal curve representing the means of
all possible samples. And this, in
turn, enables us to calculate the probability that our sample mean is close to
or far away from the mean of all the sample means (which is equivalent to the
mean of the total population).
For instance, suppose, as a consumer advocate, I
am interested in examining the quality of a particular brand of light bulbs, to
see if they are up to the manufacturer’s guarantee. Well, first I collect a random sample of, say, 100 bulbs. I then test that sample, measuring the
number of hours each bulb functions before burning out. This test yields a list of one hundred
results (one for each member of the sample).
From these one hundred numbers, I calculate (or Excel calculates for me)
the mean life of the bulbs in the sample and the standard deviation of the
results listed from the test of the sample.
Mean life of the light bulbs in the
sample: 300 hr
Standard deviation of the sample: 20 hr
From these two figures I can calculate the
standard error: the standard deviation of the sample divided by the square root
of the number of items in the sample or, in this case, 20 divided by the square
root of 100, that is, by 10, for a result of 2 hr.
We know that the average of all the means of all
the samples is the same as the average for the entire population, and we know
that the standard deviation in the normal curve representing the values for all
the different sample means is equal to the standard error (2 hr).
Therefore, on the basis of my single sample, I can
conclude that there is a .68 probability that the average for the entire
population of all the light bulbs lies within 1 standard error of the mean of
my sample, that is, between (300 - 2) and (300 + 2), or between 298 hr and 302
hr. There is a .95 probability that the
mean of the total population of light bulbs (that is, the average life of all
the light bulbs made by this manufacturer) lies within 2
standard errors of the sample mean, or between (300 - 4) and (300 + 4), that is, between 296 and
304 hr.
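The light-bulb arithmetic can be written out as a short calculation; the .99 interval at 3 standard errors (not stated in the worked example above) follows the same pattern:

```python
import math

sample_mean = 300   # mean bulb life in the sample (hr)
sample_sd = 20      # standard deviation of the sample (hr)
n = 100             # number of bulbs in the sample

se = sample_sd / math.sqrt(n)   # standard error: 20 / 10 = 2 hr

# Intervals around the sample mean at 1, 2, and 3 standard errors.
for z, p in [(1, ".68"), (2, ".95"), (3, ".99")]:
    low, high = sample_mean - z * se, sample_mean + z * se
    print(f"{p} probability: between {low:.0f} hr and {high:.0f} hr")
```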
Notice the nature of this conclusion. On the basis of a relatively small sample of
a very large population, we can establish a conclusion about that larger population. The conclusion is in the form of a series of
probability statements, each of which defines a range of possible values. This form of conclusion and its uses will become
clearer in some of the examples and exercises which follow.
What does all this add up to? Well, here’s a hypothetical practical
illustration. Suppose I wish to learn
about the mathematical capabilities of all the Grade XII students in
Nanaimo. I have neither the money nor
the time to arrange to have them all tested.
Thus, I organize a random sample of, say, 100 students and give them a
special test on their mathematical skills.
I find that the average score in the sample is 65, with a standard
deviation of 16.74. What can I conclude
on the basis of this information about the average capabilities in mathematics
for all Grade XII students in Nanaimo?
Well, I begin by calculating the standard error
(or reading it off from the Descriptive Statistics table generated by Excel,
once I have entered the observational data onto a worksheet). In this case the standard error is 1.67
marks.
Now the average (mean) score in my sample was
65. And I know that if I analyzed many
similar samples, the averages of the samples would be normally distributed in a
curve where the standard deviation is equal to the standard error calculated
above (1.67 marks).
Thus, if the average in my sample was 65, I can
state that there is a .68 probability that it falls within 1 standard error of
the mean of the total population of all the Nanaimo Grade XII students (either
higher or lower). Thus I am 68 percent
certain that the mean score for all the students in Nanaimo on this mathematics
test is between (65 - 1.67) and (65 + 1.67), that is, between 63.33 and 66.67.
If I want to be more certain than this, I can
state that there is a probability of .95 (or that I am 95 percent certain) that
the average for the entire Nanaimo Grade XII population on this mathematics
test will fall within 2 standard errors of the sample mean, that is, between
[65 - (2 x 1.67)] and [65 + (2 x 1.67)] or between 61.66 and 68.34.
If I want to be even more confident, I can state
with .99 probability (or 99 percent certainty) that the average for the entire
Nanaimo Grade XII population will be within 3 standard errors of the sample mean, that is, between [65 - (3 x 1.67)] and [65 + (3 x 1.67)] or between 59.99 and 70.01.
Self-Test
You are interested in finding out about the hours
elementary school children in School District 68 spend in organized
recreational exercise outside of school.
You select a random sample of 50 elementary school students, obtain data
about organized recreational exercise for each of them, enter the data on an
Excel worksheet, and obtain the following result.
Mean time spent
in organized recreational exercise (per week): 2.46 hr
Standard deviation in the sample: 2.01 hr
Use the method we have already gone through with
the light bulbs and the Grade XII students to produce a conclusion about the
average hours of organized recreational exercise for all elementary students in
School District 68. State the
conclusion with .68 probability, with .95 probability, and with .99 probability
(or with 68 percent certainty, with 95 percent certainty, and with 99 percent
certainty).
For an answer to this self-test see the end of
this section of the module.
We have already briefly discussed the nature of
the conclusion we have been drawing from these statements about a total
population based on what we measure in a relatively small sample. These inferences consist of a range of
values and a mathematical figure of probability (e.g., .68 probability, a .95
probability).
Statements like this illustrate what is called a confidence level, a conclusion which
offers a range of values and a statement of probability: we conclude that there
is a p probability that the mean of
the total population falls between figures x
and y. This might also be stated negatively: there is a certain
probability that the average score for the total population does not fall
between x and y.
The figure for the probability (p) is determined by how far the
limits of the range lie from the mean of the sample (measured in standard
errors or, to use language we introduced in an earlier section, measured in the
z-score). As we saw in the last chapter, we can have 68 percent confidence
(or p = .68) that any value in a
normal distribution will fall within one standard deviation of the mean (i.e.,
have a z-score of between -1 and
+1). We can have a 95 percent
confidence (p = .95) that any value
in a normal distribution will fall within 2 standard deviations of the mean,
that is, between a z-score of -2 and
a z-score of +2. And we can have a 99 percent confidence (p = .99) that any value in a normal
distribution will fall between a z-score
of -3 and a z-score of +3.
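The three probabilities quoted above are rounded conventions: exactly 95 percent of values actually fall within about 1.96 standard deviations, and the probability within 3 standard deviations is .9973 rather than a flat .99. The exact central probabilities can be computed from the error function (math.erf), since the probability that a normal value has a z-score between -z and +z equals erf(z / √2):

```python
import math

def central_probability(z: float) -> float:
    """Probability that a normally distributed value lies within
    z standard deviations of the mean (z-score between -z and +z)."""
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(z, round(central_probability(z), 4))
# 1 -> 0.6827, 2 -> 0.9545, 3 -> 0.9973
```

For the level of precision used in this module, .68, .95, and .99 are close enough.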
Notice that, as we would expect, I can increase
the confidence of my conclusions by widening the range within which the value
will fall. The more certain I wish to
be, the wider the range of values. If I
want to narrow the range of values in my conclusion, then I lower the
confidence level.
Understanding Poll Results
This
point about confidence levels is important in understanding the way in which
the media publish poll results. For
example, when a newscaster says that a recent poll has just revealed that 42
percent of the electorate would vote Liberal if the election were held
tomorrow, that remark will usually be accompanied by a qualification like the
following: “These results are considered accurate within 2.5 percentage
points nineteen times out of twenty.”
What this qualification means is that the pollsters are 95 percent
confident that (i.e., sure that in 19 cases out of 20) if the election were
held tomorrow, the Liberals would get 42 plus or minus 2.5 percent of the
vote (i.e., between 39.5 and 44.5 percent of the vote). On the basis of their relatively small
sample, they are establishing a confidence level and a range within two
standard errors.
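The pollster's arithmetic can be reconstructed, with one caveat: for a percentage rather than a mean, the standard error comes from the analogous formula for a sample proportion, √(p(1 - p)/n), which this module does not cover. The sample size below is an assumption (news reports rarely state it); a poll of around 1,500 respondents reproduces the quoted 2.5-point margin:

```python
import math

p = 0.42   # reported Liberal support
n = 1500   # assumed number of respondents (not given in the report)

# Standard error of a sample proportion (analogue of the formula
# for means used elsewhere in this module).
se = math.sqrt(p * (1 - p) / n)

# "Nineteen times out of twenty" is the .95 level: about 1.96 standard errors.
margin = 1.96 * se
print(round(100 * margin, 1))  # margin in percentage points: 2.5
```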
On the basis of what we have learned so far about
making conclusions about a large population on the basis of a single sample of
more than 30, we can notice some interesting further details about this very
useful procedure made possible by the calculation of the standard error.
First, the size of the confidence interval depends
upon the size of the standard error (which is a measure of the standard
deviation in the distribution of sample means). Thus, if we can lessen the standard error, we can diminish the
range of values in each confidence level (and thus provide more precise
conclusions).
You may recall that we calculate the standard
error from the sample, taking the standard deviation of that sample and
dividing the figure by the square root of the number of observations in the
sample. Since we calculate the standard
error by dividing by the square root of the number in the sample, increasing
the number in the sample may have only a small effect on decreasing the size of
the standard error.
So a question I might like to consider is the
following: in order to lessen the size of the standard error, how much would I
have to increase the size of my sample?
Or, alternatively, will increasing the size of my sample enable me to
narrow the range of the conclusion?
The answer, it turns out (for reasons explained
below), is that increasing the sample size can indeed narrow the range of
results, but that the increase in the sample size has to be very large—so
large, in fact, that it may prove to be too costly and time consuming to
implement.
For example, if we were dealing with a sample of
100 students in a study of their skills on a test and if the standard deviation
of the list of results in our sample was, say, 16 marks, then we would
calculate the standard error by dividing the standard deviation by the square
root of the number in the sample, that is, 16 divided by the square root of
100, or 16 divided by 10, or 1.6. Thus,
in estimating the confidence intervals for the entire population of students,
we would be using the figure of 1.6 marks as the basis of our intervals to
calculate the ranges for .68, .95, and .99 probability.
Now, if we wanted a narrower range, in order to
have a more precise result, we would like to reduce the standard error (thus
having a smaller interval). One way we
might like to do this is to increase the size of the sample. If we increase its size, then we increase
the size of its square root and therefore diminish the standard error (which is
produced by dividing the standard deviation by the square root of the number in
the sample).
However, since we are dealing with the square root
of the number in the sample, we will have to increase the sample size
considerably. For instance, in the
example above we dealt with a sample of 100 students and achieved a standard
error of 1.6 by dividing the standard deviation of the sample, 16, by the
square root of 100, or 10. If we wanted
to reduce the standard error by half, we would have to divide 16 by 20. And to be able to do this we would have to
sample 400 students (the square root of 400 is 20).
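A quick sketch in Python makes the trade-off visible: each halving of the standard error requires quadrupling the sample (the numbers continue the example above):

```python
import math

std_dev = 16.0  # sample standard deviation from the example above (marks)

# standard error for increasing sample sizes
errors = {n: std_dev / math.sqrt(n) for n in (100, 400, 1600)}
for n, se in errors.items():
    print(f"n = {n:4d}  standard error = {se}")
# halving the standard error (1.6 -> 0.8) requires quadrupling n (100 -> 400)
```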
What this means, in effect, is that in many cases
it is not worth the effort to increase the sample size in order to achieve more
precise results. Since collecting the sample data is the really time-consuming part of the analysis, it is generally more efficient to keep the sample relatively small (provided it is over 30) and to concentrate on making it the best sample we can achieve (i.e., least liable to bias).
This is not to say, of course, that the size of the sample is irrelevant. Obviously, that is not the case. Increasing the size of the sample does reduce the standard error and thus makes the conclusions more precise. In fact, mathematicians have drawn up guidelines for the most appropriate sample sizes relative to the size of the larger population they are intended to represent and to the level of accuracy required in the sampling.
In this module, as mentioned before, we are not dealing with the complex rules for proper sampling strategies (other than the few remarks earlier in this section). So we are not concerning ourselves with the problems of sampling error. In the various examples we work through, we shall assume that the sample is a good one and shall not take sampling error into account (as we would have to if we were being statistically diligent).
However, for interest only, you might like to see
a list of the recommended sample sizes for different populations. The table below, from a book on surveys,
indicates some recommended sample sizes:
Recommended Sample Sizes for Different Populations and Permissible Sampling Error

Sampling Error              Population Size
Allowed (percent)    500    1000   10,000  100,000  1 million
±10                   83      91       99      100        100
±5                   222     286      385      398        400
±4                   250     385      588      621        625
±3                   250     500     1000     1099       1111
±2                   250     500     2000     2439       2500
±1                   250     500     5000     9091     10,000
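The source does not say how these figures were derived, but most of them are consistent with a common rule of thumb, n = N / (1 + N·e²) (often attributed to Yamane), with an additional cap applied to small populations. As a sketch only, under that assumption:

```python
def sample_size(population: int, error: float) -> int:
    """Approximate required sample size for a given permissible sampling
    error (e.g. error=0.05 for ±5 percent), using n = N / (1 + N * e**2).
    This formula is an assumption about how the table was built; it does
    not reproduce the capped entries for small populations."""
    return round(population / (1 + population * error ** 2))

print(sample_size(500, 0.10))      # 83
print(sample_size(10_000, 0.05))   # 385
print(sample_size(100_000, 0.03))  # 1099
```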
Let us review one more time the steps in making
confidence generalizations about an entire population from a single sample.
1.
First we
select a sampling strategy (normally using random sampling when the total
population is suitable for this process), select our sample (making sure we
have at least 30 separate observations in it), and collect the information.
2.
Then, we
enter the data on a spreadsheet (like Excel) and apply the Descriptive
Statistics tool in order to ascertain the mean and the standard error of the
sample.
3.
Finally,
we make our conclusions at different confidence levels: 68 percent for a range
within 1 standard error of the mean of our sample (above and below), 95 percent
for a range within 2 standard errors of the mean of the sample, and 99 percent
for a range within 3 standard errors of the mean of the sample.
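The three steps above (short of the Excel spreadsheet, which the module assumes) can be sketched in a few lines of Python; the sample data here are hypothetical:

```python
import math
import statistics

def summarize(sample):
    """Steps 2 and 3: mean, standard error, and 68/95/99 percent ranges."""
    mean = statistics.mean(sample)
    # standard error = sample standard deviation / sqrt(n)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    ranges = {level: (mean - k * se, mean + k * se)
              for level, k in ((68, 1), (95, 2), (99, 3))}
    return mean, se, ranges

# hypothetical sample of 30 observations (at least 30, as required above)
scores = [52, 61, 58, 49, 63, 55, 60, 57, 54, 59] * 3
mean, se, ranges = summarize(scores)
```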
Suppose I wish to know (for purposes of
comparison) the average score for all first-year university students in British
Columbia on a standard intelligence quotient (IQ) test. Going through the steps outlined above, I
complete steps 1 and 2 for a sample of 100 students. The mean score of the sample is 112; the standard deviation is 12
points.
From these two figures I can compute the standard
error (the standard deviation divided by the square root of the number in the
sample): that comes out to 12 divided by 10 or 1.2 points.
Now, I can make my conclusion at different
confidence levels, as follows:
1.
I am 68
percent certain that the average IQ score on this test for all first-year
university students in BC is within a range 1 standard error on either side of
the mean of my sample, that is, between 110.8 and 113.2.
2.
I am 95
percent certain that the average IQ score on this test for all first-year
students in BC is within a range 2 standard errors on either side of the mean
of my sample, that is, between 109.6 and 114.4.
3.
I am 99 percent certain that the average IQ score on this test for all first-year students in BC is within a range 3 standard errors on either side of the mean of my sample, that is, between 108.4 and 115.6.
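As a check, the same three intervals can be reproduced mechanically (figures taken from the IQ example above):

```python
import math

mean, std_dev, n = 112, 12, 100   # figures from the example above
se = std_dev / math.sqrt(n)       # 12 / 10 = 1.2 marks

for level, k in ((68, 1), (95, 2), (99, 3)):
    print(f"{level}%: {mean - k * se:.1f} to {mean + k * se:.1f}")
# 68%: 110.8 to 113.2
# 95%: 109.6 to 114.4
# 99%: 108.4 to 115.6
```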
Using the method outlined immediately above, try
the two following problems.
1.
We want
to know the average pulse rate in a population of 1000 track athletes. We sample the pulse rates of 50 athletes
taken at random and calculate the mean pulse rate of the sample to be 79.1
beats per minute, with a standard deviation of 7.6 beats per minute. What can we conclude about the mean value
(in beats per minute) for the entire population of athletes? State your conclusion at three different
confidence intervals (at .68, .95, and .99 probability).
2.
A sample
study of the family incomes in Canada revealed the following: sample size,
1600, mean family income of the sample--$51,300; standard deviation of the
sample--$8000. What can you infer about
the mean family income for the entire population at a confidence level of 95
percent?
Up to this point we have only dealt with three
confidence levels: 68 percent (or .68 probability), 95 percent (or .95
probability), and 99 percent (or .99 probability). We used these because they correspond to the ranges defined by 1,
2, and 3 standard deviations away from the mean (something we learned in the
previous chapter).
In practice, however, we are not limited to just these three figures. We can establish any level of confidence we want. But we will need to know how many standard deviations are represented by the particular level we choose, so that we can define the range properly.
If this puzzles you, let us go through the point
step by step, as follows:
1.
A normal
curve (the shape of a normal distribution) indicates the relative frequencies
of all the values in the population we are studying. Thus, we can imagine the area under the top line of the curve as
representing the entire population.
2.
If we
think of the population under the curve as an area, then we can see clearly
that in a normal distribution the total population is divided in half by the
mean. There is thus a .5 probability in
any normally distributed population that a particular value will fall in the
area to the right of the mean (i.e., in the upper values), and a .5 probability
that any particular member of the population will fall to the left of the mean
(i.e., in the lower half of the values).
3.
In the
previous chapter, we discussed how in the normally distributed curve, the area
under the curve is always divided in the same way by units of standard deviation:
68 percent of the total population falls within 1 standard deviation of the mean (34 percent on either side); 95 percent of the population falls within 2 standard deviations of the mean (47.5 percent on either side); and 99 percent of the total population falls within 3 standard deviations of the mean (49.5 percent on either side).
4.
But
clearly we are not confined to just 1, 2, or 3 standard deviations. There are all sorts of possibilities in
between them (e.g., 1.2 standard deviations, 0.7 standard deviations, and so
on). And each of these will define a
different area under the normal curve.
And each area, so defined, will include its own percentage of the total
population (and thus establish its own confidence level).
5.
Now, the mathematics of calculating areas under the normal curve for all distances away from the mean is complex and laborious. Fortunately, however, mathematicians have created tables from which we can simply read off particular distances from the mean and their corresponding areas. Thus, we can easily decide what level of confidence we want and find the distance appropriate to it.
On the next page is an example of such a
table. It indicates in the extreme left
hand column (in bold) the distance away from the mean in standard deviation
units (which is, as we mentioned before, the z-score) for one half the curve (this item is important, as we
shall see). This column ranges from 0.0
distance from the mean (i.e., the mean itself) to 3.09, just over three
standard deviations away. As we move
down the column we move in increments of .1.
The other columns indicate the next decimal place
for that particular z-score. Thus, in the first line, the area under the normal curve at a z-score of 0.00 is .0000. This means that when we are
exactly on the mean, the area under the curve is 0. If we move to the next column (to the right), the z-score is 0.01, and the corresponding
area under the curve between the mean and this distance away from it is
0.0040. Since we are dealing with only
one half the curve, the total area under the curve defined by a z-score of 0.01 on both sides of the
curve is twice the given value, 0.008 (or 0.8 percent of the total area under
the curve is within a z-score of 0.01
on either side of the mean).
If you check now the area figure for a z-score of 1.00 you will notice that it
reads .3413. This means that of all the
scores under the curve 34.13 percent of them will fall between the mean and a z-score of 1 on one side of the
curve. If we want to include all the
scores within 1 standard deviation of the mean on both sides, then we would
double this figure (i.e., to 68.26 percent).
We have been using the figure 68 percent as a convenient approximation
of that value.
This table may at first look somewhat confusing, but read over the paragraphs above (consulting the table) a few times until you are familiar with what these numbers mean.
Table Showing Area Under the Normal Curve at Different z-Scores

z-score   .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0      .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1      .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2      .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3      .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4      .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5      .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6      .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7      .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8      .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9      .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0      .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1      .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2      .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3      .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4      .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5      .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6      .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7      .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8      .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9      .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0      .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1      .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2      .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3      .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4      .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5      .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6      .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7      .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8      .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9      .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0      .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
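The tabulated areas need not be looked up by hand. In Python, the standard library's error function gives them directly, since the area under the normal curve between the mean and a z-score z is erf(z / √2) / 2. A sketch (not part of the original text):

```python
import math

def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between the mean and z
    (one side only), as tabulated above."""
    return 0.5 * math.erf(z / math.sqrt(2))

print(round(area_mean_to_z(1.00), 4))      # 0.3413
print(round(area_mean_to_z(2.00), 4))      # 0.4772
# doubling gives the area on both sides of the mean
print(round(2 * area_mean_to_z(1.00), 4))  # 0.6827
```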
Answer
to Self-Test on Estimating the Population Average from a Sample
To find the Standard Error we divide
the Standard Deviation of the Sample (2.01 hr) by the square root of the number
in the sample. The square root of 50 is
7.07. Therefore the Standard Error is
2.01 hr divided by 7.07 or .28 hr.
Therefore, about the average time elementary school children in School District 68 spend on organized recreational exercise out of school, I can make the following conclusions:
I am 68 percent certain that the
average time falls between (2.46 + .28) and (2.46 - .28) or between 2.74 hr and
2.18 hr. I am 95 percent certain that
the average time falls between 3.02 hr and 1.9 hr. And I am 99 percent certain that the average time falls between
3.3 hr and 1.62 hr.
Notice here that the more confident I
wish to be, the wider the range of values I have to accept.
Answer
to the Self-Test on Confidence Levels (Section Q)
1.
The Standard Error of the sample is the Standard Deviation divided by the square root of the number in the sample, that is, 7.6 divided by the square root of 50, or 7.6 divided by 7.07, or 1.07 beats per minute. Thus, I can conclude the following about the population of 1000 athletes: I am 68 percent certain (or the probability is .68) that the average pulse rate is between (79.1 + 1.07) and (79.1 - 1.07), or between 80.17 and 78.03 beats per minute. I am 95 percent certain (or the probability is .95) that the average pulse rate is between 81.24 and 76.96 beats per minute. And I am 99 percent certain (or the probability is .99) that the average pulse rate is between 82.31 and 75.89 beats per minute.
2.
The
Standard Error of the sample is the Standard Deviation (8000) divided by the
square root of the number in the sample (1600) or 8000 divided by 40, or 200
dollars. Thus, I can be 95 percent certain that the average family income is between $50,900 and $51,700.
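As a quick check of this answer in Python (figures taken from the problem above):

```python
import math

mean, std_dev, n = 51_300, 8_000, 1_600
se = std_dev / math.sqrt(n)       # 8000 / 40 = 200 dollars

# 95 percent confidence: within 2 standard errors of the sample mean
low, high = mean - 2 * se, mean + 2 * se
print(f"95%: ${low:,.0f} to ${high:,.0f}")  # 95%: $50,900 to $51,700
```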
(1) A well-known example of the sort of bias which can occur in non-random sampling is Shere Hite’s book Women and Love. The author
mailed out 100,000 questionnaires to women’s organizations (a Haphazard or
Opportunity sample). Only 4.5 percent
were filled out and returned, so that the results were biased in favour of
women who belong to such organizations and who were sufficiently motivated to
respond. [Back
to Text]
(2) This very important property holds whether or not the population from which the samples are taken is normally distributed. The frequency distribution of the Sample Means from any
population will always follow a normal distribution. [Back to Text]