I’ll Give You a Definite Maybe

An Introductory Handbook for Probability, Statistics, and Excel

[This handbook has been prepared by Ian Johnston of Malaspina University-College, Nanaimo, BC, for students in Liberal Studies.  The text is in the public domain, released May 2000]

Section Six: Samples and Populations

A. Introduction: Samples and Populations

In most of the examples we have been dealing with so far, our statistical analysis has usually involved a complete set of information about all the items we wished to study (e.g., all the students in a class).  In other words, we have been dealing with populations (i.e., we had data for all the items in which we were interested).

When our analysis is based upon an entire population (i.e., all the members of the group under study, each of whom is taken into account in the analysis), we are interested in data on each member of the group, and we do not extend our conclusions beyond that particular group.

In most statistical studies, however, the population we are interested in is far too large for us to measure each and every one of the members of it (e.g., all students at Malaspina University-College, all Canadian voters, all cars made in Detroit, all children in Nanaimo, and so on).  In such cases, we confine our analysis to a relatively small selection taken from the total population.  Such a selection is called a sample.

The purpose of dealing with a sample is straightforward: it enables us to study a large population and to learn things about it, so that we can draw important inferences, without having to go to the trouble of collecting data from every member of the entire population.

A very important part of statistics is the study of the sorts of conclusions we can make about an entire population on the basis of a relatively small sample.  For instance, if we have measured data on, say, voting patterns for 1000 people, are we entitled to make any conclusions based on that information about the voting patterns of the population in general?  And if so, what are the limits to the sorts of generalizations we can make?  What are we not entitled to conclude about the wider population?  How does my ability to make conclusions about the wider population change as the size of my sample increases?  How do I test claims made about entire populations on the basis of an analysis of a single sample?  And so on.

In other words, to use statistical information properly we need to understand something about the relationship between the information we have collected from a representative group of the entire population (the sample) and the total population itself, from which the sample is taken and for which we can never conduct complete measurements, since obtaining the information would be too time consuming, if not impossible.

B. Sampling Methods

Obviously, one important point in working with samples is the selection of a truly representative sample—a collection of individual items for observation which accurately represents the larger population.  It is beyond the scope of this module to explore the various methods statisticians use to make sure their sampling techniques do not introduce major errors into the calculations (a complex subject); however, it is appropriate to say a few things about the main methods.

There are a number of common procedures for selecting a sample, some simple and some more complicated.  Haphazard (or Opportunity) sampling, for example, relies upon the convenience of the sampler or the self-selection of the sample (e.g., volunteers who respond to a mailed out questionnaire or who are picked at random from a crowd) (1).  Quota Sampling sets quotas for various categories in the sample (so many men, so many women, so many over age 45, so many under age 45, and so on), so as to achieve a representation of the major divisions in the larger population.  Random Sampling picks members of the sample according to a random process, thus giving each member of the large population an equal opportunity of being selected.

C. Random Sampling

In general, of the methods mentioned above, Random Sampling is the preferred method, with the least built in bias.[1]  However, in order for random sampling to be possible, there must be available a list of everyone in the population to be sampled (for reasons explained below).  Where that requirement cannot be conveniently met (e.g., in a survey of all Canadians or all residents of BC), then the simple method of random sampling outlined below is not appropriate.

In a simple random sample, the sampler starts with a list of the entire population under investigation, assigns a number to each item in the list, and selects the sample by consulting a random number generator or a table of random numbers.  The process works as outlined below.

Suppose we wish to investigate all the workers in a particular factory, but we do not have the time or the resources to deal with them all.  So we decide to work with a sample of 30 workers out of a total factory population of 450.  We begin by assigning each member of the total population a number.  Since the largest number we require (450) has three digits, we give everyone a three-digit number, starting with 001, 002, 003, 004, and so on, up to 450.

We then consult a list of random numbers.  The list of random numbers looks like this (a portion of a page).

5551  5412  3765  4953  0455  9710  2164  8634
7361  5427  2956  7405  3914  1084  4300  1221
2605  0815  8612  8995  7925  1856  3096  6139
3666  5516  9467  2205  2370  0047  1760  7761

To begin the selection we blindly point to some number in the table (say, for example, 2956, the third figure in the second row above).  Then, reading across the table we take three-digit numbers.  If they fit someone in the general population, that person is selected; if they do not, then we move on to the next three-digit number.

So, starting with 2956, the first three-digit number is 295.  Since we have someone in our list of 450 with that number, we select that person.  The next three-digit number (continuing to read to the right) is 674.  That does not fit (since we have only 450 in the total population we are studying), so we move on.  The next three-digit number is 053.  This number fits, so the person with this number is selected for the sample.

We continue this process, moving through the table of random numbers, until we have the number we need for the sample.  To complete the selection of the sample of 30, we would obviously need a bigger list of random numbers than the partial list given above.
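The table-reading procedure described above can be sketched in Python.  This is only a minimal illustration and not part of the original handbook: the function name select_from_table is hypothetical, and the digit string is simply the second row of the table above, read from 2956 onward.

```python
def select_from_table(digits, population_size, sample_size, width=3):
    """Read fixed-width numbers from a string of random digits and keep
    those that match someone on the numbered population list."""
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Skip numbers with no matching person (000 or above the list size)
        # and anyone who has already been selected.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


# Second row of the table, starting at 2956 and reading to the right.
row = "2956 7405 3914 1084 4300 1221".replace(" ", "")
picked = select_from_table(row, population_size=450, sample_size=30)
print(picked)  # [295, 53, 108, 443, 1, 221]
```

The row supplies only six usable numbers, which is why a longer table is needed for a sample of 30.  A random number generator does the same job directly; in Python, random.sample(range(1, 451), 30) draws 30 distinct numbers between 1 and 450.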

There is less bias in this selection because everyone in the total population has an equal chance of being included in the sample.  We have made no attempt to organize the population into different sections or proportions.  If we were working on sampling merchandise or samples for experiment, we would proceed in the same way, first assigning a number to each item in the larger population and then consulting a list of random numbers to select the items for our sample.

For some opinion polls, a variation of this method of random sampling can be useful: random digit dialling for a telephone survey (although such a method is biased in favour of those with telephones or more than one telephone number or who spend a lot of time at home).

Another important factor in any sample is the size.  The most appropriate size will depend upon the accuracy we wish and upon the size of the general population we are sampling.  We shall be dealing with this question later in this section.

D. The Sample Mean

Let us assume we have properly identified our sample from the large population we are interested in.  On the basis of the measurements I have made of the sample I have collected, I have a group of numbers.  Thus, I can calculate the mean of this sample (remember that the mean is the arithmetical average) in the usual way (adding up all the values and dividing by the total number in the sample or by entering the measurements on an Excel worksheet and getting Excel to make the calculation for me).  This figure is called the Sample Mean.

Suppose, now, I conduct another similar sample of the same general population (not including in the second sample anyone who was part of the first sample).  I will obtain a second set of measurements from my new sample, and I can calculate the mean of that collection of numbers.  Now I have a second Sample Mean.  If I have done my sampling without major bias, the second Sample Mean should be close to the first Sample Mean (since I am sampling the same general population).  But the value for the second Sample Mean will almost certainly be somewhat different from the first (even if the difference is quite small).

For example, suppose I am investigating the body length of adult male lizards.  I collect my first sample of, say, thirty lizards, measure the body lengths, enter the data on a worksheet, and obtain a mean value for that sample.  Suppose this value is 6.56 inches.  I then collect a second sample of the same animal, measure the body lengths, enter the data on a worksheet, and obtain a mean value for that sample of 6.43 inches.  These two figures are both sample means for the same general population (all the adult male lizards): Sample Mean 1 and Sample Mean 2.

Suppose I continue in this fashion, making a number of different samples and calculating the mean of each.  Gradually I will collect a list of Sample Means, one for each of the samples I have collected.  These will probably be quite close to each other in value, but there will be differences.  In other words, the values of the Sample Means will be distributed; we can think of the values we obtain for the different Sample Means as having a frequency distribution, just like any other list of numbers.

Make sure you understand this point.  The collection of means from different samples will provide a list of numbers which, like any such list (of the sort we have been examining) will have a frequency distribution (with a mean value, a median, a variation, and a standard deviation).

E. An Example of a Collection of Sample Means (S-Means)

In order to reinforce this last point, let us continue to work through our example with the adult male lizards.  I continue my sampling, measuring, and calculating, and produce the following results (let us assume for the sake of argument that each sample contains 30 male lizards):

Sample 1: S-Mean 1: 6.56 in
Sample 2: S-Mean 2: 6.43 in
Sample 3: S-Mean 3: 6.48 in
Sample 4: S-Mean 4: 6.51 in
Sample 5: S-Mean 5: 6.40 in
Sample 6: S-Mean 6: 6.52 in
Sample 7: S-Mean 7: 6.54 in
Sample 8: S-Mean 8: 6.47 in
Sample 9: S-Mean 9: 6.49 in
Sample 10: S-Mean 10: 6.53 in.

Remember that each of these S-means is the average for a sample of 30 adult male lizards.  This list of numbers also has a mean value (6.493 in) and a Standard Deviation (0.0499 in).  These, you will recall, we can have Excel calculate for us (just as for any list of numbers).
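For readers who want to check these two figures outside Excel, here is a short sketch using Python's standard statistics module.  The variable names are illustrative only; the ten values are the S-Means listed above.

```python
import statistics

s_means = [6.56, 6.43, 6.48, 6.51, 6.40, 6.52, 6.54, 6.47, 6.49, 6.53]

mean_of_s_means = statistics.mean(s_means)
# stdev() uses the sample formula (dividing by one less than the count).
sd_of_s_means = statistics.stdev(s_means)

print(round(mean_of_s_means, 3))  # 6.493
print(round(sd_of_s_means, 4))    # 0.0499
```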

You will remember from the previous chapter that the standard deviation is a measure of how widely the values in a list are spread around their mean.  A small standard deviation (as in the above example) means that most of the values will lie close to the overall mean of the numbers in the list.

F. The Mean of the S-Means

For reasons which lie outside the scope of this report, the values of the S-Means will have a frequency distribution represented by the normal curve (that is, the probabilities that particular S-means will have certain values will follow the pattern of a normal distribution, which we discussed in the previous section).  Thus, the various probabilistic characteristics of the normal curve, which we have studied in an earlier module, will apply to the collection of samples we have made (2).  Please make sure you understand this very important point; everything we do in the rest of this chapter depends upon it.

We also know from mathematical studies that in such a normal distribution of all the S-means for a particular population, the mean value (the mid point, the highest part of the normal curve of S-means) will be the same as the average for the entire population.  We cannot measure all the population and then calculate the mean, but we can theoretically establish that if we did so, the mean for the entire population would be the same as the average of all the means of all the samples of that population we could collect (since if our sampling was complete we would have measured each member of the population).

This point is obvious enough if you think about it.  If I kept collecting samples like the 10 listed above, eventually I would have sampled the entire population (assuming no lizard was included in more than one sample).  The average of all my samples would then be the average of the entire population, because all my samples taken together would be the same as the entire population.
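The argument can be checked with a small simulation.  The sketch below is an illustration under stated assumptions, not a result from the handbook: the population of 3,000 body lengths is invented, and it is divided into equal-sized samples of 30 so that every lizard is measured exactly once.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 3,000 body lengths in inches (invented values).
population = [random.gauss(6.5, 0.27) for _ in range(3000)]
random.shuffle(population)

# Split the whole population into 100 non-overlapping samples of 30.
sample_size = 30
samples = [population[i:i + sample_size]
           for i in range(0, len(population), sample_size)]

s_means = [statistics.fmean(sample) for sample in samples]
mean_of_s_means = statistics.fmean(s_means)
population_mean = statistics.fmean(population)

# The two figures agree apart from floating-point rounding.
print(mean_of_s_means, population_mean)
print(math.isclose(mean_of_s_means, population_mean))
```

The exact agreement depends on every sample having the same size; with samples of unequal size, each S-Mean would have to be weighted by the number of items in its sample.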

Any particular sample we take of 30 adult male lizards might be truly representative of the total population (in which case the mean of the sample would coincide with the mean for the entire population), or it might misrepresent somewhat the population under study (that is, the sample mean may be displaced from the population mean).  We have no way of directly knowing that unless we can measure every member of the population.

The more samples we collect and the larger those samples, the closer the average body length obtained by averaging the means of all the samples will be to the average body length for the entire population.  If I kept sampling until I had sampled every adult male lizard, then the average of all the sample means would be the same as the average for the total population.

G. The Value of a Single Sample

Now, in practice we usually do not have time (or money) to carry out enough measurements of separate samples to calculate the mean of all the Sample Means (we do not want to carry out a very large number of samples, find the average of each sample, and then treat those averages as a distribution, calculating the mean of the S-Means and the standard deviation, as we theorized above).  Besides, in many cases (as in the male lizard example) we may never know whether we have sampled every single member of the population.

In most cases, we are interested in making some judgement about the entire population on the basis of a single sample (of, say, 50).  So what is of immediate interest is this question: If I use the S-Mean from a single sample of observations to make an estimate about the mean for the entire population, how likely am I to make a serious mistake?

Note the importance of this question.  It poses a vital statistical enquiry: On the basis of a single sample, what am I entitled to conclude about the entire population?  For example, if I have randomly selected adult male lizards for a measurement of their body length, what legitimate conclusions can I draw from this small sample about all the adult male lizards?  How certain can I be of any such inferences?

It turns out that the error in basing a conclusion about the entire population on a small sample is likely to be quite small.  This vital conclusion follows from the important fact that the distribution of all possible Sample Means is a normal curve and that the normal curve has important characteristics (as we have seen in the previous section).

For we know that in any normal curve, the further any value falls from the mean, the less likely it is to occur.  You will recall that there is approximately a .68 probability that any value will fall within 1 Standard Deviation on both sides of the mean, and approximately a .95 probability that any value will fall within 2 Standard Deviations on both sides of the mean.  Thus, from the properties of all normal distributions, we know that there is only a .05 probability that any value will lie more than 2 Standard Deviations from the mean.  Hence, the more a sample is a poor representative of the entire population, the less likely it is to occur.
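The probabilities quoted above can be reproduced with the NormalDist class in Python's standard statistics module (available from Python 3.8).  This is a minimal sketch; the helper name probability_within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def probability_within(k):
    """Probability that a normally distributed value lies within
    k standard deviations of the mean (on either side)."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(probability_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The round figures of .68 and .95 used in the text are therefore convenient approximations of these values.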

Since the Sample Means are normally distributed around the value of the mean of the entire population, the further the mean of any one sample is from this mean of the entire population, the less likely it is to occur.  As one moves from the mid-point of the distribution in either direction, the number of samples which produce a Sample Mean much smaller or larger than the mean of the population gets smaller and smaller (since the means of those samples would have to fit into the extremes of the normal curve).

H. Standard Error

What this implies is that if we could ascertain the Standard Deviation for the distribution of sample means, we would know the probabilities that any particular sample mean would be close to or far away from the mean for the entire population.

Remember that we are conceptualizing a normal distribution curve which represents all the frequencies of all the mean values for all the samples we might make of a large population.  We have ascertained that the mean value of such a curve will be the same as the mean value for the entire population we are studying.  If we could find out the Standard Deviation of this normal curve, then we would know how the various values of the sample means are distributed in relation to the mean of the normal curve.

The Standard Deviation of this normal distribution of Sample Means is called the Standard Error or the Standard Error of the Means.  If we had a way of ascertaining its value, then we could describe the probabilities of the entire curve, just as we can for any normally distributed value.

I. Standard Deviation and Standard Error

Make sure you understand the difference between the terms Standard Error and Standard Deviation.  The standard error is the name of a very particular standard deviation, the standard deviation of the means of all the samples we could take of a particular population (e.g., the population of adult male lizards in the example we have been considering).

To clarify this issue, if it still needs clarification, let me list here once more some summary points:

1.      When we collect a sample or deal with the entire population in our measurements, we can list all the numerical results and then calculate the mean and the standard deviation of that list by the methods we have already discussed (usually getting Excel’s Descriptive Statistics function to do the work for us).

2.      When we are dealing with a very large population, we will take a small sample picked so as to avoid bias.  The larger total population has a mean and a standard deviation, but we do not have the time or the resources to measure all the cases (even if we could locate them), and therefore we do not know what these figures are directly.  The only direct observations we have are from the sample we have taken.

3.      However, the Standard Error, which we are able to calculate from our sample (see below), will give us the Standard Deviation of all the different averages from all the samples we could make of the general population (or a figure close enough to that Standard Deviation to use as a substitute for it).

4.      We use the term Standard Deviation to remind ourselves that the figure we are dealing with refers to a sample or to an entire population.  We use the term Standard Error to remind ourselves that we are dealing with the distribution of the averages from all possible samples (even though we have undertaken to measure only a single sample).

J. Calculating the Standard Error

In our discussion above, we outlined one method for calculating the Standard Error.  That was to collect all the possible samples of a population, calculate the mean of each, and then calculate the Standard Deviation of the frequency distribution of Sample Means.  Theoretically, that is fine, but in practice, we simply cannot carry out sampling until we have included the entire population of our study.

Fortunately, there is another way of calculating the Standard Error.  Mathematicians have demonstrated that the Standard Error (which tells us the Standard Deviation in the normal curve of all the possible Sample Means) can be derived from a single sample (or a value so close to the Standard Deviation of that curve that for practical purposes we can treat it as the Standard Error).  The value is equal to the Standard Deviation of the sample divided by the square root of the number of items in the sample.

Now this, as we shall see, turns out to be a very powerful piece of information.  From a single sample, we can calculate the standard deviation of the normal curve depicting the means of all possible samples.  Make sure you understand this point; much of what we do from here on depends upon grasping this idea that from one relatively small sample of a large population we can draw conclusions about the distribution of the averages from all possible samples of that same population.
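To see the shortcut at work, the sketch below compares the long way (taking the standard deviation of many Sample Means) with the estimate from one sample alone.  It is an illustration under assumptions, not the handbook's own calculation: the population of 10,000 measurements is invented, and the function name standard_error is hypothetical.

```python
import math
import random
import statistics


def standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 measurements (invented values).
population = [random.gauss(6.5, 0.27) for _ in range(10_000)]

# Long way: draw many samples of 30 and take the SD of their means.
many_means = [statistics.fmean(random.sample(population, 30))
              for _ in range(2000)]
sd_of_sample_means = statistics.stdev(many_means)

# Short way: estimate the same figure from one sample alone.
one_sample = random.sample(population, 30)
estimate = standard_error(one_sample)

print(round(sd_of_sample_means, 4), round(estimate, 4))
```

The two printed figures should be of similar size, though not identical, since the single-sample figure is itself only an estimate.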

Minimum Sample Size

For the mathematics we have been discussing to work effectively, the sample we select must not be too small.  The minimum permissible size is 30 observations.  And remember that when we are dealing with samples (as opposed to total populations), to derive the standard deviation of the sample, we divide the sum of the squared differences between the mean and each observation by one less than the number in the sample, and then take the square root.  If this is a puzzle to you, do not worry about it, since Excel does the calculations anyway.  But this practice of dividing by one less than the number in the sample is the reason why Excel’s calculation of the standard deviation of a list of numbers is always slightly higher than the result produced by a manual working out which divides by the full number in the sample.  Excel treats every list of numbers as a sample, not as the total population.
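The effect of dividing by one less than the number in the sample can be seen by calculating the standard deviation both ways.  The sketch below uses Python's standard statistics module rather than Excel, with the ten S-Means from the lizard example as its list of numbers; stdev divides by one less than the count and pstdev divides by the full count.

```python
import statistics

s_means = [6.56, 6.43, 6.48, 6.51, 6.40, 6.52, 6.54, 6.47, 6.49, 6.53]

sample_sd = statistics.stdev(s_means)       # divides by n - 1
population_sd = statistics.pstdev(s_means)  # divides by n

print(round(sample_sd, 4), round(population_sd, 4))  # 0.0499 0.0473
```

The gap between the two results shrinks as the list gets longer, because dividing by n - 1 and dividing by n differ less and less.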

In calculating the standard error, however, we do not follow the same principle of using one less than the number in the sample.  As the formula above indicates, we divide the standard deviation by the square root of the total number of items in the sample.

As you may have already observed, Excel calculates the standard error for any list of data and includes the figure in the Descriptive Statistics box.

K. A Simple Application of the Sample Mean and Standard Error

The fact that we can calculate the standard error of the means from a single sample of a population turns out to be extraordinarily useful.  For on the basis of a single sample (provided it is more than 30 and free from bias), we can derive the standard deviation of the normal curve representing the means of all possible samples.  And this, in turn, enables us to calculate the probability that our sample mean is close to or far away from the mean of all the sample means (which is equivalent to the mean of the total population).

For instance, suppose, as a consumer advocate, I am interested in examining the quality of a particular brand of light bulbs, to see if they are up to the manufacturer’s guarantee.  Well, first I collect a random sample of, say, 100 bulbs.  I then test that sample, measuring the number of hours each bulb functions before burning out.  This test yields a list of one hundred results (one for each member of the sample).  From these one hundred numbers, I calculate (or Excel calculates for me) the mean life of the bulbs in the sample and the standard deviation of the results listed from the test of the sample.

Mean life of the light bulbs in the sample: 300 hr
Standard deviation of the sample: 20 hr

From these two figures I can calculate the standard error: the standard deviation of the sample divided by the square root of the number of items in the sample or, in this case, 20 divided by the square root of 100, that is, by 10, for a result of 2 hr.

We know that the average of all the means of all the samples is the same as the average for the entire population, and we know that the standard deviation in the normal curve representing the values for all the different sample means is equal to the standard error (2 hr).

Therefore, on the basis of my single sample, I can conclude that there is a .68 probability that the average for the entire population of all the light bulbs lies within 1 standard error of the mean of my sample, that is, between (300 - 2) and (300 + 2), or between 298 hr and 302 hr.  There is a .95 probability that the mean of the total population of light bulbs (that is, the average life of all the light bulbs made by this manufacturer) lies within 2 standard errors of the sample mean, that is, between (300 - 4) and (300 + 4), or between 296 and 304 hr.
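The arithmetic of the light bulb example can be written as a small Python helper.  This is a sketch for checking the figures; the function name interval is hypothetical, and it takes the summary figures (mean, standard deviation, sample size) rather than the raw test results.

```python
import math


def interval(sample_mean, sample_sd, n, k):
    """Range lying within k standard errors of the sample mean."""
    se = sample_sd / math.sqrt(n)
    return (sample_mean - k * se, sample_mean + k * se)


print(interval(300, 20, 100, 1))  # (298.0, 302.0), about .68 probability
print(interval(300, 20, 100, 2))  # (296.0, 304.0), about .95 probability
```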

Notice the nature of this conclusion.  On the basis of a relatively small sample of a very large population, we can establish a conclusion about that larger population.  The conclusion is in the form of a series of probability statements, each of which defines a range of possible values.  This form of conclusion and its uses will become clearer in some of the examples and exercises which follow.

L. The Evaluative Use of Standard Error

What does all this add up to?  Well, here’s a hypothetical practical illustration.  Suppose I wish to learn about the mathematical capabilities of all the Grade XII students in Nanaimo.  I have neither the money nor the time to arrange to have them all tested.  Thus, I organize a random sample of, say, 100 students and give them a special test on their mathematical skills.  I find that the average score in the sample is 65, with a standard deviation of 16.74.  What can I conclude on the basis of this information about the average capabilities in mathematics for all Grade XII students in Nanaimo?

Well, I begin by calculating the standard error (or reading it off from the Descriptive Statistics table generated by Excel, once I have entered the observational data onto a worksheet).  In this case the standard error is 1.67 marks.

Now the average (mean) score in my sample was 65.  And I know that if I analyzed many similar samples, the averages of the samples would be normally distributed in a curve where the standard deviation is equal to the standard error calculated above (1.67 marks).

Thus, if the average in my sample was 65, I can state that there is a .68 probability that it falls within 1 standard error of the mean of the total population of all the Nanaimo Grade XII students (either higher or lower).  Thus I am 68 percent certain that the mean score for all the students in Nanaimo on this mathematics test is between (65 - 1.67) and (65 + 1.67), that is, between 63.33 and 66.67.

If I want to be more certain than this, I can state that there is a probability of .95 (or that I am 95 percent certain) that the average for the entire Nanaimo Grade XII population on this mathematics test will fall within 2 standard errors of the sample mean, that is, between [65 - (2 x 1.67)] and [65 + (2 x 1.67)], or between 61.66 and 68.34.

If I want to be even more confident, I can state with .99 probability (or 99 percent certainty) that the average for the entire Nanaimo Grade XII population will be within 3 standard errors of the sample mean.

M. Self-Test on Estimating the Population Average from a Sample

You are interested in finding out about the hours elementary school children in School District 68 spend in organized recreational exercise outside of school.  You select a random sample of 50 elementary school students, obtain data about organized recreational exercise for each of them, enter the data on an Excel worksheet, and obtain the following result.

Mean time spent in organized recreational exercise (per week): 2.46 hr
Standard deviation in the sample: 2.01 hr

Use the method we have already gone through with the light bulbs and the Grade XII students to produce a conclusion about the average hours of organized recreational exercise for all elementary students in School District 68.  State the conclusion with .68 probability, with .95 probability, and with .99 probability (or with 68 percent certainty, with 95 percent certainty, and with 99 percent certainty).

For an answer to this self-test see the end of this section of the module.

N. Confidence Levels

We have already briefly discussed the nature of the conclusion we have been drawing from these statements about a total population based on what we measure in a relatively small sample.  These inferences consist of a range of values and a mathematical figure of probability (e.g., .68 probability, a .95 probability).

Statements like this illustrate what is called a confidence level: a conclusion which offers a range of values (often called a confidence interval) together with a statement of probability.  We conclude that there is a p probability that the mean of the total population falls between figures x and y.  This might also be stated negatively: there is a probability of 1 - p that the average score for the total population does not fall between x and y.

The figure for the probability (p) is determined by the distance the limits of the range are from the mean of the sample (measured in standard errors or, to use language we introduced in an earlier section, measured in the z-score).  As we saw in the last chapter, we can have 68 percent confidence (or p = .68) that any value in a normal distribution will fall within one standard deviation of the mean (i.e., have a z-score of between -1 and +1).  We can have a 95 percent confidence (p = .95) that any value in a normal distribution will fall within 2 standard deviations of the mean, that is, between a z-score of -2 and a z-score of +2.  And we can have a 99 percent confidence (p = .99) that any value in a normal distribution will fall between a z-score of -3 and a z-score of +3 (strictly speaking, the probability for this range is about .997, so .99 is a cautious round figure).

Notice that, as we would expect, I can increase the confidence of my conclusions by widening the range within which the value will fall.  The more certain I wish to be, the wider the range of values.  If I want to narrow the range of values in my conclusion, then I lower the confidence level.
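The trade-off between confidence and range can be made concrete by asking the question in reverse: for a chosen confidence level, how many standard deviations either side of the mean are needed?  The sketch below uses NormalDist from Python's standard statistics module; the helper name z_for_confidence is hypothetical.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Number of standard deviations either side of the mean that
    encloses the given share of a normal distribution."""
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.68, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 2))
# 0.68 0.99
# 0.95 1.96
# 0.99 2.58
```

The whole-number z-scores of 1, 2, and 3 used in this section are therefore slightly generous roundings of these multipliers, which keeps the arithmetic simple and errs on the side of a wider range.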

 

Understanding Poll Results

This point about confidence levels is important in understanding the way in which the media publish poll results.  For example, when a newscaster says that a recent poll has just revealed that 42 percent of the electorate would vote Liberal if the election were held tomorrow, that remark will usually be accompanied by a qualification like the following: “These results are considered accurate within 2.5 percentage points nineteen times out of twenty.”  What this qualification means is that the pollsters are 95 percent confident that (i.e., sure that in 19 cases out of 20) if the election were held tomorrow, the Liberals would get 42 plus or minus 2.5 percent of the vote (i.e., between 39.5 and 44.5 percent of the vote).  On the basis of their relatively small sample, they are establishing a confidence level and a range within two standard errors.
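A rough sketch of where a figure like 2.5 percentage points could come from is given below.  It rests on assumptions that are not in the newscaster's statement: each response is scored 1 or 0, so that the share voting for a party is simply the mean of the list and the standard error formula of this section applies; and the sample size of 1,500 is an invented figure, since the poll's actual size is not given.  The function name margin_of_error is hypothetical.

```python
import math


def margin_of_error(proportion, n, k=2):
    """k standard errors for a proportion estimated from n respondents.

    Each response is scored 1 (would vote for the party) or 0 (would not);
    a list of such scores has standard deviation sqrt(p * (1 - p)).
    """
    sd = math.sqrt(proportion * (1 - proportion))
    return k * sd / math.sqrt(n)


# 42 percent support; the sample size of 1,500 is an assumed figure.
margin = margin_of_error(0.42, 1500)
print(round(100 * margin, 1))  # 2.5 (percentage points)
```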

O. More Curious Observations

On the basis of what we have learned so far about making conclusions about a large population on the basis of a single sample of more than 30, we can notice some interesting further details about this very useful procedure made possible by the calculation of the standard error.

First, the size of the confidence interval depends upon the size of the standard error (which is a measure of the standard deviation in the distribution of sample means).  Thus, if we can lessen the standard error, we can diminish the range of values in each confidence level (and thus provide more precise conclusions).

You may recall that we calculate the standard error from the sample, taking the standard deviation of that sample and dividing the figure by the square root of the number of observations in the sample.  Since we calculate the standard error by dividing by the square root of the number in the sample, increasing the number in the sample may have only a small effect on decreasing the size of the standard error.

So a question I might like to consider is the following: in order to lessen the size of the standard error, how much would I have to increase the size of my sample?  Or, alternatively, will increasing the size of my sample enable me to narrow the range of the conclusion?

The answer, it turns out for reasons explained below, is that increasing the sample size can indeed narrow the range of results, but that the increase in the sample size has to be very large—so large, in fact, that it may prove to be too costly and time consuming to implement.

For example, if we were dealing with a sample of 100 students in a study of their skills on a test and if the standard deviation of the list of results in our sample was, say, 16 marks, then we would calculate the standard error by dividing the standard deviation by the square root of the number in the sample, that is, 16 divided by the square root of 100, or 16 divided by 10, or 1.6.  Thus, in estimating the confidence intervals for the entire population of students, we would be using the figure of 1.6 marks as the basis of our intervals to calculate the ranges for .68, .95, and .99 probability.

Now, if we wanted a narrower range, in order to have a more precise result, we would like to reduce the standard error (thus having a smaller interval).  One way we might like to do this is to increase the size of the sample.  If we increase its size, then we increase the size of its square root and therefore diminish the standard error (which is produced by dividing the standard deviation by the square root of the number in the sample).

However, since we are dealing with the square root of the number in the sample, we will have to increase the sample size considerably.  For instance, in the example above we dealt with a sample of 100 students and achieved a standard error of 1.6 by dividing the standard deviation of the sample, 16, by the square root of 100, or 10.  If we wanted to reduce the standard error by half, we would have to divide 16 by 20.  And to be able to do this we would have to sample 400 students (the square root of 400 is 20).
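The arithmetic in this example can be checked with a few lines of Python.  This is a minimal sketch; the function names are my own and are used only for illustration.

```python
import math


def standard_error(sd, n):
    """Standard deviation of the sample divided by the square root of n."""
    return sd / math.sqrt(n)


def sample_size_for(sd, target_se):
    """Smallest sample size that gives a standard error of target_se or less.

    Rearranges SE = sd / sqrt(n) into n = (sd / SE) squared.
    The inner round() only guards against floating-point noise.
    """
    return math.ceil(round((sd / target_se) ** 2, 9))


# Standard deviation of 16 marks, as in the example above.
for n in (100, 400, 1600):
    print(n, "students -> standard error", standard_error(16, n))

# Sample size needed to halve the standard error from 1.6 to 0.8.
print(sample_size_for(16, 0.8))
```

Each halving of the standard error requires four times as many observations, which is why the gain in precision quickly becomes expensive.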

What this means, in effect, is that in many cases it is not worth the effort to increase the sample size in order to achieve more precise results.  Since selecting the sample information is the really time consuming part of the analysis, it is generally more efficient to keep the sample relatively small (provided it is over 30) and to concentrate on making it the best sample we can achieve (i.e., least liable to bias).

This is not to say, of course, that the size of the sample is irrelevant.  Obviously, that is not the case.  Increasing the size of the sample does reduce the standard error and thus makes the conclusions more precise.  In fact, mathematicians have drawn up guidelines as to the most appropriate sample sizes, relative to the size of the larger population the sample is intended to represent and to the level of sampling error one is prepared to allow.

In this module, as mentioned before, we are not dealing with the complex rules for proper sampling strategies (other than the few remarks made previously in this section).  So we are not concerning ourselves with the problems of sampling error.  In the various examples we work through, we shall assume that the sample is a good one and will not take the sampling error into account (as we should if we were being statistically diligent).

However, for interest only, you might like to see a list of the recommended sample sizes for different populations.  The table below, from a book on surveys, indicates some recommended sample sizes:

 

 

Recommended Sample Sizes for Different Populations and Permissible Sampling Error

Sampling Error                      Population Size
Allowed (percent)      500     1000    10,000    100,000    1 million

±10                     83       91        99        100          100
±5                     222      286       385        398          400
±4                     250      385       588        621          625
±3                     250      500      1000       1099         1111
±2                     250      500      2000       2439         2500
±1                     250      500      5000       9091       10,000

P. Working Through An Example

Let us review one more time the steps in making confidence generalizations about an entire population from a single sample.

1.      First we select a sampling strategy (normally using random sampling when the total population is suitable for this process), select our sample (making sure we have at least 30 separate observations in it), and collect the information.

2.      Then, we enter the data on a spreadsheet (like Excel) and apply the Descriptive Statistics tool in order to ascertain the mean and the standard error of the sample.

3.      Finally, we make our conclusions at different confidence levels: 68 percent for a range within 1 standard error of the mean of our sample (above and below), 95 percent for a range within 2 standard errors of the mean of the sample, and 99 percent for a range within 3 standard errors of the mean of the sample.

Suppose I wish to know (for purposes of comparison) the average score for all first-year university students in British Columbia on a standard intelligence quotient (IQ) test.  Going through the steps outlined above, I complete steps 1 and 2 for a sample of 100 students.  The mean score of the sample is 112; the standard deviation is 12 points.

From these two figures I can compute the standard error (the standard deviation divided by the square root of the number in the sample): that comes out to 12 divided by 10 or 1.2 points.

Now, I can make my conclusion at different confidence levels, as follows:

1.      I am 68 percent certain that the average IQ score on this test for all first-year university students in BC is within a range 1 standard error on either side of the mean of my sample, that is, between 110.8 and 113.2.

2.      I am 95 percent certain that the average IQ score on this test for all first-year students in BC is within a range 2 standard errors on either side of the mean of my sample, that is, between 109.6 and 114.4.

3.      I am 99 percent certain that the average IQ score on this test for all first-year students in BC is within a range 3 standard errors on either side of the mean of my sample, that is, between 108.4 and 115.6.
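For readers who would like to check these figures outside Excel, the same three steps can be sketched in Python.  The function name and the dictionary of levels are my own choices for illustration; the rule of 1, 2, and 3 standard errors is the one used throughout this section.

```python
import math

# Confidence level (percent) -> number of standard errors on either side.
LEVELS = {68: 1, 95: 2, 99: 3}


def confidence_ranges(mean, sd, n):
    """Return {level: (low, high)} for the population mean.

    mean and sd are the mean and standard deviation of the sample;
    n is the number of observations in the sample.
    """
    standard_error = sd / math.sqrt(n)
    return {
        level: (round(mean - k * standard_error, 2),
                round(mean + k * standard_error, 2))
        for level, k in LEVELS.items()
    }


# The IQ example above: mean 112, standard deviation 12, sample of 100.
for level, (low, high) in confidence_ranges(112, 12, 100).items():
    print(f"{level} percent: between {low} and {high}")
```

The same function can be used to check the answers to the self-test problems in this section.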

Q. Self-Test on Confidence Levels

Using the method outlined immediately above, try the two following problems.

1.      We want to know the average pulse rate in a population of 1000 track athletes.  We sample the pulse rates of 50 athletes taken at random and calculate the mean pulse rate of the sample to be 79.1 beats per minute, with a standard deviation of 7.6 beats per minute.  What can we conclude about the mean value (in beats per minute) for the entire population of athletes?  State your conclusion at three different confidence levels (at .68, .95, and .99 probability).

2.      A sample study of family incomes in Canada revealed the following: sample size, 1600; mean family income of the sample, $51,300; standard deviation of the sample, $8000.  What can you infer about the mean family income for the entire population at a confidence level of 95 percent?

R. Using a Table to Read for any Level of Confidence

Up to this point we have only dealt with three confidence levels: 68 percent (or .68 probability), 95 percent (or .95 probability), and 99 percent (or .99 probability).  We used these because they correspond to the ranges defined by 1, 2, and 3 standard deviations away from the mean (something we learned in the previous chapter).

In practice, however, we are not limited to just these three figures.  We can establish any level of confidence we want.  But we will need to know how many standard deviations are represented by the particular level we choose, so that we can define the range properly.

If this puzzles you, let us go through the point step by step, as follows:

1.      A normal curve (the shape of a normal distribution) indicates the relative frequencies of all the values in the population we are studying.  Thus, we can imagine the area under the top line of the curve as representing the entire population.

2.      If we think of the population under the curve as an area, then we can see clearly that in a normal distribution the total population is divided in half by the mean.  There is thus a .5 probability in any normally distributed population that a particular value will fall in the area to the right of the mean (i.e., in the upper values), and a .5 probability that any particular member of the population will fall to the left of the mean (i.e., in the lower half of the values).

3.      In the previous chapter, we discussed how in the normally distributed curve, the area under the curve is always divided in the same way by units of standard deviation: 68 percent of the total population falls within 1 standard deviation of the mean (34 percent on either side); 95 percent of the population falls within 2 standard deviations of the mean (47.5 percent on either side); and 99 percent of the total population falls within 3 standard deviations of the mean (49.5 percent on either side).

4.      But clearly we are not confined to just 1, 2, or 3 standard deviations.  There are all sorts of possibilities in between them (e.g., 1.2 standard deviations, 0.7 standard deviations, and so on).  And each of these will define a different area under the normal curve.  And each area, so defined, will include its own percentage of the total population (and thus establish its own confidence level).

5.      Now, the mathematics of calculating areas under the normal curve for all distances away from the mean is complex and laborious.  Fortunately, however, mathematicians have created tables for us, using which we can simply read off particular distances from the mean and their corresponding areas.  Thus, we can easily determine what level of confidence we want and find the distance appropriate to it.

Below is an example of such a table.  It indicates in the extreme left-hand column the distance away from the mean in standard deviation units (which is, as we mentioned before, the z-score) for one half the curve (this item is important, as we shall see).  The table ranges from a distance of 0.00 from the mean (i.e., the mean itself) to 3.09, just over three standard deviations away.  As we move down the left-hand column we move in increments of .1.

The other columns indicate the next decimal place for that particular z-score.  Thus, in the first line, the area under the normal curve at a z-score of 0.00 is .0000.  This means that when we are exactly on the mean, the area under the curve is 0.  If we move to the next column (to the right), the z-score is 0.01, and the corresponding area under the curve between the mean and this distance away from it is 0.0040.  Since we are dealing with only one half the curve, the total area under the curve defined by a z-score of 0.01 on both sides of the mean is twice the given value, 0.008 (that is, 0.8 percent of the total area under the curve is within a z-score of 0.01 on either side of the mean).

If you check now the area figure for a z-score of 1.00 you will notice that it reads .3413.  This means that of all the scores under the curve 34.13 percent of them will fall between the mean and a z-score of 1 on one side of the curve.  If we want to include all the scores within 1 standard deviation of the mean on both sides, then we would double this figure (i.e., to 68.26 percent).  We have been using the figure 68 percent as a convenient approximation of that value.

This table may at first look somewhat confusing, but read over the paragraphs above (consulting the table) a few times until you are familiar with what these numbers mean.

Table Showing Area Under the Normal Curve at Different z-Scores
(Note that this table is for only one half of the normal curve)

z-score    .00     .01     .02     .03     .04     .05     .06     .07     .08     .09

0.0       .0000   .0040   .0080   .0120   .0160   .0199   .0239   .0279   .0319   .0359
0.1       .0398   .0438   .0478   .0517   .0557   .0596   .0636   .0675   .0714   .0753
0.2       .0793   .0832   .0871   .0910   .0948   .0987   .1026   .1064   .1103   .1141
0.3       .1179   .1217   .1255   .1293   .1331   .1368   .1406   .1443   .1480   .1517
0.4       .1554   .1591   .1628   .1664   .1700   .1736   .1772   .1808   .1844   .1879
0.5       .1915   .1950   .1985   .2019   .2054   .2088   .2123   .2157   .2190   .2224

0.6       .2257   .2291   .2324   .2357   .2389   .2422   .2454   .2486   .2517   .2549
0.7       .2580   .2611   .2642   .2673   .2704   .2734   .2764   .2794   .2823   .2852
0.8       .2881   .2910   .2939   .2967   .2995   .3023   .3051   .3078   .3106   .3133
0.9       .3159   .3186   .3212   .3238   .3264   .3289   .3315   .3340   .3365   .3389
1.0       .3413   .3438   .3461   .3485   .3508   .3531   .3554   .3577   .3599   .3621

1.1       .3643   .3665   .3686   .3708   .3729   .3749   .3770   .3790   .3810   .3830
1.2       .3849   .3869   .3888   .3907   .3925   .3944   .3962   .3980   .3997   .4015
1.3       .4032   .4049   .4066   .4082   .4099   .4115   .4131   .4147   .4162   .4177
1.4       .4192   .4207   .4222   .4236   .4251   .4265   .4279   .4292   .4306   .4319
1.5       .4332   .4345   .4357   .4370   .4382   .4394   .4406   .4418   .4429   .4441

1.6       .4452   .4463   .4474   .4484   .4495   .4505   .4515   .4525   .4535   .4545
1.7       .4554   .4564   .4573   .4582   .4591   .4599   .4608   .4616   .4625   .4633
1.8       .4641   .4649   .4656   .4664   .4671   .4678   .4686   .4693   .4699   .4706
1.9       .4713   .4719   .4726   .4732   .4738   .4744   .4750   .4756   .4761   .4767
2.0       .4772   .4778   .4783   .4788   .4793   .4798   .4803   .4808   .4812   .4817

2.1       .4821   .4826   .4830   .4834   .4838   .4842   .4846   .4850   .4854   .4857
2.2       .4861   .4864   .4868   .4871   .4875   .4878   .4881   .4884   .4887   .4890
2.3       .4893   .4896   .4898   .4901   .4904   .4906   .4909   .4911   .4913   .4916
2.4       .4918   .4920   .4922   .4925   .4927   .4929   .4931   .4932   .4934   .4936
2.5       .4938   .4940   .4941   .4943   .4945   .4946   .4948   .4949   .4951   .4952

2.6       .4953   .4955   .4956   .4957   .4959   .4960   .4961   .4962   .4963   .4964
2.7       .4965   .4966   .4967   .4968   .4969   .4970   .4971   .4972   .4973   .4974
2.8       .4974   .4975   .4976   .4977   .4977   .4978   .4979   .4979   .4980   .4981
2.9       .4981   .4982   .4982   .4983   .4984   .4984   .4985   .4985   .4986   .4986
3.0       .4987   .4987   .4987   .4988   .4988   .4989   .4989   .4989   .4990   .4990
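The entries in a table of this kind can also be calculated directly, for readers who are curious.  The Python sketch below is one way of doing so, using only the standard library; the function names are my own and are used only for illustration.  The first function gives the area between the mean and a given z-score (one half of the curve, as in the table); the second works in the other direction and finds the z-score that corresponds to any chosen two-sided confidence level.

```python
import math
from statistics import NormalDist


def area_mean_to_z(z):
    """Area under the normal curve between the mean and z (one side only)."""
    return 0.5 * math.erf(z / math.sqrt(2))


def z_for_confidence(level):
    """z-score whose range on BOTH sides of the mean contains `level`.

    level is a probability such as 0.95.  Half of it lies on each side of
    the mean, so we look up the point with 0.5 + level / 2 of the area
    to its left.
    """
    return NormalDist().inv_cdf(0.5 + level / 2)


print(round(area_mean_to_z(1.00), 4))      # one-sided area at z = 1.00
print(round(2 * area_mean_to_z(1.00), 4))  # both sides of the mean
print(round(z_for_confidence(0.95), 2))    # z-score for 95 percent
print(round(z_for_confidence(0.90), 2))    # z-score for 90 percent
```

Note that the exact z-score for 95 percent confidence is slightly less than 2; the figure of 2 standard errors used in this section is a convenient approximation, just as 68 percent is a convenient approximation for 68.26 percent.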

S. Answers to Self-Test Sections

Answer to Self-Test on Estimating the Population Average from a Sample

To find the Standard Error we divide the Standard Deviation of the Sample (2.01 hr) by the square root of the number in the sample.  The square root of 50 is 7.07.  Therefore the Standard Error is 2.01 hr divided by 7.07 or .28 hr.

Therefore, I can make the following conclusions about the average time that elementary school children in School District 68 spend on organized recreational exercise out of school:

I am 68 percent certain that the average time falls between (2.46 + .28) and (2.46 - .28) or between 2.74 hr and 2.18 hr.  I am 95 percent certain that the average time falls between 3.02 hr and 1.90 hr.  And I am 99 percent certain that the average time falls between 3.30 hr and 1.62 hr.

Notice here that the more confident I wish to be, the wider the range of values I have to accept.

Answer to the Self-Test on Confidence Levels (Section Q)

1.      The Standard Error of the sample is the Standard Deviation divided by the square root of the number in the sample, that is, 7.6 divided by the square root of 50, or 7.6 divided by 7.07, or 1.07 beats per minute.  Thus, I can conclude the following about the population of 1000 athletes: I am 68 percent certain (or the probability is .68) that the average pulse rate is between (79.1 + 1.07) and (79.1 - 1.07) or between 80.17 and 78.03 beats per minute.  I am 95 percent certain (or the probability is .95) that the average pulse rate is between 81.24 and 76.96 beats per minute.  And I am 99 percent certain (or the probability is .99) that the average pulse rate is between 82.31 and 75.89 beats per minute.

2.      The Standard Error of the sample is the Standard Deviation ($8000) divided by the square root of the number in the sample (1600), that is, 8000 divided by 40, or 200 dollars.  Thus, I can be 95 percent certain that the average family income is between $51,700 and $50,900.


Notes to Section Six

(1) A well-known example of the sort of bias which can occur in non-random sampling is Shere Hite’s book Women and Love.  The author mailed out 100,000 questionnaires to women’s organizations (a Haphazard or Opportunity sample).  Only 4.5 percent were filled out and returned, so that the results were biased in favour of women who belong to such organizations and who were sufficiently motivated to respond.

(2) This very important property holds whether or not the population from which the samples are taken is normally distributed.  Provided the samples are reasonably large (over 30, as noted in this section), the frequency distribution of the Sample Means from any population will closely follow a normal distribution.

