Christine Case, Skyline College


Biology 230

Hypothesis Testing


Choosing an Appropriate Level of Significance for Hypothesis Testing

When conducting research in any of the sciences, you often need to test a hypothesis using the data you have gathered. An important part of any hypothesis test is choosing a level of significance. Quite often, major decisions are based on the results of a hypothesis test, and therefore one must be careful in choosing an appropriate value. However, before we discuss choosing a level of significance, let's briefly review exactly what a hypothesis test is, and the types of errors that can occur.

A hypothesis test is a way of using statistical data to test a theory or answer a question. In a hypothesis test, you have two hypotheses: one is called the null hypothesis, and the other is called the alternate hypothesis. The term null hypothesis is typically used for the hypothesis that expresses some kind of equality (e.g., the mean age of the male deer population is 4.2 years, or the number of broad-tailed hummingbirds observed in August 1998 is equal to the number observed in August 1994). The alternate hypothesis is phrased as some type of inequality (e.g., the mean age of the male deer population is greater than 4.2 years, or the number of broad-tailed hummingbirds observed in 1998 is different from the number observed in 1994).

Of course, by the very nature of the way the hypotheses are worded, the data can support only one hypothesis. The hypothesis test concludes with the acceptance of one of the hypotheses and the rejection of the other. But what if our data lead us to a false conclusion? If that is the case, we have just made an error.

There are two types of errors associated with a hypothesis test: a Type I error and a Type II error. A Type I error occurs if we erroneously reject a true null hypothesis. A Type II error occurs when we accept a false null hypothesis. How can this happen? Well... let's take a look.

Suppose someone tells you that he is an expert dart thrower. In fact, he claims that he can hit the bullseye of a dartboard, standing 15 paces from the board, 98% of the time. So the null hypothesis is that he hits the bullseye 98% of the time, and the alternate would be that he hits the bullseye less than 98% of the time. He then walks 15 paces from a board, throws 3 darts, and all of them miss the bullseye. If this happened, we would probably reject his hypothesis, and figure he is not a very good dartsman at all. But it is possible that he was telling the truth. It could be that our sample (in this case, the 3 darts) was just a poor sample and in fact he truly is an expert dartsman. If that were the case, then our rejection of the "null" hypothesis was in error. We have just committed a Type I error.

Similarly, suppose another person makes the same claim, and then walks over and hits 3 bullseyes in a row. We would accept this person's claim. But it could be that this person just made 3 extremely lucky shots and, in reality, is just an average dart player. If that were the case, we accepted a false null hypothesis and committed a Type II error.
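The two dart scenarios can be put into rough numbers. The Python sketch below computes how likely three straight misses are if the 98% claim is true, and how likely three straight bullseyes are for a weaker player. The 10% bullseye rate for an "average" player is an assumption made only for illustration, as is the assumption that the throws are independent.

```python
# Probability of the two dart outcomes, assuming independent throws.

claimed_rate = 0.98   # bullseye rate stated in the null hypothesis
average_rate = 0.10   # hypothetical rate for an average player (assumption)
throws = 3

# If the claim is true, the chance that all three darts miss.
p_three_misses_if_expert = (1 - claimed_rate) ** throws

# If the player is only average, the chance that all three darts hit.
p_three_hits_if_average = average_rate ** throws

print(f"P(3 misses | claim true)     = {p_three_misses_if_expert:.6f}")
print(f"P(3 hits   | average player) = {p_three_hits_if_average:.6f}")
```

Under these assumptions the first outcome happens about 8 times in a million and the second about once in a thousand. Both are rare, but neither is impossible, which is why either type of error can occur.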

Of course, we don't want to commit either type of error. Scientists and statisticians always hope that the conclusion they have come to is the truth. Therefore, we must take steps to minimize the probability of an erroneous conclusion. This is where the level of significance comes into play.

The level of significance, denoted by the Greek letter α (alpha), is the probability of committing a Type I error. As a scientist conducting a hypothesis test, you are free to choose the level of significance. Typical values of α are 0.01 and 0.05. Since α represents the probability of a Type I error, it may seem that we should always choose 0.01. But we must keep in mind that, for a given sample size, decreasing the chance of a Type I error increases the chance of a Type II error. The decision ultimately rests on the question: which type of error is worse? There is no single answer. It depends on your study and on the consequences of an error.
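The trade-off between the two error types can be illustrated with the dart thrower. The sketch below assumes 100 independent throws, a null hypothesis of a 2% miss rate (the 98% claim), and a hypothetical true miss rate of 10% for a player who is not an expert. The function names and the 10% figure are illustrative assumptions, not part of any standard library or of the original study design.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def critical_misses(n, miss_rate_null, alpha):
    """Smallest miss count k with P(misses >= k | null) <= alpha."""
    for k in range(n + 2):
        if upper_tail(k, n, miss_rate_null) <= alpha:
            return k

def type_ii_probability(n, miss_rate_null, miss_rate_true, alpha):
    """P(fail to reject the null) when the true miss rate is miss_rate_true."""
    k = critical_misses(n, miss_rate_null, alpha)
    return 1 - upper_tail(k, n, miss_rate_true)

n_throws = 100
miss_rate_null = 0.02   # the claim: bullseye 98% of the time
miss_rate_true = 0.10   # hypothetical non-expert (assumption)

beta_01 = type_ii_probability(n_throws, miss_rate_null, miss_rate_true, 0.01)
beta_05 = type_ii_probability(n_throws, miss_rate_null, miss_rate_true, 0.05)

print(f"alpha = 0.01 -> P(Type II error) = {beta_01:.4f}")
print(f"alpha = 0.05 -> P(Type II error) = {beta_05:.4f}")
```

With the sample size held fixed, the stricter level of significance requires more misses before the claim is rejected, so a non-expert is more likely to slip through.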

For example, suppose we were doing a study examining the effectiveness of an air filtration system. ABC Air Filtration company claims that its air filtration system removes 98% of the dust and pollen in the air. So we want to test the claim against the alternate hypothesis -- that is, against the hypothesis that the system removes less than 98% of the dust and pollen. If we were working for a consumer magazine, we might let α = 0.01. Now think about this. If α = 0.01, there is a 1% chance that we would reject the claim even though the system did in fact live up to it. In other words, it is very unlikely that we would reject the claim if it is true (one can easily imagine that an editor of a magazine might be very concerned about rejecting a company's legitimate claim).

However, if you were doing this hypothesis test because hundreds of your employees were complaining of allergies, you might choose a higher level of significance (probably 0.05), since, if you are going to make an error, you would rather err on the side of protecting your employees. After all, if the study leads you to reject the company's claim, you will most likely search for a better filtration system. Keep in mind, of course, that either way the probability of wrongly rejecting a true claim is small. It is just a matter of your priorities.
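To see how the choice of level can change the conclusion, here is a minimal sketch of a one-sided exact binomial test of the 98% claim. The data are invented for illustration: 200 particles tracked, 9 of which escape the filter. Treating particles as independent trials is also a simplifying assumption, and the helper function name is hypothetical.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1 - below

# Hypothetical data (not from any real study).
n_particles = 200
n_escaped = 9
claimed_escape_rate = 0.02  # null hypothesis: the filter removes 98%

# One-sided p-value: chance of 9 or more escapes if the claim is true.
p_value = prob_at_least(n_escaped, n_particles, claimed_escape_rate)

# Apply the same p-value to both levels of significance.
decisions = {alpha: p_value <= alpha for alpha in (0.01, 0.05)}

for alpha, reject in decisions.items():
    verdict = "reject the claim" if reject else "do not reject the claim"
    print(f"alpha = {alpha}: p-value = {p_value:.4f} -> {verdict}")
```

With these made-up numbers the p-value is roughly 0.02, so the magazine editor working at 0.01 would not reject the claim, while the employer working at 0.05 would.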

Note that the level of significance should be chosen before you collect the data. The quantity and/or quality of the data should not influence your decision. If you did allow the data to dictate α, then you have just allowed bias to enter the study (it would be similar to laying your bet on the roulette table after the ball has landed on a number). Think about an appropriate value, choose it, and then collect and analyze your data.

In closing, it should also be noted that the level of significance is only one of the things that affect the probability of making an error. The sample size also plays a major role. One should generally work with as large a sample as is feasible. Looking at the earlier example of the dart throwers, if we had asked each person to throw the dart 100 times, the likelihood of committing an error would have been greatly reduced. Of course, in reality, time, finances, and other considerations often dictate the sample size. Good luck with your research.
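The effect of sample size on the dart example can be sketched the same way. The code below compares 3 throws with 100 throws at a 0.05 level of significance, again assuming independent throws and a hypothetical 10% miss rate for a player who is not an expert. The function names and default values are illustrative assumptions.

```python
from math import comb

def tail_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def error_rates(n, alpha=0.05, miss_null=0.02, miss_true=0.10):
    """Return (Type I, Type II) probabilities for a test based on n throws.

    The test rejects the 98% claim once the number of misses reaches the
    smallest cutoff whose Type I probability does not exceed alpha.
    miss_true is a hypothetical miss rate for a non-expert (assumption).
    """
    cutoff = next(k for k in range(n + 2)
                  if tail_at_least(k, n, miss_null) <= alpha)
    type_i = tail_at_least(cutoff, n, miss_null)
    type_ii = 1 - tail_at_least(cutoff, n, miss_true)
    return type_i, type_ii

rates = {n: error_rates(n) for n in (3, 100)}

for n, (type_i, type_ii) in rates.items():
    print(f"n = {n:3d}: P(Type I) = {type_i:.4f}, P(Type II) = {type_ii:.4f}")
```

Under these assumptions the Type I probability stays at or below 0.05 for both sample sizes, while the Type II probability falls from about 0.97 with 3 throws to well under 0.1 with 100 throws.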

By Greg Allen