ISED 160 NOTES ABOUT HOMEWORK, ANNOUNCEMENTS, ETC.

 

Assigned on:

BE SURE TO CLICK ON RELOAD/REFRESH ON YOUR COMPUTER OR THE CURRENT ADDITIONS TO THE PAGE MAY NOT APPEAR!

You may also not see current pages if your computer does not have an up-to-date browser; download a new version or use a library/lab computer.

Scroll down as new assignments are added to the old. New assignments are generally posted by 2:00 pm of the lecture day unless otherwise noted.

 

 

 

Most current work is listed first, followed by previous entries:

 

T 12/06

Test #5 will occur as scheduled on Thursday 12/08 and will consist of one page of short answer questions (from 10.1, 10.2, 10.3, 11.1 about use of the tables in accepting and rejecting hypotheses with classical and p-value methods, and forming of hypotheses and sentence writing), and one page with 2 word problems (z test, t test) to perform a complete significance test. See below notes from previous classes for examples.

 

Note: Test 5 will occur on 12/08 as scheduled and will not be a dropped grade. Students who take Test 5 as scheduled will be done with the course (no comprehensive final).

If you miss test #5, you must email me before noon Friday 12/09 and respond to my follow-up emails that day so that we will both know what to expect the following week (finals week). You will then perform a more difficult comprehensive test during finals week to take the place of test 5 only. If you do not contact me by noon Friday 12/09 or fail to show up for your new test appointment during finals week, you will receive a zero score for test 5.

This final is only offered to give those missing Test 5 a chance to avoid a zero score for that test. It will not be offered as a device to raise grades from the rest of the semester.

 

Th 12/01

Next week is our last week, so here are some examples (old and new material) to guide you in this all-important stretch run. I'm hoping for lots of great test 5 grades!

 

ANSWERS TO TWO OF LAST HMK PROBLEMS:

(Decisions done using classical and p-value methods)

10.3 p487 #10

The standardized sample t is 1.11. From row 12 of the t-table, with all of alpha (0.10) in the right tail, look up in column 0.10 and find the critical value 1.356. Since the sample t of 1.11 is closer to center than 1.356, accept the null hypothesis.

Look in row 12 for the closest values to the sample t of 1.11 and see

1.083 < 1.11 < 1.356, so the p value is between 0.15 and 0.10 (from the top row of the table). Then p is greater than the given alpha of 0.10, so accept the null hypothesis.

 

10.3 p487 #11

The standardized sample t is -3.11. From row 34 of the t-table, with half of alpha (0.005) in each tail, look up in column 0.005 and find the critical value -2.728. Reject the null hypothesis.

Look in row 34 for the closest values to the sample t of -3.11 and see

3.002 < 3.11 < 3.348, so the p value is between 2(0.0025) and 2(0.001) (from the top row of the table and using the fact that the hypothesis is two-sided). So 0.002 < p < 0.005; p is less than the given alpha of 0.01, so reject the null hypothesis.
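The bracketing used in the two answers above can be sketched in a few lines of Python (a rough sketch only; the tail-area/t-value pairs for row 34 below are copied from a standard t table and should be treated as an assumption):

```python
# One row of a standard t table: (tail area, t value) pairs, assumed here.
row34 = [(0.10, 1.307), (0.05, 1.691), (0.025, 2.032), (0.01, 2.441),
         (0.005, 2.728), (0.0025, 3.002), (0.001, 3.348)]

def bracket_p(t_stat, row, two_sided=False):
    """Bracket |t| between neighboring table entries; return (low, high) bounds on p."""
    t = abs(t_stat)
    factor = 2 if two_sided else 1          # double the tail areas for a two-sided test
    for (a_hi, t_lo), (a_lo, t_hi) in zip(row, row[1:]):
        if t_lo < t < t_hi:
            return (factor * a_lo, factor * a_hi)
    return None                              # off the table: handle separately

print(bracket_p(-3.11, row34, two_sided=True))   # (0.002, 0.005), as in #11
```

The same function with row 12 and t = 1.11 reproduces the one-sided bounds in #10.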

 

EXAMPLES of t table use and t word problems:

1. Find an estimate for the p value for a one-sided t test with 0.01 level of significance, n= 20 and sample t= 2.34, and would you choose to reject or accept the null hypothesis?

Answer: Using row 19 of the t table with a tail area of 0.01, the estimate for the p value for method 2 is 0.01 < p < 0.02 since 2.205 < 2.34 < 2.539. Accept the null hypothesis since p > alpha.

 

2. Find an estimate for the p value for a two-sided t test with 0.05 level of significance, n=36 and sample t= 1.98, and would you choose to reject or accept the null hypothesis?

Answer: Using row 35 of the t table, the estimate for the p value for method 2 is 0.05 < p < 0.10 since 1.690 < 1.98 < 2.030 (areas doubled for the two-sided test). Accept the null hypothesis.

 

3. Find an estimate for the p value for a one-sided t test with n=40 and sample t=3.6? Without a given level of significance, what can you say about rejecting or accepting the null hypothesis just based on your estimate of the p value?

Answer: Using row 39 of the t table, the estimate for the p value for method 2 is p < 0.0005 since the sample t of 3.6 > 3.558. Reject the null hypothesis, since the p value is smaller than any reasonable alpha.

 

4. Perform a complete t test: To find out if it seems reasonable that the local town library is lending an average of 4.2 books per patron, a random sample of 13 people was taken and yielded an average of 4.75 with std. deviation of 1.65 books. Test at the 0.10 level.

Answer: The alternate hypothesis is that m is not equal to 4.2, alpha is 0.10, and we compute sample t = 1.20.

Classical method: Using row n-1 = 12 of the t table with tail areas of 0.05 (half of alpha in each tail), the critical value is 1.782.

p-value method: the estimate for the p value is 0.20 < p < 0.30 (double the tail areas) since 1.083 < sample t of 1.20 < 1.356.

Accept the null hypothesis. We have not found any evidence that the library is lending a different avg. number of books than 4.2 per person.
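As a quick check of the arithmetic in problem 4, the sample t can be computed directly from the numbers given (a sketch, nothing beyond the formula t = (xbar - mu)/(s/sqrt(n))):

```python
import math

# Library example: hypothesized mean 4.2, sample mean 4.75, s = 1.65, n = 13.
mu0, xbar, s, n = 4.2, 4.75, 1.65, 13
t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 2))   # 1.2, matching the sample t = 1.20 above
```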

 

5. Perform a complete t test: A kind of alfalfa is advertised as having an average yield of 2.0 tons per acre. The contention by the Farmers' Association is that the true average yield for this kind of alfalfa is less than 2.0 tons per acre. The yield of alfalfa from six test plots gives an average of 1.6 tons per acre with std. deviation of 0.43. Test at the 0.05 level of significance.

Answer: The alternate hypothesis is that m<2, alpha is 0.05, and we compute sample t= -2.28.

Classical method: Using row 5 of the t table with tail area of 0.05, the critical value is -2.015 (all of alpha in the left tail).

p-value method: the estimate for the p value is 0.025 < p < 0.05 since 2.015 < sample t of 2.28 < 2.571.

Reject the null hypothesis. We have found evidence that this kind of alfalfa actually yields less than 2 tons per acre.

 

READING for Tuesday's lecture:

We will take a look at section 11.1 p509-513. The matched pairs problems allow you to compare two sets of data, but are still t tests with one calculation as in 10.3. The only different part is how the hypotheses are stated. The data from one group are subtracted from the data in the matched group (before and after studies are a good matched pairs model). The null hypothesis states that there is no difference (average difference = 0) between the two groups: Ho: m = 0. The alternate hypothesis shows a difference of some kind, either m < 0, m > 0, or m is not equal to 0 depending on the phrasing of the word problem and the order in which the differences are found.

For example, suppose we wish to find out if a new fertilizer makes tomatoes give larger yields, and we give plot A the fertilizer but do not give plot B. We assume that there will be no difference between the two, so

Ho: m = 0

If we want to show that plot A tomato yields are larger, H1 depends on how the differences are computed. If we subtract the yields in order of plot A – plot B, then

H1: m > 0 because larger numbers minus smaller numbers will give a positive number (> 0).

But if we subtract the yields in order of plot B – plot A, then

H1: m < 0 because smaller numbers minus larger numbers will give a negative number (< 0).

 

Matched pairs word problem example: An agricultural field trial compares the yield of two varieties of tomatoes for commercial use. The researchers divide each of 11 small plots of land in different locations in half (one half gets variety A and the other gets variety B) and compare the yields in pounds per plant at each location. The 11 differences (variety A minus variety B) give an average of 0.54 and std. deviation of 0.83. Is there evidence at the 0.05 level of significance that variety A has a higher yield than variety B? (Assume differences computed by A yield minus B yield.)

Answer: The null hypothesis is that m = 0 and the alternate hypothesis is that m > 0. We compute sample t=2.16. Using row n-1=10 of the t table with right tail area of 0.05 the critical value is 1.812 and the estimate for the p value is 0.025 < p < 0.05 since 1.812 < 2.16 < 2.228. Either way, we reject the null hypothesis. We have found evidence that variety A has a higher yield than variety B.
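The matched-pairs statistic is the same one-sample t formula applied to the differences; as a quick numeric check of the example above (a sketch using only the summary numbers given):

```python
import math

# Tomato trial: mean difference 0.54, std. deviation 0.83, n = 11 pairs,
# hypothesized mean difference 0 under Ho.
dbar, s_d, n = 0.54, 0.83, 11
t = (dbar - 0) / (s_d / math.sqrt(n))
print(round(t, 2))   # 2.16, matching the worked answer
```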

 

HOMEWORK (due Tuesday 12/06)

1. short answer use of t table: Find an estimate for the p value for a two-sided t test with 0.05 level of significance, n= 13 and sample t= 1.85, and would you choose to reject or accept the null hypothesis?

2. short answer use of t table: Find an estimate for the p value for a one-sided t test with n=26 and sample t=0.67. Without a given level of significance, what can you say about rejecting or accepting the null hypothesis just based on your estimate of the p value?

3. full significance test: 10.3 p488 #14,

4. full significance test: 10.3 p488 #16,

5. full significance test: 10.3 p489 #24b (use population mean 7, sample mean 7.01, sample std. deviation 0.0316)

 

Test #5 will occur as scheduled on Thursday 12/08 and will consist of one page of short answer questions (from 10.1, 10.2, 10.3, 11.1 about use of the tables in accepting and rejecting hypotheses with classical and p-value methods, and forming of hypotheses and sentence writing), and one page with 2 word problems (z test, t test) for which you must perform a complete significance test. A few examples of short answer questions are below.

 

Some short-answer questions for practice:

Example: Find the critical value in a one-sided z test, n = 45, sample z = -2.59, alpha 0.01.

Answer: -2.326 (all of alpha goes into the left tail, matching the negative sample z).

Example: a. What is the critical value for a two-sided t-test with n=33, sample t = 3.02 and alpha of 0.01, and b. do you reject or accept the null hypothesis?

Answer: a. With row 32 and an upper tail area of 0.005, since half of alpha goes in each tail, the critical values are + and - 2.738, so b. we would reject the null hypothesis.

Example: Estimate the p value for a one-sided z-test, n = 23, sample z= 1.15 and alpha = 0.10.

Answer: Since this is a z-value problem, go down to the bottom row of the table and see that 1.15 is between 1.036 and 1.282, so the p value is between 0.15 and 0.10.

Example: In the previous example, would you accept or reject the null hypothesis?

Answer: p > alpha, so accept.

Example: Estimate for the p value for a two-sided t-test with n=31 and sample t= 3.75.

Answer: In row 30, our t is off the table to the right, so we know that the p value is smaller than twice the 0.0005 that is above the last table entry, i.e., p < 0.001.

Example: In the previous example, would you accept or reject the null hypothesis?

Answer: the p value is very small, so we would reject the null hypothesis no matter what alpha is given.

Example: Estimate the p value for a two-sided t-test, n=24, sample t=-2.48, alpha is 0.02.

Answer: In row 23, it puts us between 0.01 and 0.02, which we must double because the test is two-sided, so the p value is between 0.02 and 0.04. Accept the null hypothesis since p > alpha.

Example: Write the hypotheses and sentence of conclusion only for the following situation: The average score on the SAT Math exam is 505. A test preparatory company claims that the mean scores of students who take their course is higher than 505. Suppose we reject the null hypothesis.

Answer: Ho: M = 505  H1: M > 505. The company has evidence that students who take their course will, on average, score higher than the 505 of all students who take the SAT Math exam.

 

 

T 11/29

ANSWERS TO SOME OF LAST HMK PROBLEMS:

4. supplemental problem: A researcher believes that the average height of a woman aged 20 years or older is greater now than the 1994 mean of 63.7 inches. She obtains a sample of 45 women and finds the sample mean to be 63.9 inches. Assume a population std. deviation of 3.5 inches and test at the 0.05 level.

ANSWER:

Hypotheses

Ho: m = 63.7

H1: m > 63.7

Level of Significance

 alpha = 0.05

Data and calculations

z = (63.9 - 63.7)/(3.5/sqroot45) = 0.38

Decision (to accept or reject the null hypothesis)

Critical value method: all of alpha (0.05) into the right tail gives a critical z value of +1.645, and the sample z is closer to center than that, so accept Ho.

p-value method: on the z table, 0.38 gives a 0.3520 p value, which is more than the alpha of 0.05, so accept Ho.

Sentence of Conclusion

 The researcher has found no evidence that women 20 years or older are now taller on average than 63.7 inches.

5. supplemental problem: The average daily volume of Dell computer stock in 2000 was 31.8 million shares. A trader wants to know if the volume has changed and takes a random sample of 35 trading days and the mean is found to be 23.5 million shares. Using a population std. deviation of 14.8 million, test at the 0.01 level of significance.

ANSWER:

Hypotheses

Ho: m =31.8

H1: m not equal to 31.8

Level of Significance

 alpha = 0.01

Data and calculations

z=(23.5-31.8)/(14.8/sqroot35)=-3.32

Decision (to accept or reject the null hypothesis)

Critical value method: half of alpha (0.005) into each tail gives critical z values of +/-2.576, and the sample z is farther out than that, so reject Ho.

p-value method: on the z table, 3.32 gives about 0.0005 in each tail for a total p value of 0.001, which is less than the alpha of 0.01, so reject Ho.

Sentence of Conclusion

 The trader has found evidence that the average daily volume of Dell stock has changed from the 2000 value of 31.8 million shares.
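If you want to double-check the Dell problem without the printed table, the z statistic and an exact two-sided p value can be computed directly (a sketch; math.erfc gives the normal tail area):

```python
import math

# Dell volume example: hypothesized mean 31.8, sample mean 23.5,
# population std. deviation 14.8, n = 35.
mu0, xbar, sigma, n = 31.8, 23.5, 14.8, 35
z = (xbar - mu0) / (sigma / math.sqrt(n))
p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p: 2 * P(Z > |z|)
print(round(z, 2), p < 0.01)             # -3.32 True, so reject Ho
```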

 

LECTURE -- section 10.3: Today, we moved on to a refinement of the testing strategy. Instead of having the population std. deviation given "from previous studies", we can rely completely on our sample and use s (the std. deviation of the sample). However, by relying completely on our sample, we have more chance of error, so the normal distribution will have a correction factor depending on the sample size. This means we will have to use a new table with more rows to take care of various sample sizes. There is a copy of this table in the back of your book.

 

[image: the t table]

 

Note that the top row and the bottom row have the numbers you were using in the abbreviated table for looking up critical values for z tests.

 

To use the new part of the table, you take one less than the sample size, df=n-1 and go down to that row instead of down to the bottom where the z values lie. Use the table symmetrically so that it works for negative t values and gives areas in the left tail of the distribution also.

 

EXAMPLES USING THE t-TABLE TO FIND CRITICAL VALUES AND P-VALUE ESTIMATES:

 

1. What is the critical value for a one-sided test with n=20 and alpha =0.05?

ANSWER: df=20-1=19 and that row put together with the column of 0.05 gives a critical value of 1.729

 

2. What would the critical value be for the above situation if it were two-sided instead of one-sided?

ANSWER: In the same row df=19, you would look at the column with area 0.025, since half of the alpha of 0.05 goes into each tail, and this would give you a critical value of 2.093.
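The classical lookups in examples 1 and 2 come down to choosing a column by tail area in row df = 19; a small sketch (the row-19 critical values below are assumed from a standard t table):

```python
# Tail area -> critical value for df = 19 (values assumed from a standard t table).
row19 = {0.10: 1.328, 0.05: 1.729, 0.025: 2.093, 0.01: 2.539, 0.005: 2.861}

def critical_t(alpha, two_sided):
    # A two-sided test puts half of alpha in each tail; one-sided puts it all in one.
    tail = alpha / 2 if two_sided else alpha
    return row19[tail]

print(critical_t(0.05, two_sided=False))  # 1.729, as in example 1
print(critical_t(0.05, two_sided=True))   # 2.093, as in example 2
```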

 

3. Find an estimate for the p value in a one-sided test with n=33 and sample t=0.52.

ANSWER: df=33-1=32, so we look in that row on the new t table to find the next higher and lower numbers with respect to 0.52. But since 0.52 < 0.682 the p value then is greater than the area of 0.25 for the t value of 0.682. That is, p > 0.25.

 

4. Find an estimate for the p value for a two-sided test with n=25 and sample t value of 1.52.

ANSWER: df=25-1=24 and in that row, 1.318 < 1.52 < 1.711 so the right or left tail area for the p value is between 0.10 and 0.05, but we have a two-sided test so we double the areas to get the sum of the left and right tail areas: 0.10 < p < 0.20.

 

5. Problem 5 from your last homework would be the same in section 10.3 except for the values in the decision: Decision (to accept or reject the null hypothesis)

Critical value method: half of alpha (0.005) into each tail and looking in row 34 (one less than the sample size of 35) gives critical t values of +/-2.728 and the sample t is farther out than that, so reject Ho.

p-value method: on the t table in row 34 again, we look for the closest values to 3.32: 3.002 < 3.32 < 3.348, and find a p value estimate of 2(0.001) < p < 2(0.0025), or 0.002 < p < 0.005, which is less than the alpha of 0.01. We reject the null hypothesis.

 

6. in the text 10.3 p487 #7: The standardized sample t is 2.502. From row 22 of the t-table, with half of alpha (0.005) in each tail, look up in column 0.005 and find the critical value 2.819. Accept the null hypothesis. Look in row 22 for the closest values to the sample t of 2.502 and see

2.183 < 2.502 < 2.508, so the p value is between 2(0.02) and 2(0.01) (from the top row of the table and using the fact that the hypothesis is two-sided). So 0.02 < p < 0.04; p is greater than the given alpha of 0.01, so accept the null hypothesis.

 

7. in the text 10.3 p488 #9: The standardized sample t is -1.677. From row 17 of the t-table, with all of alpha (0.05) in the left tail, look up in column 0.05 and find the critical value -1.740. Accept the null hypothesis. Look in row 17 for the closest values to the sample t of 1.677 and see

1.333 < 1.677 < 1.740, so the p value is between 0.05 and 0.10 (from the top row of the table). Then p is greater than the given alpha of 0.05, so accept the null hypothesis.

 

HOMEWORK DUE Thursday 12/01

10.3 p487 #6, 8, 10, 11, 12

 

Th 11/17

PREVIOUS MATERIAL:

At the end of lecture, I presented the results of the lottery trial from after Test #3. The winning numbers for 10/26 were 15, 20, 25, 40, 41 and Mega 20. Here are the results for all 232 practice plays done by students in three classes (categories list the number of regular numbers gotten, and M stands for having gotten the Mega number). It matched expected probabilities well (the ones we learned to calculate before Test 3 and that appear on the back of the playslip). Some of the plays could not be considered "random" because they used the same numbers as other plays but with one difference each time, so that may have thrown the numbers off a little. One student got 4 of the regular numbers and the Mega, but the estimated actual payoff of $1500 would not be much out of a $15 million pot!

 

category                           0      0+M    1      1+M    2      2+M    3      4+M

% student plays                    0.51   0.01   0.37   0.01   0.09   0.00   0.009  0.004

expected %                         0.53   0.02   0.35   0.01   0.07   0.003  0.005  0.000

approx. payoff on $15 million pot  $0     $1     $0     $2     $0     $11    $10    $1500

 

LECTURE:

We worked on the whole significance test in class today. This is the material from section 10.2 in your book. Look at the examples in the section and the following additional examples:

 

(Modified) In-class example (std. deviation changed to 1.5 to give more appealing values!):

Researchers wanted to measure the effect of mothers' alcohol use on the development of the hippocampal region of the brain in adolescents, and find out whether the volume of this portion of their brains would be less than the normal volume of 9.02 cubic cm. They sampled 32 such adolescents and found an average volume of 8.10 cubic cm. Assuming a population std. deviation of 1.5, what could they conclude at the 0.01 level of significance?

Hypotheses:

Ho: m = 9.02

Hi: m < 9.02

Level of Significance:

alpha =  0.01

Data and calculations:

z = (8.10 – 9.02)/(1.5/sqroot32) = –3.47

Decision:

Classical: For a one-sided test, all of alpha (0.01) goes into the left tail, giving a critical z value of -2.326, and the sample z is farther out from the theoretical population mean. Reject Ho.

P-value: Look up the sample z of -3.47 on the z table to get a tail area of 0.0003, so p = 0.0003, which is less than the alpha of 0.01. Reject Ho.

Conclusion:

The researchers have strong evidence that the hippocampus of these adolescents is significantly smaller than 9.02 cubic cm.

 

Example:

Grant is in the market to buy a three-year-old Corvette. Before shopping for the car, he wants to determine what he should expect to pay. According to the blue book, the average price of such a car is $37,500. Grant thinks it is different from this price in his neighborhood, so he visits 15 neighborhood dealers online and finds an average price of $38,246.90. Assuming a population std. deviation of $4100, test his claim at the 0.10 level of significance.

Hypotheses:

Ho: population mean = 37500 

Hi: population mean not equal to 37500 

Level of Significance:

alpha =  0.10

Data and calculations:

z = (38246.90 - 37500)/(4100/sqroot15) = 0.71 

Decision:

Classical: For 0.05 of alpha going in each tail, we find a critical z value of 1.645 and the sample z is not as far out from the theoretical population mean.

P-value: Look up sample z of 0.71 on the z table p  = 2(0.2389) = 0.4778 whereas the alpha was 0.10. By either method, we find that we do not have evidence to reject the null hypothesis, so we accept it.

Conclusion:

Grant does not have any evidence that the mean price of a 3 yr. old Corvette is different from $37,500 in his neighborhood.

 

Example:

According to the U.S. Federal Highway Administration, the mean number of miles driven annually in 1990 was 10,300. Bob believes that people are driving more today than in 1990 and obtains a simple random sample of 20 people and asks them the number of miles they drove last year. Their responses give an average of 12,342 miles. Assuming a std. deviation of 3500 miles, test BobÕs claim at the 0.01 level of significance.

Hypotheses:

Ho: population mean = 10300

Hi: population mean > 10300

Level of Significance:

 alpha =  0.01

Data and calculations:

z = (12342 - 10300)/(3500/sqroot20) = 2.61

Decision:

 The alternate hypothesis is one-sided on the right, the sample z works out to be 2.61, and we reject the null hypothesis. (Classical: alpha of 0.01 gives a critical z value of 2.326 and 2.61 is farther out than that. P-value: on the z table, we look up 2.61 and find p = 0.0045 which is less than the alpha of 0.01. By either of these methods, we reject the null hypothesis).

Conclusion:

 Bob has found significant evidence that people are driving more today than in 1990, when they drove an average of 10,300 miles.
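As a cross-check on Bob's test, the z statistic and an exact right-tail p value can be computed directly (a sketch; math.erfc supplies the normal tail area in place of the z table):

```python
import math

# Bob's driving example: hypothesized mean 10300, sample mean 12342,
# std. deviation 3500, n = 20; the test is one-sided on the right.
mu0, xbar, sigma, n = 10300, 12342, 3500, 20
z = (xbar - mu0) / (sigma / math.sqrt(n))
p = 0.5 * math.erfc(z / math.sqrt(2))    # right-tail area P(Z > z)
print(round(z, 2), round(p, 4))          # 2.61 0.0045, so reject at alpha = 0.01
```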

 

Homework due Tuesday 11/29:

For each of the following word problems, perform a complete significance test with:

--Hypotheses (null and alternate),

--Level of significance (given in problem),

--Sample z calculation,

--Decision to accept or reject the null hypothesis (show using both methods)

--Sentence of conclusion relating back to the original problem.

1. section 10.2 p477 #22 b, c, d

2. section 10.2 p478 #26 b using 6.3 for the sample mean (from avg of table values)

3. section 10.2 p478 #28 b

4. supplemental problem: A researcher believes that the average height of a woman aged 20 years or older is greater now than the 1994 mean of 63.7 inches. She obtains a sample of 45 woman and finds the sample mean to be 63.9 inches. Assume a population std. deviation of 3.5 inches and test at the 0.05 level.

5. supplemental problem: The average daily volume of Dell computer stock in 2000 was 31.8 million shares. A trader wants to know if the volume has changed and takes a random sample of 35 trading days and the mean is found to be 23.5 million shares. Using a population std. deviation of 14.8 million, test at the 0.01 level of significance.

 

HAPPY THANKSGIVING: ENJOY YOUR BREAK... YOU DESERVE IT!

 

T 11/15

READING IN THE TEXT (same as last time):

Ch 10.1 p455-458 (skip type 1 and 2 errors)

10.2 p463-top of 474 (skip objectives 4 and 5)

 

LECTURE:

We have talked about what the level of significance is and how it is used, how to find the z value calculation using the sample data, and how to make a decision to accept or reject Ho. Today, we talked about how to state the hypotheses from word problems and how to write the sentence of conclusion as we went over the hmk.

 

HYPOTHESES:

The null hypothesis, symbolized by Ho, is what is accepted as true for the population mean until evidence to the contrary is found.

The alternate hypothesis, symbolized by H1, is what the investigator or researcher is trying to show (always relating to the number for M used in the null hypothesis). You must form alternate hypotheses based on the intent of the investigator in the problem, not from your own feelings about the situation.

 

CONCLUSION:

You must state what you have found from the sample's evidence, or lack thereof. Write a grammatically complete sentence with the following elements: Tell

1. if you have "found evidence" or "not found evidence" against the null hypothesis,

2. about what (what was the subject of the investigation?),

3. with respect to what number (what was the number in question in the hypotheses?).

 

If you are rejecting the null hypothesis, you have found evidence against the null hypothesis and therefore evidence for your claim in the alternate hypothesis.

If you are accepting the null hypothesis, you have not found evidence against it and therefore have not found evidence to back up your claim in the alternate hypothesis.

  

Example:

An energy official claims that the output of oil per well in the US has increased from the 1998 level of 11.1 barrels per day. Suppose that after she takes a random sample and calculates the sample z value she decides to reject the null hypothesis Ho. Write the hypotheses and the sentence of conclusion.

Answer:

H0 : M =11.1

H1 : M >11.1

The energy official has found evidence that the output of oil per well in the U. S. has increased significantly from the 1998 level of 11.1 barrels per day.

 

Example:

A Muni bus drives a prescribed route, and the supervisor wants to know whether the average arrival interval for buses on this route is 28 minutes. Suppose that after we calculate the sample z value the data causes the supervisor to accept the null hypothesis. Write the hypotheses and the sentence of conclusion.

Answer:

H0 : M = 28

H1 : M is not equal to 28

The supervisor has found no evidence that the average run arrival time for buses on this route is significantly different from 28 minutes.

 

Example:

A manufacturer produces a paint which takes 20 minutes to dry. He wants to make changes in the composition to get nicer colors, but not if it increases the drying time needed. Suppose that after he calculates the sample z value the data causes him to reject the null hypothesis. Write the hypotheses and the sentence of conclusion.

Answer:

H0 : M = 20

H1 : M > 20

The manufacturer has found evidence that the composition change significantly increases the drying time, so he will not make a change. (Notice that he is using the test to pull him away from a bad decision).

 

HOMEWORK (due Thursday 11/17):

10.1 p461/462 (the first three groups are paired problems involving the same situation):

#16 (state hypotheses) and 24 (write sentence of conclusion),

#18 (state hypotheses) and 26 (write sentence of conclusion),

#20 (state hypotheses) and 28 (write sentence of conclusion),

10.2 p476 (try some more like last hmk- the answers to one method are in the back of the book!)

#15 (do both methods: classical and p-value)

#17 (do both methods: classical and p-value)

 

Th 11/10

Sorry to burden you with such long notes, but ch.10 can be a difficult read so I want to point out the main issues and give you more examples than the book provided!

 

READING IN THE TEXT:

Ch 10.1 p455-458 (skip type 1 and 2 errors)

10.2 p463-top of 474 (skip objectives 4 and 5); this section is the difficult reading!

 

SUMMARY OF CH. 10

We will now move into performing "significance tests". The burden is being shifted from the middle of the distribution (which was interesting in confidence intervals) to the tails of the distribution (which are interesting in significance tests). The middle area, or percentage, was the confidence level. The area in the tails is now referred to as the significance level, denoted by alpha. It is called the "significance level" because we will not consider a distance from center to be significant until it goes past the critical z values that mark the spots defined by the confidence and significance levels.

 

In a significance test, some body or authority has made a claim that the population mean has a certain value. We (the researchers) want to put that claim to the test by taking our own sample average and seeing if it comes reasonably close to the proposed population mean (and thus makes us believe the original claim), or if it is sufficiently different from the population mean to cast doubt on the original value claimed. The stages of the full test are:

 

HYPOTHESES:

The null hypothesis Ho is what is accepted as true for the population mean until evidence to the contrary is found.

 

The alternate hypothesis H1 is what the investigator is trying to show (always relating to the number for the mean used in the null hypothesis).

 

LEVEL OF SIGNIFICANCE:

This is alpha, which is given in the statement of the word problem, and it marks the region where one stops believing the null hypothesis and starts accepting the alternate hypothesis instead. It can also be thought of as the total area corresponding to rejection of Ho.

 

DATA AND CALCULATIONS:

A sample is taken to test the null hypothesis and must be standardized with a z calculation for the sampling distribution (see p464 for an example).

 

DECISION:

Find out if the sample is too rare (as defined by alpha) to believe the population mean is what it was claimed to be. There are two methods: the p-value approach (p470) and the classical approach (p466).

 

CONCLUSION:

You must state what you have found from the sampleÕs evidence, or lack thereof. Write a grammatically complete sentence with the following elements: Tell

1. if you have "found evidence" or "not found evidence" against the null hypothesis,

2. about what (what was the subject of the investigation?),

3. with respect to what number (what was the number in question in the hypotheses?).

 

We will be doing whole significance tests soon, but first, we must take a more careful look at each part. We start with how to make a DECISION to reject or accept the null hypothesis.

 

LECTURE NOTES:

I gave a handout with the following abbreviated table of critical values, so you do not have to look them up on the table backwards each time you want to do a problem. The top row represents the area in either the left or right tail of the distribution, and the bottom row represents the positive or negative critical value. Refer to it as you look at the example problems below:

 

tail area       0.25   0.20   0.15   0.10   0.05   0.025  0.02   0.01   0.005  0.0025  0.001  0.0005

critical value  0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807   3.091  3.291

 

Two-sided significance test example: If the null hypothesis is that the mean of a population is 35 and the alternate hypothesis is that the mean of the population is not 35 (within a certain amount of acceptable error) and a level of significance is given as 0.05 and you take a sample and standardize it to get z=2.25, does it give enough evidence to reject the null hypothesis and therefore accept the alternate hypothesis?

 

P-value approach: Compare the alpha area with the p-value area. The p value is the probability that you would get a value as far away or farther from the center as the sample value you got. For a sample z of 2.25, the area outside of that is 0.0122 using your old z table. For a two-sided test such as this, each tail has 0.0122 in it based on the sample z, so the p value is 0.0122 + 0.0122 = 0.0244. If p < alpha, you reject the null hypothesis, and if p > alpha, you accept it. Here, 0.0244 < 0.05, so we reject the null hypothesis.

 

Classical approach: Another way of deciding is to compare the sample z to the critical value of z. The critical values come from the alpha value. That is, alpha is divided by 2 to get 0.025 (this is how much goes in each tail of the distribution), and on the short table above, you see a critical z* of 1.96. If you get a sample z that is farther away from the mean, you have evidence to reject the null hypothesis, as in this case, since the sample z of 2.25 is farther away from the mean than 1.96.

 

One-sided significance test example: The null hypothesis is that the mean of a population is 45, and the alternate hypothesis is one-sided, with the mean greater than 45 (you don't care if it is less). Alpha is 0.02 and the sample z is 1.96.

 

P-value approach: The p value comes from only one tail, and for z=1.96 it is 0.0250 using the z table. Since p > alpha, we accept the null hypothesis: the sample is not as rare (in probability of occurrence) as alpha requires.

 

Classical approach: If you have a one-sided alternate hypothesis (that is, where the mean is > or < a number instead of "not equal to"), you don't take half of alpha as we did in the two-sided problems; you put it all into the tail of interest. In the situation above, we only care about values that stray too far above what is claimed to be the center, so put all of alpha into the right-hand tail. The z value with 0.02 area to its right is about +2.054 from the table above. Since 1.96 is closer to center, we got a routine sample (one that would happen 98% of the time), so there is nothing strange about the center being where it is claimed to be. We accept the null hypothesis.
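(Optional check) For the one-sided right-tail setup just described, a similar sketch computes both the p-value and the critical value; again this is only a way to verify table answers:

```python
from statistics import NormalDist

z = 1.96      # sample z
alpha = 0.02  # all of alpha goes into the right tail (one-sided test)

# p-value approach: area in the single right tail beyond the sample z.
p_value = 1 - NormalDist().cdf(z)         # about 0.0250
print(p_value > alpha)                    # True, so accept the null hypothesis

# classical approach: critical value with alpha area to its right.
z_star = NormalDist().inv_cdf(1 - alpha)  # about 2.054
print(z < z_star)                         # True: the sample z is closer to center
```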

 

More example problems

1. H0: m=11 and H1: m not equal to 11, alpha = 0.01, sample z = 2.67

Answer:

p value: The area to the right of 2.67 is 0.0038, so the p value is the sum of the left and right tails, 0.0038+0.0038=0.0076 < 0.01 (p<alpha), so reject the null hypothesis.

classical: Half of alpha, 0.005 goes into each tail since the alternate hypothesis is two-sided. The critical values for 0.005 are + or - 2.576 and 2.67 is farther away from center than this, so reject the null hypothesis.

 

2. H0: m=265 and H1: m<265, alpha = 0.01, sample z = -2.25

Answer:

p value: The area to the left of -2.25 is the p value 0.0122> 0.01 (p>alpha) so accept the null hypothesis.

classical: All of alpha, 0.01 goes into the left tail of the distribution since the alternate hypothesis only pertains to values <265. The critical value for 0.01 is -2.326 and -2.25 is closer to center than this, so accept the null hypothesis.

 

3. H0: m=35 and H1: m>35, alpha = 0.05, sample z = 2.23

Answer:

p value: The area to the right of 2.23 is 0.0129 so the p value is 0.0129< 0.05 (p<alpha) so reject the null hypothesis.

classical: All of alpha 0.05 goes into the right tail of the distribution since the alternate hypothesis only pertains to values >35. The critical value for 0.05 is 1.645 and 2.23 is farther from center than this, so reject the null hypothesis.

 

4. H0: m=1.23 and H1: m not equal to 1.23, alpha = 0.02, sample z = -2.45

Answer:

p value: The area to the left of -2.45 is 0.0071 and to the right of +2.45 is the same, so the p value is 0.0071+0.0071=0.0142< 0.02 (p<alpha) so reject the null hypothesis.

classical: Half of alpha, 0.01, is put into each of the left and right tails of the distribution since the alternate hypothesis pertains to values not equal to 1.23, that is, both greater than and less than 1.23. The critical values are + or - 2.326 and -2.45 is farther away from center than this, so reject the null hypothesis.

 

5. H0: m=0.045 and H1: m>0.045, alpha = 0.005, sample z = 2.06

Answer:

p value: The area to the right of 2.06 is 0.0197 so the p value is 0.0197> 0.005 (p>alpha) so accept the null hypothesis.

classical: All of alpha, 0.005, is put into the right tail of the distribution. The critical value for 0.005 is 2.576 and 2.06 is closer to center than this, so accept the null hypothesis.

 

6. H0: m=4500 and H1: m<4500, alpha = 0.025, sample z = -1.83

Answer:

p value: The area to the left of -1.83 is 0.0336 so the p value is 0.0336> 0.025 (p>alpha) so accept the null hypothesis.

classical: All of alpha is put into the left tail of the distribution due to the alternate hypothesis. The critical value for 0.025 is -1.96 and -1.83 is closer to center than this, so accept the null hypothesis.
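(Optional check) All six answers above follow the same two rules, so a short helper (my own sketch; `reject_h0` is not from the text) can replay them using exact areas instead of the rounded table values:

```python
from statistics import NormalDist

def reject_h0(z, alpha, tails):
    """p-value method: reject H0 when p < alpha.
    tails is 2 for a 'not equal to' alternate hypothesis, else 1."""
    p = tails * (1 - NormalDist().cdf(abs(z)))
    return p < alpha

# problems 1-6 above: (sample z, alpha, number of tails)
cases = [(2.67, 0.01, 2), (-2.25, 0.01, 1), (2.23, 0.05, 1),
         (-2.45, 0.02, 2), (2.06, 0.005, 1), (-1.83, 0.025, 1)]
print([reject_h0(*c) for c in cases])
# [True, False, True, True, False, False] -- matching the answers above
```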

 

HOMEWORK due Tuesday 11/15: 

Draw distributions for each with relevant z values and areas (use table above for critical values):

10.2 p476

12. verify sample z = 1.92 from part a, and do parts b, c, d

13. verify sample z = 3.29 from part a, and do parts b, c, d

14. verify sample z = –1.32 from part a, and do parts b, c, d

16. verify sample z = 1.20 and do the rest of part a, b

18. verify sample z = 2.61 and do the rest of part b, c

 

 

T 11/08

Test #4 will occur on Thursday. Homework is to study for it. Some additional examples, as requested:

--7.3

Example: In an experiment to determine the amount of time required to assemble an "easy to assemble" toy, the average time to assemble it was 27.8 minutes with a standard deviation of 4.0 minutes. What is the probability that a randomly selected person will assemble the toy in more than 30 minutes?

Answer: You want to find the area under the curve corresponding to x values more than 30. When x=30, z=(30-27.8)/4 = 0.55. We use this standardized value of x to look up the area under the curve. On the table, this z value gives an area of 0.2912. Since this is the area for values more than 30 (to the right of 30), this is our answer. About 29% of the time one would expect a person to take more than 30 minutes to assemble the toy.
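(Optional check) This "forwards" answer can be verified in one line, assuming a normal model for the assembly times (a sketch only):

```python
from statistics import NormalDist

# P(assembly time > 30 minutes) with mean 27.8 and std. deviation 4.0
toy = NormalDist(mu=27.8, sigma=4.0)
print(round(1 - toy.cdf(30), 4))  # 0.2912
```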

Example: In a very large world history class, the final exam grades have a mean of 66.5 and a standard deviation of 12.6. Above what score lie the highest 25% of the scores?

Answer: This is a "backwards" problem since you are looking for an x value, having been given an area. You are given the area of 0.25, so draw this area as a right tail in the distribution, since the highest scores in the class are to the right of the mean (the average score is in the center). The closest value to 0.25 in the table is 0.2514. This area corresponds to a z value of +0.67 (note that the one in class was negative!). This is a positive z value since the higher grades are in the right side of the distribution. You will not get the correct answer if you do not include the correct sign of the z value. We "unstandardize" this value by using the formula to solve for x and get

+0.67=(x-66.5)/12.6 so x=(+0.67)(12.6)+66.5=74.942.

About 25% of the class had scores of 74.9 or higher.
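(Optional check) The "backwards" lookup can be checked with the inverse of the normal curve. The exact answer differs slightly from the table answer because the table z of 0.67 is rounded:

```python
from statistics import NormalDist

# Score with 25% of the area above it (so 75% of the area below it).
grades = NormalDist(mu=66.5, sigma=12.6)
print(round(grades.inv_cdf(0.75), 1))  # 75.0 (the rounded table z gives 74.9)
```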

 

--8.1

Example: According to the U.S. Federal Highway Administration, the mean number of miles driven annually in 1990 was 10,300. A simple random sample of 20 people is taken, and each is asked to disclose the number of miles they drove last year; the sample average is 12,342 miles. Assuming a std. deviation of 3500 miles, standardize the sample mean.

Answer: z=(12342-10300)/(3500/sqroot20)=2.61

 

--9.1

Example: What are the critical values for a confidence level of 57%?

Answer: The total area in the tails is 0.43 so half in each tail is 0.215 which gives z = 0.79
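(Optional check) A critical-value lookup for any confidence level can be checked by splitting the leftover area evenly into the two tails (a sketch, not course material):

```python
from statistics import NormalDist

conf = 0.57
tail = (1 - conf) / 2                    # 0.215 in each tail
z_star = NormalDist().inv_cdf(1 - tail)  # z with 0.215 area to its right
print(round(z_star, 2))  # 0.79
```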

 

--9.1 Short answer about changes in n, E, confidence levels and intervals, etc.

Example: If you take a larger sample size, does the confidence interval become wider (less precise) or narrower (more precise) about the mean?

Answer: Narrower (since error gets smaller).

 

--9.1

Example: A large hospital found that in 50 randomly selected days it had on average 96.4 patient admissions per day, assuming a std. deviation of 12.2 from previous studies. Construct a 90% confidence interval for the actual daily average number of hospital admissions per day.

Answer: 96.4-2.84 = 93.56 and 96.4+2.84 = 99.24

Example: In the problem above, how large a sample of days must we choose in order to ensure our estimate is off by no more than 2 daily admissions of patients?

Answer: z*=1.645, std.dev.=12.2, E=2 so n=101

Example: A Gallup poll asked 500 randomly selected Americans, "How often do you bathe each week?". Results of the survey indicated an average of 6.9 times per week. Using a population std. deviation of 2.8 times per week, what could we say with 80% confidence about the error in using 6.9 times per week as an estimate of the true average number of times Americans bathe?

Answer: Using the formula for error, n=500, s=2.8, critical z=1.282, so we get E=0.16. That is, in 80 samples out of 100 we would expect the number of times that Americans bathe per week could be estimated by 6.9 times and be off by no more than 0.16 times in either direction. (The less confident you are willing to be in your estimate, the better estimate you get).
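(Optional check) The error and sample-size formulas from 9.1 can be checked the same way; this sketch reproduces both answers above (the 1.645 critical value is the one given in class for 90%):

```python
from math import ceil, sqrt
from statistics import NormalDist

# Error formula: E = z* (std. deviation) / sqrt(n), at 80% confidence.
z80 = NormalDist().inv_cdf(1 - 0.20 / 2)  # about 1.282
E = z80 * 2.8 / sqrt(500)
print(round(E, 2))  # 0.16

# Sample-size formula: n = (z* (std. deviation) / E)^2, rounded up.
z90 = 1.645
print(ceil((z90 * 12.2 / 2) ** 2))  # 101
```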

 

Th 11/03

LAST Homework items we did not get to go over in class:

9.1 p389 # 24

a. n=20, stddev=17, critical z=1.88, we get an error of 7.15, so

123 – 7.15 = 115.85 and 123 + 7.15 = 130.15

115.85  < m < 130.15.

b. n=12, stddev=17, critical z=1.88, we get an error of 9.23 (increases from part a), so

123 – 9.23 = 113.77 and 123 + 9.23 = 132.23

113.77 < m < 132.23.

c. n=20, stddev=17, critical z=1.44, we get an error of 5.47 (decreases from part a), so

123 – 5.47 = 117.53 and 123 + 5.47 = 128.47

117.53 < m < 128.47.

d. No, because the sample size was less than 30.

e. Increase the mean, shifting the interval to the right.

9.1 p389 # 30

To increase the precision of the interval is to make the lower and upper bounds come closer together, and this can be done by increasing the sample size or lowering the confidence level.

 

LECTURE/ READING IN THE TEXT:

We worked on word problems in class involving confidence intervals, error, and sample size. You can find the formulas and examples in 9.1 p405-415, as in the reading from last time.

 

HOMEWORK due Tuesday 11/08: 

 

1. 9.1 p418 #34

 

2. 9.1 p420 # 44

 

3. Supplemental problem to the text: A random sample of 300 telephone calls made to the office of a large corporation is timed and reveals that the average call is 6.48 minutes long. Assume a std. deviation of 1.92 minutes can be used. If 6.48 minutes is used as an estimate of the true length of telephone calls made to the office,

a. What can the office manager say with 99% confidence about the maximum error?

b. What can the office manager say with 90% confidence about the maximum error?

(do not do a whole confidence interval for these, just use the error formula).

 

4. In the text, 9.1 p418 #36 the answer to part a is x bar (the sample mean) = $3727. Using this info, do parts c, d, and e.

 

5. Supplemental problem to the text: A large hospital finds that in 50 randomly selected days it had, on average, 96.4 patient admissions per day. From previous studies it has been determined that a population std. deviation of 12.2 can be used. Using a 90% confidence level,

a. How large a sample of days must we choose in order to ensure that our estimate of the actual daily number of hospital admissions is off by no more than five admissions per day?

b. How large a sample of days must we choose to have 25% of the error we had in part a?

 

 

TEST #4 FORMAT for Thursday 11/10. A copy of the z table will be provided, along with formulas for z values (populations and sampling distributions), confidence intervals, error, and sample size, and the critical values for 90/95/99%.

 

--Several word problems (perhaps with related parts to cut down on repeated calculations and table look-ups) like those in 7.3, with one "backwards" problem for sure. Note that you use the z formula for populations.

 

--Standardize a sample mean given the z formula for sampling distributions as in 8.1. Other issues in 8.1, such as comparing population and sampling distributions (like the quiz today) and questions of normality and whether sample size must be 30 or more will not be included, as they either do not fit well into the test or duplicate the work tested in other problems (such as those from 7.3).

--Show how to find the z* or critical values for a given confidence level using the z table backwards as in 9.1.

 

--Some short answer questions on sampling distributions and the effect of changes to sample size and confidence on error and confidence intervals, as in 9.1. (For example, what is the effect on the error in using the sample mean to estimate the population mean if you take a smaller sample size? It gets bigger).

 

--At least one each of word problems (some with follow-up parts), as in 9.1 and supplemental problems from homework, dealing with confidence intervals, error and sample size, not necessarily in that order.

 

 

T 11/01

Test grades so far:

By 5 pm today, I will link up the "testscores" item for your class at http://www.smccd.edu/accounts/callahanp so you can compare the averages I have for you with what you got on your first 3 tests. The scores are by code given in class today. If you were not there to receive your code, please ask for it on Thursday.

 

Homework items we did not get to go over in class:

8.1 p389 # 22

a. use the z calculation for the population: z = (40 – 43.7) / (4.2) = – 0.88 so area to the left is 0.1894 using the table.

b. use the z calculation for the sampling distribution z = (40 – 43.7) / (4.2/sqroot 9) = –2.64 so area to the left is 0.0041.

c. use the z calculation for the sampling distribution z = (40 – 43.7) / (4.2/sqroot 12) = –3.05 so area to the left is 0.0011.

d. as sample size is increased, the probability decreases in the tails because the curve is less spread out due to a smaller std. deviation.

e. use the z calculation for the sampling distribution z = (46 – 43.7) / (4.2/sqroot 15) = 2.12 so area to the right is 0.0170, which happens in less than 2% of all samples, so is not common.

 

8.1 p389 # 24

a. use the z calculation for the population: z = (95 – 85) / (21.25) = 0.47 so area to the right is 0.3192 using the table.

b. use the z calculation for the sampling distribution z = (95 – 85) / (21.25/sqroot 20) = 2.10 so area to the right is 0.0179.

c. use the z calculation for the sampling distribution z = (95 – 85) / (21.25/sqroot 30) = 2.58 so area to the right is 0.0049.

d. as sample size is increased, the probability decreases in the tails because the curve is less spread out due to a smaller std. deviation.

e. happens in less than ½ % of all samples, so is very uncommon.

 

8.1 p389 # 30

a. since n = 40, this is a large enough sample to ensure that the sampling distribution will be approximately normal.

b. mean of sampling distribution same as population 20, and std. deviation is sqroot20/sqroot40 = sqroot of ½ which is about 0.707.

c. z = (22.1 – 20) / (sqroot20/sqroot 40) = 2.97 so area to the right is 0.0015 using the table, and this is unusual, since it would indicate that a sample like this would happen less than 2 times in 1000 samples. It would be considered an anomaly: an indicator that business fluctuates greatly during this time.
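(Optional check) The sampling-distribution z in part c can be verified in one line: the only change from the population formula is dividing the std. deviation by the square root of the sample size (a sketch only):

```python
from math import sqrt
from statistics import NormalDist

# 8.1 #30c: mean 20, std. deviation sqrt(20), n = 40, sample mean 22.1
mu, sigma, n, xbar = 20, sqrt(20), 40, 22.1
z = (xbar - mu) / (sigma / sqrt(n))
print(round(z, 2))                        # 2.97
print(round(1 - NormalDist().cdf(z), 4))  # 0.0015 (area to the right)
```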

 

READING IN THE TEXT:

Material in 9.1 in the text:

p405-414 (and read p414-415 for sample size formula for next time if you have the time!).

 

HOMEWORK due Thursday 11/03: 

Make note for yourself in comparing the parts to a problem how changes in confidence and sample size affect error and interval width. In your work below, round error and confidence interval values to 2 decimal places.

Turn in:

9.1 p416 #14 (do as in 7.2 p347 #23-26)

9.1 p416 #16 (this was done in 7.2 p347 #23 also, so you have the answer from there!)

9.1 p416 # 22 (critical value for 90% given on p410 and for 98% as the answer to 9.1 #13)

9.1 p416 # 24 (use the critical value for 94% you found in 9.1 #14, and for 85%, the answer is given in 9.1 #15).

9.1 p416 #30

 

Th 10/27

LECTURE: We looked at the word problems from homework in detail, and then started looking at section 8.1, about the sampling distribution, which lets us work with a normal distribution of all possible samples from a population even if the original population itself is not normal. If a population is normal, then the sampling distribution for any sample size is normal. But if a population is not normal, this text requires the sample size to be at least 30 for the sampling distribution to be considered approximately normal.

You will be using a modified z calculation that takes the size of the sample into account. To avoid errors, I suggest protecting the numerator and denominator with parentheses, or multiplying by the reciprocal of the fraction in the denominator instead of dividing by it:

For example, if the mean of the population is 12.7, the std. deviation of the population is 2.5, the sample size is 5, and the sample mean is 12.5, then

z = (12.5 – 12.7) / (2.5/sqroot 5) = –0.2/1.118 = –0.18

READING IN THE TEXT:

Browse section 8.1 p377-388, note definitions on p381 and 385, see ex3 p382 and ex5 p387.

We will move on to 9.1 next lecture if you want to start reading ahead.

 

exampleS:

8.1 p389 # 19

a. Since the sample size is less than 30, the population must be normal with mean of sampling distribution same as population 64, and std. deviation is 17/sqroot 12 = 4.91

b. z = (67.3 – 64) / (17/sqroot 12) = 0.67 so P(<67.3) is area to left of 0.67 = 1 – 0.2514 = 0.7486

c. z = (65.2 – 64) / (17/sqroot 12) =  0.24 so P(>65.2) is area to right of  0.24 = 0.4052

 

8.1 p389 # 21

a. use the z calculation for the population: z = (260 – 266) / (16) = – 0.38 so area to the left is 0.3520 using the table.

b. The sampling distribution is normal with mean of 266 and std. deviation of 16/sqroot20 = 3.58

c. use the z calculation for the sampling distribution z = (260 – 266) / (16/sqroot 20) = –1.68 so area to the left is 0.0465

d. use the z calculation for the sampling distribution z = (260 – 266) / (16/sqroot 50) = –2.65 so area to the left is 0.0040

e. since 0.0040 is small, the result is unusual

f. find the area between 266 – 10 = 256 and 266 + 10 = 276

z = (256 – 266) / (16/sqroot 15) = –2.42 so area to left is 0.0078

z = (276 – 266) / (16/sqroot 15) = 2.42 so area to right is 0.0078

area between is 1 – 0.0078 – 0.0078 = 0.9844
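(Optional check) Part f can also be done directly from the sampling distribution's cdf, without standardizing by hand. This is only a sketch; the exact answer differs from 0.9844 in the last digit because the table z of 2.42 is rounded:

```python
from math import sqrt
from statistics import NormalDist

# Chance the mean of n = 15 pregnancies falls within 10 days of 266.
sampling = NormalDist(mu=266, sigma=16 / sqrt(15))
prob = sampling.cdf(276) - sampling.cdf(256)
print(round(prob, 4))  # about 0.9845
```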

 

HOMEWORK (due on Tuesday 11/01):

8.1 p389 # 18, 20, 22, 24, 30

 

T 10/25

LECTURE:

We worked on normal word problems from 7.3 again today, looking at variations of problems from your last homework. As there are not enough examples and problems in the book, I will supplement the book with some different problems below and ask for some variations of them for homework.

 

In a "forwards" problem, given an x value (or two x values), use the standardizing formula

z = ( x – mean)/(std. deviation) to find the z value(s). Then look up the area in the tail of the distribution corresponding to each z value. Find the area you want using these tail areas.

In "backwards" problems, given an area (%, probability, proportion), find the x value that bounds it by reversing the process from the "forwards" problems. Identify the given area in a picture and search the middle of the table for the closest area to the one you are given, map it backwards to find the row and column it belongs to in order to find the z value, then take the resulting z value and "unstandardize" it (solve for x) in the formula

z = ( x – mean)/(std. deviation)!

 

Supplementary exampleS (most problems in the last hmk were "forwards" problems, so 3 of these 4 examples will be the harder "backwards" ones):

1. A salesman has an average car route trip time of 4.3 hours with std. deviation of 0.2 hours. What is the probability that the length of his car trip will last anywhere from 4 to 4.5 hours?

Answer: This is a "forwards" problem. For x=4, z=(4-4.3)/0.2 = –1.5, and for x=4.5, z=(4.5-4.3)/0.2 = 1.0. The area to the left of –1.5 is 0.0668 and the area to the right of 1.0 is 0.1587. The area between is 1-(0.1587+0.0668)=0.7745, so there is about a 77% chance that his trip will last anywhere from 4 to 4.5 hours.

 

2. The lengths of sardines received by a cannery have a mean of 4.64 inches and a standard deviation of 0.25 inches. If the distribution of these lengths can be approximated closely with a normal distribution, below which length lie the shortest 18% of the sardines?

Answer: This is a "backwards" problem since you are looking for an x value (length of sardines), having been given an area. The area of 18%, or 0.18, is a left-hand tail area, because it represents below-average lengths. The closest value to this in the table is 0.1814. This area corresponds to a z value of –0.91. We "unstandardize" this value by using the formula to solve for x and get

–0.91=(x-4.64)/0.25 so

x=(–0.91)(0.25)+4.64= –0.2275+4.64=4.41.

About 18% of the sardines measure 4.4 inches or shorter.

 

3. The average assembly time for a product is 27.8 minutes with a standard deviation of 4.0 minutes. Above what number of minutes lie the 25% slowest assembly times?

Answer: This is a "backwards" problem since you are looking for an x value (number of minutes), having been given an area. You are given the area of 25% or 0.25, and the closest value to this in the table is 0.2514. This area corresponds to a z value of +0.67. This is a z value on the right side of the distribution since the "slowest" assembly times involve the greatest number of minutes, and these occur in the right side of the distribution, where the average time of 27.8 is in the middle. Here is a case where the wording in the problem might not match your intuition! The right-hand side of the distribution does not necessarily represent the biggest, strongest, fastest numbers!

Now we "unstandardize" this value by using the formula to solve for x:

+0.67=(x-27.8)/4.0 so x=(+0.67)(4.0)+27.8=30.48 or about 30 minutes.

The 25% slowest assembly times take about 30 or more minutes (almost the same as the previous "forwards" version of the problem).

  

4. The average assembly time for a product is 27.8 minutes with a standard deviation of 4.0 minutes. Above what number of minutes lie the 60% slowest assembly times?

Answer: This is a "backwards" problem since you are looking for an x value (number of minutes), having been given an area. You are given the area of 60%, which as a decimal is 0.60, but you can only find areas from 0% to 50% on the table! For an area of 60% to lie above an x value, the x value must be to the left of the mean. So you must use the area to the left of this x value, which is 40%, or 0.40. The closest value to 0.40 in the table is 0.4013. Looking from this area to the row and column it belongs to, we see it corresponds to a z value of 0.25. This is a z value on the left side of the distribution, so it is negative: –0.25.

Now we "unstandardize" this value by using the formula to solve for x:

–0.25=(x-27.8)/4.0 so x=(–0.25)(4.0)+27.8=26.8 minutes.

The 60% slowest assembly times take about 26.8 or more minutes.
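(Optional check) Both "backwards" answers can be verified with the inverse normal lookup; note that `inv_cdf` takes the area to the left of the cutoff (a sketch only):

```python
from statistics import NormalDist

# Sardines: the length below which the shortest 18% fall.
sardines = NormalDist(mu=4.64, sigma=0.25)
print(round(sardines.inv_cdf(0.18), 2))  # 4.41

# Assembly: the 60% slowest times lie above this cutoff (40% below it).
times = NormalDist(mu=27.8, sigma=4.0)
print(round(times.inv_cdf(0.40), 1))     # 26.8
```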

 

READING IN THE TEXT:

7.3 p349-352, but using the table given in class.

 

HOMEWORK due Th 10/27 (the last two are ÒbackwardsÓ problems): 

1. If final exam grades have a mean of 66.5 and a standard deviation of 12.6, what percent of the class should receive an A if AÕs are earned by those with scores of 87 or better?

 

2. The average amount of radiation to which a person is exposed while flying by jet across the U.S. is 4.35 units with std. deviation of 3.2. What is the probability that a passenger will be exposed to more than 4 units of radiation?

 

3. The average time to assemble a product is 27.8 minutes with a standard deviation of 4.0 minutes. What percent of the time can one expect to assemble it in anywhere from 30 to 35 minutes?

  

4. The number of days that patients are hospitalized is on average 7.1 days with std. deviation of 3.2 days. How many days do the 20% longest-staying patients stay?

  

5. For a salesman driving between cities, the average trip time is 4.3 hours with std. deviation of 0.2 hours. Below what time lie the fastest 10% of his trips?

 

 

Th 10/20

READING IN THE TEXT:

7.3 p349-353

 

exampleS:  

7.3 p354 #17

a. area less than z = (20-21)/1 = –1 is 0.1587 so about 16% of the eggs are expected to hatch in less than 20 days.

b. area more than z = (22-21)/1 = +1 is 0.1587 so about 16% of the eggs are expected to hatch in more than 22 days.

c. area less than z = (19-21)/1 = –2 is 0.0228 and area more than z = 0 is 0.50, so the area between is 1 – 0.0228 – 0.50 = 0.4772, so about 48% of the eggs are expected to hatch in 19 to 21 days.

d. area less than z = (18-21)/1 = –3 is 0.0013, which happens 0.13% of the time (much less than 1%).

7.3 p354 #21 (other parts I did not do in class)

b. area less than z = (250-266)/16 = –1 is 0.1587 so about 16% of pregnancies last less than 250 days.

d. area more than z = (280-266)/16 = +0.88 is 0.1894 so about 19% of pregnancies last more than 280 days.

e. area no more than z = (245-266)/16 = –1.31 is the same as area less than –1.31, which is 0.0951, so about 10% of pregnancies last no more than 245 days.

f. area less than z = (224-266)/16 = –2.63 is 0.0043 so pregnancies lasting less than 224 days happen less than ½ of a percent of the time, therefore are considered rare.

7.3 p355 #29 Be careful of this one… you are working backwards!

b. Find the z value such that there is 0.03 area in the right tail. Since 0.0301 is the closest area to 0.03 in the table and it corresponds to a z value of 1.88, we must work backward to find the x value for this z in the formula. We are given mean 17 and std. deviation 2.5 so  z = (x – 17)/2.5. Then if z = 1.88,  1.88 = (x – 17)/2.5. Multiply both sides by 2.5 to get (1.88)(2.5) = (x – 17) and then add 17 to both sides to get (1.88)(2.5) + 17 = x so x = 21.7 minutes.

 

HOMEWORK due Tuesday 10/25

7.3 p354

#18 do all parts abcd, then do an extra part e:

What is the probability that a randomly selected sixth-grade student reads less than 125 words per minute?

#20 do all parts abcd

#28 do as with the third example above (#29).

 

 

T 10/18

Homework is to study for Test #3. The format is in the previous notes below. Before the test, I will be giving you some instruction about problems from 7.3, which involves some of what you are being tested on. I will put up a few notes after the test, so please check back then.

 

Th 10/13

Homework items we did not get to go over in class:

5.5 p278 #66

a. Of 12 sodas total (3 diet, 9 regular), take 3 total: 2 from the diet group and 1 from the regular group.

(3C2*9C1)/12C3 = (3*9)/220 = about 0.1227

b. Of 12 sodas total (3 diet, 9 regular), take 3 total: 1 diet and 2 regular.

(3C1*9C2)/12C3 = (3*36)/220 = about 0.4909

c. Of 12 sodas total (3 diet, 9 regular), take 3 total: 3 diet and 0 regular.

(3C3*9C0)/12C3 = (1*1)/220 = about 0.0045

d. Of 12 sodas total (3 diet, 9 regular), take 3 total: 0 diet and 3 regular.

(3C0*9C3)/12C3 = (1*84)/220 = about 0.3818

 

e. It is a probability distribution: every event is represented by the parts above, and

0.1227 + 0.4909 + 0.0045 + 0.3818 = 0.9999 ~ 1 (due to rounding)
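(Optional check) All four parts can be generated in one loop with Python's `math.comb`; the loop index d counting diet sodas is my own labeling, not the text's:

```python
from math import comb

total = comb(12, 3)  # 220 ways to choose 3 sodas from 12
probs = [comb(3, d) * comb(9, 3 - d) / total for d in range(4)]
print([round(p, 4) for p in probs])  # [0.3818, 0.4909, 0.1227, 0.0045]
print(round(sum(probs), 4))          # 1.0 -- a probability distribution
```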

 

LECTURE:

We have done what we are going to do in 5.5, and will skip ch 6, as we have already talked about probability distributions in other sections. So now we go to ch 7 and back to normal distributions!

 

Recall from chapter 3, once you know how to calculate the mean and standard deviation for a normal distribution, you can put these two important numbers to work to "standardize" any normal distribution. We convert a particular score in the distribution into a standardized z score by way of the following standardizing formula: z = (x – mean)/std.deviation

If a normal distribution has a mean of 63.2 and a standard deviation of 7.8, find the z value for each of the following using the formula:

a. 63.2 Answer: z=(63.2-63.2)/7.8= 0 (the mean always standardizes to 0!)

b. 58.9 Answer: z=(58.9-63.2)/7.8= -0.55

c. 64.8 Answer: z=(64.8-63.2)/7.8= 0.205128205.... = 0.21 (round to 2 places!)

 

We do this in order to be able to look up the area under the curve for a given value using the table. When you work on these problems involving areas under the normal curve, draw a normal curve and fill in the numbers of interest for a particular problem on number lines below the curve. Once you have all known values on your picture, think about what area under the curve you are looking for and what areas you have from the table for the z values of interest. Then you must decide how to use the table values to find the area you need, and this may not be immediately apparent!

 

We focused today just on using the table below to find areas under the already standardized curve. The z table gives the areas under the std. normal curve for particular z scores. We will use a modified version of this table from your book, where we employ the symmetry of the curve so that the area to the left of a negative z value is the same as the area to the right of a positive z value. In that way, you can look up the z values below as + or – instead of just –.

I gave a copy of the modified table in class today—ask for it next time if you were not there.

 


 

To look up a particular value of z, you put together the row and column that make up the z value. The left-most column gives the ones and tenths places of the z, and the uppermost row gives the hundredths place of the z value. For instance, if the z value is 1.83, since 1.83=1.8+0.03, you look to the row of 1.8 and the column of 0.03 to find the area under the curve: 0.0336.

For example:

What is the area under the standard normal curve for values to the left of z= -1.57 ?

Answer: Putting row 1.5 with column 0.07 we get 0.0582

 

exampleS:  

1. What is the area to the left of z= –2.04? Answer: 0.0207

2. What is the area to the right of z= 2.79? Answer: 0.0026

3. What is the area to the left of z= –0.06? Answer: 0.4761

4. What is the area to the left of z= –0.60? Answer: 0.2743

5. What is the area to the right of z= 0.60? Answer: 0.2743

6. What is the area to the right of z= –1.74? Answer: 0.9591 (from 1-0.0409)

7. What is the area to the left of z= 1.05? Answer: 0.8531 (from 1-0.1469)

8. What is the area between z= 0.87 and z= 2.03?

Answer: 0.1710 (The smaller tail corresponding to z= 2.03 has area of 0.0212 and the larger tail corresponding to z=0.87 has area of 0.1922. The smaller area is contained within the larger area, so to find the area between, take the larger and subtract the smaller: 0.1922–0.0212=0.1710)

9. What is the area between z= –0.25 and z= –1.97

Answer: 0.3769 (Same as the above, subtract smaller tail from larger: 0.4013-0.0244 = 0.3769)

10. What is the area between z= –2.09 and z=3.07?

Answer: 0.9806 (Different from the previous two problems, because the values are on opposite sides of the distribution, so it is not the case that one tail is contained within the other. You must start with 100% of the whole distribution and "chop off" the two tails using subtraction:

1 – (0.0183 + 0.0011) or 1 – 0.0183 – 0.0011 which equals 0.9806)
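(Optional check) The table lookups above can be verified against the exact normal cdf; left, right, and between areas all reduce to differences of cdf values (a sketch only):

```python
from statistics import NormalDist

phi = NormalDist().cdf  # area to the LEFT of a z value

print(round(phi(-2.04), 4))              # 0.0207 (example 1)
print(round(1 - phi(2.79), 4))           # 0.0026 (example 2)
print(round(phi(2.03) - phi(0.87), 4))   # 0.1710 (example 8)
print(round(phi(3.07) - phi(-2.09), 4))  # 0.9806 (example 10)
```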

 

READING IN THE TEXT:

(Review of previous topics 7.1 p327-332 standardizing formula and area under the normal curve)

7.2 p337-346 finding area under the normal curve (be careful: we are using a modified version of the table in the book—the book has a two-page table with separate +/– values, but the answers to the area exercises should ultimately come out the same)

 

HOMEWORK due Tuesday 10/18

1. In Super Lotto Plus (previous hmk), find the probability of getting 2 of the regular numbers and not getting the Mega number. To make your life a little easier, here are some shortcuts:

  nC0=1 for all n (so 5C0 = 1 for example)

  nC1=n for all n (so 5C1 = 5 for example)

  nCn=1 for all n (so 5C5 = 1 for example)

  5C2=10

  42C3=11,480

  47C5=1,533,939

2. Using the table and material from lecture today, find the area between z = – 2 and z = +2.

3. Using the table and material from lecture today, find the area between z = – 3 and z = +3.

4. 7.2 p346 # 6 parts a and c

5. 7.2 p346 # 8 parts b and d

6. 7.2 p346 # 10 parts b and c

7. 7.2 p347 # 16 (see ex. 5 p341 and search the areas in the table for the closest value to 0.2000 and tell what z value that area belongs to – this is using the table backwards)

8. 7.2 p347 # 18 (see ex. 5 p341 and 7 p343 and use 15% or 0.1500 to do as in #16!).

9. 7.2 347 #23 (see ex. 8 p343 putting 0.10 in each of the left and right tails)

10. 7.2 347 #25 (see ex. 8 p343 putting 0.005 in each of the left and right tails)

 

TEST #3 FORMAT:

Test #3 will occur as scheduled on Th 10/20.

I will provide the general addition and multiplication rules and the formulas for nPr and nCr.

It will likely contain (I will firm it up on Tuesday):

 

1. One short problem like 5.1 p234 #32-34

2. One probability model set-up like 5.1 p235 #40 combined with questions from a problem like 5.2 p246 #26

3. Given a table like 5.2 p248 #42, 44 but combining it with material from 5.4, find probabilities of events like:

P(A), P(A given B), P(A and B), P(A or B). Events may be stated in words (as in #42, 44 for example) or defined with letters such as A and B, and the multiplication and addition rules must be shown explicitly.

4. One general addition rule card problem like 5.2 #32ac or ex3 p242

5. One general multiplication rule word problem like 5.4 #12-16

6. One nPr or nCr to show computation from given formula, like 5.5 p276 #18 or 26

7. About 4 situations like 5.5 #46-50 to decide if order matters and write appropriate nPr/nCr but not compute it

8. Write a more complicated probability using a quotient of nCr counts from subsets like 5.5#66 or ex15 p276

9. A problem like 5.5 p278 #60 or Super Lotto examples to write an event probability with a quotient of nCr counts

10. Various areas to look up like today's hmk from 7.2

 

 

T 10/11

CLASS EXERCISE: For 5 identical job positions there are 15 applicants, 6 of whom are female. What is the probability that in filling these 5 positions, we will get exactly 2 females?

Answer:  

15 total applicants:  6 females, 9 males

Take 5 total:  2 from the females, 3 from the males

 

(6C2*9C3)/15C5=(15*84)/3003 =0.42     (More examples of this below).
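If you like to double-check this kind of count by computer (optional, not part of the course), Python's math.comb computes nCr directly:

```python
from math import comb

# P(exactly 2 females) when filling 5 positions from 6 females and 9 males
ways_females = comb(6, 2)   # 15 ways to pick 2 of the 6 females
ways_males = comb(9, 3)     # 84 ways to pick 3 of the 9 males
total_ways = comb(15, 5)    # 3003 ways to pick any 5 of the 15 applicants
p = ways_females * ways_males / total_ways
print(round(p, 2))          # 0.42
```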

 

LECTURE:

We looked at the general addition and multiplication rules from 5.2 and 5.4 in a quiz, and went over hmk from 5.5 (how to tell whether order matters when deciding between permutations nPr = n!/(n-r)! and combinations nCr = n!/[r!(n-r)!]). We looked at one more new topic from 5.5 today: how to form more sophisticated probabilities using nCr counts.

 

READING IN THE TEXT:

5.5 p271 ex7 and p273 ex11 and p275 Ex. 14 (deciding if order is important)

5.5p275-276 ex 14, 15 probabilities involving combinations

 

EXAMPLES:

5.5 p277 #47 order matters: 20P4

5.5 p277 #51 order doesn't matter: 50C5

 

5.5 p278 #62

c. (55C3*45C4)/100C7 = (26235*148995)/16,007,560,800 = about 0.24

 

5.5 p278 #65a. Out of the 13 tracks, 5 are liked so 8 must be disliked. You are taking 2 of 5 liked and 2 of 8 disliked for the event probability on the top of the fraction. On the bottom of the fraction, any 4 could pop up from the 13 tracks available.

 

 

13 total tracks:  5 liked, 8 disliked

Take 4 total:  2 from the liked, 2 from the disliked

 

(5C2*8C2)/13C4 = (10*28)/715 = about 0.39

 

Another example not in book:

Out of 125 dishes in a box, 8 are chipped. If we select 6 dishes at random from the box, what is the probability that exactly 1 will be chipped? This is using the techniques from card counting above in the same way, but with a different set and subsets:

Answer:

Out of 125 dishes in a set, if 8 are chipped, 117 are not.

 

 

125 total dishes:  8 chipped, 117 not chipped

Take 6 total:  1 from the chipped, 5 from the not chipped

 

(8C1*117C5)/125C6=(8*167549733)/4690625500 =0.29

  

One of the most famous examples of this counting method is:

THE CALIFORNIA LOTTERY: SUPER LOTTO PLUS

To play the game, you are asked to pick 5 different numbers choosing from 1 to 47 regular numbers and one "Mega" number choosing from 1 to 27. The top prize (which is the one advertised in millions) goes to whoever matches all 5 of 5 winning numbers and matches the one Mega number. Much smaller prizes are awarded for matching some of the numbers. Prizes are awarded to the following winning combinations:

 

Prize category                   Odds of winning   Probability of winning (as a decimal)

All 5 of 5 and the Mega          1/41,416,353      0.000000024

Any 5 of 5 (and not the Mega)    1/1,592,937       0.000000628

Any 4 of 5 and the Mega          1/197,221         0.000005070

Any 4 of 5 (and not the Mega)    1/7,585           0.000131839

Any 3 of 5 and the Mega          1/4,810           0.000207900

Any 3 of 5 (and not the Mega)    1/185             0.005405405

Any 2 of 5 and the Mega          1/361             0.002770083

Any 1 of 5 and the Mega          1/74              0.013513514

None of 5, only the Mega         1/49              0.020408163

  

As I suggested in class, it can be quite helpful to map out the strategy for outcomes by breaking down each number set into the important subsets and see how many are to be taken from each.

Getting any 3 of 5 and not getting the Mega:

 

Out of 47 regular numbers:  5 winning, 42 losing

Take 5:  3 from the winners, 2 from the losers

AND (multiply by)

Out of 27 "Mega" numbers:  1 winning, 26 losing

Take 1:  0 from the winners, 1 from the losers

 

( ( 5C3 * 42C2 ) / 47C5 ) * ( (1C0*26C1) / 27C1 ) = ( ( 10 * 861 ) / 1,533,939 ) * ( (1*26) / 27 )

= 223,860 / 41,416,353 = 0.005405111, which differs slightly from the table value because the lottery rounded the odds of 1/185.01 to 1/185.

 

Some more examples without the tables:

Getting any 3 of 5 and the Mega (notice that you are selecting 3 of 5 winners and 2 of 42 losers):

( ( 5C3 * 42C2 ) / 47C5 ) * ( 1C1 / 27C1 ) = ( ( 10 * 861 ) / 1,533,939 ) * ( 1 / 27 )

= 8610 / 41,416,353

= 0.000207889, which is slightly different from the table value above, because the odds work out to about 1/4810.26 and the lottery rounds the figure to 1/4810.

 

Getting all 5 of 5 and the Mega (any 5 of the 47 could be chosen, but you want all 5 of the 5 available winners, and any 1 of the 27 possible Megas could be chosen, but you want the 1 winning Mega): ( 5C5 / 47C5 ) * ( 1C1 / 27C1 ) = ( 1 / 1,533,939 )( 1 / 27 ) = 1 / 41,416,353 = 0.000000024 as above.
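As an optional check on the arithmetic, math.comb reproduces the lottery counts used above:

```python
from math import comb

# Denominator for the top prize: 47C5 regular-number picks times 27 Mega picks
assert comb(47, 5) == 1_533_939
assert comb(47, 5) * 27 == 41_416_353

# Numerator for "any 3 of 5 and not the Mega": 5C3 * 42C2 * 26C1
assert comb(5, 3) * comb(42, 2) * comb(26, 1) == 223_860

print("lottery counts check out")
```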

 

HOMEWORK due Th 10/13:

1. 5.5 p277 #48,

2. In the Illinois Lottery, in how many different ways can one pick the set of six winning numbers from the 51 available to choose from? Does order matter? (Compute both permutations and combinations for this one. Then think about the probability of winning being 1 chance in the number you just found. For example, if you found the number of different ways to pick lottery numbers was 12,345, then if you purchased one ticket, you would have one chance in 12,345 of winning. That is, 1/12,345 = 0.000081004 chance of winning. Consider that there are about 13 million people in Illinois, assume that almost that many tickets are sold each time, and that the Lottery officials would like someone to win at least every other drawing! Which looks more appropriate now, 51C6 or 51P6?)

3. 5.5 p278 #60,

4. In Super Lotto Plus above in the examples, find the probability of not getting any of the regular numbers and not getting the Mega number either!

5. 5.5 p278 #66 Do parts a, b, and c, and also find P(0 diet). Check that all the parts form a probability distribution.

 

 

Th 10/06

LAST HMK:

We went over the problems in 5.2 and 5.4 in detail, with the exception of 5.4 #18:

a. P(former given cancer) = 91/(782+91+141) = 0.09

b. P(cancer given former) = 91/(91+7757) =  0.01

 

LECTURE:

I introduced material from section 5.5 that we will continue with on Tuesday.

To form more complicated probabilities, one must know how to count sometimes large and complex numbers of things.

We considered the example of how to count the different ways one can take 3 letters, without repetition, from a set of four letters { A, B, C, D }. We found 24 permutations (order matters) and 4 combinations (order doesn't matter) with the help of a tree diagram. We could have performed these counts without a tree, using the formulas in 5.5. Before another example, please read the pages below.

 

READING IN THE TEXT:

5.5 pages 266 thru the end of example 11 on p273, and try some of the computations in the skill building section on p276/277 for yourself (check answers to odds in the back of the book). Especially read about:

-- tree diagrams on p267

-- factorials on p269

-- permutation formula p270

-- combination formula p272

--p271 ex7 and p273 ex11 deciding if order is important

 

another example:

Suppose that we wish to list the number of ways that we can choose three letters at a time from the following set of five letters { A, B, C, D, E } without choosing a letter more than once, and where the order of the letters is important (i.e., ABC is not the same sample as CBA because the order of selection is different, so they represent different choices). We could make the following selections (one would not want to list them with a "tree diagram"!):

 

ABC   BAC   CAB   DAB   EAB

ABD   BAD   CAD   DAC   EAC

ABE   BAE   CAE   DAE   EAD

ACB   BCA   CBA   DBA   EBA

ACD   BCD   CBD   DBC   EBC

ACE   BCE   CBE   DBE   EBD

ADB   BDA   CDA   DCA   ECA

ADC   BDC   CDB   DCB   ECB

ADE   BDE   CDE   DCE   ECD

AEB   BEA   CEA   DEA   EDA

AEC   BEC   CEB   DEB   EDB

AED   BED   CED   DEC   EDC

 

So there are 60 ways to select 3 letters from a set of 5 where order of the letters is important. The easier way to come up with this number without listing the selections is to use the formula on p270! Check that you get 5P3 = (5!)/(5-3)! = 5!/2! = (5*4*3*2*1)/(2*1) = 120/2 = 60 using the formula on p270.

  

Now suppose that we wish to make the same count, but where order of letters is not important (i.e., ABC is considered the same sample as CBA). Our table would now lose many of its items.

ABC is the same as ACB, BAC, BCA, CAB, CBA.

ABD is the same as ADB, BAD, BDA, DAB, DBA.

ABE is the same as AEB, BAE, BEA, EAB, EBA.

ACD is the same as ADC, CAD, CDA, DAC, DCA.

ACE is the same as AEC, CAE, CEA, EAC, ECA.

ADE is the same as AED, DAE, DEA, EAD, EDA.

BCD is the same as BDC, CBD, CDB, DBC, DCB.

BCE is the same as BEC, CBE, CEB, EBC, ECB.

BDE is the same as BED, DBE, DEB, EBD, EDB.

CDE is the same as CED, DCE, DEC, ECD, EDC.

 

That leaves us with 10 different ways to choose letters, represented by the following individuals:

ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE. Or, we could just count them using the formula on p272! Check that you get 5C3 = (5!)/((3!)(5-3)!) = 5!/(3!*2!) = (5*4*3*2*1)/((3*2*1)*(2*1)) = (120)/((6)(2)) = 120/12 = 10 using the formula on p272. 
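Python's itertools will generate both lists for you, which is a nice optional way to confirm 5P3 = 60 and 5C3 = 10 by computer:

```python
from itertools import permutations, combinations

letters = "ABCDE"
perms = list(permutations(letters, 3))   # order matters: ABC and CBA differ
combs = list(combinations(letters, 3))   # order ignored: one entry per subset
print(len(perms), len(combs))            # 60 10
```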

 

HOMEWORK due Tuesday 10/11: (Please perform problems in the order listed below)

5.5 p276 # 6, 8, 14, 16, 24 (note that #9 tells you that 0! = 1 by definition)

5.5 p277 #28, showing the possible paths on a tree diagram.

5.5 p277 #30, showing which outcomes in #28 above are repeats of other outcomes.

5.5 p277 #18 and see how quickly the counts can get out of hand, even with small sets of objects. Would you want to list all of these selections in a table or on a tree diagram?

5.5 p277/278 #46, 50 deciding if order matters first, then computing the appropriate P or C.

 

 

T 10/04

ANSWERS to 5.1 p233 #12, 14, 32, 34, 40, 52 from last hmk (bring questions next time if you have some):

12. All of them are yellow, since P(yellow) = 1 = 100%

14. Not a probability model since 0.1+0.1+0.1+0.4+0.2+0.3 = 1.2 not 1

32. a. P(at least once) = 341/1100 = 0.31     b. we expect a larger sample of female adults to produce a similar result: that about 31% of all female adults volunteered at least once last year.

34. a. P(Titleist) = 35/80 = 0.44   b. P(Top) = 20/80 = 0.25   c. Based on this sample, we would expect larger bags of this type to contain about 44% Titleist and about 25% TopFlite.

40. Adding the total of the categories, we get 4521 so the probabilities are

     118/4521 = 0.03, 249/4521 = 0.06, 345/4521 = 0.08, 716/4521 = 0.16, 3093/4521 = 0.68.

52. a. P(8 girls) = P(1st girl)*P(2nd girl)* ... *P(8th girl) = ½*½*½*½*½*½*½*½ = 0.00390625

so this is classical since it results from expectation, not experimentation or experience.

     b. Empirical since based on an experimental survey of 1000.

     c. Subjective since based on personal experience.

     d. Empirical since based on trials in an experiment.

 

LECTURE:

We looked at section 5.2 the two forms for the addition rule p238 and p241/242 and the two forms for the multiplication rule p251 and p259. The second form of each rule (p242 and p259) is the general form, which actually covers the cases of the first forms presented (p238 and p251).

 

"E or F" is a union of sets and "E and F" is an intersection of sets.

The general addition rule of section 5.2 includes intersections as part of the union (subtracting the intersection once so as not to double count it in the union):

P(E or F) = P(E) + P(F) – P(E and F)

In table problems, we treated P(E and F) as an intersection of a row and column in the table. Now we can also use the general multiplication rule to find it:

P(E and F) = P(E)·P(F given E), where P(F given E) is the conditional probability: the probability that event F occurs given that event E has occurred, or that E is a subset being chosen from.

 

You can find a table discussion similar to the one we had in class in example 4 on p242/243 and example 1/2 on p257/258 (they refer to the same table even though they are in different sections).

 

READING IN THE TEXT:

5.2 p238-243 assigned previously

5.3 p250-252 (thru example 2) multiplication rule for independent events

5.4 p256-259 (thru example 3) general multiplication rule (works for both independent and dependent events, thus it is a general rule!).

 

MORE EXAMPLES (to guide you in your hmk):

5.2 p247

5. E and F share {5, 6, 7} so they are not mutually exclusive

7. S has 12 members and (F or G) = {5, 6, 7, 8, 9, 10, 11, 12} so P(F or G) = 8/12 = 2/3

     P(F or G) = P(F) + P(G) – P(F and G) = 5/12 + 4/12 – 1/12 = 8/12 or 2/3

9. E and G do not share any numbers, so they are mutually exclusive

13. P(E or F) = P(E) +P(F) – P(E and F) = 0.25 +0.45 – 0.15 = 0.55

15. P(E or F) = P(E) + P(F) = 0.25 + 0.45 = 0.70

19. P(E or F) = P(E) + P(F) – P(E and F) so 0.85 = 0.60 + P(F) – 0.05 and solving for P(F), we get 0.85 – 0.55 or 0.30.

31. a. P(heart or club) = P(heart) + P(club) = 13/52 + 13/52 = 26/52 = 0.50

     b. P(heart or club or diamond) = P(heart) + P(club) +P(diamond)

         = 13/52 + 13/52 +13/52 = 39/52 = 0.75

     c. P(heart or ace) = P(heart) + P(ace) – P(heart and ace) = 13/52 + 4/52 – 1/52 = 16/52

43. a. P(satisfied) = 231/375

      b. P(junior) = 94/375

      c. P(satisfied and junior) = 64/375 from the intersection of the row and column in the table.

      d. P(satisfied or junior) = P(satisfied) + P(junior) – P(satisfied and junior)

          = 231/375 + 94/375 – 64/375 = 261/375

5.4 p262

3. P(E and F) = P(E)*P(F given E) so 0.6 = (0.8)(P(F given E)) so P(F given E) = 0.6/0.8 = 0.75

13. use mult rule but now in word problem form!

     P(cloudy and rainy) = P(cloudy)*P(rainy given cloudy)

     0.21 = (0.37)(P(rainy given cloudy)) so P(rainy given cloudy) = 0.21/0.37 = 0.57

15. P(16/17 and white) = P(16/17)*P(white given 16/17)

     0.062 = (0.084)( P(white given 16/17)) so P(white given 16/17) = 0.062/0.084 = 0.74

17. a. P(no given <18) = 8661/78676 = 0.11

     b. P(<18 given no) = 8661/46993 = 0.18
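If you want to double-check rule computations like these by computer (optional), exact fractions avoid any rounding. Here are the numbers from #43 in the examples above, run through both general rules:

```python
from fractions import Fraction as F

p_satisfied = F(231, 375)
p_junior = F(94, 375)
p_both = F(64, 375)                       # intersection from the table

# General addition rule: P(E or F) = P(E) + P(F) - P(E and F)
p_union = p_satisfied + p_junior - p_both
print(p_union)                            # 87/125, the reduced form of 261/375

# General multiplication rule solved for the conditional probability:
# P(F given E) = P(E and F) / P(E)
p_junior_given_satisfied = p_both / p_satisfied
print(p_junior_given_satisfied)           # 64/231
```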

    

Homework due Thursday 10/06:

5.2 p245 #8, 14, 20, 32, 44

5.4 p262 #4, 8, 14, 16, 18

 

 

Th 9/29

LECTURE:

Last time and today before the test, we talked about the beginnings of probability in Ch. 5 with the classical and empirical methods of computing probabilities in 5.1. We looked at finding probabilities from tables from 5.2, but today, I did not have enough time to explain the addition rule properly. I only showed you the addition rule for disjoint events, which works for some tables, but for the table I used in class, I needed to show you the general addition rule which takes into account "double-counting" in each set being added. So I am omitting the part requiring the general rule from the last assigned problem below and will pick up with it on Tuesday.

 

READING IN THE TEXT:

5.1 p223-227 up to but not including example 5.

For next time: 5.2 p238-243 to the end of example 4.

 

EXAMPLES (to guide you in your hmk):

5.1 p233 (using the set-up for probabilities on p227)

13. cannot have a negative probability

31. P(sports) = 288/500 = 0.576

33. a. P(red) = 40/100 = 0.40

     b. P(purple) = 25/100 = 0.25

39. 118+249+345+716+3093 = 4521 so never 118/4521 = 0.026, rarely 249/4521 = 0.055, sometimes 345/4521 = 0.076, most 716/4521 = 0.158, always 3093/4521 = 0.684

49. a. P(right) = 24/73 = about 0.33

     b. P(left) = 2/73 = about 0.03    c. yes, only 3% of the time

5.2 p247

25. using the addition rule for disjoint events (add the probabilities):

     a. they all add to 1

     b. gun or knife = 0.671 +0.126 = 0.797

     c. 0.126 + 0.044 + 0.010 = 0.180

     d. 1 – 0.671 = 0.329

     e. yes, they only happen 1% of the time

43. a. P(satisfied) = 231/375

      b. P(junior) = 94/375

      c. P(satisfied and junior) = 64/375 from the intersection of the row and column in the table.

      Part d requires the general addition rule, which we will talk about on Tuesday:

      d. P(satisfied or junior) = P(satisfied) + P(junior) – P(satisfied and junior)

          = 231/375 + 94/375 – 64/375 = 261/375

 

Homework due Tuesday 10/04:

Do 5.1 p233 #12, 14, 32, 34, 40, 52

Do 5.2 p247 #26, 42abc (skip part d which would require the general addition rule).

 

T 9/27

As promised, leftover quizzes that were not picked up in class last week and the current quiz from today are all in a folder outside of my office BH 269. Pick them up if they will help you study. Please clip them back after you take yours.

 

I removed item #11 from the first version of the test format since we need to spend more quality time with it before testing (but we started to go over that material today). Revised format follows:

TEST #2 FORMAT

Formulas to be provided:

3.2 p133-136 formula for std. deviations of samples,

3.2 p139 the numbers 68/95/99.7 from the empirical rule, Chebyshev's inequality

3.4 p155 z scores for populations

3.4 p160 lower and upper fences

Questions:

 1. short answer questions about means of samples and populations, medians, modes, and distribution shape, resistance to skewing, meaning of standardization, and measures to report (see text 3.1 p118/119 ex1, p122 definition and table 4, p129 #24,42, 3.2 p142 #8 and 3.4 p155 definition, p159 summary table, p163 #30).

 2. short answer like 3.1 p126 #18 match pictures of distributions with table data.

 3. like 3.1 p125 #16 find the mean median and mode.

 4. like 3.2 p142 #11-16,20 where you are given a set of data and asked to find the mean and std. deviation of a sample (using deviation formulas, not computational formula).

 5. like 3.2 p144/145 #35-38 plus supplemental parts, using the empirical rule to find areas.

 6. like 3.2 p145 #39-40 use of Chebyshev's inequality.

 7. short answer comparison of areas to left/right of z values (already standardized) to tell which is bigger.

 8. like 3.4 p161 #9-14 compare z scores and relative placement.

 9. like 3.4 p162 #22 and 3.5 p170 #12 to find quartiles, IQR, fences, outliers and make a box plot.

10. like 5.1 p233/234 #17, 18 (part of todayÕs material).

 

HOMEWORK is to study for the test.

 

LECTURE:

One problem from the hmk that we did not go over in class:

Ch 3 review p173 #2

a. sample mean: add up the table values to get 91610, then divide by 9 to get 10178.89

median after putting data in order = 9980 (4 larger values and 4 smaller values)

b. range = 14050 – 5500 = 8550

sample std. deviation from the computing formula:

sum of x = 91610 and sum of x^2 = 1008129252

Sxx = 1008129252 – (91610^2 / 9) = 1008129252 – 932488011.11 = 75641240.89

s = sqrt(75641240.89 / (9 – 1)) = 3074.92

Q1 = avg of 7200 and 7889 = 7544.50

Q3 = avg of 12999 and 13999 = 13499

IQR = Q3 – Q1 = 13499 – 7544.5 = 5954.50

c. new table sum is 91610 + 27000 = 118610 so new mean is 13178.89 but median same

range = 41050 – 5500 = 35550

sample std. deviation from the computing formula:

sum of x = 118610 and sum of x^2 = 2495829252

Sxx = 2495829252 – (118610^2 / 9) = 932681240.90

s = sqrt(932681240.90 / (9 – 1)) = 10797.46

IQR is same

Conclusions: the mean, std. deviation and range changed fairly dramatically, but the median and the IQR stayed the same, meaning that they are "resistant" to outliers.
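The resistance conclusion is easy to reproduce on any small data set with Python's statistics module. A quick optional illustration with made-up numbers, not the textbook data:

```python
import statistics

data = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]   # replace the largest value with an outlier

# The mean moves dramatically (3 becomes 22); the median stays at 3.
print(statistics.mean(data), statistics.median(data))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```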

 

We also started looking at taking probabilities from tables of values. We will look again at this before your test and I will put some notes and hmk up after the test, so please check back.

 

 

Th 9/22

ANSWERS to previous hmk problems for your reference:

3.2 p145 #38

a. 68% between 4 – 0.007 = 3.993 and 4 + 0.007 = 4.007

b. these values are 2 std. deviations from the mean, so 95%

c. the area outside of the area in b would be 1 – 0.95 = 0.05 or 5%

d. the area between 4 and 4.007 is 0.68/2 = 0.34 and the area between 4 and 4.021 is 0.997/2 = 0.4985, so 0.4985 – 0.34 = 0.1585 or 15.85%

extra parts

e. the % > 4.007 is 0.50 – 0.34 = 0.16 or 16%

f. the % greater than 3.993 is 0.50 + 0.34 = 0.84 or 84%

3.2 p145 #40

a. k = 2 so (1 – 1/2^2)100% = (1 – 0.25)100% = 75%

b. k = 1.5 so (1 – 1/1.5^2)100% = 55.56%

and 27.3 – 1.5(8.1) = 15.15   and 27.3 + 1.5(8.1) = 39.45

c. 27.3 – 3(8.1) = 3 and 27.3 + 3(8.1) = 51.6 so (1 – 1/3^2)100% = (1 – 1/9)100% = 88.89%
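Chebyshev's inequality is a one-line formula, so a tiny helper makes it easy to check answers like these (optional):

```python
def chebyshev(k):
    """Minimum proportion of data within k std. deviations of the mean."""
    return 1 - 1 / k**2

print(round(chebyshev(2) * 100, 2))    # 75.0
print(round(chebyshev(1.5) * 100, 2))  # 55.56
print(round(chebyshev(3) * 100, 2))    # 88.89
```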

 

ANSWERS to the last in-class exercise from T 9/20:

4. Dr. Jin's patient z = (190 – 200)/10.5 = -0.95

and Dr. Brown's patient z = (169 – 177)/10.9 = -0.73

and all of the patients in both of these distributions are high BP patients, so it is better to be as low in the curve as possible: Dr. Jin's patient is better off relative to his group.

 

LECTURE: We finished looking at material in section 3.4 comparing z scores and interquartile range and 3.5 boxplots.

ANSWER to the box plot example in class:

Data 3 5 15 32 34 36 40 42 43 45 48 52 67 75

a. Q2 = (40+42)/2 = 41     Q1 = 32       Q3 = 48

IQR = Q3 – Q1 = 48 – 32 =  16

Left fence = 32 – 1.5(16) =  32 – 24 = 8 (so outliers on the left are 3 and 5)

Right fence = 48 + 1.5(16) = 48 + 24 = 72 (so outlier on the right is 75)

Number line below box plot shows min, Q1, M, Q3, max:

 

* *     |---------[======|======]---------|     *
3 5     15        32     41     48        67    75

(schematic, not drawn to scale)

 

Data appears to be skewed left.
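You can reproduce the quartile/fence/outlier work from this example in a few lines of Python if you like (optional). Note this sketch uses the class method of splitting the ordered data in half around the median; software packages sometimes compute quartiles slightly differently:

```python
def median(xs):
    n = len(xs)
    mid = n // 2
    return (xs[mid - 1] + xs[mid]) / 2 if n % 2 == 0 else xs[mid]

def quartiles(data):
    data = sorted(data)
    half = len(data) // 2
    q1 = median(data[:half])      # median of the lower half
    q2 = median(data)
    q3 = median(data[-half:])     # median of the upper half
    return q1, q2, q3

data = [3, 5, 15, 32, 34, 36, 40, 42, 43, 45, 48, 52, 67, 75]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
left_fence, right_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < left_fence or x > right_fence]
print(q1, q2, q3, iqr)           # 32 41.0 48 16
print(left_fence, right_fence)   # 8.0 72.0
print(outliers)                  # [3, 5, 75]
```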

 

MORE EXAMPLES (to guide you in your hmk!):

3.4 p162 #21 The mean of this sample data is 3.99 and std. dev. is 1.78. Notice they put the data in order by columns, so you do not need to list it again!

a. z = (0.97 – 3.99)/1.78 = –1.70

b. Q2 = (3.97+4)/2 = 3.99     Q1 = (2.47+2.78)/2 = 2.63    Q3 = (5.22+5.50)/2 = 5.36

IQR = Q3 – Q1 = 5.36 – 2.63 = 2.73

Left fence = 2.63 – 1.5(2.73) = –1.47

Right fence = 5.36 + 1.5(2.73) = 9.46

so no outliers (data values outside the "fences").

 

3.5 p170 #11

The data in order are:

0.598, 0.600, 0.600, 0.601, 0.602, 0.603, 0.605, 0.605, 0.605, 0.606, 0.607, 0.607, 0.608, 0.608, 0.608, 0.608, 0.608, 0.609, 0.610, 0.610, 0.610, 0.610, 0.611, 0.611, 0.612.

a. Q2 = 0.608     Q1 = (0.603+0.605)/2 = 0.604     Q3 = (0.610+0.610)/2 = 0.610

IQR = Q3 – Q1 = 0.610 – 0.604 = 0.006

Left fence = 0.604 – 1.5(0.006) = 0.595 (so no outliers on the left)

Right fence = 0.610 + 1.5(0.006) = 0.619 (no outliers on the right either)

Number line below box plot shows min, Q1, M, Q3, max:

 

|--------[======|===]----|
.598     .604   .608 .610 .612

(schematic, not drawn to scale)

 

Data appears to be skewed left.

 

READING IN THE TEXT:

3.4 p157-160 percentiles, quartiles, and outliers discussion.

3.5 p164-167 read about how to construct a box plot from the quartiles in section 3.4 (see blue box on p165) and how the box plot gives a nice visual of data that is easier to construct than a histogram (see pictures on p167).

 

Homework due Tuesday 09/27:

1. 3.4 p162 #20,

2. 3.4 p162 #22 (given that mean of sample is 10.08 and std. dev. of sample is 1.89)

3. 3.5 p169 #6,

4. 3.5 p170 #12 (data in order 1.01, 1.34, 1.40, 1.44, 1.47, 1.53, 1.61, 1.64, 1.67, 2.07, 2.08, 2.09, 2.12, 2.21, 2.34, 2.38, 2.39, 2.64, 2.67, 2.68, 2.87, 3.44, 3.65, 3.86, 5.22, 6.81)

5. Ch3 Review p173 #2

 

TEST #2 FORMAT

Formulas to be provided:

3.2 p133-136 formula for std. deviations of samples,

3.2 p139 the numbers 68/95/99.7 from the empirical rule, Chebyshev's inequality

3.4 p155 z scores for populations

3.4 p160 lower and upper fences

Questions:

 1. Short answer questions about means of samples and populations, medians, modes, and distribution shape, resistance to skewing, meaning of standardization, and measures to report (see text 3.1 p118/119 ex1, p122 definition and table 4, p129 #24,42, 3.2 p142 #8 and 3.4 p155 definition, p159 summary table, p163 #30).

 2. short answer like 3.1 p126 #18 match pictures of distributions with table data.

 3. like 3.1 p125 #16 find the mean median and mode.

 4. like 3.2 p142 #11-16,20 where you are given a set of data and asked to find the mean and std. deviation of a sample (using deviation formulas, not computational formula).

 5. like 3.2 p144/145 #35-38 plus supplemental parts, using the empirical rule to find areas.

 6. like 3.2 p145 #39-40 use of Chebyshev's inequality.

 7. short answer comparison of areas to left/right of z values (already standardized) to tell which is bigger.

 8. like 3.4 p161 #9-14 compare z scores and relative placement.

 9. like 3.4 p162 #22 and 3.5 p170 #12 to find quartiles, IQR, fences, outliers and make a box plot.

10. like 5.1 p233/234 #18, 34 (part of TuesdayÕs material).

11. like 5.2 p248 #42abc, 44abc to find probabilities from a table of data (part of TuesdayÕs material).

 

T 9/20

LECTURE:

We started looking at how we can put the mean and standard deviation to work now that we know how to calculate them. Overall, we want to be able to make any normal distribution have the same center (mean) of 0 and same spread (std. deviation) of 1. Once distributions are on the same scale, you can compare them.

 

READING IN THE TEXT:

3.4 p155 (thru example 1) z-score definition and formula (note that z values measure the number of std. deviations that an x data value lies from the mean).

We also talked about making area comparisons without being able to look up the areas on a table yet. The material from 7.1 p330-332 approximates this discussion. Look at how z calculations are found, given the mean and std. deviation, then look at figure 10 to see a picture of how the original x data is transformed to z data that is centered at 0 and has std. deviation of 1. Once areas for different distributions are on the same scale, you can compare them (i.e., is one area contained within the other on a shared graph? The area to the left of z = 2 is larger than the area to the left of z = 1).
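We haven't covered the table yet, but if you're curious, Python's statistics.NormalDist gives the areas we will eventually look up, and it confirms the comparison above (the area left of z = 2 is larger than the area left of z = 1):

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal: mean 0, std. deviation 1
area_left_of_1 = Z.cdf(1)        # about 0.8413
area_left_of_2 = Z.cdf(2)        # about 0.9772
print(area_left_of_2 > area_left_of_1)   # True
```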

 

Homework due Thursday 09/22: (I am putting the points each problem will be worth to emphasize that you should not just skip the last problem – there is a lot to do in it!)

1. (2pts)Which is larger, the area associated with values less than 55 for a distribution with a mean of 80 and std. deviation of 10, or the area associated with values less than 50 for a distribution with a mean of 80 and a std. deviation of 15?

2. (2pts) 3.4 p161 #12

3. (2pts) 3.4 p161 #14

4. (4pts) 3.4 p163 #30 (see table of data from 3.1 p127 #24 and 3.2 p143 #26)

a. Given the info from those previous problems that the population mean of the data in the table is 26.4 and the population std. deviation of the data in the table is 12.8, find the z-scores for each data point in the table (you should end up with 9 z scores).

b. Find the mean of the 9 z-scores from part a. (sum the + and – values as they are).

c. Find the std. deviation of the 9 z-scores from part a. using the computational formula for population std. deviation as in table 10 on p134 (remember to divide by N, not n-1).

 

Th 9/15

In-class exercise on Empirical Rule p139:

For a normal distribution with mean 75 and std. deviation 3,

a. what are the values that mark 1, 2, and 3 std. deviations on either side of the mean?

   1 std. deviation:  75 – 3 = 72 and 75+3 = 78

   2 std. deviations: 75 – 2(3) = 69 and 75 +2(3) = 81

   3 std. deviations: 75 – 3(3) = 66 and 75 + 3(3) = 84

b. what is the area under the curve for values between 72 and 81?

   between 72 and 75 lies half of the 0.68 for one std. deviation, or 0.34 and

   between 75 and 81 lies half of the 0.95 for two std. deviations, or 0.475,

   so the area between 72 and 81 is 0.34 + 0.475 = 0.815 (81.5%)

c. what is the area under the curve for values between 69 and 72?

   between 69 and 75 lies half of the 0.95 for two std. deviations, or 0.475 and

   between 72 and 75 lies half of the 0.68 for one std. deviation, or 0.34, and

   the area between 69 and 75 contains the area between 72 and 75,

   so we must subtract the smaller area from the larger area to find the area in between:

   0.475 – 0.34 = 0.135

 

MORE EXAMPLES:

Suppose a distribution has a mean of 120 and a std. deviation of 25:

a. what values are 1 std. deviation from the mean (z = -1 and z = 1)?

Answer: 120 – 25 = 95 and 120 + 25 = 145

b. what values are 2 std. deviations from the mean (z = -2 and z = 2)?

Answer: 120 – 2(25) = 70 and 120 + 2(25) = 170

c. what values are 3 std. deviations from the mean (z = -3 and z = 3)?

Answer: 120 – 3(25) = 45 and 120 + 3(25) = 195

d. what % of scores is greater than 170?

Answer: the area between the mean and z = 2 is 0.95/2 = 0.475, so the area under the curve to the right of x=170 (or z = 2) is 0.50 – 0.475 = 0.025

e. what % of scores is less than 170?

Answer: from above, 1 – 0.025 = 0.975

f. what % of scores is greater than 70?

Answer: the area between z = -2 and the mean is 0.95/2 = 0.475, so the area under the curve to the right of x=70 (or z = -2) is 0.475 + 0.50 = 0.975

g. what % of scores is between 70 and 145?

Answer: the area between x=70 (or z= -2) and the mean is 0.95/2 = 0.475, and the area between the mean and x=145 (or z=1) is 0.68/2 = 0.34, so the area under the curve between 70 and 145 is 0.475 + 0.34 = 0.815

h. what % of scores is between 170 and 195?

Answer: the area between x=170 (or z= 2) and the mean is 0.95/2 = 0.475, and the area between the mean and x=195 (or z=3) is 0.997/2 = 0.4985, so the area under the curve between 170 and 195 is 0.4985 – 0.475 = 0.0235
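The empirical-rule areas in these examples all come from the three half-areas 0.34, 0.475, and 0.4985: add them when the endpoints straddle the mean, subtract when they sit on the same side. A small optional sketch, assuming each endpoint lands a whole number of std. deviations from the mean:

```python
HALF_AREA = {0: 0.0, 1: 0.68 / 2, 2: 0.95 / 2, 3: 0.997 / 2}

def empirical_area(a, b, mean, sd):
    """Area between a and b for a normal curve, via the 68/95/99.7 rule."""
    za = round((a - mean) / sd)
    zb = round((b - mean) / sd)
    if za * zb <= 0:    # endpoints straddle (or touch) the mean: add halves
        return HALF_AREA[abs(za)] + HALF_AREA[abs(zb)]
    # same side of the mean: subtract the smaller half-area from the larger
    return abs(HALF_AREA[abs(za)] - HALF_AREA[abs(zb)])

print(round(empirical_area(70, 145, 120, 25), 4))    # part g: 0.815
print(round(empirical_area(170, 195, 120, 25), 4))   # part h: 0.0235
```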

 

READING IN THE TEXT:

From today, read 3.2 p139-141 (including example 8) about the Empirical Rule and Chebyshev's inequality.

If you want to read ahead for next week, we will be skipping 3.3, but will cover

3.4 p155-156 z-score comparisons

3.4 p157-160 quartiles, fences, and outliers

3.5 p163-167 box plots

and move on to Ch. 5 if we have the time.

 

Homework due Tuesday 09/20 (I am adding extra questions to 36 and 38 like the examples I included above to give more practice with finding areas):

3.2 p145

#36 do parts a, b, c, and also do the following:

   d. what % of scores is less than 743?

   e. what % of scores is between 287 and 857?

   f.  what % of scores is between 173 and 743?

#38 do parts a, b, c, d, and also do the following:

   e. what % of bolts is greater than 4.007?

   f.  what % of bolts is greater than 3.993?

#40 (use the formula on p140 and try #39 for practice and check answers in back of book!)

 

 

T 9/13

Participation note: In class, I expect students to give their full attention to the material. I see students texting above and below the table, and I don't like to police such behavior, but I do make note of it and remind them of what I have seen when they tell me they do not understand the material or when they do poorly on tests. I also remember while I am grading.

 

LAST HMK:

As we went over hmk in class, I did not have time to show you the whole frequency tally and histogram. Here it is for your reference:

A class size of 4 will sufficiently spread out the data. You may choose something else!

class      tally                              frequency

90-93      I I I                              3

94-97      I I                                2

98-101     I I I I I I I I                    8

102-105    I I I I I I I I I I I I I I        14

106-109    I I I I I I I I                    8

110-113    I I I I I I I                      7

114-117    I                                  1

118-121    I                                  1

 

A primitive histogram is all I can do here (yours would have a scale on the x and y axes and would have bars, not X's):

 

 

 

 

14 |                            X
13 |                            X
12 |                            X
11 |                            X
10 |                            X
 9 |                            X
 8 |                    X       X       X
 7 |                    X       X       X       X
 6 |                    X       X       X       X
 5 |                    X       X       X       X
 4 |                    X       X       X       X
 3 |    X               X       X       X       X
 2 |    X       X       X       X       X       X
 1 |    X       X       X       X       X       X       X       X
   +------------------------------------------------------------------
      90-93   94-97  98-101  102-105 106-109 110-113 114-117 118-121

 

The mean = 104.1 and the median = 104

The mean is a good measure of central tendency when the distribution is symmetric, as this one is.
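If you want to double-check a mean and median by computer, Python's built-in statistics module computes both (the data list below is made up for illustration, not the homework data):

```python
import statistics

data = [95, 99, 102, 104, 104, 106, 110]  # illustrative values only

print(statistics.mean(data))    # arithmetic average of the values
print(statistics.median(data))  # middle value once data is in order
```

Remember that the module sorts the data for you when finding the median; by hand, you must put the data in ascending order first.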

 

IN-CLASS WORK:

We worked on the concept of "skewness", then moved to looking at the two most important numbers that describe a distribution:

measure of center: mean, and

measure of spread: standard deviation.

These numbers will be extremely important later in Ch.7 and beyond for normal distributions.

We will form the sampling distribution (made up of all possible samples of a certain size that can be taken from a population), and it will be normal regardless of the shape of the population. Then we will use the mean and standard deviation to put the distribution into a standard shape and read off areas under this standard normal curve from a table. I mention this now because the last homework on 3.1 started asking you to think about means of different samples taken from a population, and this section 3.2 treats the mean and standard deviation as important for a good reason that you don't know yet, but that I want you to be anticipating!

 

READING IN THE TEXT:

Read p131-138 (skipping the empirical rule and Chebyshev's inequality for now).

Be careful that the book separates variance from standard deviation (standard deviation is the square root of variance), whereas in class, we did not separate them. Also make note that the formulas for populations and samples are a little different:

 

--the symbols used for mean and standard deviation:

mean of a sample:  x̄

mean of a population:  μ

standard deviation of a sample:  s

standard deviation of a population:  σ

 

--the standard deviation calculations:

for samples you divide by the "correction factor" n-1

for populations you divide by the whole population size N

 

We did a problem in class much like the one in example 4 p136, where we computed standard deviation of a sample in two ways: once using individual deviations and once using the computational formula (which employs the Sxx we used for finding lines of best fit). You see that you get the same answer. In example 3 p134, they are doing the same thing, but to a population instead of a sample, so they do not subtract 1 from the size.
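The two methods can be sketched in Python; the data values here are invented for illustration, and statistics.stdev is included only to confirm the hand formulas:

```python
import statistics

data = [4, 8, 6, 2]  # made-up sample values, not the class data
n = len(data)
mean = sum(data) / n

# Method 1: individual deviations, divided by n - 1 for a sample.
s_deviations = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5

# Method 2: computational formula using Sxx = sum(x^2) - (sum x)^2 / n.
Sxx = sum(x * x for x in data) - sum(data) ** 2 / n
s_computational = (Sxx / (n - 1)) ** 0.5

# Both methods agree, and match the library's sample standard deviation.
# For a population, divide by N instead (statistics.pstdev does this).
print(s_deviations, s_computational, statistics.stdev(data))
```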

 

Homework due Thursday 09/15:

Most of the problems in this section have data sets too large for easy hand-calculations, so we will limit ourselves to the beginning problems with smaller data sets! (I am asking for only one method for std. deviation in each problem… do both only if you have the time and the desire!):

3.2 p142

#8 fill in the blanks with vocabulary from the reading

#12 find s (sample std. dev.) using a deviations table as in table 11 p136

#16 find σ (population std. dev.) using a deviations table as in table 9 p134

#20 find s (sample std. dev.) using the computational formula as in table 12 p136

 

Th 9/08

Congratulations on completing 1/5 of the semester (it goes fast, doesn't it?)!

Before starting the test, I introduced the concept of "center", or average, of a data set. The three types of centers are the mean (quantitative), the median (quantitative), and the mode (quantitative or qualitative). The mean is the usual arithmetic average that you used when you were finding lines of best fit. The median is the middle value of a set of data in ascending numerical order (for odd numbers of data it is a data point, but for even numbers of data, find the mean of the two data points nearest the center). Some sets you are given in the exercises are not in order to start with! The mode is the data value that occurs most frequently. Please read about them and do some homework with regard to them for Tuesday. We will learn about standard deviations from 3.2 with an in-class exercise on Tuesday.

 

READING IN THE TEXT:

3.1 p117-125.

Some main issues:

p118 – In example 1, notice how the average from a sample of 4 students was 80, whereas the average of the whole population of 10 students was 79. There are many possible samples of size 4, and most will yield an average close to 79, but few, if any, will be exactly 79. This is preparing you to think about the sampling distribution in chapter 8.

p119 – In the second activity box, look at the mean as the "center of gravity". This concept will be used in section 3.2 to motivate standard deviation.

p120 – Before you start finding the median, make sure you check step 1: the data must be in ascending order of smallest to greatest.

p122 – Look at figure 7 and read about how the median is resistant to outliers that "skew" the data. The info in this figure will be asked about on Test #2.

 

Homework due Tuesday 09/13:

3.1 p125 #16, 18, 24, 30, 42

Hints:

#18, refer to figure 7 p122,

#24b, choose 3 different groups of 4 people at a time and find a mean for each sample of 4, where the samples can share people but each must have at least one person different from another sample. Don't worry too much about how to choose at random: just pick some!

 

 

T 9/06

Reminders:

Today is the last day to drop the course. Withdrawals after today go on your record.

 

Please make sure you have read the course syllabus at

http://www.smccd.edu/accounts/callahanp/syl160F11.html

(some questions being asked are answered there and should not be asked!).

 

Homework is to study for the test (no problems to turn in).

 

Test #1 will occur as scheduled on Thursday 09/08 and will cover the material from lecture, class exercises, homework, and quizzes from our first 5 class sessions.

You will be provided with formulas for slope, point-slope, slope-intercept, general form for the exponential, and the set of formulas for lines of best fit, e.g., Sxx, Syy, etc. Paper will be provided; you need only bring a standard scientific calculator and something to write with. You will not be allowed to use calculator cell phones, PDAs, or other transmission-capable devices. You may not share one calculator amongst more than one test taker. If you forget your calculator, see if someone can lend you an extra or let you borrow theirs after they turn in their test. Below each problem type is a reference to where in the book you can find more problems to practice. Be prepared to perform the following:

 

1. Given a set of linear scatterplot data, plot the points on a given graph, draw an estimate of the line of best fit, and estimate the equation of the line using two points.

(See 4.2 p195-196)

I will give you the "summation table" for x, y, xy, and x squared for the scatterplot data already filled out, and you will use the equations for Sxx, Syy, etc. to find the actual line of best fit.

(We simplified the calculations in class with the alternate book formulas. See left sidebar on p198 for alternate formula and p206 #26 for a manageable set of data on which to use the formulas for lines)

 

2. Given a set of exponential scatterplot data, turn it into a set of linear scatterplot data using logs, graph the linear data, draw a best-fit estimate and be ready to estimate the equation of the best line you drew. One or two follow-up questions may ask you to plug in a given x or y value to your equation and solve for the other.

(Supplemental material from class: see notes below from 1st week)

 

3. Given the line of best fit for some "logged" data (x, logy), find the exponential of best fit for the original data (x, y) by "unlogging" the slope and y-intercept.

(Supplemental material from class: see notes below from 1st week)

 

4. Some short-answer questions on chapter 1.1 and 1.5 reading and definitions, including statistics, samples and populations, qualitative and quantitative variables, discrete and continuous variables, and bias in sampling.

(See 1.1 p12 #21-36, 45-48 and 1.5 p43 #13-24)

 

5. Given a set of data, find/show various parts of the following: relative frequencies, frequency and relative frequency bar graphs (may include side-by-side comparison of two sets of data), and why one representation is better than another.

(See 2.1 p72 #17-29)

 

6. Given a small set of data, construct classes of a given width and starting place and form the resulting frequency distribution. Be prepared to provide a histogram if asked for.

(See 2.2 p95 #37-40)

 

7. Answer short questions regarding use of areas in graphics, and how vertical scaling affects perception.

(See 2.3 p106 #1-7, 10-12) (skip time-series plots).

 

Today's material from 2.3 and more examples:

Today we looked at section 2.3 (graphical misrepresentations of data) to conclude the material for the test. You can read about this in 2.3 p100-106. Among other considerations, look for vertical scale manipulation, graphics that are unclear or sized incorrectly, and incorrect use of areas (which are difficult to perceive because they change in both length and width). We looked at 2.3 p106 #2, 4, 10 (some more are mentioned below also):

#1 The heights of the podiums (podia?) do not correspond to the numbers they are supposed to represent.

#2 The heights of the podiums do not look proportional to the numbers. Are the graphics supposed to be included in the perceived heights of the "bars"? The graphics distract one from focusing on the numbers. What is the y-scale? Why doesn't the beer have a bar? The width of the burger gives an unfair visual comparison with the slenderness of the beer bottle, etc. It is better to let one-dimensional equal-width bars show the comparison and avoid graphics and changing volumes of pictures.

#3 Watch out for the vertical scale not starting at 0! A graph with 0 included would show a much less dramatic decrease in the numbers.

#4 Same complaint as #3, causing it to look like there is a much more dramatic increase from one category to another than there actually is. The figure for 25-35 looks like it is less than half that for the 35-45 year olds, but comparing the % values, 10% to 13%, they are not quite as different as they look!

#7 Same complaint as #3 and #4. By starting the scale at 0.1 instead of 0, the 25-35 bar looks to be more than three times the size of the 45-55 bar. Redraw it starting at 0 and see what it should really look like!

#10 Comparing the numbers, 696.3 is about 93 times larger than 7.5, but volume-wise, the bigger barrel can only fit maybe 5 or 6 of the smaller barrels put into it. Graphics involving area change both in width and height (unlike bars that just change in height) and are very deceptive!

#11 The bars do not seem to obey a set scale.

 

 

Th 9/01

I placed 2 weeks of material here to give you time to get access to the book, but as I advised you, we have reached the point that I must assume you made timely plans to gain access to the book. If you do not have the book yet and the bookstore is still waiting on its order, you should buy an ebook to have instant access. If you are waiting for an order to come in, you should have made plans to get the assignments from someone in class.

The Pearson site shows the book and its features at:

http://www.mypearsonstore.com/bookstore/product.asp?isbn=0321641876

and the Coursesmart site shows where to purchase the ebook:

http://www.coursesmart.com/0321644832/?a=1773944

 

We spent time after the quiz talking about the vocabulary of Ch.1 in 1.1 and 1.5. Now that you have the book, you should go back and read these sections to fill in the ideas. Sometime soon, you should also read the pages in 1.2, 1.3, and 1.4 that I suggested last time. We may not do concrete problems there, but the material is important to browse for your general knowledge of statistics.

 

Today's in-class exercise was intended to show you the similarities and differences between graphical representations of qualitative and quantitative data, along with how to choose classes.

In 2.1, qualitative data is grouped naturally by the categories that describe the data and is shown using bar graphs (where there is space between bars on the x line since this data can be put in any order and does not suggest that one category picks up where another left off).

In 2.2, quantitative continuous data must be grouped into categories, so you choose a class width from a beginning class limit to divide up the data into manageable and equally-sized categories. The data is then placed into these classes and the number of data in each is counted (frequency) and then displayed on a histogram (where the x line is a number line so there is no space between the bars because the categories must be ordered according to magnitude).

We ran out of time for the second half of the exercise about continuous data, but you can see the answers in the back of the book since the exercise was problem 2.2 p95 #39.

Briefly,

1st part: classes of width 10 give:

20-29  1 

30-39  111111 

40-49  1111111111 

50-59  11111111111111 

60-69  111111

70-79  111

pg 83 has histogram examples, but you can also see the histogram by turning the tally marks above 90 degrees counterclockwise!

2nd part: 

classes of width 5 are: 20-24, 25-29, 30-34  etc.

and if you draw a histogram for this frequency distribution, it has the same general shape as the one from part a above, but spreads the data out more and contains more peaks and valleys that show more about the original data. As the book points out, there is no one best way to divide up the data. You pick what you think shows the spread of the data best.
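The choose-a-width-and-tally procedure can be sketched in Python. The data values below (and the width/starting choices) are made up for illustration, not the book's data:

```python
# Group quantitative data into classes of a chosen width, then count
# the frequency in each class.
data = [23, 35, 41, 47, 52, 55, 58, 63, 71]  # made-up values
width = 10
start = 20   # lower limit of the first class

freq = {}
for value in data:
    # Find the lower limit of the class this value falls into.
    lower = start + ((value - start) // width) * width
    label = f"{lower}-{lower + width - 1}"
    freq[label] = freq.get(label, 0) + 1

print(freq)
```

Changing width and start regroups the same data into different classes, which is exactly the part a / part b comparison in #39.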

 

READING IN THE TEXT:

2.2 p78-83

2.2 p88-89 identifying the shape of a distribution

 

Homework due Tuesday 9/6:

2.2 p91-95 #2, 4, 6, 12, 14, 30, 34, 38

 

We will work on selected other topics from sections 2.2, 2.3, and possibly 3.1 on Tuesday if you want to do some looking ahead. Note that your first test will be on Thursday and it will cover the material up to today and some of what we do on Tuesday.

 

T 8/30

READING IN THE TEXT:

By Thursday, I will assume that everyone has obtained access to the text and can do some reading. Until then, I am putting some definitions here from the text and will ask you to try some exercises involving identifications with these definitions. We will talk more about them as we go over the homework in class. The in-class exercise today was about bar graphs (the material from 2.1) and one of the homework problems at the bottom is from there.

 

1.1 p3-8, noting definitions especially:

A sample is a subset of the population being studied, and we make inferences about the population based on that sample.

Variables are the characteristics of the individuals within the population.

Qualitative, or categorical, variables allow for classification of individuals based on some attribute or characteristic. Quantitative variables provide numerical measures of individuals, where arithmetic operations can be performed and provide meaningful results.

Examples: If we consider "gender" as a variable, we would say it is qualitative because the categories of male and female are not numerical measures. If we consider "temperature" as a variable, we would say it is quantitative because we can measure temperature numerically and perform arithmetic operations on the values. But be careful! If we consider "zip code" as a variable, it may look quantitative because you see numbers, but actually it is qualitative, since the numbers just help to categorize locations and you would not perform arithmetic operations on them (such as adding two zip codes together, which would have no meaning).

A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values, where countable means that the values result from counting, such as 0, 1, 2, etc. A continuous variable is a quantitative variable that has an infinite number of possible values that result from making measurements.

Examples: The number of cars that go through a fast-food line is discrete because it results from counting, but the number of miles a car can travel with a full tank of gas is continuous because the distance would have to be measured.

 

To read when you get your book (no homework problems on these):

1.2 p16-17, about observational and designed experiments

1.3 p23-26 up to ex. 3, about simple random sampling

1.4 p30-35 about sampling methods

 

1.5 p38-42 about bias in sampling:

sampling bias uses a technique that favors one part of a population over another,

undercoverage (flawed sampling technique) causes a segment to not be fully representative of the whole population,

nonresponse of sample subjects to surveys causes error that may or may not be minimized by callbacks and incentives.

response bias can result from respondents not feeling comfortable with interviewers, respondents misrepresenting facts or lying, and questions that are leading in the way they are phrased (poorly worded questions).

Example: A policeman asks students in a classroom to fill out a survey involving whether they have used drugs and what kinds they have used. Anonymous discussion of the results will follow. Response bias could occur if the students feel uncomfortable giving this information to a policeman, or if students misrepresent facts because they either don't want to face their problems or want to appear cool to their friends.

 

2.1 p63-66 and example 6 p68-69 about frequency and relative frequency, bar graphs, and pie charts:

A frequency distribution lists each category of data and the number of occurrences for each category of data.

The relative frequency is the percent of observations within a category and is found by dividing the category frequency by the sum of all the frequencies in the table.

A bar graph has categories labeled on the horizontal axis and frequencies on the vertical axis, with bars extending from the horizontal axis to a height equal to the frequency; the bars are usually not touching but are of the same width.

A side-by-side bar graph can be used to compare data sets and should use relative frequencies to ensure that the sets are being measured on the same scale, where bars being compared from the same category usually have no space between them but space is still left between different categories.
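As a quick illustration of the relative frequency definition in Python, using the auction-survey counts from homework problem #20 below:

```python
# Frequency distribution from the online-auction survey (770 adults).
counts = {"frequently": 54, "occasionally": 123, "rarely": 131, "never": 462}
total = sum(counts.values())  # 770

# Relative frequency = category frequency / sum of all frequencies.
rel = {category: count / total for category, count in counts.items()}

print(rel["never"])  # 0.6, i.e., 60% never participate
```

Note that the relative frequencies always add up to 1, which is why they put two data sets on the same scale for a side-by-side comparison.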

 

Homework due Thursday 9/1:

1.1 p12

#26 Consider "assessed value of a house" as a variable. From the definitions and examples above, would it be called qualitative or quantitative?

#28 Consider "student ID number" as a variable. From the definitions and examples above, would it be called qualitative or quantitative?

#32 Referring to the definitions and examples above, is the quantitative variable "number of sequoia trees in a randomly selected acre of Yosemite" discrete or continuous?

#34 Referring to the definitions and examples above, is the quantitative variable "Internet connection speed in kilobytes per second" discrete or continuous?

#36 Referring to the definitions and examples above, is the quantitative variable "air pressure in pounds per sq. inch in a tire" discrete or continuous?

1.5 p43 Consider the type of possible bias for each of the following:

#14  The village of Oak Lawn wishes to conduct a study regarding the income level of all households within the village. The manager selects 10 homes in the southwest corner of the village and sends out an interviewer to the homes to determine income.

#16  Suppose you are conducting a survey regarding the sleeping habits of students. From a list of registered students, you obtain a simple random sample of 150 students. One survey question is "How much sleep do you get?".

#18 An ice cream chain is considering opening a new store in O'Fallon. Before opening, the company would like to know the percentage of homes there that regularly visit an ice cream shop, so the researcher obtains a list of homes and randomly selects 150 to send questionnaires to. Of those mailed out, 4 are returned.

2.1 p72

#20 A survey asked 770 adults who use the internet how often they participate in online auctions. The responses were as follows:

frequently  54

occasionally  123

rarely  131

never  462

a. construct a relative frequency distribution.

b. what proportion never participate?

c. construct a frequency bar graph.

d. construct a relative frequency bar graph.

2.1 p72

#22 A survey of U.S. adults in 2003 and 2007 asked "Which of the following describes how spam affects your life on the Internet?"

Feeling         2003   2007
Big problem      373    269
Just annoying    850    761
No problem       239    418
Don't know        15     45

a/b. Construct the relative frequency distributions for 2003 and 2007.

c. Construct a side-by-side relative frequency bar graph.

d. Compare each year's feelings and make some conjectures about the reasons for similarities and differences.

 

 

Th 08/25

Supplementary notes, followed by homework:

 

Finding the exponential equation of best fit for a scatterplot of seemingly exponential data:

 

This exponential part is in the more expensive version of this text, but should not be skipped, so I will supplement your version with notes in class and here.

 

If you have a scatterplot of linear data, you saw in class last time that it was relatively easy and accurate to estimate the line of best fit from a graph and also to find the best fit line using the equations from last time. However, if you have a scatterplot of data that is best described by an exponential curve, it is difficult to draw a good curve, and you wouldn't know how to find its equation because it does not have a constant slope (i.e., you could not take two points and use the slope formula or point-slope form!).

 

But if you take the logarithm of each y value in the exponential data (leaving the x values the same), that is, turn (x, y) into (x, logy) in the table, you will have transformed it into linear data! From here, you could use the equations for line of best fit to find y = mx+b for the "logged" data (x, logy). Then you can "unlog" the slope m and y-intercept b to find the "a" and "b" in y = b(a)^x for the original data.

 

We did an example of this process in class. Here is another example, but with data that is not perfectly exponential as it was in class:

The following set of data is best represented with an exponential relationship y = b(a)^x, but since it is not perfectly exponential, we cannot write the equation from the table values.

 

x    y
1    398.11
3    199.53
4    156.49

 

It is difficult to estimate an exponential scatterplot relationship, but it can be turned into a linear relationship by taking the logarithm of the y values (graph it if you don't believe it; unfortunately, I cannot show the graphing part here!).

  

x    y
1    log(398.11) = 2.60
3    log(199.53) = 2.30
4    log(156.49) = 2.19

 

Now we can find the best fit line for this (x, logy) linear scatterplot by making a summation table and using the calculations (Sxx, etc.) for finding the best fit line.

 

x      y        xy         x²
1      2.60     2.60       1
3      2.30     6.90       9
4      2.19     8.76       16
Σx=8   Σy=7.09  Σxy=18.26  Σx²=26

 

x̄ = 8/3 = 2.67 and ȳ = 7.09/3 = 2.36

Sxx = 26-(64/3)=4.67

Sxy = 18.26-[(8)(7.09)/3]= -0.65

slope = -0.65/4.67 = -0.14

y-intercept = 2.36-(-0.14)( 2.67) =2.73

So the best fit line for the logged data is y=-0.14x+2.73

 

To find the best fit exponential for the original data, "unlog" the slope and y-intercept of the line above: raise 10 to the power of each separately and then write the equation for the exponential of best fit for the original table data (x, y).

a = 10^slope = 10^(-0.14) = 0.72

b = 10^(y-intercept) = 10^2.73 = 537.03

So the best fit exponential for the original data is y = b(a)^x = 537.03(0.72)^x.

 

(Check your answer: does plugging x=1 into your best fit exponential give you something close to the original table value of 398.11? It shouldn't be exact because the original data was not perfectly exponential, but it should be in the ballpark! Same for the other two points.)
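The ballpark check can be done in a few lines of Python, using the a and b values from the worked example above:

```python
# Best fit exponential from the worked example: y = 537.03 * 0.72**x
a = 0.72
b = 537.03

# Compare fitted values against the original table values.
for x, y_original in [(1, 398.11), (3, 199.53), (4, 156.49)]:
    y_fit = b * a ** x
    print(x, round(y_fit, 2), y_original)  # close, but not exact
```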

Notice that this is a decreasing exponential relationship. For decreasing exponential relationships, a<1 and for increasing exponential relationships, a>1. For decreasing linear relationships y=mx+b, m is negative and for increasing ones, m is positive.

 

Linear and exponential patterns (not in book):

 

A linear relationship y=mx+b is built by repeated addition. We add positive numbers for an increasing line (positive slope) or negative numbers for a decreasing line (negative slope).

An exponential relationship y = b(a)^x is built by repeated multiplication. The a in the exponential is the amount by which we multiply each time ("a" contains the rate of increase or decrease since a = 1+r or 1-r, where r is the rate). As with lines, b is the y-coordinate of the y-intercept, which is the point that has x = 0. Intercepts are included in each of the following example tables.

 

The following are some tables of data to illustrate what sets of linear and exponential data look like and how their equations are written:

 

x    y
0    12
1    9
2    6
3    3

 

is a decreasing linear set of data because you are adding -3 each time, so y=-3x+12. (Verify that the points in the table lie on this line by plugging the values in and checking them).

 

x    y
0    20
1    27
2    34
3    41

 

is an increasing linear set of data because you are adding +7 each time, so y=7x+20.

  

x    y
0    50
1    75
2    112.5
3    168.75

 

is an increasing exponential set of data because you are multiplying by 1.5 each time, so

y = 50(1.5)^x.

 

x    y
0    250
1    225
2    202.5
3    182.25

 

is a decreasing exponential set of data because you are multiplying by 0.9 each time, so

y = 250(0.9)^x.
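The repeated-addition versus repeated-multiplication idea is easy to see in Python; this sketch reproduces the first and last tables above:

```python
# Linear: start at the y-intercept and add the slope each step (y = -3x + 12).
linear = [12]
for _ in range(3):
    linear.append(linear[-1] + (-3))
print(linear)

# Exponential: start at the y-intercept and multiply by a each step
# (y = 250 * 0.9**x).
exponential = [250]
for _ in range(3):
    exponential.append(exponential[-1] * 0.9)
print(exponential)
```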

 

Brief practice solving for variables in linear and exponential problems from Algebra:

 

Given the equation of a line y = 5x + 7,

If x = 4 is given, we can solve for y: y = 5 (4) +7 = 27.

If y = 9 is given, we can solve for x. Since 9 = 5x + 7 subtract 7 from both sides to get 2 = 5x. Then divide both sides by 5 to solve for x: 2/5 = x.

 

Given the equation of an exponential y = 12(5)^x,

If x = 3, then we can solve for y: y = 12(5)^3 = 12(125) = 1500.

If y = 24, then we can solve for x, but it involves logarithms to rescue x from being an exponent: 24 = 12(5)^x. Divide both sides by 12 to get 2 = 5^x. If you take the logarithm of both sides of the equation, you get log 2 = log 5^x. Properties of logs give you log 2 = x log 5. So to solve for x, divide both sides by log 5 to get x = log 2 / log 5, which by calculator is about 0.43.
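The same steps can be done in Python (math.log10 is the base-10 log to match the calculator work, though any base works since we take a ratio):

```python
import math

# Solve 24 = 12 * 5**x for x.
# Step 1: divide both sides by 12 to get 2 = 5**x.
# Step 2: take logs of both sides: log 2 = x * log 5.
x = math.log10(2) / math.log10(5)
print(round(x, 2))  # 0.43

# Check by plugging back in: 12 * 5**x should recover y = 24.
print(12 * 5 ** x)
```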

 

 

Homework (due Tuesday 08/30):

1. Treat the following table data as forming a linear scatterplot:

 

x     y
30    5
40    60
55    140

 

a. Sketch the points on a hand-drawn graph (just on binder paper), draw what you think is the line of best fit, and write the equation of your line.

 

b. Fill out a summation table and find the line of best fit using the equations Sxx, Sxy, etc.

 

2. Treat the following table data as forming an exponential scatterplot (not perfectly exponential, but best described by an exponential function). Refer to the example given above in the notes:

 

x      y
0      25.00
1.5    40.20
2      92.50
3.2    167.30
4      294.35

 

a. Take the original (x, y) values in the table and make a new table (x, log y).

That is, find log 25, log 40.20, etc.

 

b. Find the line of best fit for the values in the (x, logy) table using Sxx and Sxy.

Hint: you should get the following summations to plug in (find and check them for yourself):

Σx = 10.7    Σy = 9.66    Σxy = 23.33    Σx² = 32.49

(For Σy, notice that you are not summing the original y values to get 619.35; you sum the logged y values to get the Σy of 9.66!):

 

c. "Unlog" the slope m and the y-intercept b from the best fit line for the (x, logy) data in part b to get the "a" and "b" that are the components of the best fit exponential y = b(a)^x for the original (x, y) data, using a = 10^slope and b = 10^(y-intercept). Does this equation look like it describes the data well? Compare with a graph of the original data.

 

d. Use the equation for the best fit exponential from part c to estimate the value of y

when x is 2.5.

 

e. Use the equation for the best fit exponential from part c to estimate the value of x

when y is 300.

 

 

T 08/23

We worked on linear scatterplots. Some supplementary notes follow since I won't assume you have the book until next week.

 

Reading in the text:

--Read 4.1 objective 1 "Draw and interpret scatter diagrams"

--Read 4.2 example 1 and objective 1 "find the least squares regression line…" and note that I am using the alternate form of the equations in the footnotes.

 

Supplementary notes on lines of best fit:

Given several data points (x,y), you fill out the table below (the data points' coordinates are the x and y values; the symbol Σ means add them up).

For example, for the data points (2,7) and (4,8), we know that the slope of the line through them is

1/2 = 0.5, so that y - 7 = 0.5(x - 2), so that y = 0.5x + 6 is the equation of the line through them, that is, a line with slope 0.5 and y-intercept 6. Now let us use the equations for the line of best fit:

Set up a table with the following quantities and sum them up

 

x      y      xy      x²
2      7      14      4
4      8      32      16
Σx=6   Σy=15  Σxy=46  Σx²=20

 

(n=2 in this short ex. for the 2 data pts given)

x̄ = 6/2 = 3 and

ȳ = 15/2 = 7.5

After having done this with all of the given data points, use all of these numbers to plug into the formulas for the line of best fit (which are below):

Sxx = Σx² - ((Σx)²/n) = 20 - (36/2) = 20 - 18 = 2

Sxy = Σxy - ((Σx)(Σy)/n) = 46 - [(6)(15)/2] = 46 - 45 = 1

Slope of best line = Sxy/Sxx = 1/2 = 0.5

Y-intercept of best line = ȳ - (slope)(x̄) = 7.5 - (0.5)(3) = 7.5 - 1.5 = 6

The best fit line is then y = 0.5x + 6 (which matches the equation found at the beginning exactly, because 2 points make a line, not a scatterplot!).

 

Another example:

For the data pts (1,9), (2,8), (3,6), and (4,3):

 

x       y      xy      x²
1       9      9       1
2       8      16      4
3       6      18      9
4       3      12      16
Σx=10   Σy=26  Σxy=55  Σx²=30

 

Note that n=4 is the number of data points

x̄ = 10/4 = 2.5

ȳ = 26/4 = 6.5

Using the formulas above,

Sxx = 30-(100/4)=30-25=5

Sxy = 55-[(10)(26)/4]=55-65= -10

Slope = -10/5= -2

Y- intercept = 6.5-(-2)(2.5) =6.5+5 =11.5

The best fit line is then y= -2x+11.5
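The whole second example can be reproduced in Python as a check (a direct translation of the Sxx and Sxy formulas above, not a library routine):

```python
points = [(1, 9), (2, 8), (3, 6), (4, 3)]
n = len(points)

sum_x = sum(x for x, y in points)       # 10
sum_y = sum(y for x, y in points)       # 26
sum_xy = sum(x * y for x, y in points)  # 55
sum_x2 = sum(x * x for x, y in points)  # 30

Sxx = sum_x2 - sum_x ** 2 / n           # 30 - 100/4 = 5
Sxy = sum_xy - sum_x * sum_y / n        # 55 - 260/4 = -10

slope = Sxy / Sxx                       # -2
intercept = sum_y / n - slope * (sum_x / n)  # 6.5 + 5 = 11.5

print(slope, intercept)
```

Swapping in any other list of (x, y) points gives the same check for the homework problems.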