ISED 160 NOTES ABOUT HOMEWORK, ANNOUNCEMENTS, ETC.

 

Assigned on:

BE SURE TO CLICK ON RELOAD/REFRESH ON YOUR COMPUTER OR THE CURRENT ADDITIONS TO THE PAGE MAY NOT APPEAR!

You may also not see current pages if your computer does not have an up-to-date browser; download a new version or use a library/lab computer.

Scroll down as new assignments are added to the old. New assignments are generally posted by 2:00 pm of the lecture day unless otherwise noted.

 

 

 

Most current work is listed first, followed by previous entries:

 

T 5/14

Review:

z-tests use row z on the t-table:

 

Classical method for z tests:

If you have a one-sided z test with alpha = 0.01, n = 35 and sample z = 2.45, match the bottom row (the one labeled z) with the 0.01 column to find the critical z value of 2.326. Since 2.45 > 2.326, reject Ho.

If it had been a two-sided test, take half of the alpha (0.01/2 = 0.005) to put in each tail and look at the 0.005 column in row z to find 2.576. Since 2.45 < 2.576, accept Ho.

 

P-value method for z tests:

If you again have the one-sided z test with alpha = 0.01, n = 35 and sample z = 2.45, go to the bottom row z of the t table and find that 2.326 < 2.45 < 2.576, so 0.005 < p < 0.01. Since the p < alpha of 0.01, reject Ho.

If it had been a two-sided test, you would have 0.01 < p < 0.02. Then the p > alpha, so accept Ho.
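For anyone who wants to double-check these table lookups, here is a quick sketch (not part of the course materials) using Python's standard library in place of the printed table; the exact p-values it produces fall inside the estimated ranges above.

```python
# Quick check of the z-test numbers above using the standard library's
# normal distribution instead of the printed table.
from statistics import NormalDist

z_dist = NormalDist()
sample_z, alpha = 2.45, 0.01

# Classical method: critical values from the tail area(s)
crit_one = z_dist.inv_cdf(1 - alpha)      # one-sided: about 2.326
crit_two = z_dist.inv_cdf(1 - alpha / 2)  # two-sided: about 2.576

# P-value method
p_one = 1 - z_dist.cdf(sample_z)          # one-sided p, about 0.0071
p_two = 2 * p_one                         # two-sided p, about 0.0143

print(sample_z > crit_one, p_one < alpha)  # both True: reject Ho (one-sided)
print(sample_z > crit_two, p_two < alpha)  # both False: accept Ho (two-sided)
```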

 

t-tests use df = n-1 row on the t-table:

 

Classical method for t tests:

If you have a one-sided t test with alpha = 0.01, n = 35 and sample t = 2.45, go to row df = n-1 = 34 and the 0.01 column to find the critical value of 2.441. Since 2.45 > 2.441, reject Ho.

If you have a two-sided test instead, you would go to the same row, but to the 0.005 column to find the critical values of + and - 2.728. Since 2.45 < 2.728, accept Ho.

 

P-value method for t tests:

If you have a one-sided t test with alpha = 0.01, n = 35 and sample t = 2.45, you look for 2.45 in row 34, and can only find the two closest values to estimate it: 2.441 < 2.45 < 2.728, so 0.005 < p < 0.01 and p < alpha. Reject Ho.

If the test had been two-sided, double the values to find 0.01 < p < 0.02, so p > alpha. Accept Ho.
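The bracketing idea above can be sketched in a few lines of Python. This is only an illustration: the two row-34 entries are the ones quoted in these notes, not the full table row.

```python
# Bracketing a sample t between table entries to estimate p.
# Only the two row-34 entries quoted in the notes are included here;
# a full version would carry the whole table row.
row_34 = [(0.01, 2.441), (0.005, 2.728)]  # (upper-tail area, critical t)

sample_t = 2.45
(a_lo, t_lo), (a_hi, t_hi) = row_34
assert t_lo < sample_t < t_hi             # 2.441 < 2.45 < 2.728

one_sided = (a_hi, a_lo)                  # 0.005 < p < 0.01
two_sided = (2 * a_hi, 2 * a_lo)          # 0.01 < p < 0.02
print(one_sided, two_sided)
```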

 

Test #5 will occur as scheduled on Thursday 5/16 and will consist of one page of short answer questions (from 10.1, 10.2, 10.3, 11.1 about use of the z and t tables to accept/reject hypotheses with classical/p-methods, and forming of hypotheses and sentence writing), and one page with 2 word problems (z test, t test) to perform complete significance tests.

Some short-answer questions for practice:

Example: Find the critical value in a one-sided z test with n = 45, sample z = -2.59 and alpha = 0.01.

Answer: -2.326 (the negative sample z puts all of alpha in the left tail).

Example: a. What is the critical value for a two-sided t-test with n = 33, sample t = 3.02 and alpha of 0.01, and b. do you reject or accept the null hypothesis?

Answer: a. With row 32 and upper tail of 0.005, since half of alpha goes in each tail, the critical values are + and - 2.738, so b. since 3.02 > 2.738, we would reject the null hypothesis.

Example: Estimate the p value for a one-sided z-test, n = 23, sample z= 1.15 and alpha = 0.10.

Answer: Since this is a z value problem, you go down to the bottom row of the table and see that 1.036 < 1.15 < 1.282 so 0.10 < p < 0.15.

Example: In the previous example, would you accept or reject the null hypothesis?

Answer: p > alpha, so accept.

Example: Estimate for the p value for a two-sided t-test with n=31 and sample t= 3.75.

Answer: In row 30, our t is off the table to the right, so we know that the p value is smaller than twice the 0.0005 that is above the last table entry, i.e., p < 0.001.

Example: In the previous example, would you accept or reject the null hypothesis?

Answer: the p value is rare, so we would reject the null hypothesis no matter what alpha is given.

Example: Estimate the p value for a two-sided t-test, n=24, sample t=-2.48, alpha is 0.02.

Answer: In row 23, it puts us between columns 0.01 and 0.02 which we must double because it is two-sided, so 0.02 < p < 0.04. Accept the null hypothesis since p > alpha.

Example: Write the hypotheses and sentence of conclusion only for the following situation: The average score on the SAT Math exam is 505. A test preparatory company claims that the mean scores of students who take their course is higher than 505. Suppose we reject the null hypothesis.

Answer: Ho: M = 505  H1: M > 505. The company has evidence that students who take their course will on average have a higher score than the 505 of all students who take the SAT Math exam.

 

Note:

     Test 5 will not be a dropped grade. Students who take Test 5 as scheduled will be done with the course (no comprehensive final). The final is offered only to give those missing Test 5 a chance to avoid a zero score for that test. It will not be offered as a device to raise grades from the rest of the semester.

     If you miss test #5, you must email me before noon Friday 5/17 and respond to my follow-up emails that day so that we will both know what to expect the following week (finals week) and you will then perform a more difficult comprehensive test during finals week to take the place of test 5 only. If you do not contact me by noon Friday 5/17 or fail to show up for your new test appointment during finals week, you will receive a zero score for test 5.

     Grades: I will post the Test#5 grades on my website under the codes I gave you, hopefully within a week of the test, but I will not post final grades at that site. I will submit final grades to the school sometime before the deadline and you may view yours then at the SF State Gateway (MySFSU).

 

Th 5/9

LECTURE/READING: (Read section 11.1)

We took a look at 11.1 p522 Ex. 2. The matched pairs problems are still t tests as in 10.3, but allow you to compare two sets of data by looking at differences between the data.

For H0 assume there is no difference between the data: Ho: m = 0.

For H1 there is a difference of some kind, either m < 0, m > 0, or m is not equal to 0 depending on the phrasing of the word problem and the order in which the differences are found.

 

Matched pairs word problem example: An agricultural field trial compares the yield of two varieties of tomatoes for commercial use. The researchers divide in half each of 11 small plots of land in different locations (half gets variety A and half gets variety B) and compare the yields in pounds per plant at each location. The 11 differences (variety A minus variety B) give an average of 0.54 and std. deviation of 0.83. Is there evidence at the 0.05 level of significance that variety A has a higher yield than variety B? (Assume differences computed by A yield minus B yield).

Answer:

Ho: m = 0

If we want to show that plot A tomato yields are larger, H1 depends on how the differences are computed. If we subtract the yields in order of plot A – plot B, then

H1: m > 0 because larger numbers minus smaller numbers will give a positive number (> 0).

But if we subtract the yields in order of plot B – plot A, then

H1: m < 0 because smaller numbers minus larger numbers will give a negative number (< 0).

So in this case, H1: m > 0

We compute sample t=2.16. Using row n-1=10 of the t table with right tail area of 0.05 the critical value is 1.812 and the estimate for the p value is 0.025 < p < 0.05 since 1.812 < 2.16 < 2.228. Either way, we reject the null hypothesis. We have found evidence that variety A has a higher yield than variety B.
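As a check, the sample t in the tomato example can be recomputed in Python (a sketch using the summary numbers from the problem):

```python
# Matched-pairs t statistic for the tomato example:
# t = (mean of differences) / (s / sqrt(n))
import math

n, d_bar, s = 11, 0.54, 0.83              # differences are A yield minus B yield
t = d_bar / (s / math.sqrt(n))
print(round(t, 2))                        # 2.16, as in the notes
```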

 

HOMEWORK due Tuesday 05/14 (for each matched pairs t-test below, write hypotheses, calculations, decisions and sentences of conclusion):

1. 11.1 p516 #14c only, given n=6, the mean of the differences (blue-red) is 0.093, s = 0.17.

2. 11.1 p519 #20 given differences are computed by "Thrifty minus Hertz" and n = 10, the mean of the differences is -0.259 and s = 9.20.

3. The design of controls and instruments affects how easily people can use them. A student project investigated this effect by asking 25 right-handed students to turn a clockwise screw handle (favorable to right-handers) with their right hands and then turn a counterclockwise screw handle (favorable to left-handers) again with their right hands. The times it took for each handle were measured in seconds, and the 25 differences (clockwise minus counterclockwise) gave an average of -13.32 seconds with std. deviation 22.94. Is there evidence that right-handed people find the clockwise screw handle easier to use? Test at the 0.01 level.

 

Test #5 is a mandatory part of your grade that will occur as scheduled on Thursday 5/16 and is the last graded activity of the semester (if you miss it, you will be offered the opportunity to take a final exam to replace it – this is the only instance in which the final will be used). It will consist of one page of short answer questions (from 10.1, 10.2, 10.3, 11.1 about use of the t table for both z and t tests in accepting and rejecting hypotheses with classical and p-value methods, and forming of hypotheses and sentence writing), and one page with 2 word problems (z test, t test) to perform a complete significance test.

 

T 5/7

The methods of 10.3 are the same as those from 10.2 with one exception. Since you must rely completely on your sample for the std. deviation (instead of having the population std. deviation from previous studies) you must use a different row than the z row at the bottom of the table.

To use the new part of the t table, you take one less than the sample size, df=n-1.

 

EXAMPLES using the t-table:

 

1. What is the critical value for a one-sided test with n=20 and alpha =0.05?

ANSWER: df=20-1=19 and that row with the column of 0.05 gives a critical value of 1.729

 

2. What would the critical value be for the above situation if it were two-sided?

ANSWER: In the same row df=19, you would look at the column with area 0.025, since half of the alpha of 0.05 goes into each tail, and this would give you a critical value of 2.093.

 

3. Find an estimate for the p value for a one-sided test with n=8 and sample t value of 1.15.

ANSWER: df=8-1=7 and in that row, 1.119 < 1.15 < 1.415 so p value is between 0.10 and 0.15.

 

4. Find an estimate for the p value in a one-sided test with n=33 and sample t=0.52.

ANSWER: df=33-1=32, so we look in that row on the new t table to find the next higher and lower numbers with respect to 0.52. But since 0.52 < 0.682 the p value then is greater than the area of 0.25 for the t value of 0.682. That is, p > 0.25.

 

5. Find an estimate for the p value for a two-sided test with n=25 and sample t value of 1.52.

ANSWER: df=25-1=24 and in that row, 1.318 < 1.52 < 1.711 so the right or left tail area for the p value is between 0.10 and 0.05, but we have a two-sided test so we double the areas to get the sum of the left and right tail areas: 0.10 < p < 0.20.

 

6. Perform a complete t test: To find out if it seems reasonable that the local town library is lending an average of 4.2 books per patron, a random sample of 13 people was taken and yielded an average of 4.75 with std. deviation of 1.65 books. Test at the 0.10 level.

Answer: The alternate hypothesis is that m is not equal to 4.2 (two-sided), alpha is 0.10, and we compute sample t = 1.20.

Classical method: Using row n-1 = 12 of the t table with tail areas of 0.05, the critical values are + and - 1.782, and 1.20 is closer to center.

p-value method: estimate for p is 0.20<p<0.30 (double tail areas) since 1.083<1.20<1.356.

Accept Ho. We have not found any evidence that the library is lending a different avg. number of books than 4.2 per person.
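A quick Python check of the sample t in the library example (sketch only; numbers come from the problem above):

```python
# Sample t for the library example: t = (x_bar - claimed mean) / (s / sqrt(n))
import math

claimed, x_bar, s, n = 4.2, 4.75, 1.65, 13
t = (x_bar - claimed) / (s / math.sqrt(n))
print(round(t, 2))                        # 1.2
```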

 

HOMEWORK DUE Thursday 5/9

10.3 p487 #6, 10, 12, 16, 24 (Skip a. For b use pop mean 7, sample mean 7.01 and s = 0.0316)

 

 

Th 5/2

LECTURE – We will need the following table for section 10.3, and since it has info for the material from 10.2 included on it, it is the only table we will use going forward:

 


 

Note that the top row and the bottom row have the numbers you were using in the abbreviated table for looking up critical values for z tests.

0.25    0.20    0.15    0.10    0.05    0.025   0.02    0.01    0.005   0.0025  0.001   0.0005

0.674   0.841   1.036   1.282   1.645   1.960   2.054   2.326   2.576   2.807   3.091   3.291

 

For right now, we will just look at those two rows. Use the table symmetrically so that it works for negative z values with areas in the left tail. To perform the p value method, the best we can do is to make estimates.

 

EXAMPLES using the new table:

 

1. What is the critical value for a one-sided z test with alpha =0.05?

ANSWER: 1.645

 

2. What would the critical value be for the above situation if it were two-sided?

ANSWER: With tail area 0.025, since half of the alpha of 0.05 goes into each tail, this would give you critical values of + or - 1.960.

 

3. Find an estimate for the p value for a one-sided z test with sample z value of 1.15.

ANSWER: In row z, 1.036 < 1.15 < 1.282 so p value is between 0.10 and 0.15.

 

4. Find an estimate for the p value in a one-sided z test with sample z = 0.52.

ANSWER: In row z, since 0.52 < 0.674, p > 0.25. If it had been a two-sided test, p > 2(0.25), so p > 0.50. In either case, those values are common, so one would accept Ho!

 

5. Find an estimate for the p value for a two-sided test with sample z value of 1.52.

ANSWER: 1.282 < 1.52 < 1.645 so the right or left tail area for the p value is between 0.05 and 0.10, but we have a two-sided test so we double the areas to get the sum of the left and right tail areas: 0.10 < p < 0.20.
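The estimating procedure in examples 3-5 can be sketched as a small Python function. The helper name estimate_p is made up for illustration; the (area, z) pairs are the twelve table columns shown above.

```python
# Estimate p by bracketing |z| in the bottom (z) row of the table.
# estimate_p is a made-up helper name; the pairs are the table columns.
z_row = [(0.25, 0.674), (0.20, 0.841), (0.15, 1.036), (0.10, 1.282),
         (0.05, 1.645), (0.025, 1.960), (0.02, 2.054), (0.01, 2.326),
         (0.005, 2.576), (0.0025, 2.807), (0.001, 3.091), (0.0005, 3.291)]

def estimate_p(sample_z, two_sided=False):
    """Return (low, high) bounds on the p value, or None if off the table."""
    z = abs(sample_z)
    for (a_left, z_left), (a_right, z_right) in zip(z_row, z_row[1:]):
        if z_left < z < z_right:
            lo, hi = a_right, a_left      # larger z means smaller tail area
            return (2 * lo, 2 * hi) if two_sided else (lo, hi)
    return None

print(estimate_p(1.15))                   # 0.10 < p < 0.15, as in example 3
print(estimate_p(1.52, two_sided=True))   # 0.10 < p < 0.20, as in example 5
```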

 

Homework due Tuesday 5/07:

1. a. Find a p value estimate using the new table for a 1-sided z test with sample z =2.45.

    b. If alpha (level of significance) is 0.02, would you accept or reject Ho?

2. a. Find a p value estimate using the new table for a 1-sided z test with sample z =3.50.

    b. Without need of an alpha, would you accept or reject Ho?

3. a. Find a p value estimate using the new table for a 2-sided z test with sample z =1.83.

    b. If alpha (level of significance) is 0.05, would you accept or reject Ho?

4. a. Find a p value estimate using the new table for a 2-sided z test with sample z =0.46.

    b. Without need of an alpha, would you accept or reject Ho?

(For each of the following word problems, perform a complete significance test)

5. section 10.2 p477 #22 (skip part a)

6. section 10.2 p478 #26 (skip part a, use 6.3 for the sample mean)

7. section 10.2 p478 #28 (skip part a)

 

T 4/30

NOTES FOR SECTION 10.1:

How to state the hypotheses from word problems and write the sentence of conclusion.

HYPOTHESES:

H0 is what is accepted as true for the population mean until evidence to the contrary is found.

H1 is what the investigator or researcher is trying to show.

CONCLUSION:

You must state what you have found from the sample's evidence, or lack thereof. Write a grammatically complete sentence with the following elements: Tell

1. if you have "found evidence" or "not found evidence" against the null hypothesis,

2. about what (what was the subject of the investigation?),

3. with respect to what number (what was the number in question in the hypotheses?).

 

If you are rejecting H0 you have found evidence against H0 and therefore evidence for H1.

If you are accepting H0, you have not found evidence against H0 and therefore have not found evidence to back up your claim H1. Write sentences from the H1 standpoint.

SECTION 10.2 Word problems:

Do what was in 10.1, but throw the LEVEL of significance, CALCULATION of sample z, and the DECISION to accept/reject Ho into the middle!

 

10.1 Example

A Muni bus drives a prescribed route and the supervisor wants to know whether the average run arrival time for buses on this route is about every 28 minutes. Suppose that after we calculate the sample z value the data causes the supervisor to accept H0. Write the hypotheses and the sentence of conclusion.

Answer:

H0: M = 28

H1 : M is not equal to 28

The supervisor has found no evidence that the average run arrival time for buses on this route is significantly different from 28 minutes.

 

10.1 Example:

A manufacturer produces a paint which takes 20 minutes to dry. He wants to make changes in the composition to get nicer colors, but not if it increases the drying time needed. Suppose that after he calculates the sample z value the data causes him to reject H0. Write the hypotheses and the sentence of conclusion.

Answer:

H0 : M = 20

H1 : M > 20

The manufacturer has found evidence that the composition change significantly increases the drying time, so he will not make a change. (Notice that he is using the test to pull him away from a bad decision).

 

10.2 Example (whole test):

According to the Highway Administration, the mean number of miles driven annually in 1990 was 10,300. Bob believes that people are driving more today than in 1990 and obtains a simple random sample of 20 people and finds an average of 12,342 miles. Assuming a std. deviation of 3500 miles, test Bob's claim at the 0.01 level of significance.

Hypotheses:

Ho: population mean = 10300

H1: population mean > 10300

Level of Significance:

 alpha =  0.01

Data and calculations:

z = (12342 - 10300)/(3500/sqrt(20)) = 2.61

Decision:

Classical: alpha of 0.01 gives a critical z value of 2.326 so reject Ho.

P-value: on the z table, 2.61 gives p = 0.0045 so p < alpha. Reject Ho.

Conclusion:

 Bob has found significant evidence that people are driving more today than in 1990, when they drove an average of 10,300 miles.
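A sketch of Bob's calculation and both decisions in Python; the exact p it computes (about 0.0045) matches the z-table value used above.

```python
# Bob's test: calculation plus both decision methods.
import math
from statistics import NormalDist

claimed, x_bar, sigma, n, alpha = 10300, 12342, 3500, 20, 0.01
z = (x_bar - claimed) / (sigma / math.sqrt(n))
p = 1 - NormalDist().cdf(z)               # one-sided: H1 is mean > 10300

print(round(z, 2))                        # 2.61
print(z > 2.326, p < alpha)               # both True: reject Ho either way
```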

 

10.2 Example (whole test):

Before shopping for a used Corvette, Grant wants to determine what he should expect to pay. The blue book average is $37,500. Grant thinks the price is different in his area, so he visits 15 area dealers online and finds an average price of $38,246.90. Assuming a population std. deviation of $4100, test his claim at the 0.10 level of significance.

Hypotheses:

Ho: population mean = 37500 

H1: population mean not equal to 37500

Level of Significance:

alpha =  0.10

Data and calculations:

z = (38246.90 - 37500)/(4100/sqrt(15)) = 0.71

Decision:

Classical: For 0.05 of alpha going in each tail, we find critical z values of +/-1.645. Accept Ho.

P-value: For z of 0.71 on the z table p = 2(0.2389) = 0.4778. Since p > alpha, accept Ho.

Conclusion:

Grant does not have any evidence that the mean price of a used Corvette is different from $37,500 in his area.

 

HOMEWORK (due Thursday 5/2):

1. 10.1 p461/462  do pair of problems #16 (state hypotheses) and 24 (write sentence),

2. 10.1 p461/462  do pair of problems #18 (state hypotheses) and 26 (write sentence),

3. 10.1 p461/462  do pair of problems #20 (state hypotheses) and 28 (write sentence),

4. Do complete test (hypotheses, level, calculation, decision, sentence): A researcher believes that the average height of a woman aged 20 years or older is greater now than the 1994 mean of 63.7 inches. She obtains a sample of 45 woman and finds the sample mean to be 63.9 inches. Assume a population std. deviation of 3.5 inches and test at the 0.05 level.

5. Do complete test (hypotheses, level, calculation, decision, sentence): The average daily volume of Dell computer stock in 2000 was 31.8 million shares. A trader wants to know if the volume has changed and takes a random sample of 35 trading days and the mean is found to be 23.5 million shares. Using a population std. deviation of 14.8 million, test at the 0.01 level of significance.

 

Th 4/25

Keep working; only 3 weeks of regular session left!

LECTURE NOTES:

Last time, we went over the hmk problems but added the knowledge of how to calculate the sample z value and how to perform the significance test using p-values. A z calculation is on p464 and p-value tests are performed in ex3 p470 and ex4 p472.

The p area is the probability that you would get a value as far away or farther away from the mean as the sample value you got. If p < alpha, reject Ho, and if p > alpha, accept Ho.

You will need to refer to your old z-table to find the p-values.

Abbreviated table of critical values for reference with the classical approach:

 

0.25    0.20    0.15    0.10    0.05    0.025   0.02    0.01    0.005   0.0025  0.001   0.0005

0.674   0.841   1.036   1.282   1.645   1.960   2.054   2.326   2.576   2.807   3.091   3.291

 

ONE-SIDED EXAMPLE:

H0: m = 45 , H1: m < 45 , alpha = 0.05

sample data: n = 24,  sample mean = 40.8,  population std dev. = 10.5

z = (40.8 - 45)/(10.5/sqrt(24)) = -1.96

Classical The critical value for 0.05 in the left tail is -1.645, and -1.96 is farther out, so reject H0.

P-value For z = -1.96 the tail area is 0.0250 using the z table. Since p < alpha (0.0250 < 0.05), we reject H0: the sample is more rare in probability of occurrence than the alpha.

 

ONE-SIDED EXAMPLE:

H0: m = 0.045 and H1: m > 0.045 and alpha = 0.005

sample data: n = 17,  sample mean = 0.0455,  population std dev. = 0.001

z = (0.0455 - 0.045)/(0.001/sqrt(17)) = 2.06

Classical The critical value for 0.005 is 2.576 and 2.06 is closer to center, so accept H0.

P-value The area to the right of 2.06 is 0.0197, so the p value is 0.0197 > 0.005 (p > alpha); accept H0 since the sample is not as rare in probability of occurrence as the alpha.

 

TWO-SIDED EXAMPLE:

H0: m = 35 , H1: m is not 35 , alpha = 0.05

sample data: n = 40,  sample mean = 37.63,  population std dev. = 7.4

z = (37.63 - 35)/(7.4/sqrt(40)) = 2.25

Classical Each tail gets alpha divided by 2 = 0.025  which has a critical z of 1.96. The sample z of 2.25 is farther away from the mean than 1.96 so reject H0.

P-value For a sample z of 2.25, the tail area would be 0.0122 using your old z table. For a two-sided test, each tail has 0.0122, so the p value is 0.0122 + 0.0122 = 0.0244. Since 0.0244 < 0.05, we reject H0.

 

MORE EXAMPLES of p value method only:

1. Ho: m = 11 and H1: m not equal to 11 and alpha = 0.01, sample z = 2.67

The area from the z table for z = 2.67 is 0.0038 and since we are using a two-sided test, we add the areas for +/- 2.67: p = 2(0.0038) = 0.0076, so p < alpha. Reject Ho.

 

2. Ho: m = 265 and H1: m < 265 and alpha = 0.01, sample z = -2.25

The area from the z table for z = -2.25 is p = 0.0122, so p > alpha. Accept Ho.

 

3. Ho: m = 35 and H1: m > 35 and alpha = 0.05, sample z = 2.23

The area from the z table for z = 2.23 is p = 0.0129, so p < alpha. Reject Ho.

 

4. Ho: m = 1.23 and H1: m not equal to 1.23 and alpha = 0.02, sample z = -2.45

The area from the z table for z = 2.45 is 0.0071 and since we are using a two-sided test, we add the areas for +/- 2.45: p = 2(0.0071) = 0.0142, so p < alpha. Reject Ho.

 

5. Ho: m = 0.045 and H1: m > 0.045 and alpha = 0.005, sample z = 2.06

The area from the z table for z = 2.06 is p = 0.0197, so p > alpha. Accept Ho.

 

6. Ho: m = 4500 and H1: m < 4500 and alpha = 0.025, sample z = -1.83

The area from the z table for z = -1.83 is p = 0.0336, so p > alpha. Accept Ho.
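All six table areas above can be reproduced with Python's standard normal distribution; this sketch just loops over the examples and compares against the table values quoted in the notes.

```python
# Reproducing the six one-tail areas above with the standard normal cdf.
from statistics import NormalDist

norm = NormalDist()
cases = [(2.67, 0.0038), (-2.25, 0.0122), (2.23, 0.0129),
         (-2.45, 0.0071), (2.06, 0.0197), (-1.83, 0.0336)]

for z, table_area in cases:
    tail = 1 - norm.cdf(abs(z))           # area beyond z in one tail
    print(z, round(tail, 4), table_area)  # computed vs table value
```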

 

HOMEWORK (due Tuesday 4/30):

(You did #3-7 below in the last hmk using the classical method, now use the p-value method on them and compare the answers --  you should have the same conclusions!).

1. Compute z with n = 35,  sample mean = 36.2,  pop. mean = 30,  pop. std dev. = 12.9

2. Compute z with n = 2500,  sample mean = 24.9,  pop. mean = 25.3,  pop. std dev. = 8.4.

3. redo 10.2 p476 #12. given sample z = 1.92, find the p-value and accept or reject Ho.

4. redo 10.2 p476 #13. given sample z = 3.29, find the p-value and accept or reject Ho.

5. redo 10.2 p476 #14. given sample z = –1.32, find the p-value and accept or reject Ho.

6. redo 10.2 p476 #16. given sample z = 1.20, find the p-value and accept or reject Ho.

7. redo 10.2 p476 #18. given sample z = 2.61, find the p-value and accept or reject Ho.

 

T 4/23

No new hmk. Study for test as in notes below.

Th 4/18

LECTURE:

Ch 10 Significance tests (10.1 and 10.2):

I gave a handout with the following abbreviated table of critical values, so you do not have to look them up on the table backwards each time you want to do a problem. The top row represents the area in either the left or right tail of the distribution, and the bottom row represents the positive or negative critical value. Refer to it as you look at the example problems below:

 

0.25    0.20    0.15    0.10    0.05    0.025   0.02    0.01    0.005   0.0025  0.001   0.0005

0.674   0.841   1.036   1.282   1.645   1.960   2.054   2.326   2.576   2.807   3.091   3.291

 

One-sided significance test example: If the null hypothesis is that the mean of a population is 45 and the alternate hypothesis is that it is less than 45, we have a one-sided alternate hypothesis: you only care if it is less than 45 and you don't care if it is greater.

Ho: m = 45

H1: m < 45

If a level of significance (alpha) is given as alpha = 0.02

and you take a sample and standardize it to get z= -1.90,

does it give evidence to reject the null hypothesis and therefore accept the alternate hypothesis?

Classical approach The critical value comes from the alpha value. Since we only care about values that stray too far below what is claimed to be the center, we put all of alpha into the left tail. The critical z value with 0.02 area to its left is -2.054 from the table above. Since -1.90 is closer to center, we consider it a routine sample (one that would happen 98% of the time) so there is nothing strange about the center being where it is claimed to be. We accept the null hypothesis.

 

Two-sided significance test example: If the null hypothesis is that the mean of a population is 35 and the alternate hypothesis is that it is not 35 (within a certain amount of acceptable error), we have a two-sided alternate hypothesis: we care if m is significantly higher or lower than 35.

Ho: m = 35

H1: m is not equal to 35

If a level of significance (alpha) is given as alpha = 0.05

and you take a sample and standardize it to get z=2.25,

does it give evidence to reject the null hypothesis and therefore accept the alternate hypothesis?

Classical approach The critical values come from the alpha value. Since we have a two-sided H1, alpha is divided by 2 to get 0.025  (this is how much goes in each tail of the distribution) and on the table above, you see a critical z of 1.96. Compare the sample z to the critical value of z. Since the sample z of 2.25 is farther away from the mean than the critical value of 1.96, we have evidence to reject the null hypothesis.

 

MORE EXAMPLES:

1. Ho: m = 11 and H1: m not equal to 11 and alpha = 0.01, sample z = 2.67

Half of alpha, 0.005, goes into each tail since the alternate hypothesis is two-sided. The critical values for 0.005 are + or - 2.576 and 2.67 is farther away from center than this, so reject Ho.

 

2. Ho: m = 265 and H1: m < 265 and alpha = 0.01, sample z = -2.25

All of alpha, 0.01, goes into the left tail of the distribution since the alternate hypothesis only pertains to values < 265. The critical value for 0.01 is -2.326 and -2.25 is closer to center than this, so accept Ho.

 

3. Ho: m = 35 and H1: m > 35 and alpha = 0.05, sample z = 2.23

All of alpha, 0.05, goes into the right tail of the distribution since the alternate hypothesis only pertains to values > 35. The critical value for 0.05 is 1.645 and 2.23 is farther from center than this, so reject Ho.

 

4. Ho: m = 1.23 and H1: m not equal to 1.23 and alpha = 0.02, sample z = -2.45

Half of alpha, 0.01, is put into each of the left and right tails of the distribution since the alternate hypothesis pertains to values not equal to 1.23, that is, both greater than and less than 1.23. The critical values are + or - 2.326 and -2.45 is farther away from center than this, so reject Ho.

 

5. Ho: m = 0.045 and H1: m > 0.045 and alpha = 0.005, sample z = 2.06

All of alpha, 0.005, is put into the right tail of the distribution. The critical value for 0.005 is 2.576 and 2.06 is closer to center than this, so accept Ho.

 

6. Ho: m = 4500 and H1: m < 4500 and alpha = 0.025, sample z = -1.83

All of alpha is put into the left tail of the distribution due to the alternate hypothesis. The critical value for 0.025 is -1.96 and -1.83 is closer to center than this, so accept Ho.
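The classical rule used in all six examples can be sketched as a small Python function. The function name and the shortcut of comparing |z| to the positive critical value (which assumes the sample z falls on the side the alternate hypothesis points to, as it does in every example above) are mine, not the course's.

```python
# Classical decision rule from the examples above. The dict maps a tail
# area to its positive critical z (the abbreviated table).
CRITICAL = {0.25: 0.674, 0.20: 0.841, 0.15: 1.036, 0.10: 1.282,
            0.05: 1.645, 0.025: 1.960, 0.02: 2.054, 0.01: 2.326,
            0.005: 2.576, 0.0025: 2.807, 0.001: 3.091, 0.0005: 3.291}

def classical_decision(sample_z, alpha, two_sided):
    # Two-sided: half of alpha in each tail. One-sided: all of alpha in one tail.
    tail = alpha / 2 if two_sided else alpha
    return "reject Ho" if abs(sample_z) > CRITICAL[tail] else "accept Ho"

print(classical_decision(2.67, 0.01, True))    # reject Ho (example 1)
print(classical_decision(-2.25, 0.01, False))  # accept Ho (example 2)
print(classical_decision(-1.83, 0.025, False)) # accept Ho (example 6)
```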

 

HOMEWORK due Tuesday 4/23: 

10.2 p476 (#15-18 in the book use p-values, but for right now, treat them with critical values as below).

Draw distributions for each with relevant z values and areas (critical values from table above):

12. given part a. sample z = 1.92, do parts b, c, d

13. given part a. sample z = 3.29, do parts b, c, d

14. given part a. sample z = –1.32, do parts b, c, d

16. given sample z = 1.20, do part b using critical values

18. given sample z = 2.61, do part b using critical values

 

TEST #4 FORMAT for Thursday 4/25. Given: the z table, and formulas for the population z value, confidence intervals, error, and sample size, and the critical values for 90/95/99%.

--About 2 word problems with parts, like 7.3 p354 #17-24, 28-29 and hmks due 4/9, 4/11.

--Show how to find the z critical values for a given confidence level using the z table backwards as in 7.2 p347 #23-26 and 9.1 p416 #13-16.

--3 or 4 short answer questions on sampling distributions and the effect of changes to sample size and confidence on error and confidence intervals, as in #7 from hmk due 4/16 and today's quiz.

--At least one each of word problems (some with follow-up parts) dealing with confidence intervals, error and sample size (not necessarily in that order), as in 9.1 p416 #21-24, 43-48, and hmks due 4/16, 4/18.

--A few situations like today's work to accept or reject hypotheses in significance tests.

 

T 4/16

LECTURE:

We worked on word problems in class involving confidence intervals, error, and sample size.

 

READING:

To practice finding critical values, see 7.2 ex8 p343 and p347 #23-26.

For background on what the sampling distribution is and how we use it, see 8.1 p377-388.

For examples of forming confidence intervals, finding the error in using a sample mean to approximate the population mean, and how to select a sample size for a fixed error, see 9.1 p405-415.

 

HOMEWORK due Thursday 4/18: 

In your work below, round error and confidence interval values to 2 decimal places. Do not round square root values in the middle of a calculation. As stated on p414, always round sample size up to the next whole number. For example, in ex. 7 p415, 43.2964 is rounded to 44.
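A sketch of the round-up rule in Python. The sample_size helper and its inputs are made up for illustration; only the ceiling step mirrors the book's rule (ex. 7 p415 rounds 43.2964 up to 44).

```python
# Always round the sample size up to the next whole number.
# sample_size and its inputs are made up for illustration;
# the ceiling step is the rule from p414.
import math

def sample_size(z, sigma, error):
    return math.ceil((z * sigma / error) ** 2)

print(math.ceil(43.2964))                 # 44, as in ex. 7 p415
print(sample_size(1.96, 1.92, 0.2))       # made-up inputs
```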

1. 9.1 p416 #14 (do as in 7.2 p347 #23-26)

2. 9.1 p416 #16 (this was done in 7.2 p347 #23 also, so you have the answer from there!)

3. 9.1 p416 #24 (use the z values you found in 9.1 #14, and for 85%, "z answer" is in 9.1 #15!).

4. 9.1 p420 # 44

5. Supplemental problem: A random sample of 300 telephone calls made to the office of a large corporation is timed and reveals that the average call is 6.48 minutes long. Assume a std. deviation of 1.92 minutes can be used. If 6.48 minutes is used as an estimate of the true average length of telephone calls made to the office,

a. What is the maximum error in the estimate of the mean using 99% confidence?

b. What is the maximum error in the estimate of the mean using 90% confidence?

6. Supplemental problem: A large hospital finds that in 50 randomly selected days it had, on average, 96.4 patient admissions per day. From previous studies a population std. deviation of 12.2 admissions can be used. Using a 90% confidence level,

a. How large a sample of days must we choose in order to ensure that our estimate of the actual daily number of hospital admissions is off by no more than five admissions per day?

b. How large a sample of days must we choose to have one-fourth of the error in part a?

 

NEXT LECTURE: We will start ch. 10 next time, and this chapter can be a difficult read. We will look at the significance test in pieces before putting it all together, and will start with how to make a decision, as outlined on p467 and p470. Bring today's handout on Thursday!

 

Th 4/11

LECTURE:

Section 8.1: the sampling distribution lets us work with a normal distribution of all possible sample means from a population even if the original population is not normal (use n of 30 or greater in this case).

Section 9.1: constructing confidence intervals looking at ex. 3 p411 using the material from p410/411. The formula for the lower and upper bounds of the confidence interval for the population mean can be written as on p410:

sample mean – z(std. deviation/squareroot n) < m < sample mean + z(std. deviation/squareroot n)

The z values above are the "critical values" that come from putting the given confidence level as an area in the middle of the distribution. This was done in 7.2 p347 #23-26:

#23 For confidence level of 80%, the critical z values are +/- 1.28

#24 For confidence level of 70%, the critical z values are +/- 1.04

#25 For confidence level of 99%, the critical z values are +/- 2.58

#26 For confidence level of 94%, the critical z values are +/- 1.88

The most common critical z values are given in the table on p410:

For confidence level of 90%, we will just use the first of the closest table values +/-1.64.

For confidence level of 95%, the critical z values are +/- 1.96

For confidence level of 99%, we will just use the first of the closest table values +/-2.57.

 

exampleS:

1. Speeding: In example 3 p411/412 a 90% confidence interval (note we use 1.64 from above) for the whole population of car speeds is:

sample mean – error < m < sample mean + error

59.62 – 1.64(8/squareroot12) < m < 59.62 + 1.64(8/squareroot12)

59.62 – 3.79 < m < 59.62 + 3.79

55.83 < m < 63.41

If we are willing to accept a lower confidence level, like 70%, (note we use 1.04 from above):

59.62 – 1.04(8/squareroot12) < m < 59.62 + 1.04(8/squareroot12)

59.62 – 2.40 < m < 59.62 + 2.40

57.22 < m < 62.02

Tighter interval around m, but valid in only 70% of all samples. You would need to know why the estimate was needed in the first place, or be working in the particular field of study to know whether the trade-off is worth it. Here, residents wanted evidence that cars were speeding regularly, and the first interval showed this (the speed limit for the area was 45mph), so there is no need in this case to settle for less confidence to get a tighter interval.
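If you want to check the interval arithmetic above with a computer, here is a small Python sketch (again, not required for the course; the function name is mine). It rounds the error to 2 decimal places as we do in class:

```python
import math

def conf_interval(xbar, z, sigma, n):
    # error (margin) = z * sigma / sqrt(n), rounded to 2 places as in class
    error = round(z * sigma / math.sqrt(n), 2)
    return round(xbar - error, 2), round(xbar + error, 2)

# 90% interval for the speeding example (z = 1.64 from the table on p410):
print(conf_interval(59.62, 1.64, 8, 12))   # (55.83, 63.41)
# 70% interval (z = 1.04 from 7.2 #24): tighter, but less confidence
print(conf_interval(59.62, 1.04, 8, 12))   # (57.22, 62.02)
```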

 

2. Time reading the Sunday paper:

A study discloses that 100 randomly selected readers devoted on average 126.5 minutes to the Sunday edition of the paper. Similar studies have shown a population standard deviation of 26.4 can be used. Construct a 95% confidence interval for the true average number of minutes that readers spend on the Sunday edition.

The interval, using the formula above is:

sample mean - error <  m < sample mean + error

126.5 – 1.96(26.4/squareroot100) <  m < 126.5 + 1.96(26.4/squareroot100)

126.5 – 1.96(26.4/10) < m < 126.5 + 1.96(26.4/10)

126.5 – 1.96(2.64) < m < 126.5 + 1.96(2.64)

126.5 – 5.17 < m < 126.5 + 5.17

121.33 < m < 131.67

So in 95 samples out of 100, an interval built this way will capture the true average time readers spend on the Sunday paper; here the estimate is anywhere from about 121 minutes to about 132 minutes.

If we take a sample of 500 readers instead, leaving all other conditions the same,

126.5 – 1.96(26.4/squareroot500) <  m < 126.5 + 1.96(26.4/squareroot500)

126.5 – 2.31 < m < 126.5 + 2.31

124.19 < m < 128.81

so notice that without tampering with the confidence level, we got a tighter interval. The trade-off is that you pay a price in time and money by taking a sample five times the size of the first.
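The effect of sample size on the interval can also be checked with a short Python sketch (not required for the course), using the std. deviation of 26.4 stated in the problem; in-class arithmetic that used 26.5 in places may differ by a hundredth:

```python
import math

xbar, sigma, z = 126.5, 26.4, 1.96   # 95% confidence, sigma from the problem

for n in (100, 500):
    # error shrinks as n grows, so the interval tightens
    error = round(z * sigma / math.sqrt(n), 2)
    print(n, error, round(xbar - error, 2), round(xbar + error, 2))
    # prints: 100 5.17 121.33 131.67
    #         500 2.31 124.19 128.81
```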

 

READING IN THE TEXT:

Browse section 8.1 p377-388, note definitions on p381 and 385.

Read 9.1 p405-412 for items from this lecture, and read p413-415 for next time.

 

HOMEWORK due Tuesday 04/16: Use given critical values above examples.

Try variations of the 2 examples above and notice the changes to the error and interval width:

1. Redo example 1 (speeding) above by changing 90% confidence to 95% (z is 1.96).

2. Redo example 1 above keeping the 90% confidence level, but changing sample size n to 50.

3. Redo example 1 above keeping the 90% confidence level, but changing sample size n to 5.

4. Redo example 2 (Sunday paper) by changing 95% confidence to 90% confidence.

5. Redo example 2 above by keeping 95% confidence but change n to 1000.

6. Redo example 2 above by keeping 95% confidence but change n to 40.

7. In general, from looking at the examples and your work:

a. Does the error get bigger or smaller as you reduce sample size?

b. Does the confidence interval get wider (less precise estimate for m) or narrower (closer estimate for m) around the population mean as we take a smaller sample size?

c. Does the error get bigger or smaller as you reduce the % confidence level?

d. Does the confidence interval get wider or narrower around the mean as we take a smaller confidence level?

 

T 4/9

LECTURE:

We worked on normal word problems from 7.3 again today.

In a "forwards" problem, given an x value (or two x values), use the standardizing formula

z = ( x – mean)/(std. deviation) to find the z value(s). Then look up the area in the tail of the distribution corresponding to each z value. Find the area you want using these tail areas.

In "backwards" problems, given an area (%, probability, proportion), find the x value that bounds it by reversing the process from the "forwards" problems. Identify the given area in a picture and search the middle of the table for the closest area to the one you are given, map it backwards to find the row and column it belongs to in order to find the z value, then take the resulting z value and "unstandardize" it (solve for x) in the formula

z = ( x – mean)/(std. deviation)!
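The "forwards" and "backwards" directions are just the standardizing formula solved two ways. A small Python sketch (not part of the course; function names are mine) makes the symmetry visible:

```python
def standardize(x, mean, sd):
    """'Forwards': data value x -> z value."""
    return (x - mean) / sd

def unstandardize(z, mean, sd):
    """'Backwards': z value -> data value x, solving z = (x - mean)/sd for x."""
    return z * sd + mean

# Forwards: salesman trip of 4 hours, with mean 4.3 and sd 0.2
print(round(standardize(4, 4.3, 0.2), 2))            # -1.5
# Backwards: the 18% left tail is a NEGATIVE z (-0.91) for the sardine problem
print(round(unstandardize(-0.91, 4.64, 0.25), 2))    # 4.41
```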

 

READING IN THE TEXT:

7.3 p349-352, but using the table given in class.

 

Supplementary exampleS :

1. A salesman has an average car route trip time of 4.3 hours with std. deviation of 0.2 hours. What is the probability that the length of his car trip will last anywhere from 4 to 4.5 hours?

Answer: This is a "forwards" problem. For x=4, z=(4-4.3)/0.2=-1.5 and for x=4.5, z=(4.5-4.3)/0.2=1.0. The area to the left of –1.5 is 0.0668 and the area to the right of 1.0 is 0.1587. The area between is 1-(0.1587+0.0668)=0.7745, so there is about a 77% chance that his trip will last anywhere from 4 to 4.5 hours.

2.The lengths of sardines received by a cannery have a mean of 4.64 inches and a standard deviation of 0.25 inches. If the distribution of these lengths can be approximated closely with a normal distribution, below which length lie the shortest 18% of the sardines?

Answer: This is a "backwards" problem since you are looking for an x value (length of sardines), having been given an area. The area of 18%, or 0.18, is a left-hand tail area, because it represents below-average lengths. The closest value to this in the table is 0.1814, which corresponds to a z value of –0.91 (negative, since it lies in the left tail). We "unstandardize" this value by using the formula to solve for x and get

-0.91=(x-4.64)/0.25 so

x=(-0.91)(0.25)+4.64= -0.2275+4.64=4.41.

About 18% of the sardines measure 4.4 inches or shorter.

  

HOMEWORK due Th 4/11: 

(Draw normal distribution for each problem, label values, write sentence of conclusion)

1. 7.3 p355 #28 (this is a "backwards" type problem)

2. The average amount of radiation to which a person is exposed while flying by jet across the U.S. is 4.35 units with std. deviation of 3.2. What is the probability that a passenger will be exposed to more than 4 units of radiation?

3. The number of days that patients are hospitalized is on average 7.1 days with std. deviation of 3.2 days. How many days do the 20% longest-staying patients stay?

4. The average time to assemble a product is 27.8 minutes with a standard deviation of 4.0 minutes. What percent of the time can one expect to assemble it in anywhere from 30 to 35 minutes?

5. For a salesman driving between cities, the average trip time is 4.3 hours with std. deviation of 0.2 hours. Below what time lie the fastest 10% of his trips?

 

Th 4/4

LECTURE: Before the test, I gave a few examples of the use of the standardizing formula from ch3: z = ((x-mean)/std.deviation) to find areas under the normal curve for word problems. We will spend more time on this on Tuesday, but read some more examples below and try the hmk.

 

READING IN THE TEXT:

7.3 p349-353 Notice that you are combining knowledge from 3.4 and 7.2 in this section. Use the formula from 3.4 to standardize the distribution, then use the table from 7.2 to look up areas.

 

exampleS:  

7.3 p354 #17

a. area less than z = (20-21)/1 = –1 is 0.1587 so about 16% of the eggs are expected to hatch in less than 20 days.

b. area more than z = (22-21)/1 = +1 is 0.1587 so about 16% of the eggs are expected to hatch in more than 22 days.

c. area less than z = (19-21)/1 = –2 is 0.0228 and area more than z = 0 is 0.50, so the area between is 1 – 0.0228 – 0.50 = 0.4772 so about 48% of the eggs are expected to hatch in 19 to 21 days.

d. area less than z = (18-21)/1 = –3 is 0.0013 which happens 0.13% of the time (much less than 1%).

7.3 p354 #21

b. area less than z = (250-266)/16 = –1 is 0.1587 so about 16% of pregnancies last less than 250 days.

d. area more than z = (280-266)/16 = +0.88 is 0.1894 so about 19% of pregnancies last more than 280 days.

e. area no more than z = (245-266)/16 = –1.31 is the same as area less than –1.31 which is 0.0951 so about 10% of pregnancies last no more than 245 days.

f. area less than z = (224-266)/16 = –2.63 is 0.0043 so pregnancies lasting less than 224 days happen less than ½ of a percent of the time, therefore are considered rare.

 

HOMEWORK due Tuesday 04/09

7.3 p354 (Draw picture of normal distribution for each part and label with values from problem)

#18 try all parts abcd, then do an extra part e:

   e. probability a randomly selected 6th-grade student reads less than 125 words per minute?

#20 try all parts abcd, then do an extra part e:

   e. probability a randomly selected car will spend more than 2 minutes in the drive-thru?

 

T 4/2

We briefly looked at the material we are moving into (7.3) and will probably take another look at it before the test for a few minutes.

Hmk is to study for the test as outlined in the notes from 3/21 below.

 

Th 3/21

SPRING BREAK IS HERE! HAVE A GREAT TIME!

 

ANSWERS from last hmk, not done in class:

3: Getting any 1 of 5 winning regular numbers and not getting the winning Mega:

 

Regular numbers: 5 winning, 42 losing, 47 total (take 1 winning, 4 losing, 5 total)

AND

"Mega" numbers: 1 winning, 26 losing, 27 total (take 0 winning, 1 losing, 1 total)

 

( ( 5C1*42C4 ) / 47C5 ) * ( (1C0*26C1) / 27C1 ) = ( (5*111930) / 1,533,939 ) * (26/27)

= 14550900 / 41,416,353 = 0.35

 

4: Getting any 2 of 5 winning regular numbers and not getting the winning Mega:

 

Regular numbers: 5 winning, 42 losing, 47 total (take 2 winning, 3 losing, 5 total)

AND

"Mega" numbers: 1 winning, 26 losing, 27 total (take 0 winning, 1 losing, 1 total)

 

( ( 5C2*42C3 ) / 47C5 ) * ( (1C0*26C1) / 27C1 ) = ( (10*11480) / 1,533,939 ) * (26/27)

= 2984800 / 41,416,353 = 0.07

 

Both of the above events are considered too common to give prizes for, so you win nothing!

 

LECTURE:

We skipped ch 6, as we have already talked about probability distributions in other sections, and move to ch 7 to revisit normal distributions and see probability as areas under the curve.

We looked at the table below today to find areas under the already standardized curve. We will use a modified version of this table from your book, where we employ the symmetry of the curve so that the area to the left of a negative z value is the same as the area to the right of a positive z value. In that way, you can look up the z values on the one page below as + or – , not just –.

 

(z table handout: tablez.pdf)

 

To look up a particular value of z, you put together the row and column that make up the z value. The left-most column gives the ones and tenths places of the z value, and the uppermost row gives the hundredths place.

Example: to find the area under the curve to the right of 1.83, since 1.83=1.8+0.03, you look to the row of 1.8 and the column of 0.03 to get an area of 0.0336 in the right tail of the distribution.

Example: for the area for values to the left of z= -1.57, putting row 1.5 with column 0.07 we get 0.0582 in the left tail of the distribution.

 

READING IN THE TEXT:

(Review of previous topics 7.1 p327-332 standardizing formula and area under the normal curve)

7.2 p337-346 finding area under the normal curve (be careful that we are using a modified version of the table in the book—the table in the book has a two-page table with separate +/- values, but the answers to the area exercises will come out the same)

 

exampleS:  

1. What is the area to the left of z= –2.04? Answer: 0.0207

2. What is the area to the right of z= 2.79? Answer: 0.0026

3. What is the area to the left of z= –0.06? Answer: 0.4761

4. What is the area to the left of z= –0.60? Answer: 0.2743

5. What is the area to the right of z= 0.60? Answer: 0.2743

6. What is the area to the right of z= –1.74? Answer: 0.9591 (from 1-0.0409)

7. What is the area to the left of z= 1.05? Answer: 0.8531 (from 1-0.1469)

8. What is the area between z= 0.87 and z= 2.03?

Answer: 0.1710 (The smaller tail corresponding to z= 2.03 has area of 0.0212 and the larger tail corresponding to z=0.87 has area of 0.1922. The smaller area is contained within the larger area, so to find the area between, take the larger and subtract the smaller: 0.1922–0.0212=0.1710)

9. What is the area between z= –0.25 and z= –1.97

Answer: 0.3769 (Same as the above, subtract smaller tail from larger: 0.4013-0.0244 = 0.3769)

10. What is the area between z= –2.09 and z=3.07?

Answer: 0.9806 (Different from the previous two problems, because the values are on opposite sides of the distribution and so it is not the case that one tail is contained within the other. You must start with 100% of the whole distribution and "chop off" the two tails using subtraction:

1 – (0.0183 + 0.0011) or 1 – 0.0183 – 0.0011 which equals 0.9806)
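The table values above can be reproduced on a computer using the standard normal CDF (the area to the left of z). This Python sketch is not a course requirement, just a way to check your table lookups; it uses math.erf, and the helper names are mine:

```python
import math

def phi(z):
    """Standard normal CDF: area under the curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def left(z):  return phi(z)             # left-tail area
def right(z): return 1 - phi(z)         # right-tail area
def between(a, b): return phi(max(a, b)) - phi(min(a, b))

print(f"{right(1.83):.4f}")             # 0.0336, as in the lecture example
print(f"{left(-1.57):.4f}")             # 0.0582
print(f"{between(0.87, 2.03):.4f}")     # 0.1710 (example 8)
print(f"{between(-2.09, 3.07):.4f}")    # 0.9806 (example 10)
```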

 

HOMEWORK due Tuesday 4/2:

1. 7.2 p346 # 6 ac

2. 7.2 p346 # 8 bd

3. 7.2 p346 # 10 abc

4. Find the area between z = – 0.99 and z = – 1.09

5. Find the area between z = + 0.05 and z = + 1.05

6. Find the area between z = – 2 and z = +2.

7. Find the area between z = – 3 and z = +3.

 

TEST #3 FORMAT: (Will occur as scheduled on Th 4/4)

I will provide the general addition and multiplication rules and formulas for nPr and nCr.

1. One probability model set-up like 5.1 p235 #40 combined with a problem like 5.2 p246 #26

2. Given a table like 5.2 p248 #42, 44 but combined with material from 5.4, find probabilities of events like: P(A), P(A | B), P(A and B), P(A or B). Events may be stated in words (as in #42, 44 for example) or defined with letters such as C and D.

3. One general addition rule card problem like 5.2 #32ac or ex3 p242

4. One general multiplication rule word problem like 5.4 #12-16

5. One nPr or nCr to show meaning/cancellations from formula, like 5.5 p276 #18 or 26

6. About 3 situations like 5.5 #46-50 to write appropriate nPr, nCr but not compute it

7. One like 5.5#66 or ex15 p276 to write a probability using a quotient of nCr counts from subsets

8. Follow-up question to #7 as in hmk prob. 5.5 p278 #66, where you needed to identify all the events that make up a probability distribution for the situation: without calculating all of them, you know that out of samples of 3 colas from the 12-pack, you can only have the 4 possible events of 0 diet, 1 diet, 2 diet, and 3 diet in the samples, so P(0)+P(1)+P(2)+P(3) = 1.

9. One like 5.5 p278 #60 or Super Lotto from hmk to write probability as a quotient of nCr counts

10. Various areas to look up like today's hmk from 7.2

 

T 3/19

LECTURE:

5.5: how to form more sophisticated probabilities using nCr counts.

 

READING IN THE TEXT:

5.5p275-276 ex 14, 15 probabilities involving combinations

 

exampleS:  

5.5 p278 #62

c. (55C3*45C4)/100C7 = (26235*148995)/(1.60075608 x 10^10) = about 0.24

 

5.5 p278 #65a. Out of the 13 tracks, 5 are liked so 8 must be disliked. You are taking 2 of 5 liked and 2 of 8 disliked for the event probability on the top of the fraction. On the bottom of the fraction, any 4 could pop up from the 13 tracks available.

 

 

13 tracks total (take 4 total): 5 liked (take 2), 8 disliked (take 2)

(5C2*8C2)/13C4 = (10*28)/715 = about 0.39

 

Another example not in book:

Out of 125 dishes in a box, 8 are chipped. If we select 6 dishes at random from the box, what is the probability that exactly 1 will be chipped?:

Answer:

Out of 125 dishes in a set, if 8 are chipped, 117 are not.

 

125 total (take 6 total): 8 chipped (take 1), 117 not chipped (take 5)

 

(8C1*117C5)/125C6=(8*167549733)/4690625500 =0.29
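These nCr counts get large fast, so a computer check can be handy. Here is a Python sketch of the dishes calculation (not required for the course); math.comb(n, r) computes nCr:

```python
from math import comb

# P(exactly 1 chipped) = (ways to take 1 of 8 chipped AND 5 of 117 good)
#                        / (ways to take any 6 of 125)
p = comb(8, 1) * comb(117, 5) / comb(125, 6)
print(round(p, 2))   # 0.29, matching the hand calculation above
```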

  

There are two very recognizable lottery examples of forming probabilities using this counting method. One is the MEGA MILLIONS lottery game which we looked at today (p278 #60). It involves tickets sold in at least 42 states. Another is just for California: SUPER LOTTO PLUS:

CALIFORNIA LOTTERY: SUPER LOTTO PLUS

To play the game, you are asked to pick 5 different numbers choosing from 1 to 47 regular numbers and one "Mega" number choosing from 1 to 27. The top prize (which is the one advertised in millions) goes to whoever matches all 5 of 5 winning numbers and matches the one Mega number. Much smaller prizes are awarded for matching some of the numbers.

Example: Getting any 3 of 5 winning regular numbers and not getting the winning Mega:

 

Regular numbers: 5 winning, 42 losing, 47 total (take 3 winning, 2 losing, 5 total)

AND

"Mega" numbers: 1 winning, 26 losing, 27 total (take 0 winning, 1 losing, 1 total)

 

( ( 5C3 * 42C2 ) / 47C5 ) * ( (1C0*26C1) / 27C1 ) = ( ( 10 * 861 ) / 1,533,939 ) * ( (1*26) / 27 )

= 223860 / 41,416,353 = 0.005405111 

Example: Getting any 3 of 5 and the Mega:

( ( 5C3 * 42C2 ) / 47C5 ) * ( 1C1 / 27C1 ) = ( ( 10 * 861 ) / 1,533,939 ) * ( 1 / 27 )

= 8610 / 41,416,353 = 0.000207889

Example: Getting all 5 of 5 and the Mega: ( 5C5 / 47C5 ) * ( 1C1 / 27C1 ) = ( 1 / 1,533,939 )( 1 / 27 ) = 1 / 41,416,353 = 0.000000024.
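All of the Super Lotto Plus cases follow one pattern, which can be wrapped in a short Python function (not part of the course; the function name is mine, and it assumes the 47/27 game described above):

```python
from math import comb

def p_match(k, mega):
    """P(exactly k of the 5 winning regular numbers; the Mega iff mega=True)."""
    regular = comb(5, k) * comb(42, 5 - k) / comb(47, 5)
    return regular * (1 / 27 if mega else 26 / 27)

print(f"{p_match(3, False):.9f}")   # 0.005405111 (3 of 5, no Mega)
print(f"{p_match(5, True):.9f}")    # 0.000000024 (the jackpot)
```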

 

HOMEWORK due Th 3/21:

1. 5.5 p278 #66 Do parts a, b, and c, and also find P(exactly 0 diet). Check that the probabilities from all the parts form a probability distribution: P(0) + P(1) + P(2) + P(3) = 1

2. We did 5.5 p278 #60 in class. Using the same game and sets, find the probability of getting none of the winning numbers from either set!

3. In Super Lotto Plus (described in above notes), find the probability of getting 1 of the winning regular numbers and not getting the Mega number.

4. In Super Lotto Plus (described in above notes), find the probability of getting 2 of the winning regular numbers and not getting the Mega number.

To make your life easier in problems 2, 3 and 4 above, here are some shortcuts:

  nC0=1 for all n (so 5C0 = 1 for example)

  nC1=n for all n (so 5C1 = 5 for example)

  nCn=1 for all n (so 5C5 = 1 for example)

  51C5=2,349,060

    5C2=10

  42C3=11,480

  42C4=111,930

  47C5=1,533,939
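If you have access to a computer, the shortcut values above can be confirmed with Python's math.comb (this is just a check, not something you need for the homework):

```python
from math import comb

# The shortcut identities:
assert comb(5, 0) == 1     # nC0 = 1 for all n
assert comb(5, 1) == 5     # nC1 = n for all n
assert comb(5, 5) == 1     # nCn = 1 for all n

# The precomputed counts listed above:
for n, r, value in [(51, 5, 2_349_060), (5, 2, 10), (42, 3, 11_480),
                    (42, 4, 111_930), (47, 5, 1_533_939)]:
    assert comb(n, r) == value

print("all shortcut values check out")
```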

 

Th 3/14

ANSWERS: previous hmk not covered in class

5.2 #44

a. P(placebo) = 100/250

b. P(went away) = 188/250

c. P(placebo and went away) = 56/250

d. P(placebo or went away) = 100/250+188/250-56/250 = 232/250

5.4 #18

a. P(former | cancer) = 91/1014

b. P(cancer | former) = 91/7848

 

LECTURE:

I introduced material from section 5.5 that we will continue with on Tuesday.

To form more complicated probabilities, one must know how to count sometimes large and complex numbers of things.

We considered the example of how to count the different ways one can take 2 letters, without repetition, from a set of 3 letters {A, B, C}. We found 6 permutations (order matters) and 3 combinations (order doesn't matter) with the help of a tree diagram. We can perform these counts without a tree, using the formulas in 5.5.
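The same counts from the tree diagram can be listed by a computer. This Python sketch (not a course requirement) enumerates both kinds of selections from {A, B, C}:

```python
from itertools import combinations, permutations

letters = ["A", "B", "C"]

perms = list(permutations(letters, 2))   # order matters: AB and BA differ
combs = list(combinations(letters, 2))   # order doesn't matter: AB only

print(len(perms))   # 6, matching the tree diagram
print(len(combs))   # 3
```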

 

READING IN THE TEXT:

5.5 pages 266 thru the end of example 11 on p273, and try some of the computations in the skill building section on p276/277 for yourself (check answers to odds in the back of the book). Especially read about:

-- tree diagrams on p267

-- factorials on p269

-- permutation formula p270

-- combination formula p272

--p271 ex7 and p273 ex11 deciding if order is important

 

HOMEWORK due Tuesday 3/19: (Please perform problems in the order listed below)

5.5 p276 # 6, 8, 14, 16, 24 (note that #9 tells you that 0! = 1)

5.5 p277 #28, showing the possible paths on a tree diagram and check count with formula.

5.5 p277 #30, showing which outcomes in #28 above are repeats and check with formula.

5.5 p277 #18 and see how quickly the counts can get out of hand, even with small sets of objects. (Would you want to list all of these selections in a table or on a tree diagram?)

5.5 p277/278 #46, 50 deciding if order matters first, then computing the appropriate P or C.

 

T 3/12

LECTURE:

5.2 goes over the two forms for the addition rule p238 and p241/242 and

5.3/5.4 go over the two forms for the multiplication rule p251 and p259.

The 2nd forms of each rule (p242 and p259) are the general forms, which actually cover the cases of the 1st forms (p238 and p251). So you only need the general forms for all cases!

ÒE OR FÓ is a union of sets and ÒE AND FÓ is an intersection of sets.

The general addition rule of section 5.2 includes intersections as part of the union (subtracting the intersection once so as not to double count it in the union):

P(E or F) = P(E) + P(F) – P(E and F)

In table problems, we treated P(E and F) as an intersection of a row and column in the table, but now we can also use the general multiplication rule to find it:

P(E and F) = P(E)·P(F | E), where P(F | E), read "F given E", is the conditional probability: the probability that event F occurs given that event E has already occurred, or, in table problems, the probability of F when selecting only from the subset E.

 

We worked on an exercise in class using a table like that in 5.2 p248 #42 to highlight how to find one-event ("simple") probabilities, conditional probabilities, and multi-event ("compound") probabilities using AND and OR.

In the table handout in class, the first three probabilities are all one-event classical forms:

1. P(target) = 145/282

2. P(29-39) = 125/282

3. P(Macys) = 137/282        

 

The next two are conditional probabilities:

4. P(target given 29-39) = 50/125 (given we select only from the 125 29-39, find P(Target))

5. P(29-39 given target) = 50/145 (given we select only from the 145 Targets, find P(29-39))

 

The next three are ways to find an intersection (mult rule):

6. P(target and 29-39) = 50/282 (there are 50 fitting both categories out of all the subjects)

7. P(target and 29-39) = P(target)* P(29-39 given target) = (145/282)*(50/145) = 50/282

8. P(29-39 and target) = P(29-39)* P(target given 29-39) = (125/282)*(50/125) = 50/282

 

The next two are finding unions (addition rule)

9. P(target or 29-39) = P(target)+P(29-39)-P(target and 29-39) = 220/282

10. P(target or macys) = P(target)+P(macys)-P(target and macys) = 145/282+137/282-0/282

   which equals 1, since you have 100% chance they will prefer one or the other since all

   those in the survey had to choose one.
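The ten probabilities above can be cross-checked by machine. This Python sketch (not part of the course) uses exact fractions and the counts from the class handout to confirm that the two general rules agree with the table:

```python
from fractions import Fraction

total = 282
P_target = Fraction(145, total)
P_29_39  = Fraction(125, total)
P_both   = Fraction(50, total)   # target AND 29-39, read from the table cell

# General multiplication rule: P(E and F) = P(E) * P(F | E)
assert P_target * Fraction(50, 145) == P_both   # as in item 7 above
assert P_29_39 * Fraction(50, 125) == P_both    # as in item 8 above

# General addition rule: P(E or F) = P(E) + P(F) - P(E and F)
assert P_target + P_29_39 - P_both == Fraction(220, total)   # item 9

print("both general rules agree with the table counts")
```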

You can find a table discussion similar to the one we had in class in example 4 on p242/243 and examples 1/2 on p257/258 (they refer to the same table even though in different sections).

 

READING IN THE TEXT:

5.2 p238-243 assigned previously

5.3 p250-252 (thru example 2) multiplication rule for independent events

5.4 p256-259 (thru example 3) general multiplication rule (works for both independent and dependent events, thus it is a general rule!).

 

MORE EXAMPLES (to guide you in your new hmk assignment):

5.2 p247

5. E and F share {5, 6, 7} so they are not mutually exclusive

7. S has 12 members and (F or G) = {5, 6, 7, 8, 9, 10, 11, 12} so P(F or G) = 8/12 = 2/3

     P(F or G) = P(F) + P(G) – P(F and G) = 5/12 +4/12 – 1/12 = 8/12 or 2/3

9. E and G do not share any numbers, so they are mutually exclusive

13. P(E or F) = P(E) +P(F) – P(E and F) = 0.25 +0.45 – 0.15 = 0.55

15. P(E or F) = P(E) + P(F) = 0.25 + 0.45 = 0.70

19. P(E or F) = P(E) +P(F) – P(E and F) so 0.85 = 0.60 +P(F) – 0.05 and solving for P(F), we      get 0.85 – 0.55 or 0.30.

31. a. P(heart or club) = P(heart) + P(club) = 13/52 + 13/52 = 26/52 = 0.50

     b. P(heart or club or diamond) = P(heart) + P(club) +P(diamond)

         = 13/52 + 13/52 +13/52 = 39/52 = 0.75

     c. P(heart or ace) = P(heart) + P(ace) – P(heart and ace) = 13/52 + 4/52 – 1/52 = 16/52

43. a. P(satisfied) = 231/375

      b. P(junior) = 94/375

      c. P(satisfied and junior) = 64/375 from the intersection of the row and column in the table.

      d. P(satisfied or junior) = P(satisfied) + P(junior) – P(satisfied and junior)

          = 231/375 + 94/375 – 64/375 = 261/375

5.4 p262

3. P(E and F) = P(E)*P(F given E) so 0.6 = (0.8)(P(F given E)) so P(F given E) = 0.6/0.8 = 0.75

13. use mult rule but now in word problem form!

     P(cloudy and rainy) = P(cloudy)*P(rainy given cloudy)

     0.21 = (0.37)(P(rainy given cloudy)) so P(rainy given cloudy) = 0.21/0.37 = 0.57

15. P(16/17 and white) = P(16/17)*P(white given 16/17)

     0.062 = (0.084)( P(white given 16/17)) so P(white given 16/17) = 0.062/0.084 = 0.74

17. a. P(no given <18) = 8661/78676 = 0.11

     b. P(<18 given no) = 8661/46993 = 0.18

    

Homework due Thursday 03/14:

5.2 p245 #8, 14, 20, 40, 44

5.4 p262 #4, 8, 14, 16, 18

Hints:

In 5.2 #14, 20 and 5.4 #4, 8 write down the general rules first, then fill in the numbers given and solve for the one that is not given.

In 5.4 p262 #14, 16 both word problems involve plugging two given values in the general mult rule and solving for the other one in the way #4, 8 prepare you to do – these can be tricky, but try them anyway!

In 5.2 #40, 44 and 5.4 #18 you are writing probabilities from a table as we did in class today – they give you column and row totals in #40, but you have to find the totals first yourself in the other two problems.

 

Th 03/07

LECTURE:

Last time and today before the test, we looked at finding probabilities from tables from 5.1 and 5.2, using the classical method for forming probabilities and the general addition rule.

 

READING IN THE TEXT:

5.1 p223-227 up to but not including example 5.

5.2 p238-243 to the end of example 4.

 

EXAMPLES (to guide you in your hmk):

5.1 p233 (using the set-up for probabilities on p227)

13. cannot have a negative probability

31. P(sports) = 288/500 = 0.576

33. a. P(red) = 40/100 = 0.40

     b. P(purple) = 25/100 = 0.25

39. 118+307+524+1189+2383 = 4521: never 118/4521 = 0.026, rarely 307/4521 = 0.068, sometimes 524/4521 = 0.116, most 1189/4521 = 0.263, always 2383/4521 = 0.527

49. a. P(right) = 24/73 = about 0.33

     b. P(left) = 2/73 = about 0.03    c. yes, only 3% of the time

5.2 p247

25. using the addition rule for disjoint events (add the probabilities):

     a. they all add to 1

     b. gun or knife = 0.671 +0.126 = 0.797

     c. 0.126 + 0.044 + 0.010 = 0.180

     d. 1 – 0.671 = 0.329

     e. yes, they only happen 1% of the time

43. a. P(satisfied) = 231/375

      b. P(junior) = 94/375

      c. P(satisfied and junior) = 64/375 from the intersection of the row and column in the table.

      d. P(satisfied or junior) = P(satisfied) + P(junior) – P(satisfied and junior)

          = 231/375 + 94/375 – 64/375 = 261/375

 

Homework due Tuesday 03/12:

Do 5.1 p233 #12, 14, 32, 34, 40, 52

Do 5.2 p247 #26, 34, 42

 

T 03/05

Note that item 13 on the test format below has been removed. We talked about items 11 and 12 in class today and they will be on the test. Here are some examples to guide you on these items:

5.1 p233 (using the set-up for probabilities on p227)

17. There is a 42% chance of being dealt a pair (two cards of the same value) in 5-card stud poker. If you play 5-card stud 100 times, will you be dealt a pair exactly 42 times?

Answer: No. The classical method predicts 42 out of 100 in the long run, but in an actual experiment with a fixed number of trials (the empirical method) anything can happen; exactly 42 times is not guaranteed.

31. P(sports) = 288/500 = 0.576

33. a. P(red) = 40/100 = 0.40

     b. P(purple) = 25/100 = 0.25

49. a. P(right) = 24/73 = about 0.33

     b. P(left) = 2/73 = about 0.03

     c. yes, only 3% of the time

Homework for now is to study for your test. You will have some homework in 5.1 and 5.2 after the test which will be introduced before the test for a few minutes and will be due on Tuesday.

 

Th 02/28

LECTURE:

We started looking at how we can put the mean and standard deviation to work now that we know how to calculate them. Overall, we want to be able to make any normal distribution have the same center (mean) of 0 and same spread (std. deviation) of 1. Once distributions are on the same scale, you can compare them.

 

READING:

3.4 p155 (thru example 1) z-score definition and formula (note that z values measure the number of std. deviations that an x data value lies from the mean).

We also talked about making area comparisons without being able to look up the areas on a table yet. The material from 7.1 p330-332 approximates this discussion. Look at how z calculations are found, given the mean and std. deviation, then look at figure 10 to see a picture of how the original x data is transformed to z data that is centered at 0 and has std. deviation of 1. Once areas for different distributions are on the same scale, you can compare them (i.e., is one area contained within the other on a shared graph? The area to the left of z = 2 is larger than the area to the left of z = 1).

  

HOMEWORK: due Tuesday 03/05:

 (do not just skip the first problem – there is a lot to do in it so it will be worth more points!)

1. Ch3 Review p173 #2 (use either formula for the std. deviation – this problem is a little time-consuming, but a good roundup of several ideas!)

2. 3.4 p162 #26 (base your answer on quartiles/outliers)

3. 3.4 p161 #12 (show comparison on a picture of a normal distribution)

4. 3.4 p161 #14 (show comparison on a picture of a normal distribution)

5. Which is larger, the area associated with values less than 55 for a distribution with a mean of 80 and std. deviation of 10, or the area associated with values less than 50 for a distribution with a mean of 80 and a std. deviation of 15? (show comparison on a picture of a normal distribution)

 

TEST# 2 Will occur as scheduled on Thursday 3/7. As always, bring your own calculator, but not a cell phone, PDA, or other transmission device.

Formulas to be provided on front board (be careful that since formulas are not provided next to the problems, you must know which formula is being asked for and what the symbols mean):

3.2 p136 table 11 formula involving deviations for std. deviation of samples,

3.2 p134 table 10 computational formula for std. deviation of populations

3.4 p155 z scores for populations

3.4 p160 lower and upper fences

FORMAT:

1. Short question on misleading graphs. (See 2.3 p106 #1-7, 10-12, 14).

2. Given data, find the mean median and mode. (see 3.1 p125 #16 for example).

3. Short answer questions from ch3 reading, for example:

     means of samples and populations (3.1 p118/119 ex1, p127 #24),

     resistance to skewing and distribution shape (3.1 p122 definition and table 4),

     when to best use means, medians, modes (3.1 reading and p129 #42),

     measures of center and dispersion best to report (3.2 p141 #8 , 3.4 p159 summary table),

     meaning of standardization (3.4 p155 definition).

4. Assess possible skewing by comparing median and mean values, or matching pictures of 

     distributions with table data (see 3.1 p126 #18 for example).

5. Given sample data, find the mean and std. deviation (see 3.2 p142 #11-16, 20 for ex.)

     (using formula involving deviations, not computational formula).

6. Given population data compute z values, mean, and std. deviation (see 3.4 p163 #30 for ex.)

     (using computational formula, not formula involving deviations).

7. Given quartiles only, assess skewing and outliers (see 3.4 p162 #20 for ex.)

8. For data, find quartiles/IQR/fences/outliers/box plot (see 3.4 p162 #22 and 3.5 p170 #12).

9. Word problem to compare z scores and relative placement (see 3.4 p161 #9-14 for ex.).

10. Question on comparison of areas under the normal curve using z values (from hmk above).

11. Short answer on meaning of empirical and classical probabilities (see 5.1 #17, 18 for ex.).

12. One short problem to state a classical probability (see 5.1 p234 #32-34 for ex.).

 (Items 11, 12 will be covered in class on Tuesday).

 

T 02/26

LECTURE: We worked on 3.4 and 3.5 together to form a box plot. We looked at examples in a class exercise that had all whole numbers in the data and relatively nice calculations. The ones for hmk will have larger and less friendly sets of data!

Answers to the second box plot exercise from class:

Put the data in order first.

Q2 = 35

Q1 =  mean of 33 and 34 = 33.5

Q3 =  mean of 39 and 42 = 40.5

IQR = 40.5 – 33.5 = 7

LF = 33.5 – 1.5(7) = 23

UF = 40.5 + 1.5(7) = 51

There is one outlier (but very close!): 22.
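The fence arithmetic above can be mirrored in a short Python sketch. It uses the median-of-halves quartile convention from class; the data list below is hypothetical, chosen so the answers match the exercise:

```python
from statistics import median

def quartiles_and_fences(data):
    """Q1/Q2/Q3 by the median-of-halves rule, then IQR, fences, and outliers."""
    xs = sorted(data)                  # put the data in order first!
    n = len(xs)
    half = n // 2
    lower = xs[:half]                  # values below the median
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above it
    q1, q2, q3 = median(lower), median(xs), median(upper)
    iqr = q3 - q1
    lf, uf = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lf or x > uf]
    return q1, q2, q3, iqr, lf, uf, outliers

# Hypothetical data giving the same answers as the class exercise:
data = [22, 33, 34, 35, 35, 38, 39, 42, 43]
q1, q2, q3, iqr, lf, uf, outliers = quartiles_and_fences(data)
```

Note that textbooks and software disagree on quartile conventions, so always state which rule you are using.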

 

READING IN THE TEXT:

For hmk prep:

3.4 p157-160 percentiles, quartiles, and outliers discussion.

3.5 p164-167 read about how to construct a box plot from the quartiles in section 3.4 (see blue box on p165) and how the box plot gives a nice visual of data that is easier to construct than a histogram (see pictures on p167).

 

For next time and perhaps the time after that,

3.4 p155-156 standard scores (word problems)

5.1 p223-227 up to but not including example 5.

5.2 p238-243 to the end of example 4.

 

MORE EXAMPLES (to guide you in your hmk!):

3.4 p162 #21 The mean of this sample data is 3.99 and std. dev. is 1.78. Notice they put the data in order by columns, so you do not need to list it again!

a. z = (0.97 – 3.99)/1.78 = –1.70

b. Q2 = (3.97+4)/2 = 3.99     Q1 = (2.47+2.78)/2 = 2.63    Q3 = (5.22+5.50)/2 = 5.36

IQR = Q3 – Q1 = 5.36 – 2.63 = 2.73

Left fence = 2.63 – 1.5(2.73) = –1.47

Right fence = 5.36 + 1.5(2.73) = 9.46

so no outliers (data values outside the "fences").

3.5 p170 #11

The data in order are:

0.598, 0.600, 0.600, 0.601, 0.602, 0.603, 0.605, 0.605, 0.605, 0.606, 0.607, 0.607, 0.608, 0.608, 0.608, 0.608, 0.608, 0.609, 0.610, 0.610, 0.610, 0.610, 0.611, 0.611, 0.612.

a. Q2 = 0.608     Q1 = (0.603+0.605)/2 = 0.604     Q3 = (0.610+0.610)/2 = 0.610

IQR = Q3 – Q1 = 0.610 – 0.604 = 0.006

Left fence = 0.604 – 1.5(0.006) = 0.595 (so no outliers on the left)

Right fence = 0.610 + 1.5(0.006) = 0.619 (no outliers on the right either)

Number line below box plot shows min, Q1, M, Q3, max:

 

[box plot sketch omitted – box from Q1 to Q3 with a line at the median, whiskers out to the min and max]

min = 0.598, Q1 = 0.604, M = 0.608, Q3 = 0.610, max = 0.612

 

Data appears to be skewed left.

 

HOMEWORK due Thursday 02/28:

1. 3.4 p162 #20,

2. 3.4 p162 #22bcd (given that mean of sample is 10.08 and std. dev. of sample is 1.89),

3. 3.4 p162 #24,

4. 3.5 p169 #6,

5. 3.5 p170 #12 (data in order 1.01, 1.34, 1.40, 1.44, 1.47, 1.53, 1.61, 1.64, 1.67, 2.07, 2.08, 2.09, 2.12, 2.21, 2.34, 2.38, 2.39, 2.64, 2.67, 2.68, 2.87, 3.44, 3.65, 3.86, 5.22, 6.81)

 

Th 02/21

Today, I gave a personal code by which you can verify the % I have for your Test#1 scores. You can check it at http://www.smccd.edu/accounts/callahanp under "testscores" by some time this afternoon.

 

LECTURE:

We did more with std. deviation, now with respect to populations.

Towards the end of class, we took a look at how we will use the mean and std. deviation that we now know how to compute. In section 3.4, standardized values put the mean and standard deviation to work. Overall, we want to be able to make any normal (symmetric) distribution have the same center (mean) of 0 and same spread (std. deviation) of 1. Once distributions are on the same scale, you can compare them or look up areas related to the distribution from a table (table A-11 in the back of your book).

 

READING IN THE TEXT:

For hmk prep:

3.2 p131-134 std. deviation of a population

3.2 p139  Empirical Rule

3.4 p155-156 z-scores

For next time prep:

3.4 p157-160 quartiles

3.5 p163-167 box plots

A class exercise on Tuesday will unite 3.4 and 3.5

(3.3 will be skipped)

 

Homework due Tuesday 02/26:

3.2 p142 #26 (note this continues 3.1 #24 about means, but now asks for std. deviations):

    part a do by both methods: deviations and computational formula,

    part b do each of your 3 samples' std. deviations by either method (your choice)

3.4 p163 #30 (see table of data from 3.1 p127 #24 and 3.2 p143 #26)

Do the z scores for the whole population only, not for each sample! If you missed how to calculate z values, look at p156 and also p332 has some examples.

a. Given the info from those previous problems that the population mean of the data in the table is 26.4 and the population std. deviation of the data in the table is 12.8, find the z-scores for each data point in the table (you should end up with 9 z scores).

b. Find the mean of the 9 z-scores from part a. (sum the + and – values as they are).

c. Find the std. deviation of the 9 z-scores from part a. using the computational formula for population std. deviation as in table 10 on p134 (remember to divide by N, not n-1).
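Parts b and c are checking a general fact: the z-scores of a whole population always come out with mean 0 and population std. deviation 1. A small Python sketch with made-up population data (not the book's table):

```python
from math import sqrt

pop = [10, 14, 18, 26, 30, 34]        # hypothetical population data
N = len(pop)
mu = sum(pop) / N
sigma = sqrt(sum((x - mu) ** 2 for x in pop) / N)   # divide by N, not n-1

z = [(x - mu) / sigma for x in pop]   # one z-score per data point
z_mean = sum(z) / N
z_sigma = sqrt(sum((v - z_mean) ** 2 for v in z) / N)
# z_mean comes out 0 and z_sigma comes out 1 (up to rounding)
```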

 

T 02/19

We started looking at the notion of spread from 3.2 with an in-class exercise finding the standard deviation 2 ways for the data set of x values below. Check the values:

 

Deviations method for std. deviation of a sample      |   For computing formula

x      deviations        sqr deviations               |   x      x²
2.5    2.5-5.5 = -3.0         9.00                    |   2.5      6.25
3.0    3.0-5.5 = -2.5         6.25                    |   3.0      9.00
4.0    4.0-5.5 = -1.5         2.25                    |   4.0     16.00
6.0    6.0-5.5 =  0.5         0.25                    |   6.0     36.00
6.5    6.5-5.5 =  1.0         1.00                    |   6.5     42.25
7.5    7.5-5.5 =  2.0         4.00                    |   7.5     56.25
9.0    9.0-5.5 =  3.5        12.25                    |   9.0     81.00
Σx = 38.5   sum = about 0 (rounding)   sum = 35.00    |   Σx = 38.5   Σx² = 246.75

(the mean used for the deviations is 38.5/7 = 5.5)

 

Deviations method: squareroot of (35/(7-1)) = squareroot of 5.83 = 2.42

Computing formula method: Sxx = 246.75 – ((38.5)(38.5)/7) = 246.75 – 211.75 = 35, so

     squareroot of (Sxx/(n-1)) = squareroot of (35/(7-1)) = squareroot of 5.83 = 2.42
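If you want to double-check your hand arithmetic, both methods fit in a few lines of Python (a sketch using the same seven data values):

```python
from math import sqrt

xs = [2.5, 3.0, 4.0, 6.0, 6.5, 7.5, 9.0]
n = len(xs)
mean = sum(xs) / n                                  # 38.5/7 = 5.5

# Deviations method: sum of squared deviations over n-1, then square root.
s_dev = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

# Computing formula: Sxx = sum(x^2) - (sum x)^2 / n, same divisor and root.
sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n     # 246.75 - 211.75 = 35
s_comp = sqrt(sxx / (n - 1))
```

Both give about 2.42, as in the hand calculation.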

 

READING IN THE TEXT:

For hmk prep:

3.1 p122 mean vs. median

3.2 p135-137 std. deviation of a sample

For next time prep:

3.2 p131-134 std. deviation of a population

(3.3 will be skipped)

3.4 p157-160 quartiles (if there is time)

3.5 p163-167 box plots (if there is time)

 

Homework due Thursday 02/21:

3.1 p125 #14, 18, 44

3.2 p142 #8,

3.2 p142 #12 (find std. deviation by deviations formula only – see p136 table 11),

3.2 p142 #20 (find std. deviation by computation formula only – see p136 table 12)

Note: p137 tells you to take the squareroot of the values on the previous page p136 to find the std. deviation, not just the variance!

 

Th 02/14

Before the test, we took a brief look at measures of center and means of different samples from a population. Read the section 3.1 as below, then try some hmk for Tuesday.

 

READING IN THE TEXT:

3.1 p117-125 mean, median, mode, and comparing the mean vs. median

Be sure to read p122 about resistance and how distribution shapes affect the mean and median. We will go over this on Tuesday.

3.2 on spread of data will be covered on Tuesday if you want to read ahead.

 

Homework due Tuesday 02/19:

3.1 p125 #16, 24, 28, 30, 36, 42

 

T 02/12

After going over the hmk, we looked at 2.3 examples regarding use of areas in graphics, and how scaling affects perception. Please read 2.3 p100-106 and look at p106-108 #1-7, 10-12 (we looked at 2, 4, 6, 10 in class -- skip time-series plots).

Homework is to study for your test. The format is at the end of the previous notes below.

Before your test, we will take a brief look (10 to 15 mins) at the first section of ch3 (mean, median and mode). There will be hmk from 3.1 due Tuesday and we will pick up with the 3.2 concepts of center and spread.

 

Th 02/07

We spent time after the quiz talking about the vocabulary of Ch.1 in 1.1 and 1.5. If you only recently got your book, you should go back and read these sections to fill in the ideas. Sometime soon, you should also browse the pages in 1.2, 1.3, and 1.4. Although we do not have time to go over this material, it is important for your general knowledge of statistics.

 

READING IN THE TEXT:

2.2 p78-83 constructing classes and histograms for continuous data

See 2.3 p106 #1-7, 10-12, 14 which we will look at on Tuesday in class (no hmk)

 

IN-CLASS EXERCISE:

You can see the answers in the back of the book since the exercise was problem 2.2 p95 #39.

Note that class width is the difference between consecutive lower class limits, not the difference between the lower and upper limits for any one class.

1st part: classes of width 10 give:

class   frequency tally

20-29  1 

30-39  111111 

40-49  1111111111 

50-59  11111111111111 

60-69  111111

70-79  111

You can see the histogram by turning the tally marks above 90 degrees counterclockwise!

2nd part: classes of width 5 are: 20-24, 25-29, 30-34  etc.

The histogram for this has the same general shape as the one from part a above, but spreads the data out more and contains more peaks and valleys that show more about the original data. I show height in X blocks below to ensure the scale stays in line on this page, but you should just draw bars of the same height (see answer to 2.2 #39 in back of book):

 

 

 

 

 

 

 

[X-block histogram omitted – each width-5 class gets a stack of X's as tall as its tally; see the answer to 2.2 #39 in the back of the book for the picture]

 

As the book points out, there is no one best way to divide up the data. You pick what you think shows the spread of the data best. I like the one with class width 10 since it yields a fairly smooth "bell curve" shape. I find the extra peaks and valleys using class width 5 to be somewhat annoying to look at.
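The tallying step is mechanical once you fix the first lower class limit and the class width. A hypothetical Python sketch (the data values here are made up, not the problem's):

```python
def tally_classes(data, first_lower, width, n_classes):
    """Count how many values land in each class [L, L + width)."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - first_lower) // width)   # which class this value falls in
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts

# Made-up data values between 20 and 79:
sample = [21, 35, 36, 42, 55, 55, 58, 61, 74]
counts = tally_classes(sample, 20, 10, 6)     # classes 20-29, 30-39, ..., 70-79
```

Note the class width here is the gap between consecutive lower limits, matching the warning above.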

 

Homework due Tuesday 2/12:

2.2 p91-95 #2, 4, 6, 12, 14, 30, 34, 38

Note for #30: instructions are listed before problem #27.

Note for #34: instructions are listed before problem #31.

Note for #38: make upper class limits accurate to one decimal, just like the data. Your first class should be: 8 - 9.9 (since 9.9 is the last one decimal number before 10).

 

Looking forward:

We will work on 2.3 on Tuesday. Test #1 will occur on Thursday 2/14 as scheduled and will cover the material from lecture, class exercises, homework, and quizzes.

Formulas and test paper will be provided but you must bring a standard scientific calculator and something to write with. You will not be allowed to use calculator cell phones, PDAs, or other transmission-capable devices. You may not share a calculator (see if someone can let you borrow theirs after they turn in their test).

Format:

1a. Graph linear scatterplot data, draw best fit line estimate, and find the equation of the line using two indicated points.

1b. Given the "summation table" for x, y, xy, and x squared for the data, use the equations for Sxx, Sxy, etc. to find the actual line of best fit.

 

2. Given a set of exponential scatterplot data, turn it into linear scatterplot data using logs, then graph the linear data. One or two follow-up questions may ask you to plug in a given x or y value to a given exponential equation and solve for the other variable.

 

3. Given the line of best fit for some "logged" data (x, logy), find the exponential of best fit for the original data (x, y) by "unlogging" the slope and y-intercept.

 

4. Some short-answer questions on chapter 1.1 and 1.5 reading and definitions, including statistics, samples and populations, qualitative and quantitative variables, discrete and continuous variables, and bias in sampling.

 

5. Given a set of data, find/show various parts of the following: frequency and relative frequency distributions, frequency and relative frequency bar graphs and side-by-side comparison of two sets of data, and why one representation is better than another.

 

6. Given a small set of data, construct classes of given width and lower first class limit, and form the resulting frequency distribution. Be prepared to provide a histogram if asked for.

 

7. Answer short questions regarding use of areas in graphics, and how vertical scaling affects perception. (See 2.3 p106 #1-7, 10-12, 14 which we will look at on Tuesday).

 

T 02/05

I now assume that you have access to the book and can do some reading. Just in case, though, I will put in some notes from Ch1 and 2.1 below. I felt that we could have a better discussion of Ch. 1 vocabulary after you have read it, so it is included in the hmk due Thursday and we will talk more about it then.

 

In class today, we took a look at bar graphs in 2.1. Note that comparison of numbers of college grads using frequencies shows an increase in Internet access. But from this, we cannot tell if the increase was just due to a larger group being sampled in 2003 or if it represents a larger proportional increase within the whole population. Based on the samples taken, can we infer that Internet access has increased for the population of all college grads? A different comparison can be made to answer this question using relative frequencies:

 

 

                2000 relative frequency         2003 relative frequency
No college      24662000/90503000 = 0.27        65862000/165899000 = 0.40
Some college    31462000/90503000 = 0.35        50931000/165899000 = 0.31
College grad    34379000/90503000 = 0.38        49106000/165899000 = 0.30

 

Now we can see that all of the categories increased with respect to frequency, but relative frequency shows that the last two categories actually decreased in proportion to the whole. So in general we find that college grads make up a decreasing proportion of those with Internet access.
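The relative frequencies are just each frequency divided by the column total; a short Python sketch of the 2000 column using the same numbers:

```python
# Frequencies from the 2000 column of the table above.
freq_2000 = {"No college": 24662000,
             "Some college": 31462000,
             "College grad": 34379000}

total = sum(freq_2000.values())                      # 90,503,000
rel_2000 = {k: round(v / total, 2) for k, v in freq_2000.items()}
# {'No college': 0.27, 'Some college': 0.35, 'College grad': 0.38}
```

The relative frequencies in any column always sum to about 1 (up to rounding), which is a quick check on your arithmetic.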

 

To read for today's hmk:

1.1 p3-8, noting definitions especially:

A sample is a subset of the population studied, and we make inferences about the population based on that sample.

Variables are the characteristics of the individuals within the population.

Qualitative, or categorical variables allow for classification of individuals based on some attribute or characteristic. Be careful that numbers can be used to identify categories, but arithmetic operations between the numbers are meaningless. Example: zip codes group areas together, but the difference (subtraction) of two zip codes is meaningless data.

Quantitative variables provide numerical measures of individuals, where arithmetic operations can be performed on the numbers involved and provide meaningful results. Example: people can be grouped according to height in inches, and how much two people's heights differ is meaningful data.

A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number (such as 0, 1, 2, etc.) of possible values. Example: The number of cars that go through a fast-food line is discrete because it results from counting.

A continuous variable is a quantitative variable that has an infinite number of possible values that result from making measurements. Example: The number of miles a car can travel with a full tank of gas is continuous because the distance would have to be measured.

 

To read for today's hmk:

1.5 p38-42 about bias in sampling briefly:

sampling bias uses a technique that favors one part of a population over another,

undercoverage causes a sample to not be fully representative of the whole population,

nonresponse of sample subjects to surveys causes error due to missing data that may or may not be minimized by callbacks and incentives.

response bias can result from respondents not feeling comfortable with interviewers, misrepresenting facts, or lying, or from questions that are leading in the way they are phrased (poorly worded questions). Example: A policeman asks students in a classroom to fill out a survey involving whether they have used drugs and what kinds they have used. Anonymous discussion of the results will follow. Response bias could occur if the students feel uncomfortable giving this information to a policeman, or if students misrepresent facts because they either don't want to face their problems or think it will appear cool to their friends.

 

(To read over the coming weeks at your leisure -- no homework problems on these:

1.2 p16-17, about observational and designed experiments

1.3 p23-26 up to ex. 3, about simple random sampling

1.4 p30-35 about sampling methods)

 

To read for today's hmk:

2.1 p63-66 and example 6 p68-69 about frequency, relative frequency, bar graphs:

A frequency distribution lists each category of data and the number of occurrences for each category of data.

The relative frequency is the proportion (or percent) of observations within a category and is found by dividing the category frequency by the sum of all the frequencies in the table.

A bar graph has categories labeled on the horizontal axis and frequencies on the vertical axis, with bars extending from the horizontal axis up to a height equal to the frequency; bars are usually not touching but are of the same width.

A side-by-side bar graph can be used to compare data sets and should use relative frequencies to ensure that the sets are being measured on the same scale, where bars being compared from the same category usually have no space between them but space is still left between different categories.

Note that different problems can ask for different things! You could be asked to provide:

a frequency bar graph,

a side-by-side frequency bar graph,

a relative frequency bar graph, or

a side-by side relative frequency bar graph.

Know which one is being asked for and what each entails.

 

HOMEWORK due Thursday 2/7:

(last time I will type out the assignment problems)

1.1 p12 short answer:

#26 Is "assessed value of a house" a qualitative or quantitative variable?

#28 Is "student ID number" a qualitative or quantitative variable?

#32 Is "number of sequoia trees in an acre of Yosemite" discrete or continuous?

#34 Is "Internet connection speed in kilobytes per second" discrete or continuous?

#36 Is "air pressure in pounds per sq. inch in a tire" discrete or continuous?

1.5 p43 Consider the type of possible bias for each of the following:

#14 The village of Oak Lawn wishes to conduct a study regarding the income level of all households within the village. The manager selects 10 homes in the southwest corner of the village and sends out an interviewer to the homes to determine income.

#16 Suppose you are conducting a survey regarding the sleeping habits of students. From a list of registered students, you obtain a simple random sample of 150 students. One survey question is "How much sleep do you get?"

#18 An ice cream chain is considering opening a new store in O'Fallon. Before opening, the company would like to know the % of homes there that regularly visit ice cream shops, so the researcher obtains a list of homes and randomly selects 150 to send questionnaires to. Of those mailed out, 4 are returned.

2.1 p72

#20 A survey asked 770 adults who used the Internet how often they participated in online auctions. The responses were as follows:

frequently  54

occasionally  123

rarely  131

never  462

a. construct a relative frequency distribution (just the percentages, not a graph!).

b. what proportion never participate?

c. construct a frequency bar graph.

d. construct a relative frequency bar graph.

2.1 p72

#22 A survey of U.S. adults in 2003 and 2007 asked "Which of the following describes how spam affects your life on the Internet?"

   Feeling         2003   2007

Big problem     373     269

Just annoying   850     761

No problem      239     418

Don't know      15        45

a/b. Construct the relative frequency distributions for 2003 and 2007.

c. Construct a side-by-side relative frequency bar graph (for all of the categories).

d. Compare each year's feelings and make some conjectures about the reasons for similarities and differences.

 

Th 01/31

Supplementary notes, followed by homework:

 

Linear and exponential patterns (not in book):

A linear relationship y=mx+b is built by repeated addition. We add positive numbers for an increasing line (positive slope) or negative numbers for a decreasing line (negative slope).

An exponential relationship y = b(a)^x is built by repeated multiplication. The a is the amount by which we multiply each time ("a" contains the rate of increase or decrease since a = 1 + r or 1 – r, where r is the rate). For decreasing exponential relationships, a < 1 and for increasing exponential relationships, a > 1. For decreasing linear relationships y = mx + b, m is negative and for increasing ones, m is positive. As with lines, b is the y-coordinate of the y-intercept. Intercepts are included in each of the following example tables, but could be solved for if missing.

 

The following are some tables of data to illustrate what sets of linear and exponential data look like and how their equations are written (verify the equations by plugging in points):

 

x      y
0      12
1      9
2      6
3      3

 

is a decreasing linear set of data because you are adding –3 each time, so y = –3x + 12.

 

x      y
0      20
1      27
2      34
3      41

 

is an increasing linear set of data because you are adding +7 each time, so y = 7x + 20.

  

x      y
0      50
1      75
2      112.5
3      168.75

 

is an increasing exponential set of data because you are multiplying by 1.5 each time, so

y = 50(1.5)^x. Written y = 50(1 + 0.5)^x, it shows the rate of increase is 0.5 or 50%.

 

x      y
0      250
1      225
2      202.5
3      182.25

 

is a decreasing exponential set of data because you are multiplying by 0.9 each time, so

y = 250(0.9)^x. Written y = 250(1 – 0.10)^x, it shows the rate of decrease is 0.10 or 10%.
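The four example tables can be regenerated from their repeated-addition / repeated-multiplication rules; a quick Python check:

```python
# Each list rebuilds one example table's y column for x = 0, 1, 2, 3.
linear_dec = [12 + (-3) * x  for x in range(4)]   # add -3 each time
linear_inc = [20 + 7 * x     for x in range(4)]   # add +7 each time
exp_inc    = [50 * 1.5 ** x  for x in range(4)]   # multiply by 1.5 each time
exp_dec    = [250 * 0.9 ** x for x in range(4)]   # multiply by 0.9 each time
# 12, 9, 6, 3 / 20, 27, 34, 41 / 50, 75, 112.5, 168.75 / 250, 225, 202.5, 182.25
# (the exponential columns match up to tiny float rounding)
```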

 

Exponential equation of best fit for a scatterplot of seemingly exponential data:

This exponential part is in the more expensive version of this text, but should not be skipped, so I will supplement your version with notes in class and here.

 

If you have a scatterplot of linear data, you saw in class last time that it was relatively easy and accurate to estimate the line of best fit from a graph and also to find the best fit line using the equations from last time. However, if you have a scatterplot of data that is best described by an exponential curve, it is difficult to draw a good curve and you wouldn't know how to find its equation because it does not have a constant slope (i.e., you could not take two points and use the slope formula or point-slope form!).

 

But if you take the logarithm of each y value in the exponential data (leaving the x values the same), that is, turn (x, y) into (x, logy) in the table, you will have transformed it into linear data! Then use the equations for line of best fit to find y = mx + b for the "logged" data (x, logy) and "unlog" the slope m and y-intercept b to find the "a" and "b" in y = b(a)^x for the original data.

We did an example of this process in class. Here is another example, but with data that is not perfectly exponential (as it was in class), so we cannot write the equation straight from the table values:

 

x      y
1      398.11
3      199.53
4      156.49

 

It is difficult to estimate an exponential scatterplot relationship, but it can be turned into a linear relationship by taking the logarithm of the y values (graph it if you don't believe it – unfortunately, I cannot show the graphing part here!).

  

x      y
1      log(398.11) = 2.60
3      log(199.53) = 2.30
4      log(156.49) = 2.19

 

Now we can find the best fit line for this (x, logy) linear scatterplot by making a summations table and using the calculations (Sxx, Sxy, etc.) for finding the best fit line.

 

x        y        xy        x²
1        2.60     2.60      1
3        2.30     6.90      9
4        2.19     8.76      16
Σx = 8   Σy = 7.09   Σxy = 18.26   Σx² = 26

 

avgx = 8/3 = 2.67 and avgy = 7.09/3 = 2.36

Sxx = 26 – (64/3) = 4.67

Sxy = 18.26 – [(8)(7.09)/3] = –0.65

slope = –0.65/4.67 = –0.14

y-intercept = 2.36 – (–0.14)(2.67) = 2.73

So the best fit line for the logged data is y = –0.14x + 2.73

 

To find the best fit exponential for the original data, "unlog" the slope and y-intercept of the line above: raise 10 to the power of each separately and then write the equation for the exponential of best fit for the original table data (x, y).

a = 10^slope = 10^(–0.14) = 0.72

b = 10^(y-intercept) = 10^2.73 = 537.03

So the best fit exponential for the original data is y = b(a)^x = 537.03(0.72)^x.

 

(Check your answer: does plugging x = 1 into your best fit exponential give you something close to the original table value of 398.11? It shouldn't be exact because the original data was not perfectly exponential, but it should be in the ballpark! The same goes for the other two points.)
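The whole log/fit/unlog procedure fits in one small Python function. This is a sketch of the process above (the name fit_exponential is mine, not the book's):

```python
from math import log10

def fit_exponential(points):
    """Fit y = b * a**x by least squares on the logged pairs (x, log10 y)."""
    logged = [(x, log10(y)) for x, y in points]
    n = len(logged)
    sx = sum(x for x, _ in logged)
    sy = sum(ly for _, ly in logged)
    Sxx = sum(x * x for x, _ in logged) - sx ** 2 / n
    Sxy = sum(x * ly for x, ly in logged) - sx * sy / n
    m = Sxy / Sxx                      # slope of the logged best fit line
    c = sy / n - m * (sx / n)          # intercept of the logged line
    return 10 ** c, 10 ** m            # "unlog" to get (b, a)

b, a = fit_exponential([(1, 398.11), (3, 199.53), (4, 156.49)])
# b and a come out near the hand-computed 537 and 0.72
```

Small differences from the hand answer are just rounding: the hand version rounds the logs to two decimals before summing.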

 

Brief practice solving for variables in linear and exponential problems from Algebra:

 

Given the equation of a line y = 5x + 7,

If x = 4 is given, we can solve for y: y = 5 (4) +7 = 27.

If y = 9 is given, we can solve for x. Since 9 = 5x + 7 subtract 7 from both sides to get 2 = 5x. Then divide both sides by 5 to solve for x: 2/5 = x.

 

Given the equation of an exponential y = 12(5)^x,

If x = 3, then we can solve for y: y = 12(5)^3 = 12(125) = 1500.

If y = 24, then we can solve for x, but it involves logarithms to rescue x from being an exponent: 24 = 12(5)^x. Divide both sides by 12 to get 2 = 5^x. If you take the logarithm of both sides of the equation, you get log 2 = log(5^x). Properties of logs give you log 2 = x log 5. So to solve for x, divide both sides by log 5 to get x = log 2 / log 5, which by calculator is about 0.43.
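That last log step is easy to mirror in Python:

```python
from math import log10

# Solve 24 = 12 * 5**x for x: divide by 12, then take logs of both sides.
x = log10(24 / 12) / log10(5)     # log 2 / log 5, about 0.43
```

Any log base works here (log10 or natural log), since the bases cancel in the ratio.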

 

Homework (due Tuesday 02/05):

Treat the following table data as forming an exponential scatterplot (not perfectly exponential, but best described by an exponential function):

 

x      y
0      25.00
1.5    40.20
2      92.50
3.2    167.30
4      294.35

 

a. Take the original (x, y) values in the table and make a new table (x, log y).

That is, find log 25, log 40.20, etc.

 

b. Find the line of best fit for the values in the (x, logy) table using Sxx and Sxy.

Hint: you should get the following summations to plug in (find and check them for yourself):

Σx = 10.7    Σy = 9.66    Σxy = 23.33    Σx² = 32.49

(For Σy, notice that you are not summing the original y values to get 619.35 — you sum the logged y values to get the Σy summation of 9.66!)

 

c. "Unlog" the slope m and the y-intercept b from the best fit line for the (x, logy) data in part b to get the "a and b" for the best fit exponential y = b(a)^x for the original (x, y) data, using a = 10^slope and b = 10^(y-intercept). Does it describe the data well? Compare with a graph of the original data.

 

d. Use the best fit exponential equation from part c to estimate the value of y when x is 2.5.

 

e. Use the best fit exponential equation from part c to estimate the value of x when y is 300.

 

T 01/29

We worked on linear scatterplots. Some supplementary notes follow since I won't assume you have the book until next week.

 

Reading in the text:

--Read 4.1 objective 1 "Draw and interpret scatter diagrams"

--Read 4.2 example 1 and objective 1 "find the least squares regression line…" and note that I am using the alternate form of the equations in the footnotes.

 

Supplementary notes/examples on lines of best fit:

Given several data points (x, y), you fill out the table below (the data points' coordinates are the x and y values; the symbol Σ means add them up).

 

Example: for the data points (2,7) and (4,8) we know that the slope of the line thru them is

(8 – 7)/(4 – 2) = 1/2 = 0.5, so that using point-slope form for the equation of a line:

y – 7 = 0.5(x – 2)

y = 0.5x + 6

This is the equation of the line thru the above points, a line with slope 0.5 and y-intercept 6.

Now let us use the equations for the line of best fit:

Set up a table using the data points with the following quantities and sum them up:

 

x        y        xy        x²
2        7        14        4
4        8        32        16
Σx = 6   Σy = 15  Σxy = 46  Σx² = 20

 

(n = 2 in this short ex. for the 2 data pts given)

avgx = 6/2 = 3 and

avgy = 15/2 = 7.5

Use all of these numbers to plug into the formulas for the line of best fit:

Sxx = Σx² – ((Σx)²/n) = 20 – (36/2) = 20 – 18 = 2

Sxy = Σxy – ((Σx)(Σy)/n) = 46 – [(6)(15)/2] = 46 – 45 = 1

Slope of best line = Sxy/Sxx = 1/2 = 0.5

Y-intercept of best line = avgy – (slope)(avgx) = 7.5 – (0.5)(3) = 7.5 – 1.5 = 6

The best fit line is then y = 0.5x + 6 (which matches the equation found at the beginning exactly, because 2 points make a line, not a scatterplot estimation!)

 

Example: for the data pts (1,9), (2,8), (3,6), and (4,3) find the equation of the line of best fit:

 

x         y        xy        x²
1         9        9         1
2         8        16        4
3         6        18        9
4         3        12        16
Σx = 10   Σy = 26  Σxy = 55  Σx² = 30

 

Note that n=4 is the number of data points

avgx = 10/4 = 2.5

avgy = 26/4 = 6.5

Using the formulas above,

Sxx = 30 – (100/4) = 30 – 25 = 5

Sxy = 55 – [(10)(26)/4] = 55 – 65 = –10

Slope = –10/5 = –2

Y-intercept = 6.5 – (–2)(2.5) = 6.5 + 5 = 11.5

The best fit line is then y = –2x + 11.5
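Both worked examples follow the same recipe, so it can be captured once in Python (a sketch; best_fit_line is my name for it, not the book's):

```python
def best_fit_line(points):
    """Slope and y-intercept of the least squares line via Sxx and Sxy."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    Sxx = sum(x * x for x, _ in points) - sx ** 2 / n
    Sxy = sum(x * y for x, y in points) - sx * sy / n
    m = Sxy / Sxx
    b = sy / n - m * (sx / n)          # y-intercept = avgy - slope * avgx
    return m, b

m, b = best_fit_line([(1, 9), (2, 8), (3, 6), (4, 3)])
# reproduces the slope -2 and intercept 11.5 found above
```

The same function also works for the "logged" data in the exponential method, since that is just a best fit line on (x, logy).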

 

Homework (due Thursday 01/31):

Treat the following table data as forming a linear scatterplot and do as in the in-class exercise:

 

x      y
30     5
40     60
55     140

 

a. Sketch the points on a hand-drawn graph (just on binder paper), draw what you think is the line of best fit. Estimate the y-values on your line for x = 35 and x = 50 and use these points to find the slope of your line. Write the equation of your line using point-slope or slope-intercept form.

 

b. Fill out a summation table and find the line of best fit using the equations Sxx, Sxy, etc.