ISED 160 NOTES ABOUT HOMEWORK, ANNOUNCEMENTS, ETC.
BE SURE TO CLICK ON RELOAD/REFRESH ON YOUR COMPUTER OR THE CURRENT ADDITIONS TO THE PAGE MAY NOT APPEAR! You may also not see current pages if your computer does not have an up-to-date browser… download a new version or use a library/lab computer. Scroll down as new assignments are added to the old. New assignments are generally posted by 2:00 pm of the lecture day unless otherwise noted.
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Most current work is listed first,
followed by previous entries: |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 12/06 |
Test #5 will occur as scheduled on Thursday
12/08 and will consist of one page of short answer questions (from 10.1,
10.2, 10.3, 11.1 about use of the tables in accepting and rejecting
hypotheses with classical and p-value methods, and forming of hypotheses and
sentence writing), and one page with 2 word problems (z test, t test) to
perform a complete significance test. See the notes from previous classes below for
examples. Note: Test 5 will occur on 12/08 as scheduled and will
not be a dropped grade. Students who take Test 5 as scheduled will be done
with the course (no comprehensive final). If you miss test #5, you must email me before
noon Friday 12/09 and respond to my follow-up emails that day so that we will
both know what to expect the following week (finals week). You will then
perform a more difficult comprehensive test during finals week to take the
place of test 5 only. If you do not contact me by noon Friday 12/09 or fail
to show up for your new test appointment during finals week, you will receive
a zero score for test 5. This final is only offered to give those missing
Test 5 a chance to avoid a zero score for that test. It will not be offered
as a device to raise grades from the rest of the semester. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 12/01 |
Next week is
our last week, so here are some examples (old and new material) to guide you
in this all-important stretch run. I'm hoping for lots of great test 5 grades!
ANSWERS TO TWO OF THE LAST HMK PROBLEMS (decisions done using both the classical and p-value methods):
10.3 p487 #10: The standardized sample t is 1.11. From row 12 of the t-table, with all of alpha (0.10) in the right tail, look up in column 0.10 and find the critical value 1.356; the sample t of 1.11 is closer to center than that, so accept the null hypothesis. For the p-value method, look in row 12 for the values closest to the sample t of 1.11 and see 1.083 < 1.11 < 1.356, so the p value is between 0.10 and 0.15 (from the top row of the table). Then p is greater than the given alpha of 0.10, so accept the null hypothesis.
10.3 p487 #11: The
standardized sample t is -3.11. From row 34 of the t-table, with half of
alpha (0.005) in each tail, look up in column 0.005 and find the critical
value -2.728. Reject the null hypothesis. Look in row 34
for the values closest to the absolute value of the sample t, 3.11, and see 3.002 < 3.11 < 3.348, so the p value is between 2(0.001) and 2(0.0025) (from the top row of the table and using the fact that the hypothesis is two-sided). So 0.002
< p < 0.005, p is less than the given alpha of 0.01 so reject the null
hypothesis. EXAMPLES of t table use and t word
problems: 1. Find an estimate for the p value for a
one-sided t test with 0.01 level of significance, n= 20 and sample t= 2.34,
and would you choose to reject or accept the null hypothesis? Answer: Using row 19 of the t table, the estimate for the p value (method 2) is 0.01<p<0.02 since 2.205< 2.34<2.539. Since p is greater than the given alpha of 0.01, accept the null hypothesis. 2. Find an estimate for the p value for a two-sided t test
with 0.05 level of significance, n=36 and sample t= 1.98, and would you
choose to reject or accept the null hypothesis? Answer: Using row 35 of the t table the estimate for the p value for
method 2 is 0.05<p<0.10 since 1.690< 1.98<2.030 (areas doubled
for two-sided test). Accept the null hypothesis. 3. Find an estimate for the p value for a
one-sided t test with n=40 and sample t=3.6? Without a given level of
significance, what can you say about rejecting or accepting the null
hypothesis just based on your estimate of the p value? Answer: Using row 39 of the t table the estimate for the p value for
method 2 is p< 0.0005 since the sample t of 3.6>3.558. Reject the null hypothesis, since this p value is smaller than any reasonable alpha. 4. Perform a
complete t test: To find out if it seems reasonable that the local town
library is lending an average of 4.2 books per patron, a random sample of 13
people was taken and yielded an average of 4.75 with std. deviation of 1.65
books. Test at the 0.10 level. Answer: The alternate hypothesis is that m is not equal to 4.2,
alpha is 0.10, and we compute sample t=1.20. Classical
method: Using row n-1=12 of the t table with tail areas of 0.05 (half of alpha in each tail), the critical values are + and - 1.782, and the sample t of 1.20 is closer to center than that. p-value method: the estimate for the p value is 0.20<p<0.30 (tail areas doubled) since 1.083<sample t of 1.20<1.356, and p is greater than the alpha of 0.10. By either method, accept the null
hypothesis. We have not found any evidence that the library is lending a
different avg. number of books than 4.2 per person. 5. Perform a complete t test: A kind of alfalfa is advertised as
having an average yield of 2.0 tons per acre. The contention by the Farmer's
Association is that the true average yield for this kind of alfalfa is less
than 2.0 tons per acre. The yield of alfalfa from six test plots gives an
average of 1.6 tons per acre with std. deviation of 0.43. Test at the 0.05
level of significance. Answer: The alternate hypothesis is that m<2, alpha is 0.05, and we
compute sample t= -2.28. Classical
method: Using row 5 of the t table with all of alpha (0.05) in the left tail, the critical value is -2.015, and the sample t of -2.28 is farther out than that. p-value method: the estimate for the p value is 0.025<p<0.05 since 2.015<sample t of 2.28<2.571, and p is less than the alpha of 0.05. By either method, reject the null
hypothesis. We have found evidence that this kind of alfalfa actually yields
less than 2 tons per acre. READING for Tuesday's lecture: We will take a look at
section 11.1 p509-513. The matched pairs problems allow you to compare two
sets of data, but are still t tests with one calculation as in 10.3. The only
different part is how the hypotheses are stated. The data from one group are
subtracted from the data in the matched group (before and after studies are a
good matched pairs model). The null hypothesis states that there is no
difference (average difference = 0) between the two groups: Ho: m = 0. The
alternate hypothesis shows a difference of some kind, either m < 0, m >
0, or m is not equal to 0 depending on the phrasing of the word problem and
the order in which the differences are found. For example, suppose we
wish to find out if a new fertilizer makes tomatoes give larger yields, and
we give plot A the fertilizer but do not give plot B. We assume that there
will be no difference between the two, so Ho: m = 0. If we want to show that
plot A tomato yields are larger, H1 depends on how the differences are
computed. If we subtract the yields in order of plot A – plot B, then H1: m > 0 because larger
numbers minus smaller numbers will give a positive number (> 0). But if we subtract the
yields in order of plot B – plot A, then H1: m < 0 because
smaller numbers minus larger numbers will give a negative number (< 0). Matched pairs word problem example: An agricultural field trial compares the yield of two varieties
of tomatoes for commercial use. The researchers divide in half each of 11
small plots of land in different locations (half gets variety A and half gets
variety B) and compare the yields in pounds per plant at each location. The
11 differences (variety A minus variety B) give an average of 0.54 and std.
deviation of 0.83. Is there evidence at the 0.05 level of significance that variety
A has a higher yield than variety B? (Assume differences computed by A yield
minus B yield). Answer: The null hypothesis is that m = 0 and the alternate hypothesis is that m > 0. We compute sample
t=2.16. Using row n-1=10 of the t table with right tail area of 0.05 the
critical value is 1.812 and the estimate for the p value is 0.025 < p <
0.05 since 1.812 < 2.16 < 2.228. Either way, we reject the null
hypothesis. We have found evidence that variety A has a higher yield than
variety B. HOMEWORK (due Tuesday 12/06) 1. short answer use of t
table: Find an estimate for the p value for a
two-sided t test with 0.05 level of significance, n= 13 and sample t= 1.85,
and would you choose to reject or accept the null hypothesis? 2. short answer use of t
table: Find an estimate for the p value for a
one-sided t test with n=26 and sample t=0.67. Without a given level of
significance, what can you say about rejecting or accepting the null
hypothesis just based on your estimate of the p value? 3. full significance test:
10.3 p488 #14, 4. full significance test:
10.3 p488 #16, 5. full significance test:
10.3 p489 #24b (use population mean 7, sample mean 7.01, sample std.
deviation 0.0316) Test #5 will occur as scheduled on Thursday
12/08 and will consist of one page of short answer questions (from 10.1,
10.2, 10.3, 11.1 about use of the tables in accepting and rejecting
hypotheses with classical and p-value methods, and forming of hypotheses and
sentence writing), and one page with 2 word problems (z test, t test) for
which you must perform a complete significance test. A few examples of short
answer questions are below. Some short-answer
questions for practice: Example: Find the critical value in a one-sided z test with n = 45, sample z = -2.59, and alpha = 0.01. Answer: 2.326 in magnitude (here -2.326, since the negative sample z points to a left-tailed test). Example: a. What is the critical value for a two-sided t-test with n=33, sample t = 3.02, and alpha of 0.01, and b. do you reject or accept the null hypothesis? Answer: a. With row 32 and upper tail of 0.005
since half of alpha goes in each tail, the critical values are + and –
2.738, so b. we would reject the null hypothesis. Example: Estimate the p value for a one-sided
z-test, n = 23, sample z= 1.15 and alpha = 0.10. Answer: Since this is a z value problem, you
go down to the bottom row of the table and see that 1.15 is between 1.036 and
1.282 so the p value is between 0.10 and 0.15. Example: In the previous example, would you
accept or reject the null hypothesis? Answer: p > alpha, so accept. Example: Estimate for the p value for a
two-sided t-test with n=31 and sample t= 3.75. Answer: In row 30, our t is off the table to
the right, so we know that the p value is smaller than twice the 0.0005 that
is above the last table entry, i.e., p < 0.001. Example: In the previous example, would you
accept or reject the null hypothesis? Answer: the p value is rare, so we would
reject the null hypothesis no matter what alpha given. Example: Estimate the p value for a two-sided
t-test, n=24, sample t=-2.48, alpha is 0.02. Answer: In row 23, it puts us between 0.01 and
0.02 which we must double because it is two-sided, so the p value is between
0.02 and 0.04. Accept the null hypothesis since p > alpha. Example: Write the hypotheses and sentence of
conclusion only for the following situation: The average score on the SAT Math exam is 505. A test preparatory
company claims that the mean score of students who take their course is
higher than 505. Suppose we reject the null hypothesis. Answer: Ho
: M=505 Hi : M>505. The company
has evidence that students who take their course will on average have a higher
score than the 505 of all students who take the SAT Math exam. |
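If you would like to check the matched-pairs example above by computer, here is a minimal Python sketch (it assumes SciPy is installed; it is optional, and on Test 5 you will still use the t table):

```python
# Check of the tomato matched-pairs t test (variety A minus variety B).
from math import sqrt
from scipy.stats import t

n, dbar, s = 11, 0.54, 0.83           # number of differences, their mean and std. deviation
df = n - 1
t_stat = dbar / (s / sqrt(n))         # standardized sample t, about 2.16

crit = t.ppf(1 - 0.05, df)            # critical value for a right tail of 0.05, about 1.812
p_val = t.sf(t_stat, df)              # exact right-tail p value; the table only brackets it

print(round(t_stat, 2), round(crit, 3), round(p_val, 3))
# Reject Ho because t_stat > crit, or equivalently because p_val < 0.05.
```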
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 11/29 |
ANSWERS TO SOME OF LAST HMK PROBLEMS: 4. supplemental problem: A researcher
believes that the average height of a woman aged 20 years or older is greater
now than the 1994 mean of 63.7 inches. She obtains a sample of 45 women and
finds the sample mean to be 63.9 inches. Assume a population std. deviation
of 3.5 inches and test at the 0.05 level. ANSWER: Hypotheses: Ho: m = 63.7, Hi: m > 63.7. Level of Significance: alpha = 0.05. Data and calculations: z=(63.9-63.7)/(3.5/sqroot45) = 0.38. Decision (to accept or reject the null hypothesis): Critical value method: all of alpha (0.05) into the right tail gives a critical z value of +1.645, and the sample z is closer to center than that, so accept Ho. p-value method: on the z table, 0.38 gives a 0.3520 p value, which is more than the alpha of 0.05, so accept Ho. Sentence of Conclusion: The researcher has found no evidence that women aged 20 years or older are now taller on average than 63.7 inches. 5. supplemental problem: The average daily
volume of Dell computer stock in 2000 was 31.8 million shares. A trader wants
to know if the volume has changed and takes a random sample of 35 trading
days and the mean is found to be 23.5 million shares. Using a population std.
deviation of 14.8 million, test at the 0.01 level of significance. ANSWER: Hypotheses Ho: m =31.8 Hi: m not equal to 31.8 Level of Significance alpha = 0.01 Data and calculations z=(23.5-31.8)/(14.8/sqroot35)=-3.32 Decision (to accept or reject the null hypothesis) Critical value method: half of alpha
(0.005) into each tail gives critical z values of +/-2.576 and the sample z
is farther out than that, so reject Ho. p-value method: on the z table, 3.32
gives 0.0005 in each tail for a total p value of 0.001 which is less than the
alpha of 0.01, so reject Ho. Sentence of Conclusion The trader has found evidence that the average daily volume
of Dell stock has changed from the 2000 value of 31.8 million shares. LECTURE -- section 10.3: Today, we moved on to a refinement of
the testing strategy. Instead of having
the population std. deviation given "from previous studies", we can rely completely on our sample and use s (the std. deviation of the sample). However,
by relying completely on our sample, we have more chance of error, so the
normal distribution will have a correction factor depending on the sample
size. This means we will have to use a new table with more rows to take care
of various sample sizes. There is a copy of this table in the back of your
book. Note that the top row and
the bottom row have the numbers you were using in the abbreviated table for
looking up critical values for z tests. To use the new part of the
table, you take one less than the sample size, df=n-1 and go down to that row
instead of down to the bottom where the z values lie. Use the table
symmetrically so that it works for negative t values and gives areas in the left
tail of the distribution also. EXAMPLES USING THE t-TABLE TO FIND
CRITICAL VALUES AND P-VALUE ESTIMATES: 1. What is the
critical value for a one-sided test with n=20 and alpha =0.05? ANSWER: df=20-1=19 and that row put together with the column of 0.05
gives a critical value of 1.729 2. What would
the critical value be for the above situation if it were two-sided instead of
one-sided? ANSWER: In the same row df=19, you would look at the column with area
0.025, since half of the alpha of 0.05 goes into each tail, and this would
give you a critical value of 2.093. 3. Find an estimate for the p value in a one-sided test with n=33
and sample t=0.52. ANSWER: df=33-1=32, so we look in that row on the
new t table to find the next higher and lower numbers with respect to 0.52.
But since 0.52 < 0.682 the p value then is greater than the area of 0.25
for the t value of 0.682. That is, p > 0.25. 4. Find an
estimate for the p value for a two-sided test with n=25 and sample t value of
1.52. ANSWER: df=25-1=24 and in that row, 1.318 <
1.52 < 1.711 so the one-tail area for the p value is between 0.05 and 0.10, but we have a two-sided test so we double the areas to get the
sum of the left and right tail areas: 0.10 < p < 0.20. 5. Problem 5
from your last homework would be the same in section 10.3 except for the
values in the decision: Decision (to accept or reject the null
hypothesis) Critical value method: half of alpha (0.005)
into each tail and looking in row 34 (one less than the sample size of 35)
gives critical t values of +/-2.728 and the sample t is farther out than
that, so reject Ho. p-value method: on the t table in row
34 again, we look for the closest values to 3.32: 3.002<3.32<3.348 and
find a p value estimate of 2(0.001) < p < 2(0.0025), or
0.002<p<0.005 which is less than alpha of 0.01. We reject the null
hypothesis. 6. in the text
10.3 p487 #7: The standardized sample t is 2.502. From row 22 of the t-table,
with half of alpha (0.005) in each tail, look up in column 0.005 and find the
critical value 2.819. Accept the null hypothesis. Look in row 22 for the
closest values to the sample t of 2.502 and see 2.183 <
2.502 < 2.508 so the p value is between 2(0.01) and 2(0.02) (from the top
row of the table and using the fact that the hypothesis is two-sided). So
0.02 < p < 0.04 p is greater than the given alpha of 0.01 so accept the
null hypothesis. 7. in the text
10.3 p488 #9: The standardized sample t is -1.677. From row 17 of the
t-table, with all of alpha (0.05) in the left tail (the test is one-sided to the left), look up in column 0.05 and find the critical value -1.740. Since the sample t of -1.677 is closer to center than that, accept the null hypothesis. Look in row
17 for the closest values to the sample t of 1.677 and see 1.333 <
1.677 < 1.740 so the p value is between 0.05 and 0.10 (from the top row of
the table). Then p is greater than the given alpha of 0.05 so accept the null
hypothesis. HOMEWORK DUE Thursday 12/01 10.3 p487 #6,
8, 10, 11, 12 |
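If you want to double-check your t-table estimates on this homework, here is a minimal Python sketch (it assumes SciPy is installed; on the test you will still use the table):

```python
# Check of example 4 above: n = 25, two-sided test, sample t = 1.52.
from scipy.stats import t

df = 24
sample_t = 1.52

# If alpha were 0.05 for this two-sided test, 0.025 would go in each tail:
crit = t.ppf(1 - 0.025, df)                 # about 2.064
p_two_sided = 2 * t.sf(abs(sample_t), df)   # exact p value, about 0.14

print(round(crit, 3), round(p_two_sided, 3))
# The exact p of about 0.14 agrees with the table estimate 0.10 < p < 0.20.
```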
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 11/17 |
PREVIOUS MATERIAL: At the end of lecture,
I presented the results of the lottery trial from after Test #3. The winning
numbers for 10/26 were 15, 20, 25, 40, 41 and Mega 20. Here are the results
for all 232 practice plays done by students in three classes (categories list
the number of regular numbers gotten and M stands for having gotten the Mega
number). It matched expected probabilities well (that we learned to calculate
before Test 3 and that appear on the back of the playslip). Some of the plays could not be considered "random" because they used the same numbers as other
plays but with one difference each time, so that may have thrown the numbers
off a little. One student got 4 of the regular numbers and the Mega, but the
estimated actual payoff of $1500 would not be much out of a $15 million pot!
LECTURE: We worked on the whole
significance test in class today. This is the material from section 10.2 in
your book. Look at the examples in the section and the following additional
examples: (Modified)In-class example (changed
std. dev to 1.5 to give more appealing values!): Researchers wanted to
measure the effect of mothers' alcohol use on the development of the hippocampal
region of the brain in adolescents, and find out whether the volume of this
portion of their brains would be less than the normal volume of 9.02 cubic
cm. They sampled 32 such adolescents and found an average volume of 8.10
cubic cm. Assuming a population std. deviation of 1.5, what could they
conclude at the 0.01 level of significance? Hypotheses: Ho: m = 9.02 Hi: m < 9.02 Level of Significance: alpha = 0.01 Data and calculations: z = (8.10 – 9.02)/(1.5/sqroot32) = –3.47 Decision: Classical: For 0.005 of
alpha going in each tail, we find a critical z value of 2.576 and the sample
z is farther out from the theoretical population mean. Reject Ho. P-value: Look up sample z
of 0.0003 on the z table p =
2(0.0003) = 0.0006 whereas the alpha was 0.10. Reject Ho. Conclusion: The researchers have
strong evidence that the hippocampus of these adolescents is
significantly smaller than 9.02 cubic cm. Example: Grant is in the market to
buy a three-year-old Corvette. Before shopping for the car, he wants to
determine what he should expect to pay. According to the blue book, the
average price of such a car is $37,500. Grant thinks it is different from
this price in his neighborhood, so he visits 15 neighborhood dealers online
and finds an average price of $38,246.90. Assuming a population std.
deviation of $4100, test his claim at the 0.10 level of significance. Hypotheses: Ho: population mean =
37500 Hi: population mean not equal to 37500 Level of Significance: alpha = 0.10 Data and calculations: z = (38246.90 - 37500)/(4100/sqroot15) = 0.71 Decision: Classical: For 0.05 of
alpha going in each tail, we find a critical z value of 1.645 and the sample
z is not as far out from the theoretical population mean. P-value: Look up sample z
of 0.71 on the z table p =
2(0.2389) = 0.4778 whereas the alpha was 0.10. By either method, we find that
we do not have evidence to reject the null hypothesis, so we accept it. Conclusion: Grant does not have any
evidence that the mean price of a 3 yr. old Corvette is different
from $37,500 in his neighborhood. Example: According to the U.S.
Federal Highway Administration, the mean number of miles driven annually in
1990 was 10,300. Bob believes that people are driving more today than in 1990
and obtains a simple random sample of 20 people and asks them the number of
miles they drove last year. Their responses give an average of 12,342 miles.
Assuming a std. deviation of 3500 miles, test Bob's claim at the 0.01 level
of significance. Hypotheses: Ho: population mean = 10300 Hi: population mean > 10300 Level of Significance: alpha = 0.01 Data and calculations: Z=(12342-10300)/(3500/sqroot20)=2.61 Decision: The alternate hypothesis is one-sided on the right, the sample
z works out to be 2.61, and we reject the null hypothesis. (Classical: alpha of 0.01
gives a critical z value of 2.326 and 2.61 is farther out than that. P-value:
on the z
table, we look up 2.61 and find p = 0.0045 which is less than the alpha of 0.01.
By either of these methods, we reject the null hypothesis). Conclusion: Bob has found
significant evidence that people are driving more today than in 1990, when
they drove an average of 10,300 miles. Homework due Tuesday
11/29: For each of the
following word problems, perform a complete significance test with: --Hypotheses
(null and alternate), --Level of
significance (given in problem), --Sample z
calculation, --Decision to
accept or reject the null hypothesis (show using both methods) --Sentence of
conclusion relating back to the original problem. 1. section 10.2
p477 #22 b, c, d 2. section 10.2
p478 #26 b using 6.3 for the sample mean (from avg of table values) 3. section 10.2
p478 #28 b 4. supplemental
problem: A researcher believes that the average height of a woman aged 20
years or older is greater now than the 1994 mean of 63.7 inches. She obtains
a sample of 45 women and finds the sample mean to be 63.9 inches. Assume a
population std. deviation of 3.5 inches and test at the 0.05 level. 5. supplemental
problem: The average daily volume of Dell computer stock in 2000 was 31.8
million shares. A trader wants to know if the volume has changed and takes a
random sample of 35 trading days and the mean is found to be 23.5 million
shares. Using a population std. deviation of 14.8 million, test at the 0.01
level of significance. HAPPY
THANKSGIVING: ENJOY YOUR BREAK… YOU DESERVE IT! |
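(Optional, for those who like to check homework arithmetic by computer: a minimal Python sketch of the z-test steps, using Bob's driving example from the lecture above. It assumes SciPy is installed and is in no way required for the course.)

```python
# One-sample z test: Ho: m = 10300, Hi: m > 10300, alpha = 0.01.
from math import sqrt
from scipy.stats import norm

mu0, xbar, sigma, n, alpha = 10300, 12342, 3500, 20, 0.01

z = (xbar - mu0) / (sigma / sqrt(n))   # standardized sample mean, about 2.61
crit = norm.ppf(1 - alpha)             # right-tail critical value, about 2.326
p_val = norm.sf(z)                     # right-tail p value, about 0.0045

print(round(z, 2), round(crit, 3), round(p_val, 4))
# Reject Ho either because z > crit (classical) or because p_val < alpha (p-value method).
```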
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 11/15 |
READING IN THE TEXT (same as last
time): Ch 10.1
p455-458 (skip type 1 and 2 errors) 10.2 p463-top of
474 (skip objectives 4 and 5) LECTURE: We have talked about what the level of significance is and how it is used, how to find the z value calculation using the sample data, and how to make a decision to accept or reject Ho. Today, we talked about how to state the hypotheses from word problems and how to write the sentence of conclusion as we went over the hmk.
HYPOTHESES: The null hypothesis, symbolized by Ho is what is accepted as true for the population mean until evidence to the contrary is found. The alternate hypothesis, symbolized by H1 is what the investigator or researcher is trying to show (always relating to the number for M used in the null hypothesis). You must form alternate hypotheses based on the intent of the investigator in the problem, not from your own feelings about the situation.
CONCLUSION: You must state what you have found from the sample's evidence, or lack thereof. Write a grammatically complete sentence with the following elements: Tell 1. if you have "found evidence" or "not found evidence" against the null hypothesis, 2. about what (what was the subject of the investigation?), 3. with respect to what number (what was the number in question in the hypotheses?).
If you are rejecting the null hypothesis, you have found evidence against the null hypothesis and therefore evidence for your claim in the alternate hypothesis. If you are accepting the null hypothesis, you have not found evidence against it and therefore have not found evidence to back up your claim in the alternate hypothesis. Example: An energy official claims
that the output of oil per well in the US has increased from the 1998 level
of 11.1 barrels per day. Suppose that after she takes a random sample and calculates
the sample z value she decides to reject the null hypothesis Ho. Write the
hypotheses and the sentence of conclusion. Answer: H0 : M =11.1 H1 : M >11.1 The energy official has found evidence that the output of oil per well in the U. S. has increased significantly from the 1998 level of 11.1 barrels per day. Example: A Muni bus drives a
prescribed route and the supervisor wants to know whether the average run
arrival time for buses on this route is about every 28 minutes. Suppose that
after we calculate the sample z value the data causes the supervisor to
accept the null hypothesis. Write the hypotheses and the sentence of
conclusion. Answer: H0 : M = 28 H1 : M is not equal to 28 The supervisor has found no evidence that the average run arrival time for buses on this route is significantly different from 28 minutes. Example: A manufacturer produces a
paint which takes 20 minutes to dry. He wants to make changes in the composition
to get nicer colors, but not if it increases the drying time needed. Suppose
that after he calculates the sample z value the data causes him to reject the
null hypothesis. Write the hypotheses and the sentence of conclusion. Answer: H0 : M = 20 H1 : M > 20 The manufacturer has found evidence that the composition change significantly increases the drying time, so he will not make a change. (Notice that he is using the test to pull him away from a bad decision). HOMEWORK (due Thursday 11/17): 10.1 p461/462 (the first three groups are paired problems involving the same situation): #16 (state hypotheses) and 24 (write sentence of conclusion), #18 (state hypotheses) and 26 (write sentence of conclusion), #20 (state hypotheses) and
28 (write sentence of conclusion), 10.2 p476 (try some more like last hmk- the answers to one
method are in the back of the book!) #15 (do both methods:
classical and p-value) #17 (do both methods:
classical and p-value) |
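If it helps to see the decision-to-sentence rule written out explicitly, here is a tiny Python sketch (the helper function and the sample claims are hypothetical illustrations, not part of the homework):

```python
# Turning the decision (p value vs. alpha) into the sentence of conclusion.
def conclusion(p_value, alpha, claim):
    # Reject Ho when p < alpha: we have found evidence for the alternate hypothesis.
    if p_value < alpha:
        return "We have found evidence that " + claim + "."
    # Otherwise we have not found evidence against Ho.
    return "We have not found evidence that " + claim + "."

print(conclusion(0.004, 0.05, "oil output per well has increased from 11.1 barrels per day"))
print(conclusion(0.24, 0.05, "the average arrival time differs from 28 minutes"))
```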
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 11/10 |
Sorry to burden
you with such long notes, but ch.10 can be a difficult read so I want to
point out the main issues and give you more examples than the book provided! READING IN THE TEXT: Ch 10.1
p455-458 (skip type 1 and 2 errors) 10.2 p463-top
of 474 (skip objectives 4 and 5) this section is the difficult reading! SUMMARY OF CH. 10 We will now
move into performing "significance tests". The burden is being shifted from
the middle of the distribution (which was interesting in confidence
intervals) to the tails of the distribution (which are interesting in
significance tests). The middle area, or percentage, was the confidence
level. The area in the tails is now referred to as the significance
level, denoted by alpha. It is called the "significance level" because we will not consider a distance from center to be significant until it goes past the
critical z values that mark the spots that are defined by the confidence and
significance levels. In a significance
test, some body or authority has made a claim that the population mean has a
certain value. We (the researchers) want to put that claim to the test by
taking our own sample average and seeing if it comes reasonably close to the
proposed population mean (and thus makes us believe the original claim), or
if it is sufficiently different from the population mean to cast doubt on the
original value claimed. The stages of the full test are: HYPOTHESES: The null
hypothesis Ho
is what is accepted as true for the population mean until evidence
to the contrary is found. The alternate
hypothesis H1
is what the investigator is trying to show (always relating to the number for M used in
the null hypothesis). LEVEL OF
SIGNIFICANCE: This is alpha, which is given in the statement of the word problem, and
it marks the region where one stops believing the null hypothesis and starts
accepting the alternate hypothesis instead. It can also be thought of as the
total area corresponding to rejection of Ho. DATA AND
CALCULATIONS: A sample is
taken to test the null hypothesis and must be standardized with z calculation
for the sampling distribution (see p464 for an example). DECISION: Find if the sample
is too rare (as defined by alpha) to believe it is what it was
claimed to be. There are two methods: the p-value approach (p470) and the
classical approach (p466). CONCLUSION: You must state
what you have found from the sample's evidence, or lack thereof. Write a
grammatically complete sentence with the following elements: Tell 1. if you have
"found evidence" or "not found evidence" against the null hypothesis, 2. about what
(what was the subject of the investigation?), 3. with respect to what number (what was
the number in question in the hypotheses?). We will be
doing whole significance tests soon, but first, we must take a more careful
look at each part. We start with how to make a DECISION to reject or accept
the null hypothesis. LECTURE NOTES: I gave a
handout with the following abbreviated table of critical values, so you do
not have to look them up on the table backwards each time you want to do a
problem. The top row represents the area in either the left or right tail of
the distribution, and the bottom row represents the positive or negative
critical value. Refer to it as you look at the example problems below:
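The handout table is not reproduced on this page, but the same abbreviated set of critical z values can be regenerated with a short Python sketch (assuming SciPy is installed); the values it prints are the ones used in the examples and homework below.

```python
# Abbreviated table of critical z values: tail area on top, z* on the bottom.
from scipy.stats import norm

tail_areas = [0.10, 0.05, 0.025, 0.02, 0.01, 0.005]
for area in tail_areas:
    z_star = norm.ppf(1 - area)      # positive critical value; use -z_star for a left tail
    print(area, round(z_star, 3))
# Prints: 0.10 -> 1.282, 0.05 -> 1.645, 0.025 -> 1.96,
#         0.02 -> 2.054, 0.01 -> 2.326, 0.005 -> 2.576
```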
Two-sided significance test example: If the null
hypothesis is that the mean of a population is 35 and the alternate hypothesis
is that the mean of the population is not 35 (within a certain amount of
acceptable error) and a level of significance is given as 0.05 and you take a
sample and standardize it to get z=2.25, does it give enough evidence to
reject the null hypothesis and therefore accept the alternate hypothesis? P-value approach Compare the alpha area with the p-value
area. The p area is the probability that you would get a value as far away or
farther away from the center as the sample value you got. For a sample of
2.25, the area outside of that would be 0.0122 using your old z table. So for
a two sided test such as above, the two tails would have .0122 in them based
on the sample z. Then the p value is 0.0122+0.0122=0.0244. If p<alpha, you
reject the null hypothesis and if p>alpha, then you accept it. Here,
0.0244<0.05 so we reject the null hypothesis. Classical approach Another way of determination is to
compare the sample z to the critical value of z. The critical values come from
the alpha value. That is, alpha is divided by 2 to get 0.025 (this is
how much goes in each tail of the distribution) and on the short table above,
you see a critical z* of 1.96. If you get a sample z that is further away
from the mean, you have evidence to reject the null hypothesis, as in
this case, since the sample z of 2.25 is farther away from the mean than
1.96. One-sided significance test example: If the null hypothesis is that the
mean of a population is 45 and the alternate
hypothesis is one-sided with the mean greater than 45 (you don't care if it is less) and alpha is 0.02, sample z=1.96. P-value approach The p value comes from only one tail, and for z=1.96
it is 0.0250 using the z table. Since p is > alpha, we choose to accept
the null hypothesis since the sample is not as rare in probability of
occurrence as the alpha. Classical approach If you have a one-sided alternate
hypothesis (that is, where the mean is > or < a number instead of "not equal to"), you don't take half of alpha as we did in the two-sided problems,
you put it all into the tail of interest. In
the situation above, we only care about values that stray too far above what
is claimed to be the center. Put all of alpha into the right hand tail. The z
value with 0.02 area to its right is about +2.054 from the table above. Since
1.96 is closer to center, we got a routine sample (one that would happen 98%
of the time) so there is nothing strange about the center being where it is
claimed to be. We accept the null hypothesis. More example problems 1. H0:
m=11 and H1: m not equal to 11 and alpha=0.01 sample z= 2.67 Answer: p value: The area to the right of 2.67 is 0.0038
so the p value is the sum of the left and right tails, 0.0038+0.0038=0.0076<
0.01 (p<alpha) , so reject the null hypothesis. classical: Half of alpha, 0.005 goes into each
tail since the alternate hypothesis is two-sided. The critical values for
0.005 are + or - 2.576 and 2.67 is farther away from center than this, so reject
the null hypothesis. 2. H0:
m=265 and H1: m<265 and alpha=0.01
sample z= -2.25 Answer: p value: The area to the left of -2.25 is the p
value 0.0122> 0.01 (p>alpha) so accept the null hypothesis. classical: All of alpha, 0.01 goes into the left
tail of the distribution since the alternate hypothesis only pertains to
values <265. The critical value for 0.01 is -2.326 and -2.25 is closer to
center than this, so accept the null hypothesis. 3. H0:
m=35 and H1: m>35 and alpha=0.05
sample z= 2.23 Answer: p value: The area to the right of 2.23 is
0.0129 so the p value is 0.0129< 0.05 (p<alpha) so reject the null
hypothesis. classical: All of alpha 0.05 goes into the right
tail of the distribution since the alternate hypothesis only pertains to
values >35. The critical value for 0.05 is 1.645 and 2.23 is farther from
center than this, so reject the null hypothesis. 4. H0:
m=1.23 and H1: m not equal to 1.23 and alpha=0.02
sample z= -2.45 Answer: p value: The area to the left of -2.45 is
0.0071 and to the right of +2.45 is the same, so the p value is
0.0071+0.0071=0.0142< 0.02 (p<alpha) so reject the null hypothesis. classical: Half of alpha, 0.01, is put into each
of the left and right tails of the distribution since the alternate
hypothesis pertains to values not equal to 1.23, that is, both greater than
and less than 1.23. The critical values are + or - 2.326 and -2.45 is farther
away from center than this, so reject the null hypothesis. 5. H0
m=0.045 and H1: m>0.045 and alpha=0.005
sample z= 2.06 Answer: p value: The area to the right of 2.06 is
0.0197 so the p value is 0.0197> 0.005 (p>alpha) so accept the null
hypothesis. classical: All of alpha, 0.005, is put into the
right tail of the distribution. The critical value for 0.005 is 2.576 and
2.06 is closer to center than this, so accept the null hypothesis. 6. H0:
m=4500 and H1: m<4500 and alpha=0.025
sample z= -1.83 Answer: p value: The area to the left of -1.83 is
0.0336 so the p value is 0.0336> 0.025 (p>alpha) so accept the null
hypothesis. classical: All of alpha is put into the left tail
of the distribution due to the alternate hypothesis. The critical value for
0.025 is -1.96 and -1.83 is closer to center than this, so accept the null
hypothesis. HOMEWORK due Tuesday 11/15: Draw distributions for each
with relevant z values and areas (use table above for critical values): 10.2 p476 12. verify sample z = 1.92
from part a, and do parts b, c, d 13. verify sample z = 3.29
from part a, and do parts b, c, d 14. verify sample z =
–1.32 from part a, and do parts b, c, d 16. verify sample z = 1.20
and do the rest of part a, b 18. verify sample z = 2.61
and do the rest of part b, c |
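A minimal Python sketch of the two decision methods used in problems 1-6 above, in case you want to check your homework by computer (it assumes SciPy is installed and that the sample z points in the direction of a one-sided alternate hypothesis):

```python
# Decide a z test two ways: p-value approach and classical approach.
from scipy.stats import norm

def decide(sample_z, alpha, two_sided):
    if two_sided:
        p = 2 * norm.sf(abs(sample_z))    # sum of both tail areas
        crit = norm.ppf(1 - alpha / 2)    # half of alpha in each tail
    else:
        p = norm.sf(abs(sample_z))        # all of the tail area on one side
        crit = norm.ppf(1 - alpha)        # all of alpha in one tail
    # The two approaches agree: p < alpha exactly when |sample z| is beyond the critical value.
    decision = "reject Ho" if p < alpha else "accept Ho"
    return round(p, 4), round(crit, 3), decision

print(decide(2.67, 0.01, two_sided=True))    # problem 1: (0.0076, 2.576, 'reject Ho')
print(decide(-2.25, 0.01, two_sided=False))  # problem 2: (0.0122, 2.326, 'accept Ho')
```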
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 11/08 |
Test #4 will occur on
Thursday. Homework is to study for it. Some additional examples, as
requested: --7.3 Example: In an experiment to determine the amount
of time required to assemble an "easy to assembleÓ toy, the average time
to assemble it was 27.8 minutes with a standard deviation of 4.0 minutes.
What is the probability that a randomly selected person will assemble the toy
in more than 30 minutes? Answer: You want to find the area under the
curve corresponding to x values more than 30. When x=30, z= (30-27.8)/4=
0.55. We use this standardized value of x to look up the area under the
curve. On the table, this z value gives an area of 0.2912. Since this is the
area for values more than 30 (to the right of 30), this is our answer. About
29% of the time one would expect a person to assemble the toy in more than 30
minutes. Example: In a very large world history class,
the final exam grades have a mean of 66.5 and a standard deviation of 12.6. Above what score lie the highest 25% of the scores? Answer: This is a "backwards" problem since you are looking for an x value, having been given an area. You are given the area of 0.25, so draw this area as a right tail in the distribution since the highest scores in the class are to the right of the mean (the average score is in the center). The closest value to 0.25 in the table is 0.2514. This area corresponds to a z value of +0.67 (note that the one in class was negative!). This is a positive z value since the higher grades are in the right side of the distribution. You will not get the correct answer if you do not include the correct sign of the z value. We "unstandardize" this value by using the formula to solve for x and get +0.67=(x-66.5)/12.6 so
x=(+0.67)(12.6)+66.5=74.942. About 25% of the class had scores of 74.9 or higher. --8.1 Example: According to the U.S. Federal Highway
Administration, the mean number of miles driven annually in 1990 was 10,300.
A simple random sample of 20 people is asked to disclose the number of miles they drove last year and gives an average of 12,342 miles. Assuming a
std. deviation of 3500 miles, standardize the sample mean. Answer: z=(12342-10300)/(3500/sqroot20)=2.61 --9.1 Example: What are the critical values for a confidence level
of 57%? Answer: The total area in the tails is 0.43 so half in
each tail is 0.215 which gives z = 0.79 --9.1 Short answer about
changes in n, E, confidence levels and intervals, etc. Example: If you take
a larger sample size does the confidence interval become wider (less precise)
or narrower (more precise) about the mean? Answer: Narrower (since error gets smaller). --9.1 Example: A large hospital found
that in 50 randomly selected days it had on average 96.4 patient admissions
per day, assuming a std. deviation of 12.2 from previous studies. Construct a
90% confidence interval for the actual daily average number of hospital
admissions per day. Answer: E = 1.645(12.2)/sqroot50 = 2.84, so 96.4-2.84 = 93.56 and 96.4+2.84 = 99.24, giving 93.56 < m < 99.24. Example: In the problem above, how large a sample of days
must we choose in order to ensure our estimate is off by no more than 2 daily
admissions of patients? Answer: z*=1.645, std.dev.=12.2, E=2 so n=101 Example: A Gallup poll asked 500 randomly
selected Americans, "How often do you bathe each week?". Results of
the survey indicated an average of 6.9 times per week. Using a population
std. deviation of 2.8 times per week, what could we say with 80% confidence about the
error in using 6.9 times per week as an estimate of the true average number
of times Americans bathe? Answer: Using the formula for error, n=500,
s=2.8, critical z=1.282, so we get E=0.16. That is, in 80 samples out of 100
we would expect the number of times that Americans bathe per week could be
estimated by 6.9 times and be off by no more than 0.16 times in either
direction. (The less confident you are willing to be, the smaller the error and the tighter the estimate you get). |
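A minimal Python sketch of the confidence-interval, error, and sample-size formulas reviewed above, using the hospital admissions numbers (it assumes SciPy is installed; on the test you use the provided critical values):

```python
# 90% confidence interval, error term, and required sample size for the hospital example.
from math import sqrt, ceil
from scipy.stats import norm

xbar, sigma, n = 96.4, 12.2, 50
z_star = norm.ppf(1 - 0.05)               # 90% confidence puts 0.05 in each tail, about 1.645

E = z_star * sigma / sqrt(n)              # error term, about 2.84
print(round(xbar - E, 2), round(xbar + E, 2))    # interval: about 93.56 to 99.24

# Sample size needed to be off by no more than 2 admissions per day:
n_needed = ceil((z_star * sigma / 2) ** 2)       # round up, about 101
print(n_needed)
```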
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 11/03 |
LAST Homework
items we did not get to go
over in class: 9.1 p389 # 24 a. n=20,
stddev=17, critical z=1.88, we get an error of 7.15, so 123 – 7.15 = 115.85 and
123 + 7.15 = 130.15 115.85 < m < 130.15. b. n=12,
stddev=17, critical z=1.88, we get an error of 9.23 (increases from part a),
so 123 – 9.23= 113.77 and
123 + 9.23= 132.23 113.77 < m
< 132.23. c. n=20,
stddev=17, critical z=1.44, we get an error of 5.47 (decreases from part a),
so 123 – 5.47 = 117.53 and
123 + 5.47 = 128.47 117.53 < m
< 128.47. d. No, because
the sample size was less than 30. e. Increase the
mean, shifting the interval to the right. 9.1 p389 # 30 To increase the precision
of the interval is to make the lower and upper bounds come closer together,
and this can be done by increasing the sample size or lowering the confidence
level. LECTURE/ READING IN THE TEXT: We worked on word problems
in class involving confidence intervals, error, and sample size. You can find
the formulas and examples in 9.1 p405-415, as in the reading from last time. HOMEWORK due Tuesday 11/08: 1. 9.1 p418 #34
2. 9.1 p420 # 44
3. Supplemental problem to the text: A random sample of 300 telephone calls made to the office of a large corporation is timed and reveals that the average call is 6.48 minutes long. Assume a std. deviation of 1.92 minutes can be used. If 6.48 minutes is used as an estimate of the true length of telephone calls made to the office, a. What can the office manager say with 99% confidence about the maximum error? b. What can the office
manager say with 90% confidence about the maximum error? (do not do a whole confidence interval for these, just use the error formula).
4. In the text, 9.1 p418 #36 the answer to part a is x bar (the sample mean) = $3727. Using this info, do parts c, d, and e.
5. Supplemental problem to the text: A large hospital finds that in 50 randomly selected days it had, on average 96.4 patient admissions per day. From previous studies it has been determined that a population std. deviation of 12.2 days can be used. Using a 90% confidence level, a. How large a sample of days must we choose in order to ensure that our estimate of the actual daily number of hospital admissions is off by no more than five admissions per day? b. How large a sample of days
must we choose to have 25% of the error we had in part a? TEST #4 FORMAT for Thursday 11/10. A copy of the z table will be provided, along with
formulas for z values (populations and sampling distributions), confidence
intervals, error, and sample size, and the critical
values for 90/95/99%. --Several word problems
(perhaps with related parts to cut down on repeated calculations and table
look-ups) like those in 7.3, with one "backwards" problem for sure. Note that
you use the z formula for populations. --Standardize a
sample mean given the z formula for sampling distributions as in 8.1. Other
issues in 8.1, such as comparing population and sampling distributions (like
the quiz today) and questions of normality and whether sample size must be 30
or more will not be included, as they either do not fit well into the test or
duplicate the work tested in other problems (such as those from 7.3). --Show how to find the z* or
critical values for a given confidence level using the z table backwards as
in 9.1. --Some short
answer questions on sampling distributions and the effect of changes to
sample size and confidence on error and confidence intervals, as in 9.1. (For
example, what is the effect on the error in using the sample mean to estimate
the population mean if you take a smaller sample size? It gets bigger). --At least one
each of word problems (some with follow-up parts), as in 9.1 and supplemental
problems from homework, dealing with confidence intervals, error and sample size,
not necessarily in that order. |
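If you would like to check interval arithmetic like 9.1 #24 above by computer, here is a minimal Python sketch (SciPy assumed; the 1.88 critical value used in that problem corresponds to about a 94% confidence level):

```python
# Confidence interval for 9.1 #24a: n = 20, std. dev. 17, sample mean 123, z* = 1.88.
from math import sqrt
from scipy.stats import norm

conf = 0.94                                # level whose table lookup gives about 1.88
z_star = norm.ppf(1 - (1 - conf) / 2)      # half of the leftover area in each tail

xbar, sigma, n = 123, 17, 20
E = z_star * sigma / sqrt(n)               # error term, about 7.15
print(round(z_star, 2), round(xbar - E, 2), round(xbar + E, 2))   # about 1.88, 115.85, 130.15
```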
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 11/01 |
Test grades
so far: By 5 pm today, I will link
up the "testscores" item for your class at
http://www.smccd.edu/accounts/callahanp
so you can compare the averages I have for you with what you got on
your first 3 tests. The scores are by code given in class today. If you were
not there to receive your code, please ask for it on Thursday. Homework items we did not get to go over in class: 8.1 p389 # 22 a. use the z calculation
for the population: z = (40 – 43.7) / (4.2) = – 0.88 so area to
the left is 0.1894 using the table. b. use the z calculation
for the sampling distribution z = (40 – 43.7) / (4.2/sqroot 9) =
–2.64 so area to the left is 0.0041. c. use the z calculation
for the sampling distribution z = (40 – 43.7) / (4.2/sqroot 12) =
–3.05 so area to the left is 0.0011. d. as sample size is
increased, the probability decreases in the tails because the curve is less
spread out due to a smaller std. deviation. e. use the z calculation
for the sampling distribution z = (46 – 43.7) / (4.2/sqroot 15) = 2.12
so area to the right is 0.0170, which happens in less than 2% of all samples,
so is not common. 8.1 p389 # 24 a. use the z calculation
for the population: z = (95 – 85) / (21.25) = 0.47 so area to the right
is 0.3192 using the table. b. use the z calculation
for the sampling distribution z = (95 – 85) / (21.25/sqroot 20) = 2.10
so area to the right is 0.0179. c. use the z calculation
for the sampling distribution z = (95 – 85) / (21.25/sqroot 30) = 2.58
so area to the right is 0.0049. d. as sample size is
increased, the probability decreases in the tails because the curve is less
spread out due to a smaller std. deviation. e. happens in less than
½ % of all samples, so is very uncommon. 8.1 p389 # 30 a. since n = 40, this is a
large enough sample to ensure that the sampling distribution will be
approximately normal. b. mean of sampling
distribution same as population 20, and std. deviation is sqroot20/sqroot40 =
sqroot of ½ which is about 0.707. c. z = (22.1 – 20) /
(sqroot20/sqroot 40) = 2.97 so area to the right is 0.0015 using the table,
and this is unusual, since it would indicate that a sample like this would
happen less than 2 times in 1000 samples. It would be considered an anomaly:
an indicator that business fluctuates greatly during this time. READING IN THE TEXT: Material in 9.1 in the
text: p405-414 (and read p414-415
for sample size formula for next time if you have the time!). HOMEWORK due Thursday 11/03: Make note for yourself in
comparing the parts to a problem how changes in confidence and sample size
affect error and interval width. In your work below, round error and
confidence interval values to 2 decimal places. Turn in: 9.1 p416 #14 (do as in 7.2
p347 #23-26) 9.1 p416 #16 (this was done
in 7.2 p347 #23 also, so you have the answer from there!) 9.1 p416 # 22 (critical
value for 90% given on p410 and for 98% as the answer to 9.1 #13) 9.1 p416 # 24 (use the
critical value for 94% you found in 9.1 #14, and for 85%, the answer is given
in 9.1 #15). 9.1 p416 #30 |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 10/27 |
LECTURE: We looked at the word problems from homework in
detail, and then started looking at section 8.1: about the sampling
distribution, which lets us work with a normal distribution of all possible samples
from a population even if the original population itself is not normal. If a
population is normal, then the sampling distribution with any size sample is
normal. But if a population is not normal, this text requires that the sample
size must be greater than or equal to 30 for the sampling distribution to be
considered approximately normal. You will be using a
modified z calculation that takes into account the size of a sample. I
suggest to avoid errors, you should protect the numerator and denominator with
parentheses or multiply by the reciprocal of the fraction in the denominator
instead of dividing by it. For example, if the mean of the population is 12.7, the std. deviation of the population is 2.5, the sample size is 5, and the sample mean is 12.5, then z = (12.5 – 12.7)/(2.5/sqroot5) = –0.18. READING IN THE TEXT: Browse section 8.1
p377-388, note definitions on p381 and 385, see ex3 p382 and ex5 p387. We will move on to 9.1 next
lecture if you want to start reading ahead. exampleS: 8.1 p389 # 19 a. Since the sample size is
less than 30, the population must be normal with mean of sampling
distribution same as population 64, and std. deviation is 17/sqroot 12 = 4.91 b. z = (67.3 – 64) /
(17/sqroot 12) = 0.67 so P(<67.3) is area to left of 0.67 = 1 –
0.2514 = 0.7486 c. z = (65.2 – 64) /
(17/sqroot 12) = 0.24 so
P(>65.2) is area to right of
0.24 = 0.4052 8.1 p389 # 21 a. use the z calculation
for the population: z = (260 – 266) / (16) = – 0.38 so area to
the left is 0.3520 using the table. b. The sampling
distribution is normal with mean of 266 and std. deviation of 16/sqroot20 =
3.58 c. use the z calculation
for the sampling distribution z = (260 – 266) / (16/sqroot 20) =
–1.68 so area to the left is 0.0465 d. use the z calculation
for the sampling distribution z = (260 – 266) / (16/sqroot 50) =
–2.65 so area to the left is 0.0040 e. since 0.0040 is small,
the result is unusual f. find the area between
266 – 10 = 256 and 266 + 10 = 276 z = (256 – 266) /
(16/sqroot 15) = –2.42 so area to left is 0.0078 z = (276 – 266) /
(16/sqroot 15) = 2.42 so area to right is 0.0078 area between is 1 –
0.0078 – 0.0078 = 0.9844 HOMEWORK (due on Tuesday 11/01): 8.1 p389 # 18, 20, 22, 24, 30 |
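The parentheses advice above, written out as a tiny Python sketch using the in-class numbers (any calculator works the same way; this is just to show where the parentheses go):

```python
# Standardizing a sample mean of 12.5 when the population has mean 12.7 and std. dev. 2.5, n = 5.
from math import sqrt

mu, sigma, n, xbar = 12.7, 2.5, 5, 12.5

# Protect the numerator and the denominator with parentheses...
z = (xbar - mu) / (sigma / sqrt(n))
# ...or multiply by the reciprocal of the fraction in the denominator instead of dividing:
z_alt = (xbar - mu) * (sqrt(n) / sigma)

print(round(z, 2), round(z_alt, 2))   # both print -0.18
```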
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 10/25 |
LECTURE: We worked on normal word problems
from 7.3 again today, looking at variations of problems from your last
homework. As there are not enough examples and problems in the book, I will
supplement the book with some different problems below and ask for some
variations of them for homework. In a "forwards"
problem, given an x value (or two x values), use the standardizing formula z = ( x –
mean)/(std. deviation) to find the z value(s). Then look up the area in the
tail of the distribution corresponding to each z value. Find the area you
want using these tail areas. In "backwards"
problems, given an area (%, probability, proportion) find the x value that
bounds it by reversing the process from the ÒforwardsÓ problems. Identify the
given area in a picture and search the middle of the table for the closest
area to the one you are given, map it backwards to find the row and column it
belongs to in order to find the z value, then take the resulting z value and
"unstandardize" it (solve for x) in the formula z = ( x –
mean)/(std. deviation)! Supplementary
exampleS (most problems in the last hmk were
"forwards" problems, so 3 of these 4 examples will be the harder "backwards"
ones): 1. A salesman has an average car route trip time of 4.3
hours with std. deviation of 0.2 hours. What is the probability that the
length of his car trip will last anywhere from 4 to 4.5 hours? Answer: This is a "forwards" problem. For x=4, z=(4-4.3)/0.2=-1.5 and for x=4.5, z=(4.5-4.3)/0.2=1.0. The area to the left of –1.5 is 0.0668 and the area to the right of 1.0 is 0.1587. The area between is 1-(0.1587+0.0668)=0.7745, so there is about a 77% chance that his trip will last anywhere from 4 to 4.5 hours. 2. The lengths of sardines
received by a cannery have a mean of 4.64 inches and a standard deviation of
0.25 inches. If the distribution of these lengths can be approximated closely
with a normal distribution, below which length lie the shortest 18% of the
sardines? Answer: This is a "backwards" problem since you are looking
for an x value (length of sardines), having been given an area. The area of
18% or 0.18, is a left-hand tail area, because it represents below-average
lengths. The closest value to this in the table is 0.1814. This area
corresponds to a z value of – 0.91. We
"unstandardize" this value by using the formula to solve for x and get – 0.91=(x-4.64)/0.25 so x=(–
0.91)(0.25)+4.64= –0.2275+4.64=4.41.
About 18% of the sardines measure 4.4
inches or shorter. 3.The
average assembly time for a product is 27.8 minutes with a standard deviation
of 4.0 minutes. Above what number of minutes lie the 25% slowest assembly
times? Answer: This is a "backwards" problem since you are
looking for an x value (number of minutes), having been given an area. You
are given the area of 25% or 0.25, and the closest value to this in the table
is 0.2514. This area corresponds to a z value of +0.67. This is a z value on
the right side of the distribution since the "slowest" assembly times
involve the most number of minutes, and these occur in the right side of the
distribution, where the average time of 27.8 is in the middle. Here is a case
where the wording in the problem might not match your intuition! The
right-hand side of the distribution does not necessarily represent the biggest,
strongest, fastest numbers! Now we "unstandardize" this
value by using the formula to solve for x : +0.67=(x-27.8)/4.0 so
x=(+0.67)(4.0)+27.8=30.48 or about 30 minutes. The 25% slowest assembly times take
about 30 or more minutes. (almost the same as the previous ÒforwardsÓ version
of the problem). 4.
The average assembly time for a product is 27.8 minutes with a standard
deviation of 4.0 minutes. Above what number of minutes lie the 60% slowest
assembly times? Answer: This is a "backwards" problem since you are
looking for an x value (number of minutes), having been given an area. You
are given the area of 60% which as a decimal is 0.60, but you can only find
areas from 0% to 50% on the table! To find an area of 60% above an x value,
the x value must be to the left of the mean. So you must use the area to the
left of this x value, which is 40%, or 0.40. The closest value to 0.40 in the
table is 0.4013. Looking from this area to what row and column it belongs to,
we see this area corresponds to a z value of –0.25.
This is a z value on the left side of the distribution, so it is
negative. Now we "unstandardize" this
value by using the formula to solve for x : –0.25=(x-27.8)/4.0 so x=(–0.25)(4.0)+27.8=26.8
minutes. The 60% slowest assembly times take about
26.8 or more minutes. READING IN THE TEXT: 7.3 p349-352, but using the
table given in class. HOMEWORK due Th 10/27 (the last two
are "backwards" problems): 1. If
final exam grades have a mean of 66.5 and a standard deviation of 12.6,
what percent of the class should receive an A if A's are earned by those with
scores of 87 or better? 2. The average amount of
radiation to which a person is exposed while flying by jet across the U.S. is
4.35 units with std. deviation of 3.2. What is the probability that a
passenger will be exposed to more than 4 units of radiation? 3. The average
time to assemble a product is 27.8 minutes with a standard deviation of 4.0
minutes. What percent of the time can
one expect to assemble it in anywhere from 30 to 35 minutes? 4. The number
of days that patients are hospitalized is on average 7.1 days with std.
deviation of 3.2 days. How many days do the 20% longest-staying patients stay? 5. For a salesman driving between
cities, the average trip time is 4.3 hours with std. deviation of 0.2 hours.
Below what time lie the fastest 10% of his trips? |
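If you want to check a "backwards" answer by computer, here is a minimal Python sketch for example 2 above (the sardines); it assumes SciPy is installed, and it looks the area up exactly instead of searching the body of the table:

```python
# Backwards problem: below which length lie the shortest 18% of the sardines?
from scipy.stats import norm

mu, sigma = 4.64, 0.25
z = norm.ppf(0.18)              # z value with 18% of the area to its left, about -0.92
x = z * sigma + mu              # "unstandardize" to get the length, about 4.41 inches
print(round(z, 2), round(x, 2))
# The table lookup in class gave z = -0.91; the small difference does not change the answer of 4.41.
```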
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 10/20 |
READING IN THE TEXT: 7.3 p349-353 exampleS: 7.3 p354 #17 a. area less than z =
(20-21)/1 = –1 is 0.1587 so about 16% of the eggs are expected to hatch
in less than 20 days. b. area more than z =
(22-21)/1 = +1 is 0.1587 so about 16% of the eggs are expected to hatch in
more than 22 days. c. area less than z =
(19-21)/1 = –2 is 0.0228 and area more than z = 0 is 0.50, so the area
between is 1 – 0.0228 – 0.50 = 0.4772 so about 48% of the eggs
are expected to hatch in 19 to 21 days. d. area less than z =
(18-21)/1 = –3 is 0.0013 which happens 0.13% of the time (much less than 1%).
7.3 p354 #21 (other parts I did not do in class) b. area less than z =
(250-266)/16 = –1 is 0.1587 so about 16% of pregnancies last less than
250 days. d. area more than z =
(280-266)/16 = +0.88 is 0.1894 so about 19% of pregnancies last more than 280
days. e. area no more than z = (245-266)/16
= –1.31 is the same as area less than –1.31 which is 0.0951 so
about 10% of pregnancies last no more than 245 days. f. area less than z =
(224-266)/16 = –2.63 is 0.0043 so pregnancies lasting less than 224
days happen less than ½ of a percent of the time, therefore are
considered rare. 7.3 p355 #29 Be careful of this one... you are working backwards! b. Find the z value such
that there is 0.03 area in the right tail. Since 0.0301 is the closest area
to 0.03 in the table and it corresponds to a z value of 1.88, we must work
backward to find the x value for this z in the formula. We are given mean 17
and std. deviation 2.5 so z = (x
– 17)/2.5. Then if z = 1.88,
1.88 = (x – 17)/2.5. Multiply both sides by 2.5 to get
(1.88)(2.5) = (x – 17) and then add 17 to both sides to get (1.88)(2.5)
+ 17 = x so x = 21.7 minutes. HOMEWORK due Tuesday 10/25 7.3 p354 #18 do all parts abcd, then
do an extra part e: What is the probability
that a randomly selected sixth-grade student reads less than 125 words per minute? #20 do all parts abcd #28 do as with the third
example above (#29). |
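Optional: the examples above can also be checked in Python with the scipy package (an assumption on my part; it is not needed for the homework). cdf gives the area to the left of a value, and ppf works "backwards" from an area to a value.

from scipy.stats import norm

# 7.3 #17a: eggs hatching in less than 20 days (mean 21, std. dev. 1)
print(norm.cdf(20, loc=21, scale=1))            # about 0.1587

# 7.3 #21d: pregnancies lasting more than 280 days (mean 266, std. dev. 16)
print(1 - norm.cdf(280, loc=266, scale=16))     # about 0.19 (the table gave 0.1894 using z = 0.88)

# 7.3 #29b: the x value with 0.03 area in the right tail (mean 17, std. dev. 2.5)
print(norm.ppf(1 - 0.03, loc=17, scale=2.5))    # about 21.7 minutes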
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 10/18 |
Homework is to study for
Test #3. The format is in the previous notes below. Before the test, I will
be giving you some instruction about problems from 7.3, which involves some
of what you are being tested on. I will put up a few notes after the test, so
please check back then. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 10/13 |
Homework items we did not get to go over in class: 5.5 p278 #66 a.
(3C2*9C1)/12C3
= (3*9)/220 = about 0.1227 b.
(3C1*9C2)/12C3
= (3*36)/220 = about 0.4909 c.
(3C3*9C0)/12C3
= (1*1)/220 = about 0.0045 d.
(3C0*9C3)/12C3
= (1*84)/220 = about 0.3818 e. It is a
probability distribution: every event is represented by the parts above, and 0.1227 + 0.4909
+ 0.0045 + 0.3818 = 0.9999 ~ 1 (due to rounding) LECTURE: We have done what we are going
to do in 5.5, and will skip ch 6, as we have already talked about probability
distributions in other sections. So now we go to ch 7 and back to normal
distributions! Recall from chapter 3: once you know how to calculate the mean and standard deviation for a normal distribution, you can put these two important numbers to work to "standardize" any normal distribution. We convert a particular score in the distribution into a standardized z score by way of the following standardizing formula: z = (x – mean)/std. deviation. If a normal distribution has a mean of 63.2 and a standard deviation of 7.8, find the z value for each of the following using the formula: a. 63.2 Answer: z = (63.2-63.2)/7.8 = 0 (the mean always standardizes to 0!) b. 58.9 Answer: z = (58.9-63.2)/7.8 = -0.55 c. 64.8 Answer: z = (64.8-63.2)/7.8 = 0.205128205... = 0.21 (round to 2 places!) We do this in order to be able to look up the area under the curve for a given value using the table. When you work on these problems involving areas under the normal curve, draw a normal curve and fill in the numbers of interest for a particular problem on number lines below the curve. Once you have all known values on your picture, think about what area under the curve you are looking for and what areas you have from the table for the z values of interest. Then you must decide how to use the table values to find the area you need, and this may not be immediately apparent! We focused today just on using the table below to find areas under the already standardized curve. The z table gives the areas under the std. normal curve for particular z scores. We will use a modified version of this table from your book, where we employ the symmetry of the curve so that the area to the left of a negative z value is the same as the area to the right of a positive z value. In that way, you can look up the z values below as + or – instead of just –. I gave a copy of the modified table in class today; ask for it next time if you were not there.
To look up a particular value of z, you put together the row and column that make up the z value. The left-most column gives the ones and tenths places of the z, and the uppermost row gives the hundredths place of the z value. For instance, if the z value is 1.83, since 1.83 = 1.80 + 0.03, you look to the row of 1.8 and the column of 0.03 to find the area under the curve: 0.0336. For example: What is the area under the
standard normal curve for values to the left of z = -1.57? Answer: Putting row 1.5 with column 0.07 we get 0.0582 EXAMPLES: 1. What is the area to the left of z= –2.04? Answer: 0.0207 2. What is the area to the right of z= 2.79? Answer: 0.0026 3. What is the area to the left of z= –0.06? Answer: 0.4761 4. What is the area to the left of z= –0.60? Answer: 0.2743 5. What is the area to the right of z= 0.60? Answer: 0.2743 6. What is the area to the right of z= –1.74? Answer: 0.9591 (from 1-0.0409) 7. What is the area to the left of z= 1.05? Answer: 0.8531 (from 1-0.1469) 8. What is the area between
z= 0.87 and z= 2.03? Answer: 0.1710 (The smaller tail corresponding to z= 2.03 has area of 0.0212 and the larger tail corresponding to z=0.87 has area of 0.1922. The smaller area is contained within the larger area, so to find the area between, take the larger and subtract the smaller: 0.1922–0.0212=0.1710) 9. What is the area between
z= –0.25 and z= –1.97? Answer: 0.3769 (Same as the above, subtract smaller tail from larger: 0.4013-0.0244 = 0.3769) 10. What is the area
between z= –2.09 and z=3.07? Answer: 0.9806 (Different from the previous two problems,
because the values are on opposite sides of the distribution and so it is not
the case that one tail is contained within the other. You must start with
100% of the whole distribution and "chop off" the two tails using
subtraction: 1 – (0.0183 + 0.0011)
or 1 – 0.0183 – 0.0011 which equals 0.9806) READING IN THE TEXT: (Review of previous topics 7.1 p327-332 standardizing formula and area under the normal curve) 7.2 p337-346 finding area under the normal curve (be careful that we are using a modified version of the table in the book; the book has a two-page table with separate +/- values, but the answers to the area exercises should ultimately come out the same) HOMEWORK due Tuesday 10/18 1. In Super Lotto Plus (previous hmk), find the probability of getting
2 of the regular numbers and not getting the Mega number. To make your life a
little easier, here are some shortcuts: nC0=1 for all n (so 5C0 = 1 for example) nC1=n for all n (so 5C1 = 5 for example) nCn=1 for all n (so 5C5 = 1 for example) 5C2=10 42C3=11,480 47C5=1,533,939 2. Using the table and material from lecture today, find the area
between z = – 2 and z = +2. 3. Using the table and material from lecture today, find the area
between z = – 3 and z = +3. 4. 7.2 p346 # 6 parts a and c 5. 7.2 p346 # 8 parts b and d 6. 7.2 p346 # 10 parts b and c 7. 7.2 p347 # 16 (see ex. 5 p341 and search the areas in the table for
the closest value to 0.2000 and tell what z value that area belongs to
– this is using the table backwards) 8. 7.2 p347 # 18 (see ex. 5 p341 and 7 p343 and use 15% or 0.1500 to do
as in #16!). 9. 7.2 347 #23 (see ex. 8 p343 putting 0.10 in each of the left and
right tails) 10. 7.2 347 #25 (see ex. 8 p343 putting 0.005 in each of the left and right
tails) TEST #3 FORMAT: Test #3 will occur as
scheduled on Th 10/20. I will provide the general
addition and multiplication rules and formulas for nPr and nCr. It will likely contain (I
will firm it up on Tuesday): 1. One short problem like 5.1 p234 #32-34 2. One probability model set-up like 5.1 p235 #40 combined with
questions from a problem like 5.2 p246 #26 3. Given a table like 5.2 p248 #42, 44 but combining it with material
from 5.4, find probabilities of events like: P(A), P(A given B), P(A and
B), P(A or B). Events may be stated in words (as in #42, 44 for example) or
defined with letters such as A and B, and the multiplication and addition
rules must be shown explicitly. 4. One general addition rule card problem like 5.2 #32ac or ex3 p242 5. One general multiplication rule word problem like 5.4 #12-16 6. One nPr or nCr to show computation from given formula, like 5.5 p276
#18 or 26 7. About 4 situations like 5.5 #46-50 to decide if order matters and
write appropriate nPr/nCr but not compute it 8. Write a more complicated probability using a quotient of nCr counts
from subsets like 5.5#66 or ex15 p276 9. A problem like 5.5 p278 #60 or Super Lotto examples to write an
event probability with a quotient of nCr counts 10. Various areas to look up like today's hmk from 7.2 |
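Optional: the #66 answers at the top of this entry can be verified with Python's math.comb (assuming Python 3.8 or newer), which computes nCr directly.

from math import comb

# 5.5 #66: choose 3 cans from 12, where 3 are one kind (diet) and 9 are the other
total = comb(12, 3)                          # 220
p_a = comb(3, 2) * comb(9, 1) / total        # about 0.1227
p_b = comb(3, 1) * comb(9, 2) / total        # about 0.4909
p_c = comb(3, 3) * comb(9, 0) / total        # about 0.0045
p_d = comb(3, 0) * comb(9, 3) / total        # about 0.3818
print(p_a + p_b + p_c + p_d)                 # 1.0 (up to tiny floating-point rounding), so it is a probability distribution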
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 10/11 |
CLASS EXERCISE: For 5 identical job positions there are
15 applicants, 6 of whom are female. What is the probability that in filling
these 5 positions, we will get exactly 2 females? Answer:
(6C2*9C3)/15C5=(15*84)/3003
=0.42 (More
examples of this below). LECTURE: We looked at the general addition
and multiplication rules from 5.2 and 5.4 in a quiz, went over hmk from 5.5
(how to tell if order matters or not in deciding to compute Permutations nPr
= n!/(n-r)! and Combinations nCr = n!/[r!(n-r)!]). We looked at one more new
topic from 5.5 today: how to form more sophisticated probabilities using nCr
counts. READING IN THE TEXT: 5.5 p271 ex7 and p273 ex11
and p275 Ex. 14 (deciding if order is important) 5.5 p275-276 ex 14, 15 probabilities involving combinations EXAMPLES: 5.5 p277 #47
order matters: 20P4 5.5 p277 #51
order doesn't matter: 50C5 5.5 p278 #62 c.
(55C3*45C4)/100C7 = (26235*148995)/(1.60075608 x 10^10) = about 0.24 5.5 p278 #65a. Out of the 13 tracks, 5 are liked so 8 must be disliked. You are taking 2 of 5 liked and 2 of 8 disliked for the event probability on the top of the fraction. On the bottom of the fraction, any 4 could pop up from the 13 tracks available.
(5C2*8C2)/13C4 = (10*28)/715 = about 0.39 Another example not in book: Out of 125
dishes in a box, 8 are chipped. If we select 6 dishes at random from the box,
what is the probability that exactly 1 will be chipped? This is using the
techniques from card counting above in the same way, but with a different set
and subsets: Answer: Out of 125
dishes in a set, if 8 are chipped, 117 are not.
(8C1*117C5)/125C6=(8*167549733)/4690625500
=0.29
One of the most famous
examples of this counting method is: THE CALIFORNIA LOTTERY:
SUPER LOTTO PLUS To play the game, you pick 5 different regular numbers from 1 to 47 and one "Mega" number from 1 to 27. The top prize (which is the one advertised in millions) goes to whoever matches all 5 of the 5 winning numbers and matches the one Mega number. Much smaller prizes are awarded for matching some of the numbers. Prizes are awarded to the following winning combinations:
As I suggested in class, it
can be quite helpful to map out the strategy for outcomes by breaking down each
number set into the important subsets and see how many are to be taken from
each. Getting any 3 of 5 and not getting the Mega (3 of the 5 winning regular numbers, 2 of the 42 losing regular numbers, and 1 of the 26 losing Mega numbers): ( ( 5C3 * 42C2 ) / 47C5 ) * ( (1C0*26C1) / 27C1 ) = ( ( 10 * 861 ) / 1,533,939 ) * ( (1*26) / 27 ) = 223,860 / 41,416,353 = 0.005405111, which corresponds to odds of about 1 in 185.01 (the lottery rounds this to the published 1 in 185).
Some more examples without
the tables: Getting any 3 of 5 and the Mega (notice that you are selecting 3 of 5 winners and 2 of 42 losers): ( ( 5C3 * 42C2 ) / 47C5 ) * ( 1C1 / 27C1 ) = ( ( 10 * 861 ) / 1,533,939 ) * ( 1 / 27 ) = 8610 / 41,416,353 = 0.000207889; the odds work out to about 1 in 4810.26, and the lottery rounds the figure to 1 in 4810 as above. Getting all 5 of 5 and the Mega (any 5 of the 47 could be chosen, but you want all 5 of the 5 available winners, and any 1 of the 27 possible Megas could be chosen, but you want 1 of only 1 successful): ( 5C5 / 47C5 ) * ( 1C1 / 27C1 ) = ( 1 / 1,533,939 )( 1 / 27 ) = 1 / 41,416,353 = about 0.000000024 as above. HOMEWORK due Th 10/13: 1. 5.5 p277 #48, 2. In the Illinois Lottery, in how many different ways can one pick the set of six winning numbers from the 51 available to choose from? Does order matter? (Compute both permutations and combinations for this one. Then think about the probability of winning being 1 chance in the number you just found. For example, if you found the number of different ways to pick lottery numbers was 12,345, then if you purchased one ticket, you would have a one chance in 12,345 of winning. That is, 1/12,345 = 0.000081004 chance of winning. Consider that there are about 13 million people in Illinois, assume that almost that many tickets are sold each time, and that the Lottery officials would like someone to win at least every other drawing! Which looks more appropriate now, 51C6 or 51P6?) 3. 5.5 p278 #60, 4. In Super Lotto Plus
above in the examples, find the probability of not getting any of the regular
numbers and not getting the Mega number either! 5. 5.5 p278 #66 Do parts a,
b, and c, and also find P(0 diet). Check that all the parts form a
probability distribution. |
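Optional: here is how the Super Lotto arithmetic above looks in Python (assuming Python 3.8+ for math.comb), which avoids punching big nCr values into a calculator.

from math import comb

# Match exactly 3 of the 5 winning regular numbers (so 2 of the 42 losers)
# and miss the Mega number (your 1 Mega pick is one of the 26 losing Mega numbers).
p_3_no_mega = (comb(5, 3) * comb(42, 2) / comb(47, 5)) * (26 / 27)
print(p_3_no_mega)                    # about 0.0054, roughly 1 chance in 185

# Match all 5 regular numbers and the Mega number (the advertised jackpot).
p_jackpot = (comb(5, 5) / comb(47, 5)) * (1 / 27)
print(p_jackpot, 1 / p_jackpot)       # about 0.000000024, i.e. 1 in 41,416,353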
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 10/06 |
LAST HMK: We went over the problems
in 5.2 and 5.4 in detail, with the exception of 5.4 #18: a. P(former given cancer) = 91/(782+91+141) = 0.09 b. P(cancer given former) = 91/(91+7757) = 0.01 LECTURE: I introduced material from
section 5.5 that we will continue with on Tuesday. To form more complicated probabilities,
one must know how to count sometimes large and complex numbers of things. We considered the example
of how to count the different ways one can take 3 letters, without
repetition, from a set of four letters{ A, B, C, D }. We found 24 permutations
(order matters) and 4 combinations (order doesnÕt matter) with the help of a
tree diagram. We could have performed these counts without a tree, using the
formulas in 5.5. Before another example, please read the pages below. READING IN THE TEXT: 5.5 pages 266 thru the end
of example 11 on p273, and try some of the
computations in the skill building section on p276/277 for yourself (check answers to odds in the back of the
book). Especially read about: -- tree diagrams on p267 -- factorials on p269 -- permutation formula p270 -- combination formula p272 --p271 ex7 and p273 ex11
deciding if order is important another
example: Suppose that we wish to list the number of ways that we can choose three letters at a time from the following set of five letters { A, B, C, D, E } without choosing a letter more than once at a time and where order of the letters is important (i.e., ABC is not the same sample as CBA because the order of selection is different, so they therefore represent different choices). We could make the following selections (one would not want to list them with a Òtree diagramÓ!):
So there are 60 ways to select 3 letters from a set of 5 where order of the letters is important. The easier way to come up with this number without listing the selections is to use the formula on p270! Check that you get 5P3 = (5!)/(5-3)! = 5!/2! = (5*4*3*2*1)/(2*1) = 120/2 = 60 using the formula on p270.
Now suppose that we wish to make the same count, but where order of letters is not important (i.e., ABC is considered the same sample as CBA). Our table would now lose many of its items: ABC is the same as ACB, BAC, BCA, CAB, CBA; ABD is the same as ADB, BAD, BDA, DAB, DBA; ABE is the same as AEB, BAE, BEA, EAB, EBA; ACD is the same as ADC, CAD, CDA, DAC, DCA; ACE is the same as AEC, CAE, CEA, EAC, ECA; ADE is the same as AED, DAE, DEA, EAD, EDA; BCD is the same as BDC, CBD, CDB, DBC, DCB; BCE is the same as BEC, CBE, CEB, EBC, ECB; BDE is the same as BED, DBE, DEB, EBD, EDB; and CDE is the same as CED, DCE, DEC, ECD, EDC.
That leaves us with 10 different ways to choose letters, represented by the following individuals: ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE. Or, we could just count them using the formula on p272! Check that you get 5C3 = (5!)/((3!)(5-3)!) = 5!/(3!*2!) = (5*4*3*2*1)/((3*2*1)*(2*1)) = (120)/((6)(2)) = 120/12 = 10 using the formula on p272. HOMEWORK due Tuesday 10/11: (Please perform problems in the order listed below) 5.5 p276 # 6, 8, 14, 16, 24
(note that #9 tells you that 0! = 1 by definition) 5.5 p277 #28, showing the possible paths on a tree diagram. 5.5 p277 #30, showing which outcomes in #28 above are repeats of other outcomes. 5.5 p277 #18 and see how
quickly the counts can get out of hand, even with small sets of objects.
Would you want to list all of these selections in a table or on a tree
diagram? 5.5 p277/278 #46, 50
deciding if order matters first, then computing the appropriate P or C. |
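Optional: Python's math module (3.8+) has both counts built in, which is a quick way to check the tree-diagram work above.

from math import comb, perm

# Taking 3 letters from { A, B, C, D, E }
print(perm(5, 3))                 # 60 ordered selections (order matters)
print(comb(5, 3))                 # 10 unordered selections (order does not matter)

# Each unordered group of 3 letters can be arranged 3! = 6 ways, which is why 60/6 = 10.
print(perm(5, 3) // perm(3, 3))   # 10 again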
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 10/04 |
ANSWERS to 5.1
p233 #12, 14, 32, 34, 40, 52 from last hmk (bring questions next time if you
have some): 12. All of them
are yellow, since P(yellow) = 1 = 100% 14. Not a
probability model since 0.1+0.1+0.1+0.4+0.2+0.3 = 1.2 not 1 32. a. P(at
least once) = 341/1100 = 0.31 b. we expect a larger sample of female
adults to produce a similar result: that about 31% of all female adults
volunteered at least once last year. 34. a.
P(Titleist) = 35/80 = 0.44
b. P(Top) = 20/80 = 0.25
c. Based on this sample, we would expect larger bags of this type to
contain about 44% Titleist and about 25% TopFlite. 40. Adding the
total of the categories, we get 4521 so the probabilities are 118/4521 = 0.03,
249/4521 = 0.06, 345/4521 = 0.08, 716/4521 = 0.16, 3093/4521 = 0.68. 52. a. P(8
girls) = P(1st girl)*P(2nd girl)* ... *P(8th
girl) =
½*½*½*½*½*½*½*½=
0.00390625 so this is
classical since it results from expectation, not experimentation or
experience. b. Empirical since
based on an experimental survey of 1000. c. Subjective since
based on personal experience. d. Empirical since
based on trials in an experiment. LECTURE: We looked at the two forms of the addition rule in 5.2 (p238 and p241/242) and the two forms of the multiplication rule (p251 and p259). The second form of each rule (p242 and p259) is the general form, which actually covers the cases of the first forms presented (p238 and p251). "E or F" is a union of sets and "E and F" is an intersection of sets. The general addition rule of section 5.2
includes intersections as part of the union (subtracting the intersection
once so as not to double count it in the union): P(E or F) = P(E) + P(F)
– P(E and F) In table problems, we
treated P(E and F) as an intersection of a row and column in the table. Now
we can also use the general multiplication rule to find it: P(E and F) = P(E)·P(F given E), where P(F given E) is the conditional
probability: the probability that event F occurs given that event E has
occurred or that E is a subset being chosen from. You can find a table
discussion similar to the one we had in class in example 4 on p242/243 and
example 1/2 on p257/258 (they refer to the same table even though they are in
different sections). READING IN THE TEXT: 5.2 p238-243 assigned
previously 5.3 p250-252 (thru example
2) multiplication rule for independent events 5.4 p256-259 (thru example 3)
general multiplication rule (works for both independent and dependent events,
thus it is a general rule!). MORE EXAMPLES (to guide you
in your hmk): 5.2 p247 5. E and F
share {5, 6, 7} so they are not mutually exclusive 7. S has 12
members and (F or G) = {5, 6, 7, 8, 9, 10, 11, 12} so P(F or G) = 8/12 = 2/3, or P(F or G) = P(F) + P(G) – P(F and G) = 5/12 + 4/12 – 1/12 = 8/12 or 2/3 9. E and G do
not share any numbers, so they are mutually exclusive 13. P(E or F) =
P(E) +P(F) – P(E and F) = 0.25 +0.45 – 0.15 = 0.55 15. P(E or F) =
P(E) + P(F) = 0.25 + 0.45 = 0.70 19. P(E or F) =
P(E) +P(F) – P(E and F) so 0.85 = 0.60 +P(F) – 0.05 and solving
for P(F), we
get 0.85 – 0.55 or 0.30. 31. a. P(heart
or club) = P(heart) + P(club) = 13/52 + 13/52 = 26/52 = 0.50 b. P(heart or club
or diamond) = P(heart) + P(club) +P(diamond)
= 13/52 + 13/52 +13/52 = 39/52 = 0.75 c. P(heart or ace)
= P(heart) + P(ace) – P(heart and ace) = 13/52 + 4/52 – 1/52 =
16/52 43. a. P(satisfied) =
231/375 b. P(junior) = 94/375 c. P(satisfied and junior) =
64/375 from the intersection of the row and column in the table. d. P(satisfied or junior) =
P(satisfied) + P(junior) – P(satisfied and junior) = 231/375
+ 94/375 – 64/375 = 261/375 5.4 p262 3. P(E and F) = P(E)*P(F
given E) so 0.6 = (0.8)(P(F given E)) so P(F given E) = 0.6/0.8 = 0.75 13. use mult rule but now
in word problem form! P(cloudy and rainy) = P(cloudy)*P(rainy
given cloudy) 0.21 = (0.37)(P(rainy given cloudy)) so
P(rainy given cloudy) = 0.21/0.37 = 0.57 15. P(16/17 and white) =
P(16/17)*P(white given 16/17) 0.062 = (0.084)( P(white given 16/17))
so P(white given 16/17) = 0.062/0.084 = 0.74 17. a. P(no given <18) =
8661/78676 = 0.11 b. P(<18 given no) = 8661/46993 =
0.18 Homework due Thursday 10/06: 5.2 p245 #8, 14, 20, 32, 44 5.4 p262 #4, 8, 14, 16, 18 |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 9/29 |
LECTURE: Last time and today before
the test, we talked about the beginnings of probability in Ch. 5 with the
classical and empirical methods of computing probabilities in 5.1. We looked
at finding probabilities from tables from 5.2, but today, I did not have
enough time to explain the addition rule properly. I only showed you the
addition rule for disjoint events, which works for some tables, but for the
table I used in class, I needed to show you the general addition rule which
takes into account "double-counting" in each set being added. So I am
omitting the part requiring the general rule from the last assigned problem below
and will pick up with it on Tuesday. READING IN THE TEXT: 5.1 p223-227 up to but not
including example 5. For next time: 5.2 p238-243
to the end of example 4. EXAMPLES (to guide you in
your hmk): 5.1 p233 (using the set-up for probabilities on
p227) 13. cannot have
a negative probability 31. P(sports) =
288/500 = 0.576 33. a. P(red) =
40/100 = 0.40 b. P(purple) =
25/100 = 0.25 39.
118+249+345+716+3093 = 4521 never 125/4521 = 0.026, rarely 249/ 4521=0.068,
sometimes 345/4521 =0.116, most 716/4521 =0.263, always 3093/4521=0.527 49. a. P(right)
= 24/73 = about 0.33 b. P(left) = 2/73 =
about 0.03 c. yes,
only 3% of the time 5.2 p247 25. using the
addition rule for disjoint events (add the probabilities): a. they all add to
1 b. gun or knife =
0.671 +0.126 = 0.797 c. 0.126 + 0.044 +
0.010 = 0.180 d. 1 – 0.671
= 0.329 e. yes, they only
happen 1% of the time 43. a. P(satisfied) =
231/375 b. P(junior) = 94/375 c. P(satisfied and junior) = 64/375
from the intersection of the row and column in the table. Part d requires the general
addition rule, which we will talk about on Tuesday: d. P(satisfied or junior) =
P(satisfied) + P(junior) – P(satisfied and junior) = 231/375 +
94/375 – 64/375 = 261/375 Homework due Tuesday 10/04: Do 5.1 p233
#12, 14, 32, 34, 40, 52 Do 5.2 p247
#26, 42abc (skip part d which would require the general addition rule). |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 9/27 |
As promised, leftover
quizzes that were not picked up in class last week and the current quiz from
today are all in a folder outside of my office BH 269. Pick them up if they
will help you study. Please clip them back after you take yours. I removed item #11
from the first version of the test format since we need to spend more quality
time with it before testing (but we started to go over that material today).
Revised format follows:
TEST #2 FORMAT
Formulas to be provided:
3.2 p133-136 formula for std. deviations of samples
3.2 p139 the numbers 68/95/99.7 from the empirical rule, Chebyshev's inequality
3.4 p155 z scores for populations
3.4 p160 lower and upper fences
Questions:
1. short answer questions about means of samples and populations, medians, modes, and distribution shape, resistance to skewing, meaning of standardization, and measures to report (see text 3.1 p118/119 ex1, p122 definition and table 4, p129 #24,42, 3.2 p142 #8 and 3.4 p155 definition, p159 summary table, p163 #30).
2. short answer like 3.1 p126 #18 match pictures of distributions with table data.
3. like 3.1 p125 #16 find the mean, median and mode.
4. like 3.2 p142 #11-16,20 where you are given a set of data and asked to find the mean and std. deviation of a sample (using deviation formulas, not computational formula).
5. like 3.2 p144/145 #35-38 plus supplemental parts, using the empirical rule to find areas.
6. like 3.2 p145 #39-40 use of Chebyshev's inequality.
7. short answer comparison of areas to left/right of z values (already standardized) to tell which is bigger.
8. like 3.4 p161 #9-14 compare z scores and relative placement.
9. like 3.4 p162 #22 and 3.5 p170 #12 to find quartiles, IQR, fences, outliers and make a box plot.
10. like 5.1 p233/234 #17, 18 (part of today's material).
HOMEWORK is to study
for the test. LECTURE: One problem from the
hmk that we did not go over in class: Ch 3 review p173 #2 a. sample mean: add
up the table values 91610 then divide by 9 to get 10178.89 median after putting
data in order = 9980 (4 larger values and 4 smaller values) b. range = 14050
– 5500 = 8550 sample std. deviation
from the computing formula: sum of x = 91610 and
sum of squared x = 1008129252 Sxx = 1008129252
– (91610squared / 9) = 1008129252 – 932488011.11 = 75641240.89 s = squareroot of
(75641240.89 / (9 – 1)) = 3074.92 Q1 = avg of 7200 and
7889 = 7544.50 Q3 = avg of 12999 and
13999 = 13499 IQR = Q3 – Q1 =
13499 – 7544.5 = 5954.50 c. new table sum is
91610 + 27000 = 118610 so new mean is 13178.89 but median same range = 41050 –
5500 = 35550 sample std. deviation
from the computing formula: sum of x = 118610 and
sum of xsquared = 2495829252 Sxx = 2495829252
– (118610squared / 9) = 932681240.90 s = squareroot of
(932681240.90 / (9 – 1)) = 10797.46 IQR is same Conclusions: the
mean, std. deviation and range changed fairly dramatically, but the median
and the IQR stayed the same, meaning that they are "resistant" to outliers. We also started
looking at taking probabilities from tables of values. We will look again at
this before your test and I will put some notes and hmk up after the test, so
please check back. |
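Optional: the computational formula used in the Ch3 review #2 solution above is easy to check in Python once you have the two sums.

from math import sqrt

n = 9
sum_x = 91610            # sum of the data values
sum_x2 = 1008129252      # sum of the squared data values

Sxx = sum_x2 - sum_x**2 / n
s = sqrt(Sxx / (n - 1))
print(round(s, 2))       # about 3074.92, matching the hand calculation above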
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 9/22 |
ANSWERS to previous hmk
problems for your reference: 3.2 p145 #38 a. 68% between 4 –
0.007 = 3.993 and 4 + 0.007 = 4.007 b. these values are 2 std.
deviations from the mean, so 95% c. the area outside of the
area in b would be 1 – 0.95 = 0.05 or 5% d. the area between 4 and
4.007 is 0.68/2 = 0.34 and the area between 4 and 4.021 is 0.997/2 = 0.4985,
so 0.4985 – 0.34 = 0.1585 or 15.85% extra parts e. the % > 4.007 is 0.50
– 0.34 = 0.16 or 16% f. the % greater than 3.993
is 0.50 + 0.34 = 0.84 or 84% 3.2 p145 #40 a. k = 2 so (1 –
1/2squared)100% = (1 – 0.25)100% = 75% b. k = 1.5 so (1 –
1/(1.5squared))100% = 55.56% and 27.3 – 1.5(8.1) =
15.15 and 27.3 + 1.5(8.1)
= 39.45 c. 27.3 – 3(8.1) = 3
and 27.3 + 3(8.1) = 51.6 so (1 – 1/3squared)100% = (1 – 1/9)100% = (8/9)100%
= 88.89% ANSWERS to the last
in-class exercise from T 9/20: 4. Dr. Jin's patient z = (190 – 200)/10.5 = -0.95 and Dr. Brown's patient
z = (169 – 177)/10.9 =
-0.73 and all of the patients in
both of these distributions are high BP patients, so it is better to be as
low in the curve as possible: Dr. Jin's patient is better off relative to his
group. LECTURE: We finished looking
at material in section 3.4 comparing z scores and interquartile range and 3.5
boxplots. ANSWER to the box plot
example in class: Data 3 5 15 32 34 36 40 42
43 45 48 52 67 75 a. Q2 = (40+42)/2 = 41 Q1 = 32 Q3 = 48 IQR = Q3 – Q1 = 48
– 32 = 16 Left fence = 32 –
1.5(16) = 32 – 24 = 8 (so
outliers on the left are 3 and 5) Right fence = 48 + 1.5(16)
= 48 + 24 = 72 (so outlier on the right is 75) Number line below box plot
shows min, Q1, M, Q3, max:
Data appears to be skewed
left. MORE EXAMPLES (to guide you
in your hmk!): 3.4 p162 #21 The mean of this sample data is 3.99 and std. dev.
is 1.78. Notice they put the data in order by columns, so you do not need to
list it again! a. z = (0.97 - 3.99)/1.78 =
–1.70 b. Q2 = (3.97+4)/2 =
3.99 Q1 =
(2.47+2.78)/2 = 2.63
Q3 = (5.22+5.50)/2 = 5.36 IQR = Q3 – Q1 = 5.36
– 2.63 = 2.73 Left fence = 2.63 –
1.5(2.73) = –1.47 Right fence = 5.36 +
1.5(2.73) = 9.46 so no outliers (data values
outside the "fence"). 3.5 p170 #11 The data in order are: 0.598, 0.600, 0.600, 0.601,
0.602, 0.603, 0.605, 0.605, 0.605, 0.606, 0.607, 0.607, 0.608, 0.608, 0.608,
0.608, 0.608, 0.609, 0.610, 0.610, 0.610, 0.610, 0.611, 0.611, 0.612. a. Q2 = 0.608 Q1 = (0.603+0.605)/2 = 0.604 Q3 = (0.610+0.610)/2 = 0.610 IQR = Q3 – Q1 = 0.610 – 0.604 = 0.006 Left fence = 0.604 – 1.5(0.006) = 0.595 (so no outliers on the left) Right fence = 0.610 + 1.5(0.006) = 0.619 (no outliers on the right either) Number line below box plot
shows min, Q1, M, Q3, max:
Data appears to be skewed
left. READING IN THE TEXT: 3.4 p157-160 percentiles,
quartiles, and outliers discussion. 3.5 p164-167 read about how
to construct a box plot from the quartiles in section 3.4 (see blue box on
p165) and how the box plot gives a nice visual of data that is easier to
construct than a histogram (see pictures on p167). Homework due Tuesday 09/27: 1. 3.4 p162 #20, 2. 3.4 p162 #22 (given that mean of sample is 10.08 and std. dev. of
sample is 1.89) 3. 3.5 p169 #6, 4. 3.5 p170 #12 (data in order 1.01, 1.34, 1.40, 1.44, 1.47, 1.53,
1.61, 1.64, 1.67, 2.07, 2.08, 2.09, 2.12, 2.21, 2.34, 2.38, 2.39, 2.64, 2.67,
2.68, 2.87, 3.44, 3.65, 3.86, 5.22, 6.81) 5. Ch3 Review p173 #2
TEST #2 FORMAT
Formulas to be provided:
3.2 p133-136 formula for std. deviations of samples
3.2 p139 the numbers 68/95/99.7 from the empirical rule, Chebyshev's inequality
3.4 p155 z scores for populations
3.4 p160 lower and upper fences
Questions:
1. Short answer questions about means of samples and populations, medians, modes, and distribution shape, resistance to skewing, meaning of standardization, and measures to report (see text 3.1 p118/119 ex1, p122 definition and table 4, p129 #24,42, 3.2 p142 #8 and 3.4 p155 definition, p159 summary table, p163 #30).
2. short answer like 3.1 p126 #18 match pictures of distributions with table data.
3. like 3.1 p125 #16 find the mean, median and mode.
4. like 3.2 p142 #11-16,20 where you are given a set of data and asked to find the mean and std. deviation of a sample (using deviation formulas, not computational formula).
5. like 3.2 p144/145 #35-38 plus supplemental parts, using the empirical rule to find areas.
6. like 3.2 p145 #39-40 use of Chebyshev's inequality.
7. short answer comparison of areas to left/right of z values (already standardized) to tell which is bigger.
8. like 3.4 p161 #9-14 compare z scores and relative placement.
9. like 3.4 p162 #22 and 3.5 p170 #12 to find quartiles, IQR, fences, outliers and make a box plot.
10. like 5.1 p233/234 #18, 34 (part of Tuesday's material).
11. like 5.2 p248 #42abc, 44abc to find probabilities from a table of data (part of Tuesday's material). |
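Optional: here is a short Python version of the quartile/fence procedure used in the box plot example above (quartiles as medians of the lower and upper halves, which is the method from class; software that uses a different quartile rule may give slightly different values).

from statistics import median

data = sorted([3, 5, 15, 32, 34, 36, 40, 42, 43, 45, 48, 52, 67, 75])
n = len(data)

q1 = median(data[: n // 2])          # median of the lower half -> 32
q2 = median(data)                    # 41
q3 = median(data[(n + 1) // 2 :])    # median of the upper half -> 48
iqr = q3 - q1                        # 16

left_fence, right_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr         # 8 and 72
print([x for x in data if x < left_fence or x > right_fence])    # outliers: [3, 5, 75]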
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 9/20 |
LECTURE: We started looking at how we
can put the mean and standard deviation to work now that we know how to
calculate them. Overall, we want to be able to make any normal distribution
have the same center (mean) of 0 and same spread (std. deviation) of 1. Once
distributions are on the same scale, you can compare them. READING IN THE TEXT: 3.4 p155 (thru example 1)
z-score definition and formula (note that z values measure the number of std.
deviations that an x data value lies from the mean). We also talked about making
area comparisons without being able to look up the areas on a table yet. The
material from 7.1 p330-332 approximates this discussion. Look at how z
calculations are found, given the mean and std. deviation, then look at
figure 10 to see a picture of how the original x data is transformed to z
data that is centered at 0 and has std. deviation of 1. Once areas for
different distributions are on the same scale, you can compare them (i.e., is
one area contained within the other on a shared graph? The area to the left
of z = 2 is larger that the area to the left of z = 1). Homework due Thursday 09/22: (I am putting the points each
problem will be worth to emphasize that you should not just skip the last
problem – there is a lot to do in it!) 1. (2pts)Which is larger, the area associated with values
less than 55 for a distribution with a mean of 80 and std. deviation of 10,
or the area associated with values less than 50 for a distribution with a
mean of 80 and a std. deviation of 15? 2. (2pts) 3.4 p161 #12 3. (2pts) 3.4 p161 #14 4. (4pts) 3.4 p163 #30 (see table of data from 3.1 p127 #24
and 3.2 p143 #26) a. Given the info from those previous problems that the population mean
of the data in the table is 26.4 and the population std. deviation of the
data in the table is 12.8, find the z-scores for each data point in the table
(you should end up with 9 z scores). b. Find the mean of the 9 z-scores from part a. (sum the + and –
values as they are). c. Find the std. deviation of the 9 z-scores from part a. using the
computational formula for population std. deviation as in table 10 on p134
(remember to divide by N, not n-1). |
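Optional: the standardizing formula is a one-liner in Python; the numbers below are made up just to illustrate it.

def z_score(x, mean, std_dev):
    # number of standard deviations that x lies from the mean
    return (x - mean) / std_dev

# Hypothetical example: a score of 85 on a test with mean 70 and std. deviation 10
print(z_score(85, 70, 10))   # 1.5, i.e. 1.5 standard deviations above the mean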
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 9/15 |
In-class exercise on
Empirical Rule p139: For a normal distribution
with mean 75 and std. deviation 3, a. what are the values that
mark 1, 2, and 3 std. deviations on either side of the mean? 1 std. deviation: 75 – 3 = 72 and 75+3 = 78 2 std. deviations: 75 – 2(3) = 69 and 75 +2(3)
= 81 3 std. deviations: 75 – 3(3) = 66 and 75 +
3(3) = 84 b. what is the area under
the curve for values between 72 and 81? between 72 and 75 lies half of the 0.68 for one std.
deviation, or 0.34 and between 75 and 81 lies half of the 0.95 for two std.
deviations, or 0.475, so the area between 72 and 81 is 0.34 + 0.475 =
0.815 (81.5%) c. what is the area under
the curve for values between 69 and 72? between 69 and 75 lies half of the 0.95 for two std.
deviations, or 0.475 and between 72 and 75 lies half of the 0.68 for one std.
deviation, or 0.34, and the area between 69 and 75 contains the area between
72 and 75, so we must subtract the smaller area from the larger
area to find the area in between: 0.475 – 0.34 = 0.135 MORE EXAMPLES: Suppose a distribution has
a mean of 120 and a std. deviation of 25: a. what values are 1 std.
deviation from the mean (z = -1 and z = 1)? Answer: 120 – 25 = 95
and 120 + 25 = 145 b. what values are 2 std.
deviation from the mean (z = -2 and z = 2)? Answer: 120 – 2(25) =
70 and 120 + 2(25) = 170 c. what values are 3 std.
deviation from the mean (z = -3 and z = 3)? Answer: 120 – 3(25) =
45 and 120 + 3(25) = 195 d. what % of scores is
greater than 170? Answer: the area between
the mean and z = 2 is 0.95/2 = 0.475, so the area under the curve to the
right of x=170 (or z = 2) is 0.50 – 0.475 = 0.025 e. what % of scores is less
than 170? Answer: from above, 1
– 0.025 = .975 f. what % of scores is
greater than 70? Answer: the area between z
= -2 and the mean is 0.95/2 = 0.475, so the area under the curve to the right
of x=70 (or z = -2) is 0.475 + 0.50 = 0.975
between 70 and 145? Answer: the area between
x=70 (or z= -2) and the mean is 0.95/2 = 0.475, and the area between the mean
and x=145 (or z=1) is 0.68/2 = 0.34, so the area under the curve between 70
and 145 is 0.475 + 0.34 = 0.815 h. what % of scores is
between 170 and 195? Answer: the area between
x=170 (or z= 2) and the mean is 0.95/2 = 0.475, and the area between the mean
and x=195 (or z=3) is 0.997/2 = 0.4985, so the area under the curve between
170 and 195 is 0.4985 – 0.475 = 0.0235 READING IN THE TEXT: From today, read 3.2
p139-141 (including example 8) and read about the Empirical Rule and Chebyshev's
inequality. If you want to read ahead
for next week, we will be skipping 3.3, but will cover 3.4 p155-156 z-score
comparisons 3.4 p157-160 quartiles,
fences, and outliers 3.5 p163-167 box plots and move on to Ch. 5 if we
have the time. Homework due Tuesday 09/20 (I am adding extra questions to 36
and 38 like the examples I included above to give more practice with finding
areas): 3.2 p145 #36 do parts a, b, c, and also do the following: d. what % of scores is less than 743? e. what % of scores is between 287 and 857? f. what
% of scores is between 173 and 743? #38 do parts a, b, c, d, and also do the following: e. what % of bolts is greater than 4.007? f. what
% of bolts is greater than 3.993? #40 (use the formula on p140 and try #39 for practice and check answers
in back of book!) |
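Optional: a tiny Python sketch of the empirical-rule bookkeeping from the examples above (mean 120, std. deviation 25); it only uses the three memorized percentages, not a z table.

mean, sd = 120, 25
within = {1: 0.68, 2: 0.95, 3: 0.997}    # proportion within 1, 2, 3 std. deviations

# d. proportion of scores greater than 170 (2 std. deviations above the mean)
print(0.5 - within[2] / 2)               # about 0.025

# g. proportion of scores between 70 (2 below) and 145 (1 above)
print(within[2] / 2 + within[1] / 2)     # about 0.815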
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 9/13 |
Participation note: In
class, I expect that students need instruction and will give their full attention to the material. I see students texting above and below the table, and I don't like
to police such behavior, but I do make note of it and remind them of what I
have seen when they tell me they do not understand the material or when they
do poorly on tests. I also remember while I am grading. LAST HMK: As we went over hmk in
class, I did not have time to show you the whole frequency tally and
histogram. Here it is for your reference: A class size of 4 will
sufficiently spread out the data. You may choose something else!
A primitive histogram is
all I can do here (yours would have a scale on the x and y axes and would
have bars, not X's):
The mean = 104.1 and the
median = 104. The mean is a good measure of central tendency when the distribution is symmetric, as this one is. IN-CLASS WORK: We worked on the concept of
"skewness", then moved to looking at the two most important numbers that
describe a distribution: measure of center: mean, and
measure of spread: standard
deviation. These numbers will be
extremely important later in Ch.7 and beyond for normal distributions. We will form the sampling
distribution (made up of all possible samples of a certain size that can be
taken from a population), and for large enough sample sizes it will be approximately normal regardless of the shape of the population. Then we will use the mean and std. deviation to make the
distribution into a standard shape and read off areas under this standard
normal curve from a table. I mention this now because in the last homework on
3.1, they started asking you to think about means of different samples taken from
a population, and in this section 3.2, they are asking you to think about the
mean and standard deviation as important for a good reason that you don't
know yet, but that I want you to be anticipating! READING IN THE TEXT: Read p131-138 (skipping empirical
rule and Chebyshev's inequality for now). Be careful that the book
separates variance from standard deviation (standard deviation is the square
root of variance), whereas in class, we did not separate them. Also make note
that the formulas for populations and samples are a little different: --the symbols used for mean and standard deviation: mean of a sample: x-bar; mean of a population: μ (mu); standard deviation of a sample: s; standard deviation of a population: σ (sigma) --the standard deviation
calculations: for samples you divide by
the "correction factor" n-1 for populations you divide
by the whole population size N We did a problem in class
much like the one in example 4 p136, where we computed standard deviation of
a sample in two ways: once using individual deviations and once using the
computational formula (which employs the Sxx we used for finding lines of
best fit). You see that you get the same answer. In example 3 p134, they are
doing the same thing, but to a population instead of a sample, so they do not
subtract 1 from the size. Homework due Thursday 09/15: Most of the problems in
this section have data sets too large for easy hand-calculations, so we will
limit ourselves to the beginning problems with smaller data sets! (I am
asking for only one method for std. deviation in each problem... do both only if
you have the time and the desire!): 3.2 p142 #8 fill in the blanks with vocabulary from the reading #12 find s (sample std. dev.) using a deviations table as in table 11
p136 #16 find (population
std. dev.) using a deviations table as in table 9 p134 #20 find s (sample std. dev.) using the computational formula as in
table 12 p136 |
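Optional: Python's statistics module shows the sample vs. population difference directly (the data set here is made up).

from statistics import mean, stdev, pstdev

data = [4, 8, 6, 5, 3, 4]

print(mean(data))     # 5
print(stdev(data))    # about 1.79 (sample: squared deviations divided by n - 1)
print(pstdev(data))   # about 1.63 (population: divided by N)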
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 9/08 |
Congratulations on
completing 1/5 of the semester (it goes fast, doesn't it?)! Before starting the test, I
introduced the concept of ÒcenterÓ, or average, of a data set. The three
types of centers are the mean (quantitative), the median (quantitative), and
the mode (quantitative or qualitative). The mean is the usual arithmetic average
that you used when you were finding lines of best fit. The median is
the middle value of a set of data in ascending numerical order (for odd
numbers of data it is a data point, but for even numbers of data, find the
mean of the two data points nearest to the center). Some sets you are given
in the exercises are not in order to start with! The mode is the data value that occurs
most frequently. Please read about them and do some homework with regard to
them for Tuesday. We will learn about standard deviations from 3.2 with an
in-class exercise on Tuesday. READING IN THE TEXT: 3.1 p117-125. Some main issues: p118 – In example 1,
notice how the average from a sample of 4 students was 80, whereas the
average of the whole population of 10 students was 79. There are many
possible samples of size 4, and most will yield an average close to 79, but
few, if any, will be exactly 79. This is preparing you to think about the sampling
distribution in chapter 8. p119 – In the second
activity box, look at the mean as the Òcenter of gravityÓ. This concept will
be used in section 3.2 to motivate standard deviation. p120 – Before you
start finding the median, make sure you check step 1: the data must be in
ascending order of smallest to greatest. p122 – Look at figure
7 and read about how the median is resistant to outliers that "skew" the
data. The info in this figure will be asked about on Test #2. Homework due Tuesday 09/13: 3.1 p125 #16, 18, 24, 30,
42 Hints: #18, refer to figure 7
p122, #24b, choose 3 different
groups of 4 people at a time and find a mean for each sample of 4, where the
samples can share people but must have at least one person different from
another sample; don't worry too much about how to choose at random: just
pick some! |
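Optional: the three centers can be checked with Python's statistics module (made-up scores shown here).

from statistics import mean, median, mode

data = [79, 82, 73, 82, 64, 90, 75]

print(mean(data))     # about 77.9
print(median(data))   # 79 (the middle value once the data are sorted)
print(mode(data))     # 82 (the most frequent value)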
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 9/06 |
Reminders: Today is the last day to
drop the course. Withdrawals after today go on your record. Please make sure you have
read the course syllabus at http://www.smccd.edu/accounts/callahanp/syl160F11.html (some questions being asked
are answered there and should not be asked!). Homework is to study for the test (no problems to turn in). Test #1 will occur as
scheduled on Thursday 09/08 and will cover the material from lecture, class
exercises, homework, and quizzes from our first 5 class sessions. You will be provided with
formulas for slope, point-slope, slope-intercept, general form for the
exponential, and the set of formulas for lines of best fit, e.g., Sxx, Syy,
etc. Paper will be provided; you need only bring a standard scientific
calculator and something to write with. You will not be allowed to use
calculator cell phones, PDAs, or other transmission-capable devices. You may
not share one calculator amongst more than one test taker. If you forget your
calculator, see if someone can lend you an extra or let you borrow theirs
after they turn in their test. Below each problem type is a reference to where
in the book you can find more problems to practice. Be prepared to perform
the following: 1. Given a set of linear
scatterplot data, plot the points on a given graph, draw an estimate of the
line of best fit, and estimate the equation of the line using two points. (See 4.2 p195-196) I will give you the
"summation table" for x, y, xy, and x squared for the scatterplot data
already filled out and you will use the equations for Sxx, Syy, etc. to find
the actual line of best fit. (We simplified the calculations
in class with the alternate book formulas. See left sidebar on p198 for
alternate formula and p206 #26 for a manageable set of data on which to use
the formulas for lines) 2. Given a set of
exponential scatterplot data, turn it into a set of linear scatterplot data
using logs, graph the linear data, draw a best-fit estimate and be ready to
estimate the equation of the best line you drew. One or two follow-up
questions may ask you to plug in a given x or y value to your equation and
solve for the other. (Supplemental material from
class: see notes below from 1st week) 3. Given the line of best
fit for some ÒloggedÓ data (x, logy), find the exponential of best fit for
the original data (x, y) by ÒunloggingÓ the slope and y-intercept. (Supplemental material from
class: see notes below from 1st week) 4. Some short-answer
questions on chapter 1.1 and 1.5 reading and definitions, including
statistics, samples and populations, qualitative and quantitative variables,
discrete and continuous variables, and bias in sampling. (See 1.1 p12 #21-36, 45-48
and 1.5 p43 #13-24) 5. Given a set of data,
find/show various parts of the following: relative frequencies, frequency and
relative frequency bar graphs (may include side-by-side comparison of two
sets of data), and why one representation is better than another. (See 2.1 p72 #17-29) 6. Given a small set of
data, construct classes of a given width and starting place and form the
resulting frequency distribution. Be prepared to provide a histogram if asked
for. (See 2.2 p95 #37-40) 7. Answer short questions
regarding use of areas in graphics, and how vertical scaling affects
perception. (See 2.3 p106 #1-7, 10-12)
(skip time-series plots). Today's material from 2.3
and more examples: Today we looked at section
2.3 (graphical misrepresentations of data) to conclude the material for the
test. You can read about this in 2.3 p100-106. Among other considerations,
look for vertical scale manipulation, graphics that are unclear or sized
incorrectly, incorrect use of areas (which are difficult to perceive as they
change in both length and width). We looked at 2.3 p106 #2, 4, 10 (some more
are mentioned below also): #1 The heights of the
podiums (podia?) do not correspond to the numbers they are supposed to
represent. #2 The heights of the
podiums do not look proportional with the numbers, are the graphics supposed
to be included in the perceived heights of the "bars"?, the graphics distract one from focusing on the numbers, what is the y-scale?, why doesn't the beer
have a bar?, the width of the burger gives an unfair visual comparison with
the slenderness of the beer bottle, etc. It is better to let one dimensional
equal-width bars show the comparison and avoid graphics and changing volumes
of pictures. #3 Watch out for the
vertical scale not starting at 0! A graph with 0 included would show a much
less dramatic decrease in the numbers. #4 Same complaint as #3,
causing it to look like there is a much more dramatic increase from one
category to another than there actually is. The figure for 25-35 looks like
it is less than half that for the 35-45 year olds, but comparing the %
values, 10% to 13%, they are not quite as different as they look! #7 Same complaint as #3 and
#4. By starting the scale at 0.1 instead of 0, the 25-35 bar looks to be more
than three times the size of the 45-55 bar. Redraw it starting at 0 and see
what it should really look like! #10 Comparing the numbers,
696.3 is about 93 times larger than 7.5, but volume-wise, the bigger barrel
can only fit maybe 5 or 6 of the smaller barrels put into it. Graphics
involving area change both in width and height (unlike bars that just change
in height) and are very deceptive! #11 The bars do not seem to
obey a set scale. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 9/01 |
I placed 2 weeks of
material here to give you time to get access to the book, but as I advised
you, we have reached the point that I must assume you made timely plans to
gain access to the book. If you do not have the book yet and the bookstore is
on order, you should buy an ebook to have instant access. If you are waiting
for an order to come in, you should have made plans to get the assignments
from someone in class. The Pearson site shows the
book and its features at: http://www.mypearsonstore.com/bookstore/product.asp?isbn=0321641876 and the Coursesmart site
shows where to purchase the ebook: http://www.coursesmart.com/0321644832/?a=1773944 We spent time after the
quiz talking about the vocabulary of Ch.1 in 1.1 and 1.5. Now that you have
the book, you should go back and read these sections to fill in the ideas.
Sometime soon, you should also read the pages in 1.2, 1.3, and 1.4 that I
suggested last time. We may not do concrete problems there, but the material
is important to browse for your general knowledge of statistics. TodayÕs in-class exercise
was intended to show you the similarities and differences between graphical
representations of qualitative and quantitative data along with how to choose
classes. In 2.1, qualitative
data is grouped naturally by the categories that describe the data and is
shown using bar
graphs (where there is space between bars on the x line since this data
can be put in any order and does not suggest that one category picks up where
another left off). In 2.2, quantitative
continuous
data must be grouped into categories, so you choose a class width from a
beginning class limit to divide up the data into manageable and equally-sized
categories. The data is then placed into these classes and the number of data
in each is counted (frequency) and then displayed on a histogram (where the x line is a
number line so there is no space between the bars because the categories must
be ordered according to magnitude). We ran out of time for the
second half of the exercise about continuous data, but you can see the answers in the back of the book since the exercise was problem 2.2 p95 #39. Briefly, 1st part:
classes of width 10 give:
20-29 1
30-39 111111
40-49 1111111111
50-59 11111111111111
60-69 111111
70-79 111
pg 83 has histogram
examples, but you can also see the histogram by turning the tally marks above
90 degrees counterclockwise! 2nd part: classes of width 5 are:
20-24, 25-29, 30-34 etc. and if you draw a histogram
for this frequency distribution, it has the same general shape as the one
from part a above, but spreads the data out more and contains more peaks and
valleys that show more about the original data. As the book points out, there
is no one best way to divide up the data. You pick what you think shows the
spread of the data best. READING IN THE TEXT: 2.2 p78-83 2.2 p88-89 identifying the
shape of a distribution Homework due Tuesday 9/6: 2.2 p91-95 #2, 4, 6, 12,
14, 30, 34, 38 We will work on selected
other topics from sections 2.2, 2.3, and possibly 3.1 on Tuesday if you want
to do some looking ahead. Note that your first test will be on Thursday and
it will cover the material up to today and some of what we do on Tuesday. |
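Optional: grouping data into classes is easy to automate; here is a Python sketch with made-up data and the same choices as the exercise (class width 10, starting at 20).

data = [23, 31, 35, 38, 42, 44, 47, 51, 55, 58, 61, 64, 72]   # made-up values
width, start = 10, 20

counts = {}
for x in data:
    low = start + width * ((x - start) // width)     # lower class limit, e.g. 40 for 42
    counts[(low, low + width - 1)] = counts.get((low, low + width - 1), 0) + 1

for (low, high), freq in sorted(counts.items()):
    print(f"{low}-{high}: {freq}")                   # e.g. 40-49: 3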
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 8/30 |
READING IN THE TEXT: By Thursday, I will assume
that everyone has obtained access to the text and can do some reading. Until
then, I am putting some definitions here from the text and will ask you to
try some exercises involving identifications with these definitions. We will
talk more about them as we go over the homework in class. The in-class
exercise today was about bar graphs (the material from 2.1) and one of the
homework problems at the bottom is from there. 1.1 p3-8, noting
definitions especially: A sample is a subset of the population
being studied, and we make inferences about the population based on that sample. Variables are the characteristics of the individuals within
the population. Qualitative, or categorical variables allow for classification
of individuals based on some attribute or characteristic. Quantitative
variables provide numerical measures of individuals, where arithmetic
operations can be performed and provide meaningful results. Examples: If we consider ÒgenderÓ as a variable, we would say
it is qualitative because the categories of male and female are not
numerical measures. If we consider ÒtemperatureÓ as a variable, we would say
it is quantitative because we can measure temperature numerically and
perform arithmetic operations on the values. But be careful! If we consider
Òzip codeÓ as a variable, it may look like it is quantitative because you see
numbers, but actually, it is qualitative since the numbers just helps
to categorize locations and you would not perform arithmetic operations on
them (such as adding two zip codes together, which would have no meaning). A discrete variable is a quantitative
variable that has either a finite number of possible values or a countable
number of possible values, where countable means that the values result from
counting, such as 0, 1, 2, etc. A continuous variable is a quantitative
variable that has an infinite number of possible values that result from making
measurements. Examples: The number of cars that go through a fast-food line
is discrete because it results from counting, but the number of miles a car
can travel with a full tank of gas is continuous because the distance would
have to be measured. To read when you get your
book (no homework problems on these): 1.2 p16-17, about
observational and designed experiments 1.3 p23-26 up to ex. 3,
about simple random sampling 1.4 p30-35 about sampling
methods 1.5 p38-42 about bias in
sampling: sampling bias uses a technique that favors one part of a
population over another, undercoverage (flawed sampling
technique) causes a segment to not
be fully representative of the whole population, nonresponse of sample subjects to surveys causes error that may
or may not be minimized by callbacks and incentives. Response bias can result from respondents not feeling comfortable
with interviewers, respondents misrepresenting facts or lying, and questions
that are leading in the way they are phrased (poorly worded questions). Example: A policeman asks students in a classroom to fill
out a survey involving whether they have used drugs and what kinds they have
used. Anonymous discussion of the results will follow. Response bias
could occur if the students feel uncomfortable giving this information to a
policeman, or if students misrepresent facts because they either don't want to
face their problems or want to appear cool to their friends. 2.1 p63-66 and example 6
p68-69 about frequency and relative frequency, bar graphs, and pie charts: A frequency distribution lists each
category of data and the number of occurrences for each category of data. The relative frequency is the percent of
observations within a category and is found by dividing the category
frequency by the sum of all the frequencies in the table. A bar graph has categories labeled on the
horizontal axis and frequencies on the vertical axis, with bars extending
from the horizontal axis to the height that is the frequency and where bars
are usually not touching but are of the same width. A side-by-side bar graph can be used to
compare data sets and should use relative frequencies to ensure that the sets
are being measured on the same scale, where bars being compared from the same
category usually have no space between them but space is still left between
different categories. Homework due Thursday 9/1: 1.1 p12 #26 Consider "assessed value of a house" as a variable. From the definitions and examples above,
would it be called qualitative or quantitative? #28 Consider Òstudent ID
numberÓ as a variable. From the definitions and examples above, would it be called
qualitative or quantitative? #32 Referring to the
definitions and examples above, is the quantitative variable Ònumber of
sequoia trees in a randomly selected acre of YosemiteÓ discrete or
continuous? #34 Referring to the
definitions and examples above, is the quantitative variable ÒInternet
connection speed in kilobytes per secondÓ discrete or continuous? #36 Referring to the
definitions and examples above, is the quantitative variable ÒAir pressure in
pounds per sq. inch in a tireÓ discrete or continuous? 1.5 p43 Consider the type of possible
bias for each of the following: #14 The village of Oak Lawn wishes to
conduct a study regarding the income level of all households within the
village. The manager selects 10 homes in the southwest corner of the village
and sends out an interviewer to the homes to determine income. #16 Suppose you are conducting a survey
regarding the sleeping habits of students. From a list of registered
students, you obtain a simple random sample of 150 students. One survey question
is "how much sleep do you get?". #18 An ice cream chain is considering opening a new store in O'Fallon. Before opening, the company
would like to know the percentage of homes there that regularly visit an ice
cream shop, so the researcher obtains a list of homes and randomly selects
150 to send questionnaires to. Of those mailed out, 4 are returned. 2.1 p72 #20 A survey asked 770 adults who used the internet how often they participated in online auctions. The responses were as follows: frequently 54, occasionally 123, rarely 131, never 462. a. construct a relative
frequency distribution. b. what proportion never
participate? c. construct a frequency
bar graph. d. construct a relative
frequency bar graph. 2.1 p72 #22 A survey of U.S. adults
in 2003 and 2007 asked "Which of the following describes how spam affects your life on the Internet?"
Feeling          2003   2007
Big problem       373    269
Just annoying     850    761
No problem        239    418
Don't know         15     45
a/b. Construct the relative
frequency distributions for 2003 and 2007. c. Construct a side-by-side
relative frequency bar graph. d. Compare each year's
feelings and make some conjectures about the reasons for similarities and
differences. |
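(Here is the Python sketch of the relative frequency calculation promised above. It is only an illustration: the category names and counts are made up, not the survey data from the homework, and the division is exactly "category frequency divided by the sum of all frequencies".)

# A minimal sketch of a relative frequency distribution (hypothetical counts).
counts = {"agree": 30, "neutral": 45, "disagree": 25}

total = sum(counts.values())      # sum of all the frequencies in the table

for category, freq in counts.items():
    rel = freq / total            # relative frequency = category frequency / total
    print(f"{category:10s} frequency = {freq:3d}   relative frequency = {rel:.3f}")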
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th 08/25 |
Supplementary notes, followed by
homework: Finding the exponential equation of
best fit for a scatterplot of seemingly exponential data: This exponential part is in the more expensive version of this
text, but should not be skipped, so I will supplement your version with notes
in class and here. If you have a scatterplot of linear data, you saw in class last time that it was relatively easy and accurate to estimate the line of best fit from a graph and also to find the best fit line using the equations from last time. However, if you have a scatterplot of data that is best described by an exponential curve, it is difficult to draw a good curve and you wouldn't know how to find its equation because it does not have a constant slope (i.e., you could not take two points and use the slope formula or point-slope form!).
But if you take the logarithm of each y value in the exponential data (leaving the x values the same), that is, turn (x, y) into (x, log y) in the table, you will have transformed it into linear data! From here, you could use the equations for the line of best fit to find y = mx + b for the "logged" data (x, log y). Then you can "unlog" the slope m and y-intercept b to find the "a" and "b" in y = b(a)^x for the original data.
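(A quick numerical illustration of this transformation, not from the text: the data below are hypothetical, generated from y = 5(2)^x, and the point is that logging the y values of exponential data produces values that rise by a constant amount, i.e., linear data.)

import math

# Hypothetical, perfectly exponential data: y = 5 * 2**x
xs = [0, 1, 2, 3, 4]
ys = [5 * 2 ** x for x in xs]
log_ys = [math.log10(y) for y in ys]

# Differences of consecutive y values are NOT constant (the data is exponential)...
print([round(ys[i + 1] - ys[i], 3) for i in range(len(ys) - 1)])
# ...but differences of consecutive log y values ARE constant (log10 of 2, about 0.301),
# so the transformed points (x, log y) lie on a straight line.
print([round(log_ys[i + 1] - log_ys[i], 3) for i in range(len(log_ys) - 1)])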
We did an example of this process in class. Here is another example, but with data that is not perfectly exponential as it was in class: The following set of data is best represented with an exponential relationship y = b(a)^x, but since it is not perfectly exponential, we cannot write the equation from the table values.
It is difficult to estimate an exponential scatterplot relationship, but it can be turned into a linear relationship by taking the logarithm of the y values (graph it if you don't believe it; unfortunately, I cannot show the graphing part here!).
Now we can find the best fit line for this (x, log y) linear scatterplot by making a summations table with Σx, Σ(log y), Σx², and Σ(x·log y), and using the standard deviation calculations (Sxx, Sxy, etc.) for finding the best fit line.
x̄ = 8/3 = 2.67 and ȳ = 7.09/3 = 2.36 (these are the means of x and of log y).
Sxx = 26 - (64/3) = 4.67
Sxy = 18.26 - [(8)(7.09)/3] = -0.65
slope = -0.65/4.67 = -0.14
y-intercept = 2.36 - (-0.14)(2.67) = 2.73
So the best fit line for the logged data is y = -0.14x + 2.73
To find the best fit exponential for the original data, "unlog" the slope and y-intercept of the line above: raise 10 to the power of each separately and then write the equation for the exponential of best fit for the original table data (x, y). a = 10^slope = 10^(-0.14) = 0.72 and b = 10^(y-intercept) = 10^2.73 = 537.03. So the best fit exponential for the original data is y = b(a)^x = 537.03(0.72)^x.
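(The whole recipe can also be packaged as a short Python sketch. This is my own illustration, not from the text: the function name exp_fit and the data values are hypothetical, since the table from the notes is not reproduced here, but the steps are the ones above: log the y values, fit the line with Sxx and Sxy, then unlog the slope and y-intercept.)

import math

def exp_fit(xs, ys):
    # Fit y = b * a**x by the method in the notes:
    # 1) replace each y with log10(y),
    # 2) find the least squares line for (x, log y) using Sxx and Sxy,
    # 3) "unlog" the slope and intercept with powers of 10.
    n = len(xs)
    log_ys = [math.log10(y) for y in ys]
    sum_x = sum(xs)
    sum_ly = sum(log_ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xly = sum(x * ly for x, ly in zip(xs, log_ys))
    sxx = sum_x2 - sum_x ** 2 / n
    sxy = sum_xly - sum_x * sum_ly / n
    slope = sxy / sxx
    intercept = sum_ly / n - slope * (sum_x / n)
    return 10 ** slope, 10 ** intercept        # a, b

# Hypothetical, roughly exponential data (NOT the table from the notes):
a, b = exp_fit([0, 1, 2, 3], [500.0, 380.0, 270.0, 190.0])
print(f"best fit exponential: y = {b:.2f} * ({a:.2f})**x")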
(Check your answer: does plugging x = 1 into your best fit exponential give you something close to the original table value of 398.11? It shouldn't be exact because the original data was not perfectly exponential, but it should be in the ballpark! Same for the other two points.) Notice that this is a decreasing
exponential relationship. For decreasing exponential relationships, a<1
and for increasing exponential relationships, a>1. For decreasing linear
relationships y=mx+b, m is negative and for increasing ones, m is positive. Linear
and exponential patterns (not in book): A linear relationship y=mx+b is built by repeated addition. We add positive numbers for an increasing line (positive slope) or negative numbers for a decreasing line (negative slope). An exponential relationship
y = b(a)^x is built by repeated multiplication. The a in the exponential is the amount by which we multiply each time ("a" contains the rate of increase or decrease since a = 1 + r
or 1-r, where r is the rate). As with lines, b is the y-coordinate of the
y-intercept, which is the point that has x = 0. Intercepts are included in
each of the following example tables. The following are some tables of data to illustrate what sets of linear and exponential data look like and how their equations are written:
is a decreasing linear set of data because you are adding -3 each time, so y=-3x+12. (Verify that the points in the table lie on this line by plugging the values in and checking them).
is an increasing linear set of data because you are adding +7 each time, so y = 7x + 20.
is an increasing exponential set of data because you are multiplying by 1.5 each time, so y = 50(1.5)^x.
is a decreasing exponential set of data because you are multiplying by 0.9 each time, so y = 250(0.9)^x. Brief practice solving for variables in linear and exponential problems from Algebra: Given the equation of a
line y = 5x + 7: If x = 4 is given, we can solve for y: y = 5(4) + 7 = 27. If y = 9 is given, we can solve for x. Since 9 = 5x + 7, subtract 7 from both sides to get 2 = 5x. Then divide both sides by 5 to solve for x: 2/5 = x. Given the equation of an exponential y = 12(5)^x: If x = 3, then we can solve for y: y = 12(5)^3
= 12(125) = 1500. If y = 24, then we can
solve for x, but it involves logarithms to rescue x from being an exponent:
24 = 12(5)^x. Divide both sides by 12 to get 2 = (5)^x. If you take the logarithm of both sides of the equation, you get log 2 = log (5)^x. Properties of logs give you log 2 = x log 5. So to solve for x, divide both sides by log 5 to get x = log 2 / log 5, which by calculator is about 0.43. (A short Python check of this kind of calculation appears after the homework below.) Homework (due Tuesday 08/30): 1. Treat the following table data as forming a linear scatterplot:
a. Sketch the points on a hand-drawn graph (just on binder paper), draw what
you think is the line of best fit, and write the equation of your line. b. Fill out a summation table and find the line of best fit using the
equations Sxx, Sxy, etc. 2. Treat the following table data as forming an exponential scatterplot
(not perfectly exponential, but best described by an exponential function).
Refer to the example given above in the notes:
a. Take the original (x, y) values in the table and make a new table (x, log y). That is, find log 25, log 34.12, etc. b. Find the line of best fit for the values in the (x, log y) table using Sxx and Sxy. Hint: you should get the following summations to plug in (find and check them for yourself): Σx = 10.7, Σy = 9.66, Σxy = 23.33, Σx² = 32.49. (For the y summation, notice that you are not summing the original y values to get 619.35; you sum the logged y values to get the y summation of 9.66!) c. "Unlog" the slope m and the y-intercept b from the best fit line for the (x, log y) data in part b to get the "a and b" that are the components of the best fit exponential y = b(a)^x for the original (x, y) data, using a = 10^slope and b = 10^(y-intercept). Does this equation look like it describes the data well? Compare with a graph of the original data. d. Use the equation for the best fit exponential from part c to estimate the value of y when x is 2.5. e. Use the equation for the best fit exponential from part c to estimate the value of x when y is 300. |
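(Here is the short Python check mentioned above for the "brief practice" algebra; it is only an illustration, not part of the homework. It solves y = 5x + 7 for x when y = 9, and 12(5)^x = 24 for x using logarithms, matching the hand calculations in the notes.)

import math

# Linear: y = 5x + 7.  Given y = 9, solve for x.
m, b = 5, 7
y = 9
x = (y - b) / m                               # subtract 7, then divide by 5
print("linear solve:", x)                     # 0.4, i.e., 2/5

# Exponential: y = 12 * 5**x.  Given y = 24, solve for x with logarithms.
y = 24
x = math.log10(y / 12) / math.log10(5)        # log 2 / log 5
print("exponential solve:", round(x, 2))      # about 0.43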
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T 08/23 |
We worked on linear scatterplots. Some supplementary notes follow since I won't assume you have the book until next week. Reading in the text: --Read 4.1 objective 1 "Draw and interpret scatter diagrams" --Read 4.2 example 1 and objective 1 "find the least squares regression line..." and note that I am using the alternate form of the equations in the footnotes. Supplementary notes on lines of best fit: Given several data points (x, y), you fill out the table below (the data points' coordinates are the x and y values; the symbol Σ means add them up). For example, for the data points (2,7) and (4,8) we know that the slope of the line thru them is 1/2 = 0.5, so that y - 7 = 0.5(x - 2), so that y = 0.5x + 6 is the equation of the line thru them, that is, a line with slope 0.5 and y-intercept 6. Now let us use the equations for the line of best fit: Set up a table with the following quantities and sum them up
(n = 2 in this short example for the 2 data points given): x̄ = 6/2 = 3 and ȳ = 15/2 = 7.5. After having done this with all of the given data points, plug all of these numbers into the formulas for the line of best fit (which are below):
Sxx = Σx² - ((Σx)²/n) = 20 - (36/2) = 2
Sxy = Σxy - ((Σx)(Σy)/n) = 46 - [(6)(15)/2] = 46 - 45 = 1
Slope of best line = Sxy/Sxx = 1/2 = 0.5
Y-intercept of best line = ȳ - (slope)(x̄) = 7.5 - (0.5)(3) = 7.5 - 1.5 = 6
The best fit line is then y = 0.5x + 6 (which matches the equation found at the beginning exactly, because 2 points make a line, not a scatterplot!)
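(For anyone who wants to check the arithmetic by computer, here is a minimal Python sketch of these Sxx and Sxy formulas; the function name best_fit_line is my own, not from the text. On the two points (2,7) and (4,8) it reproduces y = 0.5x + 6, and on the four points of the next example it gives y = -2x + 11.5.)

def best_fit_line(points):
    # Least squares line using the summation formulas from the notes:
    # Sxx = sum(x^2) - (sum x)^2 / n,  Sxy = sum(xy) - (sum x)(sum y) / n
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    sxx = sum_x2 - sum_x ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    slope = sxy / sxx
    intercept = sum_y / n - slope * (sum_x / n)   # y-bar minus slope times x-bar
    return slope, intercept

print(best_fit_line([(2, 7), (4, 8)]))                    # (0.5, 6.0)
print(best_fit_line([(1, 9), (2, 8), (3, 6), (4, 3)]))    # (-2.0, 11.5)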
Another example: For the data pts (1,9), (2,8), (3,6), and (4,3):
Note that n = 4 is the number of data points: x̄ = 10/4 = 2.5 and ȳ = 26/4 = 6.5. Using the formulas above,
Sxx = 30 - (100/4) = 30 - 25 = 5
Sxy = 55 - [(10)(26)/4] = 55 - 65 = -10
Slope = -10/5 = -2
Y-intercept = 6.5 - (-2)(2.5) = 6.5 + 5 = 11.5
The best fit line is then
y= -2x+11.5 |