ISED 160 NOTES
ABOUT HOMEWORK, ANNOUNCEMENTS, ETC.
Assigned
on: |
BE SURE TO
CLICK ON RELOAD/REFRESH ON YOUR COMPUTER
OR THE CURRENT ADDITIONS TO THE PAGE MAY NOT APPEAR! You may also not see current pages if your computer does not have an up-to-date browser ... download a new version or use a library/lab computer. Scroll down
as new assignments are added to the old. New assignments are generally posted
by 2:00 pm of the lecture day unless otherwise noted. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Most current
work is listed first, followed by previous entries: |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
5/14 |
Review: z tests use row z on the t-table; t tests use the row df = n-1.

Classical method for z tests: If you have a one-sided z test with alpha = 0.01, n = 35 and sample z = 2.45, match the bottom row z (the row labeled z at the left) with the 0.01 column to find the critical z value of 2.326. Since 2.45 > 2.326, reject Ho. If it had been a two-sided test, take half of the alpha (0.01/2 = 0.005) to put in each tail and look at the 0.005 column in row z to find 2.576. Since 2.45 < 2.576, accept Ho.

P-value method for z tests: For the same one-sided z test with alpha = 0.01, n = 35 and sample z = 2.45, go to the bottom row z of the t table and find that 2.326 < 2.45 < 2.576, so 0.005 < p < 0.01. Since p < the alpha of 0.01, reject Ho. If it had been a two-sided test, double those areas to get 0.01 < p < 0.02. Then p > alpha, so accept Ho.

Classical method for t tests: If you have a one-sided t test with alpha = 0.01, n = 35 and sample t = 2.45, go to row n-1 = 34 and find the critical value of 2.441. Since 2.45 > 2.441, reject Ho. If you have a two-sided test instead, go to the same row, but to the 0.005 column, to find the critical value of 2.728. Since 2.45 < 2.728, accept Ho.

P-value method for t tests: For the same one-sided t test with alpha = 0.01, n = 35 and sample t = 2.45, look for 2.45 in row 34; you can only find the two closest values to estimate it: 2.441 < 2.45 < 2.728, so 0.005 < p < 0.01 and p < alpha. Reject Ho. If the test had been two-sided, double the values to find 0.01 < p < 0.02, so p > alpha. Accept Ho.

Test
#5 will occur as scheduled on Thursday 5/16 and will consist of
one page of short answer questions (from 10.1, 10.2, 10.3, 11.1 about use of
the z and t tables to accept/reject hypotheses with classical/p-methods, and
forming of hypotheses and sentence writing), and one page with 2 word
problems (z test, t test) to perform complete significance tests. Some
short-answer questions for practice: Example:
Find the critical value in a one-sided z test, n = 45, sample z = -2.59,
alpha = 0.01. Answer:
-2.326 (the test is left-tailed since the sample z is negative, so all of alpha goes in the left tail). Example:
a. What is the critical value for a two-sided t-test with n = 33, sample t
= 3.02 and alpha of 0.01, and b. do you reject or accept the null
hypothesis? Answer:
a. With row 32 and an upper tail of 0.005, since half of alpha goes in each tail,
the critical values are + and – 2.738, so b. we would reject the null
hypothesis. Example:
Estimate the p value for a one-sided z-test, n = 23, sample z= 1.15 and alpha
= 0.10. Answer:
Since this is a z value problem, you go down to the bottom row of the table
and see that 1.036 < 1.15 < 1.282 so 0.10 < p < 0.15. Example:
In the previous example, would you accept or reject the null hypothesis? Answer:
p > alpha, so accept. Example:
Estimate for the p value for a two-sided t-test with n=31 and sample t= 3.75.
Answer:
In row 30, our t is off the table to the right, so we know that the p value
is smaller than twice the 0.0005 that is above the last table entry, i.e., p
< 0.001. Example:
In the previous example, would you accept or reject the null hypothesis? Answer:
the p value is rare, so we would reject the null hypothesis no matter what
alpha is given. Example:
Estimate the p value for a two-sided t-test, n=24, sample t=-2.48, alpha is
0.02. Answer:
In row 23, the sample t of -2.48 (in absolute value) falls between the 0.01 and 0.02 columns, which we must double
because it is two-sided, so 0.02 < p < 0.04. Accept the null hypothesis
since p > alpha. Example:
Write the hypotheses and sentence of conclusion only for the following
situation: The
average score on the SAT Math exam is 505. A test preparatory company claims
that the mean scores of students who take their course is higher than 505.
Suppose we reject the null hypothesis. Answer:
Ho
: M=505 Hi : M>505. The company
has evidence students who take their course will on average have a higher
score than the 505 of all students who take the SAT Math exam. Note: Test 5 will not be
a dropped grade. Students who take Test 5 as scheduled will be done with the
course (no comprehensive final). The final is offered only to give those
missing Test 5 a chance to avoid a zero score for that test. It will not be
offered as a device to raise grades from the rest of the semester. If you miss test
#5, you must email me before noon Friday 5/17 and respond to my follow-up
emails that day so that we will both know what to expect the following week
(finals week) and you will then perform a more difficult comprehensive test
during finals week to take the place of test 5 only. If you do not contact me
by noon Friday 5/17 or fail to show up for your new test appointment during
finals week, you will receive a zero score for test 5. Grades: I will post
the Test#5 grades on my website under the codes I gave you, hopefully within
a week of the test, but I will not post final grades at that site. I will
submit final grades to the school sometime before the deadline and you may
view yours then at the SF State Gateway (MySFSU). |
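Optional software check (not part of the course, which uses the printed t-table): a minimal Python sketch, assuming the scipy library is available, that reproduces the critical values and p-value estimates in the review above.

    from scipy.stats import norm, t

    alpha, n, sample = 0.01, 35, 2.45

    # One-sided z test: critical value and exact p-value
    z_crit = norm.ppf(1 - alpha)        # about 2.326
    p_z    = norm.sf(sample)            # about 0.007, between 0.005 and 0.01 -> reject Ho

    # Two-sided z test: half of alpha in each tail, p-value doubled
    z_crit_2 = norm.ppf(1 - alpha / 2)  # about 2.576
    p_z_2    = 2 * norm.sf(sample)      # about 0.014, between 0.01 and 0.02 -> accept Ho

    # Same numbers treated as a t test with df = n - 1 = 34
    t_crit   = t.ppf(1 - alpha, n - 1)      # about 2.441
    t_crit_2 = t.ppf(1 - alpha / 2, n - 1)  # about 2.728
    p_t      = t.sf(sample, n - 1)          # just under 0.01 -> reject Ho
    p_t_2    = 2 * t.sf(sample, n - 1)      # just under 0.02 -> accept Ho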
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
5/9 |
LECTURE/READING: (Read section 11.1) We took a look at 11.1 p522
Ex. 2. The matched pairs problems are still t tests as in 10.3, but allow you
to compare two sets of data by looking at differences between the data. For H0 assume there is no difference between the data: Ho:
m = 0. For H1 there is a difference of some kind, either m < 0,
m > 0, or m is not equal to 0 depending on the phrasing of the word
problem and the order in which the differences are found. Matched pairs word problem
example: An
agricultural field trial compares the yield of two varieties of tomatoes for
commercial use. The researchers divide in half each of 11 small plots of land
in different locations (half gets variety A and half gets variety B) and
compare the yields in pounds per plant at each location. The 11 differences
(variety A minus variety B) give an average of 0.54 and std. deviation of
0.83. Is there evidence at the 0.05 level of significance that variety A has
a higher yield than variety B? (Assume differences computed by A yield minus
B yield). Answer: Ho: m = 0 If we want to show that
plot A tomato yields are larger, H1
depends on how the differences are computed. If we subtract the yields in
order of plot A – plot B, then H1: m > 0 because larger numbers minus smaller
numbers will give a positive number (> 0). But if we subtract the
yields in order of plot B – plot A, then H1: m < 0 because smaller numbers minus larger
numbers will give a negative number (< 0). So in this
case, H1: m > 0. We compute
sample t=2.16. Using row n-1=10 of the t table with right tail area of 0.05
the critical value is 1.812 and the estimate for the p value is 0.025 < p
< 0.05 since 1.812 < 2.16 < 2.228. Either way, we reject the null
hypothesis. We have found evidence that variety A has a higher yield than
variety B. HOMEWORK
due Tuesday 05/14 (for each matched pairs t-test below, write hypotheses,
calculations, decisions and sentences of conclusion): 1. 11.1 p516 #14c only, given
n=6, the mean of the differences (blue-red) is 0.093, s = 0.17. 2. 11.1 p519 # 20 given
differences are computed by "Thrifty minus Hertz" and n = 10, the mean of the
differences is -0.259 and s = 9.20. 3. The design of controls
and instruments affects how easily people can use them. A student project
investigated this effect by asking 25 right -handed students to turn a
clockwise screw handle (favorable to right-handers) with their right hands
and then turn a counterclockwise screw handle (favorable to left-handers)
again with their right hands. The times it took for each handle were measured
in seconds, and the 25 differences (clockwise minus counterclockwise) gave an
average of -13.32 seconds with std. deviation 22.94. Is there evidence that
right-handed people find the clockwise screw handle easier to use? Test at
the 0.01 level. Test #5 is a mandatory part of your grade that
will occur as scheduled on Thursday 5/16 and is the last graded activity of
the semester (if you miss it, you will be offered the opportunity to take a
final exam to replace it – this is the only instance in which the final
will be used). It will consist of one page of short answer questions (from
10.1, 10.2, 10.3, 11.1 about use of the t table for both z and t tests in accepting
and rejecting hypotheses with classical and p-value methods, and forming of
hypotheses and sentence writing), and one page with 2 word problems (z test,
t test) to perform a complete significance test. |
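Optional check of the tomato example above (not required for the homework; a minimal sketch assuming Python with scipy installed):

    from math import sqrt
    from scipy.stats import t

    # Matched pairs: 11 differences (A minus B), mean 0.54, std. deviation 0.83,
    # one-sided test at alpha = 0.05
    n, dbar, s, alpha = 11, 0.54, 0.83, 0.05

    t_sample = dbar / (s / sqrt(n))      # about 2.16
    t_crit   = t.ppf(1 - alpha, n - 1)   # about 1.812
    p_value  = t.sf(t_sample, n - 1)     # roughly 0.03, so 0.025 < p < 0.05

    # t_sample > t_crit and p_value < alpha, so reject Ho either way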
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
5/7 |
The
methods of 10.3 are the same as those from 10.2 with one exception. Since you
must rely completely on your sample for the std. deviation (instead of having
the population std. deviation from previous studies) you must use a different
row than the z row at the bottom of the table. To
use the new part of the t table, you take one less than the sample size,
df=n-1. EXAMPLES
using the t-table: 1. What is the critical value for a one-sided test with n=20 and
alpha =0.05? ANSWER: df=20-1=19 and that row
with the column of 0.05 gives a critical value of 1.729 2. What would the critical value be for the above situation if
it were two-sided? ANSWER: In the same row df=19,
you would look at the column with area 0.025, since half of the alpha of 0.05
goes into each tail, and this would give you a critical value of 2.093. 3. Find an estimate for the p value for a one-sided test with
n=8 and sample t value of 1.15. ANSWER:
df=8-1=7 and in that row, 1.119 < 1.15 < 1.415 so p value is between
0.10 and 0.15. 4. Find
an estimate for the p value in a one-sided test with n=33 and sample t=0.52. ANSWER:
df=33-1=32, so we look in that row on the new t table to find the next higher
and lower numbers with respect to 0.52. But since 0.52 < 0.682 the p value
then is greater than the area of 0.25 for the t value of 0.682. That is, p
> 0.25. 5. Find an estimate for the p value for a two-sided test with
n=25 and sample t value of 1.52. ANSWER:
df=25-1=24 and in that row, 1.318 < 1.52 < 1.711 so the right or left
tail area for the p value is between 0.10 and 0.05, but we have a two-sided
test so we double the areas to get the sum of the left and right tail areas:
0.10 < p < 0.20. 6. Perform a
complete t test: To find out if it seems reasonable that the local town
library is lending an average of 4.2 books per patron, a random sample of 13
people was taken and yielded an average of 4.75 with std. deviation of 1.65
books. Test at the 0.10 level. Answer: The alternate hypothesis is that m is not equal to 4.2,
alpha is 0.10, and we compute sample t=1.20. Classical method:
Using row n-1=12 of the t table with tail areas of 0.05, the critical value is
1.782; since 1.20 < 1.782, accept Ho. p-value method:
estimate for p is 0.20<p<0.30 (double tail areas) since
1.083<1.20<1.356. Accept Ho. We
have not found any evidence that the library is lending a different avg.
number of books than 4.2 per person. HOMEWORK
DUE Thursday
5/9 10.3 p487 #6, 10, 12, 16, 24 (Skip a. For b use pop mean 7,
sample mean 7.01 and s = 0.0316) |
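Optional check of the library example (example 6 above), assuming Python with scipy; the course itself only expects the table estimates:

    from math import sqrt
    from scipy.stats import t

    # n = 13, sample mean 4.75, s = 1.65, claimed mean 4.2, two-sided, alpha = 0.10
    n, xbar, s, mu0, alpha = 13, 4.75, 1.65, 4.2, 0.10

    t_sample = (xbar - mu0) / (s / sqrt(n))    # about 1.20
    t_crit   = t.ppf(1 - alpha / 2, n - 1)     # about 1.782 (0.05 in each tail)
    p_value  = 2 * t.sf(abs(t_sample), n - 1)  # roughly 0.25, so 0.20 < p < 0.30

    # |t_sample| < t_crit and p_value > alpha, so accept Ho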
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
5/2 |
LECTURE
– We will need the following table for section 10.3, and since it
has info for the material from 10.2 included on it, it is the only table we
will use going forward: Note
that the top row and the bottom row have the numbers you were using in the
abbreviated table for looking up critical values for z tests.
For
right now, we will just look at those two rows. Use the table symmetrically
so that it works for negative z values with areas in the left tail. To
perform the p value method, the best we can do is to make estimates. EXAMPLES
using the new table: 1. What is the critical value for a one-sided z test with alpha
=0.05? ANSWER: 1.645 2. What would the critical value be for the above situation if
it were two-sided? ANSWER: With tail area 0.025,
since half of the alpha of 0.05 goes into each tail, this would give you
critical values of + or - 1.960. 3. Find an estimate for the p value for a one-sided z test with
sample z value of 1.15. ANSWER:
In row z, 1.036 < 1.15 < 1.282 so p value is between 0.10 and 0.15. 4. Find
an estimate for the p value in a one-sided z test with sample z = 0.52. ANSWER:
in row z, since 0.52 < 0.674, the p value is greater than 0.25: p > 0.25. If it had been
a two-sided test, P>2(0.25), p>0.50. In either case, those values are
common, so one would accept Ho! 5. Find an estimate for the p value for a two-sided test with
sample z value of 1.52. ANSWER:
1.282 < 1.52 < 1.645 so the right or left tail area for the p value is
between 0.05 and 0.10, but we have a two-sided test so we double the areas to
get the sum of the left and right tail areas: 0.10 < p < 0.20. Homework
due Tuesday 5/07: 1. a. Find a p value estimate using the new table for a 1-sided
z test with sample z =2.45. b. If
alpha (level of significance) is 0.02, would you accept or reject Ho? 2. a. Find a p value estimate using the new table for a 1-sided
z test with sample z =3.50. b.
Without need of an alpha, would you accept or reject Ho? 3. a. Find a p value estimate using the new table for a 2-sided
z test with sample z =1.83. b. If
alpha (level of significance) is 0.05, would you accept or reject Ho? 4. a. Find a p value estimate using the new table for a 2-sided
z test with sample z =0.46. b. Without
need of an alpha, would you accept or reject Ho? (For each of the following word problems, perform a complete
significance test) 5. section 10.2 p477 #22 (skip part a) 6. section 10.2 p478 #26 (skip part a, use 6.3 for the sample
mean) 7. section 10.2 p478 #28 (skip part a) |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
4/30 |
NOTES
FOR SECTION 10.1: How to state the hypotheses
from word problems and write the sentence of conclusion. HYPOTHESES: H0 is what is accepted as true for the population
mean until evidence to the contrary is found. H1 is what the investigator or researcher is trying
to show. CONCLUSION: You must state what you have found from the sample's evidence,
or lack thereof. Write a grammatically complete sentence with the following
elements: Tell 1. if you have "found evidence" or "not found evidence" against
the null hypothesis, 2. about what (what was the subject of the investigation?), 3. with respect to what number (what was the number in question
in the hypotheses?). If you are rejecting H0 you have found evidence
against H0 and therefore evidence for H1. If you are accepting H0, you have not found evidence
against H0 and therefore have not found evidence to back up your
claim H1. Write sentences from the H1 standpoint. SECTION
10.2 Word problems: Do what was in 10.1, but throw the LEVEL of significance,
CALCULATION of sample z, and the DECISION to accept/reject Ho into the
middle! 10.1 Example: A
Muni bus drives a prescribed route and the supervisor wants to know whether
the average run arrival time for buses on this route is about every 28
minutes. Suppose that after we calculate the sample z value the data causes
the supervisor to accept H0.
Write the hypotheses and the sentence of conclusion. Answer: H0: M = 28 H1 : M is not equal to 28 The
supervisor has found no evidence that the average run arrival time for buses
on this route is significantly different from 28 minutes. 10.1 Example: A
manufacturer produces a paint which takes 20 minutes to dry. He wants to make
changes in the composition to get nicer colors, but not if it increases the
drying time needed. Suppose that after he calculates the sample z value the
data causes him to reject H0.
Write the hypotheses and the sentence of conclusion. Answer: H0 : M = 20 H1 : M > 20 The
manufacturer has found evidence that the composition change significantly
increases the drying time, so he will not make a change. (Notice that he is
using the test to pull him away from a bad decision). 10.2 Example
(whole test): According
to the Highway Administration, the mean number of miles driven annually in
1990 was 10,300. Bob believes that people are driving more today than in 1990
and obtains a simple random sample of 20 people and finds an average of
12,342 miles. Assuming a std. deviation of 3500 miles, test Bob's claim at
the 0.01 level of significance. Hypotheses: Ho:
population
mean = 10300 Hi:
population
mean > 10300 Level of Significance: alpha = 0.01 Data and calculations: Z=(12342-10300)/(3500/sqroot20)=2.61 Decision: Classical:
alpha of 0.01 gives a critical z value of 2.326 so reject Ho. P-value:
on the z table, 2.61 gives p = 0.0045 so p < alpha. Reject Ho. Conclusion: Bob has found significant
evidence that people are driving more today than in 1990, when they drove an
average of 10,300 miles. 10.2 Example
(whole test): Before
shopping for a used Corvette, Grant wants to determine what he should expect
to pay. The blue book average is $37,500. Grant thinks the price is different
in his area, so he visits 15 area dealers online and finds and average price
of $38,246.90. Assuming a population std. deviation of $4100, test his claim
at the 0.10 level of significance. Hypotheses: Ho: population mean = 37500 Hi:
population
mean not equal to 37500 Level of Significance: alpha
= 0.10 Data and calculations: z
= (38246.90 - 37500)/(4100/sqroot15) = 0.71 Decision: Classical:
For 0.05 of alpha going in each tail, we find critical z values of +/-1.645.
Accept Ho. P-value:
For z of 0.71 on the z table p = 2(0.2389) = 0.4778. Since p > alpha,
accept Ho. Conclusion: Grant
does not have any evidence that the mean price of a used Corvette is
different from $37,500 in his neighborhood. HOMEWORK (due
Thursday 5/2): 1.
10.1 p461/462 do pair of
problems #16 (state hypotheses) and 24 (write sentence), 2.
10.1 p461/462 do pair of
problems #18 (state hypotheses) and 26 (write sentence), 3.
10.1 p461/462 do pair of
problems #20 (state hypotheses) and 28 (write sentence), 4. Do complete test (hypotheses, level, calculation, decision,
sentence): A researcher believes that the average height of a woman aged 20
years or older is greater now than the 1994 mean of 63.7 inches. She obtains
a sample of 45 woman and finds the sample mean to be 63.9 inches. Assume a
population std. deviation of 3.5 inches and test at the 0.05 level. 5. Do complete test (hypotheses, level, calculation, decision,
sentence): The average daily volume of Dell computer stock in 2000 was 31.8
million shares. A trader wants to know if the volume has changed and takes a
random sample of 35 trading days and the mean is found to be 23.5 million
shares. Using a population std. deviation of 14.8 million, test at the 0.01
level of significance. |
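Optional check of Bob's mileage example above with software (assumes Python and scipy; not a required part of the homework):

    from math import sqrt
    from scipy.stats import norm

    # Ho: m = 10300, H1: m > 10300, alpha = 0.01
    mu0, n, xbar, sigma, alpha = 10300, 20, 12342, 3500, 0.01

    z       = (xbar - mu0) / (sigma / sqrt(n))  # about 2.61
    z_crit  = norm.ppf(1 - alpha)               # about 2.326
    p_value = norm.sf(z)                        # about 0.0045

    # z > z_crit and p_value < alpha, so reject Ho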
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
4/25 |
Keep
working ... only 3 weeks of regular session left! LECTURE
NOTES: Last time, we went over the hmk problems but added the knowledge
of how to calculate the sample z value and how to perform the significance
test using p-values. A z calculation is on p464 and p-value tests are
performed in ex3 p470 and ex4 p472. The p area is the probability that you would get a value as far
away or farther away from the mean as the sample value you got. If p <
alpha, reject Ho and if p > alpha, accept Ho. You will need to refer to your old z-table to find the p-values. Abbreviated table of critical values for reference with the
classical approach:
ONE-SIDED
EXAMPLE: H0: m = 45 , H1: m < 45 , alpha = 0.05 sample data: n = 24,
sample mean = 40.8, population
std dev. = 10.5 z = (40.8 – 45) divided by (10.5 / sqrroot 24) = –
1.96 Classical The critical
value for 0.05 is –1.645 and –1.96 is farther out, so reject H0. P-value
For
z = –1.96 the p area is 0.0250 using
the z table. Since p is < alpha, we reject H0
since the sample is more rare in probability of occurrence than
the alpha. ONE-SIDED
EXAMPLE: H0: m = 0.045 and H1: m > 0.045 and alpha =
0.005 sample z= 2.06 sample data: n = 17,
sample mean = 0.0455,
population std dev. = 0.001 z = (0.0455 – 0.045) divided by (0.001 / sqrroot 17) =
2.06 Classical
The critical value for 0.005 is 2.576 and 2.06 is closer to
center, so accept H0. P-value
The area to the right of 2.06 is 0.0197 so the p value is 0.0197
> 0.005 (p>alpha) so accept H0 since the sample is not as
rare in probability of occurrence as the alpha TWO-SIDED
EXAMPLE: H0: m = 35 , H1: m is not 35 , alpha =
0.05 sample data: n = 40,
sample mean = 37.63,
population std dev. = 7.4 z = (37.63 – 35) divided by (7.4 / sqrroot 40) = 2.25 Classical
Each tail gets alpha divided by 2 = 0.025 which has a
critical z of 1.96. The sample z of 2.25 is farther away from the mean than
1.96 so reject H0. P-value
For a sample of 2.25, the tail area would be 0.0122 using your
old z table. So for a two sided test, each tail has .0122 so the p value is
0.0122+0.0122=0.0244. Since 0.0244 < 0.05, we reject H0. MORE
EXAMPLES of p value method only:
1. Ho: m = 11 and H1: m is not equal to 11 and alpha = 0.01, sample z = 2.67. The area from the z table for z = 2.67 is 0.0038 and since we are using a two-sided test, we add the areas for +/- 2.67: p = 2(0.0038) = 0.0076, so p < alpha. Reject Ho.
2. Ho: m = 265 and H1: m < 265 and alpha = 0.01, sample z = -2.25. The area from the z table for z = -2.25 is p = 0.0122, so p > alpha. Accept Ho.
3. Ho: m = 35 and H1: m > 35 and alpha = 0.05, sample z = 2.23. The area from the z table for z = 2.23 is p = 0.0129, so p < alpha. Reject Ho.
4. Ho: m = 1.23 and H1: m is not equal to 1.23 and alpha = 0.02, sample z = -2.45. The area from the z table for z = 2.45 is 0.0071 and since we are using a two-sided test, we add the areas for +/- 2.45: p = 2(0.0071) = 0.0142, so p < alpha. Reject Ho.
5. Ho: m = 0.045 and H1: m > 0.045 and alpha = 0.005, sample z = 2.06. The area from the z table for z = 2.06 is p = 0.0197, so p > alpha. Accept Ho.
6. Ho: m = 4500 and H1: m < 4500 and alpha = 0.025, sample z = -1.83. The area from the z table for z = -1.83 is p = 0.0336, so p > alpha. Accept Ho.
HOMEWORK (due
Tuesday 4/30): (You did #3-7 below in the last hmk using the classical method,
now use the p-value method on them and compare the answers -- you should have the same
conclusions!). 1. Compute z with n = 35,
sample mean = 36.2, pop.
mean = 30, pop. std dev. = 12.9 2. Compute z with n = 2500, sample mean = 24.9,
pop. mean = 25.3, pop.
std dev. = 8.4. 3.
redo 10.2 p476 #12. given sample z = 1.92, find the p-value and accept or
reject Ho. 4.
redo 10.2 p476 #13. given sample z = 3.29, find the p-value and accept or
reject Ho. 5.
redo 10.2 p476 #14. given sample z = –1.32, find the p-value and accept
or reject Ho. 6.
redo 10.2 p476 #16. given sample z = 1.20, find the p-value and accept or
reject Ho. 7.
redo 10.2 p476 #18. given sample z = 2.61, find the p-value and accept or
reject Ho. |
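Optional: the same p-values can be pulled from software instead of the z table. A minimal Python sketch (assumes scipy) for the first four "MORE EXAMPLES" above:

    from scipy.stats import norm

    print(2 * norm.sf(2.67))   # about 0.0076 (two-sided, example 1)
    print(norm.cdf(-2.25))     # about 0.0122 (left-tailed, example 2)
    print(norm.sf(2.23))       # about 0.0129 (right-tailed, example 3)
    print(2 * norm.sf(2.45))   # about 0.014  (two-sided, example 4)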
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
4/23 |
No
new hmk. Study for test as in notes below. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
4/18 |
LECTURE: Ch 10
Significance tests (10.1 and 10.2): I gave a handout with the following abbreviated table of
critical values, so you do not have to look them up on the table backwards
each time you want to do a problem. The top row represents the area in either
the left or right tail of the distribution, and the bottom row represents the
positive or negative critical value. Refer to it as you look at the example
problems below:
One-sided significance
test example: If the null hypothesis is that the mean of a population is 45
and the alternate hypothesis is that it is less than 45, we have a one-sided alternate hypothesis: you only care if it is less than 45
and you don't care if it is greater. Ho:
m = 45 H1:
m < 45 If a level of significance (alpha) is given as 0.02 and you take a sample and standardize it to get z = -1.90, does it give evidence to reject the null hypothesis and
therefore accept the alternate hypothesis? Classical approach
The critical value comes from the alpha value. Since we only care about
values that stray too far below what is claimed to be the center, we put all
of alpha into the left tail. The critical z value with 0.02 area to its left
is -2.054 from the table above. Since -1.90 is closer to center, we consider
it a routine sample (one that would happen 98% of the time) so there is
nothing strange about the center being where it is claimed to be. We accept
the null hypothesis. Two-sided
significance test example: If the null hypothesis is that the mean of a
population is 35 and the alternate hypothesis is that it is not 35 (within a
certain amount of acceptable error), we have a two-sided alternate
hypothesis: we care if m is
significantly higher or lower than 35. Ho:
m = 35 H1:
m is not equal to 35 If a level of significance (alpha) is given as 0.05 and you take a sample and standardize it to get z = 2.25, does it give evidence to reject the null hypothesis and
therefore accept the alternate hypothesis? Classical approach
The critical values come from the alpha value. Since we have a two-sided H1,
alpha is divided by 2 to get 0.025 (this is how much goes in each tail
of the distribution) and on the table above, you see a critical z of 1.96.
Compare the sample z to the critical value of z. Since the sample z of 2.25
is farther away from the mean than the critical value of 1.96, we have
evidence to reject the null hypothesis. MORE
EXAMPLES:
1. Ho: m = 11 and H1: m is not equal to 11 and alpha = 0.01, sample z = 2.67. Half of alpha, 0.005, goes into each tail since the alternate hypothesis is two-sided. The critical values for 0.005 are + or - 2.576 and 2.67 is farther away from center than this, so reject Ho.
2. Ho: m = 265 and H1: m < 265 and alpha = 0.01, sample z = -2.25. All of alpha, 0.01, goes into the left tail of the distribution since the alternate hypothesis only pertains to values < 265. The critical value for 0.01 is -2.326 and -2.25 is closer to center than this, so accept Ho.
3. Ho: m = 35 and H1: m > 35 and alpha = 0.05, sample z = 2.23. All of alpha, 0.05, goes into the right tail of the distribution since the alternate hypothesis only pertains to values > 35. The critical value for 0.05 is 1.645 and 2.23 is farther from center than this, so reject Ho.
4. Ho: m = 1.23 and H1: m is not equal to 1.23 and alpha = 0.02, sample z = -2.45. Half of alpha, 0.01, is put into each of the left and right tails of the distribution since the alternate hypothesis pertains to values not equal to 1.23, that is, both greater than and less than 1.23. The critical values are + or - 2.326 and -2.45 is farther away from center than this, so reject Ho.
5. Ho: m = 0.045 and H1: m > 0.045 and alpha = 0.005, sample z = 2.06. All of alpha, 0.005, is put into the right tail of the distribution. The critical value for 0.005 is 2.576 and 2.06 is closer to center than this, so accept Ho.
6. Ho: m = 4500 and H1: m < 4500 and alpha = 0.025, sample z = -1.83. All of alpha is put into the left tail of the distribution due
to the alternate hypothesis. The critical value for 0.025 is -1.96 and -1.83
is closer to center than this, so accept Ho. HOMEWORK due Tuesday 4/23: 10.2
p476 (#15-18 in the book use p-values, but for right now, treat them with critical
values as below). Draw
distributions for each with relevant z values and areas (critical values from
table above): 12.
given part a. sample z = 1.92, do parts b, c, d 13.
given part a. sample z = 3.29, do parts b, c, d 14.
given part a. sample z = –1.32, do parts b, c, d 16.
given sample z = 1.20, do part b using critical values 18.
given sample z = 2.61, do part b using critical values TEST
#4 FORMAT for Thursday 4/25. Given: the z table, and formulas for the
population z value, confidence intervals, error, and sample size, and the critical values for 90/95/99%. --About
2 word problems with parts, like 7.3 p354 #17-24, 28-29 and hmks due 4/9,
4/11. --Show
how to find the z critical values for a given confidence level using the z
table backwards as in 7.2 p347 #23-26 and 9.1 p416 #13-16. --3 or 4 short answer questions on sampling distributions and
the effect of changes to sample size and confidence on error and confidence
intervals, as in #7 from hmk due 4/16 and today's quiz. --At least one each of word problems (some with follow-up parts)
dealing with confidence intervals, error and sample size (not necessarily in
that order), as in 9.1 p416 #21-24, 43-48, and hmks due 4/16, 4/18. --A few situations like todayÕs work to accept or reject hypotheses
in significance tests. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
4/16 |
LECTURE: We
worked on word problems in class involving confidence intervals, error, and
sample size. READING: To
practice finding critical values,
see 7.2 ex8 p343 and p347 #23-26. For
background on what the sampling
distribution is and how we use it, see 8.1 p377-388. For
examples of forming confidence
intervals, finding the error
in using a sample mean to approximate the population mean, and how to select
a sample size for a fixed error,
see 9.1 p405-415. HOMEWORK due Thursday 4/18: In
your work below, round error and confidence interval values to 2 decimal
places. Do not round square root values in the middle of a calculation. As
stated on p414, always round sample size up to the next whole number.
For example, in ex. 7 p415, 43.2964 is rounded to 44. 1.
9.1 p416 #14 (do as in 7.2 p347 #23-26) 2.
9.1 p416 #16 (this was done in 7.2 p347 #23 also, so you have the answer from
there!) 3.
9.1 p416 # 24 (use the z values you found in 9.1 #14, and for 85%, the "z answer"
is in 9.1 #15!). 4.
9.1 p420 # 44 5.
Supplemental problem: A random sample of 300 telephone calls made to the
office of a large corporation is timed and reveals that the average call is
6.48 minutes long. Assume a std. deviation of 1.92 minutes can be used. If
6.48 minutes is used as an estimate of the true average length of telephone
calls made to the office, a.
What is the maximum error in the estimate of the mean using 99% confidence? b.
What is the maximum error in the estimate of the mean using 90% confidence? 6.
Supplemental problem: A large hospital finds that in 50 randomly selected
days it had, on average 96.4 patient admissions per day. From previous
studies a population std. deviation of 12.2 admissions can be used. Using a 90%
confidence level, a.
How large a sample of days must we choose in order to ensure that our
estimate of the actual daily number of hospital admissions is off by no more
than five admissions per day? b.
How large a sample of days must we choose to have one-fourth of the error in
part a? NEXT LECTURE: We will
start ch. 10 next time, and this chapter can be a difficult read. We will
look at the significance test in pieces before putting it all together, and
will start with how to make a decision, as outlined on p467 and p470. Bring
today's handout on Thursday! |
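Optional sketch of the error and sample-size formulas from 9.1 (E = z*sigma/sqrt(n), and n = (z*sigma/E)^2 rounded up). The numbers here are made up for illustration, not taken from the homework; assumes Python with scipy.

    from math import sqrt, ceil
    from scipy.stats import norm

    sigma, E, conf = 15, 4, 0.95            # hypothetical values
    z = norm.ppf(1 - (1 - conf) / 2)        # about 1.96 for 95% confidence

    error = z * sigma / sqrt(36)            # error for a sample of n = 36: about 4.9
    n_needed = ceil((z * sigma / E) ** 2)   # 54.02... rounds UP to 55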
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
4/11 |
LECTURE: Section 8.1: the sampling distribution lets us work
with a normal distribution of the means of all possible samples from a population even if the
original population is not normal (use n of 30 or greater in this case). Section 9.1: constructing confidence intervals looking at
ex. 3 p411 using the material from p410/411. The formula for the lower and
upper bounds of the confidence interval for the population mean can be
written as on p410: The
z values above are the "critical
values" that come from putting the given confidence level as an
area in the middle of the distribution. This was done in 7.2 p347 #23-26: #23
For confidence level of 80%, the critical z values are +/- 1.28 #24
For confidence level of 70%, the critical z values are +/- 1.04 #25
For confidence level of 99%, the critical z values are +/- 2.58 #26
For confidence level of 94%, the critical z values are +/- 1.88 The
most common critical z values are given in the table on p410: For
confidence level of 90%, we will just use the first of the closest table
values +/-1.64. For
confidence level of 95%, the critical z values are +/- 1.96 For
confidence level of 99%, we will just use the first of the closest table
values +/-2.57. exampleS: 1.
Speeding: In example 3 p411/412 a 90% confidence interval (note we use
1.64 from above) for the whole population of car speeds is: sample
mean – error < m < sample mean + error 59.62
– 1.64(8/squareroot12) < m <
59.62 + 1.64(8/squareroot12) 59.62
– 3.79 < m < 59.62 + 3.79 55.83
< m < 63.41 If
we are willing to accept a lower confidence level, like 70%, (note we use
1.04 from above): 59.62
– 1.04(8/squareroot12) < m <
59.62 + 1.04(8/squareroot12) 59.62
– 2.40 < m < 59.62 + 2.40 57.22
< m < 62.02 Tighter
interval around m, but valid in only 70% of all samples. You would need to
know why the estimate was needed in the first place, or be working in the
particular field of study to know whether the trade-off is worth it. Here,
residents wanted evidence that cars were speeding regularly, and the first
interval showed this (the speed limit for the area was 45mph), so there is no
need in this case to settle for less confidence to get a tighter interval. 2. Time
reading the Sunday paper: A
study discloses that 100 randomly selected readers devoted on average 126.5
minutes to the Sunday edition of the paper. Similar studies have shown a
population standard deviation of 26.4 can be used. Construct a 95% confidence
interval for the true average number of minutes that readers spend on the
Sunday edition. The
interval, using the formula above is: sample
mean - error < m < sample
mean + error 126.5
– 1.96(26.4/squareroot100) < m < 126.5 +
1.96(26.4/squareroot100) 126.5
– 1.96(26.4/10) < m < 126.5 +
1.96(26.4/10) 126.5
– 1.96(2.64) < m < 126.5 +
1.96(2.64) 126.5
– 5.17 < m < 126.5 + 5.17 121.33
< m < 131.67 So
in 95 samples out of 100, we would expect a reader to spend anywhere from
about 121 minutes to about 132 minutes on the Sunday paper. If
we take a sample of 500 readers instead, leaving all other conditions the
same, 126.5
– 1.96(26.4/squareroot500) < m < 126.5 +
1.96(26.4/squareroot500) 126.5
– 2.31 < m < 126.5 + 2.31 124.19
< m < 128.81 so
notice that without tampering with the confidence level, we got a tighter interval.
The problem with this, though, is that you pay a price in time and money by
taking a sample 5 times the first. READING IN
THE TEXT: Browse
section 8.1 p377-388, note definitions on p381 and 385. Read
9.1 p405-412 for items from this lecture, and read p413-415 for next time. HOMEWORK due Tuesday 04/16: Use the critical values given in the examples above. Try
variations of the 2 examples above and notice the changes to the error and
interval width: 1.
Redo example 1 (speeding) above by changing 90% confidence to 95% (z is
1.96). 2.
Redo example 1 above keeping the 90% confidence level, but changing sample
size n to 50. 3.
Redo example 1 above keeping the 90% confidence level, but changing sample
size n to 5. 4.
Redo example 2 (Sunday paper) by changing 95% confidence to 90% confidence. 5.
Redo example 2 above by keeping 95% confidence but change n to 1000. 6.
Redo example 2 above by keeping 95% confidence but change n to 40. 7.
In general, from looking at the examples and your work: a.
Does the error get bigger or smaller as you reduce sample size? b.
Does the confidence interval get wider (less precise estimate for m) or
narrower (closer estimate for m) around the population mean as we take a
smaller sample size? c.
Does the error get bigger or smaller as you reduce the % confidence level? d.
Does the confidence interval get wider or narrower around the mean as we take
a smaller confidence level? |
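Optional check of the Sunday paper interval above (assumes Python with scipy; the by-hand table method is what the course expects):

    from math import sqrt
    from scipy.stats import norm

    n, xbar, sigma = 100, 126.5, 26.4
    z = norm.ppf(0.975)              # about 1.96 for 95% confidence

    error = z * sigma / sqrt(n)      # about 5.17
    lower, upper = xbar - error, xbar + error   # about 121.33 and 131.67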
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
4/9 |
LECTURE: We
worked on normal word problems from 7.3 again today. In a "forwards"
problem, given an x value (or two x values), use the standardizing formula z = ( x – mean)/(std. deviation) to find the z value(s).
Then look up the area in the tail of the distribution corresponding to each z
value. Find the area you want using these tail areas. In
"backwards" problems, given an
area (%, probability, proportion) find the x value that bounds it by
reversing the process from the "forwards" problems. Identify the given area
in a picture and search the middle of the table for the closest area to the
one you are given, map it backwards to find the row and column it belongs to
in order to find the z value, then take the resulting z value and
"unstandardize" it (solve for x) in the formula z = ( x – mean)/(std. deviation)! READING IN
THE TEXT: 7.3
p349-352, but using the table given in class. Supplementary exampleS
: 1. A salesman has an average car route trip time of 4.3 hours with std. deviation of 0.2 hours. What is the probability that the length of his car trip will last anywhere from 4 to 4.5 hours? Answer: This is a "forwards" problem. For x=4, z=(4-4.3)/0.2=-1.5 and for x=4.5, z=(4.5-4.3)/0.2=1.0. The area to the left of –1.5 is 0.0668 and the area to the right of 1.0 is 0.1587. The area between is 1-(0.1587+0.0668)=0.7745, so there is about a 77% chance that his trip will last anywhere from 4 to 4.5 hours. 2. The lengths
of sardines received by a cannery have a mean of 4.64 inches and a standard
deviation of 0.25 inches. If the distribution of these lengths can be
approximated closely with a normal distribution, below which length lie the
shortest 18% of the sardines? Answer: This is a
"backwards" problem since you are looking for an x value (length of
sardines), having been given an area. The area of 18% or 0.18, is a left-hand
tail area, because it represents below-average lengths. The closest value to
this in the table is 0.1814. This area corresponds to a z value of – 0.91. We "unstandardize" this value by
using the formula to solve for x and get – 0.91=(x-4.64)/0.25 so x=(– 0.91)(0.25)+4.64= –0.2275+4.64=4.41.
About 18% of
the sardines measure 4.4 inches or shorter. HOMEWORK due
Th 4/11: (Draw
normal distribution for each problem, label values, write sentence of
conclusion) 1. 7.3 p355 #28
(this is a "backwards" type problem) 2. The average
amount of radiation to which a person is exposed while flying by jet across
the U.S. is 4.35 units with std. deviation of 3.2. What is the probability
that a passenger will be exposed to more than 4 units of radiation? 3.
The number of days that patients are hospitalized is on average 7.1 days with
std. deviation of 3.2 days. How many days do the 20% longest-staying patients
stay? 4.
The average time to assemble a product is 27.8 minutes with a standard
deviation of 4.0 minutes. What percent of the time can one expect to assemble it in
anywhere from 30 to 35 minutes? 5. For a
salesman driving between cities, the average trip time is 4.3 hours with std.
deviation of 0.2 hours. Below what time lie the fastest 10% of his trips? |
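Optional check of the two supplementary examples above (assumes Python with scipy; scipy gives slightly more exact values than the printed table):

    from scipy.stats import norm

    # "Forwards": trip time, mean 4.3, std. dev. 0.2, P(4 <= X <= 4.5)
    p_between = norm.cdf((4.5 - 4.3) / 0.2) - norm.cdf((4 - 4.3) / 0.2)   # about 0.7745

    # "Backwards": sardines, mean 4.64, std. dev. 0.25, shortest 18%
    z = norm.ppf(0.18)          # about -0.92 (the table gives -0.91)
    x = 4.64 + z * 0.25         # about 4.41 inches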
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
4/4 |
LECTURE: Before the
test, I gave a few examples of the use of the standardizing formula from ch3:
z = ((x-mean)/std.deviation) to find areas under the normal curve for word
problems. We will spend more time on this on Tuesday, but read some more
examples below and try the hmk. READING IN
THE TEXT: 7.3
p349-353 Notice that you are combining knowledge from 3.4 and 7.2 in this
section. Use the formula from 3.4 to standardize the distribution, then use
the table from 7.2 to look up areas. exampleS: 7.3 p354 #17 a.
area less than z = (20-21)/1 = –1 is 0.1587 so about 16% of the eggs
are expected to hatch in less than 20 days. b.
area more than z = (22-21)/1 = +1 is 0.1587 so about 16% of the eggs are
expected to hatch in more than 22 days. c.
area less than z = (19-21)/1 = –2 is 0.0228 and area more than z = 0 is 0.50,
so the area between is 1 – 0.0228 – 0.50 = 0.4772 so about 48% of
the eggs are expected to hatch in 19 to 21 days. d.
area less than z = (18-21)/1 = –3 is 0.0013 which happens 0.13% of the time
(much less than 1%). 7.3 p354 #21 b.
area less than z = (250-266)/16 = –1 is 0.1587 so about 16% of
pregnancies last less than 250 days. d.
area more than z = (280-266)/16 = +0.88 is 0.1894 so about 19% of pregnancies
last more than 280 days. e.
area no more than z = (245-266)/16 = –1.31 is the same as area less
than –1.31 which is 0.0951 so about 10% of pregnancies last no more
than 245 days. f.
area less than z = (224-266)/16 = –2.63 is 0.0043 so pregnancies
lasting less than 224 days happen less than ½ of a percent of the
time, therefore are considered rare. HOMEWORK due
Tuesday 04/09 7.3
p354 (Draw picture of normal distribution for each part and label with values
from problem) #18
try all parts abcd, then do an extra part e: e. probability a randomly
selected 6th-grade student reads less than 125 words per minute? #20
try all parts abcd, then do an extra part e: e. probability a randomly
selected car will spend more than 2 minutes in the drive-thru? |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
4/2 |
We
briefly looked at the material we are moving into (7.3) and will probably
take another look at it before the test for a few minutes. Hmk
is to study for the test as outlined in the notes from 3/21 below. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
3/21 |
SPRING BREAK
IS HERE! HAVE A GREAT TIME! ANSWERS from
last hmk, not done in class: 3: Getting any
1 of 5 winning regular numbers and
not getting the winning Mega:
(
( 5C1*42C4 ) / 47C5 ) * ( (1C0*26C1) / 27C1 ) = ( (5*111930) / 1,533,939 ) *
(26/27) =
14550900 / 41,416,353 = 0.35 4: Getting any
2 of 5 winning regular numbers and
not getting the winning Mega:
(
( 5C2*42C3 ) / 47C5 ) * ( (1C0*26C1) / 27C1 ) = ( (10*11480) / 1,533,939 ) *
(26/27) =
2984800 / 41,416,353 = 0.07 Both
of the above events are considered too common to give prizes for, so you win
nothing! LECTURE: We
skipped ch 6, as we have already talked about probability distributions in
other sections, and move to ch 7 to revisit normal distributions and see
probability as areas under the curve. We
looked at the table below today to find areas under the already standardized
curve. We will use a modified version of this table from your book, where we
employ the symmetry of the curve so that the area to the left of a negative z
value is the same as the area to the right of a positive z value. In that
way, you can look up the z values on the one page below as + or – , not
just –. To
look up a particular value of z, you put together the row and column that
make up the z value. The left-most column gives the ones and tenths places of
the z, but the uppermost row gives the hundredths place of the z value. Example: to find the
area under the curve to the right of 1.83, since 1.83=1.8+0.03, you look to
the row of 1.8 and the column of 0.03 to get an area of 0.0336 in the right
tail of the distribution. Example: for the area
for values to the left of z= -1.57, putting row 1.5 with column 0.07 we get
0.0582 in the left tail of the distribution. READING IN
THE TEXT: (Review
of previous topics 7.1 p327-332 standardizing formula and area under the
normal curve) 7.2
p337-346 finding area under the normal curve (be careful that we are using a
modified version of the table in the book—the table in the book has a
two-page table with separate +/- values, but the answers to the area
exercises will come out the same) exampleS: 1.
What is the area to the left of z= –2.04? Answer: 0.0207 2.
What is the area to the right of z= 2.79? Answer: 0.0026 3.
What is the area to the left of z= –0.06? Answer: 0.4761 4.
What is the area to the left of z= –0.60? Answer: 0.2743 5.
What is the area to the right of z= 0.60? Answer: 0.2743 6.
What is the area to the right of z= –1.74? Answer: 0.9591 (from 1-0.0409) 7. What is the area to the left of z= 1.05? Answer: 0.8531 (from 1-0.1469) 8.
What is the area between z= 0.87 and z= 2.03? Answer: 0.1710 (The
smaller tail corresponding to z= 2.03 has area of 0.0212 and the larger tail
corresponding to z=0.87 has area of 0.1922. The smaller area is contained
within the larger area, so to find the area between, take the larger and
subtract the smaller: 0.1922–0.0212=0.1710) 9.
What is the area between z= –0.25 and z= –1.97 Answer: 0.3769
(Same as the above, subtract smaller tail from larger: 0.4013-0.0244 =
0.3769) 10.
What is the area between z= –2.09 and z=3.07? Answer: 0.9806
(Different from the previous two problems, because the values are on opposite
sides of the distribution and so it is not the case that one tail is
contained within the other. You must start with 100% of the whole
distribution and "chop off" the two tails using subtraction: 1
– (0.0183 + 0.0011) or 1 – 0.0183 – 0.0011 which equals
0.9806) HOMEWORK due
Tuesday 4/2: 1. 7.2 p346 # 6
ac 2. 7.2 p346 # 8
bd 3. 7.2 p346 #
10 abc 4. Find the
area between z = – 0.99 and z = – 1.09 5. Find the
area between z = + 0.05 and z = + 1.05 6. Find the
area between z = – 2 and z = +2. 7. Find the area
between z = – 3 and z = +3. TEST #3
FORMAT: (Will
occur as scheduled on Th 4/4) I
will provide the general addition and multiplication rules and formulas for
nPr and nCr. 1. One
probability model set-up like 5.1 p235 #40 combined with a problem like 5.2
p246 #26 2. Given a
table like 5.2 p248 #42, 44 but combined with material from 5.4, find
probabilities of events like: P(A), P(A |
B), P(A and B), P(A or B). Events may be stated in words (as in #42, 44
for example) or defined with letters such as C and D. 3. One general
addition rule card problem like 5.2 #32ac or ex3 p242 4. One general
multiplication rule word problem like 5.4 #12-16 5. One nPr or
nCr to show meaning/cancellations from formula, like 5.5 p276 #18 or 26 6. About 3
situations like 5.5 #46-50 to write appropriate nPr, nCr but not compute it 7. One like
5.5#66 or ex15 p276 to write a probability using a quotient of nCr counts
from subsets 8. Follow-up
question to #7 as in hmk prob. 5.5 p278 #66, where you needed to identify all
the events that make up a probability distribution for the situation: without
calculating all of them, you know that out of samples of 3 colas from the
12-pack, you can only have the 4 possible events of 0 diet, 1 diet, 2 diet,
and 3 diet in the samples, so P(0)+P(1)+P(2)+P(3) = 1. 9. One like 5.5
p278 #60 or Super Lotto from hmk to write probability as a quotient of nCr
counts 10. Various
areas to look up like today's hmk from 7.2 |
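Optional software check for a few of the area examples above (assumes Python with scipy):

    from scipy.stats import norm

    print(norm.cdf(-2.04))                    # about 0.0207 (area to the left, example 1)
    print(norm.sf(2.79))                      # about 0.0026 (area to the right, example 2)
    print(norm.cdf(2.03) - norm.cdf(0.87))    # about 0.1710 (area between, example 8)
    print(norm.cdf(3.07) - norm.cdf(-2.09))   # about 0.9806 (opposite sides, example 10)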
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
3/19 |
LECTURE: 5.5:
how to form more sophisticated probabilities using nCr counts. READING IN
THE TEXT: 5.5p275-276
ex 14, 15 probabilities involving combinations exampleS: 5.5 p278 #62 c. (55C3*45C4)/100C7 = (26235*148995)/(1.60075608x10 to the 10th
power) = about 0.24 5.5 p278 #65a. Out of the 13 tracks, 5 are liked so 8 must be
disliked. You are taking 2 of 5 liked and 2 of 8 disliked for the event
probability on the top of the fraction. On the bottom of the fraction, any 4
could pop up from the 13 tracks available.
(5C2*8C2)/13C4 = (10*28)/715 = about 0.39 Another
example
not in book: Out of 125 dishes in a box, 8 are chipped. If we select 6 dishes
at random from the box, what is the probability that exactly 1 will be
chipped?: Answer:
Out of 125 dishes in a set, if 8 are chipped, 117 are not.
(8C1*117C5)/125C6=(8*167549733)/4690625500 =0.29 There
are two very recognizable lottery examples of forming probabilities using
this counting method. One is the MEGA MILLIONS lottery game which we looked
at today (p278 #60). It involves tickets sold in at least 42 states. Another
is just for California: CALIFORNIA LOTTERY: SUPER LOTTO PLUS. To
play the game, you are asked to pick 5 different numbers choosing from 1 to
47 regular numbers and one ÒMegaÓ number choosing from 1 to 27. The top prize
(which is the one advertised in millions) goes to whoever matches all 5 of 5
winning numbers and matches the one Mega number. Much smaller prizes are
awarded for matching some of the numbers. Example: Getting any
3 of 5 winning regular numbers and
not getting the winning Mega:
(
( 5C3 * 42C2 ) / 47C5 ) * ( (1C0*26C1) / 27C1 ) = ( ( 10 * 861 ) / 1,533,939
) * ( (1*26) / 27 ) =
223860 / 41,416,353 = 0.005405111 Example: Getting any
3 of 5 and the Mega: (
( 5C3 * 42C2 ) / 47C5 ) * ( 1C1 / 27C1 ) = ( ( 10 * 861 ) / 1,533,939 ) * ( 1
/ 27 ) =
8610 / 41,416,353 = 0.000207889 Example: Getting all
5 of 5 and the Mega: ( 5C5 / 47C5
) * ( 1C1 / 27C1 ) = ( 1 / 1,533,939 )( 1 / 27 ) = 1 / 41,416,353 =
0.000000024. HOMEWORK due
Th 3/21: 1.
5.5 p278 #66 Do parts a, b, and c, and also find P(exactly 0 diet). Check
that the probabilities from all the parts form a probability distribution:
P(0) + P(1) + P(2) + P(3) = 1 2.
We did 5.5 p278 #60 in class. Using the same game and sets, find the
probability of getting none of the winning numbers from either set! 3.
In Super Lotto Plus (described in above notes), find the probability of
getting 1 of the winning regular numbers and not getting the Mega number. 4.
In Super Lotto Plus (described in above notes), find the probability of
getting 2 of the winning regular numbers and not getting the Mega number. To make your
life easier in problems 2, 3 and 4 above, here are some shortcuts: nC0=1 for all n (so 5C0 = 1 for
example) nC1=n for all n (so 5C1 = 5 for
example) nCn=1 for all n (so 5C5 = 1 for
example) 51C5=2,349,060 5C2=10 42C3=11,480 42C4=111,930 47C5=1,533,939 |
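Optional: Python's math.comb can check any of the nCr counts and lottery probabilities above (math.comb needs Python 3.8 or newer; this is just a check, not part of the assignment):

    from math import comb

    # Chipped dishes example: 125 dishes, 8 chipped, choose 6, exactly 1 chipped
    p_dishes = comb(8, 1) * comb(117, 5) / comb(125, 6)          # about 0.29

    # Super Lotto: 3 of 5 regular numbers and not the Mega
    p_3of5 = comb(5, 3) * comb(42, 2) / comb(47, 5) * (26 / 27)  # about 0.0054

    # Super Lotto jackpot: all 5 of 5 and the Mega
    p_jackpot = (comb(5, 5) / comb(47, 5)) * (1 / 27)            # 1/41,416,353, about 0.000000024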
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
3/14 |
ANSWERS:
previous hmk not covered in class 5.2 #44 a.
P(placebo) = 100/250 b.
P(went away) = 188/250 c.
P(placebo and went away) = 56/250 d.
P(placebo or went away) = 100/250+188/250-56/250 = 232/250 5.4 #18 a.
P(former | cancer) = 91/1014 b.
P(cancer | former) = 91/7848 LECTURE: I
introduced material from section 5.5 that we will continue with on Tuesday. To
form more complicated probabilities, one must know how to count sometimes
large and complex numbers of things. We
considered the example of how to count the different ways one can take 2
letters, without repetition, from a set of 3 letters{A, B, C}. We found 6
permutations (order matters) and 3 combinations (order doesnÕt matter) with
the help of a tree diagram. We can perform these counts without a tree, using
the formulas in 5.5. READING IN
THE TEXT: 5.5
pages 266 thru the end of example 11 on p273, and
try some of the computations in the skill building section on p276/277 for yourself (check answers to odds in the back of the
book). Especially read about: --
tree diagrams on p267 --
factorials on p269 --
permutation formula p270 --
combination formula p272 --p271
ex7 and p273 ex11 deciding if order is important HOMEWORK due
Tuesday 3/19: (Please perform problems in the order listed below) 5.5
p276 # 6, 8, 14, 16, 24 (note that #9 tells you that 0! = 1) 5.5
p277 #28, showing the possible paths on a tree diagram and check count with
formula. 5.5
p277 #30, showing which outcomes in #28 above are repeats and check with
formula. 5.5
p277 #18 and see how quickly the counts can get out of hand, even with small
sets of objects. (Would you want to list all of these selections in a table
or on a tree diagram?) 5.5
p277/278 #46, 50 deciding if order matters first, then computing the
appropriate P or C. |
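Optional: the counts from the {A, B, C} example and the 5.5 formulas can be checked in Python (math.perm and math.comb need Python 3.8 or newer):

    from math import factorial, perm, comb

    print(perm(3, 2))   # 6 permutations of 2 letters from {A, B, C} (order matters)
    print(comb(3, 2))   # 3 combinations (order does not matter)

    # Same counts straight from the formulas nPr = n!/(n-r)! and nCr = n!/(r!(n-r)!)
    print(factorial(3) // factorial(3 - 2))                    # 6
    print(factorial(3) // (factorial(2) * factorial(3 - 2)))   # 3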
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
3/12 |
LECTURE: 5.2
goes over the two forms for the addition rule p238 and p241/242 and 5.3/5.4
go over the two forms for the multiplication rule p251 and p259. The
2nd forms of each rule (p242 and p259) are the general forms, which actually
cover the cases of the 1st forms (p238 and p251). So you only need the
general forms for all cases! "E
OR F" is a union of sets and "E AND F" is an intersection of
sets. The
general addition rule of section
5.2 includes intersections as part of the union (subtracting the intersection
once so as not to double count it in the union): P(E
or F) = P(E) + P(F) – P(E and F) In
table problems, we treated P(E and F) as an intersection of a row and column
in the table, but now we can also use the general multiplication rule to find it: P(E
and F) = P(E)·P(F | E), where P(F | E), read "F given E", is the conditional probability: the
probability that event F occurs given that event E has already occurred or
that F occurs given that E is the subset F is being chosen from. We
worked on an exercise in class using a table like that in 5.2 p248 #42 to
highlight how to find one-event ("simple" probabilities), conditional
probabilities, and multi-event ("compound" probabilities using AND and OR). In
the table handout in class, the first three probabilities are all one-event
classical forms: 1.
P(target) = 145/282 2.
P(29-39) = 125/282 3. P(Macys) = 137/282 The
next two are conditional probabilities: 4.
P(target given 29-39) = 50/125 (given we select only from the 125 29-39, find
P(Target)) 5.
P(29-39 given target) = 50/145 (given we select only from the 145 Targets,
find P(29-39)) The
next three are ways to find an intersection (mult rule): 6.
P(target and 29-39) = 50/282 (there are 50 fitting both categories out of all
the subjects) 7.
P(target and 29-39) = P(target)* P(29-39 given target) = (145/282)*(50/145) =
50/282 8.
P(29-39 and target) = P(29-39)* P(target given 29-39) = (125/282)*(50/125) =
50/282 The
next two are finding unions (addition rule) 9.
P(target or 29-39) = P(target)+P(29-39)-P(target and 29-39) = 220/282 10.
P(target or macys) = P(target)+P(macys)-P(target and macys) =
145/282+137/282-0/282 which equals 1, since you have
100% chance they will prefer one or the other since all those in the survey had to
choose one. You
can find a table discussion similar to the one we had in class in example 4
on p242/243 and examples 1/2 on p257/258 (they refer to the same table even
though in different sections). READING IN
THE TEXT: 5.2
p238-243 assigned previously 5.3
p250-252 (thru example 2) multiplication rule for independent events 5.4
p256-259 (thru example 3) general multiplication rule (works for both
independent and dependent events, thus it is a general rule!). MORE EXAMPLES (to guide
you in your new hmk assignment): 5.2
p247 5. E and F share {5, 6, 7} so they are not mutually exclusive 7. S has 12 members and (F or G) = {5, 6, 7, 8, 9, 10, 11, 12}
so P(F or G) = 8/12 = 2/3
P(F or G) = P(F) + P(G) – P(Fand G) = 5/12 +4/12 – 1/12 =
8/12 or 2/3 9. E and G do not share any numbers, so they are mutually
exclusive 13. P(E or F) = P(E) +P(F) – P(E and F) = 0.25 +0.45
– 0.15 = 0.55 15. P(E or F) = P(E) + P(F) = 0.25 + 0.45 = 0.70 19. P(E or F) = P(E) +P(F) – P(E and F) so 0.85 = 0.60
+P(F) – 0.05 and solving for P(F), we get 0.85 – 0.55 or 0.30. 31. a. P(heart or club) = P(heart) + P(club) = 13/52 + 13/52 =
26/52 = 0.50
b. P(heart or club or diamond) = P(heart) + P(club) +P(diamond) = 13/52 + 13/52
+13/52 = 39/52 = 0.75
c. P(heart or ace) = P(heart) + P(ace) – P(heart and ace) =
13/52 + 4/52 – 1/52 = 16/52 43.
a. P(satisfied) = 231/375 b. P(junior)
= 94/375 c.
P(satisfied and junior) = 64/375 from the intersection of the row and column
in the table. d.
P(satisfied or junior) = P(satisfied) + P(junior) – P(satisfied and
junior) = 231/375
+ 94/375 – 64/375 = 261/375 5.4
p262 3.
P(E and F) = P(E)*P(F given E) so 0.6 = (0.8)(P(F given E)) so P(F given E) =
0.6/0.8 = 0.75 13.
use mult rule but now in word problem form! P(cloudy and rainy)
= P(cloudy)*P(rainy given cloudy) 0.21 =
(0.37)(P(rainy given cloudy)) so P(rainy given cloudy) = 0.21/0.37 = 0.57 15.
P(16/17 and white) = P(16/17)*P(white given 16/17) 0.062 = (0.084)(
P(white given 16/17)) so P(white given 16/17) = 0.062/0.084 = 0.74 17.
a. P(no given <18) = 8661/78676 = 0.11 b. P(<18 given
no) = 8661/46993 = 0.18 Homework due Thursday
03/14: 5.2
p245 #8, 14, 20, 40, 44 5.4
p262 #4, 8, 14, 16, 18 Hints: In
5.2 #14, 20 and 5.4 #4, 8 write down the general rules first, then fill in
the numbers given and solve for the one that is not given. In
5.4 p262 #14, 16 both word problems involve plugging two given values in the
general mult rule and solving for the other one in the way #4, 8 prepare you
to do – these can be tricky, but try them anyway! In
5.2 #40, 44 and 5.4 #18 you are writing probabilities from a table as we did
in class today – they give you column and row totals in #40, but you
have to find the totals first yourself in the other two problems. |
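If you want to double-check this kind of table arithmetic on a computer, here is a minimal Python sketch (Python is not part of the course; it just re-does the handout calculations using the counts quoted above, with fractions printed in lowest terms):
```python
from fractions import Fraction

# Counts quoted from the class handout table above
total = 282            # all survey subjects
target = 145           # prefer Target
age_29_39 = 125        # aged 29-39
target_and_29_39 = 50  # both prefer Target and are aged 29-39

def P(count, out_of=total):
    """Classical probability: favorable outcomes over the size of the group chosen from."""
    return Fraction(count, out_of)

print(P(target))                          # P(Target), i.e. 145/282
print(P(target_and_29_39, age_29_39))     # P(Target given 29-39), i.e. 50/125
print(P(target_and_29_39, target))        # P(29-39 given Target), i.e. 50/145
# Multiplication rule: P(Target and 29-39) = P(Target) * P(29-39 given Target)
print(P(target) * P(target_and_29_39, target))   # same value as 50/282
```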
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
03/07 |
LECTURE: Last
time and today before the test, we looked at finding probabilities from
tables from 5.1 and 5.2, using the classical method for forming probabilities
and the general addition rule. READING IN
THE TEXT: 5.1
p223-227 up to but not including example 5. 5.2
p238-243 to the end of example 4. EXAMPLES (to guide
you in your hmk):
5.1 p233 (using the set-up for probabilities on p227)
13. cannot have a negative probability
31. P(sports) = 288/500 = 0.576
33. a. P(red) = 40/100 = 0.40
b. P(purple) = 25/100 = 0.25
39. 125 + 324 + 552 + 1257 + 2518 = 4776 responses, so: never 125/4776 = 0.026, rarely 324/4776 = 0.068, sometimes 552/4776 = 0.116, most of the time 1257/4776 = 0.263, always 2518/4776 = 0.527
49. a. P(right) = 24/73 = about 0.33
b. P(left) = 2/73 = about 0.03
c. yes, only 3% of the time
5.2 p247
25. using the addition rule for disjoint events (add the probabilities):
a. they all add to 1
b. gun or knife = 0.671 + 0.126 = 0.797
c. 0.126 + 0.044 + 0.010 = 0.180
d. 1 - 0.671 = 0.329
e. yes, they only happen 1% of the time
43. a. P(satisfied) = 231/375
b. P(junior) = 94/375
c. P(satisfied and junior) = 64/375, from the intersection of the row and column in the table.
d. P(satisfied or junior) = P(satisfied) + P(junior) - P(satisfied and junior) = 231/375 + 94/375 - 64/375 = 261/375
Homework due Tuesday
03/12: Do 5.1 p233 #12, 14, 32, 34, 40, 52 Do 5.2 p247 #26, 34, 42 |
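A short Python check of the #43 arithmetic above (not required for the course; the counts 231 satisfied, 94 juniors, 64 both, out of 375 are the ones quoted in the example):
```python
from fractions import Fraction

total, satisfied, junior, both = 375, 231, 94, 64   # counts quoted in #43 above

p_satisfied = Fraction(satisfied, total)
p_junior = Fraction(junior, total)
p_both = Fraction(both, total)

# General addition rule: P(satisfied or junior) = P(satisfied) + P(junior) - P(satisfied and junior)
p_union = p_satisfied + p_junior - p_both
print(p_union)                       # 261/375, printed in lowest terms (87/125)
assert p_union == Fraction(261, 375)
```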
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
03/05 |
Note
that item 13 on the test format below has been removed. We talked about items
11 and 12 in class today and they will be on the test. Here are some examples
to guide you on these items:
5.1 p233 (using the set-up for probabilities on p227)
17. There is a 42% chance of being dealt a pair (two cards of the same value) in 5-card stud poker. If you play 5-card stud 100 times, will you be dealt a pair exactly 42 times? Answer: No. The classical method expects about 42 over the long run, but in an experiment with a fixed number of trials (the empirical method) anything can happen; 42 times is not guaranteed.
31. P(sports) = 288/500 = 0.576
33. a. P(red) = 40/100 = 0.40
b. P(purple) = 25/100 = 0.25
49. a. P(right) = 24/73 = about 0.33
b. P(left) = 2/73 = about 0.03
c. yes, only 3% of the time
Homework
for now is to study for your test. You will have some homework in 5.1 and 5.2
after the test which will be introduced before the test for a few minutes and
will be due on Tuesday. |
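If you are curious about the classical-vs-empirical point in #17, here is a small simulation sketch in Python (my own illustration, not from the text): it deals several 100-hand sessions of 5 cards from a standard 52-card deck and counts how many hands in each session contain exactly one pair. The counts bounce around the long-run 42% figure instead of hitting 42 exactly.
```python
import random
from collections import Counter

RANKS = list(range(13)) * 4   # a 52-card deck reduced to ranks; suits don't matter for pairs

def exactly_one_pair(hand):
    """True if the 5-card hand has exactly one pair (one rank twice, three ranks once)."""
    return sorted(Counter(hand).values()) == [1, 1, 1, 2]

def pair_counts(sessions, hands_per_session=100, seed=1):
    rng = random.Random(seed)
    return [sum(exactly_one_pair(rng.sample(RANKS, 5)) for _ in range(hands_per_session))
            for _ in range(sessions)]

# Five different 100-hand sessions: each count is near 42 but rarely exactly 42.
print(pair_counts(5))
```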
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
02/28 |
LECTURE: We started looking at how we can put the mean and standard
deviation to work now that we know how to calculate them. Overall, we want to
be able to make any normal distribution have the same center (mean) of 0 and
same spread (std. deviation) of 1. Once distributions are on the same scale,
you can compare them. READING: 3.4 p155 (thru example 1) z-score definition and formula
(note that z values measure the number of std. deviations that an x data
value lies from the mean). We also talked about making area comparisons without being
able to look up the areas on a table yet. The material from 7.1 p330-332 approximates
this discussion. Look at how z calculations are found, given the mean and
std. deviation, then look at figure 10 to see a picture of how the original x
data is transformed to z data that is centered at 0 and has std. deviation of
1. Once areas for different distributions are on the same scale, you can
compare them (i.e., is one area contained within the other on a shared graph?
The area to the left of z = 2 is larger than the area to the left of z = 1). HOMEWORK due Tuesday
03/05: (do not just
skip the first problem – there is a lot to do in it so it will be worth
more points!) 1. Ch3 Review
p173 #2 (use either formula for the std. deviation – this problem is a
little time-consuming, but a good roundup of several ideas!) 2. 3.4 p162 #26
(base your answer on quartiles/outliers) 3. 3.4 p161 #12 (show
comparison on a picture of a normal distribution) 4. 3.4 p161 #14 (show
comparison on a picture of a normal distribution) 5. Which is larger, the
area associated with values less than 55 for a distribution with a mean of 80
and std. deviation of 10, or the area associated with values less than 50 for
a distribution with a mean of 80 and a std. deviation of 15? (show comparison
on a picture of a normal distribution) TEST# 2 Will occur as
scheduled on Thursday 3/7. As always, bring your own calculator, but not a
cell phone, PDA, or other transmission device. Formulas to be provided on front board (be careful that since
formulas are not provided next to the problems, you must know which formula
is being asked for and what the symbols mean): 3.2 p136 table 11 formula involving deviations for std. deviation of
samples, 3.2 p134 table 10 computational formula for std. deviation of
populations 3.4 p155 z scores for populations 3.4 p160 lower and upper fences FORMAT: 1. Short question on misleading graphs. (See 2.3 p106
#1-7, 10-12, 14). 2. Given data, find the mean, median, and mode. (see 3.1 p125 #16 for
example). 3. Short answer questions from ch3 reading, for example: means of samples and populations (3.1
p118/119 ex1, p127 #24), resistance to skewing and distribution
shape (3.1 p122 definition and table 4), when to best use means, medians, modes (3.1
reading and p129 #42), measures of center and dispersion best to
report (3.2 p141 #8 , 3.4 p159 summary table), meaning of standardization (3.4 p155
definition). 4. Assess possible skewing by comparing median and mean values, or
matching pictures of
distributions with table data (see 3.1 p126 #18 for example). 5. Given sample data, find the mean and std. deviation (see 3.2 p142
#11-16, 20 for ex.) (using formula involving deviations, not
computational formula). 6. Given population data compute z values, mean, and std. deviation
(see 3.4 p163 #30 for ex.) (using computational formula, not formula
involving deviations). 7. Given quartiles only, assess skewing and outliers (see 3.4 p162
#20 for ex.) 8. For data, find quartiles/IQR/fences/outliers/box plot (see 3.4
p162 #22 and 3.5 p170 #12). 9. Word problem to compare z scores and relative placement (see 3.4
p161 #9-14 for ex.). 10. Question on comparison of areas under the normal curve using z
values (from hmk above). 11. Short answer on meaning of empirical and classical probabilities
(see 5.1 #17, 18 for ex.). 12. One
short problem to state a classical probability (see 5.1 p234 #32-34 for ex.). (Items 11, 12
will be covered in class on Tuesday). |
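For those who like to check area comparisons numerically, here is a small Python sketch (not part of the course; the x, mean, and std. deviation numbers in the last line are made up just to illustrate standardization):
```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# The comparison from the reading: more area lies to the left of z = 2 than of z = 1.
print(round(area_left(1), 4), round(area_left(2), 4))    # about 0.8413 vs 0.9772

# Hypothetical numbers: two different distributions put on the same z scale.
print(z_score(70, 80, 10), z_score(65, 80, 15))          # both are -1.0
```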
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
02/26 |
LECTURE: We worked on
3.4 and 3.5 together to form a box plot. We looked at examples in a class
exercise that had all whole numbers in the data and relatively nice
calculations. The ones for hmk will have larger and less friendly sets of
data! Answers to
the second box plot exercise from class: Put
the data in order first. Q2
= 35 Q1
= mean of 33 and 34 = 33.5 Q3
= mean of 39 and 42 = 40.5 IQR
= 40.5 – 33.5 = 7 LF
= 33.5 – 1.5(7) = 23 UF
= 40.5 + 1.5(7) = 51 There
is one outlier (but very close!): 22. READING IN
THE TEXT: For
hmk prep: 3.4
p157-160 percentiles, quartiles, and outliers discussion. 3.5
p164-167 read about how to construct a box plot from the quartiles in section
3.4 (see blue box on p165) and how the box plot gives a nice visual of data
that is easier to construct than a histogram (see pictures on p167). For
next time and perhaps the time after that, 3.4
p155-156 standard scores (word problems) 5.1
p223-227 up to but not including example 5. 5.2
p238-243 to the end of example 4. MORE EXAMPLES (to guide
you in your hmk!): 3.4 p162 #21 The mean of
this sample data is 3.99 and std. dev. is 1.78. Notice they put the data in
order by columns, so you do not need to list it again! a.
z = (0.97 - 3.99)/1.78 = -1.70 b.
Q2 = (3.97+4)/2 = 3.99 Q1 = (2.47+2.78)/2 = 2.63 Q3 = (5.22+5.50)/2 = 5.36 IQR
= Q3 – Q1 = 5.36 – 2.63 = 2.73 Left
fence = 2.63 – 1.5(2.73) = –1.47 Right
fence = 5.36 + 1.5(2.73) = 9.46 so
no outliers (data values outside the "fence"). 3.5 p170 #11 The
data in order are: 0.598,
0.600, 0.600, 0.601, 0.602, 0.603, 0.605, 0.605, 0.605, 0.606, 0.607, 0.607,
0.608, 0.608, 0.608, 0.608, 0.608, 0.609, 0.610, 0.610, 0.610, 0.610, 0.611,
0.611, 0.612. a.
Q2 = 0.608
Q1 = (0.603 + 0.605)/2 = 0.604
Q3 = (0.610 + 0.610)/2 = 0.610
IQR = Q3 - Q1 = 0.610 - 0.604 = 0.006
Left fence = 0.604 - 1.5(0.006) = 0.595 (so no outliers on the left)
Right fence = 0.610 + 1.5(0.006) = 0.619 (no outliers on the right either)
Number
line below box plot shows min, Q1, M, Q3, max:
Data
appears to be skewed left. HOMEWORK due Thursday
02/28: 1. 3.4 p162
#20, 2. 3.4 p162 #22bcd
(given that mean of sample is 10.08 and std. dev. of sample is 1.89), 3. 3.4 p162
#24, 4. 3.5 p169 #6,
5. 3.5 p170 #12
(data in order 1.01, 1.34, 1.40, 1.44, 1.47, 1.53, 1.61, 1.64, 1.67, 2.07,
2.08, 2.09, 2.12, 2.21, 2.34, 2.38, 2.39, 2.64, 2.67, 2.68, 2.87, 3.44, 3.65,
3.86, 5.22, 6.81) |
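Here is a Python sketch of the quartile/fence bookkeeping (not required for the course). It uses the median-of-halves approach from the worked examples above; leaving the middle value out of both halves when n is odd is my reading of that method, and it reproduces the 3.5 #11 numbers.
```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    return sorted_values[mid] if n % 2 else (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(data):
    """Q1, Q2, Q3, IQR, fences, and outliers via the median-of-halves method."""
    x = sorted(data)
    n = len(x)
    q2 = median(x)
    q1 = median(x[: n // 2])          # lower half (middle value excluded when n is odd)
    q3 = median(x[(n + 1) // 2 :])    # upper half
    iqr = q3 - q1
    lf, uf = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in x if v < lf or v > uf]
    return q1, q2, q3, iqr, lf, uf, outliers

# The ordered data from 3.5 p170 #11 quoted above
data = [0.598, 0.600, 0.600, 0.601, 0.602, 0.603, 0.605, 0.605, 0.605, 0.606,
        0.607, 0.607, 0.608, 0.608, 0.608, 0.608, 0.608, 0.609, 0.610, 0.610,
        0.610, 0.610, 0.611, 0.611, 0.612]
print(five_number_summary(data))   # roughly Q1=0.604, Q2=0.608, Q3=0.610, fences 0.595/0.619, no outliers
```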
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
02/21 |
Today,
I gave a personal code by which you can verify the % I have for your Test#1
scores. You can check it at http://www.smccd.edu/accounts/callahanp
under "testscores" by some time this afternoon. LECTURE: We
did more with std. deviation, now with respect to populations. Towards
the end of class, we took a look at how we will use the mean and std.
deviation that we now know how to compute. In section 3.4, standardized
values put the mean and standard deviation to work. Overall, we want to be
able to make any normal (symmetric) distribution have the same center (mean)
of 0 and same spread (std. deviation) of 1. Once distributions are on the
same scale, you can compare them or look up areas related to the distribution
from a table (table A-11 in the back of your book). READING
IN THE TEXT: For
hmk prep: 3.2
p131-134 std. deviation of a population 3.2
p139 Empirical Rule 3.4
p155-156 z-scores For
next time prep: 3.4
p157-160 quartiles 3.5
p163-167 box plots A
class exercise on Tuesday will unite 3.4 and 3.5 (3.3
will be skipped) Homework due Tuesday
02/26: 3.2 p142 #26 (note this
continues 3.1 #24 about means, but now asks for std. deviations): part a do by both methods: deviations and computational
formula, part b do each of your 3
samples' std. deviations by either method (your choice) 3.4 p163 #30 (see table
of data from 3.1 p127 #24 and 3.2 p143 #26) Do
the z scores for the whole population only, not for each sample! If you
missed how to calculate z values, look at p156 and also p332 has some
examples. a. Given the
info from those previous problems that the population mean of the data in the
table is 26.4 and the population std. deviation of the data in the table is
12.8, find the z-scores for each data point in the table (you should end up
with 9 z scores). b. Find the mean
of the 9 z-scores from part a. (sum the + and – values as they are). c. Find the
std. deviation of the 9 z-scores from part a. using the computational formula
for population std. deviation as in table 10 on p134 (remember to divide by
N, not n-1). |
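A short Python sketch of what parts b and c are getting at in general (the population values below are made up, not the book's table): standardizing a population gives z-scores with mean 0 and standard deviation 1.
```python
from math import sqrt

def pop_mean(x):
    return sum(x) / len(x)

def pop_std(x):
    """Population std. deviation via the computational formula (divide by N, not n-1)."""
    n = len(x)
    return sqrt((sum(v * v for v in x) - sum(x) ** 2 / n) / n)

population = [12, 18, 23, 25, 31, 36, 40, 44, 9]   # hypothetical data, not the book's table
mu, sigma = pop_mean(population), pop_std(population)
z = [(v - mu) / sigma for v in population]

print(round(mu, 2), round(sigma, 2))
print(round(pop_mean(z), 10), round(pop_std(z), 10))   # the z-scores have mean 0 and std. deviation 1
```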
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
02/19 |
We
started looking at the notion of spread from 3.2. We started an exercise in
class finding the standard deviation
2 ways for the data set of x values below. Check the values:
Deviations method: square root of (35/(7-1)) = square root of 5.83 = 2.42
Computing formula method: Sxx from before yields 246.75 - ((38.5)(38.5)/7) = 35, so square root of (Sxx/(n-1)) = square root of (35/(7-1)) = square root of 5.83 = 2.42
READING
IN THE TEXT: For
hmk prep: 3.1
p122 mean vs. median 3.2
p135-137 std. deviation of a sample For
next time prep: 3.2
p131-134 std. deviation of a population (3.3
will be skipped) 3.4
p157-160 quartiles (if there is time) 3.5
p163-167 box plots (if there is time) Homework due Thursday
02/21: 3.1
p125 #14, 18, 44 3.2
p142 #8, 3.2
p142 #12 (find std. deviation by deviations formula only – see p136
table 11), 3.2
p142 #20 (find std. deviation by computation formula only – see p136
table 12) Note:
p137 tells you to take the squareroot of the values on the previous page p136
to find the std. deviation, not just the variance! |
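If you want to check your standard deviation arithmetic, here is a Python sketch of both methods (the sample below is made up, since the class data set is not reproduced here); the two formulas always agree, which is a good way to catch calculator slips.
```python
from math import sqrt

def std_by_deviations(x):
    """Deviations formula: square root of the sum of squared deviations over (n - 1)."""
    n, xbar = len(x), sum(x) / len(x)
    return sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))

def std_by_computation(x):
    """Computational formula: Sxx = sum(x^2) - (sum(x))^2 / n, then sqrt(Sxx / (n - 1))."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    return sqrt(sxx / (n - 1))

sample = [2, 3.5, 4, 5.5, 6, 8, 9.5]   # hypothetical sample of n = 7 values
print(round(std_by_deviations(sample), 4), round(std_by_computation(sample), 4))
```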
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
02/14 |
Before
the test, we took a brief look at measures of center and means of different
samples from a population. Read the section 3.1 as below, then try some hmk
for Tuesday. READING
IN THE TEXT: 3.1
p117-125 mean, median, mode, and comparing the mean vs. median Be
sure to read p122 about resistance and how distribution shapes affect the
mean and median. We will go over this on Tuesday. 3.2 on spread of data will be covered
on Tuesday if you want to read ahead. Homework due Tuesday
02/19: 3.1
p125 #16, 24, 28, 30, 36, 42 |
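A tiny Python sketch of 3.1's ideas (made-up numbers, not a textbook data set): the mean, median, and mode, and how a single large value pulls the mean much more than the median (the median is resistant).
```python
from statistics import mean, median, multimode

data = [4, 7, 7, 9, 10, 12, 15]          # hypothetical sample
print(mean(data), median(data), multimode(data))            # about 9.14, 9, [7]

data_with_outlier = data + [80]           # add one extreme value
print(mean(data_with_outlier), median(data_with_outlier))   # mean jumps to 18, median only to 9.5
```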
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
02/12 |
After
going over the hmk, we looked at 2.3 examples regarding use of areas in
graphics, and how scaling affects perception. Please read 2.3 p100-106 and
look at p106-108 #1-7, 10-12 (we looked at 2, 4, 6, 10 in class -- skip
time-series plots). Homework
is to study for your test. The format is at the end of the previous notes
below. Before
your test, we will take a brief look (10 to 15 mins) at the first section of
ch3 (mean, median and mode). There will be hmk from 3.1 due Tuesday and we
will pick up with the 3.2 concepts of center and spread. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
02/07 |
We
spent time after the quiz talking about the vocabulary of Ch.1 in 1.1 and
1.5. If you only recently got your book, you should go back and read these
sections to fill in the ideas. Sometime soon, you should also browse the
pages in 1.2, 1.3, and 1.4. Although we do not have time to go over this
material, it is important for your general knowledge of statistics. READING IN
THE TEXT: 2.2
p78-83 constructing classes and histograms for continuous data See
2.3 p106 #1-7, 10-12, 14 which we will look at on Tuesday in class (no hmk) IN-CLASS
EXERCISE: You
can see the answers in the back of the
book since the exercise was problem 2.2
p95 #39. Note
that class width is the difference between consecutive lower class limits,
not the difference between the lower and upper limits for any one class. 1st
part:
classes of width 10 give: class frequency tally 20-29 1 30-39 111111 40-49 1111111111 50-59 11111111111111 60-69 111111 70-79 111 You
can see the histogram by turning the tally marks above 90 degrees
counterclockwise! 2nd
part:
classes of width 5 are: 20-24, 25-29, 30-34 etc. The
histogram for this has the same general shape as the one from part a above,
but spreads the data out more and contains more peaks and valleys that show
more about the original data. I show height in X blocks below to ensure the
scale stays in line on this page, but you should just draw bars of the same
height (see answer to 2.2 #39 in back of book):
As
the book points out, there is no one best way to divide up the data. You pick
what you think shows the spread of the data best. I like the one with class
width 10 since it yields a fairly smooth "bell curve" shape. I find the extra
peaks and valleys using class width 5 to be somewhat annoying to look at. Homework due
Tuesday 2/12: 2.2
p91-95 #2, 4, 6, 12, 14, 30, 34, 38 Note
for #30: instructions are listed before problem #27. Note
for #34: instructions are listed before problem #31. Note
for #38: make upper class limits accurate to one decimal, just like the data.
Your first class should be: 8 - 9.9 (since 9.9 is the last one-decimal number
before 10). Looking
forward: We
will work on 2.3 on Tuesday. Test #1
will occur on Thursday 2/14 as scheduled and will cover the material from
lecture, class exercises, homework, and quizzes. Formulas
and testpaper will be provided but you must bring a standard scientific
calculator and something to write with. You will not be allowed to use
calculator cell phones, PDAs, or other transmission-capable devices. You may
not share a calculator (see if someone can let you borrow theirs after they
turn in their test). Format: 1a.
Graph linear scatterplot data, draw best fit line estimate, and find the
equation of the line using two indicated points. 1b.
Given the "summation table" for x, y, xy, and x squared for the data, use the
equations for Sxx, Syy, etc. to find the actual line of best fit. 2.
Given a set of exponential scatterplot data, turn it into linear scatterplot
data using logs, then graph the linear data. One or two follow-up questions
may ask you to plug in a given x or y value to a given exponential equation
and solve for the other variable. 3.
Given the line of best fit for some "logged" data (x, logy), find the
exponential of best fit for the original data (x, y) by "unlogging" the slope
and y-intercept. 4.
Some short-answer questions on chapter 1.1 and 1.5 reading and definitions,
including statistics, samples and populations, qualitative and quantitative
variables, discrete and continuous variables, and bias in sampling. 5.
Given a set of data, find/show various parts of the following: frequency and
relative frequency distributions, frequency and relative frequency bar graphs
and side-by-side comparison of two sets of data, and why one representation
is better than another. 6.
Given a small set of data, construct classes of given width and lower first
class limit, and form the resulting frequency distribution. Be prepared to
provide a histogram if asked for. 7.
Answer short questions regarding use of areas in graphics, and how vertical
scaling affects perception. (See 2.3 p106 #1-7, 10-12, 14 which we will look
at on Tuesday). |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
02/05 |
I
now assume that you have access to the book and can do some reading. Just in
case, though, I will put in some notes from Ch1 and 2.1 below. I felt that we
could have a better discussion of Ch. 1 vocabulary after you have read it, so
it is included in the hmk due Thursday and we will talk more about it then. In
class today, we took a look at bar graphs in 2.1. Note that comparison of
numbers of college grads using frequencies shows an increase in Internet
access. But from this, we cannot tell if the increase was just due to a
larger group being sampled in 2003 or if it represents a larger proportional
increase within the whole population. Based on the samples taken, can we
infer that Internet access has increased for the population of all college
grads? A different comparison can be made to answer this question using
relative frequencies:
Now
we can see that all of the categories increased with respect to frequency,
but relative frequency shows that the last two categories actually decreased
in proportion to the whole. So in general we find that the proportion of
college grads with access to the Internet has actually decreased. To
read for today's hmk: 1.1
p3-8, noting definitions especially: A
sample is a subset of the population studied, and we make inferences about the population based
on that sample. Variables are the
characteristics of the individuals within the population. Qualitative, or
categorical variables allow for classification of individuals based on some
attribute or characteristic. Be careful that numbers can be used to identify
categories, but arithmetic operations between the numbers are meaningless.
Example: zip codes group areas together, but the difference (subtraction) of
two zip codes is meaningless data. Quantitative variables
provide numerical measures of individuals, where arithmetic operations can be
performed on the numbers involved and provide meaningful results. Example:
people can be grouped according to height in inches, and how much two people's
heights differ is meaningful data. A
discrete variable is a quantitative
variable that has either a finite number of possible values or a
countable number (such as 0, 1, 2, etc.) of possible values. Example: The
number of cars that go through a fast-food line is discrete because it
results from counting. A
continuous variable is a quantitative
variable that has an infinite number of possible values that result from
making measurements. Example: The number of miles a car can travel with a
full tank of gas is continuous because the distance would have to be
measured. To
read for today's hmk: 1.5
p38-42 about bias in sampling, briefly:
sampling bias uses a technique that favors one part of a population over another,
undercoverage causes a sample to not be fully representative of the whole population,
nonresponse of sample subjects to surveys causes error due to missing data that may or may not be minimized by callbacks and incentives,
response bias can result from respondents not feeling comfortable with interviewers or misrepresenting facts or lying, or from questions that are leading in the way they are phrased (poorly worded questions).
Example: A policeman asks
students in a classroom to fill out a survey involving whether they have used
drugs and what kinds they have used. Anonymous discussion of the results will
follow. Response bias could occur if the students feel uncomfortable
giving this information to a policeman, or if students misrepresent facts
because they either don't want to face their problems or think it will appear
cool to their friends. (To
read over the coming weeks at your leisure -- no homework problems on these: 1.2
p16-17, about observational and designed experiments 1.3
p23-26 up to ex. 3, about simple random sampling 1.4
p30-35 about sampling methods) To
read for today's hmk: 2.1
p63-66 and example 6 p68-69 about frequency, relative frequency, bar graphs: A
frequency distribution lists each
category of data and the number of occurrences for each category of data. The
relative frequency is the percent
of observations within a category and is found by dividing the category
frequency by the sum of all the frequencies in the table. A
bar graph has categories labeled on
the horizontal axis and frequencies on the vertical axis, with bars extending
from the horizontal axis to the height that is the frequency and where bars
are usually not touching but are of the same width. A
side-by-side bar graph can be used
to compare data sets and should use relative frequencies to ensure that the
sets are being measured on the same scale, where bars being compared from the
same category usually have no space between them but space is still left
between different categories. Note
that different problems can ask for different things! You could be asked to
provide: a
frequency bar graph, a
side-by-side frequency bar graph, a
relative frequency bar graph, or a
side-by-side relative frequency bar graph. Know
which one is being asked for and what each entails. HOMEWORK due
Thursday 2/7: (last
time I will type out the assignment problems) 1.1 p12 short
answer: #26
Is "assessed value of a house" a qualitative or quantitative variable?
#28 Is "student ID number" a qualitative or quantitative variable?
#32 Is "number of sequoia trees in an acre of Yosemite" discrete or continuous?
#34 Is "Internet connection speed in kilobytes per second" discrete or continuous?
#36 Is "Air pressure in pounds per sq. inch in a tire" discrete or continuous?
1.5 p43
Consider the type of possible bias for each of the following: #14
The village of Oak Lawn wishes to conduct a study regarding the income level
of all households within the village. The manager selects 10 homes in the
southwest corner of the village and sends out an interviewer to the homes to
determine income. #16
Suppose you are conducting a survey regarding the sleeping habits of
students. From a list of registered students, you obtain a simple random
sample of 150 students. One survey question is "how much sleep do you get?". #18
An ice cream chain is considering opening a new store in O'Fallon. Before
opening, the company would like to know the % of homes there that regularly
visit ice cream shops, so the researcher obtains a list of homes and randomly
selects 150 to send questionnaires to. Of those mailed out, 4 are returned. 2.1 p72 #20
A survey asked 770 adults who used the Internet how often they participated in online auctions. The responses were as follows:
frequently 54, occasionally 123, rarely 131, never 462
a.
construct a relative frequency distribution (just the %'s, not a graph!). b.
what proportion never participate? c.
construct a frequency bar graph. d.
construct a relative frequency bar graph. 2.1 p72 #22
A survey of U.S. adults in 2003 and 2007 asked "Which of the following describes how spam affects your life on the Internet?"
Feeling | 2003 | 2007
Big problem | 373 | 269
Just annoying | 850 | 761
No problem | 239 | 418
Don't know | 15 | 45
a/b.
Construct the relative frequency distributions for 2003 and 2007. c.
Construct a side-by-side relative frequency bar graph (for all of the
categories). d.
Compare each year's feelings and make some conjectures about the reasons for
similarities and differences. |
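A quick Python sketch of the frequency vs. relative frequency point from the lecture (the two sets of counts are hypothetical, not the college-grad data): every category's frequency grows, yet a category's share of the whole can still shrink, which is why side-by-side comparisons should use relative frequencies.
```python
def relative_frequencies(counts):
    """Category -> relative frequency (category count divided by the total)."""
    total = sum(counts.values())
    return {category: round(freq / total, 3) for category, freq in counts.items()}

year1 = {"dial-up": 40, "broadband": 60, "none": 100}    # hypothetical survey counts
year2 = {"dial-up": 50, "broadband": 90, "none": 260}    # every frequency is larger...
print(relative_frequencies(year1))   # {'dial-up': 0.2, 'broadband': 0.3, 'none': 0.5}
print(relative_frequencies(year2))   # {'dial-up': 0.125, 'broadband': 0.225, 'none': 0.65}
```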
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Th
01/31 |
Supplementary
notes, followed by homework: Linear
and exponential patterns (not in book): A
linear relationship y=mx+b is built by repeated addition. We add positive numbers for an increasing line
(positive slope) or negative numbers for a decreasing line (negative slope). An
exponential relationship y = b(a)^x is built by repeated multiplication. The a is the amount
by which we multiply each time ("a" contains the rate of increase or decrease
since a=1+r or 1-r, where r is the rate). For decreasing exponential
relationships, a<1 and for increasing exponential relationships, a>1.
For decreasing linear relationships y=mx+b, m is negative and for increasing
ones, m is positive. As with lines, b is the y-coordinate of the y-intercept.
Intercepts are included in each of the following example tables, but could be
solved for if missing. The
following are some tables of data to illustrate what sets of linear and
exponential data look like and how their equations are written (verify the
equations by plugging in points):
The first example table is a decreasing linear set of data because you are adding -3 each time, so y = -3x + 12.
The second is an increasing linear set of data because you are adding +7 each time, so y = 7x + 20.
The third is an increasing exponential set of data because you are multiplying by 1.5 each time, so y = 50(1.5)^x. Written y = 50(1 + 0.5)^x, it shows the rate of increase is 0.5 or 50%.
The fourth is a decreasing exponential set of data because you are multiplying by 0.9 each time, so y = 250(0.9)^x. Written y = 250(1 - 0.10)^x, it shows the rate of decrease is 0.10 or 10%.
Exponential
equation of best fit for a scatterplot of seemingly exponential data: This exponential part is in the more expensive version of this
text, but should not be skipped, so I will supplement your version with notes
in class and here. If
you have a scatterplot of linear data, you saw in class last time that it was
relatively easy and accurate to estimate the line of best fit from a graph
and also find the best fit line using the equations from last time. However,
if you have a scatterplot of data that is best described by an exponential
curve, it is difficult to draw a good curve and you wouldn't know how to find
its equation because it does not have a constant slope (i.e., you could not
take two points and use the slope formula or point-slope form!). But
if you take the logarithm of each y value in the exponential data (leaving
the x values the same), that is, turn (x, y) into (x, logy) in the table, you
will have transformed it into linear data! Then use the equations for line of
best fit to find y = mx+b for the "logged" data (x, logy) and "unlog" the
slope m and y-intercept b to find the "a" and "b" in y = b(a)^x for
the original data. We
did an example of this process in class. Here is another example, but with
data that is not perfectly exponential as it was in class so we cannot write
the equation from the table values:
It
is difficult to estimate an exponential scatterplot relationship, but it can
be turned into a linear relationship by taking the logarithm of the y values
(graph it if you don't believe it; unfortunately, I cannot show the
graphing part here!).
Now
we can find the best fit line for this (x, logy) linear scatterplot by making
a summations table and using the calculations (Sxx, Sxy, etc.) for finding the best fit line.
avgx=8/3=2.67 and
avgy=7.09/3=2.36 Sxx
= 26 – (64/3)=4.67 Sxy
= 18.26 – [(8)(7.09)/3]= –0.65 slope
= –0.65/4.67
= –0.14 y-intercept
= 2.36 – (–0.14)(
2.67) =2.73 So
the best fit line for the logged data is y= –0.14x+2.73 To
find the best fit exponential for the original data, "unlog" the slope and
y-intercept of the line above: raise 10 to the power of each separately and
then write the equation for the exponential of best fit for the original
table data (x, y).
a = 10^slope = 10^(-0.14) = 0.72
b = 10^(y-intercept) = 10^2.73 = 537.03
So the best fit exponential for the original data is y = b(a)^x = 537.03(0.72)^x. (Check
your answer: does plugging x=1 into your best fit exponential give you
something close to the original table value of 398.11? It shouldn't be exact
because the original data was not perfectly exponential, but it should be in
the ballpark! Same for the other two points.). Brief
practice solving for variables in linear and exponential problems from
Algebra: Given
the equation of a line y = 5x + 7, If
x = 4 is given, we can solve for y: y = 5 (4) +7 = 27. If
y = 9 is given, we can solve for x. Since 9 = 5x + 7 subtract 7 from both
sides to get 2 = 5x. Then divide both sides by 5 to solve for x: 2/5 = x. Given
the equation of an exponential y = 12(5)^x,
If x = 3, then we can solve for y: y = 12(5)^3 = 12(125) = 1500.
If y = 24, then we can solve for x, but it involves logarithms to rescue x from being an exponent: 24 = 12(5)^x. Divide both sides by 12 to get 2 = (5)^x. If you take the logarithm of both sides of the equation, you get log 2 = log (5)^x. Properties of logs give you log 2 = x log(5). So to solve for x, divide both sides by log 5 to get x = log 2 / log 5,
which by calculator is about 0.43. Homework (due Tuesday 02/05): Treat
the following table data as forming an exponential scatterplot (not perfectly
exponential, but best described by an exponential function):
a. Take the
original (x, y) values in the table and make a new table (x, log y). That
is, find log 25, log 40.20, etc. b. Find the
line of best fit for the values in the (x, logy) table using Sxx and Sxy. Hint:
you should get the following summations to plug in (find and check them for
yourself): Σx = 10.7, Σ(logy) = 9.66, Σ(x·logy) = 23.33, Σx² = 32.49. (For
the y summation, notice that you are not summing the original y values to get 619.35;
you sum the logged y values to get the summation of 9.66!) c. "Unlog" the
slope m and the y intercept b from the best fit line for the (x, logy) data
in part b to get the "a and b" for the best fit exponential y = b(a)^x
for the original (x, y) data using a = 10^slope and b = 10^(y-intercept).
Does it describe the data well? Compare with a graph of the
original data. d. Use the best
fit exponential equation from part c to estimate the value of y when x is
2.5. e. Use best fit
exponential equation from part c to estimate the value of x when y is 300. |
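Here is a Python sketch of the whole log/unlog procedure (the four data points are made up and only roughly exponential; this is an illustration, not the homework data):
```python
from math import log10

def best_fit_line(points):
    """Least-squares line y = m*x + b via the Sxx and Sxy formulas used in class."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points) - sum_x ** 2 / n
    sxy = sum(x * y for x, y in points) - sum_x * sum_y / n
    m = sxy / sxx
    b = sum_y / n - m * (sum_x / n)
    return m, b

def best_fit_exponential(points):
    """Fit y = b * a**x: 'log' the y values, fit a line to (x, log y), then 'unlog'."""
    m, c = best_fit_line([(x, log10(y)) for x, y in points])
    return 10 ** c, 10 ** m          # b = 10^intercept, a = 10^slope

data = [(1, 40.0), (2, 29.5), (3, 21.0), (4, 15.5)]   # hypothetical, roughly exponential data
b, a = best_fit_exponential(data)
print(round(b, 2), round(a, 3))       # the fitted y = b * a**x
print(round(b * a ** 2.5, 2))         # estimating y at a value like x = 2.5
```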
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
T
01/29 |
We worked on linear
scatterplots. Some supplementary notes follow since I won't assume you have
the book until next week. Reading in the text: --Read 4.1 objective 1
"Draw and interpret scatter diagrams" --Read 4.2 example 1 and
objective 1 "find the least squares regression line..." and note that I am
using the alternate form of the equations in the footnotes. Supplementary notes/examples on lines of best fit: Given
several data points (x,y) you fill out the table below (the data points'
coordinates are the x and y values. The symbol means add them
). Example:
for the data points (2,7) and (4,8) we know that the slope of the line thru
them is 1/2
= 0.5 so that using point-slope form for the equation of a line: y
–7 = 0.5(x – 2) y
= 0.5x +6 This
is the equation of the line thru the above points, a line with slope 0.5 and
y-intercept 6. Now
let us use the equations for the line of best fit: Set
up a table using the data points with the following quantities and sum them
up: for the points (2,7) and (4,8) the sums are Σx = 6, Σy = 15, Σxy = 46, Σx² = 20
(n = 2 in this short example for the 2 data points given), so x̄ = 6/2 = 3 and ȳ = 15/2 = 7.5.
Use all of these numbers to plug into the formulas for the line of best fit:
Sxx = Σx² - ((Σx)²/n) = 20 - (36/2) = 2
Sxy = Σxy - ((Σx)(Σy)/n) = 46 - [(6)(15)/2] = 46 - 45 = 1
Slope of best line = Sxy/Sxx = 1/2 = 0.5
Y-intercept of best line = ȳ - (slope)(x̄) = 7.5 - (0.5)(3) = 7.5 - 1.5 = 6
The
best fit line is then y=0.5x+6 (which matches the equation found at the
beginning exactly, because 2 points make a line, not a scatterplot
estimation!) Example: for the data
pts (1,9), (2,8), (3,6), and (4,3) find the equation of the line of best fit:
Note that n = 4 is the number of data points; Σx = 10, Σy = 26, Σxy = 55, Σx² = 30, so x̄ = 10/4 = 2.5 and ȳ = 26/4 = 6.5.
Using the formulas above,
Sxx = 30 - (100/4) = 30 - 25 = 5
Sxy = 55 - [(10)(26)/4] = 55 - 65 = -10
Slope = -10/5 = -2
Y-intercept = 6.5 - (-2)(2.5) = 6.5 + 5 = 11.5
The
best fit line is then y= –2x+11.5 Homework (due Thursday 01/31): Treat
the following table data as forming a linear scatterplot and do as in the
in-class exercise:
a. Sketch the
points on a hand-drawn graph (just on binder paper), draw what you think is
the line of best fit. Estimate the y-values on your line for x = 35 and x =
50 and use these points to find the slope of your line. Write the equation of
your line using point-slope or slope-intercept form. b. Fill out a
summation table and find the line of best fit using the equations Sxx, Sxy,
etc. |
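A Python sketch of the Sxx/Sxy recipe (not required for the course), checked against the two worked examples above:
```python
def best_fit_line(points):
    """Line of best fit y = m*x + b using Sxx and Sxy as in the notes above."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sxx = sum_x2 - sum_x ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    m = sxy / sxx
    b = sum_y / n - m * (sum_x / n)
    return m, b

print(best_fit_line([(1, 9), (2, 8), (3, 6), (4, 3)]))   # (-2.0, 11.5), matching the example
print(best_fit_line([(2, 7), (4, 8)]))                   # (0.5, 6.0), the line through the two points
```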