Statistics Review Problems -- Stat 1040 -- Dr. McGahagan
Chapter 17 -- The Expected Value and the Standard Error (pp. 304-306)
* 1.One hundred draws with replacement from the box [ 1 6 7 9 9 10 ]. Find the sum.
a. May be as small as 100 (all ones) or as large as 1000 (all tens).
Probability of each extreme = (1/6)^100
b. The box has mean of 7 and SD of 3; hence the sum of 100 draws will have an EV of 700 and SE = 30.
[square root law: standard error of N draws = (sqrt N) * SD of box]
This means that the interval from 650 to 750 is the EV +/- 50/30 SE = EV +/- 1.67 SE
Using the tables for the normal distribution, the chance of the total falling within that interval
is about 90 percent -- 90.11 if you use 1.65 and 91.09 if you use 1.70 as the z-score, and
90.4419 if you use normal.area(-50/30, 50/30)
To simulate this draw:
box <- c(1, 6, 7, 9, 9, 10) sets up the box model.
stats(box) gets the key statistics; all you want are mean(box) and sd(box)
sums <- c( ) defines a placeholder for the sums
for (i in 1:1000) sums[i] <- sum(draw(100, box))
stats(sums) Note the mean and SD of the 1000 sums, each a sum of 100 draws.
Results will vary: one trial gave mean(sums) = 700.097 and sd(sums) = 29.7893.
To drive home the point that the distributions of the BOX and of STATISTICS ON DRAWS FROM THE BOX
are quite different, draw histograms:
Histogram(box) Histogram(sums)
Note how much more "normal" the histogram of the sums looks than the histogram of the box.
This is an important step to seeing the point of the Central Limit Theorem, which we will soon meet.
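For readers without the course helpers, the calculation and the simulation for problem 1 can be sketched in Python with the standard library alone. This is an illustrative translation, not the course's draw and stats functions; the name simulate_sums and the fixed seed are assumptions made for this sketch, and the SD is the population SD (divide by N), as in the text.

```python
import random
from statistics import NormalDist, fmean, pstdev

box = [1, 6, 7, 9, 9, 10]
n_draws = 100

# Box statistics: the SD of a box divides by N, so use pstdev, not stdev.
box_mean = fmean(box)   # 7.0
box_sd = pstdev(box)    # 3.0

# Expected value and standard error for the sum of the draws.
ev_sum = n_draws * box_mean          # 100 * mean of box
se_sum = n_draws ** 0.5 * box_sd     # square root law

# Normal approximation for the chance that the sum lands in 650..750.
z = 50 / se_sum
chance = NormalDist().cdf(z) - NormalDist().cdf(-z)


def simulate_sums(box, n_draws, repetitions, seed=0):
    """Return `repetitions` sums, each of n_draws draws with replacement."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    return [sum(rng.choices(box, k=n_draws)) for _ in range(repetitions)]


sums = simulate_sums(box, n_draws, 1000)
print(f"EV = {ev_sum:.0f}, SE = {se_sum:.0f}, chance = {chance:.4f}")
print(f"simulated mean = {fmean(sums):.2f}, simulated SD = {pstdev(sums):.2f}")
```

The simulated mean and SD will vary with the seed, but should land close to the EV and SE.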
* 2.Gambler plays roulette 100 times, betting $ 1 on a column of 12 (of the 38) numbers; a $ 1 bet will net you
$2 if you win; of course, if you lose, the house takes the $ 1.
Model with a box of 38 tickets, of which 12 are labeled +2 and 26 are labeled -1.
The mean of the box will be 2 * 12 / 38 + (- 1) * 26 / 38 = 24 / 38 – 26 /38 = - 2 / 38 = - 0.05263;
the average loss is a bit over a nickel on a dollar bet.
The SD can be found by the shortcut formula: (big – small) * sqrt [ (fraction big) * (fraction small) ]
= [2 - (-1)] * sqrt [ 12/38 * 26/38 ] = 3 * sqrt (312 / 1444) = 3 * 0.4648 = 1.3945
For the gambler's winnings (the sum of the individual wins or losses), the text formulas give:
EV = 100 * -0.0526 = - 5.26
SE = (sqrt 100) * 1.3945 = 13.95
So the expected result is a loss of $5.26, “give or take” $13.95. Note that it will not be at all unusual for the
gambler to be ahead after 100 plays. The breakeven point is only 5.26 / 13.95 = 0.3770 standard units above the
mean, and the tail area for 0.38 is about 35 percent. The gambler comes out ahead about 35 percent of the time,
enough to keep him coming back.
Confirm the results with:
box <- c(rep(2, 12), rep(-1, 26))
stats(box) should confirm the shortcut formula.
payoff <- c( )
for ( i in 1:1000) payoff[i] <- sum(draw(100, box))
stats(payoff)
The simulation gives (on one typical run) :
Mean of payoff = -5.29; SD of payoff (the simulated counterpart of the SE) = 13.80. Not exact, but close.
The histograms Hist(box, breaks = -2:4) and Hist(payoff) will again show sharply contrasting distributions.
The expected number of wins in 100 plays will be 100 * 12/38 = 1200 / 38 = 31.5789;
the SE of the number of wins will be simply (sqrt 100) * sqrt [(12/38) * (26/38)] = 4.6483
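The arithmetic for problem 2 can be checked with a short Python sketch. The helper name two_value_box is made up for this illustration, and the chance of coming out ahead uses the normal approximation without a continuity correction, which is an assumption of the sketch rather than something the text specifies.

```python
from statistics import NormalDist


def two_value_box(big, small, n_big, n_small):
    """Mean and SD of a box holding only two distinct ticket values."""
    total = n_big + n_small
    frac_big, frac_small = n_big / total, n_small / total
    mean = big * frac_big + small * frac_small
    sd = (big - small) * (frac_big * frac_small) ** 0.5  # shortcut formula
    return mean, sd


plays = 100
mean, sd = two_value_box(big=2, small=-1, n_big=12, n_small=26)

ev = plays * mean          # expected net winnings in dollars
se = plays ** 0.5 * sd     # standard error of the net winnings

# Chance that the net winnings are above zero, by the normal approximation.
p_ahead = 1 - NormalDist(mu=ev, sigma=se).cdf(0)

print(f"box mean = {mean:.5f}, box SD = {sd:.4f}")
print(f"EV = {ev:.2f}, SE = {se:.2f}, P(ahead) = {p_ahead:.3f}")
```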
* 3.Shortcut formula for SD of box: SD = (big – little) * sqrt (fraction big * fraction little)
a.Box = [ 1, -2, -2] SD = ( 1 - (-2)) * sqrt ( 1/3 * 2/3) = 3 * sqrt (2/9) = 1.4142
Note that -2 is the little value – signs count !
b.Box = [ 15, 15, 16 ] SD = (16 – 15) * sqrt (2/3 * 1/3) = sqrt (2/9) = .4714
c.Box = [ -1, -1, -1, 1] SD = (1 - (-1)) * sqrt (1/4 * 3/4) = 2 * sqrt (3/16) = 0.8660
d.Box = [0, 0, 0, 1] SD = (1 – 0) * sqrt (1/4 * 3/4) = sqrt (3/16) = 0.4330
e. Box = [0, 0, 2] SD = (2 – 0) * sqrt (2/3 * 1/3) = 2 * sqrt (2/9) = .9428
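One way to convince yourself that the shortcut is not a separate rule is to compare it with the SD computed directly from the tickets. The Python sketch below does this for the five boxes above; shortcut_sd is a hypothetical helper written for this check, and pstdev is the standard library's population SD.

```python
from statistics import pstdev


def shortcut_sd(box):
    """SD of a box with exactly two distinct values, via the shortcut formula."""
    values = sorted(set(box))
    if len(values) != 2:
        raise ValueError("shortcut formula needs exactly two distinct values")
    small, big = values  # signs count: -2 is smaller than 1
    frac_big = box.count(big) / len(box)
    return (big - small) * (frac_big * (1 - frac_big)) ** 0.5


boxes = {
    "a": [1, -2, -2],
    "b": [15, 15, 16],
    "c": [-1, -1, -1, 1],
    "d": [0, 0, 0, 1],
    "e": [0, 0, 2],
}

# Pair each shortcut result with the SD computed directly from the tickets.
results = {label: (shortcut_sd(b), pstdev(b)) for label, b in boxes.items()}
for label, (short, direct) in results.items():
    print(f"{label}: shortcut = {short:.4f}, direct = {direct:.4f}")
```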
* 4.Roll a die 180 times and count aces. Box = [1,0,0,0,0,0] has mean = 1/6 or 0.1667 and
SD = sqrt (1/6 * 5/6) = .3727
After 180 rolls, we would expect the count of aces to be 180 / 6 = 30 and
SE of sum = (sqrt 180) * .3727 = 13.4164 * .3727 = 5.00.
Hence the range from 15 to 45 is the EV +/- 3 * SE of sum; from the normal table (or Rule of 1,2,3 ) we
expect almost all (more than 99 percent) of the group to get sums in that range.
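Because the count of aces is a binomial count, the normal-table answer can be compared with an exact calculation. The Python sketch below is an optional check, not part of the text's method; it assumes the range 15 to 45 includes both endpoints.

```python
from math import comb
from statistics import NormalDist

rolls, p_ace = 180, 1 / 6

ev = rolls * p_ace                                   # expected count of aces
se = rolls ** 0.5 * (p_ace * (1 - p_ace)) ** 0.5     # SE of the count

# Normal approximation: area within 3 SEs of the expected value.
normal_chance = NormalDist().cdf(3) - NormalDist().cdf(-3)

# Exact binomial chance of 15 through 45 aces (endpoints included).
exact_chance = sum(
    comb(rolls, k) * p_ace**k * (1 - p_ace) ** (rolls - k)
    for k in range(15, 46)
)

print(f"EV = {ev:.1f}, SE = {se:.2f}")
print(f"normal = {normal_chance:.4f}, exact = {exact_chance:.4f}")
```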
* 5.Guess the total number of spots on a die thrown N times, with a one dollar penalty for each spot the
guess is off. Would you prefer 50 throws or 100 throws? Box this time will be [1,2,3,4,5,6] (contrast with last
problem), which has mean 3.5 and SD = 1.7078. EV of sum for 50 throws is 175 and EV of sum for 100 throws
is 350, and these are your best guesses.
But the SE for the sum of 50 throws will be (sqrt 50) * 1.7078 = 12.0761
and the SE for the sum of 100 throws will be (sqrt 100) * 1.7078 = 17.078. Your guess is likely to be further
off with 100 throws, so prefer 50 throws.
* 6.Consider 100 draws with replacement from the box [1, 1, 2, 3] We are given the results of one
experiment:
45 ones, 23 twos and 32 threes.
Since the mean of the box is 1.75 and its SD is 0.8292, we would have expected a total of 175 in 100 draws,
and would expect a SE of 8.292. We would also expect 25 twos, 25 threes and 50 ones.
We actually got a sum of 1* 45 + 2 * 23 + 3 * 32 = 45 + 46 + 96 = 187. This means there was a chance error
of 12 (187 – 175) in the sum of the draws. Given a standard error of roughly 8, this is about 1.5 standard errors
above the expected value; a sum at least this high should be expected about 6.68 percent of the time [1.0 – (normal-cdf 1.5) ]
The standard error for the number of ones would require a box model [ 1, 1, 0, 0], with mean 0.5 and
SD = 0.5 . In 100 draws, we expect 50 ones, and hence the chance error for the number of ones is 45 – 50 = -5.
The SE for the number of ones in 100 draws = sqrt (100) * 0.5 = 5.
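The bookkeeping in problem 6 (observed value, expected value, chance error, standard units) can be laid out as a short Python sketch. It uses the unrounded SE, so the z-score comes out slightly below the 1.5 obtained by rounding the SE to 8; the variable names are chosen for this illustration only.

```python
from statistics import NormalDist, fmean, pstdev

box = [1, 1, 2, 3]
n_draws = 100
observed_counts = {1: 45, 2: 23, 3: 32}

# Sum of the draws: EV, SE, and the chance error of the observed sum.
ev_sum = n_draws * fmean(box)
se_sum = n_draws ** 0.5 * pstdev(box)
observed_sum = sum(value * count for value, count in observed_counts.items())
chance_error = observed_sum - ev_sum        # observed minus expected
z = chance_error / se_sum                   # chance error in standard units
tail = 1 - NormalDist().cdf(z)              # chance of a sum at least this high

# Counting ones: relabel each ticket 1 if it is a one, 0 otherwise.
ones_box = [1 if ticket == 1 else 0 for ticket in box]
ev_ones = n_draws * fmean(ones_box)
se_ones = n_draws ** 0.5 * pstdev(ones_box)
ones_error = observed_counts[1] - ev_ones   # observed minus expected

print(f"sum: observed {observed_sum}, EV {ev_sum:.0f}, SE {se_sum:.3f}, z {z:.2f}")
print(f"ones: EV {ev_ones:.0f}, SE {se_ones:.1f}, chance error {ones_error:.0f}")
```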
* 7. Consider 100 draws with replacement from the box [1,2,3,4,5,6]
a. Sum of draws = 321, so average is 3.21 [note that 3.5 is expected value]
b. Average of draws = 3.78, and sum will be 378.
c. Chance that AVERAGE is between 3 and 4 = Chance that SUM is between 300 and 400.
The expected value of the sum is 350, and SE of the sum = (sqrt 100) * SD box = 17.0783
The range from 300 to 400 is EV +/- 50 / 17.0783 SE units = EV +/- 2.93 SE units.
Normal table indicates the chance is over 99 percent; normal.area (-2.93, 2.93) gives 0.9966.
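Part c rests on the claim that a statement about the average is the same as a statement about the sum. A minimal Python sketch of that equivalence follows; it assumes the normal approximation for both, with SE of the average = SD of box / sqrt(N), and the names sum_dist and avg_dist are illustrative.

```python
from statistics import NormalDist, fmean, pstdev

box = [1, 2, 3, 4, 5, 6]
n_draws = 100

# Normal approximation for the SUM of the draws.
sum_dist = NormalDist(mu=n_draws * fmean(box),
                      sigma=n_draws ** 0.5 * pstdev(box))

# Normal approximation for the AVERAGE of the draws.
avg_dist = NormalDist(mu=fmean(box),
                      sigma=pstdev(box) / n_draws ** 0.5)

p_sum = sum_dist.cdf(400) - sum_dist.cdf(300)
p_avg = avg_dist.cdf(4) - avg_dist.cdf(3)

print(f"P(300 < sum < 400)   = {p_sum:.4f}")
print(f"P(3 < average < 4)   = {p_avg:.4f}")
```

The two printed chances agree because dividing the sum, its EV and its SE by N leaves every z-score unchanged.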
8.EV and SE of DIFFERENCE of number of heads and tails in 100 tosses
Appropriate box model is [ -1, 1]. If you draw one head and one tail, the difference will be zero, as it should.
A box model must be numeric, so [heads, tails] is not possible; counting heads as 1 and tails as 0 would mean
that the sum of numbers is positive (with EV 50 in 100 draws); counting heads as 0 and tails as -1 gets you an
EV of -50. Finally, the box [-1, 0, 1] would only be appropriate with a very thick coin which could land on its
side.
With the box [-1, 1] the mean is 0 and the SD 1. Hence EV of difference in 100 draws is 100 * 0 = 0,
and the SE of the difference is (sqrt 100) * 1 = 10.
* 9.Bet on column or on number. Box for column bet as in problem 2: (payoff is $ 2 for win and $-1 for loss)
[12 twos and 26 minus ones]
mean of the box is 2* 12/38 – 1 * 26 /38 = -2/38,
SD = (big – little) * (sqrt 12/38 * 26/38) = 3 * .4648 = 1.3945
Box for single number bet = [ 35, -1 ... -1] with 37 minus ones; hence mean = -2/38 = -0.0526, and
SD = (big – little ) * (sqrt 1/38 * 37/38) = 36 * 0.1601 = 5.7626
Same EV (a loss of about $52.60 over 1000 bets), but betting on a single number has much greater variance for an
individual bet or for the sum of 1000 bets. Hence you have a much better chance of winning something by betting
on a single number.
SE for sum of 1000 column bets = (sqrt 1000) * 1.3945 = 44; SE for 1000 number bets = 182.
P(Sum > 0 for 1000 column bets) = P (Z > 52.60 / 44) = P (Z > 1.20) = about 12 percent.
P(Sum > 0 for 1000 number bets) = P (Z > 52.60 / 182) = P (Z > .289) = 38 percent.
You have a greater chance of either winning or losing more than $ 100 with the number bet: A loss of $100 is
$ 47.40 below the EV so: P (sum < -100) = P (Z < - 47.40 / 44) = P (Z < - 1.077) = 14 percent for column bet;
P (Z < -47.40 / 182) = P (Z < -0.26) = about 40 percent for the single number bet.
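The two bets can be put side by side with a small Python sketch. The helper bet_summary and its parameter names are hypothetical, and every chance is a normal approximation with no continuity correction, so the output is a check on the table lookups rather than an exact answer.

```python
from statistics import NormalDist


def bet_summary(payoff, winners, plays, wheel=38):
    """EV, SE and normal-approximation chances for repeated $1 roulette bets.

    payoff  -- net dollars won on a winning $1 bet (a loss costs $1)
    winners -- number of winning pockets out of `wheel`
    """
    p_win = winners / wheel
    mean = payoff * p_win - (1 - p_win)
    sd = (payoff + 1) * (p_win * (1 - p_win)) ** 0.5   # (big - little) shortcut
    dist = NormalDist(mu=plays * mean, sigma=plays ** 0.5 * sd)
    return {
        "ev": dist.mean,
        "se": dist.stdev,
        "p_ahead": 1 - dist.cdf(0),        # net winnings above 0
        "p_lose_100": dist.cdf(-100),      # net loss of more than $100
        "p_win_100": 1 - dist.cdf(100),    # net gain of more than $100
    }


column = bet_summary(payoff=2, winners=12, plays=1000)
number = bet_summary(payoff=35, winners=1, plays=1000)

for name, s in (("column", column), ("number", number)):
    print(f"{name}: EV {s['ev']:.2f}, SE {s['se']:.1f}, "
          f"P(ahead) {s['p_ahead']:.3f}, P(lose > 100) {s['p_lose_100']:.3f}, "
          f"P(win > 100) {s['p_win_100']:.3f}")
```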
* 10.Quantiles and box models. The key to this problem is that we are given that EV +/- 50 = 75 percent.
Hence we find 75 percent (or 74.99, the closest value) in the area column, and that 50 will be
1.15 SE of the sum of draws. A single SE must be 50 / 1.15 = 43.48, so we can calculate Z-scores.
For twice the number of draws, the EV will in fact be 800, but the SE will scale up by a factor of
the square root of 2, not 2. So the SE is not 86.96 but 61.49 for twice the number of draws, and the Z-score
of 100 will be not 1.15 but 100 / 61.49 = 1.626. The probability of being within +/- 1.65 standard errors is
not 75 percent, but about 90 percent.
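The table lookup in problem 10 can be reproduced in Python by working backwards from the stated 75 percent. This sketch assumes the normal approximation throughout; it uses the exact quantile rather than the table's 1.15, so its figures differ from the hand calculation in the later decimal places.

```python
from statistics import NormalDist

std_normal = NormalDist()

# 75 percent of the area lies within +/- z, so 12.5 percent sits in each tail.
z_75 = std_normal.inv_cdf(0.875)

# EV +/- 50 covers 75 percent, so 50 is z_75 standard errors.
se_original = 50 / z_75

# Doubling the number of draws multiplies the SE by sqrt(2), not by 2.
se_doubled = se_original * 2 ** 0.5

# The interval EV +/- 100 in standard units, and the chance it covers.
z_doubled = 100 / se_doubled
chance_doubled = std_normal.cdf(z_doubled) - std_normal.cdf(-z_doubled)

print(f"z for 75 percent = {z_75:.3f}, SE = {se_original:.2f}")
print(f"doubled draws: SE = {se_doubled:.2f}, z = {z_doubled:.3f}, "
      f"chance = {chance_doubled:.3f}")
```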
11.Sum of positive numbers: the box to address this problem should ignore the negative numbers, and hence
should be [0, 0, 0, 1, 3]. It has a mean of 4/5, and a SD of 1.1662. So the sum of 100 draws from this box has an
EV of 80 and a SE of 11.662.
12.Box model and actual outcomes.
The box model is [1, 2, 3, 4, 5, 6, 7], which has mean of 4 and SD of 2.
Hence the sum of 100 draws has an expected value of 400 and a SE of 20.
a. If the sum of the draws is actually 431 and the EV is 400 the chance error is 31.
b. If the sum of the draws is actually 386, and the EV is 400 the chance error is -14.
c. If the sum of the draws is actually 417, the chance error is 17.
Note that the EV and the SE will not change just because the actual sum differs from the EV.