Docsity

The Use of Standard Deviation instead of Variance in Statistics, Exams of Data Analysis & Statistical Methods

An annotated exam covering why the standard deviation is preferred over the variance when describing a sample and performing calculations, with worked questions on z-scores and the five-number summary. It also touches on the use of normal quantile plots for assessing the normality of a dataset.

Typology: Exams

2019/2020

Uploaded on 11/25/2020

koofers-user-1rn

Lab# _______ Name: _____________ Go to Sec: _______________ Reg Section: ________
STAT303 Sec 508-510 Spring 2006 Exam #1

1. Why do we use s, the standard deviation, rather than s², the variance, when describing a sample and doing calculations?
   You CANNOT calculate a standard deviation without having first calculated the variance. The standard deviation is merely the positive square root of the variance.
   A. It's easier to calculate the standard deviation.
      NO! See above.
   *B. It's in the same units as the mean, µ.
      Yes. The variance, since it's a squared quantity, has squared units.
   C. You can't determine the variance without knowing the standard deviation.
      This is backwards; see above.
   D. It doesn't matter which one you use since they are the same quantity.
      NO! The standard deviation is the positive square root of the variance.
   E. Exactly two of the above are correct.
      Obviously not.

2. It is known that the mean price for a carton of milk is $1.00 with a standard deviation, s, of $0.20. Also, the mean price per pound for beef is $2.50 with a standard deviation of $0.60. If a local store, X, sells milk for $0.85 per carton and beef for $2.00 per pound, which of the following is true for the store?
   You need to use z-scores to compare unlike quantities: z_milk = (0.85 − 1.00)/0.20 = −0.75 and z_beef = (2.00 − 2.50)/0.60 = −0.83. Since beef has the smaller z-score, it is further below its mean in relative terms and is therefore relatively cheaper.
   A. Beef is relatively cheaper since it's $0.50 below the average vs. $0.15 below the average for milk.
      We must look at z-scores, that is, how many standard deviations below (or above) the mean, not just how far from the mean.
   B. Milk is relatively cheaper since it's only $0.85 where beef is $2.00 per pound.
      Again, we must look at z-scores. We can't compare actual dollars.
   C. You can't compare the two since the average prices are different.
      Yes, you can if you look at z-scores.
   *D. Beef is relatively cheaper since it's 0.83s below the average vs. 0.75s below the average for milk.
      This is just the definition of the z-scores, so it's correct.
   E. Milk is relatively cheaper since it's 0.75s below the average vs. 0.83s below the average for beef.
      This doesn't make sense.

   Value    Freq    Pct    Cum Pct
   65.00       1    1.0      1.0  ****
   72.00       2    2.0      3.0
   73.00       3    3.0      6.0
   74.00       6    6.0     12.0
   75.00       3    3.0     15.0
   76.00       5    5.0     20.0
   77.00       7    7.0     27.0  ****
   78.00      15   15.0     42.0
   79.00      11   11.0     53.0  ****
   80.00      11   11.0     64.0
   81.00       6    6.0     70.0
   82.00      11   11.0     81.0  ****
   83.00       3    3.0     84.0
   84.00       7    7.0     91.0
   85.00       2    2.0     93.0
   86.00       3    3.0     96.0
   87.00       2    2.0     98.0
   89.00       2    2.0    100.0  ****
   Total   100.0  100.0

   The Cum Pct column gives the percentage of observations at or to the left of each value in the first column. Remember that a percentile is a number with that percentage of the observations to the left of it. The 5 Number Summary: the minimum is the 1st (or 0th) percentile, Q1 is the 25th, the median x̃ is the 50th, Q3 is the 75th and the maximum is the 100th. The other thing you need to remember is that the cumulative percent must EXCEED these percentages. The cumulative percent first passes 75 at 82 (70.0 at 81, 81.0 at 82), so ONE of the 82's is Q3. Do not pick the closest one.

3. Which is the correct list of the 5 Number Summary for this table?
   A. 1, 25, 50, 75, 100
   B. 65, 75, 79, 84, 89
   C. 65, 77, 79, 81, 89
   *D. 65, 77, 79, 82, 89
   E. 65, 71, 77, 83, 89

4. Which of the following indicate that the data is skewed left?
   A. The mode (tallest bin) of the histogram is on the right, and the other bins get continually shorter as you go left.
      Yes, the tail (short bins) is on the skewed side.
   B. The boxplot has the median, x̃, closer to Q3 and the maximum than to Q1 and the minimum.
      Yes, the closer the lines (say Q3 to the maximum), the taller the bins.
   C. The mean, x̄, is greater than the median, x̃.
      No, this is backwards. The mean is on the skewed side.
   D. All of the above indicate left skewness.
   *E. Exactly two of the above (excluding D.)

5. Which of the following is/are true about the boxplots (5)?
   The first and second boxplots are normal, the 3rd is uniform and the 4th is slightly skewed to the right.
   A. All of the boxplots are normal.
   B. Only number 2 is normal since it has outliers on both ends (therefore symmetric).
   C. Only number 3 is normal.
   *D. Numbers 1 and 2 are normal.
   E. You can't determine whether they are normal or not with boxplots. We need normal quantile plots instead.
      Although normal quantile plots are the BEST way to determine whether a dataset is normal or not, you can decide by looking at a boxplot.

          |         Hair          |
   MomHair| Black Blond Brown Red |Total
   -------+-----------------------+-----
   Black  |     1     0    12   0 |   13
   Blonde |     0     7    10   0 |   17
   Brown  |     1    13    40   1 |   55
   Gray   |     0     0     3   0 |    3
   Red    |     0     0     3   0 |    3
   White  |           1     0   0 |    1
   -------+-----------------------+-----
   Total  |     2    21    68   1 |   92

6. The two-way table above shows the relationship between a child's hair color and their mom's. Which of the following is/are true?

[Figure: frequency histogram; x-axis 30.00 to 70.00, y-axis Frequency 0 to 20]

[...] Yes, it is, since the points fall along the line without any real deviations off of it.

14. For a certain dataset, you are told that the standard deviation, s = 0. What else can you say about the dataset?
   If the standard deviation is 0, then there is no deviation in the data: it is constant, and all the points have the same value.
   A. the mean, x̄ = 0
      The mean could be any value!
   B. the median, x̃ = 0
      The median could be any value!
   *C. the IQR = 0
      The IQR is also a measure of spread, so like the standard deviation, it would also be 0.
   D. all of the above
   E. none of the above

15. Suppose we stand on the corner of Texas Avenue and University Drive and count the number of cars that pass through the intersection during a green light on Texas. What type of variable would this be?
   A. a numerical continuous since the length of a green light could vary
      The NUMBER of cars cannot be continuous since we can't have partial cars.
   *B. a numerical discrete since the number of cars would vary
      Yes, the data is only whole numbers.
   C. a categorical since the cars are different types/makes
      This is the wrong variable, although make is a categorical variable.
   D. You can't tell without the data.
   E. Inaccurate since so many people run red lights!

16. Given a least squares line of ŷ = 14 − 0.3x, what is the residual for the point (2, 9)?
   The residual is ê = y − ŷ. First find ŷ = 14 − 0.3(2) = 13.4, so ê = 9 − 13.4 = −4.4.
   *A. −4.4
   B. −2.3
   C. 11.3
   D. 13.4
   E. 2.3

17. Suppose we want to study the quality of education in the Texas public schools. We want to include every school district, but it's too costly to visit every school within each district, so we only take a random sample of 3 schools in every district and gather data from every teacher/class within that school. What type of sample did we get?
   The state is divided into strata by school districts because we want to make sure we have data from each of them. The schools within a district are the clusters since we only take a few of them, but we get each class within the school (the entire cluster). Therefore, this is a multi-stage sample.
   *A. a multi-stage sample since we had two levels of sampling
   B. a cluster sample since we took everyone within a school
   C. a stratified sample since we gather data from every school district
   D. a simple random sample since we didn't know ahead of time which classes would be in the sample
   E. a biased sample since we didn't look at all of the schools in Texas

18. What can you say about this histogram? There are 100 observations.
   This is a histogram of a uniform dataset (remember our only choices are uniform, normal, skewed left or right).
   A. The median, x̃, and mean, x̄, are about 50.
      A uniform distribution is symmetric, so the mean and median are about the same value.
   B. The IQR is about 15 (less than 20).
      Q1 is between 40 and 45, Q3 is between 55 and 60, so the IQR = Q3 − Q1 < 60 − 40.
   C. It is most likely a normal distribution.
      No, there is no peak and no tails (or the tails are too short).
   D. All of the above are true.
   *E. Only two of the above are true.

[Figure: frequency histogram of s1, n = 100; x-axis 20.00 to 50.00, y-axis Frequency 0 to 30]

19. What percent of the observations in the histogram above are 25 or more?
   You have to sum up the frequencies, which are the heights of the bins, then divide by 100 to get the percent. It's easier to sum the 2 bins that are less than 25 and subtract from 100%: 5 + 9 = 14, so 100% − 14% = 86%. So it's actually a little more than 85% that is 25 or more.
   A. 75%
   *B. almost 85%
   C. about 23%
   D. not quite 15%
   E. Percents can't be determined here since it's a frequency histogram.

20. Suppose you get a z-score = 1.2 on this exam. Nearly everyone missed one particular question, so I decide to give everyone credit, i.e., I give 5 points to everyone who missed it. What SHOULD you think about this if you got the problem right originally?
   First of all, this is NOT a shift change since only the people who missed the question got 5 points, not everyone, and especially not YOU. Since this will increase the mean but not your exam score, you are now closer to the mean than you were. This makes your z-score smaller than it was.
   A. It doesn't matter since your z-score would stay the same.
   *B. You are shortchanged since your z-score would decrease.
   C. You benefit since your z-score would increase.
   D. Your z-score would increase, but so would everyone else's.
   E. Your z-score would decrease, but so would everyone else's.

Answer key: 1B, 2D, 3D, 4E, 5D, 6B, 7B, 8B, 9A, 10E, 11D, 12E, 13C, 14C, 15B, 16A, 17A, 18E, 19B, 20B
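
The arithmetic behind questions 2, 3 and 16 can be checked with a short Python sketch. It is only an illustration: the function names (z_score, residual, five_number_summary) and the FREQ_TABLE constant are hypothetical helpers, not part of the exam, and the percentile rule coded here is the one the exam describes (take the first value whose cumulative percent exceeds the target), which is one of several conventions in use.

```python
# Illustrative sketch; all names below are hypothetical, not from the exam.

# (value, frequency) pairs from the frequency table used in question 3.
FREQ_TABLE = [
    (65, 1), (72, 2), (73, 3), (74, 6), (75, 3), (76, 5),
    (77, 7), (78, 15), (79, 11), (80, 11), (81, 6), (82, 11),
    (83, 3), (84, 7), (85, 2), (86, 3), (87, 2), (89, 2),
]


def z_score(value, mean, sd):
    """Standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


def residual(x, y, intercept, slope):
    """Observed y minus the prediction y_hat = intercept + slope * x."""
    return y - (intercept + slope * x)


def percentile(table, pct):
    """First value whose cumulative percent strictly exceeds pct."""
    n = sum(freq for _, freq in table)
    cumulative = 0
    for value, freq in table:
        cumulative += freq
        # Integer comparison of cumulative/n > pct/100 avoids float rounding.
        if cumulative * 100 > pct * n:
            return value
    return table[-1][0]  # pct = 100 is never exceeded, so fall back to the max


def five_number_summary(table):
    """(minimum, Q1, median, Q3, maximum) for a sorted frequency table."""
    return (
        table[0][0],
        percentile(table, 25),
        percentile(table, 50),
        percentile(table, 75),
        table[-1][0],
    )


if __name__ == "__main__":
    # Question 2: compare prices on a common scale.
    print(round(z_score(0.85, 1.00, 0.20), 2))  # milk
    print(round(z_score(2.00, 2.50, 0.60), 2))  # beef
    # Question 16: residual for the point (2, 9) on y_hat = 14 - 0.3x.
    print(round(residual(2, 9, 14, -0.3), 1))
    # Question 3: five-number summary from the table.
    print(five_number_summary(FREQ_TABLE))
```

Running the script should reproduce the values in the commentary: z-scores of −0.75 and −0.83, a residual of −4.4, and the summary 65, 77, 79, 82, 89.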

Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved