COMM 291

Midterm Review Package

Prepared by Angelica Cabrera

1. INTRODUCTION TO DATA AND VARIABLES

Categorical vs. Quantitative Data

                                Categorical                              Quantitative
Possible values for variable    Limited number – distinct categories     Large number
Measurement units?              No                                       Yes

EXAMPLE. Which variables are quantitative and which are categorical?

Employee #   Age (years)   Annual Income (in 1,000s of dollars)   Performance Rating (1-5 scale)   Job Type
5543         48            50 – 100                               4.5                              Management
2431         34            20 – 49                                3.9                              Clerical
7281         31            0 – 19                                 3.4                              Maintenance

What kind of sampling designs are the following?
a) Select ten individuals from each sports team at UBC (e.g. hockey, basketball, rowing)
b) Randomly select 50 athletes using their student numbers
c) Randomly select a faculty and survey all of the athletes in that faculty
d) Select every third name in an alphabetized list of all varsity sports athletes at UBC, starting with a random name

3. DISPLAYING AND DESCRIBING CATEGORICAL DATA

Best represented in a __________ graph

Contingency Tables
Counts can be converted into:
_______ Percentages
__________ Percentages
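To illustrate how the counts in a contingency table turn into percentages, here is a minimal Python sketch. The table, its labels ("Chips", "Fruit", "Group A", "Group B"), and the helper `pct` are all hypothetical and not taken from this package. The point it tries to show is that the same cell count gives a different percentage depending on which total it is divided by.

```python
# Hypothetical counts (not from the review package): rows are snack choices,
# columns are two groups of respondents.
counts = {
    "Chips": {"Group A": 30, "Group B": 10},
    "Fruit": {"Group A": 20, "Group B": 40},
}

# Marginal totals for each row, each column, and the whole table.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# Share of all respondents who chose Chips (row total over grand total).
overall_chips = pct(row_totals["Chips"], grand_total)

# Among respondents who chose Chips, the share in Group A (cell over row total).
chips_in_a = pct(counts["Chips"]["Group A"], row_totals["Chips"])

# Among Group B respondents, the share who chose Fruit (cell over column total).
b_chose_fruit = pct(counts["Fruit"]["Group B"], col_totals["Group B"])

print(f"{overall_chips:.1f}% {chips_in_a:.1f}% {b_chose_fruit:.1f}%")
```

When reading a question, the phrase after "of" or "among" usually identifies the group whose total belongs in the denominator.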

EXAMPLE. A survey collected teenagers’ preferences for soft drinks.

Soft Drink   Male   Female   Total
Pepsi        55     87       142
Sprite       99     150      249
Coke         196    113      309
Total        350    350      700

a) What percentage of teenagers preferred Pepsi?
b) What percentage of teenagers who preferred Coke were males?
c) Of teenagers who are females, what percentage preferred Sprite?

Simpson’s Paradox: an association that holds for all of several groups can _________ direction when the data are combined to form a single group.

4. DISPLAYING AND DESCRIBING QUANTITATIVE DATA

Best represented in a _____________, and an alternative is a ____________________ plot

Measures of Spread
Range = Maximum value – minimum value
Standard deviation (SD): “typical” distance from a data value to the mean
Variance = (SD)^2
Percentile: value below which a given % of data values fall
IQR = Q3 – Q1

Measures of Centre
1. Mean: average of data
2. Median: middle of data
3. Mode: most frequent data value
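The measures of centre and spread listed above can be computed with Python's standard `statistics` module, as in the sketch below. The data values are hypothetical, and the sketch assumes the sample versions of SD and variance (divisor n – 1); a course that uses the population versions would call `pstdev` and `pvariance` instead.

```python
import math
import statistics

# Hypothetical data values, chosen only for illustration.
data = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(data)      # average of the data
median = statistics.median(data)  # middle of the sorted data
mode = statistics.mode(data)      # most frequent data value

data_range = max(data) - min(data)     # maximum value - minimum value
sd = statistics.stdev(data)            # sample SD (assumed n - 1 divisor)
variance = statistics.variance(data)   # sample variance

# Variance is the square of the standard deviation.
assert math.isclose(variance, sd ** 2)

print(mean, median, mode, data_range, round(sd, 2), variance)
```

The IQR is left out on purpose: software packages and textbooks use slightly different rules for locating Q1 and Q3, so a computed IQR may not match a hand calculation exactly.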

COMM 291 Review Package prepared by Angelica Cabrera


Histogram Shapes
Symmetric:      Mean ___ median
Skewed Left:    Mean ___ median
Skewed Right:   Mean ___ median

Best Measures of Centre and Spread
For symmetric distributions, use ___________ and ___________
For skewed distributions, use ___________ and ___________

Box-and-Whisker Plots
How to draw a box-and-whisker plot
1. Plot the given points
2. Draw the box spanning the IQR (Q1 to Q3)
3. Find inner fences
   Lower inner fence = Q1 – 1.5 (IQR)
   Upper inner fence = Q3 + 1.5 (IQR)
4. Draw whiskers (last values within the inner fences)
5. Draw outliers (values outside inner fences)
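The fence, whisker, and outlier steps above can be sketched in Python as follows. The function names `inner_fences` and `split_outliers` and the data are hypothetical, and Q1 and Q3 are taken as given rather than computed, since conventions for finding quartiles differ.

```python
def inner_fences(q1, q3):
    """Return (lower, upper) inner fences from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def split_outliers(values, q1, q3):
    """Separate values inside the inner fences from outliers beyond them."""
    lower, upper = inner_fences(q1, q3)
    inside = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return inside, outliers


# Hypothetical data; Q1 and Q3 are assumed to be known already.
values = [5, 12, 14, 15, 18, 20, 22, 40]
q1, q3 = 13.0, 21.0

lower, upper = inner_fences(q1, q3)
inside, outliers = split_outliers(values, q1, q3)

# Whiskers end at the last data values that are still within the fences,
# not at the fences themselves.
whiskers = (min(inside), max(inside))

print(lower, upper, whiskers, outliers)
```

Note that the whiskers stop at actual data values, so they are usually shorter than the distance from the box to the fences.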

EXAMPLE 2. Below is a five-number summary for hourly wages for managers at AEKI, a furniture store.

Min     Q1      Median   Q3      Max
20.94   37.64   44.77    49.24   67.11

a) This distribution is skewed to the:

b) What is the IQR?

c) What is the lower inner fence?

d) What is the upper inner fence?

e) Where do the outliers lie?

f) There was an error and the lowest hourly wage for sales managers was $18.15 instead of $20.94. How would this affect:
The mean? __________
The median? __________
The range? __________
The IQR? __________

5. SCATTERPLOTS, CORRELATION, AND LINEAR REGRESSION

Correlation (r): how strongly the points cluster around a line
Only for _______________ data with a ____________ pattern
-1 ≤ r ≤ +1
NO units
Correlation of X and Y = Correlation...

