AP Statistics
Summary: Maybe it will happen
Unit 1: Exploring One-Variable Data
- Core Idea: Stats starts with describing data — what it looks like, how it spreads, and where it centers.
- Center: mean (average), median (middle value)
- Spread: range, interquartile range (IQR = Q3 – Q1), standard deviation
- Shape: symmetric, skewed left/right, uniform, bimodal
- Outliers: values far from the rest of the data; rule of thumb: a value is an outlier if it is < Q1 - 1.5×IQR or > Q3 + 1.5×IQR
- Visuals: histograms, dotplots, boxplots
- Z-score: z = (x - mean) / standard deviation → how many standard deviations x lies from the mean
- Use context: statistics is meaningless without interpretation
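A minimal Python sketch of the Unit 1 calculations on a small hypothetical dataset. It assumes the median-of-halves method for quartiles (a common calculator convention); other software may compute Q1 and Q3 slightly differently. The function names are illustrative, not from any library.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]               # values below the median
    upper = xs[half + (n % 2):]     # values above the median (skips the middle value when n is odd)
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

def outliers_iqr(data):
    """Flag values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

def z_score(x, data):
    """How many (sample) standard deviations x lies from the mean."""
    return (x - statistics.mean(data)) / statistics.stdev(data)

data = [2, 4, 4, 5, 6, 7, 8, 30]   # hypothetical data
summary = five_number_summary(data)
flagged = outliers_iqr(data)
z_30 = z_score(30, data)
```

Here IQR = 7.5 - 4 = 3.5, so the upper fence is 12.75 and only 30 is flagged.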
Unit 2: Exploring Two-Variable Data
- Core Idea: Stats gets interesting when you look at relationships between variables.
- Scatterplots: Show form (linear/nonlinear), direction (positive/negative), strength
- Correlation (r): measures the strength and direction of a linear relationship; -1 ≤ r ≤ 1
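- Least Squares Regression Line (LSRL): ŷ = a + bx, where b = slope and a = y-intercept
- Slope: b = r * (sy / sx), where sx and sy are the standard deviations of x and y; intercept: a = ȳ - b·x̄ (the line passes through (x̄, ȳ))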
- Residual: residual = actual - predicted = y - ŷ
- Coefficient of determination (r²): the percent of the variation in y that is explained by the linear model
- Correlation ≠ causation
Unit 3: Collecting Data
- Core Idea: Good data comes from good design — how you collect data shapes your conclusions.
- Types of studies:
  - Observational: observe individuals and measure variables without imposing treatments
  - Experimental: deliberately impose treatments on subjects and compare their responses
- Sampling methods:
  - Simple Random Sample (SRS), Stratified, Cluster, Systematic
- Biases:
  - Voluntary response, undercoverage, nonresponse, response bias
- Experimental design:
  - Random assignment, control, replication, comparison
  - Blocking: group similar units into blocks and randomize within each block to account for a variable known to affect the response
- Inference: Random sampling → generalize to population. Random assignment → cause-and-effect.
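The sampling methods and random assignment can be sketched with Python's `random` module. The population of 40 IDs, the two strata, the four clusters, and all sample sizes below are hypothetical; a fixed seed makes runs repeatable.

```python
import random

rng = random.Random(1)   # fixed seed so results are reproducible
population = [f"student{i:02d}" for i in range(1, 41)]   # hypothetical population of 40

# Simple random sample: every group of 5 individuals is equally likely
srs = rng.sample(population, 5)

# Stratified sample: hypothetical strata; take an SRS of 3 within each stratum
strata = {"grade 11": population[:20], "grade 12": population[20:]}
stratified = [s for group in strata.values() for s in rng.sample(group, 3)]

# Cluster sample: hypothetical clusters of 10; randomly choose 2 whole clusters
clusters = [population[i:i + 10] for i in range(0, 40, 10)]
cluster_sample = [s for c in rng.sample(clusters, 2) for s in c]

# Systematic sample: random starting point, then every k-th individual
k = 8
start = rng.randrange(k)
systematic = population[start::k]

# Random assignment of 20 subjects to two treatment groups
subjects = population[:20]
shuffled = subjects[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:10], shuffled[10:]
```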
Unit 4: Probability, Random Variables, and Probability Distributions
- Core Idea: Probability models randomness and lets us make predictions about long-run outcomes.
- Probability rules:
  - 0 ≤ P(A) ≤ 1
  - Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
- Complement rule: P(not A) = 1 - P(A)
- Conditional probability: P(A | B) = P(A and B) / P(B), provided P(B) > 0
- Independence: A and B are independent if P(A | B) = P(A); equivalently, P(A and B) = P(A) × P(B)
- Random Variables:
  - Discrete: countable values
  - Continuous: any value in an interval
- Expected value (mean): μ = E(X) = Σ [x * P(x)]
- Standard deviation of X: σ = √(Σ [(x - μ)² * P(x)])
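A sketch of the Unit 4 formulas: the expected value and standard deviation of a discrete random variable, plus the addition, conditional-probability, and independence rules applied to a two-way table. The distribution (heads in two fair coin flips) and the table counts are hypothetical.

```python
import math

# Hypothetical example: X = number of heads in two fair coin flips
dist = {0: 0.25, 1: 0.50, 2: 0.25}

def expected_value(dist):
    """E(X) = Σ x * P(x)."""
    return sum(x * p for x, p in dist.items())

def std_dev(dist):
    """σ = √(Σ (x - μ)² * P(x)), with μ = E(X)."""
    mu = expected_value(dist)
    return math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))

mu = expected_value(dist)
sigma = std_dev(dist)

# Hypothetical two-way table of counts for events A and B
counts = {("A", "B"): 30, ("A", "not B"): 20,
          ("not A", "B"): 30, ("not A", "not B"): 20}
total = sum(counts.values())
p_a = (counts[("A", "B")] + counts[("A", "not B")]) / total
p_b = (counts[("A", "B")] + counts[("not A", "B")]) / total
p_a_and_b = counts[("A", "B")] / total
p_a_given_b = p_a_and_b / p_b                 # conditional probability rule
p_a_or_b = p_a + p_b - p_a_and_b              # addition rule
independent = math.isclose(p_a_given_b, p_a)  # independence: P(A | B) = P(A)
```

Here E(X) = 1, σ = √0.5 ≈ 0.71, and P(A | B) = 0.3 / 0.6 = 0.5 = P(A), so A and B are independent in this table.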
Unit 5: Sampling Distributions
- Core Idea: A statistic (like a sample mean) varies from sample to sample — this variability is predictable.
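- Sampling distribution: the distribution of a statistic's values across all possible samples of the same size n from the same population
- Central Limit Theorem: if n is large (rule of thumb: n ≥ 30), the sampling distribution of the sample mean x̄ is approximately normal, whatever the shape of the population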
- For proportions (p̂):
  - Mean: μ_p̂ = p
  - Standard deviation: σ_p̂ = √[p(1 - p)/n]
- For means (x̄):
  - Mean: μ_x̄ = μ
  - Standard deviation: σ_x̄ = σ / √n
- Conditions: Random; 10% (n ≤ 10% of the population when sampling without replacement); Normal/Large Counts: for p̂, np ≥ 10 and n(1 - p) ≥ 10; for x̄, a normal population or n ≥ 30
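A simulation sketch of the Central Limit Theorem, assuming a strongly right-skewed exponential population with mean 10 (so σ = 10). With n = 40, the simulated sample means should center near μ with spread near σ/√n, even though the population is not normal. The last lines apply the p̂ formulas to an assumed p = 0.3 and n = 100.

```python
import math
import random
import statistics

rng = random.Random(2024)   # fixed seed for reproducibility

# Assumed population: exponential with mean 10 (right-skewed; its SD also equals 10)
POP_MEAN = POP_SD = 10

def sample_mean(n):
    """Mean of one random sample of size n from the assumed population."""
    return statistics.fmean(rng.expovariate(1 / POP_MEAN) for _ in range(n))

n = 40
means = [sample_mean(n) for _ in range(5000)]   # 5000 simulated samples
mean_of_means = statistics.fmean(means)         # should be close to μ = 10
sd_of_means = statistics.stdev(means)           # should be close to σ/√n ≈ 1.58

# Sampling distribution of p̂ for an assumed p = 0.3 and n = 100
p, n_p = 0.3, 100
mu_p_hat = p
sigma_p_hat = math.sqrt(p * (1 - p) / n_p)
large_counts_ok = n_p * p >= 10 and n_p * (1 - p) >= 10
```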
Unit 6: Inference for Categorical Data: Proportions
- Core Idea: Use sample proportions to estimate or test claims about population proportions.
- Confidence interval for p: p̂ ± z* √[p̂(1 - p̂)/n]
- Significance test for p:
  - Null: H₀: p = p₀; Alternative: Hₐ: p ≠ p₀, Hₐ: p < p₀, or Hₐ: p > p₀
  - Test statistic: z = (p̂ - p₀) / √[p₀(1 - p₀)/n]
  - Get the p-value from z and compare it to α: if p-value < α, reject H₀
- Interpret the confidence level (e.g., 95%): “If we took many random samples of the same size and built an interval from each, about 95% of those intervals would contain the true proportion p.”
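A sketch of the one-proportion z interval and z test using `statistics.NormalDist` from Python's standard library (3.8+). The data (58 successes in 100 trials) and the helper names are hypothetical. Note that the interval uses p̂ in the standard error, while the test uses p₀.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    """p̂ ± z* √[p̂(1 - p̂)/n]."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.96 for 95%
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """z = (p̂ - p₀) / √[p₀(1 - p₀)/n] and its p-value."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # SD computed from p0, not p̂
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

low, high = one_prop_z_interval(58, 100)     # hypothetical data
z, p_value = one_prop_z_test(58, 100, 0.5)   # H₀: p = 0.5 vs Hₐ: p ≠ 0.5
reject = p_value < 0.05
```

For these numbers z = 1.6 and the two-sided p-value is about 0.11, so at α = 0.05 we would fail to reject H₀.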
Unit 7: Inference for Quantitative Data: Means
- Core Idea: Same logic as with proportions — just with means and t-distributions instead of z.
- Confidence interval for μ: x̄ ± t* (s / √n)
- Significance test for μ: t = (x̄ - μ₀) / (s / √n)
- Use a t-distribution with df = n - 1
- Still requires: random sample, 10% condition, and a normal population or large n (n ≥ 30)
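The t procedures need a t table or software for t* and p-values; Python's standard library has no t-distribution, so this sketch takes t* as an input. The data (n = 10) and helper names are hypothetical, and 2.262 is the table value of t* for 95% confidence with df = 9.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return the t statistic and df for H₀: μ = mu0."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)   # s / √n
    t = (statistics.mean(data) - mu0) / se
    return t, n - 1

def t_interval(data, t_star):
    """x̄ ± t* (s / √n); t_star must come from a t table or software for df = n - 1."""
    n = len(data)
    x_bar = statistics.mean(data)
    margin = t_star * statistics.stdev(data) / math.sqrt(n)
    return x_bar - margin, x_bar + margin

data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.4, 11.9, 12.2, 12.2]   # hypothetical sample
t, df = one_sample_t(data, 12.0)
low, high = t_interval(data, t_star=2.262)   # t* for 95% confidence, df = 9
```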
Unit 8: Inference for Categorical Data: Chi-Square
- Core Idea: Use chi-square tests when you have counts and want to test for relationships in categories.
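- Types of tests:
  - Goodness-of-Fit: one categorical variable in one population; compare observed counts to a hypothesized distribution
  - Homogeneity: one categorical variable compared across two or more populations (or treatments)
  - Independence: two categorical variables measured on a single sample from one population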
- Test statistic: χ² = Σ [(observed - expected)² / expected]
- Degrees of freedom:
  - Goodness-of-fit: df = categories - 1
  - Two-way tables (homogeneity, independence): df = (rows - 1)(columns - 1)
- Conditions: Random; 10% (when sampling without replacement); Large Counts: all expected counts ≥ 5
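A sketch of the chi-square statistic and degrees of freedom for a goodness-of-fit test and for a two-way table. It uses the standard expected-count formula for two-way tables, (row total × column total) / grand total, which is not stated above. The counts and helper names are hypothetical; the p-value would still come from a χ² table or software.

```python
def chi_square_gof(observed, expected_props):
    """Goodness-of-fit: χ² statistic, df = categories - 1, and expected counts."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1, expected

def chi_square_two_way(table):
    """Homogeneity/independence: χ² statistic, df = (rows - 1)(columns - 1), expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    expected = []
    for i, row in enumerate(table):
        exp_row = []
        for j, obs in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand   # (row total)(column total) / grand total
            exp_row.append(e)
            stat += (obs - e) ** 2 / e
        expected.append(exp_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

# Hypothetical: a die rolled 60 times, testing whether all faces are equally likely
gof_stat, gof_df, gof_expected = chi_square_gof([8, 12, 9, 11, 6, 14], [1 / 6] * 6)

# Hypothetical 2 x 2 table of counts
tw_stat, tw_df, tw_expected = chi_square_two_way([[20, 30], [30, 20]])
large_counts_ok = all(e >= 5 for row in tw_expected for e in row)
```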
Unit 9: Inference for Quantitative Data: Slopes
- Core Idea: Use inference to test if a relationship between variables (in regression) is real or just sample noise.
- Model: ŷ = a + bx, where the sample slope b estimates the true population slope β
- Standard error of slope: SE_b (usually given in computer output)
- Test statistic for H₀: true slope β = 0 (no linear relationship): t = (b - 0) / SE_b
- Degrees of freedom: df = n - 2
- Confidence interval for slope: b ± t* × SE_b
- Interpret in context: Does the data support a real linear relationship between x and y?