AP Statistics

Unit 1: Exploring One-Variable Data

  • Core Idea: Stats starts with describing data — what it looks like, how it spreads, and where it centers.
  • Center: mean (average), median (middle value)
  • Spread: range, interquartile range (IQR = Q3 – Q1), standard deviation
  • Shape: symmetric, skewed left/right, uniform, bimodal
  • Outliers: values far from the rest of the data; rule of thumb: x < Q1 – 1.5×IQR or x > Q3 + 1.5×IQR
  • Visuals: histograms, dotplots, boxplots
  • Z-score: z = (x - mean) / standard deviation → how many standard deviations from the mean
  • Use context — stats is meaningless without interpretation
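
The outlier rule and z-score above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive method: it assumes quartiles are found as the medians of the lower and upper halves of the sorted data (one common classroom convention; software packages may use other rules), and the data set `scores` is made up for illustration.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is left out of both halves.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[-half:]
    return statistics.median(lower), statistics.median(upper)

def outliers(data):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

# Hypothetical data set, chosen so one value sits far from the rest.
scores = [2, 4, 5, 6, 7, 8, 9, 30]

print(quartiles(scores))    # (4.5, 8.5)
print(outliers(scores))     # [30]
print(z_score(85, 70, 10))  # 1.5
```

Here IQR = 8.5 – 4.5 = 4, so the fences are –1.5 and 14.5, and only 30 falls outside them.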

Unit 2: Exploring Two-Variable Data

  • Core Idea: Stats gets interesting when you look at relationships between variables.
  • Scatterplots: Show form (linear/nonlinear), direction (positive/negative), strength
  • Correlation (r): Measures the direction and strength of a linear relationship; -1 ≤ r ≤ 1
  • Least Squares Regression Line (LSRL):
    • ŷ = a + bx, where b = slope, a = y-intercept
    • Slope: b = r * (sy / sx)
    • Residual: residual = actual - predicted = y - ŷ
    • Coefficient of determination (r²): percent of the variation in y explained by the linear model
  • Correlation ≠ causation
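
As a rough illustration of the regression formulas above, the sketch below computes r, the slope b = r * (sy / sx), the residuals, and r² for a small made-up data set. The intercept formula a = ȳ - b·x̄ is not stated in these notes; it is the standard companion formula and is included here as an assumption. The function name `lsrl` and the data are hypothetical.

```python
import statistics

def lsrl(xs, ys):
    """Return (a, b, r) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # r is the average product of z-scores, dividing by n - 1.
    r = sum(((x - x_bar) / sx) * ((y - y_bar) / sy)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * sy / sx        # slope
    a = y_bar - b * x_bar  # intercept (assumed standard formula)
    return a, b, r

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # actual - predicted

print(round(a, 3), round(b, 3))  # intercept about 2.2, slope about 0.6
print(round(r ** 2, 3))          # about 0.6 of the variation in y explained
```

For a least squares line the residuals sum to (approximately) zero, which is a quick sanity check on the fit.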

Unit 3: Collecting Data

  • Core Idea: Good data comes from good design — how you collect data shapes your conclusions.
  • Types of studies:
    • Observational (no treatment imposed; just record what happens), Experimental (researcher imposes treatments on subjects)
  • Sampling methods:
    • Simple Random Sample (SRS), Stratified, Cluster, Systematic
  • Biases:
    • Voluntary response, undercoverage, nonresponse, response bias
  • Experimental design:
    • Random assignment, control, replication, comparison
    • Blocking: group similar subjects first, then randomize within each block, to account for a known source of variation
  • Inference: Random sampling → generalize to population. Random assignment → cause-and-effect.
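
The difference between random sampling and random assignment can be sketched with Python's `random` module. This is only an illustration: the function names, the student IDs, and the grade-level strata are all hypothetical, and a fixed seed is used so the run is repeatable.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS from each stratum and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def random_assignment(subjects, rng):
    """Shuffle subjects, then split them into two treatment groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)  # fixed seed so the example is repeatable

# Hypothetical population: 100 student ID numbers.
students = list(range(1, 101))
srs = simple_random_sample(students, 10, rng)

# Hypothetical strata: two grade levels.
strata = {"juniors": students[:50], "seniors": students[50:]}
strat = stratified_sample(strata, 5, rng)

# Random assignment of the sampled students to two groups.
treatment, control = random_assignment(srs, rng)
print(len(srs), len(strat), len(treatment), len(control))  # 10 10 5 5
```

Sampling decides who is in the study (so results can generalize); assignment decides who gets which treatment (so differences can be attributed to the treatment).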

Unit 4: Probability, Random Variables, and Probability Distributions

  • Core Idea: Probability models randomness and lets us make predictions about long-run outcomes.
  • Probability rules:
    • 0 ≤ P(A) ≤ 1
    • P(A or B) = P(A) + P(B) - P(A and B)
    • Complement rule: P(not A) = 1 - P(A)
  • Conditional probability: P(A | B) = P(A and B) / P(B)
  • Independence: If P(A | B) = P(A), A and B are independent
  • Random Variables:
    • Discrete: countable values
    • Continuous: any value in an interval
  • Expected value (mean): E(X) = Σ [x * P(x)]
  • Standard deviation of X: σ = √( Σ [(x - μ)² * P(x)] )
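
The expected value and standard deviation formulas above translate directly into code. The sketch below assumes a discrete distribution stored as a dictionary mapping each value to its probability; the example distribution (number of heads in two fair coin flips) and the probabilities in the independence check are made up for illustration.

```python
import math

def expected_value(dist):
    """E(X) = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in dist.items())

def std_dev(dist):
    """sigma = square root of the sum of (x - mu)^2 * P(x)."""
    mu = expected_value(dist)
    return math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b

# X = number of heads in two fair coin flips.
heads = {0: 0.25, 1: 0.5, 2: 0.25}
print(expected_value(heads))     # 1.0
print(round(std_dev(heads), 4))  # 0.7071

# Hypothetical events: P(A) = 0.3, P(B) = 0.4, P(A and B) = 0.12.
# P(A | B) works out to P(A), so A and B are independent.
print(math.isclose(conditional(0.12, 0.4), 0.3))  # True
```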

Unit 5: Sampling Distributions

  • Core Idea: A statistic (like a sample mean) varies from sample to sample — this variability is predictable.
  • Sampling distribution: Distribution of a statistic over all possible samples of the same size
  • Central Limit Theorem: If n is large, the sampling distribution of the sample mean is approximately normal, whatever the shape of the population
  • For proportions (p̂):
    • Mean: μ_p̂ = p
    • Standard deviation: σ_p̂ = √[p(1-p)/n]
  • For means (x̄):
    • Mean: μ_x̄ = μ
    • Standard deviation: σ_x̄ = σ / √n
  • Conditions: Random, 10% (n ≤ 10% of the population), and for normality either Large Counts for proportions (np ≥ 10, n(1-p) ≥ 10) or n ≥ 30 for means
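
One way to see that sampling variability is predictable is to simulate it. The sketch below draws many samples from a population with a known proportion and compares the simulated mean and standard deviation of p̂ with the formulas above. The values p = 0.3, n = 100, the 2,000 repetitions, and the seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

def phat_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def xbar_sd(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def large_counts_ok(p, n):
    """Large Counts condition: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def simulate_phats(p, n, reps, rng):
    """Draw `reps` samples of size n and return each sample's p-hat."""
    return [sum(rng.random() < p for _ in range(n)) / n
            for _ in range(reps)]

rng = random.Random(2024)  # fixed seed so the run is repeatable
p, n = 0.3, 100
phats = simulate_phats(p, n, 2000, rng)

print(large_counts_ok(p, n))               # True: 30 and 70 are both >= 10
print(round(statistics.mean(phats), 3))    # should land close to p = 0.3
print(round(statistics.stdev(phats), 4),   # simulated spread ...
      round(phat_sd(p, n), 4))             # ... versus the formula value
```

The simulated values will not match the formulas exactly, but with this many repetitions they should agree to about two decimal places.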

Unit 6: Inference for Categorical Data: Proportions

  • Core Idea: Use sample proportions to estimate or test claims about population proportions.
  • Confidence interval for p:
    • p̂ ± z* √[p̂(1 - p̂)/n]
  • Significance test for p:
    • Null: H₀: p = p₀; Alternative: Hₐ: p ≠ p₀, p < p₀, or p > p₀
    • Test statistic: z = (p̂ - p₀) / √[p₀(1 - p₀)/n]
    • Get the p-value from z; reject H₀ if p-value < α, otherwise fail to reject H₀
  • Interpret confidence level (95% example): “In repeated samples, about 95% of the intervals built this way will contain the true proportion.”
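
The interval and test formulas above can be sketched with `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later). The function names and the example counts (60 successes in 100 trials, tested against p₀ = 0.5) are hypothetical, and the conditions (Random, 10%, Large Counts) are assumed to have been checked separately.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    """Confidence interval: p-hat +/- z* sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value); the standard error uses p0, not p-hat."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf
    if alternative == "less":
        p_value = cdf(z)
    elif alternative == "greater":
        p_value = 1 - cdf(z)
    else:  # two-sided
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

# Hypothetical result: 60 successes in 100 trials, H0: p = 0.5.
low, high = one_prop_z_interval(60, 100)
z, p_value = one_prop_z_test(60, 100, 0.5)

print(round(low, 3), round(high, 3))  # roughly 0.504 to 0.696
print(round(z, 2), round(p_value, 4)) # z is 2.0; p-value is about 0.0455
```

With α = 0.05 the p-value is just under α, so this hypothetical sample would lead to rejecting H₀. Note that the interval uses p̂ in the standard error while the test uses p₀.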

Unit 7: Inference for Quantitative Data: Means

  • Core Idea: Same logic as with proportions — just with means and t-distributions instead of z.
  • Confidence interval for μ:
    • x̄ ± t* (s / √n)
  • Significance test for μ:
    • t = (x̄ - μ₀) / (s / √n)
    • Use t-distribution with df = n - 1
  • Conditions: random sample, 10% condition, and a normal population or large n (n ≥ 30)
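
A minimal sketch of the one-sample t formulas follows. The Python standard library has no t-distribution, so this sketch assumes the critical value t* is looked up in a t table (2.776 is the usual table value for 95% confidence with df = 4), and the p-value would likewise come from a table or statistical software. The data set and function names are hypothetical.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, df) where t = (x-bar - mu0) / (s / sqrt(n))."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

def t_interval(data, t_star):
    """Confidence interval: x-bar +/- t* (s / sqrt(n))."""
    n = len(data)
    x_bar = statistics.mean(data)
    margin = t_star * statistics.stdev(data) / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of 5 measurements; H0: mu = 6.
data = [4, 6, 8, 10, 12]

t, df = one_sample_t(data, mu0=6)
low, high = t_interval(data, t_star=2.776)  # t* from a table, df = 4

print(round(t, 3), df)                # t is about 1.414 with df = 4
print(round(low, 2), round(high, 2))  # roughly 4.07 to 11.93
```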

Unit 8: Inference for Categorical Data: Chi-Square

  • Core Idea: Use chi-square tests when you have counts and want to test for relationships in categories.
  • Types of tests:
    • Goodness-of-Fit: one variable, compare to expected distribution
    • Homogeneity: multiple populations, same variable
    • Independence: one population, two variables
  • Test statistic:
    • χ² = Σ [(observed - expected)² / expected]
  • Degrees of freedom:
    • Goodness-of-fit: df = categories - 1
    • Two-way tables: df = (rows - 1)(columns - 1)
  • Conditions: Random, 10%, all expected counts ≥ 5
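
The chi-square statistic and degrees of freedom can be computed as sketched below. For two-way tables the sketch assumes the standard expected-count formula (row total × column total / grand total), which these notes do not state. The observed counts are made up, and the p-value is left to a chi-square table or software since the standard library has no chi-square distribution.

```python
def chi_square_stat(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def two_way_chi_square(table):
    """Return (chi-square statistic, df) for a two-way table of counts."""
    expected = expected_counts(table)
    stat = sum(chi_square_stat(o_row, e_row)
               for o_row, e_row in zip(table, expected))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Goodness-of-fit: hypothetical die rolled 60 times, fair die expected.
observed = [8, 12, 10, 9, 11, 10]
expected = [10] * 6
gof_stat = chi_square_stat(observed, expected)
gof_df = len(observed) - 1
print(gof_stat, gof_df)  # 1.0 5

# Two-way table: hypothetical counts, 2 rows by 2 columns.
table = [[20, 30],
         [30, 20]]
print(two_way_chi_square(table))  # (4.0, 1)
```

In both examples every expected count is at least 5 (10 per face for the die, 25 per cell for the table), so the expected-counts condition holds.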

Unit 9: Inference for Quantitative Data: Slopes

  • Core Idea: Use inference to test if a relationship between variables (in regression) is real or just sample noise.
  • Model: ŷ = a + bx
  • Standard error of slope: SE_b (usually given in computer output)
  • Test statistic (for H₀: true slope = 0):
    • t = (b - 0) / SE_b
  • Degrees of freedom: df = n - 2
  • Confidence interval for slope:
    • b ± t* × SE_b
  • Interpret in context: Does the data support a real linear relationship between x and y?
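
A short sketch of slope inference, taking b and SE_b as given (as they would be in computer output). All numbers here are hypothetical: b = 0.6, SE_b = 0.2, n = 12, and t* = 2.228 is the usual table value for 95% confidence with df = 10.

```python
def slope_t_test(b, se_b, n):
    """Return (t, df) for H0: true slope = 0, with df = n - 2."""
    t = (b - 0) / se_b
    return t, n - 2

def slope_interval(b, se_b, t_star):
    """Confidence interval for the slope: b +/- t* * SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

# Hypothetical regression output.
b, se_b, n = 0.6, 0.2, 12

t, df = slope_t_test(b, se_b, n)
low, high = slope_interval(b, se_b, t_star=2.228)  # t* from a table, df = 10

print(round(t, 2), df)                # t is about 3.0 with df = 10
print(round(low, 3), round(high, 3))  # roughly 0.154 to 1.046
```

Because this hypothetical interval does not contain 0, it would be consistent with a real positive linear relationship between x and y.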