Statistics: Methods to collect, analyze, present, and interpret data, and to make decisions.
Two Types of Statistics: Descriptive and Inferential.
Note: Probability lies between descriptive and inferential statistics.
Course Progression: Descriptive Statistics \to Probability \to Inferential Statistics
Population: The entire group of individuals that we want information about.
Sample: The subset of the population we examine to gather information.
Random Sample: Sample where each element in the population has a chance of being selected.
Sample Size (n): The number of observations in a sample.
Variable: Characteristic of a person or thing.
Observation (x_i): Value of a variable for an element.
Data Set (X): Collection of observations on \ge 1 variables.
Example: Finding the average number of plushies students have in their bedrooms
- Population: All students at my college.
- Sample: The students walking past the entrance to the library who I polled as they passed.
- Sample Size: n = The number of students I asked.
- Variable: # of plushies in bedroom.
- Observation: “x_1 = 2, x_2 = 10, etc.”
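A minimal Python sketch of this vocabulary, assuming made-up data: only x_1 = 2 and x_2 = 10 come from the example above, and the remaining counts in the hypothetical list `plushies` are invented for illustration.

```python
from statistics import mean

# Hypothetical observations: plushies in each polled student's bedroom.
# Only the first two values (2 and 10) appear in the notes; the rest are invented.
plushies = [2, 10, 0, 4, 7, 1]

n = len(plushies)             # sample size: number of students asked
x_1 = plushies[0]             # first observation (Python indexes from 0)
sample_mean = mean(plushies)  # average number of plushies in the sample

print(f"n = {n}, x_1 = {x_1}, sample mean = {sample_mean}")
```

The sample mean describes only the students who were polled; whether it says anything about all students at the college depends on how the sample was chosen.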
Two Types of Variables: Numerical and Categorical.
Important: Variable type is determined by how we use it.
- e.g., Age can be treated either way:
- Numerical Age: 19, 42, 69, etc.
- Categorical Age: Young, Average, Old
Note: Different types require different analyses.
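As a rough illustration of the same variable used both ways, the sketch below treats age numerically (averaging) and categorically (counting). The helper `age_group` and its cutoffs of 30 and 60 are assumptions made for this example, not definitions from the course.

```python
from collections import Counter


def age_group(age: int) -> str:
    """Map a numerical age to a category (cutoffs 30 and 60 are arbitrary)."""
    if age < 30:
        return "Young"
    if age < 60:
        return "Average"
    return "Old"


ages = [19, 42, 69]

# Numerical use: arithmetic such as averaging is meaningful.
mean_age = sum(ages) / len(ages)

# Categorical use: we count how many observations fall in each category.
groups = [age_group(a) for a in ages]
counts = Counter(groups)

print(mean_age, dict(counts))
```

Averaging the labels "Young", "Average", and "Old" would make no sense, which is why the variable type determines the analysis.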
Simple Random Sampling (R.S.): Select n objects at random from the population.
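One possible sketch of simple random sampling in Python, using the standard library's `random.sample`. The population of 100 labelled objects and the seed are assumptions chosen only to make the example reproducible.

```python
import random

# Hypothetical population: 100 objects labelled 1..100.
population = list(range(1, 101))
n = 10  # desired sample size

rng = random.Random(42)  # seeded only so the sketch is reproducible

# Draw n distinct objects; every subset of size n is equally likely.
sample = rng.sample(population, n)

print(sample)
```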
Stratified R.S.: Divide the population into strata, then select a simple random sample from each stratum.
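A minimal sketch of stratified sampling, assuming a hypothetical population of 100 students stratified by class year and an equal sample of 5 from every stratum (equal allocation is a choice made here for simplicity, not something the notes specify).

```python
import random
from collections import defaultdict

rng = random.Random(0)  # seeded only for reproducibility

# Hypothetical population: (student_id, class_year) pairs, 25 students per year.
years = ["first-year", "sophomore", "junior", "senior"]
population = [(i, years[i % 4]) for i in range(100)]

# Step 1: divide the population into strata by class year.
strata = defaultdict(list)
for student in population:
    strata[student[1]].append(student)

# Step 2: take a simple random sample from EVERY stratum.
per_stratum = 5
sample = []
for members in strata.values():
    sample.extend(rng.sample(members, per_stratum))

print(len(sample), "students sampled across", len(strata), "strata")
```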
Cluster R.S.: Divide the population into clusters, select clusters at random, then select a simple random sample from each selected cluster.
Note: Clusters vs. Strata (sample from some clusters vs. all strata)
- Each cluster is treated as representative of the population.
- e.g., You can describe the population using any cluster.
- Strata must be used together to describe their population.
- e.g., If you have male and female strata, you can’t just use the male stratum to describe the population.
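The "some vs. all" contrast can be sketched for cluster sampling as follows. The dorm-based clusters, the choice of 3 clusters, and the 5 students per cluster are all hypothetical values picked for illustration.

```python
import random

rng = random.Random(1)  # seeded only for reproducibility

# Hypothetical population: 10 dorms (clusters) with 30 students each.
clusters = {
    f"dorm_{d}": [f"dorm_{d}_student_{s}" for s in range(30)]
    for d in range(10)
}

# Step 1: select SOME clusters at random (here 3 of the 10).
chosen = rng.sample(sorted(clusters), 3)

# Step 2: take a simple random sample within each chosen cluster only.
sample = []
for name in chosen:
    sample.extend(rng.sample(clusters[name], 5))

print(chosen, len(sample))
```

Unlike a stratified design, the unchosen dorms contribute nothing to the sample, which is only reasonable if each dorm resembles the population as a whole.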
Systematic R.S.: Select every kth element in the population, starting from a randomly chosen element among the first k.
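A short sketch of systematic sampling using list slicing. The population of 100 ordered elements and k = 10 are assumptions, and the random starting point is a common convention added here rather than something stated in the definition.

```python
import random

# Hypothetical ordered population: 100 elements labelled 1..100.
population = list(range(1, 101))
k = 10  # sampling interval

rng = random.Random(7)    # seeded only for reproducibility
start = rng.randrange(k)  # random starting index in 0..k-1

# Take the element at the start index, then every kth element after it.
sample = population[start::k]

print(sample)
```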
Multi-Stage R.S.: A combination of at least two of the methods above (other than simple R.S.).
Note: In this course, we’ll assume the samples we’re given are good samples obtained through simple R.S., unless told otherwise.