Not a comprehensive list
Sec 1.1
Probability- Models uncertainty and variability
Defining Probability:
Experiment- A situation where chance or uncertainty leads to results (aka outcomes)
Outcome- Result of a single trial of an experiment
Event- One or more outcomes of an experiment
Sample Space (S)- All possible distinct outcomes in an experiment
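Classical definition of probability:
$P(A) = \frac{\text{number of ways } A \text{ can occur}}{\text{number of outcomes in } S}$, provided all outcomes in S are equally likely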
Disadvantages of Classical definition
- Sample space S needs to be finite
- Outcomes in S need to be equally likely
- Outcomes in S (or event A) may be difficult to count
Relative Frequency Definition
Probabilities are assigned on the basis of experimentation or historical data
The probability of an event is the (limiting) proportion (or fraction) of times the event occurs when the experiment is repeated a large number of times under the exact same conditions.
Disadvantages of Frequency Definition
- An infinite number of trials would be needed to get the exact value
- Repeating the experiment under the exact same controlled conditions may be challenging
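A quick simulation sketch of the relative frequency idea, assuming a fair six-sided die and the event "roll is even" (both are made-up examples, not from the notes). The helper names `relative_frequency`, `roll_die`, and `is_even` are hypothetical. The printed proportions should drift toward 0.5 as the number of trials grows, but they never have to equal it for any finite number of trials.

```python
import random

def relative_frequency(event, trial, n_trials, seed=0):
    """Fraction of n_trials in which `event` is true for the outcome of `trial`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return hits / n_trials

def roll_die(rng):
    """One trial: roll a fair six-sided die (assumed example experiment)."""
    return rng.randint(1, 6)

def is_even(outcome):
    """The event of interest: the roll is even."""
    return outcome % 2 == 0

# The proportion is only an approximation for any finite number of trials.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(is_even, roll_die, n))
```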
Subjective Probability Definition
The probability of an event is based on how confident the person making the statement is that the event will occur. Usually based on prior knowledge (belief) or available information
Disadvantages of Subjective Definition
- No mathematical model is used
- How do you determine whose knowledge / judgement is superior?
Sec 1.2
Sample space is the set of distinct outcomes for an experiment/process, so in a single trial one and only one of these outcomes can occur
Sample space is not necessarily unique
confused so asked gpt
Here, “not necessarily unique” refers to the fact that the choice of sample space is not one-of-a-kind — there can be more than one valid way to represent it.
- Example: For the same die roll, we might define the sample space as:
$S_1 = \{1, 2, 3, 4, 5, 6\}$
or, alternatively,
$S_2 = \{\text{odd}, \text{even}\}$.
Both are legitimate sample spaces, just framed differently. Neither is “unique” in the sense of being the only correct version.
So “not necessarily unique” = different valid representations are possible.
A discrete sample space is one that consists of a finite or countably infinite set of outcomes
In discrete sample spaces, we can talk about:
Simple Event (Outcome)- An event that contains only one point
Compound Event- An event made up of 2 or more simple events
Probability Laws
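$S = \{a_1, a_2, a_3, \dots\}$, discrete sample space
Probabilities $P(a_i)$ for $i = 1, 2, \dots$ must satisfy 2 conditions
- $0 \le P(a_i) \le 1$
- $\sum_{\text{all } i} P(a_i) = 1$
Probability P(A) of an event A is defined as
$P(A) = \sum_{a \in A} P(a)$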
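A small sketch of these laws, assuming a fair six-sided die as the discrete sample space (a made-up example). It assumes the usual conditions (each point probability lies between 0 and 1, and they sum to 1) and the usual definition of P(A) as the sum of the probabilities of the points in A. The names `P` and `prob` are hypothetical.

```python
from fractions import Fraction

# Assumed example: fair die, so each point of S gets probability 1/6.
P = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

# Check the two conditions on the point probabilities.
assert all(0 <= p <= 1 for p in P.values())
assert sum(P.values()) == 1

def prob(event):
    """P(A): sum of the probabilities of the points that make up the event A."""
    return sum(P[a] for a in event)

A = {2, 4, 6}      # compound event "roll is even"
print(prob(A))     # 1/2
```

Using `Fraction` keeps the sums exact, so the check that the probabilities add to 1 does not trip on floating-point rounding.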
Odds
The term odds can also be used to describe probabilities
Odds in favor of event A occurring is given by the ratio:
$\frac{P(A)}{1 - P(A)}$
Odds against the event A is the ratio:
$\frac{1 - P(A)}{P(A)}$
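A minimal sketch of converting a probability to odds, assuming the standard ratios P(A) / (1 - P(A)) in favor and (1 - P(A)) / P(A) against. The function names are hypothetical, and the example probability 1/6 (rolling a 1 on a fair die) is made up for illustration.

```python
from fractions import Fraction

def odds_in_favor(p):
    """Odds in favor of an event with probability p (needs p < 1)."""
    return p / (1 - p)

def odds_against(p):
    """Odds against an event with probability p (needs p > 0)."""
    return (1 - p) / p

p = Fraction(1, 6)          # assumed example: rolling a 1 on a fair die
print(odds_in_favor(p))     # 1/5, i.e. odds of 1 to 5 in favor
print(odds_against(p))      # 5, i.e. odds of 5 to 1 against
```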
Sec 1.3
Addition & Multiplication Rule
Job 1 can be done in p ways
Job 2 can be done in q ways
We can do either job 1 or job 2 in p + q ways
OR implies addition
Similarly, we can do both job 1 and job 2 in p × q ways
AND implies multiplication
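The two rules can be checked by brute-force listing. This is a sketch with made-up jobs (p = 3 ways for job 1, q = 2 ways for job 2), and it assumes the ways of doing the two jobs do not overlap, which the addition rule needs.

```python
from itertools import product

job1 = ["a", "b", "c"]   # assumed example: p = 3 ways to do job 1
job2 = ["x", "y"]        # assumed example: q = 2 ways to do job 2

either = job1 + job2                # job 1 OR job 2: list every single choice
both = list(product(job1, job2))    # job 1 AND job 2: every (way1, way2) pair

print(len(either))   # 5 = p + q
print(len(both))     # 6 = p * q
```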
Sampling w/ and w/o replacement
With replacement means that every time an object is selected it’s put back into the pool
Without replacement means that every time an object is selected it is NOT put back
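A short sketch of the difference, using Python's standard `random` module on a made-up pool of objects labelled 1 to 10. `choices` draws with replacement and `sample` draws without replacement.

```python
import random

rng = random.Random(1)        # fixed seed so the run is repeatable
pool = list(range(1, 11))     # assumed example: objects labelled 1..10

# With replacement: each pick goes back into the pool, so repeats can occur.
with_replacement = rng.choices(pool, k=5)

# Without replacement: a picked object is NOT put back, so no repeats.
without_replacement = rng.sample(pool, k=5)

print(with_replacement)
print(without_replacement)
```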
Permutations - Arrangement of objects where order matters
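A sketch that lists permutations directly, assuming the standard counts: n distinct objects can be arranged in n! ways, and k of them can be picked in order in n! / (n - k)! ways. The letters used are made up for illustration.

```python
from itertools import permutations
from math import factorial, perm

# All arrangements of 3 distinct objects: 3! = 6 of them.
arrangements = ["".join(p) for p in permutations("abc")]
print(arrangements)   # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
print(len(arrangements) == factorial(3))   # True

# Ordered picks of k = 2 objects out of n = 4: 4! / 2! = 12.
ordered_pairs = list(permutations("abcd", 2))
print(len(ordered_pairs), perm(4, 2))      # 12 12
```

Because order matters, `('a', 'b')` and `('b', 'a')` are counted as two different permutations.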
STAT 231
Introduction to Statistical Science
Population- A collection of units, where a unit is an individual person, place, or thing about which we can take measurements
Process- Also a collection of units, but those units are 'produced' over time
Variates- Characteristics of the units being studied, e.g. for dogs: breed, weight, general health, etc.
Continuous- Variates that can be measured to an infinite degree of accuracy
Discrete- Variates that can only take a finite or countably infinite number of values
Categorical- Variates in which units fall into a non-numeric category such as hair color, health of dogs, etc.
Ordinal- Categorical variates where an ordering is implied
Complex- More unusual variates such as open-ended responses to a survey question, or an image
Sample Variance- $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$, where $\bar{y}$ is the sample mean
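Sample Standard Deviation- Denoted by s and is just the square root of the sample variance $s^2$
Quantile- Define the pth quantile $q(p)$ (0 < p < 1) as follows, where $y_{(1)} \le y_{(2)} \le \dots \le y_{(n)}$ are the data sorted in increasing order
- Let m = (n + 1)p
- If m is an integer and $1 \le m \le n$, then take the mth smallest value, $q(p) = y_{(m)}$
- If m is not an integer, but $1 < m < n$, then determine the closest integer j s.t. $j < m < j + 1$ and take $q(p) = \frac{1}{2}[y_{(j)} + y_{(j+1)}]$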
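A sketch of these summaries in code. It assumes the usual n - 1 divisor for the sample variance and the m = (n + 1)p quantile rule: if m is an integer, take the mth smallest value; otherwise average the two sorted values on either side of m. The function names and the data are made up for illustration.

```python
import math

def sample_variance(data):
    """s^2 with the (n - 1) divisor (assumed convention)."""
    n = len(data)
    mean = sum(data) / n
    return sum((y - mean) ** 2 for y in data) / (n - 1)

def sample_sd(data):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(data))

def quantile(data, p):
    """q(p) from m = (n + 1) p, using 1-based positions in the sorted data."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    y = sorted(data)
    n = len(y)
    m = (n + 1) * p
    nearest = round(m)
    if abs(m - nearest) < 1e-9:      # treat m as an integer (guards float rounding)
        if not 1 <= nearest <= n:
            raise ValueError("rule needs 1 <= m <= n")
        return y[nearest - 1]        # the m-th smallest value
    if not 1 < m < n:
        raise ValueError("rule needs 1 < m < n")
    j = math.floor(m)                # j < m < j + 1
    return (y[j - 1] + y[j]) / 2     # average of the j-th and (j+1)-th smallest

data = [15, 2, 9, 20, 4, 12, 7]      # made-up sample, n = 7
print(quantile(data, 0.25), quantile(data, 0.5), quantile(data, 0.75))  # 4 9 15
print(quantile(data, 0.75) - quantile(data, 0.25))  # 11, the gap between the quartiles
```

Note that library quantile functions often use other interpolation rules, so their answers can differ slightly from this one.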
Interquartile Range- The difference between the upper and lower quartiles, IQR = q(0.75) - q(0.25)
Negatively Skewed- Mean is less than the median
Positively Skewed- Mean is greater than the median
- Symmetry = Skewness of 0
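Sample Skewness- $g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2\right]^{3/2}}$
- The numerator determines the sign of the skewness coefficient, since the denominator is always positive; the more skewed the distribution is, the larger the absolute value of $g_1$ will be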
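A sketch of the skewness coefficient, assuming the usual moment-based form: the average cubed deviation from the mean divided by the average squared deviation raised to the power 3/2. The function name and the data sets are made up for illustration.

```python
def sample_skewness(data):
    """g_1: average cubed deviation over (average squared deviation)^(3/2)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((y - mean) ** 2 for y in data) / n   # always positive for non-constant data
    m3 = sum((y - mean) ** 3 for y in data) / n   # carries the sign of the skewness
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 1, 2, 10]
long_left_tail = [-y for y in long_right_tail]

print(sample_skewness(symmetric))        # 0.0 for perfectly symmetric data
print(sample_skewness(long_right_tail))  # positive
print(sample_skewness(long_left_tail))   # negative
```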
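Kurtosis- Measures the heaviness of the tails and peakedness of the distribution of data
Sample Kurtosis- $g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^4}{\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2\right]^{2}}$
- Sample kurtosis is always positive and doesn't have units
- Kurtosis > 3 = heavier tails, more peaked center (3 is the kurtosis of a Normal distribution)
- Kurtosis < 3 = lighter to no tails and a flatter center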
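A sketch of the kurtosis coefficient, assuming the usual moment-based form: the average fourth-power deviation from the mean divided by the square of the average squared deviation. The function name and the data sets are made up for illustration.

```python
def sample_kurtosis(data):
    """Average 4th-power deviation over (average squared deviation)^2."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((y - mean) ** 2 for y in data) / n
    m4 = sum((y - mean) ** 4 for y in data) / n
    return m4 / m2 ** 2     # ratio of positive quantities, so always positive

flat = [1, 2, 3, 4, 5]                   # evenly spread, no real tails
heavy_tailed = [0] * 18 + [-10, 10]      # mostly at the centre, two far-out values

print(sample_kurtosis(flat))             # about 1.7, well below 3
print(sample_kurtosis(heavy_tailed))     # well above 3
```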
Maximum Likelihood Estimate