105 cheat sheet 1
wk 1 - 4
Machine Learning Tasks
1) Regression
Predicts continuous values (ie. predicting a future score based on past scores)
2) Classification
Predicts categories
Binary
Multi-class
3) Clustering
Groups data without labels
Counting Methods
1) Repetition allowed: n^r arrangements
2) No repetition: n × (n − 1) × … (falling factorial)
3) At least one/more repeated digits: total arrangements − arrangements with no repetition (complement)
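The three counting methods can be sketched with Python's `math` module; the 4-digit PIN setup below is an illustrative assumption, not from the notes.

```python
from math import perm

# Counting 4-digit PINs from 10 digits (assumed example):
digits, length = 10, 4

with_repetition = digits ** length        # repetition allowed: n^r
no_repetition = perm(digits, length)      # no repetition: 10 * 9 * 8 * 7
at_least_one_repeat = with_repetition - no_repetition  # complement counting

print(with_repetition, no_repetition, at_least_one_repeat)  # 10000 5040 4960
```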
Set Theory + Probability
Addition Rule/Inclusion-Exclusion Principle
For the inclusion-exclusion principle, the goal is to add all portions (in a venn diagram) and subtract all intersections, so overlaps are not double-counted: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
If events A and B are mutually exclusive, P(A ∩ B) = 0, so this reduces to the simple Addition Rule: P(A ∪ B) = P(A) + P(B)
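A minimal worked check of inclusion-exclusion, assuming a standard 52-card deck example (not from the notes): A = "card is a heart", B = "card is a face card".

```python
from fractions import Fraction

p_a = Fraction(13, 52)        # 13 hearts
p_b = Fraction(12, 52)        # 12 face cards (J, Q, K in each of 4 suits)
p_both = Fraction(3, 52)      # J, Q, K of hearts (the overlap)

# P(A or B) = P(A) + P(B) - P(A and B): subtract the double-counted overlap
p_union = p_a + p_b - p_both
print(p_union)                # 11/26
```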
Complement Rule: P(Aᶜ) = 1 − P(A)
Conditional Probability
Given that event A occurred, what is the probability of B: P(B|A) = P(A ∩ B) / P(A)
Bayes Theorem
Reverse of conditional probability
Given that event B occurred, what is the chance event A was the cause: P(A|B) = P(B|A)·P(A) / P(B)
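A sketch of Bayes' theorem with assumed numbers (a made-up diagnostic test, not from the notes): A = "has the condition", B = "test is positive".

```python
p_a = 0.01               # prior P(A): 1% of people have the condition (assumed)
p_b_given_a = 0.95       # P(B|A): test is positive given the condition (assumed)
p_b_given_not_a = 0.05   # false-positive rate (assumed)

# Law of total probability gives P(B):
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem reverses the conditioning:
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))   # 0.161 - a positive test is far from certain
```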
Independence
When occurrence of one event does not affect probability of other event occurring.
If two events A and B are independent, these conditions are equivalent (as long as one holds, the rest hold):
P(A ∩ B) = P(A)·P(B), P(A|B) = P(A), P(B|A) = P(B)
Joint Probability
The probability of both event A and event B occurring: P(A ∩ B) = P(A)·P(B|A)
Marginal Probability
The probability of one event on its own, found by summing over joint probabilities: P(A) = Σ_b P(A, B = b)
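Joint and marginal probabilities (and the independence product rule) can be sketched with two assumed independent fair dice; exact fractions avoid rounding error.

```python
from fractions import Fraction

# Joint distribution of two independent fair dice X, Y (assumed example):
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# Marginal: P(X = x) = sum over y of the joint P(X = x, Y = y)
marginal_x = {x: sum(p for (xi, y), p in joint.items() if xi == x)
              for x in range(1, 7)}
print(marginal_x[3])   # 1/6

# Independence check: every joint cell equals the product of marginals
assert all(joint[(x, y)] == marginal_x[x] * Fraction(1, 6)
           for x in range(1, 7) for y in range(1, 7))
```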
Law of Total Probability
The events B₁, …, Bₙ form a partition of the sample space; then P(A) = Σᵢ P(A|Bᵢ)·P(Bᵢ)
"The total probability of A is the sum of the probabilities of A happening in each scenario, weighted by how likely each scenario is."
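A sketch of the law of total probability, assuming two factories that partition production (all numbers made up for illustration):

```python
# B1 = "made in factory 1", B2 = "made in factory 2" - a partition, so
# P(B1) + P(B2) = 1. A = "item is defective".
p_b1, p_b2 = 0.6, 0.4        # how likely each scenario is (assumed)
p_a_given_b1 = 0.02          # P(defective | factory 1) (assumed)
p_a_given_b2 = 0.05          # P(defective | factory 2) (assumed)

# P(A) = sum over scenarios of P(A | Bi) * P(Bi): A's probability in each
# scenario, weighted by how likely that scenario is.
p_a = p_a_given_b1 * p_b1 + p_a_given_b2 * p_b2
print(round(p_a, 3))         # 0.032
```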
Combination + Permutation
Combination - Order does not matter: C(n, r) = n! / (r! (n − r)!)
There may come times where you need to answer via cases. This is when:
1) There are multiple constraints (at least 2 boys and 3 girls)
2) There are multiple valid possibilities ( 1 team with 2 people, 2 teams with 1 person)
3) When the problem asks for exact numbers/items from different groups
! Have you checked inverse/duplicate instances?
Special Cases of Choosing
1) Select all: C(n, n) = 1
2) Select none: C(n, 0) = 1
3) Symmetry: C(n, r) = C(n, n − r)
Ie. When choosing a team of 3 from 10 people, picking the 3 people on the team and picking the 7 people left out describe the same split, so C(10, 3) = C(10, 7)
This symmetry comes from combination's "order doesn't matter": choosing a group and choosing its complement give the same count
Permutation - Order matters: P(n, r) = n! / (n − r)!
where n is the total number of items, and r is the number of items to arrange
0! = 1
This formula cannot always be used directly, because the items may come with conditions (ie, at least 3 items of x and 4 items of y).
In these cases, use the combination formula first to select the items, then arrange them.
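Python's `math.comb` / `math.perm` can verify the identities above; the committee setup (6 boys, 4 girls, at least 2 girls) is an illustrative assumption showing case-by-case counting.

```python
from math import comb, perm

# Identities from the notes:
assert comb(10, 3) == comb(10, 7)        # symmetry: choose 3 in = choose 7 out
assert comb(10, 0) == comb(10, 10) == 1  # select none / select all
assert perm(5, 5) == 120                 # 5! arrangements

# Case-by-case counting (assumed numbers): committees of 5 drawn from
# 6 boys and 4 girls with AT LEAST 2 girls. Split into cases g = 2, 3, 4 girls:
total = sum(comb(4, g) * comb(6, 5 - g) for g in range(2, 5))
print(total)   # 186 = 6*20 + 4*15 + 1*6
```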
Discrete and Continuous Random Variables
Random Variables
A variable whose value is determined by a random outcome (Ie. X = number of heads in 5 coin flips)
The outcome can be discrete or continuous
Can be expressed with a table
Discrete Distribution
Probability distribution of a discrete random variable. Example: toys owned, counted across 30 kids:
0 toys → 9 kids
1 toy → 6 kids
2 toys → 15 kids
From the table, we can get the probability that a child has x toys.
This gives the pmf, which can be displayed in a table.
0 toys → 9/30 = 0.3
1 toy → 6/30 = 0.2
2 toys → 15/30 = 0.5
Now, from the pmf, we can find the total chance of an outcome, or anything less than it.
Ie. What's the chance a kid has 1 or less toys? (0.3 + 0.2 = 0.5)
Ie. How likely is x or less? (random variable)
This is the cdf - summing up the probabilities of a value and everything below it (giving a probability for a set range of outcomes)
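The toy-count example above can be sketched directly; the counts (9, 6, 15) come from the table, and exact fractions keep the arithmetic clean.

```python
from fractions import Fraction

counts = {0: 9, 1: 6, 2: 15}               # toys -> number of kids
total = sum(counts.values())               # 30 kids

# pmf: exact chance of each value
pmf = {x: Fraction(n, total) for x, n in counts.items()}
print(pmf[1])                              # 1/5 (= 0.2)

# cdf: chance of the value "and less"
cdf = {x: sum(p for xi, p in pmf.items() if xi <= x) for x in pmf}
print(cdf[1])                              # 1/2 -> P(X <= 1) = 0.3 + 0.2
```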
Discrete: takes countable values
Ie. No. of children, no. on a die, pick of a ball, max/min, count of days
P(X = x) is positive for specific values
Continuous: takes uncountably infinite values (intervals)
Ie. Time taken to finish a task, height, weight - handled with integrals, measuring time/length
P(X = x) = 0 for exact values
PMF (probability mass function)
Exact chance of a value
Properties
0 ≤ 𝑝ₓ(𝑥) ≤ 1 for all 𝑥
Probabilities of each outcome should be between 0 and 1
∑𝑥 pₓ(x) = 1
Probabilities of all outcomes added should total up to 1
Continuous variables have no pmf, because a continuous variable takes values over a range - 5.000 and 5.0001 seconds may be different outcomes, and listing exact values would create infinitely many tiny probabilities - thus we have the pdf instead
PDF (Probability Density Function)
Gives a relative likelihood (density), not a probability; probabilities come from ranges of values
Integrate PDF over a range to find actual probability
Area under curve = probability
Properties
f(x) ≥ 0 for all x
No negative probabilities
∫ from −∞ to ∞ of f(x) dx = 1
total area under the curve = 1
Because sum of all probabilities should be equal 1
P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx
P(X = a) = 0
Because X is continuous, a single exact value has zero width, so its probability (the area over one point) is 0
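A minimal numeric sketch, assuming X is uniform on [0, 10] (e.g. a task's finish time, an assumed example): integrating the pdf over a range gives a probability, and a zero-width range gives 0.

```python
def uniform_pdf(x, a=0.0, b=10.0):
    """PDF of a uniform distribution: constant 1/(b-a) on [a, b], else 0."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, n=10_000):
    """Midpoint Riemann-sum approximation of the integral of f from lo to hi."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx

print(round(integrate(uniform_pdf, 0, 10), 6))  # total area under curve = 1.0
print(round(integrate(uniform_pdf, 2, 5), 6))   # P(2 <= X <= 5) = 0.3
print(integrate(uniform_pdf, 3, 3))             # P(X = 3) = 0.0 (zero width)
```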
CDF (Cumulative Distribution Function)
Chance of the value and less
F(x) = P(X ≤ x)
For continuous variables:
P(X ≤ x) = P(X = x) + P(X < x) = 0 + P(X < x) = P(X < x)
Complement Rule: P(X > x) = 1 − F(x); for continuous variables this also equals P(X ≥ x), since P(X = x) = 0
Properties
𝐹 is non-decreasing (F(a) ≤ F(b) if a ≤ b)
CDF value never goes down, because the value is a summation
0 ≤ F(a) ≤ 1 for any value a
Value of CDF is always between 0 and 1
lim as a → −∞ of F(a) = 0
Probability gets closer to 0 as you go far left, because you're before any possible outcomes
So the CDF always starts from 0 on the far left.
lim as a → ∞ of F(a) = 1
Probability gets closer to 1 as you go far right, because you've passed all possible outcomes
P(a < X ≤ b) = F(b) − F(a)
The probability that X is between a and b (not including a, but including b) is the difference between the CDF at b and the CDF at a.
There is no requirement for the cdf to be smooth.
For discrete variables, the CDF is a step function: it jumps by P(X = x) at each possible value x and is flat in between.
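A sketch of a stepwise discrete CDF, assuming a fair six-sided die (an assumed example): the function jumps by 1/6 at each face and is flat between faces.

```python
from math import floor
from fractions import Fraction

def die_cdf(x):
    """CDF of a fair die: F(x) = P(X <= x), a step function on 1..6."""
    return Fraction(min(max(floor(x), 0), 6), 6)

print(die_cdf(3))                 # 1/2
print(die_cdf(3.9))               # 1/2 - flat between jumps
print(die_cdf(5) - die_cdf(2))    # 1/2 -> P(2 < X <= 5) = F(5) - F(2)
print(1 - die_cdf(4))             # 1/3 -> P(X > 4) = 1 - F(4)
```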
CDF (Cumulative Distribution Function) — continuous
Properties
Same as the discrete cdf
Given the PDF f(x), the CDF is F(x) = ∫ from −∞ to x of f(t) dt
The integral of a function over an interval gives the area under its curve between two points, so F(x) is the area under the PDF from −∞ up to x (and the total area, from −∞ to ∞, is 1)
Given the CDF F(x), the PDF is f(x) = dF(x)/dx, the derivative of the CDF
The PDF is the slope of the CDF curve: flat CDF ⇒ PDF is 0, steep CDF ⇒ PDF is high
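A numeric check that the PDF is the slope of the CDF, assuming an exponential distribution with rate 1 (F(x) = 1 − e^(−x), so differentiating should recover f(x) = e^(−x)):

```python
from math import exp

def F(x):
    """Assumed CDF: exponential distribution with rate 1."""
    return 1 - exp(-x) if x >= 0 else 0.0

def pdf_from_cdf(x, h=1e-6):
    """PDF as the slope of the CDF (central finite difference)."""
    return (F(x + h) - F(x - h)) / (2 * h)

print(round(pdf_from_cdf(1.0), 4))   # 0.3679, matching e^(-1)
```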
Expected Value & Variance
Expected Value
Long run average outcome
What number do we get on average if we run the same test over and over again.
Can also be considered the mean, probability weighted average, or central tendency.
Discrete Variables
Calculation:
E[X] = Σ x · pₓ(x) - each x value and its probability, multiplied and added up
For E[g(X)] of a function:
E[g(X)] = Σ g(x) · pₓ(x) - the result of g(x) times the probability of that x
Continuous Variable
Calculation:
E[X] = ∫ from −∞ to ∞ of x · f(x) dx
For E[g(X)] of a function:
E[g(X)] = ∫ from −∞ to ∞ of g(x) · f(x) dx
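The discrete formulas can be sketched on an assumed fair die, with exact fractions:

```python
from fractions import Fraction

# pmf of a fair six-sided die (assumed example)
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E[X] = sum of x * p(x)
e_x = sum(x * p for x, p in die_pmf.items())
print(e_x)             # 7/2 - the long-run average roll is 3.5

# E[g(X)] with g(x) = x^2: apply g to each value, keep the probabilities
e_x_squared = sum(x**2 * p for x, p in die_pmf.items())
print(e_x_squared)     # 91/6
```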
Properties of E[X]
These properties apply to all random variables, including discrete and continuous.
𝐸[𝑋 + 𝑌] = 𝐸[𝑋]+ 𝐸[𝑌]
𝐸[𝑋 − 𝑌]= 𝐸[𝑋] − 𝐸[𝑌]
𝐸[𝑎𝑋 + 𝑏] = 𝑎𝐸[𝑋] + 𝑏
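The linearity rule E[aX + b] = aE[X] + b can be checked exactly; the fair-die pmf and the constants a = 3, b = 2 are assumptions for illustration.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # assumed fair die
a, b = 3, 2

lhs = sum((a * x + b) * p for x, p in pmf.items())  # E[aX + b] directly
rhs = a * sum(x * p for x, p in pmf.items()) + b    # a*E[X] + b
assert lhs == rhs
print(lhs)   # 25/2 -> 3 * 3.5 + 2
```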
Variance and Standard Deviation
Variance - how spread out the data is, in terms of square distances from the mean
Standard deviation - square root of variance (original scale)
⇒ So, both tell the same story, in different units.
Calculation:
Var(X) = E[(X − μ)²] = E[X²] − (E[X])², where μ = E[X]
SD(X) = √Var(X)
Properties of variance
Var(𝑎𝑋 + 𝑏) = 𝑎²·Var(𝑋)
If 𝑋, 𝑌 are independent, then Var(𝑋 + 𝑌) = Var(𝑋 − 𝑌) = Var(𝑋) + Var(𝑌)
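Both the shortcut formula and the Var(aX + b) = a²·Var(X) property can be verified on an assumed fair die (shifting by b changes the mean but not the spread):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # assumed fair die

e_x = sum(x * p for x, p in pmf.items())          # E[X] = 7/2
e_x2 = sum(x**2 * p for x, p in pmf.items())      # E[X^2] = 91/6
var_x = e_x2 - e_x**2                             # shortcut formula
print(var_x)                                      # 35/12

# Var(aX + b) computed from the definition E[(Y - E[Y])^2]:
a, b = 3, 2
var_ax_b = sum(((a * x + b) - (a * e_x + b))**2 * p for x, p in pmf.items())
assert var_ax_b == a**2 * var_x                   # b drops out, a is squared
```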