105 cheat sheet 1

wk 1 - 4

Machine Learning Tasks

1) Regression

  • Predicts continuous values (e.g. prices or scores) from past data

2) Classification

  • Predicts categories

    • Binary

    • Multi-class

3) Clustering

  • Groups data without labels

Counting Methods

1) Repetition Allowed

n^k

where n is the number of possibilities and k is the number of boxes

2) No repetition

n * (n−1) * (n−2) * … * (n−k+1)

where n is the number of possibilities and k is the number of boxes (one factor per box)

3) At least one/more repeated digits

n^k − n * (n−1) * (n−2) * … * (n−k+1)
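As a sketch, the three counting formulas above can be checked in Python; the 4-digit-PIN setting is a made-up example:

```python
# Hypothetical example: a 4-digit PIN from digits 0-9, so n = 10, k = 4.
n, k = 10, 4

# 1) Repetition allowed: n^k
with_repetition = n ** k

# 2) No repetition: n * (n-1) * ... * (n-k+1), one factor per box
no_repetition = 1
for i in range(k):
    no_repetition *= n - i

# 3) At least one repeated digit: all strings minus the all-distinct ones
at_least_one_repeat = with_repetition - no_repetition

print(with_repetition, no_repetition, at_least_one_repeat)  # 10000 5040 4960
```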

Set Theory + Probability

Just because something has two possible results doesn't mean each result is equally likely

A ∩ B : intersection
A ∪ B : union

Mutually exclusive means that two or more events, conditions, or situations cannot happen at the same time. If one occurs, the other(s) must not.

Addition Rule/Inclusion-Exclusion Principle

  • For the inclusion-exclusion principle, the goal is to add all portions (in a venn diagram) and minus all intersections.

Why intersections only?

That's because there is double counting of the intersection when you add P(A) + P(B).

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive
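The addition rule can be sanity-checked by enumerating a small sample space; the single-die events below are illustrative:

```python
from fractions import Fraction

outcomes = set(range(1, 7))                # one fair six-sided die
A = {x for x in outcomes if x % 2 == 0}    # even: {2, 4, 6}
B = {x for x in outcomes if x > 3}         # greater than 3: {4, 5, 6}

def p(event):
    return Fraction(len(event), len(outcomes))

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = p(A | B)
rhs = p(A) + p(B) - p(A & B)
print(lhs, rhs)  # 2/3 2/3
```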

Complement Rule

P(Aโˆ)=1โˆ’P(A)P(A^\complement) = 1- P(A)

Conditional Probability

  • Given that event B has occurred, what is the probability of A?

P(A|B) = P(A ∩ B) / P(B)

Bayes Theorem

  • Reverse of conditional probability

  • Given that event B occurred, what is the chance it was caused by event A?

P(A|B) = P(B|A) P(A) / P(B)
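A minimal sketch of Bayes' theorem; the rare-condition/imperfect-test rates below are all made up:

```python
# Hypothetical rates for a rare condition A and a positive test result B.
p_A = 0.01             # P(A): prior probability of the condition
p_B_given_A = 0.95     # P(B|A): test positive given the condition
p_B_given_notA = 0.05  # P(B|not A): false-positive rate

# Denominator via the law of total probability: P(B)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes: P(A|B) = P(B|A) P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))  # 0.161: still unlikely despite a positive test
```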

If you see Given/If/Assuming in a question, it is a conditional probability.

Independence

  • When occurrence of one event does not affect probability of other event occurring.

Mutually exclusive is very different from independence.

Independence means cooking a meal does not change what happens when you watch an anime. They don't affect each other at all.

If two events are mutually exclusive, e.g. choosing to eat an apple or a banana, then they cannot both happen at the same time.

If two events are independent:

  • The following statements are equivalent: if one holds, they all hold

P(A ∩ B) = P(A) * P(B)
P(A|B) = P(A)
P(B|A) = P(B)
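Independence can be verified by brute-force enumeration; the two-dice events here are illustrative:

```python
from fractions import Fraction
from itertools import product

sample = list(product(range(1, 7), repeat=2))  # two fair dice: 36 outcomes
A = {o for o in sample if o[0] % 2 == 0}       # first die is even
B = {o for o in sample if o[1] > 4}            # second die is 5 or 6

def p(event):
    return Fraction(len(event), len(sample))

# Independent: P(A ∩ B) = P(A) * P(B)
print(p(A & B) == p(A) * p(B))  # True
print(p(A & B), p(A) * p(B))    # 1/6 1/6
```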

The opposite of independence is dependence, meaning that an event is affected by another event.

Joint Probability

  • The probability of both event A and event B occurring is

P(A ∩ B)

Marginal Probability

  • Sum over joint probabilities

P(A) = P(A ∩ B) + P(A ∩ Bᶜ)

Law of Total Probability

  • The events form a partition of the sample space.

  • "The total probability of A is the sum of the probabilities of A happening in each scenario, weighted by how likely each scenario is."

P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂) + … + P(A|Bₙ)P(Bₙ)

This only holds if the events B₁, …, Bₙ are mutually exclusive and exhaustive.
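A small sketch of the law of total probability; the machine shares and defect rates are hypothetical:

```python
# A part comes from exactly one of three machines (mutually exclusive, exhaustive).
p_B = [0.5, 0.3, 0.2]             # P(B_i): machine shares, sum to 1
p_A_given_B = [0.01, 0.02, 0.05]  # P(A|B_i): defect rate of each machine

# Law of total probability: P(A) = sum over i of P(A|B_i) * P(B_i)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
print(round(p_A, 3))  # 0.021
```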

Combination + Permutation

Keywords:

  • Does order matter?

  • Are they looking for a pair?

  • Anything hinting that it should be distinct

Other Tips:

  • Multiplication ⇒ independent events all happening together (AND)

  • Power ⇒ the same independent probability repeated several times (e.g. not sharing the same birthday / dice is a 6 and a coin is heads)

Combination - Order does not matter

(n choose r) = n! / (r!(n−r)!)

Special Chooses

1) Select all

(n choose n) = 1

2) Select none

(n choose 0) = 1

3) Symmetry

  • E.g. when choosing a team of 3 from 10 people, choosing the 3 people who join and choosing the 7 people who are left out give the same count

(10 choose 3) = (10 choose 7)

  • This symmetry comes from combination's "order doesn't matter": picking who is in also fixes who is out

  • So counting either group gives the same result

Permutation - Order matters

P(n, r) = n! / (n−r)!

where n is total number of items, and r is the number of items to arrange

  • 0! = 1
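Python's standard library computes both directly (`math.comb` and `math.perm`), which makes the special cases above easy to check:

```python
from math import comb, perm, factorial

# Combination: order does not matter
print(comb(10, 3))                 # 120 ways to choose a team of 3 from 10
print(comb(10, 3) == comb(10, 7))  # True: symmetry
print(comb(5, 5), comb(5, 0))      # 1 1: select all / select none

# Permutation: order matters
print(perm(10, 3))                 # 720 = 10 * 9 * 8
print(factorial(0))                # 1, by convention
```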

Discrete and Continuous Random Variables

Random Variables

  • An outcome with a value that is random. (Ie. Number of heads in 5 coin flips = [random])

  • Outcome can be related to being discrete or continuous

  • Can be expressed with a table

Discrete Distribution

  • Probability distribution of a discrete random variable

| Number of Toys (Random Variable/Output) | Number of Kids |
| --- | --- |
| 0 toys | 9 kids |
| 1 toy | 6 kids |
| 2 toys | 15 kids |

  • From the table, we can get the probability that a child has x toys.

    • This creates the pmf, which can be displayed in a table.

| x | P(X = x) |
| --- | --- |
| 0 toys | 9/30 = 0.3 |
| 1 toy | 6/30 = 0.2 |
| 2 toys | 15/30 = 0.5 |

  • Now, from the pmf, we can find the total chance of an outcome, or anything less than it, being achieved.

    • E.g. what's the chance a kid has 1 toy or fewer?

    • I.e. how likely is x or less? (random variable)

    • This is the cdf: summing up the probabilities of the value and everything below it (giving a probability for a range of outcomes)
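The toy-count table above can be turned into a pmf and a cdf with a short sketch:

```python
from fractions import Fraction

counts = {0: 9, 1: 6, 2: 15}   # toys -> number of kids, from the table
total = sum(counts.values())   # 30 kids in total

# pmf: P(X = x), the exact chance of each value
pmf = {x: Fraction(n, total) for x, n in counts.items()}

# cdf: F(x) = P(X <= x), a running sum of the pmf
cdf, running = {}, Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running

print(pmf[1])  # 1/5: chance a kid has exactly 1 toy
print(cdf[1])  # 1/2: chance a kid has 1 toy or fewer
```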

| Discrete Distribution | Continuous Distribution |
| --- | --- |
| Takes countable values | Takes uncountably infinite values (intervals) |
| No. of children, number on a die | Time taken to finish a task, height, weight |
| E.g. picking a ball, max/min, count of days | E.g. integrals, measuring time/length |
| P(X = x) is positive for specific values | P(X = x) = 0 for exact values |

PMF (probability mass function)

  • Exact chance of a value

Properties

  • 0 ≤ pₓ(x) ≤ 1 for all x

    • The probability of each outcome is between 0 and 1

  • ∑ₓ pₓ(x) = 1

    • The probabilities of all outcomes sum to 1

There is no pmf for continuous variables, since they take values over a range: 5.000 and 5.0001 seconds are different outcomes, and there are infinitely many such tiny values, so we use a pdf instead.

PDF (Probability Density Function)

  • Gives the relative likelihood (density) over a range, not a probability by itself

  • Integrate PDF over a range to find actual probability

  • Area under curve = probability

Properties

  • f(x) ≥ 0 for all x

    • No negative probabilities

  • ∫ from −∞ to ∞ of f(x) dx = 1

    • total area under the curve = 1

    • Because sum of all probabilities should be equal 1

  • P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx

  • P(X = a) = 0

    • Because X is continuous, the probability of landing on any exact value a is zero (a single point has no area)
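The pdf properties can be illustrated numerically; the uniform-on-[0, 10] density below is a made-up example, integrated with a simple midpoint rule:

```python
def f(x):
    # Hypothetical pdf: uniform on [0, 10], density 1/10 there, 0 elsewhere
    return 0.1 if 0 <= x <= 10 else 0.0

def integrate(f, a, b, n=100_000):
    # Midpoint rule: sum f(midpoint) * slice width over n slices
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(round(integrate(f, 0, 10), 6))  # 1.0: total area under the curve
print(round(integrate(f, 2, 5), 6))   # 0.3: P(2 <= X <= 5)
```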

CDF (Cumulative Distribution Function)

  • Chance of the value and less

  • F(x) = P(X ≤ x)

  • For a continuous variable, P(X ≤ x) = P(X < x):

    • P(X ≤ x) = P(X = x) + P(X < x)

    • = 0 + P(X < x) = P(X < x)

Complement Rule: P(X > x) = 1 − F(x) (for continuous variables this also equals P(X ≥ x))

Properties

  • F is non-decreasing (F(a) ≤ F(b) if a ≤ b)

    • The CDF value never goes down, because it is a running sum

  • 0 ≤ F(a) ≤ 1 for any value a

    • The value of the CDF is always between 0 and 1

  • lim as a → −∞ of F(a) = 0

    • Probability gets closer to 0 as you go far left, because you're before any possible outcomes

  • lim as a → ∞ of F(a) = 1

    • Probability gets closer to 1 as you go far right, because you've passed all possible outcomes

  • P(a < X ≤ b) = F(b) − F(a)

    • The probability that X is between a and b (not including a, but including b) is the difference between the CDF at b and the CDF at a.

  • There is no requirement for a cdf to be smooth

  • A discrete cdf has jumps (stepwise)
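A sketch of P(a < X ≤ b) = F(b) − F(a) and the complement rule, using a hypothetical exponential CDF F(x) = 1 − e^(−x):

```python
from math import exp

def F(x):
    # Hypothetical continuous CDF: exponential with rate 1
    return 1 - exp(-x) if x >= 0 else 0.0

# P(a < X <= b) = F(b) - F(a)
a, b = 1.0, 2.0
print(round(F(b) - F(a), 4))  # 0.2325

# Complement rule: P(X > x) = 1 - F(x)
print(round(1 - F(1.0), 4))   # 0.3679
```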

CDF (Cumulative Distribution Function)

Properties

  • Same as discrete cdf

Given PDF f(x), the CDF is F(x) = ∫ from −∞ to x of f(t) dt

  • The integral of a function over an interval gives the area under its curve between the two endpoints

  • So F(x) is the area under the pdf from −∞ up to x, and the total area (x → ∞) is 1


Given CDF F(x), the PDF is f(x) = dF(x)/dx, the derivative of the CDF

  • The pdf is the slope of the CDF curve

  • A flat CDF ⇒ pdf is 0; a steep CDF ⇒ pdf is high

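The pdf-is-the-slope-of-the-CDF relationship can be checked with a numerical derivative, using a hypothetical exponential pair as the example:

```python
from math import exp

F = lambda x: 1 - exp(-x)  # hypothetical CDF: exponential with rate 1
f = lambda x: exp(-x)      # its pdf, the derivative dF/dx

# Central difference approximates the slope of F at a point
x, h = 1.5, 1e-6
numeric = (F(x + h) - F(x - h)) / (2 * h)
print(abs(numeric - f(x)) < 1e-8)  # True: the pdf matches the CDF's slope
```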

Expected Value & Variance

Expected Value

  • Long run average outcome

    • What number do we get on average if we run the same test over and over again.

  • Can also be considered the mean, probability weighted average, or central tendency.

Discrete Variables

  • Calculation of it:

E[X] = ∑ₓ x · pₓ(x) = μ

  • Each x value multiplied by its probability, then all summed up

  • For the expected value of a function g(X):

    • the result of g(x) times the probability of x

E[g(X)] = ∑ₓ g(x) · pₓ(x)

Continuous Variable

  • Calculation:

E[X] = ∫ from −∞ to ∞ of x · fₓ(x) dx

  • For the expected value of a function:

E[g(X)] = ∫ from −∞ to ∞ of g(x) · fₓ(x) dx
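Using the toy pmf from earlier (0.3, 0.2, 0.5), the discrete E[X] and E[g(X)] reduce to short sums; a minimal sketch:

```python
pmf = {0: 0.3, 1: 0.2, 2: 0.5}  # toy pmf from the table above

# E[X]: each value times its probability, summed
mu = sum(x * p for x, p in pmf.items())
print(mu)  # 1.2

# E[g(X)] for g(x) = x^2
e_x2 = sum(x**2 * p for x, p in pmf.items())
print(e_x2)  # 2.2
```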

Properties of E[X]

  • These properties apply to all random variables, including discrete and continuous.

E[X + Y] = E[X] + E[Y]

E[X − Y] = E[X] − E[Y]

E[aX + b] = aE[X] + b

Variance and Standard Deviation

  • Variance - how spread out the data is, in terms of square distances from the mean

  • Standard deviation - square root of variance (original scale)

⇒ So, both tell the same story, in different units.

Calculation

Var(X) = E[(X − μ)^2] = E[X^2] − μ^2
Standard Deviation = √Var(X)
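Continuing the toy pmf example, variance via E[X^2] − μ^2 and the standard deviation as its square root:

```python
from math import sqrt

pmf = {0: 0.3, 1: 0.2, 2: 0.5}                # toy pmf from above
mu = sum(x * p for x, p in pmf.items())       # E[X] = 1.2
e_x2 = sum(x**2 * p for x, p in pmf.items())  # E[X^2] = 2.2

var = e_x2 - mu**2  # Var(X) = E[X^2] - mu^2
sd = sqrt(var)      # standard deviation, back on the original scale
print(round(var, 4), round(sd, 4))  # 0.76 0.8718
```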

Properties of variance

Var(aX + b) = a^2 Var(X)

If X, Y are independent, then Var(X + Y) = Var(X − Y) = Var(X) + Var(Y)

μ is a constant, so E[μ^2] = μ^2
