105 cheat sheet 1

wk 1 - 4

Machine Learning Tasks

1) Regression

  • Predicts continuous values (e.g. prices or scores) from past data

2) Classification

  • Predicts categories

    • Binary

    • Multi-class

3) Clustering

  • Groups data without labels

Counting Methods

1) Repetition Allowed

n^k

where n is the number of possibilities and k is the number of boxes

2) No repetition

n * (n−1) * (n−2) * … * (n−k+1)

where n is the number of possibilities and k is the number of boxes (one factor per box)

3) At least one/more repeated digits

n^k − n * (n−1) * (n−2) * … * (n−k+1)
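As a sketch, the three counting formulas above can be checked in Python; the 4-digit-PIN setting is a made-up example:

```python
# Hypothetical example: a 4-digit PIN from digits 0-9, so n = 10, k = 4.
n, k = 10, 4

# 1) Repetition allowed: n^k
with_repetition = n ** k

# 2) No repetition: n * (n-1) * ... * (n-k+1), one factor per box
no_repetition = 1
for i in range(k):
    no_repetition *= n - i

# 3) At least one repeated digit: all strings minus the all-distinct ones
at_least_one_repeat = with_repetition - no_repetition

print(with_repetition, no_repetition, at_least_one_repeat)  # 10000 5040 4960
```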

Set Theory + Probability

Just because something has two possible results doesn't mean each result is equally likely

A ∩ B : intersection
A ∪ B : union

Mutually exclusive means that two or more events, conditions, or situations cannot happen at the same time. If one occurs, the other(s) must not.

Addition Rule/Inclusion-Exclusion Principle

  • For the inclusion-exclusion principle, the goal is to add all portions (in a venn diagram) and minus all intersections.

Why intersections only?

That's because there is double counting of the intersection when you add P(A) + P(B).

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive
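The addition rule can be sanity-checked by enumerating a small sample space; the single-die events below are illustrative:

```python
from fractions import Fraction

outcomes = set(range(1, 7))                # one fair six-sided die
A = {x for x in outcomes if x % 2 == 0}    # even: {2, 4, 6}
B = {x for x in outcomes if x > 3}         # greater than 3: {4, 5, 6}

def p(event):
    return Fraction(len(event), len(outcomes))

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = p(A | B)
rhs = p(A) + p(B) - p(A & B)
print(lhs, rhs)  # 2/3 2/3
```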

Complement Rule

P(Aโˆ)=1โˆ’P(A)P(A^\complement) = 1- P(A)

Conditional Probability

  • Given that event B has occurred, what is the probability of A?

P(A|B) = P(A ∩ B) / P(B)

Bayes Theorem

  • Reverse of conditional probability

  • Given that event B occurred, what is the chance it was caused by event A?

P(A|B) = P(B|A) P(A) / P(B)
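A minimal sketch of Bayes' theorem; the rare-condition/imperfect-test rates below are all made up:

```python
# Hypothetical rates for a rare condition A and a positive test result B.
p_A = 0.01             # P(A): prior probability of the condition
p_B_given_A = 0.95     # P(B|A): test positive given the condition
p_B_given_notA = 0.05  # P(B|not A): false-positive rate

# Denominator via the law of total probability: P(B)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes: P(A|B) = P(B|A) P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))  # 0.161: still unlikely despite a positive test
```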

If you see Given/If/Assuming in a question, it is a conditional probability.

Independence

  • When occurrence of one event does not affect probability of other event occurring.

Mutually exclusive is very different from independence.

Independence means cooking a meal does not change what happens when you watch an anime. They don't affect each other at all.

If two events are mutually exclusive, e.g. choosing to eat an apple or a banana, then they cannot both happen at the same time.

If two events are independent:

  • The following statements are equivalent: if one holds, they all hold

P(A ∩ B) = P(A) * P(B)
P(A|B) = P(A)
P(B|A) = P(B)
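Independence can be verified by brute-force enumeration; the two-dice events here are illustrative:

```python
from fractions import Fraction
from itertools import product

sample = list(product(range(1, 7), repeat=2))  # two fair dice: 36 outcomes
A = {o for o in sample if o[0] % 2 == 0}       # first die is even
B = {o for o in sample if o[1] > 4}            # second die is 5 or 6

def p(event):
    return Fraction(len(event), len(sample))

# Independent: P(A ∩ B) = P(A) * P(B)
print(p(A & B) == p(A) * p(B))  # True
print(p(A & B), p(A) * p(B))    # 1/6 1/6
```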

The opposite of independence is dependence, meaning that an event is affected by another event.

Joint Probability

  • The probability of both event A and event B occurring is

P(A ∩ B)

Marginal Probability

  • Sum over joint probabilities

P(A) = P(A ∩ B) + P(A ∩ Bᶜ)

Law of Total Probability

  • The events form a partition of the sample space.

  • "The total probability of A is the sum of the probabilities of A happening in each scenario, weighted by how likely each scenario is."

P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂) + … + P(A|Bₙ)P(Bₙ)

This only holds if the events B₁, …, Bₙ are mutually exclusive and exhaustive.
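A small sketch of the law of total probability; the machine shares and defect rates are hypothetical:

```python
# A part comes from exactly one of three machines (mutually exclusive, exhaustive).
p_B = [0.5, 0.3, 0.2]             # P(B_i): machine shares, sum to 1
p_A_given_B = [0.01, 0.02, 0.05]  # P(A|B_i): defect rate of each machine

# Law of total probability: P(A) = sum over i of P(A|B_i) * P(B_i)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
print(round(p_A, 3))  # 0.021
```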

Combination + Permutation

Keywords:

  • Does order matter?

  • Are they looking for a pair?

  • Anything hinting that it should be distinct

Other Tips:

  • Multiplication ⇒ independent events all happening together (AND)

  • Power ⇒ the same independent probability repeated several times (e.g. not sharing the same birthday / dice is a 6 and a coin is heads)

Combination - Order does not matter

(n choose r) = n! / (r!(n−r)!)

Special Chooses

1) Select all

(n choose n) = 1

2) Select none

(n choose 0) = 1

3) Symmetry

  • E.g. when choosing a team of 3 from 10 people, choosing the 3 people who join and choosing the 7 people who are left out give the same count

(10 choose 3) = (10 choose 7)

  • This symmetry comes from combination's "order doesn't matter": picking who is in also fixes who is out

  • So counting either group gives the same result

Permutation - Order matters

P(n, r) = n! / (n−r)!

where n is total number of items, and r is the number of items to arrange

  • 0! = 1
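Python's standard library computes both directly (`math.comb` and `math.perm`), which makes the special cases above easy to check:

```python
from math import comb, perm, factorial

# Combination: order does not matter
print(comb(10, 3))                 # 120 ways to choose a team of 3 from 10
print(comb(10, 3) == comb(10, 7))  # True: symmetry
print(comb(5, 5), comb(5, 0))      # 1 1: select all / select none

# Permutation: order matters
print(perm(10, 3))                 # 720 = 10 * 9 * 8
print(factorial(0))                # 1, by convention
```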

Discrete and Continuous Random Variables

Random Variables

  • An outcome with a value that is random. (Ie. Number of heads in 5 coin flips = [random])

  • Outcome can be related to being discrete or continuous

  • Can be expressed with a table

Discrete Distribution

  • Probability distribution of a discrete random variable

| Number of Toys (Random Variable/Output) | Number of Kids |
| --- | --- |
| 0 toys | 9 kids |
| 1 toy | 6 kids |
| 2 toys | 15 kids |

  • From the table, we can get the probability that a child has x toys.

    • This creates the pmf, which can be displayed in a table.

| x | P(X = x) |
| --- | --- |
| 0 toys | 9/30 = 0.3 |
| 1 toy | 6/30 = 0.2 |
| 2 toys | 15/30 = 0.5 |

  • Now, from the pmf, we can find the total chance of an outcome, or anything less than it, being achieved.

    • E.g. what's the chance a kid has 1 toy or fewer?

    • I.e. how likely is x or less? (random variable)

    • This is the cdf: summing up the probabilities of the value and everything below it (giving a probability for a range of outcomes)
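The toy-count table above can be turned into a pmf and a cdf with a short sketch:

```python
from fractions import Fraction

counts = {0: 9, 1: 6, 2: 15}   # toys -> number of kids, from the table
total = sum(counts.values())   # 30 kids in total

# pmf: P(X = x), the exact chance of each value
pmf = {x: Fraction(n, total) for x, n in counts.items()}

# cdf: F(x) = P(X <= x), a running sum of the pmf
cdf, running = {}, Fraction(0)
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running

print(pmf[1])  # 1/5: chance a kid has exactly 1 toy
print(cdf[1])  # 1/2: chance a kid has 1 toy or fewer
```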

| Discrete Distribution | Continuous Distribution |
| --- | --- |
| Takes countable values | Takes uncountably infinite values (intervals) |
| No. of children, number on a die | Time taken to finish a task, height, weight |
| E.g. picking a ball, max/min, count of days | E.g. integrals, measuring time/length |
| P(X = x) is positive for specific values | P(X = x) = 0 for exact values |

PMF (probability mass function)

  • Exact chance of a value

Properties

  • 0 ≤ pₓ(x) ≤ 1 for all x

    • The probability of each outcome is between 0 and 1

  • ∑ₓ pₓ(x) = 1

    • The probabilities of all outcomes sum to 1

There is no pmf for continuous variables, since they take values over a range: 5.000 and 5.0001 seconds are different outcomes, and there are infinitely many such tiny values, so we use a pdf instead.

PDF (Probability Density Function)

  • Gives the relative likelihood (density) over a range, not a probability by itself

  • Integrate PDF over a range to find actual probability

  • Area under curve = probability

Properties

  • f(x) ≥ 0 for all x

    • No negative probabilities

  • ∫ from −∞ to ∞ of f(x) dx = 1

    • total area under the curve = 1

    • Because sum of all probabilities should be equal 1

  • P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx

  • P(X = a) = 0

    • Because X is continuous, the probability of landing on any exact value a is zero (a single point has no area)
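The pdf properties can be illustrated numerically; the uniform-on-[0, 10] density below is a made-up example, integrated with a simple midpoint rule:

```python
def f(x):
    # Hypothetical pdf: uniform on [0, 10], density 1/10 there, 0 elsewhere
    return 0.1 if 0 <= x <= 10 else 0.0

def integrate(f, a, b, n=100_000):
    # Midpoint rule: sum f(midpoint) * slice width over n slices
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(round(integrate(f, 0, 10), 6))  # 1.0: total area under the curve
print(round(integrate(f, 2, 5), 6))   # 0.3: P(2 <= X <= 5)
```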

CDF (Cumulative Distribution Function)

  • Chance of the value and less

  • F(x) = P(X ≤ x)

  • For a continuous variable, P(X ≤ x) = P(X < x):

    • P(X ≤ x) = P(X = x) + P(X < x)

    • = 0 + P(X < x) = P(X < x)

Complement Rule: P(X > x) = 1 − F(x) (for continuous variables this also equals P(X ≥ x))

Properties

  • F is non-decreasing (F(a) ≤ F(b) if a ≤ b)

    • The CDF value never goes down, because it is a running sum

  • 0 ≤ F(a) ≤ 1 for any value a

    • The value of the CDF is always between 0 and 1

  • lim as a → −∞ of F(a) = 0

    • Probability gets closer to 0 as you go far left, because you're before any possible outcomes

  • lim as a → ∞ of F(a) = 1

    • Probability gets closer to 1 as you go far right, because you've passed all possible outcomes

  • P(a < X ≤ b) = F(b) − F(a)

    • The probability that X is between a and b (not including a, but including b) is the difference between the CDF at b and the CDF at a.

  • There is no requirement for a cdf to be smooth

  • A discrete cdf has jumps (stepwise)
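A sketch of P(a < X ≤ b) = F(b) − F(a) and the complement rule, using a hypothetical exponential CDF F(x) = 1 − e^(−x):

```python
from math import exp

def F(x):
    # Hypothetical continuous CDF: exponential with rate 1
    return 1 - exp(-x) if x >= 0 else 0.0

# P(a < X <= b) = F(b) - F(a)
a, b = 1.0, 2.0
print(round(F(b) - F(a), 4))  # 0.2325

# Complement rule: P(X > x) = 1 - F(x)
print(round(1 - F(1.0), 4))   # 0.3679
```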

CDF (Cumulative Distribution Function)

Properties

  • Same as discrete cdf

Given PDF f(x), the CDF is F(x) = ∫ from −∞ to x of f(t) dt

  • The integral of a function over an interval gives the area under its curve between the two endpoints

  • So F(x) is the area under the pdf from −∞ up to x, and the total area (x → ∞) is 1


Given CDF F(x), the PDF is f(x) = dF(x)/dx, the derivative of the CDF

  • The pdf is the slope of the CDF curve

  • A flat CDF ⇒ pdf is 0; a steep CDF ⇒ pdf is high

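The pdf-is-the-slope-of-the-CDF relationship can be checked with a numerical derivative, using a hypothetical exponential pair as the example:

```python
from math import exp

F = lambda x: 1 - exp(-x)  # hypothetical CDF: exponential with rate 1
f = lambda x: exp(-x)      # its pdf, the derivative dF/dx

# Central difference approximates the slope of F at a point
x, h = 1.5, 1e-6
numeric = (F(x + h) - F(x - h)) / (2 * h)
print(abs(numeric - f(x)) < 1e-8)  # True: the pdf matches the CDF's slope
```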

Expected Value & Variance

Expected Value

  • Long run average outcome

    • What number do we get on average if we run the same test over and over again.

  • Can also be considered the mean, probability weighted average, or central tendency.

Discrete Variables

  • Calculation of it:

E[X] = ∑ₓ x · pₓ(x) = μ

  • Each x value multiplied by its probability, then all summed up

  • For the expected value of a function g(X):

    • the result of g(x) times the probability of x

E[g(X)] = ∑ₓ g(x) · pₓ(x)

Continuous Variable

  • Calculation:

E[X] = ∫ from −∞ to ∞ of x · fₓ(x) dx

  • For the expected value of a function:

E[g(X)] = ∫ from −∞ to ∞ of g(x) · fₓ(x) dx
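Using the toy pmf from earlier (0.3, 0.2, 0.5), the discrete E[X] and E[g(X)] reduce to short sums; a minimal sketch:

```python
pmf = {0: 0.3, 1: 0.2, 2: 0.5}  # toy pmf from the table above

# E[X]: each value times its probability, summed
mu = sum(x * p for x, p in pmf.items())
print(mu)  # 1.2

# E[g(X)] for g(x) = x^2
e_x2 = sum(x**2 * p for x, p in pmf.items())
print(e_x2)  # 2.2
```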

Properties of E[X]

  • These properties apply to all random variables, including discrete and continuous.

E[X + Y] = E[X] + E[Y]

E[X − Y] = E[X] − E[Y]

E[aX + b] = aE[X] + b

Variance and Standard Deviation

  • Variance - how spread out the data is, in terms of square distances from the mean

  • Standard deviation - square root of variance (original scale)

⇒ So, both tell the same story, in different units.

Calculation

Var(X) = E[(X − μ)^2] = E[X^2] − μ^2
Standard Deviation = √Var(X)
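Continuing the toy pmf example, variance via E[X^2] − μ^2 and the standard deviation as its square root:

```python
from math import sqrt

pmf = {0: 0.3, 1: 0.2, 2: 0.5}                # toy pmf from above
mu = sum(x * p for x, p in pmf.items())       # E[X] = 1.2
e_x2 = sum(x**2 * p for x, p in pmf.items())  # E[X^2] = 2.2

var = e_x2 - mu**2  # Var(X) = E[X^2] - mu^2
sd = sqrt(var)      # standard deviation, back on the original scale
print(round(var, 4), round(sd, 4))  # 0.76 0.8718
```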

Properties of variance

Var(aX + b) = a^2 Var(X)

If X, Y are independent, then Var(X + Y) = Var(X − Y) = Var(X) + Var(Y)

μ is a constant, so E[μ^2] = μ^2
