
Law of Large Numbers, Central Limit Theorem, and Hypothesis Testing (1) - General Setup



(Ace the Data Science Interview: 201 Real Interview Questions Asked By FAANG, Tech Startups, & Wall Street)

Law of Large Numbers

The Law of Large Numbers (LLN) states that if we sample a random variable independently a large number of times, the measured average value should converge to the random variable’s true expectation.

$\bar{X}_n = \frac{X_1+\dots+X_n}{n} \rightarrow \mu,$ as $n \rightarrow \infty $


It is important to look at the long-run behavior of a random variable rather than at short runs. For instance, a fair coin may land on heads five times in a row, but over a larger sample size $n$ we expect the proportion of heads to be approximately half of the total flips.
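The coin example can be checked with a small simulation. The sketch below is only an illustration: the function name `running_proportion_of_heads`, the fixed seed, and the number of flips are assumptions made for this example, and the coin is assumed to be fair.

```python
import random

def running_proportion_of_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times and return the
    proportion of heads observed after each flip."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    proportions = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1 (heads)
        proportions.append(heads / i)
    return proportions

props = running_proportion_of_heads(100_000)

# The proportion can be far from 0.5 early on, but settles as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: proportion of heads = {props[n - 1]:.4f}")
```

Early values can sit well away from 0.5, while the later ones should stay close to it, which is the behavior the LLN describes.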


Central Limit Theorem

The Central Limit Theorem (CLT) states that when we take repeated random samples of a large size from a given population, the distribution of the sample means will tend to be normally distributed, regardless of the original distribution of the population.

Keep in mind that the following form represents the normal distribution:

$f_X(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2 \sigma^2}\right)$

with mean $\mu$ and standard deviation $\sigma$.


The CLT states that, for large $n$, the sample mean is approximately distributed as:

$$ \bar{X}_n = \frac{X_1+ \dots+ X_n}{n} \sim N\left(\mu, \frac{\sigma^2}{n}\right) $$

Hence,

$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \sim N(0,1)$


The Central Limit Theorem (CLT) is fundamental for hypothesis testing. To illustrate the concept simply, let’s use the example of coin flipping: when you flip a coin many times, the distribution of the number of heads should resemble a normal distribution.

Whenever we need to analyze a distribution with a large sample size, we should always consider the CLT, whether it’s a Binomial, Poisson, or any other distribution.
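To see the CLT at work on a clearly non-normal population, the following sketch draws many samples from an exponential distribution, standardizes each sample mean as $(\bar{X}_n - \mu) / (\sigma / \sqrt{n})$, and checks that the results behave roughly like $N(0,1)$. The choice of an exponential distribution with rate 1 (so $\mu = 1$ and $\sigma = 1$), the sample sizes, and the function name are all assumptions made for illustration.

```python
import random
import statistics

def standardized_sample_means(n, n_samples, seed=0):
    """Draw n_samples samples of size n from an Exponential(1) population
    and return the standardized sample means."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # Exponential(rate=1) has mean 1 and sd 1
    z_values = []
    for _ in range(n_samples):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z_values.append((x_bar - mu) / (sigma / n ** 0.5))
    return z_values

z_values = standardized_sample_means(n=200, n_samples=5_000)

z_mean = statistics.fmean(z_values)
z_sd = statistics.stdev(z_values)
# For a standard normal, about 95% of values fall within +/- 1.96.
within_196 = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)

print(f"mean of z = {z_mean:.3f}, sd of z = {z_sd:.3f}")
print(f"share within +/- 1.96 = {within_196:.3f}")
```

If the approximation holds, the mean should be near 0, the standard deviation near 1, and the share within $\pm 1.96$ near 0.95, even though the underlying population is heavily skewed.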


Hypothesis Testing

General Setup

Hypothesis testing is the process of determining whether a sample of data supports a specific hypothesis. Typically, hypotheses concern a characteristic of interest in a given population, such as one of its parameters, for example the mean conversion rate among a group of users ($\mu$).

The steps involved in hypothesis testing are as follows:

  1. State a null hypothesis and an alternative hypothesis. The null hypothesis is tested and will either be rejected in favor of the alternative hypothesis or not be rejected. It’s important to note that not rejecting the null hypothesis does not necessarily mean it is true; instead, there is insufficient evidence to reject it.

  2. Choose a test statistic, compute its value from the sample, and calculate the corresponding p-value under the null hypothesis.

  3. Compare the p-value to a predetermined significance level, denoted as $\alpha$. If the p-value is at most $\alpha$, reject the null hypothesis; otherwise, do not reject it.

The null hypothesis typically serves as a baseline (e.g., the marketing campaign did not increase conversion rates), and the aim is usually to reject it with statistical significance.
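The three steps can be sketched for the marketing-campaign example with a one-sample z-test, which relies on the CLT approximation. All numbers below (baseline rate, number of users, observed conversions, and $\alpha$) are hypothetical values chosen only for illustration, and the z-test is one reasonable choice of test rather than the only one.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs, chosen only for illustration.
mu_0 = 0.10        # baseline conversion rate under the null hypothesis
n = 1_000          # number of users who saw the campaign
conversions = 130  # observed conversions among those users
alpha = 0.05       # predetermined significance level

# Step 1: H0: mu = mu_0 versus H1: mu > mu_0 (the campaign helped).

# Step 2: test statistic and p-value under H0.
# Each user converts or not, so under H0 the per-user variance is
# mu_0 * (1 - mu_0), and the sample mean has standard error
# sqrt(mu_0 * (1 - mu_0) / n).
sample_mean = conversions / n
standard_error = sqrt(mu_0 * (1 - mu_0) / n)
z = (sample_mean - mu_0) / standard_error
p_value = 1 - NormalDist().cdf(z)  # upper-tail probability

# Step 3: compare the p-value to alpha.
reject_null = p_value <= alpha

print(f"z = {z:.3f}, p-value = {p_value:.5f}, reject H0: {reject_null}")
```

With these made-up numbers the p-value falls well below $\alpha$, so the null hypothesis would be rejected; with a smaller observed lift the same code would report that there is insufficient evidence to reject it.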


Hypothesis tests can be classified as either one- or two-tailed tests. A one-tailed test has null and alternative hypotheses of the following form:

$H_0: \mu = \mu_0$
$H_1: \mu < \mu_0$ or $H_1: \mu > \mu_0$

A two-tailed test has the following form:

$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$

where $H_0$ is the null hypothesis, $H_1$ is the alternative hypothesis, and $\mu$ is the parameter of interest.
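The choice between a one- and a two-tailed test changes how the p-value is computed from the same test statistic. The sketch below assumes a z statistic that is approximately standard normal under $H_0$; the function name `z_test_p_value` and the labels `"less"`, `"greater"`, and `"two-sided"` are naming choices made for this example.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative):
    """Return the p-value for a z statistic under the given alternative.

    alternative: "less"      for H1: mu < mu_0
                 "greater"   for H1: mu > mu_0
                 "two-sided" for H1: mu != mu_0
    """
    std_normal = NormalDist()  # N(0, 1)
    if alternative == "less":
        return std_normal.cdf(z)              # lower tail
    if alternative == "greater":
        return 1 - std_normal.cdf(z)          # upper tail
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    raise ValueError(f"unknown alternative: {alternative!r}")

z = 1.96
for alternative in ("less", "greater", "two-sided"):
    print(f"{alternative:>9}: p-value = {z_test_p_value(z, alternative):.4f}")
```

For a positive z, the two-sided p-value is twice the upper-tail one, which is why a result can be significant in a one-tailed test at a given $\alpha$ but not in the corresponding two-tailed test.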


