Day22 Statistics Review (1)
Properties of Random Variable
(Ace the Data Science Interview: 201 Real Interview Questions Asked By FAANG, Tech Startups, & Wall Street)
Statistics is a fundamental aspect of any data scientist’s toolbox. Many components of a data science pipeline rely on statistical principles, making a strong understanding of foundational statistics essential.
We start with topics like the Central Limit Theorem and the Law of Large Numbers and then progress to the concepts underlying hypothesis testing, particularly p-values and confidence intervals, as well as type I and type II errors and their interpretations.
The following properties hold for any given random variable $X$. We assume $X$ is continuous, but these properties also hold for discrete random variables.
The expectation, also known as the average value or mean, of a random variable is the integral of each value $x$ weighted by the probability density function (PDF) $f_X(x)$:

$$E[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx$$
The variance is given by:

$$\mathrm{Var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2$$
The variance is always non-negative, and its square root is known as the standard deviation, which is widely used in statistics.
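As a rough numerical check of these definitions, the sketch below approximates the expectation and variance integrals with a midpoint rule, using the identity $\mathrm{Var}(X) = E[X^2] - (E[X])^2$. The exponential density with rate `lam = 2`, the helper `integrate`, and the truncation point `UPPER` are assumptions chosen for illustration only.

```python
import math

def integrate(g, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def exp_pdf(x, lam=2.0):
    """PDF of an exponential distribution with rate lam (illustrative choice)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Truncate the infinite upper limit; the tail beyond 20 is negligible for lam = 2.
UPPER = 20.0

mean = integrate(lambda x: x * exp_pdf(x), 0.0, UPPER)               # E[X]
second_moment = integrate(lambda x: x * x * exp_pdf(x), 0.0, UPPER)  # E[X^2]
variance = second_moment - mean ** 2                                 # Var(X)
std_dev = math.sqrt(variance)                                        # standard deviation

print(mean, variance, std_dev)
```

For an exponential distribution with rate $\lambda$, the exact values are $E[X] = 1/\lambda$ and $\mathrm{Var}(X) = 1/\lambda^2$, so the printed numbers should be close to 0.5, 0.25, and 0.5.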
Both the expectation and the variance have conditional versions. For instance, the conditional expectation of $X$, given that $Y=y$, is:

$$E[X \mid Y=y] = \int_{-\infty}^{\infty} x \, f_{X \mid Y}(x \mid y) \, dx$$
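To make the idea of conditioning concrete, here is a minimal sketch that uses the discrete analogue, where the integral becomes a sum: $E[X \mid Y=y] = \sum_x x \, p(x, y) / p_Y(y)$. The joint probability table `joint_pmf` and the function name `conditional_expectation` are hypothetical and exist only for this example.

```python
# Hypothetical joint PMF p(x, y) for two discrete random variables X and Y.
joint_pmf = {
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.4, (1, 1): 0.2,
}

def conditional_expectation(pmf, y):
    """Return E[X | Y = y] = sum_x x * p(x, y) / p_Y(y)."""
    # Marginal probability P(Y = y).
    p_y = sum(p for (_, yv), p in pmf.items() if yv == y)
    if p_y == 0:
        raise ValueError("P(Y = y) is zero, so the conditional is undefined")
    # Weighted sum of x over the slice of the table where Y = y.
    return sum(x * p for (x, yv), p in pmf.items() if yv == y) / p_y

e_given_0 = conditional_expectation(joint_pmf, 0)  # 0.3 / 0.4
e_given_1 = conditional_expectation(joint_pmf, 1)  # 0.2 / 0.6
print(e_given_0, e_given_1)
```

The two results differ, which shows that knowing the value of $Y$ changes the expected value of $X$ in this table.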
For any given pair of random variables $X$ and $Y$, the covariance measures the linear relationship between the two variables:

$$\mathrm{Cov}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right] = E[XY] - E[X]\,E[Y]$$
The normalized covariance, denoted by the Greek letter $\rho$, is the correlation between $X$ and $Y$:

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
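The following sketch computes covariance and correlation from paired observations, treating the data as the whole population (dividing by $n$ rather than $n-1$). The data values and the function names `covariance` and `correlation` are made up for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: the average of (x - mean_x) * (y - mean_y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Illustrative data, not taken from any real dataset.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_xy = covariance(xs, ys)
rho_xy = correlation(xs, ys)

# A perfectly linear relationship should give a correlation of 1.
rho_linear = correlation(xs, [3 * x + 1 for x in xs])
print(cov_xy, rho_xy, rho_linear)
```

Note that `covariance(xs, xs)` is just the variance of `xs`, which is why the same helper can be reused in the denominator.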
All of these properties are basic concepts in statistics for data science, so it helps to understand the mathematical details behind each one and to be able to walk through an example of each.
For example, if we assume $X$ follows a Uniform distribution on the interval $[a, b]$, then its PDF is:

$$f_X(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
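Therefore, the expectation of $X$ is:

$$E[X] = \int_a^b \frac{x}{b-a} \, dx = \left. \frac{x^2}{2(b-a)} \right|_a^b = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}$$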
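One way to sanity-check the Uniform example is by simulation: draw many samples and compare the sample mean with $(a+b)/2$. The sketch below also compares the sample variance with $(b-a)^2/12$, the standard variance of a Uniform distribution. The endpoints, seed, and sample size are arbitrary assumptions.

```python
import random

a, b = 2.0, 10.0        # illustrative endpoints of the interval [a, b]
rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw samples from Uniform(a, b).
samples = [rng.uniform(a, b) for _ in range(200_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)

print("sample mean:", sample_mean, "theory:", (a + b) / 2)
print("sample variance:", sample_var, "theory:", (b - a) ** 2 / 12)
```

The sample statistics will not match the theoretical values exactly, but by the Law of Large Numbers they get closer as the number of samples grows.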
We don’t have to memorize the expectation and variance of every probability distribution, but we should be able to derive them when necessary. That means understanding the formulas above and being able to apply them to common probability distributions such as the exponential or uniform distribution.