## Preliminaries

**Definition:** Quantiles are cut-off points that partition the domain of a probability distribution, i.e. the support (range) of the corresponding random variable, into intervals with equal probabilities. The probability distribution can be given as a *pmf* or a *pdf*. For a sample (observed data), which is discrete, quantiles are defined similarly, except that the probability is defined by the relative frequency of occurrence.

**Definition:** *q*-quantiles partition the domain of a distribution, or the sorted data, into $q$ parts with equal probabilities. For *q*-quantiles, there are $q-1$ quantiles, or partitioning points. The probability of each part is therefore $1/q$, because the range of the probability measure (a set function) is $[0, 1]$.

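**Example:** The 4-quantiles of a normal distribution use 3 points to partition the domain of the distribution as $(-\infty, x_1]$, $(x_1, x_2]$, $(x_2, x_3]$, $(x_3, \infty)$, where $P(X \le x_1) = P(x_1 < X \le x_2) = P(x_2 < X \le x_3) = P(X > x_3) = 1/4$.

In the above example, $x_1$, $x_2$, and $x_3$ are the first, second, and third 4-quantiles.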
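As a small illustration of the example above, the three cut-off points of a standard normal distribution can be computed with Python's standard library. The helper name `q_quantiles` is hypothetical, and the choice of a standard normal (mean 0, standard deviation 1) is an assumption made only for this sketch.

```python
from statistics import NormalDist

def q_quantiles(dist, q):
    """Return the q - 1 cut-off points splitting `dist` into q equal-probability parts."""
    return [dist.inv_cdf(i / q) for i in range(1, q)]

standard_normal = NormalDist(mu=0.0, sigma=1.0)  # assumed distribution for the sketch
cuts = q_quantiles(standard_normal, 4)

# Each of the four intervals should carry probability 1/4.
edges = [0.0] + [standard_normal.cdf(c) for c in cuts] + [1.0]
interval_probs = [hi - lo for lo, hi in zip(edges, edges[1:])]

print([round(c, 4) for c in cuts])
print([round(p, 4) for p in interval_probs])
```

The cut-off points come out at roughly -0.6745, 0, and 0.6745. The standard library also offers `NormalDist.quantiles(n=4)`, which should return the same three points.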

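**Definition [quantiles of a population]:** Let $q \ge 2$ be an integer and $i \in \{1, \dots, q-1\}$. The *i*-th *q*-quantile of the distribution of a random variable $X$ is defined as $x_i$ such that

$$P(X < x_i) \le \frac{i}{q} \quad \text{and} \quad P(X \le x_i) \ge \frac{i}{q},$$

or, in terms of the infimum,

$$x_i = \inf\left\{x \in \mathbb{R} : P(X \le x) \ge \frac{i}{q}\right\}.$$

The above definitions use two conditions, or the infimum, to make sure the quantity is well defined for non-continuous distributions. When more than one value satisfies the two conditions, the infimum selects the smallest of them.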
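The infimum form of the definition can be sketched for a discrete distribution as follows. This is a minimal illustration, assuming the *pmf* is given as a dictionary mapping values to probabilities; the function name `quantile_from_pmf` and the fair-die example are hypothetical.

```python
from fractions import Fraction

def quantile_from_pmf(pmf, i, q):
    """Smallest x with P(X <= x) >= i/q, for a pmf given as {value: probability}."""
    target = Fraction(i, q)
    cumulative = Fraction(0)
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= target:
            return x
    raise ValueError("probabilities do not reach i/q")

# Fair six-sided die: exact arithmetic avoids floating-point ties at 1/2.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

median = quantile_from_pmf(fair_die, 1, 2)          # P(X <= 3) = 1/2
third_quartile = quantile_from_pmf(fair_die, 3, 4)  # P(X <= 4) = 2/3 < 3/4 <= P(X <= 5)
print(median, third_quartile)
```

For the die, both 3 and 4 satisfy the two-condition form for the median; the infimum picks 3.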

**Remark:** The sequence of quantile values is non-decreasing, because the intervals that partition the domain of the distribution are ordered from left to right.

**Definition:** *p*-quantiles are the same as *q*-quantiles, but are indexed by a probability $p \in (0, 1)$. The mathematical definition is the same as for *q*-quantiles, with $i/q$ replaced by the real number $p$.

**Example:** Suppose the 0.95-quantile of the length of some object is 4 cm. Then the probability that the length, which is the random variable here, is less than or equal to 4 cm is 0.95. In other words, in the long run (the frequency interpretation of probability), 95% of the measurements of the length are at most 4 cm.
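The frequency interpretation in this example can be checked with a short simulation. The normal model and its parameters (mean 3.5 cm, standard deviation 0.3 cm) are assumptions chosen only so that the 0.95-quantile lands near 4 cm; they are not taken from the example itself.

```python
from statistics import NormalDist

length = NormalDist(mu=3.5, sigma=0.3)  # hypothetical length distribution, in cm
x_95 = length.inv_cdf(0.95)             # the 0.95-quantile, close to 4 cm

# Simulate many measurements and count how many fall at or below the quantile.
measurements = length.samples(20_000, seed=1)
share_below = sum(m <= x_95 for m in measurements) / len(measurements)

print(round(x_95, 2), round(share_below, 3))
```

The observed share should be close to 0.95, with small deviations due to sampling.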

**Example:** The only 2-quantile is called the median. The median partitions the support of a random variable (i.e. the domain of its distribution) into two subsets, each with probability 0.5.

**Remark:** Quantiles partition the area under the *pdf* graph of a distribution into regions with equal area.

**Quantiles of a sample/data:** Let $x_1, x_2, \dots, x_N$ be a finite sequence of observed values of a random variable (realizations of the iid random variables constituting the sample). If the values are sorted in ascending order and have equal probability, i.e. $P(x_k) = 1/N$ for each $k$, then the index of the *i*-th *q*-quantile value is calculated as

$$I_i = N \cdot \frac{i}{q}.$$

If $I_i$ is not an integer, round it up to the next integer to get the appropriate index. Therefore, the *i*-th *q*-quantile value is $x_{\lceil I_i \rceil}$.
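The index rule above can be sketched in a few lines. This assumes the index is $N \cdot i/q$ rounded up, with 1-based indexing into the sorted sample; the function name `sample_quantile` and the data are hypothetical.

```python
def sample_quantile(values, i, q):
    """i-th q-quantile of a sample: sort, take index ceil(N * i / q) (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    index = -(-(n * i) // q)   # integer ceiling of n * i / q, avoids float rounding
    return ordered[index - 1]  # convert the 1-based index to Python's 0-based one

data = [7, 1, 5, 3, 9, 2, 8, 4, 6, 10]  # hypothetical sample, N = 10
quartiles = [sample_quantile(data, i, 4) for i in (1, 2, 3)]
print(quartiles)  # indices ceil(2.5) = 3, 5, ceil(7.5) = 8 -> [3, 5, 8]
```

Other conventions interpolate between neighbouring values instead of rounding up, so library functions may return slightly different numbers unless a matching method is selected.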

# Q-Q Plot

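**Lemma:** Let $X$ and $Y$ be two random variables on the same probability space. If $Y = aX + b$ with $a > 0$, i.e. a linear transformation, then the *i*-th *q*-quantiles $x_i$ and $y_i$ in the supports (ranges) of $X$ and $Y$ are related as

$$y_i = a x_i + b.$$

Proof: Since $a > 0$, for every real $x$,

$$P(Y \le ax + b) = P(aX + b \le ax + b) = P(X \le x).$$

Hence $P(Y \le ax + b) \ge i/q$ holds exactly when $P(X \le x) \ge i/q$, and the smallest value satisfying the condition for $Y$ is $a x_i + b$.

The converse of this lemma also holds and can be readily proved. The random variables $X$ and $Y$ are related as $Y = aX + b$ (in distribution) if, for any *i*-th *q*-quantiles $x_i$ and $y_i$ of their distributions, $y_i = a x_i + b$.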
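The lemma can be checked numerically for a normal random variable. This is only a sketch under assumptions: the values of `a`, `b`, and the parameters of `X` are hypothetical, `a` is taken to be positive, and the check relies on the fact that `statistics.NormalDist` supports scaling and shifting by constants.

```python
from statistics import NormalDist

a, b = 1.5, -4.0                   # hypothetical slope (a > 0) and shift
X = NormalDist(mu=2.0, sigma=3.0)  # hypothetical distribution of X
Y = X * a + b                      # distribution of Y = aX + b

q = 10
x_quantiles = [X.inv_cdf(i / q) for i in range(1, q)]
y_quantiles = [Y.inv_cdf(i / q) for i in range(1, q)]

# Largest deviation from the relation y_i = a * x_i + b over all nine deciles.
max_gap = max(abs(y - (a * x + b)) for x, y in zip(x_quantiles, y_quantiles))
print(max_gap)
```

The gap should be at the level of floating-point rounding, which is what makes the quantile pairs fall on a straight line with slope `a` and intercept `b`.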

**Q-Q plot:** If the values of the *q*-quantiles of the distributions of two random variables are plotted against each other in a Cartesian coordinate system and the graph shows a linear relation, then the distributions are the same up to a linear transformation of the random variables.

Checking whether a sample of a random variable, i.e. observed data, comes from a particular theoretical distribution proceeds as follows.

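1- The discrete values $x_1, \dots, x_N$ are sorted in ascending order and each value is regarded as a quantile cut-off point. This means the (*N*+1)-quantiles are produced as $x_1 \le x_2 \le \dots \le x_N$ (since $N$ cut-off points give $q = N+1$ parts). Note that as the values get larger the quantiles do as well, since the sequence of quantiles is increasing and the sample values are sorted in ascending order.

2- The (*N*+1)-quantiles of the theoretical distribution are found as $t_i = F^{-1}\left(\frac{i}{N+1}\right)$ for $i = 1, \dots, N$, where $F$ is the CDF of the theoretical distribution.

3- The scatter plot of the pairs $(t_i, x_i)$ is plotted.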

4- Using a regression line, check whether the graph follows a straight line. If it does, the sample is from the same distribution as the theoretical one up to a linear transformation.
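The four steps can be sketched without any plotting library. This is a minimal illustration under assumptions: the theoretical distribution is a standard normal, the theoretical quantiles are taken at probabilities $i/(N+1)$, the sample is synthetic, and the helper names `qq_pairs` and `fit_line` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def qq_pairs(sample, dist):
    """Steps 1-3: pair theoretical (N+1)-quantiles with the sorted sample values."""
    ordered = sorted(sample)
    n = len(ordered)
    theoretical = [dist.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def fit_line(pairs):
    """Step 4: least-squares slope, intercept, and correlation of the pairs."""
    xs = [t for t, _ in pairs]
    ys = [v for _, v in pairs]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(500)]  # synthetic data: mean 10, sd 2

pairs = qq_pairs(sample, NormalDist())
slope, intercept, r = fit_line(pairs)
print(round(slope, 2), round(intercept, 2), round(r, 4))
```

For this synthetic sample the slope and intercept should land near the standard deviation (2) and the mean (10), in line with the lemma. A correlation close to 1 is only an informal indication of linearity; looking at the plotted points is still worthwhile.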

Sometimes, as with a normal distribution, the domain of the distribution is unbounded. The quantile corresponding to the 100th percentile ($p = 1$) is then infinite. In this case, the quantiles of the theoretical distribution are found at adjusted probabilities that stay strictly inside $(0, 1)$; one commonly used choice is $p_i = (i - 0.5)/N$. There are also other formulas in this regard.
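One common way to keep the probabilities away from $p = 1$ is a shifted plotting position such as $(i - 0.5)/N$. That specific formula is an assumption in this sketch, not necessarily the one intended above, and the helper name `plotting_positions` is hypothetical.

```python
from statistics import NormalDist

def plotting_positions(n, shift=0.5):
    """Probabilities (i - shift) / n for i = 1..n; inside (0, 1) when 0 < shift < 1."""
    return [(i - shift) / n for i in range(1, n + 1)]

n = 5
naive = [i / n for i in range(1, n + 1)]  # ends at p = 1, where the normal quantile is infinite
adjusted = plotting_positions(n)          # approximately 0.1, 0.3, 0.5, 0.7, 0.9

# All adjusted probabilities are strictly below 1, so every quantile is finite.
theoretical = [NormalDist().inv_cdf(p) for p in adjusted]
print([round(t, 3) for t in theoretical])
```

Note that `NormalDist.inv_cdf` rejects `p = 1` outright, which is why the naive positions cannot be used directly here.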

A normality check can be done with a Q-Q plot by taking a normal distribution as the theoretical distribution. The following shows how to do it in Python.

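```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

# data: 1D np.array of observed values
# fit=True estimates the distribution parameters from the data;
# line='r' draws a regression line through the points.
fig = sm.qqplot(data, fit=True, line='r')
plt.show()
```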