A median is a 'half-way' point in our data. Quartiles and percentiles extend this idea: they let us find the value any fraction (or percentage) of the way through our ordered data.
Quartiles
Quartiles divide a data set into four equal parts. The three quartiles are:
- Lower Quartile (Q1): The value below which 25% of the data falls. This is also known as the 25th percentile.
- Median (Q2): The middle value of the data set, also known as the 50th percentile.
- Upper Quartile (Q3): The value below which 75% of the data falls, also known as the 75th percentile.
To find the quartiles for a small discrete data set, first put the data in ascending order, then use these formulas for the positions of the quartiles:
\[Q_1 = \frac{n}{4}\]
\[Q_2 = \frac{n+1}{2}\]
\[Q_3 = \frac{3n}{4}\]
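Where \(n\) is the total number of data points. For \(Q_1\) and \(Q_3\), round the position up to the next whole number if it is not an integer; if it is an integer, add 0.5, which means taking the midpoint of the values at that position and the next one. For \(Q_2\), a position ending in .5 likewise means taking the midpoint of the two middle values.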
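As a sketch, the discrete rule above can be written as a small Python function. The name `quartiles` is hypothetical, and the function follows only the convention described here; statistics software often uses other conventions (for example, interpolating between neighbouring values), so it may give different answers.

```python
def quartiles(data):
    """Return (Q1, Q2, Q3) for raw data using the position rules above.

    Q1 and Q3: position q*n/4; round up if not an integer, otherwise take
    the midpoint of that value and the next. Q2: position (n+1)/2.
    """
    values = sorted(data)  # quartile positions refer to ordered data
    n = len(values)
    if n == 0:
        raise ValueError("data must not be empty")

    def outer_quartile(q):
        # Integer arithmetic avoids floating-point error in q*n/4.
        whole, remainder = divmod(q * n, 4)
        if remainder:
            # Not an integer: round up to position whole+1 (index whole).
            return values[whole]
        # Integer position: midpoint of positions whole and whole+1.
        return (values[whole - 1] + values[whole]) / 2

    mid, half = divmod(n + 1, 2)
    if half:
        # (n+1)/2 ends in .5 (n even): midpoint of the two middle values.
        q2 = (values[mid - 1] + values[mid]) / 2
    else:
        q2 = values[mid - 1]

    return outer_quartile(1), q2, outer_quartile(3)


print(quartiles([18, 3, 12, 7, 21, 5, 14, 8, 13]))  # (7, 12, 14)
print(quartiles([2, 4, 4, 5, 7, 9, 10, 12]))        # (4.0, 6.0, 9.5)
```

In the first call \(n = 9\), so the \(Q_1\) position 2.25 rounds up to 3; in the second \(n = 8\), so the \(Q_1\) position 2 is an integer and the 2nd and 3rd values are averaged.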
To find the quartiles for a large grouped data set, use these formulas for the positions of the quartiles:
\[Q_1 = \frac{n}{4}\]
\[Q_2 = \frac{n}{2}\]
\[Q_3 = \frac{3n}{4}\]
We do not round these positions; we use the exact values in the interpolation calculation.
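For comparison with the discrete rule, a minimal Python sketch (the function name is hypothetical) simply returns the grouped-data positions, with \(n\) taken as the total frequency and no rounding applied:

```python
def grouped_quartile_positions(n):
    """Positions of Q1, Q2 and Q3 for grouped data with total frequency n.

    The positions are n/4, n/2 and 3n/4, returned unrounded.
    """
    return n / 4, n / 2, 3 * n / 4


print(grouped_quartile_positions(50))  # (12.5, 25.0, 37.5)
```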
Percentiles
Percentiles divide a data set into 100 equal parts. The k-th percentile is the value below which \(k\%\) of the data falls. To find a percentile for a large grouped data set, use this formula for the position of the k-th percentile:
\[P_k = \frac{kn}{100}\]
Where:
- \(P_k\) is the position of the k-th percentile.
- \(k\) is the desired percentile (e.g., 20 for the 20th percentile).
- \(n\) is the number of data points.
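The percentile formula is a one-line calculation; the quartile positions for grouped data are the special cases \(k = 25, 50, 75\), since \(\frac{25n}{100} = \frac{n}{4}\). A minimal Python sketch, with a hypothetical function name:

```python
def percentile_position(k, n):
    """Position of the k-th percentile for grouped data: k*n/100, unrounded."""
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")
    return k * n / 100


print(percentile_position(20, 80))  # 16.0
print(percentile_position(25, 50))  # 12.5, the same as the Q1 position n/4
```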
Interpolation
When the data is grouped and we have the position of the quartile or percentile, we use linear interpolation to estimate its value, since the individual data values are not known. This method assumes the data values are evenly distributed within the class interval.
For interpolation, the formula is:
\[\text{Estimated Value} = L + \left( \frac{P - F}{f} \right) \times h\]
Where:
- \(L\) is the lower boundary of the class interval containing the quartile or percentile.
- \(P\) is the position of the quartile or percentile you are calculating.
- \(F\) is the cumulative frequency before the class.
- \(f\) is the frequency of the class interval containing the quartile or percentile.
- \(h\) is the width of the class interval.
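The interpolation formula translates directly into code. In this minimal Python sketch the function name is hypothetical and the parameter names mirror the symbols above; the range check on \(P\) is an extra safeguard not stated in the formula itself, rejecting a position that does not lie inside the chosen class (where the estimate would fall outside the class boundaries).

```python
def interpolate(L, P, F, f, h):
    """Estimate a quartile or percentile within its class by linear interpolation.

    L: lower class boundary, P: position, F: cumulative frequency before the
    class, f: class frequency, h: class width.
    """
    if f <= 0:
        raise ValueError("the class frequency f must be positive")
    if not F <= P <= F + f:
        # The position must lie inside this class, otherwise the estimate
        # falls outside the class boundaries.
        raise ValueError("position P is not inside this class")
    return L + (P - F) / f * h


print(interpolate(L=20, P=12.5, F=10, f=5, h=10))  # 25.0
```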
Example of Interpolation
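Suppose you want to calculate the 30th percentile of a grouped data set with \(n = 100\), so its position is \(P_{30} = \frac{30 \times 100}{100} = 30\). This position falls within a class interval with:
- Lower boundary \(L = 10\)
- Class width \(h = 5\)
- Cumulative frequency before the class \(F = 20\)
- Frequency of the class \(f = 20\)
- The position \(P = 30\)
Since \(F = 20 < 30 \le F + f = 40\), the 30th percentile does lie in this class. The estimated value would be:
\[
\text{Estimated Value} = 10 + \left( \frac{30 - 20}{20} \right) \times 5 = 10 + \left( \frac{10}{20} \right) \times 5 = 10 + 2.5 = 12.5
\]
As expected, this lies between the class boundaries 10 and 15.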
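Putting the steps together, a hedged Python sketch can find the position, locate the class that contains it, and then interpolate. The function name, the input layout (a list of class boundaries plus a list of frequencies) and the frequency table in the usage lines are all hypothetical, chosen only for illustration; the sketch also assumes contiguous classes with no gaps between them.

```python
def grouped_percentile(boundaries, freqs, k):
    """Estimate the k-th percentile of grouped data by linear interpolation.

    boundaries: class boundaries [b0, b1, ..., bm]; class i runs from
                boundaries[i] to boundaries[i + 1].
    freqs:      class frequencies [f0, ..., f(m-1)].
    """
    if len(boundaries) != len(freqs) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")

    n = sum(freqs)
    position = k * n / 100  # P_k = kn/100, not rounded
    cumulative_before = 0   # F for the class being examined

    for i, f in enumerate(freqs):
        # The first class whose cumulative frequency reaches the position
        # contains the percentile.
        if f > 0 and cumulative_before + f >= position:
            L = boundaries[i]
            h = boundaries[i + 1] - boundaries[i]
            return L + (position - cumulative_before) / f * h
        cumulative_before += f

    raise ValueError("total frequency must be positive")


boundaries = [0, 5, 10, 15, 20, 25]
freqs = [5, 15, 20, 40, 20]  # hypothetical frequencies, n = 100
print(grouped_percentile(boundaries, freqs, 30))  # 12.5
print(grouped_percentile(boundaries, freqs, 50))  # 16.25 (the median)
```

Scanning for the first class whose running total reaches the position is what makes \(F\) the cumulative frequency before that class.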