Box plots

  • EDEXCEL A Level

Topic summary

Boxplots are graphical representations of a dataset's distribution. They display key statistics, including the minimum, lower quartile (Q1), median, upper quartile (Q3), and maximum, along with any outliers. Boxplots are useful for visualising and comparing distributions.

1. Drawing a Boxplot:

To construct a boxplot, follow these steps:

  1. Organise the data: Arrange the data in ascending order and calculate the five key values:
    • Minimum: The smallest value (excluding outliers).
    • Q1: The lower quartile (25th percentile).
    • Median: The middle value (50th percentile).
    • Q3: The upper quartile (75th percentile).
    • Maximum: The largest value (excluding outliers).
  2. Identify outliers: Calculate the interquartile range (IQR): IQR = Q3 − Q1. Use the formulas below to find thresholds for outliers:
    • Lower threshold: Q1 − 1.5 × IQR.
    • Upper threshold: Q3 + 1.5 × IQR.
    Any data points outside these thresholds are considered outliers.
  3. Plot the key values: Draw a number line and mark the minimum, Q1, median, Q3, and maximum. Draw a box from Q1 to Q3 with a line across it at the median, then draw whiskers from each end of the box to the minimum and maximum (excluding outliers).
  4. Plot outliers: Represent outliers as individual points beyond the whiskers.
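
The four steps above can be sketched in Python. This is a minimal sketch, not a definitive method: the function names `quartile` and `box_plot_summary` are made up for illustration, the sample data is hypothetical, and the quartile rule (round a fractional position up, average two neighbouring values when the position is a whole number) is an assumption. Other quartile conventions can give slightly different values.

```python
def quartile(sorted_data, k):
    """k-th quartile (k = 1, 2, 3) of data that is already sorted.

    Assumed convention: the position is n * k / 4. If it is a whole
    number, average that value and the next; otherwise round up.
    """
    q, r = divmod(len(sorted_data) * k, 4)
    if r == 0:
        return (sorted_data[q - 1] + sorted_data[q]) / 2
    return sorted_data[q]


def box_plot_summary(data):
    """Key values, outlier thresholds and outliers for a box plot."""
    values = sorted(data)
    q1, median, q3 = (quartile(values, k) for k in (1, 2, 3))
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "whisker_min": inside[0],   # smallest value that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_max": inside[-1],  # largest value that is not an outlier
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical data with one unusually large value.
summary = box_plot_summary([1, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

For this hypothetical data, Q1 = 3.5 and Q3 = 7.5, so the thresholds are −2.5 and 13.5. The value 30 is an outlier, and the upper whisker stops at 8.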

2. Interpreting a Boxplot:

Boxplots summarise a dataset's key characteristics:

  • The box represents the middle 50% of the data, bounded by Q1 and Q3.
  • The median line within the box indicates the centre of the distribution.
  • The whiskers extend to the smallest and largest non-outlier values, showing the range of the data once outliers are excluded.
  • Outliers, plotted as individual points, highlight unusual data values.

Key features to consider when interpreting a boxplot include:

  • The spread of the data (indicated by the length of the box and whiskers).
  • The presence and location of outliers.
  • Skewness: If the median is closer to Q1 than to Q3, the data may be positively skewed; if it is closer to Q3 than to Q1, the data may be negatively skewed.
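
The skewness check can be written as a comparison of the two parts of the box. The sketch below is illustrative only: `skew_direction` is a hypothetical helper, the input values are made up, and comparing Q3 − median with median − Q1 is one common rule of thumb, not a formal test.

```python
def skew_direction(q1, median, q3):
    """Suggest the direction of skew from the position of the median."""
    lower_part = median - q1   # width of the box below the median
    upper_part = q3 - median   # width of the box above the median
    if upper_part > lower_part:
        return "positive skew"
    if upper_part < lower_part:
        return "negative skew"
    return "roughly symmetric"


# Hypothetical quartiles: the median sits close to Q1.
print(skew_direction(10, 12, 20))
# Hypothetical quartiles: the median sits close to Q3.
print(skew_direction(10, 18, 20))
```

The first call reports positive skew because the upper part of the box (8) is wider than the lower part (2); the second reports negative skew.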

3. Comparing Boxplots:

When comparing multiple boxplots, look for differences in:

  • Medians: Indicate differences in central tendency.
  • Spread: Compare the lengths of the boxes and whiskers to see which dataset is more variable.
  • Outliers: Examine the number and positions of outliers for each dataset.
  • Skewness: Look for asymmetry in the box and whiskers to assess skewness in the data.
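
As a rough illustration of a comparison, the sketch below turns two sets of quartiles into short written statements about median and spread. Everything in it is an assumption made for the example: `compare_box_plots` is a hypothetical helper and the two classes and their figures are invented.

```python
def compare_box_plots(label_a, a, label_b, b):
    """Return short comparisons of the medians and IQRs of two box plots."""
    comments = []

    # Central tendency: compare the medians.
    if a["median"] == b["median"]:
        comments.append("The medians are equal.")
    else:
        higher = label_a if a["median"] > b["median"] else label_b
        comments.append(f"{higher} has the higher median.")

    # Spread: compare the interquartile ranges (box lengths).
    iqr_a = a["q3"] - a["q1"]
    iqr_b = b["q3"] - b["q1"]
    if iqr_a == iqr_b:
        comments.append("The IQRs are equal.")
    else:
        wider = label_a if iqr_a > iqr_b else label_b
        comments.append(f"{wider} has the larger IQR, so its middle 50% is more spread out.")

    return comments


# Invented test-score summaries for two classes.
class_a = {"q1": 40, "median": 55, "q3": 70}
class_b = {"q1": 50, "median": 60, "q3": 65}
for comment in compare_box_plots("Class A", class_a, "Class B", class_b):
    print(comment)
```

With these invented figures, Class B has the higher median (60 against 55) and Class A has the larger IQR (30 against 15). In a written answer, each statement would also be interpreted in the context of the data.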

4. Example:

Consider the following dataset: 2,4,5,7,9,12,14,18,22.

  • Step 1: Arrange the data in ascending order (already arranged).
  • Step 2: Find the key values:
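    • Minimum: 2
    • Q1: 5 (the 3rd value, since 9 ÷ 4 = 2.25 rounds up to 3)
    • Median: 9 (the 5th value)
    • Q3: 14 (the 7th value, since 3 × 9 ÷ 4 = 6.75 rounds up to 7)
    • Maximum: 22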
  • Step 3: Calculate the IQR: Q3 − Q1 = 14 − 5 = 9.
  • Step 4: Find thresholds for outliers:
    • Lower threshold: 5 − 1.5 × 9 = −8.5.
    • Upper threshold: 14 + 1.5 × 9 = 27.5.
  • Step 5: Identify outliers: There are no outliers, as all data points lie between −8.5 and 27.5.

Plot the key values and connect them with a box and whiskers.
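
To give a rough picture of the finished plot, the sketch below draws the example's key values (2, 5, 9, 14, 22) as a single line of text. It is only an illustration: `text_box_plot` is a hypothetical helper, the symbols chosen for whiskers, box and median are arbitrary, and it assumes the maximum is greater than the minimum.

```python
def text_box_plot(low, q1, median, q3, high, width=41):
    """One-line text box plot: '-' for whiskers, '=' for the box, '|' for the median."""
    scale = (width - 1) / (high - low)  # text columns per data unit

    def col(value):
        # Column index for a data value on the number line.
        return round((value - low) * scale)

    line = [" "] * width
    for i in range(col(low), col(high) + 1):
        line[i] = "-"              # whiskers span minimum to maximum
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="              # box spans Q1 to Q3
    line[col(median)] = "|"        # median line inside the box
    return "".join(line)


# Key values from the worked example: no outliers, so the whiskers
# reach the smallest and largest data values.
print(text_box_plot(2, 5, 9, 14, 22))
```

With the default width, each data unit takes two text columns, so the box starts 6 columns in (at Q1 = 5), the median mark sits 14 columns in, and the box ends 24 columns in (at Q3 = 14). The longer right-hand side of the box and the longer right whisker both hint at positive skew.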

5. Summary:

  • Boxplots visualise the distribution of a dataset, highlighting key statistics and outliers.
  • Use the IQR method to identify outliers and ensure they are represented on the plot.
  • Compare boxplots to analyse differences in central tendency, spread, and outliers between datasets.
