Boxplots are graphical representations of a dataset's distribution. They display key statistics, including the minimum, lower quartile ($Q_1$), median, upper quartile ($Q_3$), and maximum, along with any outliers. Boxplots are useful for visualising and comparing distributions.
1. Drawing a Boxplot:
To construct a boxplot, follow these steps:
- Organise the data: Arrange the data in ascending order and calculate the five key values:
- Minimum: The smallest value (excluding outliers).
- $Q_1$: The lower quartile (25th percentile).
- Median: The middle value (50th percentile).
- $Q_3$: The upper quartile (75th percentile).
- Maximum: The largest value (excluding outliers).
- Identify outliers: Calculate the interquartile range, $\text{IQR} = Q_3 - Q_1$, then use it to find the thresholds for outliers:
- Lower threshold: $Q_1 - 1.5 \times \text{IQR}$.
- Upper threshold: $Q_3 + 1.5 \times \text{IQR}$.
Any data points outside these thresholds are considered outliers.
- Plot the key values: Draw a number line, marking the minimum, $Q_1$, median, $Q_3$, and maximum. Connect $Q_1$, median, and $Q_3$ with a box, and draw whiskers from the box to the minimum and maximum (excluding outliers).
- Plot outliers: Represent outliers as individual points beyond the whiskers.
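The steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `boxplot_stats` and the sample values are hypothetical, and the quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`. Textbooks and software use several quartile conventions, so hand-calculated quartiles may differ slightly.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Return the values needed to draw a boxplot (hypothetical helper)."""
    data = sorted(values)  # organise the data in ascending order

    # Quartiles under the "inclusive" convention (an assumption; others exist).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")

    # Interquartile range and the 1.5 * IQR outlier thresholds.
    iqr = q3 - q1
    lower_threshold = q1 - 1.5 * iqr
    upper_threshold = q3 + 1.5 * iqr

    # Split the data into outliers and the points the whiskers may reach.
    outliers = [x for x in data if x < lower_threshold or x > upper_threshold]
    inside = [x for x in data if lower_threshold <= x <= upper_threshold]

    return {
        "minimum": inside[0],    # smallest non-outlier (end of lower whisker)
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": inside[-1],   # largest non-outlier (end of upper whisker)
        "iqr": iqr,
        "lower_threshold": lower_threshold,
        "upper_threshold": upper_threshold,
        "outliers": outliers,
    }


# Made-up sample values, chosen only to show one outlier.
sample = [2, 4, 5, 7, 8, 9, 11, 30]
stats = boxplot_stats(sample)
print(stats["q1"], stats["median"], stats["q3"])  # 4.75 7.5 9.5
print(stats["outliers"])                          # [30]
```

In this sample the upper threshold is 16.625, so 30 is plotted as an individual point and the upper whisker stops at 11.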
2. Interpreting a Boxplot:
Boxplots summarise a dataset's key characteristics:
- The box represents the middle 50% of the data, bounded by $Q_1$ and $Q_3$.
- The median line within the box indicates the centre of the distribution.
- The whiskers extend to the smallest and largest non-outlier values, showing the range.
- Outliers, plotted as individual points, highlight unusual data values.
Key features to consider when interpreting a boxplot include:
- The spread of the data (indicated by the length of the box and whiskers).
- The presence and location of outliers.
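- Skewness: If the median is noticeably closer to $Q_1$ than to $Q_3$, or the reverse, the data may be skewed. A median nearer $Q_1$ suggests positive (right) skew, and a median nearer $Q_3$ suggests negative (left) skew.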
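As a rough illustration of the skewness check, the sketch below compares the distance from the median to each quartile. The helper name `skew_hint` and the sample values are hypothetical, and the quartiles again assume the "inclusive" convention of `statistics.quantiles`. The result is only a hint: the whisker lengths should be read alongside it.

```python
from statistics import quantiles


def skew_hint(values):
    """Suggest a skew direction from where the median sits inside the box."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    lower_part = median - q1   # width of the box below the median
    upper_part = q3 - median   # width of the box above the median

    if upper_part > lower_part:
        return "possible positive (right) skew"
    if lower_part > upper_part:
        return "possible negative (left) skew"
    return "box is symmetric about the median"


# Made-up values with a long upper tail.
print(skew_hint([1, 2, 3, 4, 10, 20, 30]))  # possible positive (right) skew
print(skew_hint([1, 2, 3, 4, 5]))           # box is symmetric about the median
```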
3. Comparing Boxplots:
When comparing multiple boxplots, look for differences in:
- Medians: Indicate differences in central tendency.
- Spread: Compare the lengths of boxes and whiskers to identify variations in variability.
- Outliers: Examine the number and positions of outliers for each dataset.
- Skewness: Look for asymmetry in the box and whiskers to assess skewness in the data.
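The numerical side of such a comparison might look like the sketch below, which reports the difference in medians and in IQRs for two datasets. The names `summarise` and `compare`, and both sets of values, are invented for illustration. Quartiles assume the "inclusive" convention of `statistics.quantiles`.

```python
from statistics import median, quantiles


def summarise(values):
    """Median and IQR of one dataset (hypothetical helper)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "iqr": q3 - q1}


def compare(a, b):
    """Differences (b minus a) in central tendency and spread."""
    sa, sb = summarise(a), summarise(b)
    return {
        "median_diff": sb["median"] - sa["median"],
        "iqr_diff": sb["iqr"] - sa["iqr"],
    }


# Made-up groups: group_b has a higher median and a wider box.
group_a = [10, 12, 13, 15, 16]
group_b = [8, 14, 18, 22, 30]
result = compare(group_a, group_b)
print(result)  # {'median_diff': 5, 'iqr_diff': 5.0}
```

Here `group_b` has a median 5 units higher and an IQR 5 units wider than `group_a`, so its boxplot would sit further along the number line and show a longer box.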
4. Example:
Consider the following dataset: .
- Step 1: Arrange the data in ascending order (already arranged).
- Step 2: Find the key values:
- Minimum:
- Median:
- Maximum:
- Step 3: Calculate the IQR: .
- Step 4: Find thresholds for outliers:
- Lower threshold: .
- Upper threshold: .
- Step 5: Identify outliers: There are no outliers, as all data points lie between and .
Plot the key values and connect them with a box and whiskers.
5. Summary:
- Boxplots visualise the distribution of a dataset, highlighting key statistics and outliers.
- Use the IQR method to identify outliers and ensure they are represented on the plot.
- Compare boxplots to analyse differences in central tendency, spread, and outliers between datasets.