The boxplot, also known as a box-and-whisker plot, is a commonly used chart type in daily work and research. While most are familiar with bar charts, line graphs, pie charts, scatter plots, and Gantt charts, creating professional reports often requires utilizing more specialized and practical chart types like the boxplot.

Today, let’s explore the boxplot, which contains several elements and may seem somewhat complex to use. This article aims to provide a detailed interpretation of the various elements of the boxplot, along with analysis and creation techniques, to aid in your journey as a data analyst.

1. What is a Boxplot?

A boxplot, also referred to as a box-and-whisker plot, derives its name from its resemblance to a box. Introduced by the renowned American mathematician John W. Tukey in his 1977 work “Exploratory Data Analysis,” the boxplot is primarily used to display the distribution of a set of continuous data. When you need to understand the distribution characteristics of data or identify outliers, a boxplot is a useful analysis tool.

Boxplots provide an intuitive view of the distribution of a dataset, enabling quick identification of outliers.

2. How to Read a Boxplot?

How do we analyze the data in a boxplot? Let’s delve into the various concepts using an example of a boxplot.

The top and bottom of the box represent the upper and lower quartiles, respectively, while the line inside the box represents the median, dividing the box into two parts. The lines extending from the box cover the data beyond the upper and lower quartiles; they resemble whiskers, hence the name “box-and-whisker plot.”

Occasionally, individual points may appear on the boxplot, beyond the ends of the whiskers, representing outliers or anomalies. Boxplots are non-parametric; they depict variations in the statistical sample without making assumptions about the underlying statistical distribution. The spacing between different parts of the box indicates the dispersion (spread) and skewness of the data and shows outliers.

Specific data calculations include:

Maximum value (upper bound): A threshold calculated to distinguish outliers, not the actual maximum of the data. Maximum value = Q3 + 1.5 * IQR.

Upper quartile (Q3): The 75th percentile when all values in the sample are arranged in ascending order.

Median: The 50th percentile, that is, the middle value when all values in the sample are arranged in ascending order.

Lower quartile (Q1): The 25th percentile when all values in the sample are arranged in ascending order.

Minimum value (lower bound): A threshold calculated to distinguish outliers, not the actual minimum of the data. Minimum value = Q1 - 1.5 * IQR.

Interquartile range (IQR): The difference between Q3 and Q1, reflecting the concentration of data to some extent. A smaller range indicates more concentrated data.

Outliers: Data points falling outside the maximum and minimum values.
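The calculations above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular charting tool computes its values: the function name box_stats and the sample numbers are made up for the example, and the "inclusive" quartile method is an assumption, since different tools interpolate quartiles slightly differently.

```python
import statistics

def box_stats(values):
    """Compute the boxplot statistics defined above (hypothetical helper)."""
    data = sorted(values)
    # Quartile conventions vary between tools; "inclusive" is an assumption here.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr   # the "minimum value" threshold
    upper_bound = q3 + 1.5 * iqr   # the "maximum value" threshold
    outliers = [x for x in data if x < lower_bound or x > upper_bound]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_bound": lower_bound,
        "upper_bound": upper_bound,
        "outliers": outliers,
    }

# Made-up sample: eight ordinary values and one unusually large one.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
stats = box_stats(sample)
print(stats)
```

With this sample, the only value that falls outside the two thresholds is 30, so it is the one point that would be drawn as an outlier.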

Boxplots are suitable for comparing frequency distributions. This kind of comparison shows how many items fall within each range of numbers. For example, using a frequency distribution comparison, we can show the number of employees in our company earning over 50,000 yuan per month, those earning between 30,000 and 50,000 yuan, and those earning between 10,000 and 30,000 yuan. Similarly, we can illustrate the number of employees under 25 years old, between 25 and 30 years old, and over 30 years old. Key terms for this comparison include ranges from x to y, density, frequency, and distribution.
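As a rough illustration of the frequency distribution comparison described above, the sketch below counts values per salary band. The salary figures are made up, and the way boundary values (exactly 30,000 or 50,000) are assigned to a band is an assumption, since the text does not specify it.

```python
from collections import Counter

def salary_band(salary):
    """Assign a monthly salary (in yuan) to a band; boundary handling is an assumption."""
    if salary > 50_000:
        return "over 50,000"
    if salary >= 30_000:
        return "30,000 to 50,000"
    if salary >= 10_000:
        return "10,000 to 30,000"
    return "under 10,000"

# Hypothetical monthly salaries, for illustration only.
salaries = [12_000, 18_000, 25_000, 31_000, 42_000, 48_000, 55_000, 70_000]

# Count how many employees fall into each band.
counts = Counter(salary_band(s) for s in salaries)
for band, count in counts.items():
    print(band, count)
```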


While the elements included in a boxplot may seem complex, they offer several functions that other charts cannot replace:

  1. Intuitive Identification of Outliers: A boxplot allows for observing the overall distribution of data by utilizing statistical metrics such as the median, 25th percentile, 75th percentile, upper bound, and lower bound. The box and whiskers encompass the majority of normal data, while data lying beyond the upper and lower bounds are considered outliers.
  2. Assessment of Skewness and Tail Weight: For a large sample drawn from a normal distribution, the median lies at the center of the upper and lower quartiles, making the box symmetric around the median line. The greater the deviation of the median from the center of the upper and lower quartiles, the stronger the skewness of the distribution. If outliers are concentrated on the larger side, the distribution shows right skewness; if outliers are concentrated on the smaller side, the distribution exhibits left skewness.
  3. Comparison of Shapes for Multiple Data Sets: The lines at the top and bottom of the box represent the upper and lower quartiles, so the box contains the middle 50% of the data. Therefore, the height of the box reflects the degree of data fluctuation to some extent. A flatter box indicates that the middle half of the data is more concentrated, while shorter whiskers indicate that the remaining data also lies close to the quartiles.
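The idea in point 2, that skewness shows up as the median sitting off-center inside the box, can be put into a number. The sketch below uses Bowley's quartile skewness coefficient, which is one common way to do this and is not something the article itself prescribes. The function name and both datasets are made up for illustration, and the "inclusive" quartile method is an assumption.

```python
import statistics

def quartile_skewness(values):
    """Bowley's quartile skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1).

    0 means the median is centered in the box, a positive value means the
    median sits nearer Q1 (right skew), a negative value means it sits
    nearer Q3 (left skew).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; quartile skewness is undefined")
    return (q3 + q1 - 2 * median) / iqr

# Hypothetical data sets for illustration only.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 1, 2, 2, 3, 5, 9, 14, 20]

print(quartile_skewness(symmetric))
print(quartile_skewness(right_skewed))
```

The coefficient only looks at the box, so it complements rather than replaces the visual check of where the outliers fall.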

With these capabilities, boxplots find diverse applications, commonly used in activities such as quality management, personnel evaluation, and exploratory data analysis. For instance, the boxplot example below illustrates the situation before and after salary adjustments in a company. The distribution of salaries is more concentrated after the adjustment, creating a suitable range without excessively large disparities. Special cases are also addressed, contributing to employee motivation and meeting expectations for the adjustment.
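To make the before-and-after comparison concrete, the sketch below compares the interquartile range of two groups. The salary figures are entirely hypothetical and are not taken from the example chart; the helper name iqr and the "inclusive" quartile method are likewise assumptions.

```python
import statistics

def iqr(values):
    """Interquartile range (Q3 - Q1), using the 'inclusive' quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical monthly salaries before and after an adjustment.
before = [3000, 4500, 5000, 5200, 6000, 7500, 9000, 12000, 20000]
after = [5000, 5500, 5800, 6000, 6500, 7000, 7500, 8000, 9000]

iqr_before = iqr(before)
iqr_after = iqr(after)

# A smaller IQR corresponds to a flatter box, i.e. more concentrated salaries.
print("IQR before:", iqr_before)
print("IQR after:", iqr_after)
```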

3. How to Make a Boxplot

Creating a boxplot in Excel can be quite cumbersome, and integrating it with other visualizations for report generation and analysis is challenging. This is where the use of professional chart-making tools becomes necessary.

I recommend a professional chart-making and data visualization software: FineReport. It includes 19 major chart types and over 50 dynamic report chart styles required for data visualization. Its dynamic charts support rich interactive effects, allowing users to easily understand and utilize big data.

Next, let’s demonstrate how to create a professional boxplot using FineReport quickly:

FineReport supports multiple chart types

FineReport provides localized services in Taiwan, Hong Kong, Macau, Singapore, Malaysia, and other regions, with technical support and project implementation by local teams. FineReport can be downloaded for free to try out the reporting software, and technical support engineers are available to help with any technical issues during your chart-making journey.


For individual users, FineReport is permanently and completely free; for enterprises, FineReport offers different pricing plans tailored to their specific needs.

1. Chart Insertion
Insert a boxplot chart type in the FineReport designer.

The FineReport designer operates similarly to Excel, making it user-friendly and straightforward to use, with a minimal learning curve.

2. Data Binding
Bind the dataset with the inserted boxplot.


There are two data formats for boxplots: result boxplots and detail boxplots. Result boxplots use data directly stored in the dataset as statistical information for the boxplot. Detail boxplots utilize detailed data, and FineReport automatically calculates the statistical parameters based on the dataset.

In essence, if the dataset contains pre-calculated data, FineReport will automatically retrieve it; otherwise, it will compute the necessary values.
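The difference between the two data formats can be sketched outside of FineReport. The code below is not FineReport's API or its internal logic: box_summary, the dictionary keys, and the sample values are all hypothetical, and the "inclusive" quartile method is an assumption. It only illustrates the idea that pre-calculated statistics are read as they are, while detailed data has its statistics computed.

```python
import statistics

SUMMARY_KEYS = ("min", "q1", "median", "q3", "max")

def box_summary(record):
    """Return box statistics from either data shape (hypothetical helper)."""
    if isinstance(record, dict):
        # "Result" shape: the statistics are already stored, so read them directly.
        return {key: record[key] for key in SUMMARY_KEYS}
    # "Detail" shape: raw values, so the statistics are computed here.
    data = sorted(record)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}

# Hypothetical inputs describing the same data in both shapes.
detail_data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
result_data = {"min": 1, "q1": 3, "median": 5, "q3": 7, "max": 9}

from_detail = box_summary(detail_data)
from_result = box_summary(result_data)
print(from_detail)
print(from_result)
```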

3. Style Configuration

FineReport offers extensive style customization options for the boxplot, allowing adjustments to parameters like borders, colors, and more. Additionally, you can define the representation of normal and outlier values within the boxplot.


4. Chart Preview

Once the style configurations are finalized, you can preview the boxplot that has been customized accordingly.


4. Key Points in Boxplot Making

Lastly, to aid in creating visually appealing and intuitive charts, here are some key points to consider when making a boxplot:

1. Limitations with Numerous Groups

Using a boxplot becomes challenging when there are too many groups depicted in a single chart. Excessive grouping results in an overload of boxplots, making it difficult to discern the distribution of data within each group.

2. Boxplot Limitations in Displaying Data Changes

While boxplots provide a visual representation of data distribution and facilitate quick identification of outliers, they are not suitable for showing how data changes, for example a trend over time.

3. Boxplot Unsuitability for Detailed Data Distribution Analysis

Boxplots only offer a summary of data distribution for a specific group, making them inadequate for detailed analysis. Consider using a violin plot for a more comprehensive examination of data distribution.
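A small example shows why the summary can hide detail. The two made-up datasets below are distributed differently, one evenly spread and one clustered around two values, yet they produce the same five statistics and would therefore draw the same box. The helper name and the "inclusive" quartile method are assumptions made for this sketch.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum (quartiles use the 'inclusive' method)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

# Hypothetical data: evenly spread values versus values clustered around 2 and 6.
evenly_spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]
two_clusters = [0, 2, 2, 2, 4, 6, 6, 6, 8]

summary_even = five_number_summary(evenly_spread)
summary_clustered = five_number_summary(two_clusters)

print(summary_even)
print(summary_clustered)
print("Same box, different data:", summary_even == summary_clustered)
```

A chart that draws the full shape of the data, such as a violin plot, would show the two clusters that the box cannot.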

That concludes the detailed analysis of boxplots. For previous articles in the chart series, see the data visualization category.


FineReport provides localized services in regions such as Taiwan, Hong Kong, Macau, Singapore, and Malaysia, with technical support and project implementation handled by the local teams of Fanruan, the original manufacturer. The FineReport reporting software is available as a free trial download, and our technical support engineers can be contacted for any technical assistance during your chart-making journey.

