What is a bar chart?
A bar chart represents categorical data using rectangular bars whose lengths or heights are proportional to the values they represent. Bar charts can be drawn either vertically or horizontally; a vertical bar chart is sometimes also called a column chart.
A bar chart is used to compare two or more discrete categories. One axis usually displays the categories being compared, and the other axis shows the measured values. Bars can also be clustered into groups to show the values of more than one measured variable for each category. Color is commonly used to distinguish individual bars, or groups of bars, from one another. All bars are plotted from a common baseline so that their values can be compared easily.
William Playfair (1759-1823), a Scottish engineer and political economist, is usually credited as the first person to use bar charts effectively. Bar charts in some form already existed, and data-based graphics had been appearing since the mid to late 17th century, but Playfair's use of the humble bar chart was compelling. The modern popularity of the bar chart can therefore be traced directly to his book, ‘The Commercial and Political Atlas’, which dealt with the British economy and was published in 1786.
When and how is a bar chart used?
A bar chart is generally used to show a distribution of data points or to compare a metric across different subgroups of a data set. It also allows one group to be compared against another. Because these tasks are central to basic data analysis, bar charts are one of the most common chart types.
The primary variable of a bar chart is the categorical variable. A categorical variable takes discrete values, which can be thought of as labels. Examples include industry type, country, website access method (such as mobile or desktop) and visitor type. Categorical variables can also have ordered values, for example when objects are grouped by size (small, medium, large) or by time period (first quarter, second quarter).
The secondary variable, by contrast, is numerical, and its values determine the length of each bar. These values can come from a variety of sources. In the simplest case, they are a count or frequency of occurrences, or the proportion of the data that falls into each category. In other cases, they are averages, totals or other summary measures computed separately for each group.
Bar charts are best suited to comparative data, particularly data grouped into nominal or ordinal categories. They are also easy to interpret, which is why they are so widely used.
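To make the secondary variable concrete, here is a minimal Python sketch, assuming the raw data arrives as (category, value) pairs. The function name `summarize` and the sample session durations are hypothetical; the sketch computes three of the bar-length sources mentioned above (count, proportion and mean) for each category.

```python
from collections import defaultdict


def summarize(records):
    """Return per-category count, proportion and mean for (category, value) pairs.

    Any of these summaries could serve as the bar length for a category.
    """
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    total = sum(len(values) for values in groups.values())
    return {
        category: {
            "count": len(values),
            "proportion": len(values) / total,
            "mean": sum(values) / len(values),
        }
        for category, values in groups.items()
    }


# Hypothetical session durations (seconds) by website access method.
visits = [("mobile", 120), ("desktop", 300), ("mobile", 80), ("tablet", 60)]
summary = summarize(visits)
```

Here `summary["mobile"]` holds a count of 2, a proportion of 0.5 and a mean of 100.0; whichever summary is chosen becomes the length of the "mobile" bar.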
Getting the most out of a bar chart
Bar charts are one of the simplest data visualization methods, and the trick to making a good one is to keep it simple. The following basic principles apply when creating bar charts.
Employ a zero-value baseline
The first rule of thumb when creating a bar chart is to plot every bar from a zero-value baseline. A zero baseline not only lets readers compare bar lengths easily, but also represents the data truthfully. A bar chart with a non-zero baseline, or with gaps or compressed sections on its value axis, is easily misread, because the ratio between the bar lengths no longer matches the ratio between the values they represent.
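The distortion caused by a non-zero baseline can be quantified. The sketch below uses a hypothetical helper, `drawn_length_ratio`, to compare the ratio of the drawn bar lengths with the ratio of the underlying values, assuming both bars start from the same baseline.

```python
def drawn_length_ratio(a, b, baseline=0.0):
    """Ratio of the drawn lengths of two bars that start at `baseline`.

    With baseline=0 this equals the true ratio a / b; any other baseline
    changes the visual ratio even though the data are unchanged.
    """
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (a - baseline) / (b - baseline)


# Two values that differ by only 4%.
true_ratio = drawn_length_ratio(52, 50)              # 1.04
truncated = drawn_length_ratio(52, 50, baseline=48)  # 2.0: one bar looks twice as long
```

Starting the axis at 48 turns a 4% difference into bars whose lengths differ by a factor of two, which is exactly the misreading a zero baseline prevents.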
Ensure regular forms for the bars
The second rule of thumb is never to modify the shape of the bars. Some tools allow the bar ends to be rounded instead of straight, but this makes it harder for readers to tell where the bar's value should be read. Should it be read at the top of the semicircle at the end of the bar, or somewhere in the middle? Slightly rounded corners are acceptable, but the end of each bar should remain flat enough for the reader to discern its true value and to compare it clearly with the other bars.
Three-dimensional (3D) effects deserve similar caution. Like heavily rounded bars, 3D bars can make the true values harder to read, and the perspective can make the bars appear misaligned with the baseline.
Bar widths should also be kept consistent. If the bars have different widths, they can appear to represent different values even though width encodes nothing along the category axis. This can be highly confusing, as readers sometimes interpret differences in area or volume as differences in value.
Category order
When putting together a bar chart, consider the order in which the bars are plotted. The conventional approach is to sort the bars by length, from longest to shortest. Although bar lengths can be compared in any order, a logical order reduces the reader's effort and makes comparisons and differences easier to see.
The main exception is when the categories already have an inherent order, in which case that order takes precedence. For instance, dates should be listed in chronological order from left to right.
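The ordering rule, including its exception, can be sketched in a few lines of Python. `order_categories` is a hypothetical helper; it assumes the values arrive as a dictionary and that any inherent order is supplied explicitly as a list.

```python
def order_categories(values, inherent_order=None):
    """Return (category, value) pairs in plotting order.

    values: dict mapping category -> value.
    inherent_order: optional list of categories with a natural order
    (for example quarters or dates); when given, it takes precedence
    over sorting by value.
    """
    if inherent_order is not None:
        return [(category, values[category]) for category in inherent_order if category in values]
    # Default: longest bar first.
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


by_length = order_categories({"North": 40, "South": 75, "East": 55})
by_quarter = order_categories({"Q2": 30, "Q1": 50, "Q3": 20}, ["Q1", "Q2", "Q3"])
```

Regions are sorted by value (South, East, North), while quarters keep their natural sequence even though Q1 happens to be the largest.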
Color should be used wisely
Colors are very useful for comparisons on bar charts, but how they are used should be considered carefully. Some programs color each bar differently by default, which can distract readers because color tends to imply meaning. If the creator intended the color only as decoration, readers may look for a meaning that is not there. Color should therefore always serve a specific purpose in telling the story, such as a gradient that illustrates the relative values of bar segments.
Replacing bars with images
It can be tempting to replace the usual bars with images that depict the category being measured, for example using money bags for bars that represent currency. Creators must be careful not to misrepresent data this way. If the chosen images scale in both height and width with the value, differences appear much larger than they really are: an icon twice as tall and twice as wide covers four times the area.
If icons still seem a useful way to depict value, a better option is a pictogram chart. While not perfect, it is preferable to substituting images for bars. In a pictogram, each category's value is shown by a series of icons, each representing a fixed quantity. A word of caution, though: pictograms make values harder to read, as the reader has to do some mental arithmetic to gauge the relative value of each category.
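The exaggeration follows from simple geometry. Assuming an icon's height and width are both multiplied by the value ratio, its area grows with the square of that ratio, as this small sketch (with a hypothetical function name) shows.

```python
def icon_area_ratio(value_ratio, scale_both_dimensions=True):
    """Area of a scaled icon relative to a reference icon.

    If only the length grows with the value (as with a bar of fixed width),
    area grows linearly; if height and width both grow, area grows with
    the square of the value ratio.
    """
    if scale_both_dimensions:
        return value_ratio ** 2
    return value_ratio


for ratio in (1.5, 2, 3):
    # value ratio, area ratio of a scaled icon, area ratio of a bar
    print(ratio, icon_area_ratio(ratio), icon_area_ratio(ratio, False))
```

A category worth three times another is drawn with nine times the ink when the icon is scaled in both dimensions, while a bar keeps the ratio at three.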
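A pictogram row can be sketched as follows. The helper names, the `#` icon and the convention of marking a leftover partial quantity with `+` are all assumptions for illustration; real pictogram charts often show a cropped icon instead.

```python
def pictogram_row(value, units_per_icon, icon="#", partial="+"):
    """Return one icon per full unit of `value`, plus a marker for any remainder."""
    full, remainder = divmod(value, units_per_icon)
    return icon * int(full) + (partial if remainder else "")


def pictogram(values, units_per_icon):
    """Render one labelled pictogram row per category."""
    label_width = max(len(category) for category in values)
    return "\n".join(
        f"{category:<{label_width}} {pictogram_row(value, units_per_icon)}"
        for category, value in values.items()
    )


# Hypothetical data: each icon stands for 100 units.
print(pictogram({"Coffee": 450, "Tea": 300}, units_per_icon=100))
```

To recover the values, a reader still has to multiply the icon count by 100 and estimate what the partial marker is worth, which is the mental arithmetic the paragraph above warns about.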
Different types and additions to bar charts
There is a range of bar chart variants. Some differ purely on an aesthetic level and should be approached with caution, but others offer differences that are useful in a business setting.
Horizontal bar charts
As the name suggests, this kind of chart represents data using horizontal rectangular bars. The categories are marked on the vertical (y) axis, and the length of each bar along the horizontal (x) axis shows the measured value.
Vertical bar charts
In these bar charts, vertical rectangular bars represent the data measures. The categories are marked along the horizontal (x) axis, and the height of each bar, read against the vertical (y) axis, shows the quantity being measured.
Both horizontal and vertical bar charts have two further variants.
Horizontal or vertical grouped bar charts
Grouped bar charts are also commonly called clustered bar charts. They display values for two or more categorical variables at once: bars are grouped together by position according to the primary category, and color identifies the secondary category, so that bars for the same subgroup share a color across all groups.
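Laying out a grouped chart comes down to computing where each bar sits within its cluster. The sketch below assumes each group occupies one unit on the category axis, centred on its index, with its bars sharing a fixed fraction of that unit; `grouped_positions` and the 0.8 default are illustrative choices, not a standard.

```python
def grouped_positions(n_groups, n_series, group_width=0.8):
    """Return (centres, bar_width) for a grouped bar chart.

    centres[g][s] is the axis position of series s within group g.
    Each group is centred on its index g, and its bars together span
    `group_width` of the unit available to that group, leaving a gap
    between neighbouring groups.
    """
    bar_width = group_width / n_series
    centres = [
        [g - group_width / 2 + bar_width * (s + 0.5) for s in range(n_series)]
        for g in range(n_groups)
    ]
    return centres, bar_width


centres, width = grouped_positions(n_groups=3, n_series=2)
```

Each series index `s` keeps the same offset, and would keep the same color, in every group, which is what lets the color coding stay consistent from one cluster to the next.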
Horizontal or vertical stacked bar charts
The stacked bar chart, whether horizontal or vertical, is also commonly called the composite bar chart. In this kind of chart, each bar is divided into segments. Each whole bar represents a total, and its segments, distinguished by color and labels, represent the parts that make up that total.
Whether a bar chart should be horizontal or vertical depends on the type and amount of data being presented. Vertical is the usual default, but a horizontal bar chart makes more sense when the category labels are long, since on a vertical chart long labels can overlap and become illegible.
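Drawing a stacked bar requires each segment to start where the previous one ends. A minimal sketch using the standard library's `itertools.accumulate`, assuming non-negative segment values listed from the baseline outward (`stack_segments` is a hypothetical name):

```python
from itertools import accumulate


def stack_segments(parts):
    """Return (start, end) for each segment of one stacked bar.

    parts: non-negative segment values, listed from the baseline outward.
    The end of the last segment equals sum(parts), the bar's total.
    """
    ends = list(accumulate(parts))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


segments = stack_segments([30, 50, 20])
```

For `[30, 50, 20]` the segments run from 0 to 30, 30 to 80 and 80 to 100, so the full bar reaches the total of 100 while each colored segment shows one part.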
Value annotations
Value annotations are common additions to bar charts. Bar charts make it easy to compare bar lengths and estimate approximate values, but exact values are harder to read off. Annotations solve this by reporting exact values where they are needed, usually at the end of each bar or in its middle.
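As a rough illustration, the following sketch renders a horizontal bar chart as plain text, with every bar drawn from a common zero baseline and its exact value annotated at the end of the bar. The function name, the `#` character and the default width of 40 characters are assumptions, and the sketch assumes non-negative values.

```python
def text_bar_chart(values, width=40, fmt="{:g}"):
    """Render a horizontal text bar chart with end-of-bar value annotations.

    Bars start at a common zero baseline and are scaled so the largest
    value spans `width` characters; assumes all values are non-negative.
    """
    if not values:
        return ""
    label_width = max(len(category) for category in values)
    peak = max(values.values())
    lines = []
    for category, value in values.items():
        length = round(value / peak * width) if peak > 0 else 0
        # Label, baseline marker, bar, then the exact value as an annotation.
        lines.append(f"{category:>{label_width}} | {'#' * length} {fmt.format(value)}")
    return "\n".join(lines)


print(text_bar_chart({"Desktop": 300, "Mobile": 150}, width=10))
```

The bar lengths support the quick visual comparison, while the trailing numbers give the exact values that the bars alone cannot convey.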
Variability whiskers
When the bar values are a summary measure, such as a mean, an important consideration is whether to include error bars. Error bars are whiskers added to the end of each bar to indicate the variability of the individual data points that contributed to the summary measure. Common measures of uncertainty include interquartile ranges, standard deviations and confidence intervals. Because these measures differ considerably, any chart that displays error bars should include an annotation or note clarifying what they represent.
An alternative to error bars is to show the variability within each category using a different chart type, such as a box plot or a violin plot. These plots contain more elements, but they provide a much deeper understanding of the data distribution.
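The whisker extents for the three measures listed above can be computed with Python's standard `statistics` module (`statistics.quantiles` needs Python 3.8 or later). This is a sketch under stated assumptions: `whisker_extents` is a hypothetical name, the 95% confidence interval uses the normal approximation of 1.96 standard errors, which is rough for small samples, and the interquartile whisker is centred on the median rather than the mean.

```python
import statistics


def whisker_extents(samples, kind="sd"):
    """Return (centre, lower, upper) for one bar's error whisker.

    kind="sd":   mean +/- one sample standard deviation
    kind="ci95": mean +/- 1.96 standard errors (normal approximation)
    kind="iqr":  median with first and third quartiles
    """
    if kind == "sd":
        mean = statistics.mean(samples)
        sd = statistics.stdev(samples)
        return mean, mean - sd, mean + sd
    if kind == "ci95":
        mean = statistics.mean(samples)
        se = statistics.stdev(samples) / len(samples) ** 0.5
        return mean, mean - 1.96 * se, mean + 1.96 * se
    if kind == "iqr":
        q1, median, q3 = statistics.quantiles(samples, n=4)
        return median, q1, q3
    raise ValueError(f"unknown kind: {kind!r}")


data = [2, 4, 4, 4, 5, 5, 7, 9]
for kind in ("sd", "ci95", "iqr"):
    print(kind, whisker_extents(data, kind))
```

The three options give noticeably different whisker lengths for the same data, which is why the accompanying note should always name the measure used.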
Lollipop chart
Another variation is the lollipop chart, which represents information just like a bar chart but with slightly different aesthetics: each bar is replaced by a line topped with a dot at its end point. Lollipop charts are most useful when plotting many categories with values fairly close together, where a row of thick bars of similar length can look cluttered and the thinner lines are easier to read.
The case for and against bar charts
Bar charts have both advantages and disadvantages. Creators who want to present information in an easily understood, accessible way should consider the following points: should a bar chart be used, or would another chart type be more effective in that particular scenario?
Bar charts are great to use when there is:
- A large set of data that can be summarized in a simple visual form
- Differing categories of data to be displayed
- A fixed set of values for comparison (e.g. “Top 10”)
- A trend in data that is easy to see at a glance
Choose another option when there is:
- An underlying pattern or cause
- Summary statistics such as the mean and standard error
- Continuous data
When used appropriately, bar charts are a simple, effective chart type when communicating a fixed set of values for comparison.