Sharpen Your Data Visualization Superpowers
How to Turn a Good Data Visualization into a Great One
Let’s now explore how to turn a good viz into a great one by following ten data viz “suggestments,” or rules of thumb, based on the science of how we perceive visual cues as well as the hard-won experience of other data visualizers.
Suggestment 1: Clean Data First
Many nonprofits have entry-level staff or multiple staff entering data into management information systems or spreadsheets. The result can be “dirty” data — data with a troubling level of inaccuracy because it has not been entered correctly or consistently. If, for example, Michael Smith is entered twice, once with a middle initial and once without, then tracking his progress through your program will be difficult.
To keep data accurate, and thus of any value at all, clean it regularly. A few simple data cleaning procedures for spreadsheets include:
- Use a Spell Checker. Use a spell checker to find values that are entered inconsistently, such as variant spellings of a program name.
- Remove Duplicate Rows or Entries. Sort data and then scan rows to find duplicates. Filter data for unique values to find near duplicates. You can also find duplicates using an option under conditional formatting in Excel.
- Use Find and Replace. Use the find and replace function to correct data entered incorrectly in multiple rows or entries.
- Use Upper Case and Trim. Change all text to upper case to ensure consistency. The UPPER(text) function in Excel will convert text to upper case. The TRIM(text) function in Excel strips extra spaces from text, leaving only single spaces between words and no space characters at the start or end of the text.
Free tools, such as OpenRefine, also can be used to both clean data and transform it from one format into another.
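If your data lives outside a spreadsheet, the same cleaning steps can be scripted. Here is a minimal sketch in Python that roughly mimics TRIM and UPPER, drops exact duplicates, and flags near duplicates like the Michael Smith example. The function names and the middle-initial heuristic are my own assumptions for illustration, not a standard recipe.

```python
def normalize(text):
    """Roughly mimic Excel's TRIM and UPPER: collapse whitespace, upper-case."""
    return " ".join(text.split()).upper()


def dedupe(rows):
    """Normalize every cell, then drop exact duplicate rows (first one wins)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(normalize(cell) for cell in row)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def near_duplicate_key(name):
    """Hypothetical heuristic: ignore single-letter initials such as 'J.'
    so that two spellings of the same person produce the same key."""
    parts = [p for p in normalize(name).split() if len(p.rstrip(".")) > 1]
    return " ".join(parts)


# Made-up sample rows: (participant name, program name)
rows = [
    ["  michael   smith ", "Job Training"],
    ["Michael Smith", "job training"],
    ["Michael J. Smith", "Job Training"],
]
cleaned = dedupe(rows)
for name, program in cleaned:
    print(near_duplicate_key(name), "|", name, "|", program)
```

Rows that share a near-duplicate key still deserve a human look before you merge them, since two different people can have the same first and last name.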
Suggestment 2: Encode Thoughtfully
Cleveland and McGill (and others) have studied the types of visual “encodings” or “channels” people are able to decode most accurately and ranked them as depicted in the image above.
As shown, humans are pretty good at deciphering some visual cues and pretty bad at others. For example, we do well when comparing lengths along a common scale but more poorly when assessing angles. That’s why we can interpret bar charts much more easily than pie charts, as we saw in Let’s Use Florence Nightingale’s Secret Weapons.
The only exception is when we want to compare a part to a whole. In this case, a pie chart does a good job of showing that girls, for example, represent only a sliver of all the participants in a program or that thirty- to forty-year-olds make up the majority of visitors to an event. But once you go beyond two or maybe three slices, and you want more exact comparisons among groups or parts, skip the pie and dust off the trusty bar chart.
Suggestment 3: Give the Most Important Data Points Prominence
The most important variables should be encoded with the most effective channels in order to be most noticeable, and then decreasingly important variables can be matched with less-effective channels.
It’s important to remember that humans can rapidly and accurately detect the presence or absence of a “target” element with a unique visual feature within a field of distractor elements. Case in point: In which of the images above is it easier to determine the number of nines?
Similarly, if you want to draw attention to one bar in a bar chart, give it a vibrant color and gray out the others, as in this image.
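In code, this trick usually comes down to building one color per bar. The sketch below shows one way to do that in Python. The function name, the labels, and the two hex colors are assumptions I picked for illustration.

```python
def highlight_colors(labels, focus, accent="#1f77b4", muted="#c7c7c7"):
    """Return one color per label: the accent color for the bar we want
    viewers to notice, and a muted gray for every other bar."""
    if focus not in labels:
        raise ValueError(f"{focus!r} is not one of the labels")
    return [accent if label == focus else muted for label in labels]


# Made-up program names for illustration
programs = ["Tutoring", "Mentoring", "Job Training"]
colors = highlight_colors(programs, "Mentoring")
print(colors)
```

Most charting libraries accept a list like this. In matplotlib, for example, it can be passed as the color argument of a bar chart.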
Suggestment 4: Show Order
We humans are great at detecting patterns, even when none exist (think conspiracy theories). From an evolutionary perspective, pattern recognition has helped us to understand what we see and make predictions that help us survive and reproduce.
Order is a particular type of pattern. It is the arrangement of people or things in relation to each other according to a particular sequence. So when there is an order to our data, we should show it. Our pattern-seeking minds will thank us for delivering up a real pattern and making it so easy for us to see.
For example, arrange bars on bar charts in descending order so that viewers can easily pick out the top/bottom or the most/least. In the visualization on the left, the viewer is challenged to quickly discern the most expensive item because several bars are quite similar in size. However, the right-hand chart makes it immediately clear that cream is the most expensive item, followed by milk and eggs.
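If you build charts in code, sorting is a one-line step before you draw. Here is a minimal sketch in Python that sorts items in descending order and prints a rough text bar chart. The prices are made up for illustration and the function name is my own.

```python
def sorted_bars(data, width=30):
    """Sort items from largest to smallest and draw each as a text bar
    scaled against the largest value."""
    if not data:
        return []
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    top = ordered[0][1]
    lines = []
    for label, value in ordered:
        bar = "#" * round(value / top * width)
        lines.append(f"{label:<8}{bar} {value:.2f}")
    return lines


# Hypothetical prices, listed in no particular order
prices = {"eggs": 3.10, "butter": 2.90, "cream": 4.50, "milk": 3.40}
lines = sorted_bars(prices)
print("\n".join(lines))
```

One caveat: when categories have their own natural order, such as age groups or months, keep that order rather than sorting by value.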
Suggestment 5: Clarify (Don’t Confuse) with Color
Color is a great tool for drawing attention to certain data points in a graph, chart, map, or diagram. But color can also obscure data. Adopting a few rules of thumb will turn a rainbow of confusion into an elegant and clear picture:
- Assign only one meaning per color. If you are color coding a map and assigning blue to a certain income range, do not use blue to mean anything else in that map or adjacent related visuals. Blue always means that specific income range.
- Limit the color palette. Limit your graph, chart, map, or diagram to a few complementary or monochromatic colors. Remember the color wheel? (If not, see this image.) Choose complementary colors that are on opposite sides of the wheel: think orange and blue, or yellow and purple. Or choose several tones of one color (a monochromatic color scheme). Looking for effective, ready-made color palettes? Check out sites like color-hex.
- Avoid reds with greens. Seven to 10 percent of men are red-green colorblind and can’t tell the difference between the two. Avoid putting red and green on the same visualization.
The map above effectively uses color to show data on chronic illness in the United States. It uses a diverging color palette to emphasize contrast (e.g., above/below average). According to data from a Centers for Disease Control survey, states colored in blue tones have, on average, more healthy adults while those colored in orange hues have, on average, fewer healthy adults. Gray states are somewhere in the middle.
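A diverging palette like this can be generated with a little arithmetic. The sketch below, in Python, blends from a neutral gray toward orange for values below a midpoint and toward blue for values above it. The hex colors, the function names, and the 0 to 100 scale are assumptions for illustration, not the colors used in the map.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' into a tuple of three integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert a tuple of three integers back into '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def diverging_color(value, low, mid, high,
                    low_color="#e66101", mid_color="#d9d9d9",
                    high_color="#2b6cb0"):
    """Blend from gray at the midpoint toward orange (below the midpoint)
    or blue (above it). Values outside [low, high] are clamped."""
    value = max(low, min(high, value))
    if value >= mid:
        t = (value - mid) / (high - mid) if high > mid else 0.0
        end = high_color
    else:
        t = (mid - value) / (mid - low) if mid > low else 0.0
        end = low_color
    start_rgb, end_rgb = hex_to_rgb(mid_color), hex_to_rgb(end)
    blended = tuple(round(s + (e - s) * t) for s, e in zip(start_rgb, end_rgb))
    return rgb_to_hex(blended)


# Hypothetical scores on a 0-100 scale with 50 as the average
for score in (0, 25, 50, 75, 100):
    print(score, diverging_color(score, low=0, mid=50, high=100))
```

Orange and blue are used here, rather than red and green, so the scale stays readable for viewers with red-green color blindness.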
Suggestment 6: Delete Non-data Ink (or Pixels)
This data tip comes from the grandfather of modern data visualization: Edward Tufte. He originally recommended the elimination of any non-data ink from data visualizations, although today we might think more in terms of pixels than ink. The idea is to remove any distractors from the story that a data visualization shows. Such distractors can include bells and whistles such as bars on a bar chart drawn as people or buildings (Tufte called this “chartjunk”).
But there are more subtle distractors like graph lines and background color. The two images here show the same data, but the one on the right is stripped down to the essentials: no graph lines, no axis titles, only the visual information necessary to see the slope and to quantify it.
Suggestment 7: No Unjustified 3D
The idea for this and the next two tips comes from Tamara Munzner, a professor at the University of British Columbia’s Department of Computer Science and author of Visualization Analysis and Design. As may be clear by now, the idea is to focus on what matters—the story the data is telling us—without any unnecessary distractions.
Making visualizations look three-dimensional is almost always a distraction and a distortion. To make something look 3D, you have to use a technique called “foreshortening” which means the parts that are supposed to be perceived as closer in space are larger (see slice B in the image on the left), and parts that are supposed to be perceived as farther away are smaller (slices A and C). The angles represented on the 2D chart on the right show that the slices are actually quite similar in size. And if we were better at judging angles (see Suggestment 2 above), we would know that slices A and B are the same size.
Is it ever a good idea to make data visualizations look 3D? Yes, but rarely. The rule is simple. Only use 3D visualizations for 3D spatial data such as a diagram showing airflow over a spacecraft. Otherwise, keep it flat.
Suggestment 8: Eyes Beat Memory
It’s easier to compare two things you can see at the same time than to compare something you can see to something you can only remember. When several small visualizations are placed side by side (called “small multiples”), you can see the power of eyes over memory.
For example, in this great small-multiples visualization by Doug McCune, you can quickly scan the images to make easy comparisons. In each chart, the X-axis shows time of day and the Y-axis shows number of crimes. Daytime crimes are displayed with yellow bars in the top half of the chart and night-time crimes with blue bars on the bottom. It’s easy to see that driving under the influence and drunkenness occur more often during the night and trespassing and suicide occur more often during the day. It would be much harder to draw this conclusion flipping through pages or clicking through screens.
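To get a feel for small multiples without any charting software, here is a rough text-only sketch in Python. Each row is a tiny chart drawn with block characters, and the rows are stacked so the eye can compare shapes directly. The counts are invented for illustration and are not taken from McCune's visualization. The function names are my own.

```python
BLOCKS = " ▁▂▃▄▅▆▇█"  # nine heights, from empty to full


def sparkline(values, top):
    """Draw one tiny chart: one block character per value, scaled to top."""
    if top <= 0:
        return " " * len(values)
    steps = len(BLOCKS) - 1
    return "".join(BLOCKS[round(v / top * steps)] for v in values)


def small_multiples(series, shared_scale=False):
    """Stack one sparkline per category. By default each row is scaled to
    its own maximum so shapes are comparable. Set shared_scale=True to
    compare absolute sizes instead."""
    overall = max(max(values) for values in series.values())
    width = max(len(name) for name in series)
    rows = []
    for name, values in series.items():
        top = overall if shared_scale else max(values)
        rows.append(f"{name:<{width}} {sparkline(values, top)}")
    return rows


# Invented counts for four blocks of the day: night, morning, afternoon, evening
series = {
    "DUI": [9, 2, 1, 6],
    "Trespassing": [1, 5, 6, 2],
}
rows = small_multiples(series)
print("\n".join(rows))
```

Scaling each row to its own maximum makes patterns easy to compare across categories. A shared scale makes the sizes comparable instead, so pick whichever comparison matters most to your audience.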
Suggestment 9: Zoom In
Munzner’s rule of thumb, which echoes Ben Shneiderman’s visual information-seeking mantra, is: “Overview first, zoom and filter, details on demand.”
The idea is to give the big picture first in a series of data visualizations.
If, for example, you are showing fundraising data by county, first provide a map of the counties within the region of interest. Color the counties using a monochromatic color scheme, with lighter colors showing lower fundraising amounts and darker colors showing higher amounts. This broad view will identify areas of concern and generate questions.
A viewer who wants to boost fundraising in counties with low amounts might ask questions like: Which neighborhoods within these counties have particularly low fundraising amounts? Who lives in these neighborhoods? Are they mostly families? Retirees? What types of nonprofits do they give to?
Some types of data visualization software (like Tableau) allow you to create interactive visualizations and thus easily zoom into the data. For example, a viewer clicking on a portion of the fundraising map can see fundraising data by neighborhood. Filters can be added to visualizations so that fundraising amounts for certain demographic groups can be seen.
A good strategy is to create an overview visualization first and then share it with those who represent your intended audience to see what questions it generates for them. Then add filters and more detailed visualizations that allow viewers to address these questions.
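Behind an interactive map, the overview, the zoom, and the filter are the same calculation run at different levels of detail. Here is a minimal sketch of that idea in Python. The county names, the neighborhoods, the field names, and the amounts are all hypothetical.

```python
from collections import defaultdict


def total_by(records, group_field, **filters):
    """Sum donation amounts per value of group_field, counting only the
    records that match every filter (for example county='Adams')."""
    totals = defaultdict(float)
    for record in records:
        if all(record.get(field) == wanted for field, wanted in filters.items()):
            totals[record[group_field]] += record["amount"]
    return dict(totals)


# Hypothetical donation records
records = [
    {"county": "Adams", "neighborhood": "Eastside", "household": "Retirees", "amount": 100.0},
    {"county": "Adams", "neighborhood": "Eastside", "household": "Families", "amount": 50.0},
    {"county": "Adams", "neighborhood": "Hilltop", "household": "Families", "amount": 25.0},
    {"county": "Brown", "neighborhood": "Riverside", "household": "Retirees", "amount": 400.0},
]

# Overview first: one total per county
overview = total_by(records, "county")

# Zoom: neighborhoods within the county that raised less
zoom = total_by(records, "neighborhood", county="Adams")

# Filter: the same neighborhoods, families only
detail = total_by(records, "neighborhood", county="Adams", household="Families")

print(overview, zoom, detail, sep="\n")
```

Each result could feed its own chart: the overview colors the county map, and the zoomed and filtered totals answer the follow-up questions your audience raises.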
Suggestment 10: Viz Is Not Always Better
I’ve talked about how we can use our visual superpowers to quickly grasp data. Letters and numbers, as I have explained, are more difficult for our brains to process than visual cues such as length, color, or shape. So is it always better to represent data using such cues? Or are there times when a spreadsheet is a better tool for the job? The answer: spreadsheets (aka tables) are better than visualizations particularly when:
- You have an already-engaged but diverse audience. These are folks who are highly motivated to access certain data and won’t be annoyed by having to find the data on a table. Moreover, tables use paper or screen real estate efficiently. You can fit a lot of rows and columns in a small space allowing users with different interests to find data in a single table.
- You have many units of measure. For example, you want to show the height, weight, location, and satisfaction level of one hundred participants in a healthy-eating program. This data involves four units of measure: inches, pounds, latitude/longitude, and survey ratings. Such complexity is difficult to represent on a single visualization, but you can do so in a single table quite easily.
Consider these suggestments as a checklist. When revising your viz, review the list and see how your viz does on these ten tips. Chances are that if you follow all or most, you will have a viz that is clear and easy to consume. And if you don’t, it may not be consumed at all.