Data cannot speak for themselves. Data are never neutral. Data have biases and limitations, vulnerabilities and uncertainty. And when data are put into a position of power, data are often twisted and contorted in countless ways. As the economist Ronald Coase once said, “If you torture the data long enough, it will confess.”
Learning how to truly see data is one of the hardest parts of doing data science. The first step is recognizing that data cannot be taken for granted. Data must be coaxed into showing their weaknesses, and those weaknesses are not always obvious. As boyd argues, visualization is a tool that can help reveal data’s weaknesses, or obscure them.
Closely tied to all of this is the context in which data are used.
The work of visualization — like the work of animation — is fundamentally about communication. Even if your data are nice and neat, the choices you make in producing a visualization of that data shape how those data will be perceived. You have the power to shape perception, whether you want to or not. There is no neutral visualization, just as there is no neutral data. Thus, in building your tools, you must account for your interlocutors. What are you trying to convey to them? When do you need to stretch the ball so that the viewer sees the information as intended?
The challenge we face is staying conscious of these limitations and of the ways in which data are politicized and perverted.
When you build a visualization tool, you will want to see it for all that it can be, for all that it can do.
It is interesting to consider this in relation to my work with schools and absence data. I am often asked to answer seemingly simple questions, such as how many days X was away, or how many days X was late. One of the biggest problems is that these questions assume the data have been entered both correctly and uniformly. For example, what counts as ‘late’? Is it when a student arrives at 9:15am? What about 10:30am, after an appointment? Likewise, when you count absences or attendances, do you include excursions? Days when students have been asked to work from home? Days when students have stayed home because they had symptoms? I feel this only becomes more confusing when you step back and look at the numbers in aggregate.
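To make that ambiguity concrete, here is a minimal Python sketch that applies several plausible definitions of ‘late’ and ‘absent’ to the same week of records. Everything in it is hypothetical: the seven records, the status codes (`absent`, `symptoms`, `home_learning`, `excursion`), the 9:00 and 9:30 cut-offs, and the helper names `count_late` and `count_absent` are invented for illustration, not taken from any real school or attendance system.

```python
from dataclasses import dataclass
from datetime import time
from typing import Iterable, Optional

@dataclass
class Record:
    """One hypothetical attendance entry for a single student on a single day."""
    day: int
    status: str                     # hypothetical codes: "present", "absent", "symptoms", ...
    arrival: Optional[time] = None  # None when the student never arrived on site
    reason: str = ""                # free-text note, e.g. "appointment"

# Invented records for illustration only; not drawn from any real school.
RECORDS = [
    Record(1, "present", time(8, 50)),
    Record(2, "present", time(9, 15)),
    Record(3, "present", time(10, 30), reason="appointment"),
    Record(4, "excursion"),
    Record(5, "symptoms"),
    Record(6, "home_learning"),
    Record(7, "absent"),
]

def count_late(records: Iterable[Record], cutoff: time,
               excuse_appointments: bool = False) -> int:
    """Count days the student arrived strictly after `cutoff`.

    If `excuse_appointments` is True, late arrivals noted as appointments
    are not counted.
    """
    total = 0
    for r in records:
        if r.arrival is None or r.arrival <= cutoff:
            continue  # not on site, or on time under this rule
        if excuse_appointments and r.reason == "appointment":
            continue  # this rule treats appointment arrivals as excused
        total += 1
    return total

def count_absent(records: Iterable[Record], absent_codes: set[str]) -> int:
    """Count days whose status code is treated as an absence under this rule."""
    return sum(1 for r in records if r.status in absent_codes)

# Three equally defensible (and equally hypothetical) definitions of "absent".
ABSENCE_RULES = {
    "absent code only": {"absent"},
    "plus illness": {"absent", "symptoms"},
    "any day off site": {"absent", "symptoms", "home_learning", "excursion"},
}

if __name__ == "__main__":
    print("late, cutoff 9:00:", count_late(RECORDS, time(9, 0)))
    print("late, cutoff 9:00, appointments excused:",
          count_late(RECORDS, time(9, 0), excuse_appointments=True))
    print("late, cutoff 9:30:", count_late(RECORDS, time(9, 30)))
    for name, codes in ABSENCE_RULES.items():
        print(f"absent ({name}):", count_absent(RECORDS, codes))
```

Nothing about the underlying records changes between runs, yet the sketch reports either 1 or 2 late days and anywhere from 1 to 4 absences, depending only on which rule is applied. Once such counts are summed across a whole cohort, the definitional choices disappear from view entirely.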