Information Visualization: Challenge for the Humanities

 

Maureen Stone, StoneSoup Consulting

 

Final draft...last changed December 22, 2008 10:04 PM PDT

comments welcome
 

Challenge: To make fluency in the tools of digital collaboration, visualization and analysis a standard skill for the humanities scholar. This will in turn drive innovation in humanities research, including:

  • Developing new genres for complex information presentation that can be shared, analyzed and compared.
  • Creating a literacy in information analysis and visualization that has the same rigor and richness as current scholarship.
  • Expanding classically text-based pedagogy to include simulation, animation, spatial and geographic representation.

Tools influence both the creation and the analysis of information. Whether using pen and paper, Microsoft Office, or Web 2.0, scholars, who are both creators and consumers, base their process, production and questions on the capabilities their tools offer them. Digital archiving and the interconnectivity of the web provide new challenges in terms of quantity and quality of information. They also create a new medium for presentation, and a foundation for collaboration independent of physical location.

Information in digital form provides unequalled opportunity to combine, distill, present and share complex ideas. The challenge is to do so in a way that balances complexity with conciseness, accuracy with essence, that speaks authoritatively, yet inspires exploration and personal insight. This presentation will go beyond illustrated texts organized as pages, or even as web pages to include interactive, graphical representations based on data.

While literacy in all new media will be crucial for digital scholarship of the future, in this white paper I will focus on information visualization, or the creation of graphical representations of data, which harness the pattern recognition skills of the human visual system. The skills that support information visualization include data analysis, visual design, and an understanding of human perception and cognition.

As my specific expertise is color, I will include both the use of color in visualization and the visualization of color in art and history as an example.

What is information visualization?

In computer science research, the term visualization describes the field of study that uses interactive graphical tools to explore and present digitally represented data, which might be simulated, measured, or archived.

The visualization field split off from computer graphics in the mid-1980’s to distinguish graphics rendered from scientific data from algorithms for creating images of natural scenes, many of which were a blend of scientific, artistic, and technically pragmatic techniques. A further division occurred in the early 1990’s to distinguish scientific, or physically based data from abstract “information visualization” such as financial data, business records, or collections of documents. More recently, the term “visual analytics” was coined to emphasize the role of analysis, especially for extremely large volumes of data. While valuable to provide different foci for publication, many in the field (myself included) find the separation into different communities more pragmatic than intellectually important.

The primary publishing venue for research in visualization is the IEEE Visualization conferences and the supporting IEEE publications, Transactions on Visualization and Computer Graphics (TVCG), and IEEE Computer Graphics and Applications (CG&A). Visualization relevant work can appear, however, in many other fields, including computer graphics, human-computer interaction, vision, perception and digital design, as well as fields that extensively use visualization, such as cartography and medicine.

Visualization is certainly not unique to the computer science domain. Edward Tufte has written a series of books on the visualization of information that are considered seminal in the field. Tufte’s books are full of fascinating examples of how information can be graphically presented. Tufte also lectures extensively on the topic, forcefully promoting his personal (usually excellent) views on the best way to present information. Tufte’s principles for excellence in visualization emphasize conciseness, clarity and accuracy.

Graphic designers will assert that the graphical presentation of information is their fundamental goal, which they achieve based on principles basic to art and design: hierarchies of importance, spatial relationships, layering, contrast vs. analogy, legibility, and readability. These are constructed from careful choices of positioning, shape, color, size, and typography. Cartographers combine these same elements within their discipline to create exemplars of information display, as do medical illustrators and other specialists working within their fields of study.

Historical visualization

A complete history of visualization is beyond my scope, but here are some historical examples that are often cited in talks and classes on visualization. Most have been used as examples by Tufte in his books.

William Playfair (1759-1823) is credited as the father of graphical methods in statistics. His inventions include the bar chart, the pie chart and time-series graphs. His goals were political, his focus government spending.

John Snow (1813-1858) used a dot plot of cholera cases overlaid on a London street map in 1854 to discover and illustrate the source of the contamination.

Charles Minard (1781-1870) created an information graph published in 1869 illustrating Napoleon's disastrous march to Moscow in the Russian campaign of 1812. The flow diagram, plus its paralleling temperature diagram, poignantly illustrates the number of men that died as the temperature dropped to bitter levels.

The value of digital visualization

Digital visualization enables creation and exploration of large collections of data, such as the CLIR archives. I would argue, however, that the tools for collection are far more successful to date than those for exploration. Other than size, what value does digital visualization provide?

Digital visualization enables interactive exploration. Compare spreadsheets with graphing capabilities (such as Microsoft’s Excel) and dynamic maps (such as Google Maps) to their static, paper-based versions. I would argue these two examples are probably the most influential forms of digital information visualization yet discovered.

Digital visualization can be combined with simulation to simultaneously explore many potential solutions along with the probabilities and dependencies that influence these solutions. Brain surgeons, for example, can use the data from a CAT scan to explore different approaches to removing a tumor. Similarly, such data can be used to create simulators for training.

Digital visualization can be used to monitor continuously changing streams of data. Many major metropolitan areas have a website that shows traffic flow in real time, such as the one provided by WSDOT for the Seattle area.

Digital visualization facilitates collaboration. Collaboration in the sense of sharing is fundamental to the web, and to digital archiving. The website Many Eyes, however, provides a forum for people to upload their data, create different visualizations, and for other people to comment on them. Like photo and video sharing, there seems to be an irresistible attraction to posting and getting feedback on one’s creations and discoveries.

The dark side of information visualization

My concern is that digital tools are out-running literacy in the art and science of graphically presenting information. Put more bluntly, it is too easy to make pictures that confuse, miscommunicate or downright lie, either inadvertently or deliberately. Tufte’s books show many examples of graphical distortion created by inaccurate uses of scale and perspective, extraneous graphical elements (“chart junk”) and improper presentation of data, such as a graph of costs over time that does not adjust the dollar amounts for inflation.
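The inflation example is easy to make concrete. The following is a minimal sketch, not a method taken from Tufte: the function name, the cost figures and the price index are all invented for illustration. Each nominal amount is scaled by the ratio of the base-year index to that year's index, giving constant (base-year) dollars.

```python
def to_constant_dollars(nominal, index, base_year):
    """Convert nominal amounts to constant (base-year) dollars.

    nominal: dict mapping year -> nominal dollar amount
    index:   dict mapping year -> price index value (hypothetical here)
    """
    base = index[base_year]
    return {year: amount * base / index[year] for year, amount in nominal.items()}


# Invented figures for illustration only; not real cost or price-index data.
nominal_costs = {1980: 100.0, 1990: 150.0, 2000: 200.0}
price_index = {1980: 50.0, 1990: 75.0, 2000: 100.0}

real_costs = to_constant_dollars(nominal_costs, price_index, base_year=2000)
for year in sorted(real_costs):
    print(year, nominal_costs[year], real_costs[year])
```

With these made-up numbers the nominal series doubles while the adjusted series is flat, so a graph of the unadjusted amounts would show growth that is entirely an artifact of inflation.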

Even Tufte is not immune to the risk of misusing visualization. After the Challenger disaster, he analyzed and redesigned the graphs used by the Morton Thiokol engineers to communicate their analysis and concluded that if they had visualized their data more effectively, the risk of launching in cold weather would have been clear and persuasive. This example is frequently used (including by myself) to dramatically illustrate the power of visualization. I recently uncovered a substantial rebuttal by the engineers, which argues that Tufte did not fully understand the context or the data, and is therefore guilty of falsely making the engineers responsible for the disaster.

A common criticism of visualization tools, both research and commercial, is that they do not embody basic visual design principles. By default, colors are too bold, lines are too thick, and fonts are too small. The result is cluttered, ugly, and at worst, misleading. The most recent release of Microsoft Office, with its ubiquitous tools Excel and PowerPoint, touts its refined graphics, but the result is a disaster from a visualization standpoint. Colorful, transparent, rotating 3D bar charts make good “eye candy” but do not communicate the underlying data any more clearly than a simple 2D graph. In fact, they are worse, because the 3D perspective distorts the numeric relationships that are represented by the relative heights of the bars.

Stephen Few is a consultant working in the field of business intelligence whose primary mission is to improve the presentation of business graphics. Stephen’s website has many examples of terrible visualizations that he has analyzed and redesigned, most made by commercial systems. His book, Show Me the Numbers, teaches how to concisely and effectively communicate with simple charts and graphs. This requires understanding the data, the audience, and the problem being solved. These skills must be taught, and I would argue are important for everyone to learn. (Stephen has an online Graph Design IQ Test to demonstrate this point)

People’s response to graphics is not purely intellectual; there is a strong visceral and emotional response, as is well appreciated by those in the advertising and entertainment industries. Pictures made from data are no exception, so both authors and consumers need to be educated about the impact of choices in layout, color, typography and imagery, topics more commonly taught in art and design.

Creating excellent new tools for visualization requires technical skills, visualization skills, and a deep understanding of the problems and tasks critical for a particular domain. One common criticism of visualization research is that it presents techniques that are technically interesting, but do not provide solutions to real problems. This is a classic problem in research tool and system design, where technologists have a vision, based on what is computationally possible, but lack an understanding of what is really needed to solve the problems of their potential users. Potential users (or “domain experts”), however, can rarely concisely articulate their needs in a way that directly informs the technological development. Successful collaborations will blend the skills of both, but are all too rare.

Teaching information visualization

Information visualization is traditionally taught as a graduate level course in computer science departments. The focus is on teaching students already fluent in computer systems and technology how to create innovative information visualization tools. Often, the text is Colin Ware’s Information Visualization: Perception for Design, plus Tufte’s Envisioning Information, augmented by selected research papers. Students typically create a project for their grade.

More recently, there have been courses designed for undergraduates, often in disciplines other than computer science. With a colleague, Polle Zellweger, I designed and taught an information visualization course as a 4th year undergraduate elective in the University of Washington iSchool (Info424 2006, 2007). We based our course on others, including the one taught by Marti Hearst at UC Berkeley (UC Berkeley CS558), and Melanie Tory at the University of Victoria. We collected material more widely, especially from Pat Hanrahan (Stanford CS448B), John Stasko (Georgia Tech CS7450), and Tamara Munzner (University of British Columbia CPSC 533C).

We found it an enormous challenge to select and focus the material to be taught. Is the goal to teach the students to design visualizations from basic principles, or to become fluent in existing tools? Should they focus exclusively on data visualization, or should the course include general topics in visual communication as well? Is the primary goal to make them aware of the broad range of visualization models and tools or to teach them specific skills, such as how to make good data graphs as taught by Few?

Visualization is a skill that must be practiced for fluency, but that takes time. Art and design schools teach visual communication by making students create, critique, and redesign. They assume a fluency in whatever medium is being used. Digital visualization can be taught the same way, but a single class will have to be very focused on specific tools and visual forms. Data visualization requires a good understanding of data, how it is structured, basic data manipulation and basic statistical analysis. Interactive visualization requires understanding of basic human-computer interaction techniques and the principles that underlie them.

Our choices are reflected in the class websites, but I do not believe we have in any way solved this problem, which is a critical one for iSchools. Our efforts to provide concrete skills focused on data graphics, for which we used Stephen Few’s book and taught the students how to use the commercial visualization product from Tableau Software. While important, this is too narrow a focus for visualization literacy in iSchools and the humanities. We also used Tufte’s Envisioning Information, for its rich insights, but that does not provide any exposure to interactive and animated visualization. Over the two years, we tried several approaches for including interaction principles and skills, relying heavily on examples found on the web, but were never entirely satisfied.

Color in visualization, and the visualization of color

Color is considered a key element in visualization. It can be used to label, to quantify, to focus attention, and to contribute to the visceral sense of style. The perception and cognition of color is also important and is strongly linked to its usefulness in visualization (as well as our overall view of nature and the world). The mechanisms for creating color are fascinating and complex, from the displays in nature to the technology of paints, dyes, film and digital media.

Like visualization, color can be viewed from a scientific, artistic, and technical perspective. Using color effectively requires insight and practice. For the second half of this whitepaper, I am going to discuss color literacy as a sub-specialty of visualization literacy, starting with an example.

The craft of color: an example

In his book Envisioning Information, Tufte attributes the excellence of Swiss cartography to “good ideas executed with superb craft.” The resulting maps pack an immense amount of information into an elegantly useful visual package. Traditionally, I would now include an image of such a map as an illustration, but it would not capture the true beauty of the original, and at worst, would give a completely incorrect impression of its appearance.

Maps are traditionally designed to be printed on paper, with the specific technique depending on the age of the map. I believe the map Tufte admires was designed to be printed on an offset printing press. An offset press prints in inks of different colors, but with no gradation in the color, in contrast to film or displays. For any given spot, ink is either present or not, with high frequency patterns called “screens” or “halftones” used to vary the lightness. Offset inks can be any of a wide range of colors, and can be either transparent or opaque.
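The idea that a press varies lightness only by the pattern of ink, never by its density, can be illustrated in a few lines. This is a rough sketch under stated assumptions: it uses a 4x4 ordered-dither (Bayer) threshold matrix as a stand-in for a halftone screen, whereas real print screens use clustered dots set at particular angles. The function names are hypothetical.

```python
# 4x4 Bayer threshold matrix (values 0..15), a standard ordered-dither pattern.
# Assumption: used here as a simple substitute for a real halftone screen.
BAYER_4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def halftone(coverage, width, height):
    """Return rows of booleans: True where ink is printed.

    coverage: desired fraction of ink, 0.0 (bare paper) to 1.0 (solid ink).
    """
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            # Each cell gets a fixed threshold; ink is laid down only
            # where the requested coverage exceeds it.
            threshold = (BAYER_4[y % 4][x % 4] + 0.5) / 16.0
            row.append(coverage > threshold)
        rows.append(row)
    return rows


def ink_fraction(rows):
    """Fraction of cells that received ink."""
    cells = [cell for row in rows for cell in row]
    return sum(cells) / len(cells)


pattern = halftone(0.25, 8, 8)
for row in pattern:
    print("".join("#" if cell else "." for cell in row))
print(ink_fraction(pattern))
```

Every cell is either inked or not, yet viewed from a distance the sparse pattern reads as a light tint of the ink color. It also shows why a fine line rendered through a screen loses its crisp edge: the line is broken into the same dot pattern.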

The high-quality printed map that Tufte admires would be produced so that each different color was printed as a separate layer, using as many as a dozen different printing plates, each with its own colored ink. The design of the map would take every possible advantage of this process. Each information layer, whether contour lines, grids, text, or the shading to indicate topography, would be crafted to print beautifully.

A commercial offset printer does not have the luxury of unlimited numbers of plates and inks, but instead uses exactly four standard colors: cyan, magenta, yellow and black. To reproduce a map in a textbook, for example, requires simulating the original map colors by halftoning and combining the standard four colors. Some of the original colors may not be accurately reproducible, which can change the effectiveness of the color encoding. Halftoning also introduces texturing. As a result, symbols that were crisp and legible when printed with a solid ink may become fuzzy and less easy to read. A map designed for a commercial offset press, however, would be crafted to ensure that fine lines and text were printed with dark, sufficiently solid colors and that all colors used in the color encoding would print reliably and distinctly.
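To see why a color that was one solid ink on the original becomes a mixture of screened inks in four-color reproduction, consider the textbook RGB-to-CMYK conversion below. It is a deliberately naive sketch: production workflows rely on measured ink and paper characterizations (for example ICC profiles) rather than this arithmetic, and the sample color is hypothetical.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (each 0..1) to CMYK (each 0..1) conversion.

    Black (K) replaces the gray component shared by all three channels.
    Assumption: idealized inks; real presses need measured profiles.
    """
    k = 1.0 - max(r, g, b)
    if k == 1.0:
        # Pure black: avoid dividing by zero, use black ink only.
        return (0.0, 0.0, 0.0, 1.0)
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return (c, m, y, k)


# A hypothetical map green, not taken from any real map.
map_green = (0.2, 0.6, 0.4)
print([round(v, 2) for v in rgb_to_cmyk(*map_green)])
```

Any of the four values that is neither 0 nor 1 must be printed as a halftone screen, so a contour line in such a color is built from overlapping dot patterns rather than from a single solid ink.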

Reproducing Tufte’s map on a display introduces the complex color transformation problems between displays and print, and also the relative crudeness of the display resolution. Features smaller than a pixel must either become larger or blurred, resulting in illegible or overly bold contour lines, symbols and text. Maps designed for displays, however, replace these fine features with the ability to dynamically zoom and label. Colors, too, can be dynamic, adding a new dimension to the color encoding.

In all cases, visual perception constrains the choice of line weights, fonts and colors. The visual factors that affect the legibility of text, symbols and fine lines are spatial acuity and luminance contrast. Spatial acuity is the ability to focus on and discriminate fine patterns of lines (edges), and contrast is the difference in perceived lightness (luminance) between a foreground object and its background. The choice of colors for rendering and encoding must consider not only luminance contrast, but also the effects of simultaneous contrast and spreading.
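Luminance contrast can be estimated numerically. One widely used formulation, shown here as an illustration rather than as the measure used in this paper, is the relative luminance and contrast ratio defined in the WCAG 2.x web accessibility guidelines for sRGB colors. The sample colors are hypothetical, and the sketch uses the sRGB standard's breakpoint of 0.04045, which gives the same results for 8-bit values as the figure quoted in WCAG.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (0..1)."""
    c = channel_8bit / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) triple with 8-bit channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """Contrast ratio between two colors, from 1 (identical) to 21."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


white = (255, 255, 255)
print(contrast_ratio((0, 0, 0), white))        # black text on white
print(contrast_ratio((230, 230, 230), white))  # a pale gray grid line
print(contrast_ratio((255, 255, 0), white))    # saturated yellow text
```

Yellow on white is vivid in hue but close to white in luminance, which is why colored text chosen for its hue alone is often hard to read. A single ratio does not capture simultaneous contrast or spreading, which depend on the surrounding colors.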

What can we learn from this example, other than that good color is hard? First, it should be clear that designing well with color requires knowledge of the materials used to produce it, and some practical knowledge of human visual perception. It should also be clear that what makes color aesthetic and effective depends on the technical properties of the medium (and also the culture and economics that support it). Finally, it serves as a warning about the complexity of archiving color, as viewing its digital rendering will not be the same as viewing the original object.

Color design guidelines: Do no harm

Tufte’s primary rule for color design is “Do no harm.” The more complete quote talks both of the power of color in visualization and its ability to confuse, and therefore recommends using color sparingly and only for very specific purposes he calls “fundamental uses.” These are: “to label (color as noun), to measure (color as quantity), to represent or imitate reality (color as representation), and to enliven or decorate (color as beauty).”

Consider this map of St. Thomas designed by the National Park Service. Color is used to label the different features, as shown in the legends. These colors are chosen to reflect reality, as in the mangroves vs. the water. The depth of the water is indicated by its shade of blue. The contour shading indicates the terrain, a clever use of stylized color to simulate 3D. Color labels the size of the roads. The dark color of the main road is achieved by outlining it in black, a clever use of simultaneous contrast. Most text is black, and sits visually as a separate labeling layer. The ferry docks are emphasized with black boxes containing white text. The entire map is beautiful, yet functionally designed. Here is another map of the same region, showing a different rendering (and somewhat more information). Its color encoding is similar, though the overall presentation is more cluttered.

Learning how to do excellent visual design takes dedication, skill and a lot of practice. With appropriate tools and guidelines, learning to avoid making awful visualizations may be simpler.

Example: voting system guidelines

I have recently completed a contract for the National Institute of Standards and Technology (NIST) to write a set of guidelines for the use of color in voting systems. A primary motivation was to ensure accessibility for individuals with color vision deficiencies, but we were able to create guidelines that should greatly improve the effective use of color for everyone. The irony is that color use in paper ballots is usually adequately constrained by the economics of printing: white paper, black text, perhaps one other color for labeling. But, given a color digital display in a voting kiosk, developers now have the opportunity to use, and to grossly misuse, color.

The goal was to create a simple, testable set of rules that would eliminate the gross misuses of color and encourage its proper use. Our first goal was legibility, which is most easily achieved by severely restricting the use of colored text. Our second goal was avoiding the “color chaos” caused by the indiscriminate use of color. For this we required a consistent mapping between color and its function (guidelines).

Example: Make the easy choice the right one

Tools for creating visualizations have the opportunity to encode good practice in their design. An example is the system created by Tableau Software for data exploration and visualization.

Tableau Software was the outgrowth of research at Stanford University on data visualization and analysis. It is a software package run on a workstation that makes it easy to interactively create charts, graphs and data maps to explore a database of numerical and categorical information. Fundamental to the design of the user-interface for this system is the desire to make it easy for the user to create effective, aesthetic visualizations.

I worked with Tableau to design the colors, and equally important, the interfaces used for assigning colors to their data visualizations, which consist of tables, graphs, scatter plots and bar charts. As well as designing color palettes that were legible, and uniquely colored (for labels), or smoothly varying (for quantity), I worked with the developers to design user interfaces that encouraged good use of color.

Unlike most color selection tools, which allow users to choose a color point in some color space, the guiding principle for the Tableau UI is to map a set of colors to data. For labeling, users first select a palette, or set of coordinated colors, which can be applied in one operation to the entire dataset. Users can also select individual colors from different palettes, or even customize individual colors using a traditional color tool. But, the simplest operation is to accept the default palette, or to choose a similarly well-crafted one. A similar approach was used for the colored ramps used to map colors to data.
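The principle of mapping a set of colors to data, instead of picking colors one at a time, can be sketched as follows. This is not Tableau's interface or code: the function names, the palette values and the linear RGB interpolation are all assumptions made for illustration, and a well-crafted ramp would normally be designed in a perceptually based color space.

```python
def assign_palette(categories, palette):
    """Map each distinct category to a palette color, in first-seen order.

    Colors repeat if there are more categories than palette entries.
    """
    mapping = {}
    for category in categories:
        if category not in mapping:
            mapping[category] = palette[len(mapping) % len(palette)]
    return mapping


def ramp(value, low, high, start_color, end_color):
    """Linearly interpolate an RGB color for a value in [low, high]."""
    t = (value - low) / (high - low)
    t = min(1.0, max(0.0, t))  # clamp values outside the data range
    return tuple(round(s + t * (e - s)) for s, e in zip(start_color, end_color))


# Hypothetical palette and data, for illustration only.
palette = [(31, 119, 180), (255, 127, 14), (44, 160, 44)]
regions = ["East", "West", "East", "South", "West"]
print(assign_palette(regions, palette))          # color as label
print(ramp(5, 0, 10, (0, 0, 0), (200, 100, 50)))  # color as quantity
```

The design point is that the user makes one decision (which palette or ramp) and every data value receives a coordinated color, so the easy path produces a consistent encoding.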

My colleagues at Simon Fraser University and I have begun some studies of grids and other visual reference structures which are traditionally designed to be low contrast, yet legible (abstract). A key way graphic designers layer information without causing visual clutter is to carefully control the relative contrast of the different data elements. These can be carefully designed for a specific set of information and medium, but in digital visualization, both are dynamic. We seek ways to understand and quantify these subtle aspects of visual representation required in dense information displays such that they can be algorithmically manipulated to match human requirements in interactive and dynamic conditions.

Our approach to this problem is not to characterize “ideal” or “best,” but instead to define boundary conditions, outside of which the presentation is clearly bad. We reason that the best solution will always be contextual, as well as a matter of taste. Boundary conditions, however, are more likely to have simple rules that can easily be incorporated by engineers and researchers, and less likely to be influenced by taste.

Visualizing color

That colors change when reproduced is not new with digital media. Posters of great artworks provide only an impression of the original work. Such reproductions have value. The important thing is to understand their context and limitations, and then to augment them with additional analysis and information.

Even a crude reproduction can answer basic questions about form, layout, and even about color and shading. The change in painting style from medieval images of the Madonna (which are flat and feature a wealth of gold leaf), to a painting by Rubens, with its lush and subtle shading, should be clear in the most basic of reproductions. But, a comparison in any depth of 13th century colors to those of Rubens should be approached with caution, and should not depend on pictorial reproductions alone.

In Bright Earth, Philip Ball persuasively argues that to fully appreciate color in art requires an understanding of both the chemistry and economics of color; the Virgin's blue cloak colored with pigment made from ground lapis lazuli is not only beautiful but expensive, reflecting the status of the patron who commissioned it. In a digital visualization, we may not see the proper colors, but we could link to discussions of historical color, to a spectral analysis of the particular paint, and to a symbolic visualization of the color relationships in the painting.

Art curators and historians know that colors change over time, so that even the "original" seen today is not what was painted. A dramatic example is the discovery that Greek and Roman statues, whose white purity has been held as an artistic ideal for generations, were quite likely painted. These theories are supported by careful surface analysis of the stone, as well as historical references to painted, lifelike statues.

To illustrate the effect of the coloring, full sized models have been created and colored with historically accurate paints. Pictures of these reproductions with their shockingly bright colors are effective illustrations. Viewing the models themselves, however, will provide a much more accurate impression than any picture, just as viewing Michelangelo's towering statue of David is very different than looking at a picture of it. This is not just a limitation of imaging; it is a fundamental part of perception.

The digital data used to create the models could be used to create a virtual model in 3D, which could be dynamically colored to explore competing theories of coloring. It seems likely, for example, that the bold colors proposed so far are merely the undercoat of a more subtle coloring, and would have been refined with layers of sophisticated overpainting. 3D graphics models of antiquities are routinely used to illustrate and explore archeological data (CG&A Sept 2002, Canterbury, Digital Michelangelo). Differences in pigments, lighting, and painting styles could all be explored and compared.

A good example of digital color reconstruction is work on rejuvenating Seurat's Palette. The colors of the original painting, which hangs in the Art Institute in Chicago, have darkened and yellowed over time, especially those containing zinc yellow. By simulating the physical properties of this pigment, and translating them to color, Roy Berns and his colleagues have been able to simulate the original appearance of the painting (essay).

Summary: Become literate about data, skeptical about pictures

In summary, the effective distillation of knowledge from information requires tools, one class of which is the abstracted graphical presentations called information visualization. Digital information visualization provides potentially tremendous power, but also risk. As with all powerful tools, effective design and use require education, training, and iterative refinement.

The hypermedia and computational underpinnings of Web 2.0 provide more than adequate technology. What is needed is insight and good design to apply this power to studies in the humanities. Most critical is active involvement by those most interested in the results. Their information goals must drive the tools, not the inverse.

Literacy in information analysis requires a willingness to grapple with data in all its untidy forms, including missing, incomplete and contradictory entries. Good scholarship involves moving through layers of abstraction, using visualization to summarize, but also to "drill down" to the supporting information structures. Good tools for scholarship must always include ways to view the underlying assumptions, to visualize and examine alternative interpretations, to expose the degree of uncertainty.

The pictures generated as information visualization must be crafted with care and viewed with suspicion. Then they will correctly have the ability "to express 10,000 words."

Comments are welcome

Send email to
Please put CLIR in the subject to dodge my spam filter