In September 2014, David Guttman asked me to develop a tool that would display phylogenetic relationships in an easier-to-read format than the familiar tree paradigm currently in use. Tree-like branching diagrams are efficient and intuitive when there are a limited number of sequences, but they do not scale well to very large numbers of sequences. Problems arise from poor overall use of space, arbitrary clade ordering, labeling issues, and the high cognitive load required to visually parse hierarchical information. The figure below shows a typical problem with tree visualizations: the proximity of the circled nodes suggests a closer relationship between them than actually exists. In fact, the circled nodes belong to completely different clades, and it is only an artefact of how this tree was drawn that they appear to be close neighbours. If one imagines the tree as a hanging mobile in which the various branches and sub-branches are free to spin in any direction, a completely different set of neighbouring relationships would appear.
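To make the hanging-mobile point concrete, here is a small Python sketch; the example tree and the function name `leaf_orders` are hypothetical, chosen only for illustration. It assumes a rooted binary tree written as nested pairs and lists every leaf order obtained by spinning the two children of any internal node. A binary tree with n distinctly labelled leaves has n - 1 internal nodes, so there are 2^(n-1) equally valid drawings, and which taxa end up side by side depends only on which drawing happens to be chosen.

```python
from typing import List, Tuple, Union

# A leaf is a taxon name; an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_orders(tree: Tree) -> List[List[str]]:
    """Return every left-to-right leaf order produced by spinning
    (swapping) the two children of any internal node."""
    if isinstance(tree, str):
        return [[tree]]
    left, right = tree
    orders = []
    for left_order in leaf_orders(left):
        for right_order in leaf_orders(right):
            orders.append(left_order + right_order)  # children drawn as given
            orders.append(right_order + left_order)  # children spun around
    return orders


tree = (("A", "B"), ("C", ("D", "E")))
orders = leaf_orders(tree)
print(len(orders))  # 16: 5 leaves, 4 internal nodes, 2**4 drawings
```

In this example B and C sit next to each other in some drawings and far apart in others, even though the underlying tree never changes.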
Dr. Guttman proposed that we develop a tool for displaying hierarchical relationships in the form of a “topographical map”: a top-down view that displays hierarchy as nested shapes. The figure below shows a mock-up, created with PowerPoint, with a simple phylogenetic tree on the left and the same information presented as a topographic map on the right.
There is precedent for this way of displaying hierarchical information. Isabel Meirelles, a visual analytics researcher at OCAD University, recently shared with me a drawing by Maximilian Fürbringer, originally published in 1888, that is remarkably similar to Dr. Guttman’s mock-up. Although not a “topographical map”, Fürbringer’s images show slices through a tree diagram that depict nested groups of child nodes in a very similar way.
Using a combination of Processing.js and jsPhyloSVG, a JavaScript library for drawing phylograms, I developed a tool called Topographic Phylomap that reproduced Dr. Guttman’s mock-up almost exactly. An object is created for each OTU (operational taxonomic unit), with variables storing its current position, target position, stem length, parent id, sister ids and number of levels from the root. On each draw cycle, the target position of each node is adjusted so that it is attracted towards its sisters, its parent’s sisters, and its children, if it has any. Contact with any object that does not share the same grandparent or great-grandparent causes a node to bounce, based on a radius determined by the OTU’s cumulative distance from the root. The current position is continuously updated from the target position and a dampening factor that can be adjusted with a slider control. Everything remains in motion until each object settles into an optimal position and stasis is achieved. The end product is a topographic map-like image in which each element’s position is determined by its own attractions, repulsions and individual size.
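The draw cycle described above can be sketched roughly as follows. This is a minimal Python sketch, not the original Processing.js code: the `OTU` fields, the `pull`, `damping` and `gap` parameters, and the choice to resolve collisions by projecting target positions apart are all my assumptions. It also simplifies the grandparent/great-grandparent rule to an explicit `related` set, and it encodes one reading of the behaviour, namely that related OTUs may touch while unrelated ones keep a visible gap.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Vec = Tuple[float, float]


@dataclass
class OTU:
    """One circle on the map (hypothetical field names)."""
    id: str
    pos: Vec                      # current position
    radius: float                 # e.g. scaled from cumulative distance to the root
    attract_to: List[str] = field(default_factory=list)  # sisters, parent's sisters, children
    related: Set[str] = field(default_factory=set)       # ids this OTU may touch
    target: Vec = (0.0, 0.0)      # recomputed on every draw cycle


def step(otus: Dict[str, OTU], pull: float = 0.1, damping: float = 0.2,
         gap: float = 0.5) -> float:
    """Run one draw cycle and return the largest distance any OTU moved."""
    # 1. Attraction: move each target part of the way to its attractors' centroid.
    for o in otus.values():
        o.target = o.pos
        if o.attract_to:
            n = len(o.attract_to)
            cx = sum(otus[a].pos[0] for a in o.attract_to) / n
            cy = sum(otus[a].pos[1] for a in o.attract_to) / n
            o.target = (o.pos[0] + pull * (cx - o.pos[0]),
                        o.pos[1] + pull * (cy - o.pos[1]))

    # 2. Collision: push overlapping targets apart. Related OTUs may touch;
    #    unrelated ones must also keep an extra `gap` so the separation is visible.
    items = list(otus.values())
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            related = b.id in a.related or a.id in b.related
            min_dist = a.radius + b.radius + (0.0 if related else gap)
            dx, dy = b.target[0] - a.target[0], b.target[1] - a.target[1]
            dist = math.hypot(dx, dy)
            overlap = min_dist - dist
            if overlap > 0:
                # Unit vector from a to b; arbitrary if the targets coincide.
                ux, uy = (dx / dist, dy / dist) if dist > 0 else (1.0, 0.0)
                half = overlap / 2
                a.target = (a.target[0] - ux * half, a.target[1] - uy * half)
                b.target = (b.target[0] + ux * half, b.target[1] + uy * half)

    # 3. Damped update: each current position eases towards its target.
    moved = 0.0
    for o in otus.values():
        nx = o.pos[0] + damping * (o.target[0] - o.pos[0])
        ny = o.pos[1] + damping * (o.target[1] - o.pos[1])
        moved = max(moved, math.hypot(nx - o.pos[0], ny - o.pos[1]))
        o.pos = (nx, ny)
    return moved


def settle(otus: Dict[str, OTU], max_cycles: int = 1000, tol: float = 1e-6) -> None:
    """Keep stepping until nothing moves more than `tol` (stasis)."""
    for _ in range(max_cycles):
        if step(otus) < tol:
            break


otus = {
    "A": OTU("A", (0.0, 0.0), 1.0, attract_to=["B"], related={"B"}),
    "B": OTU("B", (6.0, 0.0), 1.0, attract_to=["A"], related={"A"}),
    "C": OTU("C", (3.0, 10.0), 1.0),  # unrelated, nothing attracts it
}
settle(otus)
```

In the usage example the two sisters A and B drift together until they just touch (centres 2 units apart), while the unrelated C, which nothing attracts and nothing collides with, stays where it is.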
The tool works well for small data sets. It makes related and unrelated items easy to identify because they preattentively “pop out” to the viewer: items that touch are related, items that are separated are not. These relationships can be identified at a glance without having to trace individual branches back to the root. Unfortunately, the tool becomes less useful with large data sets. The figure below shows the large phylogenetic tree depicted above, rendered with Topographic Phylomap. It is virtually unreadable, even with the interactive features that let users highlight items of interest by mousing over a node or clade of the tree, or over the alphabetized list of OTUs on the right.
Given that the intent of the project was to find a better way to depict large phylogenetic data sets, it was clear that a new approach was required. I spent some time exploring the Circle Packing method in D3.js because it appeared to solve many of the problems with my own “force-layout” algorithm.
Despite promising results with small data sets, this method also proved unsuitable for large biological data sets.
This is most likely because the majority of clade splits in phylogenetic data are binary. Unlike the data in the original circle packing sample above, in which each parent node contains multiple children that spread out to fill all the available space within their parent, phylogenetic parent nodes typically have only two children. With only two children to fill each parent, a lot of screen real estate is wasted, and to display every node in the set one must shrink the nodes so much that they become virtually unreadable. Interactive features, such as mouse-over highlighting and a zoom function that gives users a closer view of otherwise hidden data, were helpful but ultimately insufficient.
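The cost of binary splits can be quantified with a back-of-the-envelope calculation. Assume, for simplicity, a perfectly balanced binary tree, equal-sized leaves and no padding: two equal circles of radius r fit exactly inside a parent of radius 2r, filling only half of its area. Repeating that at every level means the leaves at depth d each have 1/2^d of the root's radius and together cover only 1/2^d of its area. The Python sketch below (the function name is hypothetical) simply evaluates that formula; real trees are unbalanced, so actual numbers will differ.

```python
def leaf_fraction(depth: int) -> float:
    """Share of the root circle's area covered by the leaf circles in an idealized
    circle packing of a perfectly balanced binary tree (no padding)."""
    # Two equal tangent circles of radius r fit exactly inside a parent of
    # radius 2r, so every level halves the radius and doubles the count.
    leaves = 2 ** depth
    leaf_radius = 0.5 ** depth  # relative to a root radius of 1
    # Areas are compared relative to the root, so the factor of pi cancels.
    return leaves * leaf_radius ** 2


for depth in (1, 5, 10):
    print(depth, 0.5 ** depth, leaf_fraction(depth))
```

For a balanced tree of 1,024 leaves (depth 10), each leaf gets 1/1024 of the root's radius and the leaves cover less than 0.1% of the root's area.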
Another flaw with the Circle Packing approach is that the version of D3 I was using did not provide a way to control the internal padding of individual nodes. This made it impossible to show branch length data with this tool.
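To illustrate what per-node padding would have made possible, here is a small Python sketch that does not use D3 at all. It assumes a binary tree written as nested tuples (a hypothetical format), places each pair of children tangent to each other inside their smallest enclosing circle, and adds a padding band proportional to each node's branch length; `leaf_radius` and `scale` are illustrative parameters. The width of each band then plays the role that branch length plays in a phylogram.

```python
from typing import Tuple, Union

# Hypothetical tree format: a leaf is (name, branch_length);
# an internal node is ((left, right), branch_length).
Node = Tuple[Union[str, Tuple["Node", "Node"]], float]


def outer_radius(node: Node, leaf_radius: float = 1.0, scale: float = 1.0) -> float:
    """Radius of a node's circle when its padding encodes its branch length."""
    content, branch_length = node
    if isinstance(content, str):
        inner = leaf_radius
    else:
        left, right = content
        # Two tangent circles of radii r1 and r2 fit exactly inside an
        # enclosing circle of radius r1 + r2.
        inner = (outer_radius(left, leaf_radius, scale)
                 + outer_radius(right, leaf_radius, scale))
    # Padding proportional to the node's own branch length, like a contour band.
    return inner + scale * branch_length


tree = ((("A", 0.5), ("B", 0.5)), 1.0)
print(outer_radius(tree))  # 1.5 + 1.5 + 1.0 = 4.0
```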
Another attempt at solving this problem used HTML <DIV> elements to display a hierarchy of nested items. One attractive quality of DIVs is that they automatically expand to contain however many items are assigned to them, and their internal and external padding, color and other style properties can be controlled with CSS and jQuery. The figure below shows an example of this approach.
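A nested-DIV rendering of this kind can be generated from a tree in a few lines. This Python sketch is not the original code: the tree format, the `otu`/`clade` class names and the `px_per_unit` scale are hypothetical. It emits the HTML as a string, relies on floated boxes that shrink-wrap their contents, and uses CSS padding to stand in for branch length.

```python
from html import escape

# Hypothetical tree format: a leaf is (name, branch_length);
# an internal node is ((child, child, ...), branch_length).


def to_divs(node, px_per_unit: float = 20.0) -> str:
    """Render a tree as nested <div>s whose padding encodes branch length."""
    content, branch_length = node
    pad = round(branch_length * px_per_unit)
    # Floated boxes shrink-wrap their contents and contain their floated children.
    style = f'style="float:left; border:1px solid #888; padding:{pad}px"'
    if isinstance(content, str):
        return f'<div class="otu" {style}>{escape(content)}</div>'
    inner = "".join(to_divs(child, px_per_unit) for child in content)
    return f'<div class="clade" {style}>{inner}</div>'


print(to_divs(((("A", 0.5), ("B", 0.5)), 1.0)))
```

Because every box is a rectangle, a clade with two children of different sizes leaves empty corners, which is exactly the wasted space discussed next.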
Although promising, this approach also fails to deliver on the goals of the project as Dr. Guttman originally described them. For one, it does not require any less space than the original phylogram. HTML divs are by nature rectangular, so multiple children appended to the same parent often leave wasted screen real estate in the corners. And if the goal is to use padding around the shapes to convey a node’s branch length (much as the spacing of topographic contour lines conveys the slope of the terrain), it becomes impossible to distinguish empty space that is meaningful from empty space that is meaningless. Finally, HTML divs were never intended as drawing tools, so there are technical challenges in getting the right positioning and behaviour. One can set an “absolute” position for each element, but doing so interferes with the “float” behaviour that lets elements move to the most space-efficient place within a parent.
At Wolfgang Stuerzlinger’s suggestion, I spent some time exploring Voronoi treemaps, both with FoamTree and in D3. Alas, these did not work especially well with the phylogenetic data structure either:
In order to fully deliver on the goal of displaying phylogenetic trees as a topographic map, it has become apparent that I will need to create a unique HTML DIV-like object that combines the behaviours of the various techniques I have explored so far. To be clear, this is what the end result should look like:
As with my original Processing.js code, these objects should move around independently to find the most efficient position in two-dimensional space, automatically attracting to their sisters and repelling from everything else. Like HTML DIVs, they should expand to contain however many child objects are assigned to them, and they should include parameters to set their size, color and padding widths according to branch lengths and other data. As with the D3 Circle Packing effort, they should be inherently interactive, with a built-in zoom function so users can move between an overview and lower-level details. Most importantly, these objects should have flexible borders that automatically find the most efficient contour to contain whatever child objects are assigned to them, with no wasted space inside. Something like this:
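One possible way to get borders that hug their children with no wasted space is to define the border implicitly, as the union of the child circles each inflated by a padding band. The Python sketch below is an assumption rather than a worked-out design, and all of its names are hypothetical: it treats children as circles, computes a signed distance to that union (negative inside), and estimates the enclosed area by sampling a grid. Because the contour follows the children, it can be concave, unlike a circle or a rectangle; the flip side is that it splits into separate pieces if sisters drift apart, which the attraction behaviour is meant to prevent.

```python
import math
from typing import List, Tuple

Circle = Tuple[float, float, float]  # (x, y, radius) of one child object


def contour_distance(px: float, py: float, children: List[Circle], padding: float) -> float:
    """Signed distance from (px, py) to the border: negative inside,
    zero on the border, positive outside."""
    return min(math.hypot(px - x, py - y) - (r + padding) for x, y, r in children)


def inside(px: float, py: float, children: List[Circle], padding: float) -> bool:
    return contour_distance(px, py, children, padding) <= 0.0


def contour_area(children: List[Circle], padding: float, step: float = 0.01) -> float:
    """Estimate the enclosed area by sampling the centres of a square grid."""
    x0 = min(x - r for x, _, r in children) - padding
    x1 = max(x + r for x, _, r in children) + padding
    y0 = min(y - r for _, y, r in children) - padding
    y1 = max(y + r for _, y, r in children) + padding
    nx, ny = math.ceil((x1 - x0) / step), math.ceil((y1 - y0) / step)
    hits = sum(
        inside(x0 + (i + 0.5) * step, y0 + (j + 0.5) * step, children, padding)
        for i in range(nx) for j in range(ny)
    )
    return hits * step * step


children = [(-1.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # two touching sister circles
print(contour_area(children, 0.0))  # close to 2*pi
```

For two touching sister circles of radius 1, this contour encloses about 2π, whereas their smallest enclosing circle (radius 2) has an area of 4π, half of which would be empty.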
I have not yet decided what language to write this in. Processing.js is good for drawing, but it is slow and ultimately renders a bitmap of a fixed size (as opposed to vector graphics that can be scaled up for printing). D3.js provides some nice methods for creating SVG images, but it would take a fair amount of hacking to make it do something it was not written for. I have put this project on the back burner for now.