A Game of Forms: Using Graph Theory to Design Giant Web Forms
We’re redesigning a product that real estate agents use to publish house listings on the internet. If you have ever sold or bought a home, you can appreciate how complicated listing one on the market can be. The realm of real estate law is colossal and complex. Imagine all that size and complexity rolled up into one intimidating web form. Are you imagining it? Yes? Okay, great.
Now, let’s really put this imagination of yours to the test. Imagine that web form has over 400 fields. Imagine that, not only does it have over 400 fields, but each of those fields can appear and disappear. Imagine over 7,000 business rules dictating when fields appear and which fields are optional.
Is your head spinning? Are you in the fetal position yet?
Admittedly, this is how we felt when we first took on this project. “So many fields!” we gasped. But, after some heavy breathing, we buckled down and put together a game plan. We first aimed to understand the relationships between the fields in the form.
Fields were appearing and disappearing because they all depended on each other. For example, let’s say I’m listing a residential property, rather than a commercial property. Because my property is residential, the form doesn’t ask me for drive-in door height. Drive-in door height is only relevant to commercial properties.
As we glanced through the business rules, we noticed that some fields had more dependencies than others. To limit how much the form would change right before the user’s eyes, we wanted to put the fields with the most dependencies early in the form flow. But we knew counting every single dependency for every single field by hand wasn’t an option. We’d be older than Melisandre before we figured that out.
Enter Graph Theory
Luckily, graph theory saved us from the crippling arthritis of being 10,000 years old. As the name implies, graph theory is the study of graphs. A graph is a collection of nodes, which are connected by edges. A series of edges between nodes creates a path.
In the example above, nodes A and B are connected by edge 1.
Graph theory is a key part of social network analysis. For example, think of Facebook. Nodes represent the people on Facebook. If we are friends on Facebook, then there’s an edge between my node and your node.
When analyzing social networks, a frequent goal is to understand which nodes are most “important” to the network. Hm — that sounds eerily similar to our goal in understanding this beast of a web form. In fact, if we apply graph theory to our web form, then each field is a node. If field A depends on field B, then we add an edge between these two nodes.
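To make that mapping concrete, here is a minimal sketch in Python. The field names and rules are made up for illustration (they are not the real 7,000 business rules), and `build_graph` is a hypothetical helper. Each pair reads "the first field depends on the second field."

```python
from collections import defaultdict

# Hypothetical rules: (dependent field, field it depends on).
rules = [
    ("drive_in_door_height", "property_type"),
    ("lot_size", "property_type"),
    ("sold_date", "listing_status"),
]


def build_graph(pairs):
    """Return an undirected graph as {field: set of neighboring fields}."""
    graph = defaultdict(set)
    for dependent, parent in pairs:
        # One edge per dependency, recorded on both of its endpoints.
        graph[dependent].add(parent)
        graph[parent].add(dependent)
    return dict(graph)


form_graph = build_graph(rules)
for field, neighbors in form_graph.items():
    print(field, "->", sorted(neighbors))
```

I treat the edges as undirected here because the article only asks whether two fields are connected. If you also care about which field drives which, you would keep the direction.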
But how do we go about understanding a field’s importance? In graph theory, we estimate a node’s importance by calculating its centrality. While there are many different ways to measure centrality, in this context the two most useful measures are degree centrality and betweenness centrality.
To calculate a node’s degree centrality, just count how many edges the node has. For example, in the graph below, node A has a degree of 3. In contrast, node B has a degree of 1.
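Counting edges is easy to sketch in Python. The edge list below is an assumption chosen to match the numbers above (A has three edges, B has one). Nodes C and D are placeholders I added to make that work.

```python
from collections import Counter

# Assumed toy graph: A is connected to B, C, and D.
edges = [("A", "B"), ("A", "C"), ("A", "D")]


def degree_centrality(edge_list):
    """Count how many edges touch each node."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return degree


degrees = degree_centrality(edges)
print(degrees["A"], degrees["B"])

# Rank nodes from most to fewest edges.
ranked = [node for node, _ in degrees.most_common()]
print(ranked)
```

Applied to a form, the same ranking gives a first guess at which fields to ask for early.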
While degree centrality is straightforward, it does not always tell us the full picture. Let’s take a look at betweenness centrality.
Betweenness centrality measures how often a node appears on the shortest path between other nodes in the network. In other words, a node with high betweenness centrality connects different groups within the network. Such a node will dominate how information travels through the network. The best way to describe this concept is with an unnecessarily detailed example. And what better example to turn to than Game of Thrones?
We all know of Varys, Game of Thrones’ bald master-of-whisperers. (If you don’t know Varys, then shame on you.) Varys’ influence over the Seven Kingdoms is nearly unparalleled. They don’t call him “the master-of-whisperers” just for funsies.
You might think, “Well that’s dumb. The King obviously has more influence than Varys.” In some ways, yes. But think of it this way. Much of the King’s influence comes from his connection to Varys. While the King knows many powerful people at court, he does not have a direct line to the common people.
Meanwhile, Varys was born in poverty as a slave. He employs more common spies than I have Game of Thrones references. So, to wrap this all up, Varys acts as a bridge between the common people and the King. Without Varys (or someone with similar connections), the King never hears the commoners’ whispers.
Still complicated? Let’s take a look at the graph.
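For a version you can run, here is a minimal sketch in Python. The court is made up for illustration: four unnamed lords who all know each other and the King, and three spies who know only Varys. The `betweenness` function is my own plain-Python take on Brandes’ algorithm for unweighted graphs, and it is not necessarily how any particular tool computes the measure.

```python
from collections import deque
from itertools import combinations

# Hypothetical network: a tightly knit court, plus Varys and his spies.
court = ["King", "Lord 1", "Lord 2", "Lord 3", "Lord 4"]
edges = list(combinations(court, 2))          # everyone at court knows everyone
edges.append(("King", "Varys"))               # the King's one link to Varys
edges += [("Varys", f"Spy {i}") for i in (1, 2, 3)]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting nodes on the paths.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each end.
    return {v: total / 2 for v, total in score.items()}


scores = betweenness(adj)
degrees = {v: len(neighbors) for v, neighbors in adj.items()}
print("degree:     ", degrees["King"], degrees["Varys"])
print("betweenness:", scores["King"], scores["Varys"])
```

In this toy network the King has the higher degree (5 to Varys’ 4), but Varys has the higher betweenness (18 to the King’s 16), because every path to a spy runs through him.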
Understanding the Form
Now that we’ve gotten that primer on graph theory out of the way, we can start to analyze the graph for the form. I know, you probably thought we would never get there.
Manually calculating the measures used to analyze graphs is rather complicated and time-consuming. Instead, I use a free, open-source tool called Gephi. If you decide to give it a go, I’ll warn you that Gephi’s user experience is not exactly intuitive. But it does beat counting nodes and paths until your eyeballs bleed.
All you need to import data into Gephi is a .csv file like this:
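If your rules live somewhere you can script against, a few lines of Python can write that file for you. This is a sketch with made-up field names. The `Source` and `Target` column headers are the ones Gephi’s spreadsheet importer looks for in an edge table, but check them against your Gephi version.

```python
import csv
import io

# Hypothetical dependency rules: (dependent field, field it depends on).
dependencies = [
    ("drive_in_door_height", "property_type"),
    ("sold_date", "listing_status"),
    ("sold_price", "listing_status"),
]


def to_edge_list_csv(edges):
    """Return CSV text with a header row and one edge per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()


csv_text = to_edge_list_csv(dependencies)
print(csv_text)
```

Save `csv_text` to a file ending in .csv and it is ready to import.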
We call this an edge list. Gephi would take this edge list and produce a graph like this:
So, now, the big reveal. After I imported the graph into Gephi and made a few adjustments, I got the following graph:
Yikes. That doesn’t tell us much, other than the fact that there are a whole lot of fields on this form. So let’s clean the graph up a bit. We’ll start out by applying a new layout to the graph in Gephi. Gephi lets you choose between 8 layouts. Each layout clusters and positions nodes in a different way. How the layouts work is beyond the scope of this article. Plus, I don’t understand how they work. In the end, layout doesn’t matter much, because it doesn’t change your data. You can experiment to find which layout you like best. When I apply the Yifan Hu layout (I almost always use this layout), we get the following graph:
It’s a little better, but still not telling us much. Let’s say we want to know which fields in the form have the highest degree centrality. We can use Gephi to make nodes with higher degree centrality larger in size. To do so, click the size icon (three circles) in the appearance panel. Next, click attribute and select degree from the dropdown menu. Set a minimum and a maximum size for your nodes and then click apply.
Once we have applied the changes, we see a new graph like this.
We can see there are a few fields that have a much higher degree centrality than others. Let’s find out which fields they are. To do this, you’ll want to turn labels on. Click the impossibly small caret icon in the bottom panel to display labeling options.
Protip: Select the “Hide non-selected” option in the labels menu. Doing so will make Gephi show only the label of the node you are hovering over. Otherwise, the graph will show hundreds of labels and be impossible to read.
Once we have labels showing, we can see that the field with the highest degree centrality is “property type.”
Property type refers to whether the listing is residential, commercial, or vacant land. It makes sense that the property type field has several dependencies, because each property type requires different data. For example, the height of a drive-in door is important for a commercial building, but not for a residential home.
This visualization helped us decide to ask users for property type before they get into the full form. This way, the form is not changing right before their eyes as they edit. We followed the same thought process for other fields with high degree centrality.
Now let’s take a look at betweenness centrality. We want to be able to view degree centrality and betweenness centrality at the same time. To do so, we can map the color of the node to its betweenness centrality. For example, the darker the node color, the higher its betweenness centrality. To do this in Gephi, click on the “Statistics” panel on the right-hand side of the screen. From there, run the “Avg. Path Length” statistic, which also calculates betweenness centrality for every node.
Gephi will then show you some nice statistics, but those aren’t what we’re looking for. We want to go to the appearance panel and click the small color palette icon. From there, we can select Betweenness Centrality for the color attribute. By default, Gephi will use several different colors for your nodes, but I find it more useful to use a single color in different tints and shades. To do this, click the small icon with multicolored squares. Doing so will allow you to create a gradient. Nodes with the lowest betweenness centrality will use the color on the left, while nodes with the highest betweenness centrality will use the color on the right.
Once you have set up your gradient, hit apply and watch the magic happen.
We now have quite a different graph. Nodes that are more red here have a higher betweenness centrality. We immediately see that very few nodes have a high betweenness centrality. Listing status is one of these nodes. Again, this makes sense. For example, when you change the listing status to sold, the system displays several new fields to ask for things like sold date.
This means you can’t even get to those fields unless you first complete the status field. Remember how Varys acted as a bridge between the commoners and the king? Well, the status field acts as a bridge between the listing form and these new fields. In the end, we used this graph to discover fields that we didn’t even know existed. Pretty powerful, huh?
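You can check the bridge idea in code, too. The sketch below removes one field from a small, made-up slice of the form graph and reports which fields can no longer be reached. The field names and the `cut_off_by` helper are hypothetical.

```python
from collections import deque

# Hypothetical slice of the form graph: {field: neighboring fields}.
graph = {
    "property_type": {"listing_status", "lot_size"},
    "lot_size": {"property_type"},
    "listing_status": {"property_type", "sold_date", "sold_price"},
    "sold_date": {"listing_status"},
    "sold_price": {"listing_status"},
}


def reachable(adj, start, removed=None):
    """Breadth-first search from start, never stepping onto removed."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def cut_off_by(adj, start, field):
    """Fields reachable from start only by passing through field."""
    with_field = reachable(adj, start)
    without_field = reachable(adj, start, removed=field)
    return with_field - without_field - {field}


hidden = cut_off_by(graph, "property_type", "listing_status")
print(sorted(hidden))
```

Here, taking away listing status strands the sold date and sold price fields, which is the bridge behavior in miniature.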
Unsurprisingly, there is a lot more to graph theory than I have discussed here. If you would like to learn more, I highly recommend Jennifer Golbeck’s book Analyzing the Social Web. She discusses graph theory and how it applies to social network analysis.
Also, as always, I’d like to hear from you all about any ways you may have applied graph theory to your UX work. It seems to me like the opportunities are limitless. You could even use graph theory to help you prioritize requirements. But alas, we’ll save that conversation for another day.