bob baxley



Cordpub

Background

I recently revamped my website and created an improved workflow for tracking my publications. Until this change, I tracked my publication list by manually updating a static document on my computer and then cutting and pasting the list into an HTML document. Around the same time that I resolved to improve my web presence, I started teaching an information systems course at Georgia Tech. Part of that curriculum included some light-duty programming in JavaScript and visualization prototyping in D3.

In working through examples for my class, I came across Ben Bederson’s excellent publications page and wanted to try to do something similar. After brainstorming for a bit, I came up with the following features I wanted to include:

  • Backend database to store publication information
  • D3 Chord Diagram to display the links between my co-authors
  • Dynamic JavaScript filtering by date and co-author
  • No-cost hosting using Google services (Blogger, Apps Script, Google Docs); I did not want to have to host an instance of MySQL, for example, through a free hosting service.

Google Query Language

As I said, I didn’t want to pay to host a database. Granted, you can get web hosting for a few dollars a month, or I probably could have gotten space on a Georgia Tech server, but I didn’t pursue these options. In addition to the cost, I didn’t want to have to deal with another account administration portal. Also, I had recently become aware of Google’s Query Language, which is packaged with their JavaScript visualization toolbox. Basically, GQL lets you treat Google Spreadsheets as SQL tables that you can query from JavaScript. Awesome!

Using GQL, I simply need to create a row for each publication in a Google Spreadsheet and then embed a few lines of JS code in my publications page to pull the publication list from the spreadsheet. With a few more lines of JS, the publications page can filter my publication list by altering the GQL criteria. Detailed examples are here.
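To make the filtering idea concrete, here is a minimal sketch of how the query criteria could be assembled from filter settings. The column letters are assumptions: D is the year column used in the example query later in this post, while B as the author column is purely hypothetical, as is the `buildQuery` helper itself.

```javascript
// Assumed column letters: D holds the year (as in the post's example query);
// B is a hypothetical column holding the author list.
const COLUMNS = { year: "D", authors: "B" };

// Build a query string from optional year and co-author filters.
function buildQuery({ year, coauthor } = {}) {
  const conditions = [];
  if (year !== undefined) {
    if (!Number.isInteger(year)) {
      throw new Error("year must be an integer");
    }
    conditions.push(`${COLUMNS.year}=${year}`);
  }
  if (coauthor) {
    // Quote escaping is not handled in this sketch, so reject such names.
    if (coauthor.includes("'")) {
      throw new Error("single quotes are not handled in this sketch");
    }
    conditions.push(`${COLUMNS.authors} contains '${coauthor}'`);
  }
  return conditions.length === 0
    ? "SELECT *"
    : `SELECT * WHERE ${conditions.join(" AND ")}`;
}

console.log(buildQuery({ year: 2013 }));
console.log(buildQuery({ year: 2013, coauthor: "Smith" }));
```

Changing a filter on the page then amounts to rebuilding this string and re-running the query.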

Database

With GQL established as my query method, I needed to create a table of publications. Ideally, I would normalize my data into several tables: one for the publication list, one for the conference name and one for the author list. But because GQL doesn’t support JOINs, I collapsed all of the data into one table. If one were motivated, I believe it is possible to implement a JOIN-like functionality in JS, but I didn’t bother.
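For the motivated reader, a JOIN-like step in JS might look something like the sketch below. The two tables, their field names, and the `innerJoin` helper are all hypothetical; the idea is simply to index one table by the shared key and then merge matching rows.

```javascript
// Hypothetical normalized tables, as they might look after being pulled
// from two separate spreadsheets.
const publications = [
  { title: "Publication 1", venueId: "v1" },
  { title: "Publication 2", venueId: "v2" },
  { title: "Publication 3", venueId: "v1" },
];
const venues = [
  { venueId: "v1", name: "Conference X" },
  { venueId: "v2", name: "Journal Y" },
];

// Inner join of two arrays of row objects on a shared key.
function innerJoin(left, right, key) {
  // Index the right-hand table by key so each lookup is constant time.
  const index = new Map();
  for (const row of right) {
    if (!index.has(row[key])) index.set(row[key], []);
    index.get(row[key]).push(row);
  }
  const joined = [];
  for (const row of left) {
    // Rows without a match are dropped, as in a SQL INNER JOIN.
    for (const match of index.get(row[key]) || []) {
      joined.push({ ...match, ...row });
    }
  }
  return joined;
}

const joined = innerJoin(publications, venues, "venueId");
console.log(joined);
```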

My table of publications is just a Google Spreadsheet. The URL is

https://docs.google.com/spreadsheet/pub?key=0AqVSrKawR254dENJa2N1TUpWdF9QLXZFaFVtMVN0ZWc&output=html 

By changing pub to tq and removing the output parameter, the URL becomes

https://docs.google.com/spreadsheet/tq?key=0AqVSrKawR254dENJa2N1TUpWdF9QLXZFaFVtMVN0ZWc

This returns a JSON-like file of the spreadsheet data. I can actually pass GQL queries as part of the URL. For instance, to view all papers published in 2013, I would execute

SELECT * WHERE D=2013

Using the GQL URL encoding tool at the API reference site, the URL becomes

https://docs.google.com/spreadsheet/tq?key=0AqVSrKawR254dENJa2N1TUpWdF9QLXZFaFVtMVN0ZWc&tq=SELECT%20*%20WHERE%20D%3D2013
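The same encoding can be done directly in JavaScript with the built-in `encodeURIComponent`. The sketch below uses a hypothetical helper, `buildQueryUrl`, and reproduces the URL above from the spreadsheet key and the query string.

```javascript
// Endpoint taken from the URL shown in this post.
const BASE_URL = "https://docs.google.com/spreadsheet/tq";

// Build a query URL from a spreadsheet key and a query string.
function buildQueryUrl(key, query) {
  return `${BASE_URL}?key=${encodeURIComponent(key)}&tq=${encodeURIComponent(query)}`;
}

const url = buildQueryUrl(
  "0AqVSrKawR254dENJa2N1TUpWdF9QLXZFaFVtMVN0ZWc",
  "SELECT * WHERE D=2013"
);
console.log(url);
```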

With this, all of my data is stored on my Google Drive. To update my publication list on my website, I simply have to add a row to the Google Spreadsheet.
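As a rough sketch of how the JSON-like response could be turned into plain row objects without the visualization toolbox: the sample text below is illustrative only. It assumes the payload arrives wrapped in a function call and carries a `table` with `cols` and `rows`; the wrapper name, the labels, and the rows here are made up for the example, and `parseResponse` and `tableToRecords` are hypothetical helpers.

```javascript
// Illustrative response text. The wrapper and table layout are assumptions
// about the endpoint's output, and the rows are invented for this example.
const sampleResponse =
  'google.visualization.Query.setResponse({"status":"ok","table":{' +
  '"cols":[{"id":"A","label":"title"},{"id":"D","label":"year"}],' +
  '"rows":[{"c":[{"v":"Paper one"},{"v":2013}]},' +
  '{"c":[{"v":"Paper two"},null]}]}});';

// Strip the function-call wrapper and parse the JSON object inside it.
function parseResponse(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON payload found");
  }
  return JSON.parse(text.slice(start, end + 1));
}

// Convert the column/row table into an array of plain objects.
function tableToRecords(table) {
  const names = table.cols.map((col) => col.label || col.id);
  return table.rows.map((row) => {
    const record = {};
    row.c.forEach((cell, i) => {
      // Empty cells are represented as null in this sample.
      record[names[i]] = cell ? cell.v : null;
    });
    return record;
  });
}

const records = tableToRecords(parseResponse(sampleResponse).table);
console.log(records);
```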

D3 & Chord Diagrams

D3 is a JavaScript visualization library that is freely available at d3js.org. You have probably seen D3 visualizations on the New York Times website. Mike Bostock, who created D3, made some particularly neat visualizations for the election.

Chord diagrams are useful for displaying directed graphs. Consider the graph below, which has directed and weighted edges. This graph could represent the migration between three neighborhoods, for instance, as in the biking between neighborhoods example. I emphasized the weight values in the plot by thickening the edge lines, but even with this visual cue the graph is relatively hard to interpret. A chord diagram is another way of looking at the same information.

image

Before we get to the diagram, consider that we can characterize the same graph with a matrix:

Using the D3 chord example, it is easy to get the following chord diagram (JSFiddle code):



In the diagram, the length of each arc is the corresponding row sum of the matrix. For example, node A has edges with weights three and four emanating from it to the two other nodes. This total value is displayed with an arc length of seven in the chord diagram. The width of the chord leaving from A to B is four, and from A to C it is three. Likewise, from B to A, the width leaving the B arc is two, and from B to C the width is one. This is all the same information that is encoded in the graph drawing as well as the matrix.
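The arc-length rule can be checked with a few lines of JavaScript. In the sketch below, rows A and B use the weights described above, while row C is an assumed placeholder because its weights are not stated in the text. The `arcAngles` helper is hypothetical and ignores any padding a real chord layout places between arcs.

```javascript
const labels = ["A", "B", "C"];
const matrix = [
  [0, 4, 3], // A -> A, B, C (from the text)
  [2, 0, 1], // B -> A, B, C (from the text)
  [1, 2, 0], // C -> A, B, C (hypothetical values)
];

// Row sums give the relative length of each node's arc.
function rowSums(m) {
  return m.map((row) => row.reduce((sum, v) => sum + v, 0));
}

// Share of the full circle (in radians) taken by each arc, ignoring padding.
function arcAngles(m) {
  const sums = rowSums(m);
  const total = sums.reduce((sum, v) => sum + v, 0);
  return sums.map((s) => (2 * Math.PI * s) / total);
}

rowSums(matrix).forEach((s, i) => console.log(labels[i], s));
console.log(arcAngles(matrix));
```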

However, the chord diagram also encodes the net flow as a color. Specifically, the color of the chord corresponds to the node that is a net importer of edge weight. For instance, the graph diagram shows that the edge from A to B has a weight of four while the B to A weight is only two. Therefore B is a net importer, and the chord between these two nodes will be the color of the B arc, brown in this case.
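The coloring rule described above can be written as a small comparison. This is only a sketch of the rule as stated in this post, using the A and B weights from the text; `netImporter` is a hypothetical helper, not part of D3.

```javascript
// Flows between A (index 0) and B (index 1), taken from the text:
// A -> B has weight four, B -> A has weight two.
const flows = [
  [0, 4],
  [2, 0],
];

// Return the index of the node that receives more than it sends along the
// chord between i and j, or null when the two flows are equal.
function netImporter(m, i, j) {
  if (m[i][j] === m[j][i]) return null;
  return m[i][j] > m[j][i] ? j : i;
}

console.log(netImporter(flows, 0, 1)); // index of B
```

With symmetric weights the function returns null, which is another way of seeing why color carries no information in an undirected graph.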

Co-Authors as Chords

In order to plot co-author coincidence as a chord diagram, I have to characterize the interactions of a set of co-authors as a directed graph. Then I can represent the directed graph as a matrix and plot the matrix using D3. It is best to explain this process with a simple example. Assume we have four publications that are authored by subsets of six people, who are labeled A1, A2, A3, A4, A5, and A6. The publication list is as follows:

A1, A3, and A4, "Publication 1"
A1, A4, A5, and A6, "Publication 2"
A2, A1, and A5, "Publication 3"
A6 and A1, "Publication 4"

My first objective is to get the chord diagram's arcs to show how many publications the author corresponding to each arc has published. To do this, I need each row sum of my graph matrix to equal the total number of publications for that author. My second objective is to have the chords between two authors correspond to how many publications they have in common.

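I can satisfy both of these objectives by weighting the edge between two authors as the number of papers they have in common, scaled by the number of authors for each paper. Assume that there are $P$ publications in the list, and the $p$th paper has $n_p$ authors. Also, if the $i$th author was a co-author on the $p$th paper, then $a_{ip} = 1$, otherwise $a_{ip} = 0$. With this, the weight between the $i$th and $j$th authors is

$$w_{ij} = \sum_{p=1}^{P} \frac{a_{ip}\, a_{jp}}{n_p}.$$

Summing $w_{ij}$ over $j$ gives $\sum_{p} a_{ip}$, which is the number of papers the $i$th author published, so the first objective is met.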

Notice that this is not a directed graph since the weights are symmetric. Therefore the color of the chord in the publication list does not convey any information. If anyone has ideas of what I can encode with color, please let me know!
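Here is a minimal sketch of that weighting applied to the four-publication example: each paper adds one over its number of authors to every pair of its authors, including each author paired with themselves. The function name `coauthorMatrix` is hypothetical and is not taken from the cordpub library.

```javascript
const authors = ["A1", "A2", "A3", "A4", "A5", "A6"];
const publications = [
  ["A1", "A3", "A4"],       // Publication 1
  ["A1", "A4", "A5", "A6"], // Publication 2
  ["A2", "A1", "A5"],       // Publication 3
  ["A6", "A1"],             // Publication 4
];

// Build the symmetric co-author weight matrix.
function coauthorMatrix(authorNames, papers) {
  const index = new Map(authorNames.map((name, i) => [name, i]));
  const w = authorNames.map(() => new Array(authorNames.length).fill(0));
  for (const paper of papers) {
    // Each paper's unit of credit is split evenly across its authors.
    const share = 1 / paper.length;
    for (const a of paper) {
      for (const b of paper) {
        w[index.get(a)][index.get(b)] += share;
      }
    }
  }
  return w;
}

const weights = coauthorMatrix(authors, publications);
// Each row sum should match that author's publication count.
weights.forEach((row, i) => {
  const total = row.reduce((sum, v) => sum + v, 0);
  console.log(authors[i], total.toFixed(2));
});
```

The resulting array of arrays is the kind of square matrix a chord layout takes as input.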

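Continuing the example, with rows and columns ordered A1 through A6, the resulting matrix is

$$\begin{bmatrix}
17/12 & 1/3 & 1/3 & 7/12 & 7/12 & 3/4 \\
1/3 & 1/3 & 0 & 0 & 1/3 & 0 \\
1/3 & 0 & 1/3 & 1/3 & 0 & 0 \\
7/12 & 0 & 1/3 & 7/12 & 1/4 & 1/4 \\
7/12 & 1/3 & 0 & 1/4 & 7/12 & 1/4 \\
3/4 & 0 & 0 & 1/4 & 1/4 & 3/4
\end{bmatrix}$$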

The chord diagram for this example is displayed below (JSFiddle code). We can see that some chords never leave the originating arc. These are the diagonals in the weight matrix, and they represent the fractional number of papers each person co-authored.



Cordpub

I packaged all of this up in a JavaScript library that I use to render my publication list. The library is called cordpub and is available at https://github.com/gte620v/cordpub. The use case of the library is to allow you to keep data in a Google Sheets document. So, for instance, most of the data that drives the static pages on this site is kept here.

In its simplest form, cordpub can be used to skin the data in a spreadsheet into an HTML page, like I have done on my CV page. Or, if your data is a list of publications with authors, you can enable the cord plot feature to show the graph of coauthors. The result is a d3.js driven graphic like this:

image