In working through examples for my class, I came across Ben Bederson’s excellent publications page and wanted to try to do something similar. After brainstorming for a bit, I came up with the following features I wanted to include:
- Backend database to store publication information
- D3 Chord Diagram to display the links between my co-authors
- No-cost hosting using Google services (Blogger, Apps Script, Google Docs); I did not want to have to host an instance of MySQL, for example, through a free hosting service.
Google Query Language
Using GQL, I simply need to create a row for each publication in a Google Spreadsheet and then embed a few lines of JavaScript in my publications page to pull the publication list from the spreadsheet. With a few more lines of JavaScript, the publications page can filter my publication list by altering the GQL criteria. Detailed examples are here.
With GQL established as my query method, I needed to create a table of publications. Ideally, I would normalize my data into several tables: one for the publication list, one for the conference name and one for the author list. But because GQL doesn’t support JOINs, I collapsed all of the data into one table. If one were motivated, I believe it is possible to implement a JOIN-like functionality in JS, but I didn’t bother.
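As a rough idea of what a JOIN-like step in JS could look like, here is a minimal sketch of a client-side inner join. It is not what my page does (I kept the single collapsed table); the two tables and their column names (`venueId`, `name`) are hypothetical and only there for illustration.

```javascript
// Hypothetical normalized tables; the column names are illustrative only.
const publications = [
  { title: "Paper One", year: 2013, venueId: "v1" },
  { title: "Paper Two", year: 2012, venueId: "v2" },
];
const venues = [
  { venueId: "v1", name: "Conference X" },
  { venueId: "v2", name: "Journal Y" },
];

// Inner join: index the right-hand table by key, then merge matching rows.
function innerJoin(left, right, key) {
  const index = new Map();
  for (const row of right) {
    if (!index.has(row[key])) index.set(row[key], []);
    index.get(row[key]).push(row);
  }
  const joined = [];
  for (const l of left) {
    for (const r of index.get(l[key]) || []) {
      joined.push({ ...l, ...r });
    }
  }
  return joined;
}

const joined = innerJoin(publications, venues, "venueId");
```

Each row of `joined` carries both the publication fields and the venue name, which is the shape the collapsed spreadsheet stores directly.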
My table of publications is just a Google Spreadsheet. The URL is
By changing pub to tq and erasing the output the URL becomes
This returns a JSON-like file of the spreadsheet data. I can actually pass GQL queries as part of the URL. For instance, to view all papers published in 2013, I would execute
Using the GQL URL encoding tool at the API reference site, the URL becomes
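The encoding step can also be done in JavaScript rather than with the online tool. The sketch below appends a query to a spreadsheet endpoint through the `tq` parameter using the built-in `encodeURIComponent`. The base URL is a placeholder, not my real spreadsheet address, and the assumption that column C holds the publication year is purely illustrative.

```javascript
// Append a query to a spreadsheet endpoint via the tq parameter.
// baseUrl is a placeholder; substitute the real spreadsheet URL.
function buildQueryUrl(baseUrl, query) {
  const separator = baseUrl.includes("?") ? "&" : "?";
  return baseUrl + separator + "tq=" + encodeURIComponent(query);
}

// Assumption: column C holds the publication year.
const exampleUrl = buildQueryUrl(
  "https://example.com/spreadsheet/tq?key=SPREADSHEET_KEY",
  "select * where C = 2013"
);
```

Changing the filter then only means changing the query string passed to `buildQueryUrl`.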
With this, all of my data is stored on my Google Drive. To update the publication list on my website, I simply have to add a row to the Google Spreadsheet.
D3 & Chord Diagrams
Chord diagrams are useful for displaying directed graphs. Consider the graph below, which has directed and weighted edges. This graph could represent the migration between three neighborhoods, for instance, as in the biking between neighborhoods example. I emphasized the weight values in the plot by thickening the edge lines, but even with this visual cue it is relatively hard to interpret. A chord diagram is another way of looking at the same information.
Before we get to the diagram, consider that we can characterize the same graph with a matrix:
Using the D3 chord example, it is easy to get the following chord diagram (JSFiddle code):
In the diagram, the length of each arc is the corresponding row sum of the matrix. For example, node A has edges with weights three and four emanating from it to the two other nodes. This total is displayed as an arc length of seven in the chord diagram. The width of the chord leaving A for B is four, and from A to C it is three. Likewise, from B to A the width leaving the B arc is two, and from B to C the width is one. This is all the same information that is encoded in the graph drawing as well as the matrix.
However, the chord diagram also encodes the net flow as a color. Specifically, the color of the chord corresponds to the node that is a net importer of edge weight. For instance, the graph diagram shows that the edge from A to B has a weight of four while the B to A weight is only two. Therefore B is a net importer, and the chord between these two nodes will be the color of the B arc, brown in this case.
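The arc lengths and the net importer can be checked with a few lines of JavaScript. In this sketch, rows A and B use the weights quoted in the text, while row C is a made-up placeholder so that the matrix is complete; the function name `netImporter` is my own and is not part of D3.

```javascript
// Rows are sources, columns are targets, in the order A, B, C.
// Rows A and B use the weights quoted in the text; row C is a placeholder.
const labels = ["A", "B", "C"];
const matrix = [
  [0, 4, 3], // A -> B = 4, A -> C = 3
  [2, 0, 1], // B -> A = 2, B -> C = 1
  [1, 2, 0], // hypothetical values for C
];

// Arc length of each node: the sum of its row.
const arcLengths = matrix.map((row) => row.reduce((a, b) => a + b, 0));

// The node that receives more weight than it sends along the chord i <-> j.
function netImporter(i, j) {
  if (matrix[i][j] === matrix[j][i]) return null; // balanced chord
  return matrix[i][j] > matrix[j][i] ? labels[j] : labels[i];
}
```

Here `arcLengths[0]` is seven for node A, and `netImporter(0, 1)` returns `"B"`, matching the description above.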
Co-Authors as Chords
In order to plot co-author coincidence as a chord diagram, I simply have to characterize the interactions of a set of co-authors as a directed graph. Then I can represent the directed graph as a matrix and plot the matrix using D3. It is best to explain this process with a simple example. Assume we have four publications, each authored by a subset of six people who are labeled A1, A2, A3, A4, A5, and A6. The publication list is as follows:
My first objective is to get the chord diagram's arcs to show how many publications the author corresponding to each arc has published. To do this, I need each row sum of my graph matrix to equal that author's total number of publications. My second objective is to have the chord between two authors correspond to how many publications they have in common.
I can satisfy both of these objectives by weighting the edge between two authors as the number of papers they have in common scaled by the number of authors for each paper. Assume that there are $P$ publications in the list, and the $k$th paper has $n_k$ authors. Also, if the $i$th author was a co-author on the $k$th paper, then $a_{ik} = 1$, otherwise $a_{ik} = 0$. With this, the weight between the $i$th and $j$th authors is
$$w_{ij} = \sum_{k=1}^{P} \frac{a_{ik}\, a_{jk}}{n_k}.$$
Notice that this is not a directed graph since the weights are symmetric. Therefore the color of the chord in the publication list does not convey any information. If anyone has ideas of what I can encode with color, please let me know!
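The weighting described above can be sketched in JavaScript as follows. The publication list here is hypothetical (it is not the list from my example) and the function name `coauthorWeights` is only illustrative. Each paper contributes one over its author count to every pair of its authors, including each author paired with themselves.

```javascript
// Hypothetical publication list: each entry is the author list of one paper.
const authors = ["A1", "A2", "A3", "A4", "A5", "A6"];
const papers = [
  ["A1", "A2"],
  ["A1", "A3", "A4"],
  ["A2", "A5"],
  ["A1", "A6"],
];

// Build the symmetric weight matrix, one row and column per author.
function coauthorWeights(authors, papers) {
  const index = new Map(authors.map((name, i) => [name, i]));
  const w = authors.map(() => authors.map(() => 0));
  for (const paper of papers) {
    const share = 1 / paper.length; // scale by the number of authors
    for (const a of paper) {
      for (const b of paper) {
        w[index.get(a)][index.get(b)] += share;
      }
    }
  }
  return w;
}

const weights = coauthorWeights(authors, papers);
// Each row sum should equal that author's number of publications.
const rowSums = weights.map((row) => row.reduce((a, b) => a + b, 0));
```

With this hypothetical list, the row sum for A1 comes out to three (up to floating-point rounding), since A1 appears on three papers, and the matrix is symmetric as noted above.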
Continuing the example, the resulting matrix is
The chord diagram for this example is displayed below (JSFiddle code). We can see that some chords never leave the originating arc. These are the diagonals of the weight matrix, and they represent the fractional number of papers each person co-authored.
In its simplest form, cordpub can be used to skin the data in a spreadsheet into an HTML page, as I have done on my CV page. Or, if your data is a list of publications with authors, you can enable the cord plot feature to show the graph of coauthors. The result is a d3.js-driven graphic like this: