The structure of a Mathematica package

I like to build tools to make programming easier.

One of my latest tools is a function that plots the "call graph" of a Mathematica package: each vertex is a function, and each edge represents the fact that one function calls another. It's a neat way to understand the structure of a large body of code without "chasing down" function calls by hand.
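The core of such a tool can be sketched in a few lines. This is only an illustration under simple assumptions, not the actual implementation: it looks only at definitions stored in DownValues, and the toy functions f, g and h, along with the helpers rhsOf and callsQ, are made up for the example.

```mathematica
(* Toy definitions standing in for a package; f, g and h are hypothetical. *)
ClearAll[f, g, h, rhsOf, callsQ];
f[x_] := g[x] + h[x];
g[x_] := h[x]^2;
h[x_] := If[x > 0, h[x - 1], 0];

funcs = {f, g, h};

(* Right-hand sides of a symbol's definitions, wrapped in Hold so nothing evaluates. *)
rhsOf[s_Symbol] := Replace[DownValues[s], (_ :> rhs_) :> Hold[rhs], {1}];

(* s "calls" t if t appears anywhere in the held right-hand sides of s. *)
callsQ[s_Symbol, t_Symbol] := ! FreeQ[rhsOf[s], t];

(* One directed edge per caller/callee pair, self-loops included. *)
edges = DirectedEdge @@@ Select[Tuples[funcs, 2], callsQ @@ # &];

Graph[funcs, edges, VertexLabels -> "Name"]
```

Here h refers to itself in its own definition, so it ends up with a self-loop, which is how a recursive function would show up in the picture.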

For example, the graph below* shows the structure of a machine learning package I'm writing. The size of each disk shows the code size of a function -- bigger disks for more complex functions. Interestingly, Mathematica allows me to measure this directly, rather than using the usual proxy of the number of lines in the source code for a function. By taking LeafCount of the DownValues of a symbol, I get the number of independent parts in the abstract syntax tree of a function.
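As a small sketch of that measurement (the functions small and large and the helper codeSize are hypothetical names invented for the example):

```mathematica
ClearAll[small, large, codeSize];
small[x_] := x + 1;
large[x_] := Module[{y = x^2}, If[y > 10, Sqrt[y], y + small[x]]];

(* Leaf count of everything stored in the symbol's DownValues (patterns included). *)
codeSize[s_Symbol] := LeafCount[DownValues[s]];

codeSize /@ {small, large}
```

The count for large comes out bigger than the one for small, regardless of how either definition happens to be laid out across source lines.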

Other visual elements also communicate important details: red disks are public functions -- functions that the package exports to external code. White disks, conversely, represent functions imported from other packages. And lastly, recursive functions are (by definition) those vertices with a circular edge that loops back to themselves.

*In case you're wondering, the star in the lower left corner is a set of functions that deal with measuring the error of a classifier on a given data set -- this functionality is independent enough that it doesn't need to call any of the other code in the package. The triplet on the lower right is a set of utility functions.

What if Shakespeare had been a composer?

At the NKS summer school, where I was an instructor, I helped students from diverse backgrounds implement their ideas in Mathematica. One of my favorite projects was working with the artist Elizabeth Latta to visualize and computationally explore the famous play The Tempest, by William Shakespeare.

Among the many promising ideas we investigated, two in particular were interesting enough that I'd like to show them off. The beautiful drawings are Elizabeth's work, and I planned and wrote the code.

The first was a technique that used a network of the major characters in the play to indicate the relative importance of their interactions. In this visualization, each character is a node in the network and each interaction is an edge. To be precise, the thickness of the gray bond between two characters represents how often they speak lines in the presence of one another. The size of each portrait indicates the total number of lines each player has. So main characters should appear large, and have strong connections to their frequent stagefellows.
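A rough sketch of how such a network could be computed, under assumptions of my own: the scene data below is invented, and "speaking in each other's presence" is approximated by summing the lines two characters speak in scenes where both of them speak. The names scenes, totalLines, pairWeights and weights are hypothetical.

```mathematica
(* Hypothetical data: each scene is a list of {speaker, lineCount} speeches. *)
scenes = {
   {{"Prospero", 12}, {"Miranda", 5}, {"Prospero", 30}},
   {{"Prospero", 8}, {"Ariel", 10}},
   {{"Miranda", 6}, {"Ferdinand", 9}, {"Prospero", 3}}};

(* Total lines per character: this would drive portrait size. *)
totalLines = GroupBy[Flatten[scenes, 1], First -> Last, Total];

(* For one scene: every pair of speakers, weighted by the lines the two speak in it. *)
pairWeights[scene_] := Module[{n = GroupBy[scene, First -> Last, Total]},
   Association[(Sort[#] -> Total[Lookup[n, #]]) & /@ Subsets[Keys[n], {2}]]];

(* Sum the pair weights over all scenes: this would drive bond thickness. *)
weights = Merge[pairWeights /@ scenes, Total];

maxLines = Max[Values[totalLines]];
Graph[
 Keys[totalLines],
 UndirectedEdge @@@ Keys[weights],
 EdgeStyle -> KeyValueMap[
   (UndirectedEdge @@ #1) -> Directive[Gray, Thickness[0.0005 #2]] &, weights],
 VertexSize -> Normal[Map[0.1 + 0.4 #/maxLines &, totalLines]],
 VertexLabels -> "Name"]
```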

The second technique isn't quite a visualization, it's an "auditization". The idea is that each character is assigned a note, and the play is simply, um, played -- every time a character speaks, that character's note is played. A note is sustained for an amount of time proportional to the number of lines he or she is saying. One last tweak is that I use Mathematica's pattern matching to ensure that long streams of repetitive, boring note patterns are elided somewhat. The general effect is passable music, or at least quite different from most algorithmic music. Take a listen! Just remember, you're listening to the play at about 30x speed!
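The idea can be sketched as follows. Everything specific here is an assumption made for illustration: the speech data, the note assignment, the tempo, and above all the elision rule, which simply collapses an immediately repeated two-speaker exchange; the rule actually used may well differ.

```mathematica
(* Hypothetical data: the play as a sequence of {speaker, lineCount} speeches. *)
speeches = {{"Prospero", 12}, {"Miranda", 2}, {"Prospero", 1}, {"Miranda", 3},
   {"Prospero", 4}, {"Miranda", 1}, {"Ariel", 10}, {"Prospero", 6}};

(* Hypothetical note assignment, one pitch per character. *)
notes = <|"Prospero" -> "C", "Miranda" -> "E", "Ariel" -> "G"|>;

(* Assumed elision rule: an exchange a, b, a, b is shortened to a, b, repeatedly. *)
elided = speeches //. {pre___, {a_, m_}, {b_, n_}, {a_, _}, {b_, _}, post___} :>
    {pre, {a, m}, {b, n}, post};

(* Each speech becomes one note whose duration is proportional to its line count. *)
secondsPerLine = 0.1;
Sound[SoundNote[notes[#1], secondsPerLine #2] & @@@ elided]
```

With this data the three back-and-forth exchanges between Prospero and Miranda collapse into a single one before the notes are generated.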

One of the things that I think makes this idea work is that a play must already conform to a particular grammatical, scene, and dialogue structure -- a structure that leaves traces on the way the players share the stage. The rhythm of back and forth between two sparring characters cannot be too lop-sided. Multiple characters have to be orchestrated carefully to move the plot forward. Scene changes must permute the characters if there is to be any extended tension. Characters and motifs recur throughout the play as plans are hatched and executed. All of these forces, and others, make the non-local structure of the dialogue interesting -- simultaneously familiar and unpredictable -- the very same properties a score must have if it is to be interesting to the ear.

I'm going to try this idea out on other Shakespeare plays, and other plays in general, to see if I can discern major differences in the music they can generate.

My friend Mike Sollami, a programmer and mathematician, loves concision. A cute expression of this ideal is a very short implementation of the QuickSort algorithm that he wrote:

quicksort = # & @@@ {# //. x : {__} :> (## & @@ 
Reverse /@ GatherBy[x, (# < x[[1]] &)])} &

Of course this is a little opaque, looking perhaps more like APL than modern code. To make it easier to parse visually, I've used one of the many small utilities I've written for internal use here at Wolfram|Alpha, called SyntaxPlot, to lay out the abstract syntax tree of quicksort:

Not as short, but arguably prettier!
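For contrast, a more conventional recursive formulation might look like the sketch below. It is an illustration only, not Mike's code, and the name quicksortPlain is made up here.

```mathematica
ClearAll[quicksortPlain];

(* The empty list is already sorted. *)
quicksortPlain[{}] := {};

(* Take the first element as pivot, partition the rest, and recurse on both sides. *)
quicksortPlain[{pivot_, rest___}] := Join[
   quicksortPlain[Select[{rest}, # < pivot &]],
   {pivot},
   quicksortPlain[Select[{rest}, # >= pivot &]]];

quicksortPlain[{3, 1, 4, 1, 5, 9, 2, 6}]
(* {1, 1, 2, 3, 4, 5, 6, 9} *)
```

The built-in TreeForm, applied to a held expression such as Hold[# < pivot &], gives a rough idea of what a syntax-tree layout looks like.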

GMail analytics

I've been playing with the JavaMail library, which is a friendly and clean API that abstracts a range of common email protocols like POP, SMTP, and IMAP. Targeting my own Gmail account over IMAP, I downloaded the headers of all 2700 of my sent messages from the last 5 years and loaded them into Mathematica.

There is all manner of stuff one can do with this kind of data, but a fun starting point for my explorations was a punch-card type visualization that shows my proclivity to send emails at different times of the day and days of the week (inspired by GitHub's version of the same).
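A punch card of this kind can be sketched as below, assuming a reasonably recent version of Mathematica (it relies on associations). The timestamps are invented, and the names sent, days, dayIndex and counts are hypothetical.

```mathematica
(* Hypothetical sent-times as {year, month, day, hour, minute}. *)
sent = {{2011, 4, 19, 4, 12}, {2011, 4, 19, 4, 40}, {2011, 4, 20, 14, 5},
   {2011, 4, 23, 22, 30}, {2011, 4, 26, 4, 1}};

days = {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday};
dayIndex = AssociationThread[days, Range[7]];

(* Number of messages in each {weekday, hour} slot. *)
counts = Counts[{DayName[Take[#, 3]], #[[4]]} & /@ sent];

(* One disk per occupied slot; the radius grows with Sqrt so that area tracks the count. *)
Graphics[
 KeyValueMap[Disk[{#1[[2]], -dayIndex[#1[[1]]]}, 0.2 Sqrt[#2]] &, counts],
 Frame -> True,
 FrameTicks -> {{Thread[{-Range[7], days}], None}, {Range[0, 23, 4], None}},
 FrameLabel -> {"hour of day", None},
 PlotRange -> {{-1, 24}, {-8, 0}}]
```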

What's special about 4am on a Tuesday morning? No idea!

Another simple statistic is the volume of emails that I've assigned to different GMail labels. I picked four interesting labels: emails between me and my family, my friends, my university, and work-related emails.

It's interesting how directly the structure of my life is visible in these email counts. Take a guess when I finished my degree and started working in industry, and when I moved to the United States.

There are loads of other things to try. For example, I have all the header text, so along with message IDs I can infer the thread structure of all my email conversations. How complex are my conversations with different people? Who do I have the most intricate conversations with? The simplest?
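One way to start on the thread question is to treat each message as a vertex and each reply as an edge, so that threads become connected components. The sketch below assumes the headers have already been reduced to a message ID and the ID each message replies to (email carries these in the Message-ID and In-Reply-To header fields); the data and the keys "id" and "inReplyTo" are invented for the example.

```mathematica
(* Hypothetical, already-parsed headers. *)
headers = {
   <|"id" -> "m1", "inReplyTo" -> None|>,
   <|"id" -> "m2", "inReplyTo" -> "m1"|>,
   <|"id" -> "m3", "inReplyTo" -> "m1"|>,
   <|"id" -> "m4", "inReplyTo" -> None|>,
   <|"id" -> "m5", "inReplyTo" -> "m4"|>};

(* One edge from each reply to the message it answers. *)
replies = DirectedEdge[#["id"], #["inReplyTo"]] & /@
   Select[headers, #["inReplyTo"] =!= None &];

threadGraph = Graph[#["id"] & /@ headers, replies];

(* Each weakly connected component is one conversation thread. *)
threads = WeaklyConnectedComponents[threadGraph];

(* Thread size as a crude first measure of how complex a conversation is. *)
Length /@ threads
```

Depth or branching of each component would be natural refinements of the size measure.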

Graphing your social graph

Everyone talks about social graphs and their value to large companies and advertisers, but where are the actual pictures for individual users? I got to thinking about how one could visualize the "local" part of an online social network -- just your friends and followers and their relationships -- and after a few weekends of tweaking and fiddling, I've got a nice Mathematica notebook that does all this and more.

For example, here's what my Twitter user account looks like:

You'll notice that I do not appear in the graph. I already have an explicit or implicit relationship to everyone in the graph, so including me would distort the graph layout without adding any information at all.

Okay, what do all these visual elements communicate?

  1. lines indicate users' relationships to each other: solid lines indicate mutual relationships, whereas dotted lines indicate one-way relationships -- the dotted end is the party who doesn't follow back.
  2. disk size shows tweet frequency: the bigger the disk, the more frequently a user tweets.
  3. color indicates a user's relationship to me: gray for users I follow that follow me back, blue for users who don't follow me back, and pink for losers, I mean users, whose overtures I don't return. Just kidding about the loser part.
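The first two encodings in the list above can be sketched as follows. This is an illustration with invented data, not the notebook's code: the user names, follows, tweetsPerDay and mutualQ are all hypothetical, and it assumes a recent version of Mathematica that supports associations and graphs mixing directed and undirected edges.

```mathematica
(* Hypothetical data: a -> b means "a follows b". *)
follows = {"ann" -> "bob", "bob" -> "ann", "cat" -> "ann",
   "bob" -> "cat", "cat" -> "bob", "dan" -> "cat"};
tweetsPerDay = <|"ann" -> 2, "bob" -> 10, "cat" -> 5, "dan" -> 1|>;

(* A follow is mutual when the reverse follow is also present. *)
ClearAll[mutualQ];
mutualQ[a_ -> b_] := MemberQ[follows, b -> a];

(* Mutual follows become one undirected edge; the rest stay directed. *)
mutual = DeleteDuplicates[
   UndirectedEdge @@@ (Sort /@ (List @@@ Select[follows, mutualQ]))];
oneWay = DirectedEdge @@@ Select[follows, ! mutualQ[#] &];

Graph[
 Keys[tweetsPerDay],
 Join[mutual, oneWay],
 EdgeStyle -> ((# -> Dotted) & /@ oneWay),                     (* solid vs dotted *)
 VertexSize -> Normal[Map[0.1 + 0.04 # &, tweetsPerDay]],      (* size by tweet rate *)
 VertexLabels -> "Name"]
```

Coloring by relationship to the account owner would work the same way, with a VertexStyle rule per user.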

Or, if you prefer a visual dictionary, try to figure this out (hint: I'm green)!

In Mathematica, it's really easy to create interactive visualizations. For instance, it takes very little code to annotate the graph nodes with tooltips that describe an individual user, showing their latest load-on-demand tweets, avatar, and follower information. Here's what one of these tooltips looks like:
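As a minimal sketch of the tooltip mechanism (with made-up users and static text standing in for the real, load-on-demand profile data), vertex labels can be placed in tooltips:

```mathematica
(* Hypothetical profile text per user. *)
profiles = <|"ann" -> "ann: 120 followers", "bob" -> "bob: 45 followers",
   "cat" -> "cat: 300 followers"|>;

(* Each vertex gets its profile text as a tooltip shown on mouseover. *)
Graph[
 Keys[profiles],
 {UndirectedEdge["ann", "bob"], UndirectedEdge["bob", "cat"]},
 VertexLabels -> KeyValueMap[#1 -> Placed[#2, Tooltip] &, profiles]]
```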

But this is only the tip of the iceberg. One can easily visualize conversations between users by simply mousing over the edge that connects them. One can click on a user to tweet at them or to go straight to their Twitter page. One can weight edges with the frequency of messages exchanged between two users. And so on. With a powerful functional language like Mathematica and its rich set of dynamic UI elements, it's very easy to take a UX or UI idea and just prototype it, often going from an idea to an implementation in a matter of minutes.

I'll leave you with a gallery of some of my Twitter friends: