Orthogonality and projections

This blog is currently a bit of a mess. I’ve given up–for now–on trying to keep a coherent order of episodes; eventually I will collect things and put them in the right place. But, for now, I hope that the story in this article is cute enough to stand on its own two legs.

A couple of weeks ago Robin Piedeleu left a comment in Episode 28, asking about orthogonal projections in graphical linear algebra; specifically, he asked

how would I go about forming the diagram for the orthogonal projection onto the kernel of a matrix?

This is a pretty standard exam type problem in ordinary linear algebra, and requires a fair bit of matrix hackery that one needs to memorise. I used to hate these kinds of problems, especially in exams, because I’d always make routine errors in calculations. Here, I think, the graphical notation really stands out both for pedagogical purposes and for calculation: I definitely make fewer mistakes!

This post will also be a bit quicker paced than usual because I want to get to the answer and actually perform some computations. I will omit many of the proofs, but will (quickly) explain what the concepts of orthogonality, orthogonal complements and orthogonal projections mean and how one deals with them diagrammatically.


We start with a slightly different notation for arbitrary linear relations (I think due to Peter Selinger?) which is convenient for illustrating the two symmetries (colour swap and mirror image) of graphical linear algebra. Given a arbitrary diagram R: m → n, we will draw it as


The reason for this notation is that the symmetry breaking black square gives us a nice way to denote the mirror image diagram (R: n → m).


(Filippo, in addition, likes to mirror the letter R; I’m a bit less keen on that but it does fit the spirit.)

We can also use this notation to show the colour swap symmetry


and combine the two to get the graphical notation for bizarro (combining colour swap and mirror image) of R.


Given the quick and dirty mood of this article, I will, from now on, leave out the labels on the wires; they can be reinserted back consistently and without too much effort; leaving them out makes the diagrams cleaner and faster to draw.


The linear relations of interest for this topic will be of the form R: m → 0, which we know are just a graphical linear algebra way of saying “a subspace of Qm“. We have a nice graphical way of denoting the sum and intersection of such subspaces. Given, R: m→0 and S: m→0, their sum R + S and intersection R ∩ S are, respectively:



The first question to tackle is what exactly does orthogonality mean, diagrammatically speaking?

Let’s start with the usual explanations. In the geometric interpretation of linear algebra, two vectors v, w are said to be orthogonal exactly when they are at right angles to each other. Algebraically, one defines the dot product of vectors:

vw = v1⋅w1 + v2⋅w2 + … + vm⋅wm.

Then v and w are orthogonal exactly when vw = 0. For example (1, 1) and (1, -1) are at right angles when we draw them in Q2 and indeed, 1⋅1 + 1⋅(-1) = 0.

Given a linear subspace R, one can define the orthogonal complement of R,

R = { v | ∀ w ∈ R. v⋅w = 0 },

which contains precisely those vectors that are orthogonal to everything in R.

The two standard facts to mention here are that R ∩ R = {0} (the only vector that is both in R and R is the zero vector) and R + R = V, where V is the entire vector space. It is for this reason that R is called a complement.

So how does this all work in graphical linear algebra?

It turns out that the answer is very easy. The orthogonal complement of R is simply its colour-inverted version. Indeed, one can show that (for any R: m→0)


which is the diagrammatic way of saying R ∩ R = 0. Since equations in graphical linear algebra are invariant under swapping of colours (and copying/adding is commutative), we get for free that


which is the diagrammatic way of saying R + R is everything.

We might as well consider an example at this point. Consider the linear subspace of Q2 containing pairs (x,y) such that x + 2y = 0. The diagram for this is


The orthogonal complement is the colour swapped diagram, which is the following:


Note that the scalar has changed direction – this is of course a consequence of how the sugar for natural numbers is defined, since, recall:


Colour swapping the above makes the 2 gate point the other way.

Let’s just do a quick sanity check. An example of a vector that satisfies x + 2y = 0 is (6,-3). The diagrammatic spec of the orthogonal can be read as the subspace that contains vectors of the form (a, 2a) for any a. So let’s take, e.g. a = -1, obtaining (-1,-2). Now (6,-3)⋅(-1,-2) = -6 + 6 = 0. This is not a proof, of course, but I did say that I’m skipping those today.


Ok, so now what does it mean to project orthogonally onto some linear subspace R?

It means finding a linear transformation (or, if you prefer, a matrix) A such that

Ax = x if x ∈ R and Ax = 0 if x ∈ R.

So, roughly speaking, we preserve everything in R and kill everything in its orthogonal complement. This is exactly how geometric projections work: for example if you want to project a 3D shape onto a plane. Much of the maths that underlies 3D graphics relies on this kind of linear algebra.

Now back to the story; let us construct the orthogonal projection diagrammatically.

First, consider the following diagram.


We can think of this as the linear relation that represents the partial function that acts like the identity, but restricts its domain to R. It is thus not total, because, for example, it is undefined on the orthogonal complement.

The true orthogonal projection, given something not in R, should send it to zero. Here’s another partial function that does exactly that.


Now it turns out that to get the entire thing–the orthogonal projection onto R–we just need to sum up the two cases:


which gives us the diagrammatic specification for the orthogonal projection. Let’s take the right hand side above and rewrite it into the following, more aesthetically more pleasing diagram. This is graphical linear algebra after all, and we care about diagram prettiness.


We will use the above as the diagrammatic master formula for calculating orthogonal projections.


Let’s do a couple of worked examples, e.g. projecting onto

{ (x,y) | x + 2y = 0 },



which was our example from before. Recall that the orthogonal complement here is:


Substituting into the master equation gives


To get the actual matrix we can do a little diagrammatic calculation.


where we used the Spider Theorem (Episode 23) twice in the third step to simplify the diagram. We could continue with diagrammatic calculation, but at this point it’s quicker just to multiply the constituent matrices (and remember, the order of multiplication is “backwards”):


You may feel like doing a sanity check to see if this indeed does the job. That is, take a vector in { (x,y) | x + 2y = 0 }, which the matrix ought to leave untouched, and take one from the orthogonal complement, which the matrix must send to 0.

Just for fun, let’s quickly do the calculations for other projection, onto the orthogonal subspace.


Here the matrix is, therefore,



But Robin’s original question was about projecting onto the kernel of some matrix. And here graphical algebra has some additional pretty insights. I mentioned in a previous episode that the bizarro of a matrix (combining colour swap and mirror) is the transpose. Thus, we have, for an m×n matrix A


That is, the colour swap of a matrix is the same thing as its transpose, mirrored.

This can be quite useful. In particular consider the kernel of A.


We know that if we want the orthogonal complement we need to swap colours.


But, using our new insight, this is the same as


So the orthogonal complement of the kernel of A is the image of its transpose. Simple.

Dually, the orthogonal complement of the image of A




which is the same thing as


So the orthogonal complement of the image of A is the kernel of its transpose. These trivial consequences of graphical linear algebra notation are sometimes called the Fundamental Theorem of Linear Algebra.


Now we can, given A, calculate a formula for the orthogonal projection onto its image. Substituting the relevant bits into the master formula gives:


From which we can read off the somewhat mysterious formula A (ATA)-1 AT that appears in many linear algebra textbooks.

For the projection onto the kernel of A we get


which doesn’t obviously simplify into a “textbook formula”, but is just as easy to use in calculations.


So, since we’ve done loads of random calculations so far, let’s do one more for the road. Let’s say we want to calculate the orthogonal projection onto the kernel of


which, as a matrix is, diagrammatically


The kernel is thus


and the orthogonal complement of the kernel is


It remains to plug all the bits into the master formula for calculating projections and simplify:


which multiplies out to give



Last week I had the pleasure of visiting the Perimeter Institute for Theoretical Physics. I gave a talk about graphical linear algebra, which was recorded and available to watch online here. I emphasised the “computer science” way of looking at string diagrams as a resource sensitive syntax. That story will eventually make its way onto the blog in full, for now you can only read the beginning.

One interesting thing about the Perimeter Institute is the front entrance.


The Greek inscription translates to

Let no one uninterested in geometry enter here.

The phrase is a tribute to the phrase that was said to adorn the entrance to Plato’s Academy:

Let no one unversed in geometry enter here

I like the small change: the Perimeter Institute, in its mission statement, emphasises pedagogy as well as research.

The Perimeter Institute is the perfect research institute. It is a research factory with the best minds in theoretical physics. Crucially and refreshingly, it was built with researchers in mind. For example: there are blackboards everywhere, the open spaces encourage interactions, the architecture is inspiring and the food in the cafeteria is amazing. The contrast with 21st century British academia was… a bit depressing, and I was sad to leave.

If somebody wants to part with a few hundred million dollars, I’d love to set one up for theoretical computer science and category theory. Get in touch, philanthropic billionaires!

Determinants and the Lindström-Gessel-Vienot Lemma

Solomon is currently a student at the University of Pennsylvania where he is pursuing a Ph.D in Computer Science. He has a keen interest in all things Computer Science and Maths, so he decided to specialize in Programming Languages, an area which applies ideas from diverse areas of Computer Science, Mathematics and Philosophy. One of his favourite things is when connections are made between seemingly disparate areas of inquiry, so he has enjoyed reading the Graphical Linear Algebra blog for the connections between string diagrams and linear algebra. As the Lindström-Gessel-Vienot Lemma shows us, there is also a surprising combinatorial connection. Such occurrences make Solomon very happy!

Consider the familiar diagram for the 2 \times 2 matrix \begin{pmatrix} a & b \\ c & d \end{pmatrix}:


Suppose that we wanted to push some flow from the input nodes X_1 and X_2 to the output nodes Y_1 and Y_2.  Moreover, we want to do this in such a way that every output node receives some flow without any congestion in the network. More specifically, want to push flow from the input nodes to the output nodes in such a way that:

  1. no two different paths that are part of the flow intersect at
    a vertex (not even endpoints), and
  2. for each output node Y_j, there is a path from some input
    node X_i to Y_j.

For example, in the diagram above, we can push a units of flow from X_1 to Y_1 and d units of flow from X_2 to Y_2.


We can also push c units of flow from X_1 to Y_2 and b units of flow from X_2 to Y_1.


We are not allowed to push a units of flow from X_1 to Y_1 and b units of flow from X_2 to Y_1, since then we would have two different paths intersecting at Y_1.


Also, we are not allowed to push only a units of flow from X_1 to Y_1, since then there is no path from some X_i to Y_2.


We call a flow that satisfies conditions (1) and (2) a viable flow. Now we will measure the signed weight a viable flow. The weight of a viable flow is simply the product of all weights on the edges that are part of the flow. To get the signed weight, we multiply the result by -1 if the flow links X_1 to Y_2. Otherwise, we multiply the result by +1.
For example, in our 2 \times 2 matrix example, the first viable flow has weight +ad. The unsigned weight of this flow is ad since we push a units of flow from X_1 to Y_1 and d units of flow from X_2 to Y_2. To get the signed weight of this flow, we multiply the unsigned weight ad by +1 since X_1 is linked to Y_1. A similar computation shows that  the signed weight of the second viable flow is -bc.
Interestingly, when we add signed weights of all the viable flows in the diagram, we get the determinant of the matrix associated with the diagram i.e. ad-bc. Coincidence?

The Lindström-Gessel-Vienot Lemma (LGVL) tells us that this in fact is not a
coincidence. Specifically, LGVL associates to each finite directed acyclic
graph (dag) G a matrix whose determinant is exactly equal to the
sum of the signed weights of the viable flows in G.  To make this relationship precise we shall give the statement of LGVL, but first we need to introduce some definitions and notation that we will need in order to give the statement of the theorem.

First, we introduce permutations. For any sets A=\{a_1, \ldots, a_n\} and B=\{b_1, \ldots, b_n\} we call any bijection \sigma : A \longrightarrow B a permutation from A to B. Now assume that \sigma sends a_1, \ldots, a_n to \sigma(a_1), \ldots, \sigma(a_n) respectively. For example, let A = B = \{1,2,3,4,5\}, and let \sigma send 1,2,3,4,5 to 3,5,2,1,4 respectively. Write a_1, \ldots, a_n in a row in that order, skip a line, then write b_i below a_i for each b_i. Draw a line from a_i to \sigma(a_i), and count the number m of crossings in between the {a_i}'s and the {b_i}'s. Define the sign of \sigma be (-1)^m. For the permutation \sigma we have just described, there are six crossings, so the sign of the permutation is (-1)^6 = 1.


The sign of a permutation \sigma can also be computed by raising -1 to the
number inv(\sigma) of inversions of \sigma, where inv(\sigma) is the number of ordered pairs (i, j) such that i < j but \sigma(i) > \sigma(j). In the example above, the number of inversions of \sigma is 6 = |\{(1,3), (1,4), (2,3), (2,4), (2,5), (3,4)\}|, which gives the same result as before.

For the next few pieces of notation as well as the statement of LGVL, assume that G is a finite directed acyclic graph. Assume further that G has input nodes X_1, \ldots, X_n and output nodes Y_1, \ldots Y_n.


As before, we are interested in viable flows, i.e. flows that have the property that:

  1. no two different paths that are part of the flow intersect at
    a vertex (not even endpoints), and
  2. for each output node Y _j, there is a path from some input
    node X_i to Y_j.

Each viable flow can described by an n-tuple P = (P_1, \ldots, P_n) of paths , where P_i is a path from the input node X_i to the output node Y_{\sigma(i)} for some permutation \sigma. Let \sigma(P) be this permutation.

For each directed path P between any two vertices in G, let \omega (P) be the product of the weights of the edges of the path. For any two vertices X_i and Y_j, write e(X_i, Y_j) for the sum

e(X_i,Y_j)=\sum\limits _{P:X_i\to Y_j} \omega (P),

where P ranges over all paths from X_i to Y_j.
We are finally in a position to give the statement of LGVL, which is that


\left \vert {\begin{pmatrix}e(X_{1},Y_{1})&e(X_{1},Y_{2})&\cdots &e(X_{1},Y_{n})\\e(X_{2},Y_{1})&e(X_{2},Y_{2})&\cdots &e(X_{2},Y_{n})\\\vdots &\vdots &\ddots &\vdots \\e(X_{n},Y_{1})&e(X_{n},Y_{2})&\cdots &e(X_{n},Y_{n})\end{pmatrix}} \right \vert = \sum\limits _{{P = (P_{1},\ldots ,P_{n})}}{\mathrm {sign}}(\sigma (P))\prod\limits _{{i=1}}^{n}\omega (P_{i}),

where P ranges over all viable flows in G!

LGVL holds for any finite directed acyclic graph, not just graphs that look like the 2 \times 2 example. For example consider the following graph:


or equivalently, the following string diagram:

The matrix for this is

\begin{pmatrix} ag+bh & bi & 0\\ ch & ci + dj & dk\\ 0 & ej & ek + fl \end{pmatrix}.

It has determinant agciek + agcifl + agdjfe + bhdjfe, and it is easy to check that each of these terms indeed gives a viable flow from the {s_i}'s to the {t_i}'s. Note also that the sign of each term is +1 since s_i is sent to t_i in each of the viable flows.

Now on the one hand, LGVL gives us a way of computing the sum of the signed weights of viable flows in a dag as the determinant of a certain matrix. On the other hand, given a matrix M, we may compute the determinant of M by doing the following. First, we find a special dag G. Next, we find a set of input vertices \{X_i\} and output vertices \{Y_j\} in G. We check to see that M_{ij} = e(X_i, Y_j). Then, we (easily!) compute the sum of the signed weights of viable flows in G from \{X_i\} to \{Y_j\}. Finally we conclude that the result is equal to the determinant of M by LGVL. This compositional approach to computing the determinant often leads to useful and surprising results, as we shall see below.

Consider the determinant of the following matrix:

A = \begin{pmatrix} (1,1) & (1,2) & ... & (1,n) \\ (2,1) & (2,2) & ... & (2,n) \\ \vdots & \vdots & \ddots & \vdots \\(n-1,1) & (n-1,2) & ... & (n-1,n)\\(n,1) & (n,2) & ... & (n,n)\end{pmatrix},

where (k, \ell) denotes the greatest common divisor of k and \ell.

It is difficult to come up with a closed formula for this determinant by using say row-reductions or the co-factor expansion. However, using the strategy involving LGVL gives a simple and surprising result.

But first, a few more definitions! Given any positive integer k, let \phi(k) denote the number of positive integers less than or equal to k which are relatively prime to k. For example, \phi(6) = 2 since the only numbers less than 6 that are relatively prime to 6 are 1 and 5. On the other hand \phi(7)=6 since all the positive integers less than 7 are relatively prime to 1. In fact, it is easy to see that \phi(p)=p-1 for any prime p.
Also, for the purposes of the proof, we shall need to use the fact that

\sum\limits_{d|k} \phi(d) = k,

which is a property first established by Gauss. For example

6 = \phi(1) +\phi(2) + \phi(3) + \phi(6) = 1 + 1 + 2 + 2.

Now we proceed to the actual computation. Consider the graph G which has 3n vertices: \{s_1, \ldots, s_n\} are the input vertices, \{t_1, \ldots, t_n\} are the output vertices and \{v_1, \ldots, v_n\} are intermediate vertices. If d divides k, put the edge s_k \to v_d with weight \phi(d) and the edge v_d \to t_k with weight 1.


What is the weight e(s_k, t_{\ell}) of all the paths from s_k to t_{\ell}? Well each path from s_k to t_{\ell} can have only two edges: an edge s_k \rightarrow v_d of weight \phi(d) and an edge v_d \rightarrow t_{\ell} of weight 1, for some v_d, where d divides k and d divides \ell. Of course the weight of this path is \phi(d) multiplied by 1, which is just \phi(d). Therefore,

e(s_k, t_{\ell}) = \sum\limits_{\substack{d|k \\ d|\ell}}\phi(d) = \sum\limits_{d|(k, \ell)}\phi(d) = (k, \ell)

The last equality is justified by Gauss’ result.

By the LGVL, it follows that 

\det(A) = \sum\limits_{P = (P_1,\ldots,P_n)} \mathrm{sign}(\sigma(P)) \prod\limits_{i=1}^n \omega(P_i),

where P = (P_1, \ldots P_n) ranges over all viable flows from the input vertices S = \{s_1, \ldots, s_n\} to the output vertices T = \{t_1, \ldots, t_n\}.
Now the only path from s_1 to T is s_1 \to v_1 \to t_1 since 1 has only one divisor, namely 1. As a result, for each prime 1 \leq p \leq n, any path from s_p to T in a viable flow must pass through v_p. This is because p has only two divisors, namely 1 and p, but the path through 1 has already been blocked by 1.

Consequently, for any product pq of two primes, any path from s_{pq} to T must pass through v_{pq}. Repeating this argument for all the other numbers less than or equal to n, we conclude that all the viable flows from S to T link s_k to v_k.

Now if there is a viable flow which links v_k to t_{\ell} for some k \neq \ell, then there must be some k', \ell' such that v_{k'} is linked to t_{\ell}' with k' > \ell' (why?). But this means that k' divides \ell', a contradiction since k' > \ell! So we must have v_k = t_k, for all k.

Thus, there is a unique viable flow from S to T i.e. the flow for which s_k is sent to t_k via v_k. Thus, we conclude that the determinant of A is \phi(1) \cdot \ldots \cdot \phi(n), since the weight of each path P_k in this flow is \phi(k) as we showed earlier:

\left \vert \begin{pmatrix}(1,1) & (1,2) & ... & (1,n) \\ (2,1) & (2,2) & ... & (2,n) \\ \vdots & \vdots & \ddots & \vdots \\(n-1,1) & (n-1,2) & ... & (n-1,n)\\(n,1) & (n,2) & ... & (n,n)\end{pmatrix} \right \vert = \phi(1) \cdot \ldots \cdot \phi(n)

1st Workshop on String Diagrams in Computation, Logic, and Physics


Aleks Kissinger and I are organising STRING 2017, a workshop for people who use, or are interested in string diagrams. The diagrams used all over this blog are an example, but string diagrams have been appearing all over the place recently: quantum computing, logic, asynchronous circuit analysis, and even linguistics. I’m trying to write a few articles about why this is happening, so if you’re interested, here’s a link to the first part.

I’m pretty excited about the workshop: first, the venue is amazing, it’s the Jericho Tavern (or more precisely, the upstairs concert venue, famous for being the place where Radiohead, while still at uni, had their (first?) gig. I’ve been to one workshop there before, in October 2014, and it was great: the atmosphere is informal, comfortable, and very conducive to discussions. At that workshop I talked about my initial ideas for graphical linear algebra, and had some really useful feedback. So I’m looking forward to going back.

Second, we have some really fantastic invited speakers: Paul-André Mèllies is coming from Paris to talk about his research on logic, which heavily features the use of string diagrams. His work on logical negation is particularly interesting. He’s the modern day Charles Sanders Peirce, who was doing logic with string diagrams (existential graphs) before Frege did it with classical syntax.

Next, Dan Ghica will give a tutorial on applications of string diagrams in circuit design and verification. His work is related but somewhat different from the graphical linear algebra kinds of diagrams, which Filippo, Fabio and I have been using for signal flow graphs. If you don’t know the details of Dan’s latest work, you might remember him from his articles about developing knot theory for eight year olds.

Finally, Bob Coecke, one of the founders of the string-diagrams-as-syntax-for-quantum-physics school of thought, will give a tutorial on the quantum applications. And possibly, perform in the evening as part of The Quantum Dagger Orchestra.

We are soliciting short abstracts, so please consider submitting something: the deadline is end of June, so there is plenty of time. Of course, you can also come along without presenting anything and simply soak up the string diagrammatic goodness. You’ll be able to register at the FSCD website when they eventually put up a registration link; I don’t remember the exact numbers but the whole thing should cost between £100 and £150, with catering included. You’ll have to pay for your own beer, of course (soft drinks are also available, I’m told). The official call for papers with all the details is below.


1st Annual Workshop on String Diagrams in
Computation, Logic, and Physics
(STRING 2017)


Satellite workshop of FSCD 2017,
co-located with HDRA 2017.

Jericho Tavern, Oxford, UK
September 8-9, 2017

String diagrams are a powerful tool for reasoning about processes and composition. Originally developed as a convenient notation for the arrows of monoidal and higher categories, they are increasingly used in the formal study of digital circuits, control theory, concurrency, quantum and classical computation, natural language processes, logic and more. String diagrams combine the advantages of formal syntax with intuitive aspects: the graphical nature of terms means that they often reflect the topology of systems under consideration. Moreover, diagrammatic reasoning transforms formal arguments into dynamic, moving images, thus building domain specific intuitions, valuable both for practitioners and pedagogy.

This workshop aims to bring together researchers from diverse backgrounds and specialities to collaborate and share their insights, tools, and techniques. It will furthermore provide an informal atmosphere in a unique venue: the upstairs of the Jericho Tavern, a music venue, where famously Radiohead played their first concert. All the usual conference facilities will be provided, and the distinctive location will provide plenty of opportunities to discuss and share ideas. STRING 2017 is a satellite event of FSCD 2017 (http://www.cs.ox.ac.uk/conferences/fscd2017/) and will be co-located with the 3rd Higher-Dimensional Rewriting and Applications (http://hdra.gforge.inria.fr).

Invited Speakers

  • Paul-André Mèllies (CNRS and Paris Diderot) – joint HDRA and STRING invited speaker
  • Dan Ghica (Birmingham) – applications of string diagrams in circuit design and verification
  • Bob Coecke (Oxford) – applications of string diagrams in quantum foundations and quantum computation


We warmly welcome all types of contributions, ranging from work-in-progress to original work and/or overviews of mature work published elsewhere, on topics ranging from theory of string diagrams, to applications and tool demos.

Prospective speakers are invited to submit a title and 2 page abstract via the Easychair page at https://easychair.org/conferences/?conf=string2017

We will try to build a programme that is as inclusive and wide-ranging as possible within the fairly limited time available. Hence, speakers will be invited either to give a full-length talk or give a short talk in a “lightning session” style format.

The accepted abstracts will be made available electronically before the workshop.

Important Dates

Abstract deadline: 30 June 2017
Speaker notification: 14 July 2017
Workshop: 8-9 September 2017

Program Committee

  • Filippo Bonchi (ENS Lyon)
  • Ross Duncan (Strathclyde)
  • Fabio Gadducci (Pisa)
  • Aleks Kissinger (Radboud)
  • Dan Marsden (Oxford)
  • Mehrnoosh Sadrzadeh (Queen Mary)
  • Pawel Sobocinski (Southampton)
  • David Spivak (MIT)
  • Noam Zeilberger (Birmingham)


  • Aleks Kissinger – Radboud Universiteit
  • Pawel Sobocinski – University of Southampton


  • Joshua Holland – University of Southampton

Why string diagrams?

Sometimes I put the URL of this blog into a search engine to see what people are saying. When I first started – having heard about the fabled Internet Hate Machine – I was a little worried that I’d have to deal with tons snarky remarks and negativity. The opposite has happened. I’ve received several touching personal “thank you” emails, and a large majority – maybe even 90% – of social media feedback has been positive and extremely polite.  So, unexpectedly, it’s been a bit of an Internet Love Fest. It’s all a bit disconcerting: hey, I enjoy arguments as much as the next guy!

Of the non-positive 10% of remarks, 95% have been quite useful and constructive. They allowed me to look outside my own conceptual bubble and understand the parts of blog that people have found mysterious, unclear or even annoying. In particular, there is one recurring sentiment.

The guy has some interesting insights, but what’s that weird diagrammatic notation all about?

I realised that I spent a lot of time talking about “the rules of the game” for dealing with string diagrams, using them in proofs and explaining some technical facts about what they mean mathematically, but I never really explained the fundamental questions of why people should care about them. This made me re-evaluate the way I present this material: at the recent Midlands Graduate School I spent half of my four lectures explaining the story of string diagrams and why computer scientists (and engineers, physicists, mathematicians, economists, philosophers,…) ought to care about them.

I think that this is an important missing chapter of the story so far. It’s an interesting tale in itself, involving natural language, Universal Algebra, programming languages and higher category theory. I will make the case that string diagrams are not only important, they’re inevitable. And it’s due to something called resource-sensitivity.

What is resource-sensitivity?

It’s a cliché that the digital revolution has changed the way we understand the world. One, somewhat under appreciated aspect is the way that the concept of data has changed the way human beings perceive reality.  Sequences of 0s and 1s encode all kinds of information and have spoiled us: millenia of experience of coping with finiteness (food, money, time, etc.) has been augmented with an immensely powerful resource that comes with an unprecedented ability to be copied and thrown away at will. Media companies have been struggling with this disturbing new reality since the days of Napster. Dealing with data is also a conceptual leap that programmers have to make when first learning to program; they learn to navigate this unfamiliar and exotic world, governed by a resource like few other in human experience.

Let’s start with a simple example of resource-insensitivity. The following is a very simple function, in C syntax, that takes an integer x, multiplies it by itself, and returns the result.

int square (int x) {
    return x * x;

The same identifier x appears twice in the body of the function (and another time, at the start, where it’s declared as the parameter). That means that the resource that the function deals with, in this case an integer, is implicitly assumed to be copyable. This is fine: an integer is represented by a sequence of 0s and 1s in memory. It’s not a problem to use it several times, copy it somewhere else or even overwrite it.

But not every resource is as permissive as data. One of the first string diagrams on this blog was the recipe for Crema di Mascarpone. The wires were typed with ingredients and there were several generators; for example a crack-egg generator that took an egg and produced a white and a yolk.

Actually, it was subsequently criticised by a colleague on my Facebook for not being truly honest, because it implicitly threw away the shell. But here’s something less controversial: I think everyone would agree that doesn’t make sense to include an egg-copy generator.

So, while copying numbers is ok, copying eggs is not. It’s an amusing idea precisely because we are intuitively aware of the concept of resource-sensitivity. Actually, once you start thinking about it, there are examples everywhere. For example consider the following, grammatically correct sentence in a natural language:

Have your cake and eat it too.

This aphorism is used in situations where you can’t have something both ways. But the literal meaning is referring to a physical object: you can’t have a physical cake, eat it and still have it, since the action of eating it destroys it. So, perhaps, a hypothetical programming lanuage for cake handling should not allow you to write the following routine:

cake handle_cake (cake c) {
    eat c;
    return c;

Here executing the command eat c; consumes the resource c, so there is nothing to return.  It seems that type cake is quite unlike the type int of integers.  We will call the integer kind of data classical: classical data can be copied and discarded, whereas non-classical data (like eggs and cakes) cannot.

The closest thing to classical data that we had before computers is referencing things or concepts. In other words, information, which can be copied (e.g. by monks) and discarded (e.g. in a bonfire). Natural languages like English evolved to express information, and so they have several classical features built-in. In particular, the way that sentences of natural language are constructed are an example of something called classical syntax. Classical syntax is a good fit for dealing with classical data like information, but as we will see, also has some annoying limitations.

What is classical syntax?

Trees are a popular way of organising classical data, and are closely related to the notion of classical syntax.


Unlike traditional trees, like the Hawaiian one in the photo above, abstract trees are usually drawn with the root at the top. For example, the following is the tree representation of the expression 2 × 3 + 4.


The usual convention that × is evaluated before + is captured by the fact that × appears deeper in the tree than +. So actually, trees are better than formulas because they eliminate the need for parentheses. For example, the tree for 2 × (3 + 4) is the following:


Expressions in natural language such as “you can’t have your cake and eat it too” can also be represented as trees. A lot of this is due to work by Noam Chomsky on the theory of context free grammars (CFGs), although some people date work on CFGs and their relationship with natural language back to ancient India. So “have your cake” looks something like this, as a syntax tree:


(Don’t trust me, I never actually studied English grammar. When I went to high school in Australia back in the 90s, teaching formal grammar was unfashionable and considered to “stifle creativity”. What a load of…)

Anyway, as you can see from the tree above, natural language grammars are complicated things. CFGs are also used to define the syntax of programming languages; fortunately, these are actually much simpler to describe than natural language grammars, which are full of exceptions and special cases. So here’s how the cake handling program from before might look like as a tree:


This is roughly the kind of tree that’s constructed by the compiler, or, more precisely, by one of its components, the parser. Here I’m being somewhat reductive: in the programming language context such trees have special features to deal with variables and identifiers, and these kinds of extended trees are referred to as abstract syntax.

The cake c is referred to twice in the body of the function and once in the declaration.  But since cake is a non-classical, resource-sensitive type, then perhaps the compiler ought to ensure that the variable appears only once in the function body? So maybe our cake programming language should reject this program as a syntax error?

This doesn’t really work very well. For example, some actions might preserve the resource. For example, following ought to be ok: there’s no problem with looking at your cake and having it too.

cake handle_cake (cake c) {
    look_at c;
    return c;

One popular way to go about solving this problem is to build a little type theory for cakes and catch these kinds of bugs during type checking. Although this often works, is often very cute, and is quite similar to the kind of semantic analysis that goes on in our heads when trying to make sense of expressions in natural language, I think that it ignores a deeper issue:  we are cramming non-classical data into classical syntax.

Other interesting repercussions that will become clear when we think about the mathematics that governs the behaviour of classical syntax. More about this next time. For now, I’ll highlight some things that I think are symptoms of the widespread disease of classical thinking about non-classical things.

Limitations of classical thinking

Apart from cakes, here are some other examples of non-classical data, in order of decreasing coolness:  qubits, money and time. In fact, almost any finite resource you can think of is non-classical, and they all pose similar problems. All these examples are of things that, like eggs and unlike information, cannot be arbitrarily copied and discarded.

But even for ordinary, classical 0s and 1s kind of data, sometimes resource-sensitivity comes into play. A good example is concurrency: the idea that a computation can be split into a number of threads that are executed in parallel. The threads then maybe communicate and synchronise several times during processing in order to perform a task faster than would be possible when using a single thread.

Concurrent programming is the the task of writing programs that can do several things at the same time. It is more important and necessary then ever before, because hardware has been getting increasingly parallel in the last 15 years. Unfortunately, it is also widely feared and considered to be extremely hard to get right. Here, although the data is classical, the actual act of computation is resource-sensitive.  Two or more threads accessing and modyifying the same piece of data can have weird consequences, because of the non-determinism: we can’t predict before hand which thread will get to the data first, and it quickly becomes infeasible to examine all the possibilities.  Getting this wrong can sometimes have deadly consequences. I’ll return to the topic of concurrency later because it’s one of my favourite things to talk about. For now, I will tell you my own personal hypothesis: one of the main reasons why programming language technology has problems with concurrency is that ordinary, classical syntax has problems dealing with resource-sensitive applications such as concurrency. I think that string diagrams can help. (Incidentally, I recently watched Bartosz Milewski’s introduction to category theory lectures:  it was extremely cool to see someone introducing category theory and talking about programming in general, and concurrency in particular! I highly recommend the videos.)

Finally, one of the main points of this blog is that resource-sensitive thinking can actually help in areas that have traditionally been understood in a resource-insensitive way. For example, doing linear algebra with string diagrams reveals all kinds of beautiful symmetries that are hidden or only half visible when expressed in traditional, classical language. This is because the copying and discarding is explicitly a part of the graphical lexicon – it is the black structure of interacting Hopf monoids. Here, although the data is classical, it is nevertheless useful to be explicit about when the copying and discarding is done: only then the full picture becomes visible.


This episode has turned into one long piece of motivational text, with not too much technical content. This will change, because there is quite a bit of nice mathematics involved. So, my plans in the next few episodes are to answer the following questions in more detail:

  • what are the mathematical properties of classical syntax?
  • how can classical syntax be made (more) resource-sensitive?
  • what is true resource-sensitive syntax? (spoiler: it’s string diagrams, just like the ones all over this blog!)
  • how to convert classical syntax to resource-sensitive syntax?
  • what are some examples of classical syntax, viewed through resource-sensitive glasses?
  • what is the history of string diagrams?
  • what are some future research and technological challenges to be solved?

The moral of the story will be that string diagrams are not just some weird, trendy notation. They are inevitable, once we accept that the world is full of non-classical things that need to be described properly. String diagrams are resource-sensitive syntax.

…, a monoid is a category, a category is a monad, a monad is a monoid, …

In the last few months I taught three graduate mini-courses about string diagrams and graphical linear algebra – at Phd Open in Warsaw, a winter school in Estonia, and a graduate school in Leicester. That and teaching have been keeping me busy, so I haven’t invested as much time as I would have liked on the blog. There was also the need to write down all the work done in Hawaii, and Dusko, Filippo and I just finished writing a meaty technical report, which will soon appear on the arXiv. The time that I actually did spent writing blog stuff was distributed across a number of different articles; so now I have a whole bunch of half-finished stuff. I really need to start managing this whole project a little better!


Here’s something really short and off the cuff. At MGS in Leicester, during the exercise sessions, I talked about distributive laws of props. At the same graduate school, Alexander Kurz was giving a course on category theory.  I got the inspiration for this article from one of his lectures.

As part of the distributive laws of props story, you realise that categories are a special case of the concept of monad. But monads are a special case of the concept of monoidAnd monoids–of course–are a special case of the concept of category. By transitivity of the “special case of” relation, we should therefore probably just admit that they are all really the same thing: some kind of Platonic form of plugging things into each other. The story is full of pleasant categorical mental gymnastics. Think of it as the categorical equivalent of a 5km fun run.

… , a monoid is a category, …

A category is a collection of objects and arrows. Every object has an identity arrow, and any pair arrows f: A → B, g: B → C, where f‘s codomain is the same as g‘s domain, can be composed. Composition is associative, and composing with identities does nothing.

If we restrict to categories with exactly one object – let’s give it a name, say – there is just one identity arrow  (the identity id* on *). Every arrow has * as both domain and codomain, since there are no other objects around. So every arrow can be composed with every other arrow. Composition is associative, and composing with id* on either side leaves any arrow unchanged. In other words, there is a set of arrows Ar with a total, associative, unital operation: i.e. a monoid. The fact that there is an object * adds no interesting information.

So every monoid is a kind of category, a category with one object. Categories, it would seem, are the more general concept. Indeed, some people say that categories are the “categorification” of the concept of monoid. Maybe categories should actually have been called monoidoids?

… , a category is a monad, …

One way to define the concept of monad is as a triple (T, μ, η), where T: C → C is an endofunctor, μ: T2 → T is the multiplication natural transformation, and η: 1 → C is the unit natural transformation. Multiplication μ is associative and has η as unit, which is usually said with the aid of three commutative diagrams.

In the famous paper 1970s paper “The formal theory of monads“, Ross Street realised that this popular definition is unnecessarily specific. There’s no reason to be overly specific in mathematics: if you can easily have a more general theory/theorems, then why not? What matters is getting to the core of the issues, and not getting too hung up on extraneous details.

In this specific case, an endofunctor is a 1-cell in the 2-category Cat of categories, functors, and natural transformations. The data defining a monad is thus one 1-cell and two 2-cells, subject to some commutative diagrams of 2-cells. The fact that this data lives in Cat brings nothing important to the story: the definition clearly makes sense in any 2-category.

The 2-category Span(Set) is one of my favourites. It’s actually a bicategory–composition is not strictly associative, but we will ignore that detail for the purposes of this lightning article. It’s really just a bit of bureaucracy and adds nothing to the story. Trust me.

The objects of Span(Set) are sets. The arrows from X to Y are spans of functions

X ← Z → Y

Identities are spans formed of two identity functions. Composition of spans involves taking pullbacks. Finally, the 2-cells are span morphisms; functions Z→Z’ that make the two resulting triangles commute. I won’t bore you with the details which you can find on nlab.

The interesting question is what happens when we consider a monad in Span(Set). An endo-1-cell is span of the form

X ← Y → X                           (†).

Let’s call the left function δ0 and the right δ1. Now we need to give two 2-cells, that is, morphisms of spans. Let’s do the unit first: when you unpack the definitions it is a function

ι: X → Y such that δ0ιx = δ1ιx = x.

The multiplication μ goes from the span composed with itself back to itself. When you compute the composition, taking the pullback, the span is

X ← Y×XY → X

where XY = { (y,y’) | δ1(y) = δ0(y’) }. So μ: Y×XY → Y, satisfying δ0μ(y, y’) = δ0y and δ1μ(y, y’) = δ1y’.

So far so good. Associativity says simply that μ(μ(y, y’), y”) = μ(y, μ(y’, y”)) and the unit conditions say that μ(y, ιδ1 y) = y = μ( ιδ0y, y).

All this data has another name: a category. Let’s make it clearer with more familiar terminology: X is the set of objects, Y is the set of arrows. δ0 and δ1 give, respectively, the domain and codomain of each arrow. The function ι: X→Y picks out the identity arrows. Next, XY is the set of composable arrows, and μ: Y×XY → Y defines the composition for any composable pair. The remaining conditions say exactly that composition is associative and identities work as they should. It’s precisely the data we need.

So, a category is just a special case of the more general concept of monad, right?

… , a monad is a monoid, …

Not so fast. We talked about monads in the generality of 2-categories. The theory of monoids can be easily generalised too: there’s no reason for the specificity of  “set” M, with unit “element” e and multiplication “function” m, satisfying associativity and the unit equations. The world would be boring if everything was just sets and functions. In fact, the concept of monoid makes sense in any monoidal category. Pick some monoidal category X; the data of a monoid consists of

an object X, and arrows m: X⊗X → X and e: I → X

where I is the monoidal unit object. Associativity and the unit laws can be expressed as three commutative diagrams. The usual monoid-as-a-set version is now a monoid, in the more general sense, instantiated in the monoidal category of sets and functions Set×. Here the cartesian product serves as monoidal product.

So now let’s consider some 2-category C, pick an object  C and focus on the set of 1-cells from C to C. Let’s call this thing [C, C]. Actually, it’s not really a set: it’s the full sub-2-category of C on C; i.e. a 2-category with one object. We saw before that to give a category with one object is to give a monoid. In the same spirit, a 2-category with one object has the same information as a (strict) monoidal category. The objects are the 1-cells, and monoidal product (on 1-cells) is composition. I’ll leave the remaining details as a simple exercise.

So what is a monoid in [C, C], considered as a monoidal category? You guessed it – it’s a monad.

So monads are really special kinds of monoids. And monoids are really special kinds of categories. And categories are really special kinds of monads. And …



Leicester and the battle for universities

This is an off-topic post, but it concerns something very important. More graphical linear algebra is coming next week.

The short story is this: the University of Leicester is going through a top-down restructuring. The management have decided to force a large number of academics to reapply for their jobs. Of course, following this charade, there will be fewer jobs than academics, so the whole thing is a depraved game of musical chairs. In particular, in the excellent mathematics department, around six academics will be made redundant.

It’s very important that we come together and fight, so please sign this petition. In the grand scheme of things, the very concept of the university as an institution is under threat, and this is just the latest battle. Leicester is a reputable British university, but other countries are increasingly heading in the same direction: turning world-leading universities into failing businesses.

If you’re interested in my views on this topic, see the rant below.


Over the last twenty years, British universities have slowly transitioned away from being institutions that deliver learning and research as a public good. Universities were once seen as advancing human knowledge, and this was considered to be a positive contribution, a raison d’etre in its own right. Higher education meant that people could pursue their interests, fulfil their calling, have access to state-of-the-art research and knowledge in the field of their choosing, and perhaps even contribute to its further development.

At some point in the 1980s–coinciding with Thatcher in the UK and Reagan in the US–universities were infected by neoliberal thinking. Abstracts concepts like “advancing human knowledge” were considered suspicious and retrograde; it’s difficult to assign a monetary value to something abstract. Instead, universities were now seen as a engines of future economic growth. This led to investment in fields deemed as economically beneficial, e.g. applied STEM, law, economics, and savage cuts to fields deemed as less useful to the economy, typically meaning the arts, “less marketable” social science, fundamental scientific research, pure mathematics, and so on.

Stefan Collini, writing in the LRB, put it succinctly:

… gradually, what we still call universities are coming to be reshaped as centres of applied expertise and vocational training that are subordinate to a society’s ‘economic strategy’.

Moreover, universities themselves are increasingly being seen as businesses in their own right. The income comes from ever growing student fees, some government support for research and grant income. Academic career progression is now, in reality, often based on how much cash an academic generates. Non-performing (read, non profit-generating) academics are often bullied and threatened, sometimes with tragic consequences.

At the same time, management depressingly talks about “vision”, “strategy”, “stakeholders”, etc. Their emails are often peppered with meaningless middle-management business speak (think David Brent, or if you’re in the US, Michael Scott). Many British universities are expanding and investing millions to build campuses around the world. My own university has built a campus in Malaysia and the jury is still out on whether this was a sound investment. In Sold Out, another excellent LRB article by Stefan Collini, we find an indicative anecdote:

 In June 2012 the University of East London in Cyprus promised to offer ‘high quality British degree programmes in one of Europe’s most popular study destinations’ at a ‘stunning new campus’, but in April 2013 it was announced that after recruiting just 17 students UEL Cyprus would be closed. A spokesman for the university, the Times Higher Education reported, ‘would not disclose how much money the university will lose’.

Senior management, of course, keep giving themselves enormous pay rises and bonuses, regardless of performance: this is justified because they think themselves the equivalent of CEOs of large companies, deserving to be paid at “market value”. For example, the vice-chancellor of the University of Leicester takes home a salary of more that £300k a year. This makes the currently proposed redundancies even more perverse.

Even if you’re not based in the UK, if you care about universities, you should still get involved. Here’s Collini again:

Other countries are looking on with a mixture of regret and apprehension: regret because the university system in this country has been widely admired for so long, apprehension because they fear similar policies may soon be coming their way. In many parts of the world English higher education is, to change the metaphor, seen less as a useful pilot experiment and more as the canary in the mine.

So please, go ahead and sign the petition. As Collini writes in the last paragraph of his essay, we must stop first-rate British universities from turning into third-rate companies.


31. Fibonacci and sustainable rabbit farming

Chapter 12 of Fibonacci’s Liber Abaci is full of compelling problems. One is called “On two men with fish and the customs agent”. Another is a riddle about a man who visits a pleasure garden – accessible through seven guarded doors – where it’s possible to collect apples. You can enter freely, but on your way out you must pay each of the seven guards a tax: half of the apples you’re carrying, plus one additional apple.

Artistic impression of Fibonacci’s pleasure garden. The tax at this door is 10/+ 1 = 6 apples.

So, Fibonacci asks, how may apples does the man need to collect in the pleasure garden so that he gets to take one apple home? (spoiler: 382).

Given Fibonacci’s trading roots, his obsession with tax and customs duties is understandable. Without doubt, however, the most famous part of the chapter is not about taxes. It’s a very short exposition entitled “How many pairs of rabbits are created by one pair in one year”, and it starts as follows:

A certain man had one pair of rabbits together in a certain enclosed place, and one wishes to know how many are created from the pair in one year when it is the nature of them in a single month to bear another pair, and in the second month those born to bear also.

Fibonacci doesn’t tell us the name of the certain man. It will become boring to refer to him repeatedly as “the certain man”, so from now on we will change his gender to female and refer to her as Giusy (pronounced Juicy), short for Giuseppa the Rabbit Breeder.

The following helpful table is included in the margin, with the number of rabbit pairs for each month

table beginning 1
first 2
second 3
third 5
fourth 8
fifth 13
sixth 21
seventh 34
eighth 55
ninth 89
tenth 144
eleventh 233
twelfth 377

and the procedure for calculating the successive entries is explained as follows

You can indeed see in the margin how we operated, namely that we added the first number to the second, namely the 1 to the 2, and the second to the third, and the third to the fourth, and the fourth to the fifth, and thus one after another until we added the tenth to the eleventh, namely the 144 to the 233, and we had the above written sum of rabbits, namely 377, and thus you can in order find it for an unending number of months.

This sequence of numbers in the table

1, 2, 3, 5, 8, 13, 21, 34, 55, 89                          ①

is now known as the Fibonacci numbers. As Fibonacci explains, starting with F= 1 and F= 2, we obtain successive entries by recursively calculating

Fn+2 = Fn+1 + Fn.

This famous series—and the related golden ratio–are the absolute bread and butter of popular mathematics. Many facts, factoids and myths involving pineapples, pine cones, aesthetically pleasing rectangles, renaissance paintings etc. have been assigned to them.  They spice up the otherwise vapid pages of Dan Brown novels. As well as books, there are hundreds of videos on YouTube. While many are terrible, there are some gems: for example, this lecture by Keith Devlin is very good since he manages to be engaging, while at the same time challenging some of the most egregious numerological pseudo-science.


In modern presentations, Fibonacci’s famous sequence is usually given as

0, 1, 1, 2, 3, 5, 8, 13, 21,                              ②


1, 1, 2, 3, 5, 8, 13, 21, 34,                            ③

We will stick with Fibonacci’s original version .  There’s a good reason for this: Giusy’s first couple of rabbits start breeding already in the first month. So if the first rabbit couple was delivered one month late, Giusy would observe the following series

0, 1, 2, 3, 5, 8, 13, 21, …                               ④

Fibonacci’s  starts with the third number of  and the second of .  But it wouldn’t really work to simply shift  or  to the left: differently from , both  and  contain the sequence 1, 1, which Giusy would never observe.

We can obtain  by adding  and . Addition works pointwise, scanning vertically, 0+1=1, 1+1=2, 1+2=3 etc, as illustrated below. As we will see, this is the way of defining addition on number sequences. We will also define multiplication on sequences, but it won’t work anything like this.



The story of graphical linear algebra so far has taken us from zero to the natural numbers, to integers and then fractions. For the next few episodes we will shift our focus of from numbers to sequences. My aim for the rest of this episode is to whet your appetite by giving you a taste of the maths that we will use. For now, it will be a high-level overview, we will not go over everything in detail.

Eventually we will discuss some applications in control theory and I will do my best to explain when we come to that, although I’m not an expert. In this episode, though, I’ll talk about computer science motivations; some of the things that I found the most interesting while playing around in this particular neck of the graphical linear algebra woods.


We start by generalising Fibonacci’s rabbit problem.

Giusy originally got a pair of rabbits. No external rabbits get added, nor any taken away at any point afterwards. So we can think of this as the following rabbit input sequence

1, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

with resulting rabbit output sequence

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

Similarly, as we have already discussed, adding the first pair one month late would mean that the shifted rabbit input sequence

0, 100000000, …

results in the shifted output sequence

0, 1235813213455, …

But there’s no reason to add just one pair. For example, what if Giusy started with two pairs of rabbits? And, moreover, let’s suppose that she goes to the rabbit market every second month to buy another pair of rabbits. This rabbit input sequence would then be

2, 0, 1, 0, 1, 0, 1, 0, 1, 0,                           ⑤

Following the Fibonacci principle of rabbit reproduction, we would get output sequence

24, 7, 12, 20, 33, 54, 88, 143, 232,        ⑥

The reasoning is as follows: the input sequence 2, 0, 0, 0, 0, produces the result 2, 4, 6, 10, 16, … obtained by doubling each entry of . Twice the rabbits, twice the breeding. Next, 0, 0, 1, 0, 0, 0, 0, … produces 0, 0, 1, 2, 3, 5, …; that is, a shifted . Similarly for 0, 0, 0, 0, 1, 0, 0, 0, 0, … and so forth. Adding up all of these input sequences results in , so adding up all of the result sequences results in .

We can use the same kind of reasoning to figure out what would happen if Giusy started an ambitious commercial rabbit breeding farm. Say her business plan started with investing in three pairs, then selling one pair in the first month, two in the second, three in the third, and so forth. In this case, the rabbit input sequence would be

3, -1, -2, -3, -4, -5, -6, -7, -8, -9, …

and the result is, unfortunately

3, 5, 5, 5, 3, -1, -9, -23, -47, -87, …

so it looks like Giusy’s rabbit business would collapse sometime in the fifth month: negative rabbits doesn’t sound good. To be a successful rabbit entrepreneur, Giusy needs to understand the principles of sustainable rabbit farming, which we will discover at the end of this episode.

What’s going on here, mathematically speaking?

It looks like we are defining a function that takes number sequences to number sequences. By introducing and taking away rabbits we are causing the total numbers to change. We could try to write down a formula to calculate subsequent entries in the output sequence based on previously outputted values and the current input value. But formulas are boring.

Moreover, I hope that I’ve managed to make you at least a little bit uncomfortable about the pseudo-scientific language of causality. In graphical linear algebra, relations take centre stage. Of course, it is useful to know when a relation describes a function – as we will see below – but functions are not part of our basic setup. Instead, we think of the Fibonacci rules of rabbit breeding as defining a relation between number sequences. For example, it relates

1000000000, … with 123581321345589, …

2010101010… with 2471220335488143232

3-1-2-3-4-5-6-7-8-9, … with 35553-1-9-23-47-87, …

and so forth.

This relation is a kind of “ratio” between sequences. We have already seen how fractions in graphical linear algebra work and that they can be thought of as ratios. A little reminder example: the fraction 3/4 is drawn


and, seen as a linear relation, contains (4, 3), (8, 6), (-2/3, –1/2) and so on; all those ordered pairs that satisfy the ratio “three times the first number is four times the second”.

Next, a sneak preview of how we will use graphical linear algebra to capture the Fibonacci sequence ratio with diagrams.


The Fibonacci number sequence can be expressed nicely by something called a generating function. We will discover some of the underlying mathematics in the future. If you can’t wait, the classic generatingfunctionology by Herbert S. Wilf is free to download. Wilf’s Chapter 1 starts with the following, well-known simile:

A generating function is a clothesline on which we hang up a sequence of numbers for display.

The kinds of generating functions that we will use will be fractions of polynomials. So, as we will see, the generating function for  is


the one for  is


and the one for  is


If you haven’t seen this kind of thing before then this will all look very mysterious. Don’t worry, with graphical linear algebra everything will become crystal clear. At this point, just trust me: there’s a tight relationship between these fractions and the sequences that we saw previously.

I can’t resist making one remark though: look at the three fractions above. Since all of the denominators are the same, following the usual rules of addition of fractions, the first is just the addition of the second and the third, just like  is + as sequences. Indeed, an extremely interesting and useful property of generating functions is that their algebra as polynomial fractions is compatible with the algebra of the sequences they represent.

Generating functions go back to the 18th century and Euler. I’ve always thought of them as a kind of “prehistoric” computer science, 2oo years before Turing. Indeed, computer scientists are often faced with coming up with finite specifications (e.g. programming languages, automata, regular expressions, Turing machines, …) for infinite behaviour. Generating functions are a beautiful example of this phenomenon: they are a compact, finite syntax for infinite sequences.


It’s time to start drawing diagrams. The following is the diagram for (1+x)/(1-x-x2), the generating function for Fibonacci’s .


You should recognise most of the components of this diagram, apart from all the white x‘s on black backgrounds. They are new generators, and they stand for the indeterminate x of a polynomial. Adding them and their associated equations to our graphical language will give us the ability to write down any polynomial fraction whatsoever. We will go through this properly in the next episode.

Now for a preview of the cool stuff. We already noticed that the Fibonacci sequence ratio can be seen as a function from input rabbit sequences to output rabbit sequences. It turns out that this can be observed just by looking at the diagram. In fact, we can use diagrammatic reasoning to rewire the above into the following



which can be understood as a signal flow graph, a concept to which we will devote quite a bit of time in the future. Here the left boundary can be thought of as input and the right as an output. The denominator of the polynomial has been transformed into a fancy feedback loop.

We can now this latest diagram and turn it into some code. There’s no intelligence involved, a simple procedure takes a signal flow diagram and produces code. Here’s the result in OCaml. We will go through the procedure in due course.

(* "forward" Fibonacci *)
let ffib x = 
    let rec ffibaux r1 r2 r3 x = 
         match x with 
              | [] -> []
              | xh :: xt -> 
                  xh + r1 + r2 + r3 :: 
                      ffibaux xh r3 (xh + r1 + r2 + r3) xt
      in ffibaux 0 0 0 x;;

If you don’t know OCaml, don’t worry, you can still cook along: open this in a fresh browser window. It’s an interactive OCaml shell running right in your browser. Copy and paste the above code into the prompt.

If everything worked as expected, you ought to get the following

val ffib : int list -> int list = <fun>

which says that ffib is a function that takes a list of integers and returns a list of integers. OCaml is nice like that, it figures out the type for you. Now paste the following

# ffib [1; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0];;
- : int list = [1; 2; 3; 5; 8; 13; 21; 34; 55; 89; 144; 233; 377]

which is the list of the first thirteen Fibonacci numbers, as in Liber Abaci. Say Giusy throws in one additional rabbit pair each month, then the numbers will be

# ffib [1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1];;
- : int list = [1; 3; 6; 11; 19; 32; 53; 87; 142; 231; 375; 608; 985]

You get the idea; feel free to try other inputs.


Finally, we return to Giusy’s dream of becoming a successful, sustainable rabbit breeder. She’s in luck because the original diagram can be rewired in another way, again looking like a signal flow graph but this time where the input is on the right and the output in on the left. Now it’s the numerator that turns into a feedback loop.


But it’s still morally the same diagram, the same ratio, relating the same sequences. Just that now we understand it as having its input on the right and its output on the left. Nothing wrong with that: graphical linear algebra is good with symmetries. It’s also why dividing by zero wasn’t a problem.

We can also turn the new diagram into code.

(* "backward" Fibonacci *)
let rfib x = 
    let rec rfibaux r1 r2 r3 x =
        match x with
           | [] -> []
           | xh :: xt -> 
                xh - r1 + r2 + r3 :: 
                    rfibaux (xh - r1 + r2 + r3) r3 (-xh) xt
    in rfibaux 0 0 0 x;;

and, as a function, this now computes the inverse to ffib. So, for example:

# rfib [1; 2; 3; 5; 8; 13; 21; 34; 55; 89; 144; 233; 377];;
- : int list = [1; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0]

Now, let’s say that Giusy has space for 5 rabbit pairs and wants to sell off any overflow. What should her business plan look like?

# rfib [5; 5; 5; 5; 5; 5; 5; 5; 5; 5; 5;];;
- : int list = [5; -5; 0; -5; 0; -5; 0; -5; 0; -5; 0]

And we see that she ought to start with 5 pairs, sell off 5 in the first month, and every second month after that. Giusy’s happy, with both bankruptcy and rabbit plagues avoided. Graphical linear algebra saves the day again.


30. The essence of graphical linear algebra

Before we get started, I have an annoucement. Recently I have been contacted by Vincent Verheyen, a Belgian polyglot mathematician/developer who generously offered to translate the blog into several languages, starting with Dutch and French, but also German, Spanish, and even Chinese (with the help of his girlfriend Chung-Yi Lan). The first few episodes are already available and more will follow in the coming weeks. Vincent has set up his website in a way that makes external contributions possible, and the translations themselves are published under the Attribution-ShareAlike 4.0 International licence. So, if you are interested, please do get involved! I’m really honoured that Vincent has invested his time and energy in this effort, and it’s wonderful that the material will find a wider readership: at the moment, about 80% of the hits on the blog are from English speaking countries.


There is a new video series on YouTube by 3Blue1Brown called “Essence of linear algebra”. It’s slick and gives visual intuitions for matrices, spaces, determinants etc. It makes the important point that intuitions are not emphasised enough: too often linear algebra is presented as a series of definitions and algorithms to memorise. But you’ll notice that the diagrams there are quite different to the kinds of diagrams on this blog. The series has inspired me to think about the “essence of graphical linear algebra”. The story on this blog has been divided into bite size chunks, and as a result, it is sometimes difficult to see the wood for the trees. So in this episode we take a step back and look at the big picture. But first, let’s start with a little history.

Early 20th century mathematicians and logicians were a nervous lot, facing an existential crisis and a severe inferiority complex. Engineers were building astonishing machines that transformed the world in the timespan of a single generation. Their work was made possible by physicists who made breakthroughs in understanding the basic laws that govern the physical world, and by chemists who charted interactions between different types of molecules and compounds. For all of these pursuits there was an external reality to point to, a sanity check. A plane that did not fly was not much use; the design was improved or thrown out. Similarly, theories in the physical sciences could be put to the test in the lab. They were falsifiable: if experimental data did not fit the theory, it was the theory that was thrown out, not the real world.

Mathematicians, instead, worked with abstract systems, applying various levels of rigour. Instead of an external reality, they could point to the beauty and intellectual depth of their intuitions and results.  An added bonus was—as the physicist Eugene Wigner put it in 1960—the unreasonable effectiveness of mathematics in the natural sciences. Therefore the party line was (and still is) that even if the applications were not immediately obvious, eventually research in mathematics pays off. Mathematical advances are thus a vital ingredient in the supply chain of civilisation and technological advancement.

It’s a nice story, but what if it’s all built on sand? The mathematician’s ultimate defence–and for many the raison d’être of the entire discipline–is rigour and logical certainty of the work. Unfortunately, this was shaken up severely by Bertrand Russell in 1901 with his famous paradox. We have already seen that Russell had interesting things to say about causality. With the paradox, he showed that (naive) set theory, a branch of mathematics developed in the 19th century as a unifying formalisation for mathematics, was inconsistent. Basically, this meant that any proposition whatsoever could be proved. We could prove 2+2=4, but we could also prove 2+2=5. Nothing could be falsified because everything was both true and false at the same time. This, understandably, led to a moral panic. There were several important consequences, including Hilbert’s programme, Zermelo-Frankel set theory, Gödel’s incompleteness and Nicolas Bourbaki.

This blog is not about philosophy of mathematics, so I will skip right ahead to the main point: my intense dislike of Bourbaki. If you have not heard of Bourbaki before, it’s a rather romantic, movie-script kind of story: the scene is Paris in 1935, where a group of young, disaffected, rebel mathematicians wanted to change the world; or, at least, their discipline. They began to discuss their ideas of how to change mathematics, which they saw as too often informal, too disconnected, old-fashioned and chaotic. It was time for a revolution.


Street scene, Paris 1935.

To emphasise the collective nature of the endeavour, they published under the nom de guerre Nicolas Bourbaki. From 1939 onwards, Bourbaki authored a series of very influential textbooks. The project continued and gained power and prestige: over subsequent decades the group’s membership changed and at various times included many of the leading lights of modern mathematics as contributors, people such as Eilenberg and Grothendieck. The following is a description by Marjorie Senechal, taken from her wonderful 1997 interview with Pierre Cartier, a Bourbaki member between 1955 and 1983:

From its beginning, Bourbaki was a fervent believer in the unity and universality of mathematics, and dedicated itself to demonstrating both by recasting all of mathematics into a unified whole. Its goals were total formalization and perfect rigor. In the post-war years, Bourbaki metamorphosed from rebel to establishment.

Bourbaki’s unifying idea was that mathematics ought to be viewed as the study of mathematical structures. A mathematical structure is a set, i.e. an abstract collection of elements, with some structure: e.g. operations, subsets, constants. These then are required to satisfy a number of axioms. A vector space is one example: it is a set, whose elements we call vectors, containing a special element 0, an action of a field on that set, and an addition operation, all together satisfying a long list of axioms. All this data is now found in the first two pages of any advanced linear algebra textbook. Indeed, linear algebra–from the point of view of a Bourbakian–is the study of the mathematical structure called vector space. End of story.

Bourbaki resulted from similar currents of thought that produced fascism and totalitarian communism: moral panics leading to revolutions, and ultimately “final solutions”, all terrible and evil in various ways and degrees. Bourbaki was an attempted “final solution” for mathematics. This is not my hyperbole: in Senechal’s interview, Cartier says:

A final solution. There are good reasons to hate that expression, but it was in the people’s minds that we could reach a final solution. … In science, in art, in literature, in politics, economics, social affairs, there was the same spirit. The stated goal of Bourbaki was to create a new mathematics. He didn’t cite any other mathematical texts. Bourbaki is self-sufficient. … It was the time of ideology: Bourbaki was to be the New Euclid, he would write a textbook for the next 2000 years.

Bourbaki’s new mathematics had no space for pictures. Diagrams were too fragile, not nearly mensch enough. Cartier says:

Bourbaki’s abstractions and disdain for visualization were part of a global fashion, as illustrated by the abstract tendencies in the music and the paintings of that period.

Bourbaki is the reason why we still have boring textbooks in mathematics in general, and linear algebra in particular. Bourbaki hated intuitions. Bourbaki is partly responsible for the insane levels of specialisation in modern mathematics. In this blog we fight Bourbakism. ¡Hasta la victoria siempre!


Graphical linear algebra is not the study of a mathematical structure in a Bourbakian sense. Instead, we use diagrams to describe linear algebraic concepts. So matrices are not homomorphisms between vector spaces, but diagrams in a theory of adding and copying. As we have seen, an m×n matrix A is a diagram with n dangling wires on the left and m on the right. In our graphical shorthand:


The cool thing is that matrices are just the beginning. For example, a Bourbakian would consider the kernel of A as a sub vector space ker A of the domain of A: that is, a set of vectors v such that Av = 0. In graphical linear algebra the kernel is not a set. As we have seen, it is also a diagram, one that is obtained in a modular way from the diagram for A.


Similarly, the image of A is not the set { w | ∃ v. Av = w }. It is the following diagram.

image.gifOur diagrams are not fragile. They are precise mathematical objects and come with a set of equations which allows us to write proofs, transforming one diagram into another. As we have seen, all of these equations have rather simple justifications and describe how an abstract operation of addition interacts with an abstract operation of copying. Hence the slogan, which we saw back in Episode 7:

Linear algebra is what happens when adding meets copying.

Proofs are done through diagrammatic reasoning, which we discussed in Episode 6 . Their style departs from Bourbakian set theoretic arguments like “take an element x in ker A”. To show, for instance, that A has a null kernel if it is injective, we start with the diagram for A, append another diagram to obtain its kernel, and use the assumption of injectivity to transform it into the diagram for the null subspace. We saw this in the last episode, but here’s that part again, diagrammatically.




What really excites me about graphical linear algebra is the fact that it we can have our cake and eat it too: the diagrams provide intuition but are also sentences of rigorous, formal language.

Moreover, the diagrammatic method feels like a high level language when compared to the low level “machine code” of Bourbakian mathematical structures. We saw that the concepts of natural numbers, integers and fractions all follow from the interactions of the basic generators. There was no need to encode these concepts as separate mathematical structures. And the story led to some surprises too: in particular, we had to get over our dividebyzerophobia.

I hope to emphasise the idea of “diagrams as a high level language” in the next ten episodes or so. My plan is to start exploring how string diagrams can help in describing and reasoning about physical systems; in particular we will look at things called signal flow graphs in systems theory. There are some surprises in store: we will look at the idea of feedback in a new way. As we have seen, feedback is a concept that deeply troubles people who believe in causality. If causalists had their way, they’d happy throw the real world away to get away from the problems that feedback brings.

Bourbakism was the effect of the foundational crisis in mathematics of the early 20th century. Mathematicians can feel a little bit more secure today: after all, computer science and the information revolution has been built on mathematics and formal logic. Mathematicians also no longer seem to care very much about foundations, and computers mean that we don’t need to taylor notation for ease and speed of calculation. Instead, we can emphasise what I believe is the true purpose of maths: making complicated, seemingly impenetrable things intuitive and simple.


My sabbatical has finally started. On Monday evening I arrived in Honolulu, and I will be working with Dusko Pavlovic at the University of Hawaii for the rest of the year. Dusko has been working on very exciting things, one of which is to try to understand the “essence of computability theory” with string diagrams, not unlike the kinds of diagrams that we’ve been using on this blog. He uses them to form a theory called monoidal computer. You can check out an introductory paper here. I’m really excited to find out more and hopefully contribute something interesting.

I will also, hopefully, have more time to write.

Continue reading with Episode 31 – Fibonacci and sustainable rabbit farming.

29. Dividing by zero to invert matrices

I’ve let the blog slide. At the end of this episode I’ll write a bit about what I’ve been doing in the last few months. For now, let’s just jump straight back into graphical linear algebra.

In order to motivate myself to get back to writing, I brought forward an episode that I’ve been excited about since late last year, when I first did the research. This material is closely linked to Episode 26 on division by zero. If you haven’t read it, or you have forgotten, it will be useful to take a look before continuing.


Episode 26 was quite popular — maybe the clickbait title helped. It did leave some people a bit skeptical: ok, so this guy has derived addition and multiplication tables that seem to say that dividing by zero makes sense. But can this weird idea of dividing by zero actually be useful, or is it merely a curiosity?

There are other number systems that, in principle “allow” you to divide by zero. Many of them work similarly to the IEEE standard for floating point arithmetic, which has an additional entity called NaN (not a number). Roughly speaking, it’s the result of all of the calculations that shouldn’t make sense, like dividing by zero. Next, if you add and multiply a NaN with anything else, you still have a NaN. So it’s kind of like a black hole number: once you get it part way through your calculation you can be pretty sure that something has gone wrong and that the overall result will be a NaN — no matter what you do, you will never end up with an ordinary number again.

This is not the case for the number system from Episode 26. In fact, the three additional elements , , do show up in calculations. The cool thing is that they are not a sinkhole like NaN: you don’t need to panic if you get, say, a  in one of your calculations. It’s possible that things will finish well. This episode is entirely devoted to one example: inverting matrices.


We will need a few definitions.

An n×m matrix A is said to be invertible when there exists an m×n matrix B, such that when you compose in either direction, you get the identity. In diagrams:invdef.gif

If such a B exists, we sometimes write A-1 = B. Many matrices are not invertible. One property of those that are is that m=n, that is, invertible matrices are all square; the number of rows is equal to the number of columns.

Actually, it’s like this in most PROPs – you can usually only invert things where the left and the right boundaries have the same size. A useful intuition to keep in mind is that something can only be invertible if it can be undone. And since wires carry information, we can’t just throw one away or add a new one and expect to be able to undo it.

Back to definitions. We will say that an n×m matrix A is injective, or mono, when the following is true.


The backwards facing A is none other than the mirror image symmetry  that we discussed in Episode 27.

The above definition probably seems a bit strange. In most ordinary linear algebra textbooks, the defining property of an injective n×m matrix is that it takes  different m-vectors to different n-vectors: that is, it is injective as a function. This is actually an equivalent formulation, but we don’t think about matrices as functions on this blog.

One interesting property of mono matrices is that they are left-cancellable: that is, given two equal-sized matrices B, C for which AB and AC is defined, if AB=AC then B=C. In fact, this is true when B and C are linear relations. Diagrammatically:


implies that


Here B and C are linear relations, i.e. arbitrary diagrams in IH. This of course includes the case where B and C are matrices.

Actually injectivity and left-cancellability are also equivalent, i.e. assuming one, you can prove the other. I’ll leave the proofs as an exercise but I’ll give you a hint: it’s handy to use the fact, stated in Episode 27, that for any linear relation A, A is its weak inverse.

The bizarro of injective is surjective. We will also use the word epi. Thus, an n×m matrix is surjective when the following is true:


Just as mono matrices are left-cancellable, epi matrices are right-cancellable; given B and C such that BA and CA make sense, if BA=CA then B=C. In terms of ordinary linear algebra, surjective matrices are those that are surjective as functions; for every n-vector b, there exists an m-vector a such that Aa=b.

There is one final observation that we will need later: any invertible matrix is both injective and surjective. To keep things moving along, I’ll also leave the proof of this as an exercise.


Let’s now prove a classic result of linear algebra. It refers to kernels and images, which are linear subspaces that we discussed in the last episode.

Kernel and Image Lemma. Suppose that A is an m×n matrix. Then:

  1. A is injective exactly when it has a null kernel
  2. A is surjective exactly when it has a full image.

Proof. Properties 1 and 2 are mutually bizarro, so it’s enough to prove just one. We will do 1.

Suppose that A is injective. Then:


Which means that A has a null kernel.

Suppose now that A has a null kernel; we want to show that it must be injective.


That’s it, we’re done, QED.



The following lemma is very useful because it gives us some useful graphical intuition about invertible matrices.

Invertible Matrix Lemma

Suppose that A is a square matrix of size m. Then the following are equivalent

  1. A has inverse B
  2. There exists matrix B such


Proof. Let’s first show that 1 implies 2. Assuming 1, Suppose that A has inverse B. Since A is invertible, we know that A is injective. So we have the following:


Which gives us 2. In the first step, we used the assumption that B is the inverse of A, and in the second step we used the fact that A is injective.

We still need to show that 2 implies 1. So, supposing that A is equal to some matrix B heading in the opposite direction, we need to show that B is the inverse of A. To do that, it’s useful to show that A must be injective. Here we can use our Kernel and Image Lemma.


So, by the kernel lemma, A is injective. We can now use this to show that A followed by B is identity.


To show that B followed by A is also identity, we can do the bizarro thing, first showing that B is surjective.


Our aim is to obtain a formula for the inverse of a matrix, if it exists. The starting point is an arbitrary 2×2 matrix, so something like this:2x2.gif

If an inverse exists, then we know it’s also going to be a 2×2 matrix:

The tricky part is, how can we calculate a’b’c’ and d’, knowing ab, c and d? If you remember your linear algebra, maybe you’re thinking determinants and adjugate matrices. We will bypass all that and draw some pictures instead.


Let’s start by using our Invertible Matrix Lemma. As diagrams in IH, we have:


Now adding a matrix to -1 times itself gives us the zero matrix. Using this trivial observation, we get the following rather complicated looking equation:


We can extract the formula for each of a’, b’, c’ and d’ by plugging up some of the dangling wires in the above equation. Let’s work through the procedure for a’.

The first step is to plug a zero and a discard at the right places in both sides, like so:


Doing that, we get the first equality below.

step2.gifThe second equality follows from the bone equation of bimonoids.

So far so good: but now we can drastically simplify the left hand side. Essentially, we’ve plugged a white unit into a black comultiplication, and vice versa; usually when that happens, there is mass destruction. I’ll leave the graphical derivation to you. After you’ve killed off most of the diagram, the left hand side ends up looking like this:


The nice thing is that now a’ is the only entry from the inverse matrix that’s left. And we already know that the diagram above is equal to zero, i.e.


So we are in the situation where a’ plus -1 times some mess is equal to 0; thus:


This is starting to look like a formula. We can untangle the still somewhat messy right hand side above. The result is:


In terms of a bona-fide, old-fashioned syntactic formula—if you’re into that kind of thing—this translates to:


Note that I did not go further in simplifying the denominator because d might be zero, and in that case, multiplication may not be commutative. Relax; to evaluate the formula for specific numbers, we can use our addition and multiplication tables from Episode 26.

There is nothing special about a’. We adapt our procedure by plugging up the other combinations of wires and obtain formulas for b’, c’ and d’. All together:


I find it quite cute how a, b, c and d permute in each of the formulas. And no determinants, nor adjugates in sight!


Let’s do a worked example, calculating the inverse of the matrix:


Referring to the formula for the inverse, we have d=0, so we will divide by zero when calculating a’. For that, we can refer to the multiplication and addition tables from Episode 26. Here’s the calculation:


The other calculations don’t involve any funny stuff — we end up with


I promised you an example where the extended rational algebra of Episode 26 can be useful. And while inverting 2×2 matrices is not going to set the world alight, I hope that this is a hint that there is something interesting going on!

By the way, the formula that we derived for inverting 2×2 matrices generalises to arbitrary square matrices. I don’t have the space for this in this episode, but if there’s interest I can come back to this topic. Let me know.


Not writing is like not going to the gym (which, coincidentally, I’m also guilty of). I mean that once you start not writing, it’s difficult to get back into it. “I’ll do some writing next weekend” – you tell yourself. Before you know it, half a year has passed. Same with (not) going to the gym. I’m still working on that.

Having said that, it’s been a very busy half a year. In addition to teaching,  I visited Paris—and more specifically, Jean Krivine—for a month. I love Paris; I spent a six-month postdoc there in 2005 and it was fantastic to be back in the 13e arrondissement.


Apart from enjoying the city, it was a really productive visit and I learned a lot. Jean is currently working on a graphical high-level programming language for biologists, which shares many features with the kind of string diagrams that this blog is obsessed with.

Amongst other things, Jean told me about a family of string-diagrammatic calculi for logic, developed by the famous 19th century American logician C.S. Peirce (pronounced “purse”; it’s Dutch, apparently). Jean uses some of Peirce’s ideas for his programming language. Peirce’s systems are called existential graphs and have a fascinating history.  While Peirce considered them his most important scientific achievement, they remained unpublished in his lifetime; I guess due to a lack of typesetting technology for diagrams, but maybe also due to the fact that his ideas were so revolutionary that they were about 100 years before their time. In fact, Peirce’s diagrams came before the standard syntactic way of writing logic, with all the upside down As and backwards Es that logicians are so fond of. But his work was forgotten and were rediscovered only in the 1960s. They are still a topic of research.

Existential graphs really blew me away. I plan to understand them properly over the next few months, and maybe I will write about them more seriously one day. For now, if you’re interested in reading further, the PhD thesis of Frithjof Dau seems like a great resource.

This weekend I will be in Porto for the HDRA workshop, and the weekend after that I’m flying to New York for the LiCS conference. It’s going to be a busy summer, but I will try to work more regularly on the blog from now on.

Continue reading with Episode 30 – The essence of graphical linear algebra.

28. Subspaces, diagrammatically

There are two common ways to interpret the information represented by a matrix. Roughly speaking, it boils down to whether—when looking at a grid of numbers—you choose to see a list of columns, or a list of rows. The two interpretations are symmetric, and this is nicely captured by this blog’s obsession, the graphical syntax. And as we will see, the two ways of looking at matrices will lead us to two different, symmetric ways of understanding linear subspaces.


Consider an m×n matrix A, drawn with our graphical sugar for matrices as a gadget with n wires entering on the left and m wires exiting on the right, like so:


A can be viewed as consisting of n column vectors c1, c2, …, c. Each of the c‘s is individually a m×1 matrix, so it has one wire entering on the left and m wires existing on the right. We can then wire up A, emphasising the column vectors, like so:columns

Alternatively, we can view the same matrix A as consisting or of m row vectors r1, r2, …, rn, with each having n wires on the left and one wire exiting on the right. This leads to drawing A like this:rows

Even though in both cases we are talking about the same matrix—the same raw data—it turns out that the two different points of view lead to two quite different interpretations.


Let’s work through a concrete example. Consider the matrix


which has no particular significance; I just pulled it out of a hat, so to speak. But it’s usually a good idea to have a particular, concrete object to work with, and this one does the job.

Emphasising the columns, it can be drawn like this:


Or we can focus on the rows, as shown below.



Viewing a matrix as a list of columns leads to the first interpretation of what a matrix means: a recipe for of transforming n vectors—that is, columns of of n numbers—into m vectors, i.e. columns of m numbers.

Indeed, let’s take another look at A drawn in the way that emphasises its columns.


We can imagine the n dangling wires on the left as being n different inputs. Now each ci takes an input number and produces a column of m numbers. At the end, the n columns, each of size m, are simply added up.

Let’s work through an example to see how this works, using the concrete matrix from before.


Here, our input vector consists of 51 and -2. Each input results in two outputs, according to the recipe within each column block. They are then added up, e.g. 5+(-1)+(-2)=2 in the first output. Using the standard linear algebra jargon, we have produced a linear combination of the column vectors:


In our worked example above we took one particular, quite arbitrary choice of inputs: 5, 1 and -2. This leads to a natural question: exactly which vectors can be produced by this circuit, supposing that we are free to choose any inputs whatsoever.

In the example above, it turns out that all outputs can be produced by picking the appropriate inputs. But this is not always going to be the case; for example, if all the columns consist of only zeros then clearly only the zero output is possible.

The general question is, then, the following: given column vectors c1c2, …, cn, which m vectors can be written as a linear combination

α1c1 +  α2c2 + … + αncn?


How can we express this idea in our graphical language?

It turns out that the idea of considering all possible inputs is simple to describe. It is also quite intuitive. First, let’s return to our venerable discard generator.


Recall that our intuition, backed by the functional semantics, is that this simply discards any number given to it. With matrices, we usually like to think of numbers flowing from left to right. So what are we to make of discard’s mirror image?


Well, if the discard can throw away any number, then a nice way to think about its mirror image is that it can generate any number. So, to get back to our original question, we can talk about the linear subspace of Qm generated by the matrix A by simply considering all the possible inputs to each column. Like so.


The diagram now represents the subspace that consists of all the linear combinations α1c1 +  α2c2 + … + αncn of A‘s column vectors, where each α is a fraction. More precisely—as we have discussed in the last episode—the diagram gives us a linear relation of type Q⇸ Qm, which we can (since Q0 contains only one element) then simply think of as a subspace of Qm.

This subspace is quite famous and has several names; it’s variously called the image, range, or column space of A, or the span of c1, c2, …, cn. And, using our usual syntactic sugar, we can simply write it as:



I claimed above that our example matrix can produce any pair of outputs. Let’s do the calculation to make sure: it’s a nice little exercise in diagrammatic reasoning.


The image is thus the entire space Q2, so for any pair of fractions p, q we can indeed find inputs u,v,w such that



Viewing a matrix as a collection of rows leads to another popular interpretation of the information that a matrix represents: a list of equations that constrain the inputs. Let me explain.

Drawing A in a way that emphasises the rows, we can think of each ri as encoding a linear combination of the inputs.


For example, going back to our original matrix, let’s label the inputs with variables x, y and z.


So the rows tell us how to combine the various inputs to form the outputs: for example, the first row gives us x-y+z and the second 2y+z.

Here, the natural question is to think of the rows as homogenous equations, and consider the set of solutions, those concrete choices of x, y and z that satisfy them. Homogenous, by the way, means simply that we append = 0 to each linear combination of variables.

So for our running example, we could consider the collection of all the inputs constrained so that they satisfy the two equations:

x-y+z = 0                  ①             

2y+z = 0                  ②

We can also represent this graphically. When we emphasised the columns, we introduced a new intuition for the mirror image of discard. This time we are focussing on the rows, and so, not surprisingly, we will introduce a new intuition for the mirror image of the zero generator.

Recall that the zero generator, in the original left-to-right functional interpretation, simply outputs 0.


So how are we to think of it’s mirror image?

constrainThe answer is that it’s a constraint: it only accepts 0!

Using this insight, the solution set of the list of homogenous linear equations given to us by the rows of the matrix A is graphically represented as:


or using our usual sugar:


This is a linear relation of type QnQ0, so for the same reasons as before, it’s pretty much the same thing as a linear subspace of Qn. This subspace is also very important in linear algebra, and is variously called the kernel, or the nullspace of A.


Let’s do a calculation that will give us more information about the kernel of our long running example matrix. The calculation is a bit longer than before, but the details are not so important at this stage.


In the derivation we used the Spider theorem from Episode 23.

Here’s the interesting thing about the calculation: we started with a diagram in which we were thinking of the “flow” as going from left to right, where our inputs had to satisfy some equations. We ended up with a diagram which is easier to understand as going from right to left. Indeed, it’s an example of the kind of subspace that we got by focussing on the columns; it is:


where N is the matrix


In other words, the kernel of M is the same linear subspace as the image of N. This tells us that everything in the kernel is of the form


where α is some fraction. So the solutions of the system of equations  and  are x = 3α, y = α, z = -2α.


We have seen that some descriptions of a linear relation can be read “from left to right” and others “from right to left”. Indeed, we have just seen an example of a linear relation that had two descriptions:


So what is really going on here? It turns out that we can think in two ways about every linear relation.

  1. Every linear relation can be seen as “being generated”, in the way we described the image subspace.

rangesugar2. Every linear relation can be thought of as consisting of the solutions of homogenous equations, of inputs “being constrained”, the way we described the kernel subspace.kernelsugar

Let’s be more precise about this. We will start by considering the first of the two situations, the “being generated” case.

The proof of IH ≅ LinRel—which we haven’t seen yet but we will go through in due course—has one important consequence: for every linear relation R: m⇸n we can find an m×p matrix A and n×p matrix B such that


If we have such matrices A, B then we will say that they are a span form of R. Span forms allow us to think of R as being generated. In fact,

rangesugaris in span form, since

ndiscardis the graphical representation of the unique 0×n matrix.

The proof of IH ≅ LinRel also gives us the “being constrained” case, which is pleasingly symmetric. For every linear relation R:m⇸n, we can, respectively, find q×m and q×n matrices C and D such that


This is called a cospan form of R. Cospan forms capture the idea that a linear relation can be thought of as the solution set of a bunch of homogenous equations. An example is the kernel of a matrix A




is the unique m×0 matrix.

One little quirk of history is that the traditional way of doing linear algebra focusses very heavily on the span view, and almost totally ignores the cospan view. One takes a space and finds a basis, one talks about its dimension, etc. But the cospan view is just as expressive, given the symmetries of graphical linear algebra. So, it’s entirely possible that Martian linear algebra has been developed in a way that prioritised the cospan view of linear subspaces. It’s fun figuring out the details, and we will do some of this in the next episode!



I’m having a very heavy start to 2016: originally I was planning to publish this episode in early January, and it’s now early February. The highlight of the month was a trip to Oxford on Friday January 8. I was there to examine a thesis entitled “!-Logic: First-Order Reasoning for Families of Non-Commutative String Diagrams” of (the now Dr.) David Quick, supervised by Aleks Kissinger. I liked the thesis very much: it’s about a logic for string diagrams, much like the ones on this blog. David’s logic supports reasoning about whole families of diagrams, including supporting sophisticated inductive arguments; much more sophisticated and powerful than the kind of induction that we have been using on the blog. I will have to write about it at some point in the future.

I arrived in Oxford quite early, and it was a really beautiful morning. On my walk from the train station to the computer science department I got the urge to take a photo.


The next Friday, January 15, I was in Cambridge to give a talk. That trip was also a lot of fun, and I got the chance to catch up with some old pals. The evening was very pleasant, and I decided to walk from the computer science department to the train station. On the way I took this photo:


I later noticed—and I promise, as much as it looks like it was, this wasn’t planned!—that the photos themselves are quite symmetric: compare the relationship of water to land. Since we spend a lot of time talking about symmetry on this blog, I think it’s somehow appropriate.

Continue reading with Episode 29 – Dividing by zero to invert matrices.