I’ve let the blog slide. At the end of this episode I’ll write a bit about what I’ve been doing in the last few months. For now, let’s just jump straight back into graphical linear algebra.
In order to motivate myself to get back to writing, I brought forward an episode that I’ve been excited about since late last year, when I first did the research. This material is closely linked to Episode 26 on division by zero. If you haven’t read it, or you have forgotten, it will be useful to take a look before continuing.
Episode 26 was quite popular — maybe the clickbait title helped. It did leave some people a bit skeptical: ok, so this guy has derived addition and multiplication tables that seem to say that dividing by zero makes sense. But can this weird idea of dividing by zero actually be useful, or is it merely a curiosity?
There are other number systems that, in principle “allow” you to divide by zero. Many of them work similarly to the IEEE standard for floating point arithmetic, which has an additional entity called NaN (not a number). Roughly speaking, it’s the result of all of the calculations that shouldn’t make sense, like dividing by zero. Next, if you add and multiply a NaN with anything else, you still have a NaN. So it’s kind of like a black hole number: once you get it part way through your calculation you can be pretty sure that something has gone wrong and that the overall result will be a NaN — no matter what you do, you will never end up with an ordinary number again.
This is not the case for the number system from Episode 26. In fact, the three additional elements ∞, ⊥, ⊤ do show up in calculations. The cool thing is that they are not a sinkhole like NaN: you don’t need to panic if you get, say, a ⊥ in one of your calculations. It’s possible that things will finish well. This episode is entirely devoted to one example: inverting matrices.
We will need a few definitions.
An n×m matrix A is said to be invertible when there exists an m×n matrix B, such that when you compose in either direction, you get the identity. In diagrams:
If such a B exists, we sometimes write A-1 = B. Many matrices are not invertible. One property of those that are is that m=n, that is, invertible matrices are all square; the number of rows is equal to the number of columns.
Actually, it’s like this in most PROPs – you can usually only invert things where the left and the right boundaries have the same size. A useful intuition to keep in mind is that something can only be invertible if it can be undone. And since wires carry information, we can’t just throw one away or add a new one and expect to be able to undo it.
Back to definitions. We will say that an n×m matrix A is injective, or mono, when the following is true.
The backwards facing A is none other than the mirror image symmetry † that we discussed in Episode 27.
The above definition probably seems a bit strange. In most ordinary linear algebra textbooks, the defining property of an injective n×m matrix is that it takes different m-vectors to different n-vectors: that is, it is injective as a function. This is actually an equivalent formulation, but we don’t think about matrices as functions on this blog.
One interesting property of mono matrices is that they are left-cancellable: that is, given two equal-sized matrices B, C for which AB and AC is defined, if AB=AC then B=C. In fact, this is true when B and C are linear relations. Diagrammatically:
Here B and C are linear relations, i.e. arbitrary diagrams in IH. This of course includes the case where B and C are matrices.
Actually injectivity and left-cancellability are also equivalent, i.e. assuming one, you can prove the other. I’ll leave the proofs as an exercise but I’ll give you a hint: it’s handy to use the fact, stated in Episode 27, that for any linear relation A, A† is its weak inverse.
The bizarro of injective is surjective. We will also use the word epi. Thus, an n×m matrix is surjective when the following is true:
Just as mono matrices are left-cancellable, epi matrices are right-cancellable; given B and C such that BA and CA make sense, if BA=CA then B=C. In terms of ordinary linear algebra, surjective matrices are those that are surjective as functions; for every n-vector b, there exists an m-vector a such that Aa=b.
There is one final observation that we will need later: any invertible matrix is both injective and surjective. To keep things moving along, I’ll also leave the proof of this as an exercise.
Let’s now prove a classic result of linear algebra. It refers to kernels and images, which are linear subspaces that we discussed in the last episode.
Kernel and Image Lemma. Suppose that A is an m×n matrix. Then:
- A is injective exactly when it has a null kernel
- A is surjective exactly when it has a full image.
Proof. Properties 1 and 2 are mutually bizarro, so it’s enough to prove just one. We will do 1.
Suppose that A is injective. Then:
Which means that A has a null kernel.
Suppose now that A has a null kernel; we want to show that it must be injective.
That’s it, we’re done, QED.
The following lemma is very useful because it gives us some useful graphical intuition about invertible matrices.
Invertible Matrix Lemma
Suppose that A is a square matrix of size m. Then the following are equivalent
- A has inverse B
- There exists matrix B such
Proof. Let’s first show that 1 implies 2. Assuming 1, Suppose that A has inverse B. Since A is invertible, we know that A is injective. So we have the following:
Which gives us 2. In the first step, we used the assumption that B is the inverse of A, and in the second step we used the fact that A is injective.
We still need to show that 2 implies 1. So, supposing that A is equal to some matrix B heading in the opposite direction, we need to show that B is the inverse of A. To do that, it’s useful to show that A must be injective. Here we can use our Kernel and Image Lemma.
So, by the kernel lemma, A is injective. We can now use this to show that A followed by B is identity.
To show that B followed by A is also identity, we can do the bizarro thing, first showing that B is surjective.
Our aim is to obtain a formula for the inverse of a matrix, if it exists. The starting point is an arbitrary 2×2 matrix, so something like this:
If an inverse exists, then we know it’s also going to be a 2×2 matrix:
The tricky part is, how can we calculate a’, b’, c’ and d’, knowing a, b, c and d? If you remember your linear algebra, maybe you’re thinking determinants and adjugate matrices. We will bypass all that and draw some pictures instead.
Let’s start by using our Invertible Matrix Lemma. As diagrams in IH, we have:
Now adding a matrix to -1 times itself gives us the zero matrix. Using this trivial observation, we get the following rather complicated looking equation:
We can extract the formula for each of a’, b’, c’ and d’ by plugging up some of the dangling wires in the above equation. Let’s work through the procedure for a’.
The first step is to plug a zero and a discard at the right places in both sides, like so:
Doing that, we get the first equality below.
The second equality follows from the bone equation of bimonoids.
So far so good: but now we can drastically simplify the left hand side. Essentially, we’ve plugged a white unit into a black comultiplication, and vice versa; usually when that happens, there is mass destruction. I’ll leave the graphical derivation to you. After you’ve killed off most of the diagram, the left hand side ends up looking like this:
The nice thing is that now a’ is the only entry from the inverse matrix that’s left. And we already know that the diagram above is equal to zero, i.e.
So we are in the situation where a’ plus -1 times some mess is equal to 0; thus:
This is starting to look like a formula. We can untangle the still somewhat messy right hand side above. The result is:
In terms of a bona-fide, old-fashioned syntactic formula—if you’re into that kind of thing—this translates to:
Note that I did not go further in simplifying the denominator because d might be zero, and in that case, multiplication may not be commutative. Relax; to evaluate the formula for specific numbers, we can use our addition and multiplication tables from Episode 26.
There is nothing special about a’. We adapt our procedure by plugging up the other combinations of wires and obtain formulas for b’, c’ and d’. All together:
I find it quite cute how a, b, c and d permute in each of the formulas. And no determinants, nor adjugates in sight!
Let’s do a worked example, calculating the inverse of the matrix:
Referring to the formula for the inverse, we have d=0, so we will divide by zero when calculating a’. For that, we can refer to the multiplication and addition tables from Episode 26. Here’s the calculation:
The other calculations don’t involve any funny stuff — we end up with
I promised you an example where the extended rational algebra of Episode 26 can be useful. And while inverting 2×2 matrices is not going to set the world alight, I hope that this is a hint that there is something interesting going on!
By the way, the formula that we derived for inverting 2×2 matrices generalises to arbitrary square matrices. I don’t have the space for this in this episode, but if there’s interest I can come back to this topic. Let me know.
Not writing is like not going to the gym (which, coincidentally, I’m also guilty of). I mean that once you start not writing, it’s difficult to get back into it. “I’ll do some writing next weekend” – you tell yourself. Before you know it, half a year has passed. Same with (not) going to the gym. I’m still working on that.
Having said that, it’s been a very busy half a year. In addition to teaching, I visited Paris—and more specifically, Jean Krivine—for a month. I love Paris; I spent a six-month postdoc there in 2005 and it was fantastic to be back in the 13e arrondissement.
Apart from enjoying the city, it was a really productive visit and I learned a lot. Jean is currently working on a graphical high-level programming language for biologists, which shares many features with the kind of string diagrams that this blog is obsessed with.
Amongst other things, Jean told me about a family of string-diagrammatic calculi for logic, developed by the famous 19th century American logician C.S. Peirce (pronounced “purse”; it’s Dutch, apparently). Jean uses some of Peirce’s ideas for his programming language. Peirce’s systems are called existential graphs and have a fascinating history. While Peirce considered them his most important scientific achievement, they remained unpublished in his lifetime; I guess due to a lack of typesetting technology for diagrams, but maybe also due to the fact that his ideas were so revolutionary that they were about 100 years before their time. In fact, Peirce’s diagrams came before the standard syntactic way of writing logic, with all the upside down As and backwards Es that logicians are so fond of. But his work was forgotten and were rediscovered only in the 1960s. They are still a topic of research.
Existential graphs really blew me away. I plan to understand them properly over the next few months, and maybe I will write about them more seriously one day. For now, if you’re interested in reading further, the PhD thesis of Frithjof Dau seems like a great resource.
This weekend I will be in Porto for the HDRA workshop, and the weekend after that I’m flying to New York for the LiCS conference. It’s going to be a busy summer, but I will try to work more regularly on the blog from now on.
Continue reading with Episode 30 – The essence of graphical linear algebra.