# 17. Maths with Diagrams

The discussion over the last few episodes has now led us to the point where we have two different languages to talk about the same thing. The two languages, however, are rather different. One consists of diagrams, the other of matrices populated with natural numbers. If you already knew about matrix algebra, maybe you’re asking yourself the question “What’s the point of all these diagrams then? They don’t tell me anything that I didn’t know before. Is this guy claiming that this diagrammatic language is better in some way?”

The short answer is no. I don’t want anyone to throw away their matrices: there are extremely good reasons to know and care about them. But I have two arguments for why we should also care about diagrams.

The first argument is that a different language means thinking about the conceptual landscape in a new way. We will see this over and over again in graphical linear algebra, and sometimes it will cause us to reevaluate certain dogmas: ways of doing things that have come about because of the way that the standard notation works, rather than real underlying mathematical issues. These kind of things are not easy to spot if you only know one notation.

More generally, the idea that the languages we use influence how we think and act—linguistic relativity if you will—is well-known and appreciated by programmers. Most programming languages are Turing complete and so have equivalent power. So why not just learn one, say x86 assembly, and stick with it?

Programmers deal with real problems, which usually do not involve writing functions to compute the Fibonacci sequence or, infamously, inverting a binary tree.  Real problems are multi-faceted, and for some parts it may be useful to put on a functional hat, for others an object-oriented hat, and so on. The different ways of thinking that the various programming methodologies bring to the table are appreciated because

complex problems benefit from a combination of approaches.

When I was a student in the 90s that there were many entertaining flame-wars about which programming language approach was the best. As with most flame-wars, they were pretty pointless: programming has since grown up and nowadays it’s very rare to hear somebody claiming that a particular way of doing things is intrinsically better than another. Good programmers are comfortable thinking in several different languages, and choose the one that is most suitable for the current (sub)problem at hand. Flexibility is key.

Because of this, in the last 10 years or so, the boundaries between different language families have been blurring. For example, most functional programming languages have clever ways to do imperative things, and conversely, more and more functional features are making their way into industrial strength imperative languages. It’s pointless to argue which is the best approach, or the best language: such a thing doesn’t and will never exist. And anyway, it’s fun to see how your favourite problems can be solved with some newfangled language!

It’s been really interesting to see how different communities have reacted to this blog. About a month ago somebody posted a link to this blog on Hacker News. That link made it to the front page and resulted in a massive amount of interest, which really surprised me… who knew that so many people cared about linear algebra! Last week I posted a link there myself and again the feedback has been really great. I’ve also been regularly posting links on Reddit’s /r/math, which has also resulted in a lot of really interesting and useful comments, but also quite a bit of skepticism and maybe even a touch of hostility.

Allow me to philosophise for a moment for why this is the case. I think that it’s because many mathematicians are just not used to the idea of having different languages to talk about their concepts. In many mathematical subfields the notation is very conservative and standardised. It has evolved so slowly, so imperceptibly, that mathematical culture almost brings with an an understanding that  “standard notation” has been given to us by some higher power and it is not to be disturbed. Anyway, the concepts are the important things so who cares about notation. Many mathematicians simply don’t see the value in exploring different ways of expressing themselves. Programmers, it seems, are much more receptive to the idea.

The second argument for why we should care about diagrams is Occam’s razor — the idea that one should trim the fat: discard unnecessary assumptions to try to get at the core of problems. So let’s consider the amount of intellectual technology necessary to develop the concept of

matrices of natural numbers, and their algebra       ①

I need you to forget for one moment that you know all of these things already and that they are totally obvious to you; let’s pretend, for one moment, that we are explaining Earth mathematics to a group of interested Martians. And let’s not be too creative about the Martians. Let’s assume that they are of the Star Trek variety: three dimensional and humanoid enough to interest Dr McCoy in extra-curricular activities, but without the experience of an Earth education. So, to make them understand what ① means, we need to:

1. Explain the concept of natural numbers, together with addition and multiplication
2. Show that the natural numbers form what is called a commutative semiring. Let’s go through these requirements:
1. addition is associative, commutative, and has 0 as it’s identity; that is, for any natural number n, we have 0 + n = n = n + 0
2. multiplication is associative, commutative and has 1 as it’s identity; that is, for any natural number n, we have 1· n = n = n · 1
3. multiplication distributes over addition, that is, for all natural numbers a,b and c, we have a·(b + c) = a·b + a·c (in a non-commutative semiring we would also require (b+c)·a = b·a + c·a, but that follows here from commutativity of multiplication)
3. Define the concept of an m×n matrix, a grid of numbers. If we are totally honest then we ought to probably also go through the corner cases where any combination of m and n can be 0.
4. Write down the formula for matrix multiplication. This relies on already knowing about addition and multiplication of natural numbers. The semiring properties are necessary for matrix multiplication to behave as expected (e.g. for it to be associative).
5. Write down the formula for matrix addition.

This is actually a significant amount of work. And maybe the Martians are a bit curious about where exactly did the formula for multiplying matrices came from. They know from previous Earth experience that you can’t really trust those pesky humans and their crazy ideas.

On the other hand, we have already done all of the work of showing how diagrams work in the first few episodes of this blog. If explaining to the Martians, we would need to:

1. Explain the idea of magic Lego; the algebra of diagrams that consists of the two operations of direct sum and composition ;
2. Go through the rules of diagrammatic reasoning: stretching, tightening, moving generators along wires
3. Introduce the four generators together with the ten equations.

Let’s compare the two ways. In the traditional approach, one first defines the natural numbers and then “bootstraps” matrices on top. With diagrams, there is no precedence given to natural numbers as such; everything gets defined at the same time. Natural numbers can be recovered as special kinds of diagrams, those with one dangling wire on each end. We don’t need to invent any extra formulas for multiplying matrices because composition is a basic operation of the algebra of diagrams. Matrix addition is not a primitive operation at all, it can be constructed from the diagrammatic primitives.

Having said all that, this blog so far has followed a rather conventional route: first we talked of natural numbers as diagrams and then of matrices as diagrams. Today, we are going to go back and think about how the maths would have been developed without knowing about natural numbers and matrices from the outset. We will rediscover diagrams by focussing on the operations and constructions that are natural in their language and see where that takes us.

To spice things up let’s try some alternate history. This is something that historians really hate; my sister, who is a historian, always gets annoyed when I bring it up. But who doesn’t love to annoy their siblings?

We need to turn the clock back to before matrices were common, before notation settled into what appears in textbooks today. Our story thus takes place in early 13th century Pisa. Our hero is a certain young man, Fibonchi, a contemporary and friend of Fibonacci.

While a teenager, Fibonchi sailed to the Balearic Islands with a Pisan trade fleet. In Palma, he met a wise old Berber merchant who introduced him to four curious operations that spoke of putting things together and copying them, governed by collection of simple, self-evident rules. The old men of Palma enjoyed drawing the diagrams and even used them to perform some calculations.

On his way back home, Fibonchi realised that the diagrams are what is now known as a recursive data type: every diagram is put together from the four simple bricks, the identity and twist. Using this insight, he was able to prove a number of interesting things about them, using the principle of induction.

To help in his calculations Fibonchi wrote down the formula for diagrams that copy and add not just one wire, but an arbitrary number on wires. Coincidentally, this was the syntactic sugar we discussed a couple of episodes back.

Next, Fibonchi proved that these sugars satisfied a very pleasing property with respect to arbitrary diagrams. For example, given any diagram D with m dangling wires on the left and n dangling wires on the right, the following is true:

Of course, once Fibonchi had finished this proof, he immediately wrote down his second theorem.

He did not even bother writing down a proof, since he was told about about the principle of “bizarro” by the old man in Palma. Because all the equations in the system have bizarro (reflected, colour-swapped) versions, once you proved a theorem you’ve also proved its bizarro cousin.

Fibonchi also noticed the following two related facts, the second the bizarro version of the first.

These properties intrigued Fibonchi, and he went ahead and defined an operation on compatible diagrams—those that agreed in their respective numbers of dangling wires on the left and right. He called the operation sum. This operation would be rediscovered in different notation a few centuries later and given the name matrix addition.

He proved that sum was associative

and commutative:

Moreover,  each compatible family of diagrams contained an identity for this operation:

Fibonchi showed that D + 0m,n = D = 0m,n + D.

He also noticed that composition distributes over sum: we have

D ; (E+F)    =    D ; E    +    D ; F

The proof, reproduced below, was a straightforward application of his first theorem.

He was, of course, immediately able to state the bizarro version:

(D+E) ; F   =  (D ; F) + (E ; F)

Fibonchi knew that composition of diagrams was not commutative, in most examples it didn’t even make sense to compose diagrams in more than one way because the dangling wires did not match up. Non-commutativity was the norm. However, he did notice some peculiar things about diagrams with exactly one dangling wire on each end.

He noted that there was exactly one such diagram for every natural number, which gives the number of paths from the left dangling point to the right one. He understood that if two such diagrams have an equal number of paths then they are equal. His arguments were close to the rewriting argument of the last episode. Another strange thing was that the bizarro of every such diagram was equal to the diagram you started with.

Most curiously of all, composition between such diagrams was commutative, which Fibonchi confirmed with a simple induction. First the base case, where the first diagram has zero paths:

and the inductive step:

Inspired by Fibonacci’s masterpiece Liber Abaci, Fibonchi wrote his own treatise entitled Liber Diagrammatis. Unfortunately it is now lost; the last remaining copy fell into the hands of a Livornese fishmonger sometime in late 16th century, who used it to wrap his wares. The Livornese never much liked the Pisans or what they had to say.

Fortunately, we can continue Fibonchi’s story on this blog.

Continue reading with Episode 18 – Introducing the Antipode.

## 19 thoughts on “17. Maths with Diagrams”

1. Jacob says:

If you ever turn this work into a single document, I’d cut the first half of this chapter. It comes across as whining, and makes dramatic generalizations from tiny anecdotes. All the random links to programming topics are basically irrelevant and distracting. Claims like “programmers don’t have flamewars about languages anymore” are pretty funny. 😛

About the series more generally: if you could figure out a way to front-load some examples of what your diagrammatic technique can accomplish, it would make the whole project much more interesting. For example, if you could compare the full proof of a tricky theorem in both standard and diagrammatic form, showing a much clearer more compact (and visual) argument in place of a gnarly pile of symbol twiddling, that would be great. You don’t need to explain everything that’s going on, just make it clear that there is some useful power here.

One of my my favorite high school English teacher’s favorite quotations was: “if you start with an example, you end up with a generalization; if you start with a generalization you end up with NOTHING.” That is, all the philosophizing about what mathematics is and how people should learn to think in terms of new languages and so on is putting the cart before the horse. If you show us how this new tool is useful, we can draw those conclusions ourselves. If you preach at us and tell us what to think before we have any idea what specifically you’re talking about, we’ll just get bored and go away before ever getting to the meat.

Finally, I think this whole series is a bit confused about its audience. At some point I recommend you find a handful of real-life test subjects (e.g. high school students, math major undergraduates, 40-year-old mid-career engineers, whoever – I’m not sure who your audience is supposed to be) and hand them your posts here, printed out on paper, and then solicit direct feedback, and use that feedback to aggressively edit. My impressions are that (1) there’s a lot of excessive semi-irrelevant writing in several of the posts, especially ones near the beginning, and (2) the level of assumed mathematical sophistication changes quite a bit over the course of your presentation (but that’s coming from a place of relative comfort with these abstractions, so I’m not sure how other folks will respond).

Personally I’d suggest explicitly aiming your presentation to the level of a bright undergraduate math/science/engineering student, someone you can assume to have already taken a standard undergraduate linear algebra course. That would let you cut the whole argument down by at least 50% while still preserving almost the entirety of your real-world audience.

Like

• Thanks for the suggestions!

You are right that the audience has changed somewhat; when I started writing this I was aiming at a general audience, now I think I’ve settled in at around second year maths undergraduates… and I think that I’ll stay at this level for a while.

When I was an undergrad I remember that a famous professor there had a maxim: a good one hour maths talk should divide into four: the first 15 minutes should be understandable to everyone, the next 15 mins to the experts in the field, the next 15 minutes to the author only, and the last 15 mins no one should understand 🙂

The front-loading of examples that people should care about is a good point, but somehow I’m also attracted to the idea of building “tension” in a story-telling kind of way. There are some big “reveals” coming up in the next few episodes.

Starting in the next episode, I want to start explaining how you can deal with spaces diagrammatically; one side-effect is that we will get fractions, but the construction of fractions is really different in the diagrammatic language to how we are used to it, i.e. as equivalence classes of pairs of ints. Not just different in form, but also in function. This episode was written for this reason, to emphasise what’s natural from the diagrammatic point of view: e.g. non-commutativity is the norm, numbers can be recovered as certain families of diagrams, etc.

It’s too bad that it came across as whiny… I’ve been marking exams for the past two weeks and probably some of that is rubbing off 🙂 Anyway, I really appreciate your comments; maybe you can also let me know what you think after this cycle of around 7 episodes.

Like

2. Jacob says:

One more thing: is this tool supposed to be used for computation directly by end-users (e.g. in robotics, computer vision, computer graphics, physics simulation, scientific data analysis, finance, …) or is it mostly a tool used for mathematicians making proofs or a pedagogical tool for teaching concepts to beginners? After reading all of your posts so far, and as someone who uses linear algebra to solve problems every day, I’m still totally unclear on whether I’d get any advantage trying to use this tool to solve specific real-world problems.

Like

• At the moment it’s something that I’m using in my research. The field of people who know about it is somewhat limited, I would say less then 20.

I think it has pedagogic value, no question. The development of pedagogical materials was one of the goals for writing this blog.

I was surprised when teaching matrix multiplication for the first time just how tough students found it; the things that we consider the most easy and “obvious” are the things are often the toughest to teach. Basic set theory is another example.

This stuff is pretty new, our main paper on the topic is still in peer-review, although we have published some of the applications. I think that it’s too early to make grandiose predictions about how useful this stuff is overall. What I can say at this point is that I find it extremely beautiful and I learn something new about it every day… this blog keeps me working on some of the fundamentals, which is nice.

What kind of specific real-world problems do you have in mind?

Like

3. One last thing: The basic purpose of Linear Algebra, practically, is analyzing/solving linear systems, and the main tools are matrix decompositions/factorizations. I’d recommend figuring out what the expression is of QR, SVD, LU, Cholesky, etc. decompositions look like in terms of diagrams, and seeing if it’s easier to understand how those decompositions work or easier to prove things about them in terms of diagrams. Personally I find the matrix expressions of these concepts fairly straight-forward, but perhaps a diagrammatic understanding could be easier for newcomers? Does a diagrammatic view make it easier to understand eigenvalues and eigenvectors, or determinants?

Like

4. I have a pretty good story about inverting matrices.

Decomposition is something that I’m really interested in and I think that there will be a good graphical story to tell here, but I have not worked out the details, I know about Hermitian normal form and Smith normal form graphically and both are quite cool.

Eigenvalues/eigenvectors can be translated to the diagrammatic language, but I don’t know if there’s a great story there: what do we need to know apart from the algorithm to calculate them? What kind of questions would you like answered? Matrix notation is already a pretty good calculus for doing the computations.

There are some interesting things to say about determinants graphically though, but I won’t write about that for a while yet.

Like

5. Sounds like some fun things to look forward to. I’ll keep checking the contents page every couple weeks. 🙂

Liked by 1 person

6. I respectfully disagree with Jacob. I liked the prelude to this episode, I don’t think it sounds whiny at all, and I think the analogy between math notation and programming languages is spot on. You should look at this:

https://tromp.github.io/cl/diagrams.html

I think you’ll find a kindred spirit there.

I get the feeling there’s some deep insights to be had into tensors by following this line of thinking. As someone who never quite wrapped their brain around tensors it would be great if you could confirm or deny this.

Like

7. Nice, thanks for the link! I didn’t know about that notation.

There are string diagrams for symmetric monoidal closed categories (so basically ones with function types and lambdas). It’s really interesting stuff because syntactic correctness of scope and binding translates to an interesting condition on the diagrams called the Danos-Regnier switching condition.

I think that this work deserves to be better known, check out for example http://arxiv.org/abs/0905.4200!

Like

8. (not quite the same sort of diagram as the ones under discussion in this blog series, but also potentially useful as a tool for reasoning about tensors)

Like

9. “most functional programming languages have clever ways to do imperative things, and conversely, more and more functional features are making their way into industrial strength imperative languages.”

Did you mean to say the same ting twice?

Like

• No, I was saying that there’s imperative in functional, and functional in imperative.

Like

• Upon closer reading, I agree that you did say that. I misread it, got confused by the order change. Sorry.

Like

• Yes, I have seen it, and cited it many times in my papers. It’s a very nice survey!

Liked by 1 person

• clayraat says:

I wonder then – if, when you later introduce the opposites of addition/copying for doing fractions, you are moving from monoidal to autonomous categories?

Like

• More than that, self-dual compact closed.

Like