Episode R1: Redundancy and Zebra Snakes

Jason earned his Ph.D. in mathematics from the University of California, Riverside, working on string diagrams very similar to the ones found in this blog. Because of a temporal coincidence in connection with these diagrams, he was mentioned back in Episode 10. ~~His diagrams are obviously superior, being written vertically instead of horizontally~~. He is currently teaching maths and physics at Victor Valley College and putting together applications for postdoc and tenure-track positions at universities. There is hardly a scrap of paper in Jason’s office that he hasn’t doodled some of these kinds of diagrams on.

In Episode 24, Pawel mentioned there is a bit of redundancy in the graphical presentation of interacting Hopf algebras. In this short series of episodes I will show how you can derive some of the redundant equations, and in pruning the redundancy there will be a bit of a twist. The equational system interacting Hopf algebras, or interacting Hopf monoids, as displayed in Episode 24, is repeated below in its full glory.

Discounting the Scalars equations, of which there are infinitely many, this equational system features a total of 38 equations. As we can see, though, we only need two representative Scalars equations, so if we count by equation representatives, we have a total of 40 equations in the system. Shaving off redundant equations with Occam’s razor, we can cut this total in half.

Antipode and Antipode^op

Let’s start by looking at the five equations in the Antipode box, beginning with how the antipode interacts with adding. This equation

says -a + -b = -(a + b), and it can be proved as follows:

A bizarro argument similarly proves

The interaction between antipode and zero

states that -0 is the same as 0. If you throw a (black) bone to the left side, it’s not hard to see how to prove this using the Hopf law.

A bizarro argu… oh. Wait a minute. We need to be a little bit cautious here. When whittling down the equational system to remove redundancy, careful attention must be paid as to which equations are used to prove various theorems. We used the fact that double negation cancels to show -0 = 0, and since antipode is self-bizarro, that fact would be used in the purported bizarro proof. But our proof that double negation cancels used the statement

So we can’t use the fact that double negation cancels to prove this statement.

Fortunately there is another way:

In that second-last step, we’re using one of the equations from the Division by Zero episode (Episode 26), namely:

The proof of (‡) depended on (A3), though, so it seems we are caught in a circular argument again. Something has to give, so let’s keep things black and white and take (‡) to be part of the reduced equational system, along with its bizarro equation, (†).

Running all these arguments in a mirror, we can show the corresponding Antipode^op equations are also redundant. So, while we didn’t quite manage to get rid of (A3) for free, we have cut down the set of the antipode equations to just:

That’s eight redundant equations thrown out already!

Zebra snakes

The way the antipode interacts with the cups and caps, which we used in the last derivation, hints toward something even bigger that we can tackle. In Episode 23, the cups and caps equations

give us the way to turn half-snakes of one colour into half-snakes of the other colour, so long as an antipode is included on one of the half-snakes. Together with the snake lemma (also in Episode 23) we get an alternate presentation of the antipode.

If you start with the cups and caps equations as given, it’s easy to show

If you take a snake where the left and right halves are opposite colours, one of the halves can be made to have the same colour as the other by introducing the antipode. But now the snake lemma can be used to straighten the snake out, leaving behind only the antipode! That means when adding, copying, and their mirror opposites all meet, the antipode can be taken as a syntactic sugar (i.e. redundant as a generator).

So let’s ignore the cups and caps equations for the moment and define the antipode as a sugar.

Don’t worry. These zebra snakes are not as dangerous as the biological version of zebra snake. When the adder half of the snake is on the lower right, we will take that to be the official designation for the antipode. The other three zebra snakes can be formed using our conventions for bizarro and opposite. The little white box in the top right corner breaks the symmetry, for now, so that we can distinguish it and its mirror image opposite.

Let’s use green for the colour inverse, giving us the the following two variants.

I would apologise to the red-green colourblind readers for this convention, but you actually get a sneak peek at the truth here: red=green. One of the stipulations made when antipode was introduced was that antipode should be self-bizarro – that’s why it was red and square, after all, since bizarro is only supposed to affect black and white. Is this stipulation still true when antipode is simply defined as a zebra snake? Yes!

In the derivation above, first we pull the black half-snake down and left, then pull the white part up and right. Finally, we use the (co)commutativity of the copying and adding.

Mirroring this argument tells us antipode^op is also self-bizarro.

With the zebra snake definition of antipode, the black and white snake lemmas show that the mirror image of the antipode is its inverse:

Recall that the Hopf equation, the main equation involving antipode and the one that says “1+-1=0” is the following

and (together with a couple of now-redundant equations) it implies that -1.-1 = 1, i.e.

So assuming the Hopf equation, and then using the fact that inverses are unique we have

So all the zebra snakes are equal!

Let’s go back to the Cups and Caps equations. Both are now provable, since:

But at this point, we can stop distinguishing between them and return to using just a symmetric, red box and use any of the four definitions in proofs.

So the only equations we need for the antipode—besides its definition as a sugar—are the Hopf equation and its mirror image!

Hold on… did we just prove the Hopf equation? That means the antipode and all the equations involving it is completely redundant! What a twist!

So what have we gained here? We dropped 12 equations and a generator as redundant, and we added a syntactic sugar and four equations. The original big box of equations can be reduced to the following, smaller set:

And this isn’t even IH‘s final form.

I would be remiss not to point out that just because we can treat certain diagrams as syntactic sugars instead of as generators doesn’t mean it is most expedient to do so. We should like to have the antipode even if we do not have access to the opposites of adding and copying, for instance, because (bicommutative) Hopf algebras are important enough to be considered separately from relations. Likewise, we should like to have the twist, independently of the antipode, to describe (bicommutative) bialgebras. After all, the antipode is not accessible when we are drawing diagrams of matrices of natural numbers, as in Episode 11.

While we have made some headway, there are still a handful of equations that I need to get rid of to fulfill my promise of reducing the interacting Hopf algebras equational system from the Big Box of 40 equations to only half as many. In the next part of this miniseries, I will continue chipping away at the redundant equations. There is a saying that less is more, but sometimes the converse is the appropriate aphorism: more is less. There are four very simple equations that are missing from the Big Box, which have already been included in the reduced Big Box above. With their help, we can prove equations that are completely in the mirror universe. And some of those four additional equations end up being redundant, too.

ACT 2018 – Applied Category Theory Research School

I have some exciting news. We are recruiting participants for the inaugural Applied Category Theory (ACT 2018) research school, which will culminate with an intensive research week held at the Lorentz Center in Leiden, the Netherlands, in late April 2018.

The application deadline is soon: 1st November 2017.

If this sounds interesting, read on. Even if you think that this is not for you, maybe you know someone who would be interested: in that case, please pass this information on!

There are four mentors: John Baez, Aleks Kissinger, Martha Lewis and myself.

The high-level goal of the school is to bring early career researchers (i.e. late PhD through to postdoc stage) into the applied category theory community, and (hopefully!) do some exciting research together. In particular, we encourage applications from outside the category theory clique: broad computer science, physical science, engineering and even social science.

Each mentor chose a theme and a couple of papers for the reading list. The themes are:

John Baez – Semantics for open Petri nets and reaction networks
Aleks Kissinger – Unification of the logic of causality
Martha Lewis – Compositional approaches to linguistics and cognition
Pawel Sobocinski – Modelling of open and interconnected systems

I’ll explain what my theme is all about, but first I should say a bit about the innovative format of ACT 2018 developed by the excellent organisers, Brendan Fong and Nina Otter.

Preparations for the April research week will begin already in January 2018: the selected participants will video-conference with mentors, and go through a reading list to prepare them for an intense working week at the school itself; the hope is to get to the point where there is a real possibility of doing some original research, and maybe even obtaining publishable results!

My theme is “modelling of open and interconnected systems” and the two papers I chose are:

A Carboni and RFC Walters. Cartesian bicategories I. Journal of pure and applied algebra 49.1-2 (1987): 11-32.
JC Willems. The behavioral approach to open and interconnected systems. IEEE Control Systems 27.6 (2007): 46-99.

Below is a short advertising spiel that explains what I mean by the title and why I chose these two papers.

The data revolution of recent years means that scientific models (in physics, biology, medicine, chemistry, climate science, computer science, systems theory, …) are getting larger, more fine-grained and detailed and ever more complex. Algorithms that answer questions about model behaviour are often exponentional in the size of the model, meaning that–even with advances in hardware and software parallelisation–studying monolithic models usually does not scale. Moreover, monolithic models are poor engineering practice; for example, they are not portable in the sense that standard components cannot be reused from one application to another. These are just some of the reasons why a comprehensive mathematical theory of open and interconnected systems is urgently needed.

As Willems argued in The behavioral approach to open and interconnected systems, models for such systems must be fundamentally relational in order to be useful – the underlying physical equations that govern behaviour (e.g. Ohm’s law, the gas laws, …) are seldom causal: they merely govern the relationship between various system variables. This contrasts with the human custom of thinking causally about system behaviour. Willems argued that causality–while useful for human intuition–is physically dubious, not compositional and often complicates the mathematics.

To arrive at a comprehensive relational theory of open systems, category theory is unavoidable: one needs the generality to cover the wide variety of mathematical universes relevant in the various application areas, while identifying common structure. Carboni and Walters’ Cartesian Bicategories of relations offers a general and elegant framework for relational theories and has the potential to be relevant across multiple application areas.

I really hope that we manage to attract some engineers and system theorists. My research questions are pretty open-ended:

Assemble a menagerie of relational theories from a variety of disciplines
Evaluate cartesian bicategories of relations as a mathematical foundation of open and interconnected systems
Study extensions (abelian bicategories, nonlinear variants)

I think it will be a lot of fun!

The official call for applications is below.

We’re delighted to announce the Applied Category Theory 2018 Adjoint School, an initiative to bring early career researchers into the applied category theory community.

The Adjoint School comprises two phases: (1) an online reading seminar based on the recent Kan Extension Seminars, and (2) a four day research week at the Lorentz Center, Leiden, The Netherlands. Participants will also be invited to attend Applied Category Theory 2018, which will be held immediately following the research week, also at the Lorentz Center.

During the school, participants will work under the mentorship of four mentors, on one of the following research projects:

John Baez: Semantics for open Petri nets and reaction networks

Aleks Kissinger: Unification of the logic of causality

Martha Lewis: Compositional approaches to linguistics and cognition

Pawel Sobocinski: Modelling of open and interconnected systems

The online seminar begins in early January 2018, and will run until the research week begins on April 23rd, 2018. Applied Category Theory 2018 will be held April 30th to May 4th.

Applications to participate in the school are now open. For more details, including how to apply, please see here:

http://www.appliedcategorytheory.org/school or contact the organisers. Applications close November 1st.

On behalf of the organisers,

Brendan Fong (bfo@mit.edu)

Nina Otter (otter@maths.ox.ac.uk)

Eigenstuff, diagrammatically

Last Tuesday, September 5 2017, I gave a tutorial on Graphical Linear Algebra at CONCUR 2017 the 28th International Conference on Concurrency Theory, which this year was held in Berlin.

I have a personal history with CONCUR. It was the first comp sci conference I attended (CONCUR 2001, Aalborg, Denmark). I gave my first comp sci talk the year after at a satellite event of CONCUR 2002 (Brno, Czech Republic). Another year passed and I had a paper accepted at CONCUR 2003 (Marseille, France), even if it was mostly my brilliant coauthor’s Bartek Klin‘s work. It was, and remains, a community with a particular way of looking at the world, and it has definitely shaped my own thinking.

So it was a real honour for me to be invited to give a tutorial at CONCUR 2017. I had a problem though: how to talk about the stuff on this blog to a community in which linear algebra is not, per se, a topic of interest.

One of the main research threads of CONCUR is process algebra. Process algebra is the study of idealised programming languages. Idealised here means two things: first, one strips away many features that would need to be included in a real-world programming language (I/O, fancy control flow, etc.) and leaves only a few fundamental ways of expressing some computational primitives, say synchronising communication, sequencing and recursion (CCS, the Calculus of Communicating Systems), or asynchronous communication with name passing, sequencing and recursion (Asynchronous π-calculus).

The basic idea is that stripping away irrelevant constructs makes a language more logically/mathematically tractable: a process algebra is thus the distillation of some computationally-inspired artefacts into a mathematically rigorous playground, allowing for the study of the expressivity of those primitives that remain and the interesting ways they interact, without interference from extraneous elements. An analogy can be made with chemistry: in order to study some fundamental reaction between compounds, one works with pure solutions in order to obtain meaningful results. That’s the basic, motivating idea, in any case.

The second meaning of the word idealised is that – unlike ordinary programming languages – process algebras allow you to write specifications as well as implementations. I won’t attempt to give a formal definition of what these words mean, but I’ll try to give some intuition.

An implementation has to be able to run: at some level of abstraction it must be able to compute outputs from inputs on some notion of computing machine. The mathematical analogy is with (total) functions X → Y. A function must be able to deal with every element of X and, given some x ∈ X, unambiguously evaluate to an element f(x) of Y.

On the other hand, a specification may be partial: it can be undefined on some inputs, where there isn’t any proscribed behaviour and the details are left to the programmer. Similarly, it may be the case that for some inputs we allow several possible results without giving any way of determining which one is preferable or how to arrive at any one particular output – this is known as non-determinism. The analogy here is with a relation R from X to Y. Since a relation is simply a subset of X×Y, it does not need to be total (i.e. there may be some x ∈ X for which there is no y ∈ Y such that (x,y) ∈ R) and it does not need to be single-valued (there may be some x ∈ X and two different elements of Y, say y₁ and y₂, with y₁≠y₂ such that both (x,y₁) and (x,y₂) are in R).

Graphical linear algebra feels like a specification language for linear algebra. Once you start using it, matrices feel like low-level implementations; the diagrammatic language gives you a higher-level way of expressing yourself. Of course, the equational theory means that we can “compile” down to the low-level stuff. A nice example of this idea in action is the diagrammatic way of dealing with the concepts of an eigenstuff (eigenspaces, eigenvalues, eigenvectors, eigenbasis) and spectral decomposition.

A linear transformation, defined by some n×n matrix A, has an eigenspace E with eigenvalue λ when for each vector v ∈ E, we have

Av = λv.

By isolating an eigenspace/eigenvalue pair (E, λ), we are saying that A has a particularly simple description on the sub-domain defined by E, where it is just scalar multiplication by λ. We can say this graphically as follows:

We will use the above equation as a diagrammatic definition of the concept of an eigenspace, eigenvalue pair (E, λ). It’s an important equality, so let’s go through the individual bits in detail.

E is a diagram from n to 0, therefore, it defines a subspace of Qⁿ. The diagram on the left is the linear relation that results from restricting the domain of A to E. Using traditional notation, the left-hand side is the linear relation

{ (e, Ae) | e ∈ E }

and the right-hand side is the linear relation

{ (e, λe) | e ∈ E }.

The above diagrammatic equation thus says precisely that A, when restricted to E, is the same as simply multiplying by λ.

Suppose that we can find a decomposition of the entire domain (Qⁿ) of A into eigenspaces. That is, suppose we have E₁, …, E_k such that

and for each E_i, we have an eigenvalue λ_i, that is

Such a collection (E_i, λ_i) is called the spectral decomposition of A. Given a spectral decomposition, we can write A as a sum of the constituent behaviours, in the following sense:

It is useful to think of the right hand side of the equation above as a specification. Indeed, we are defining a linear transformation by declaring what it should do on each of its eigenspaces. Continuing this analogy, the left hand side is a matrix, an implementation of the specification.

Here’s a proof of the above equality, using diagrammatic reasoning. In the second step we use the third-to-last diagram above, which says that the entire domain of A can be decomposed as a sum of eigenspaces. The third step is a simple exercise in graphical linear algebra, to which we will return below. In the fourth step we pull A across the addition, and then rearrange the diagram. In the last step, we simply use the diagrammatic defining equation of eigenspaces/eigenvalues.

To dot all the i’s and cross all the t’s, let’s come back to the third equation in the above derivation. It relies on the following lemma

which can be proved by induction, starting with the following equality as the base case.

I’ll leave this last bit as a fun exercise.

Let’s go through a concrete computation that demonstrates how to turn a linear algebra specification into a matrix implementation.

So suppose we want to construct a function that acts on the following two subspaces

by multiplying by 5 and -1, respectively. In more traditional set-theory language, these subspaces are { (x,y) | x,y ∈ Q, x+2y = 0 } and { (a, 2a) | a ∈ Q }. You can read off the set theoretical descriptions directly from the diagrams by reading the first from left to right, and the second from right to left, using our standard intuitions for the meaning of the white and black dots.

Notice that the two diagrams are the colour-inverse of one another, and therefore they are orthogonal as subspaces. In general, for an implementation to result in a matrix specification, we don’t necessarily need to start with mutually orthogonal subspaces: it suffices that they add up to give the whole domain and, pairwise, have a trivial intersection, i.e.

whenever i≠j.

Since our chosen eigenvalues are 5 and -1, the first eigenspace/eigenvalue combination lets us write down the first part of the specification.

The second is then

Summing up the two components gives us the entire specification:

We should now be able to compile this thing to a matrix. Let’s start by getting the middle bit to be all black.

Now we can move the numbers out of the way

We are left with two connected black networks in the middle. Here we can use the spider theorem to simplify further.

The diagram above thus consists of a “backwards” matrix

followed by the matrix

If this is unclear, take another look at the episode that describes the connection between diagrams and matrices.

Now a “backwards” matrix, going from right to left, is the same thing as its inverse going from left to right, so the entire thing can be written as:

We can calculate the inverse in any way we like, including using graphical linear algebra. The final result of inverting and multiplying is

and this is the implementation of our original spec.

Let’s use diagrammatic reasoning to work derive the standard “textbook formula” for doing this kind of thing with matrices.

Suppose that A is an n×n matrix, and that we can find an eigenbasis: a basis of its domain Qⁿ that consists of n eigenvectors e₁, e₂, …, e_n, each e_i with eigenvalue λ_i. Using the diagrammatic definition, this means simply that:

Here it’s useful to put some numbers on the wires, to make clear that each eigenvector e_i is a column vector, i.e. a diagram from 1 to n.

Now let’s add up all of the components to get the specification for A:

We can now use diagrammatic reasoning to get this thing into something more familiar:

The last diagram can be written in matrix form. Let E be the n×n matrix with e₁,e₂, …, e_n as its column vectors. Then the first part of the final diagram above is E backwards, followed by a diagonal matrix D with the λ₁, λ₂, …, λ_n on its diagonal, finally followed by E. We therefore get the standard textbook change-of-basis formula:

This is the story I used in order to explain graphical linear algebra to people from the CONCUR community: we can think of the diagrammatic language as a specification language for linear algebra. If we pursue this analogy, as I did above, it is natural to think of matrices as implementations, implementing simple linear programs (linear transformations) on the “abstract machine” where the states come from a finite-dimensional vector space.

But graphical linear algebra is more expressive than matrices: we can give partial specifications of behaviour, as demonstrated with the diagrammatic definition of the eigenspaces/eigenvalues, as well as non-deterministic specifications. Non-determinism gives rise to the idea of refinement: how to go from a non-deterministic specification to a (non-unique) implementation that does the job. I had a paper about this at CONCUR this year, presented by my PhD student Joshua Holland (who, by the way, applied for his PhD last year using the link on this blog!) and co-authored with Filippo Bonchi and Dusko Pavlovic. I will write more about non-determinism and refinement soon.

Last week was particularly busy. After my CONCUR tutorial in Berlin, I flew back to the UK for the inaugural STRING meeting, the 1st Workshop on String Diagrams in Computation, Logic, and Physics, which was held on Friday and Saturday (8-9 September 2017), co-located with the HDRA workshop on higher-dimensional rewriting and applications. The meeting was held in the upstairs gig area of the Jericho Tavern in Oxford.

You never know how much interest you’ll get in a brand new event, but I have to say that it exceeded my expectations: it was a full house, with more than 60 people in the audience, enjoying an exciting programme full of great talks. I loved it.

Paul-André Mèllies giving his invited talk.

The back of the room, the Jericho was totally packed!

It was also really fun to meet a few blog readers who decided to attend. Some told me to get off my ass and write more articles. It’s definitely given me some extra motivation, so watch this space!

If you enjoyed this article, please subscribe to get email updates when new articles are published. You will find a subscription link at the bottom of this page.

Orthogonality and projections

This blog is currently a bit of a mess. I’ve given up–for now–on trying to keep a coherent order of episodes; eventually I will collect things and put them in the right place. But, for now, I hope that the story in this article is cute enough to stand on its own two legs.

A couple of weeks ago Robin Piedeleu left a comment in Episode 28, asking about orthogonal projections in graphical linear algebra; specifically, he asked

how would I go about forming the diagram for the orthogonal projection onto the kernel of a matrix?

This is a pretty standard exam type problem in ordinary linear algebra, and requires a fair bit of matrix hackery that one needs to memorise. I used to hate these kinds of problems, especially in exams, because I’d always make routine errors in calculations. Here, I think, the graphical notation really stands out both for pedagogical purposes and for calculation: I definitely make fewer mistakes!

This post will also be a bit quicker paced than usual because I want to get to the answer and actually perform some computations. I will omit many of the proofs, but will (quickly) explain what the concepts of orthogonality, orthogonal complements and orthogonal projections mean and how one deals with them diagrammatically.

We start with a slightly different notation for arbitrary linear relations (I think due to Peter Selinger?) which is convenient for illustrating the two symmetries (colour swap and mirror image) of graphical linear algebra. Given a arbitrary diagram R: m → n, we will draw it as

The reason for this notation is that the symmetry breaking black square gives us a nice way to denote the mirror image diagram (R^†: n → m).

(Filippo, in addition, likes to mirror the letter R; I’m a bit less keen on that but it does fit the spirit.)

We can also use this notation to show the colour swap symmetry

and combine the two to get the graphical notation for bizarro (combining colour swap and mirror image) of R.

Given the quick and dirty mood of this article, I will, from now on, leave out the labels on the wires; they can be reinserted back consistently and without too much effort; leaving them out makes the diagrams cleaner and faster to draw.

The linear relations of interest for this topic will be of the form R: m → 0, which we know are just a graphical linear algebra way of saying “a subspace of Q^m“. We have a nice graphical way of denoting the sum and intersection of such subspaces. Given, R: m→0 and S: m→0, their sum R + S and intersection R ∩ S are, respectively:

The first question to tackle is what exactly does orthogonality mean, diagrammatically speaking?

Let’s start with the usual explanations. In the geometric interpretation of linear algebra, two vectors v, w are said to be orthogonal exactly when they are at right angles to each other. Algebraically, one defines the dot product of vectors:

v ⋅ w = v₁⋅w₁ + v₂⋅w₂ + … + v_m⋅w_m.

Then v and w are orthogonal exactly when v ⋅ w = 0. For example (1, 1) and (1, -1) are at right angles when we draw them in Q² and indeed, 1⋅1 + 1⋅(-1) = 0.

Given a linear subspace R, one can define the orthogonal complement of R,

R^⊥ = { v | ∀ w ∈ R. v⋅w = 0 },

which contains precisely those vectors that are orthogonal to everything in R.

The two standard facts to mention here are that R ∩ R^⊥ = {0} (the only vector that is both in R and R^⊥ is the zero vector) and R + R^⊥ = V, where V is the entire vector space. It is for this reason that R^⊥ is called a complement.

So how does this all work in graphical linear algebra?

It turns out that the answer is very easy. The orthogonal complement of R is simply its colour-inverted version. Indeed, one can show that (for any R: m→0)

which is the diagrammatic way of saying R ∩ R^⊥ = 0. Since equations in graphical linear algebra are invariant under swapping of colours (and copying/adding is commutative), we get for free that

which is the diagrammatic way of saying R + R^⊥ is everything.

We might as well consider an example at this point. Consider the linear subspace of Q² containing pairs (x,y) such that x + 2y = 0. The diagram for this is

The orthogonal complement is the colour swapped diagram, which is the following:

Note that the scalar has changed direction – this is of course a consequence of how the sugar for natural numbers is defined, since, recall:

Colour swapping the above makes the 2 gate point the other way.

Let’s just do a quick sanity check. An example of a vector that satisfies x + 2y = 0 is (6,-3). The diagrammatic spec of the orthogonal can be read as the subspace that contains vectors of the form (a, 2a) for any a. So let’s take, e.g. a = -1, obtaining (-1,-2). Now (6,-3)⋅(-1,-2) = -6 + 6 = 0. This is not a proof, of course, but I did say that I’m skipping those today.

Ok, so now what does it mean to project orthogonally onto some linear subspace R?

It means finding a linear transformation (or, if you prefer, a matrix) A such that

Ax = x if x ∈ R and Ax = 0 if x ∈ R^⊥.

So, roughly speaking, we preserve everything in R and kill everything in its orthogonal complement. This is exactly how geometric projections work: for example if you want to project a 3D shape onto a plane. Much of the maths that underlies 3D graphics relies on this kind of linear algebra.

Now back to the story; let us construct the orthogonal projection diagrammatically.

First, consider the following diagram.

We can think of this as the linear relation that represents the partial function that acts like the identity, but restricts its domain to R. It is thus not total, because, for example, it is undefined on the orthogonal complement.

The true orthogonal projection, given something not in R, should send it to zero. Here’s another partial function that does exactly that.

Now it turns out that to get the entire thing–the orthogonal projection onto R–we just need to sum up the two cases:

which gives us the diagrammatic specification for the orthogonal projection. Let’s take the right hand side above and rewrite it into the following, more aesthetically more pleasing diagram. This is graphical linear algebra after all, and we care about diagram prettiness.

We will use the above as the diagrammatic master formula for calculating orthogonal projections.

Let’s do a couple of worked examples, e.g. projecting onto

{ (x,y) | x + 2y = 0 },

i.e.

which was our example from before. Recall that the orthogonal complement here is:

Substituting into the master equation gives

To get the actual matrix we can do a little diagrammatic calculation.

where we used the Spider Theorem (Episode 23) twice in the third step to simplify the diagram. We could continue with diagrammatic calculation, but at this point it’s quicker just to multiply the constituent matrices (and remember, the order of multiplication is “backwards”):

You may feel like doing a sanity check to see if this indeed does the job. That is, take a vector in { (x,y) | x + 2y = 0 }, which the matrix ought to leave untouched, and take one from the orthogonal complement, which the matrix must send to 0.

Just for fun, let’s quickly do the calculations for other projection, onto the orthogonal subspace.

Here the matrix is, therefore,

But Robin’s original question was about projecting onto the kernel of some matrix. And here graphical algebra has some additional pretty insights. I mentioned in a previous episode that the bizarro of a matrix (combining colour swap and mirror) is the transpose. Thus, we have, for an m×n matrix A

That is, the colour swap of a matrix is the same thing as its transpose, mirrored.

This can be quite useful. In particular consider the kernel of A.

We know that if we want the orthogonal complement we need to swap colours.

But, using our new insight, this is the same as

So the orthogonal complement of the kernel of A is the image of its transpose. Simple.

Dually, the orthogonal complement of the image of A

which is the same thing as

So the orthogonal complement of the image of A is the kernel of its transpose. These trivial consequences of graphical linear algebra notation are sometimes called the Fundamental Theorem of Linear Algebra.

Now we can, given A, calculate a formula for the orthogonal projection onto its image. Substituting the relevant bits into the master formula gives:

From which we can read off the somewhat mysterious formula A (A^TA)^-1 A^T that appears in many linear algebra textbooks.

For the projection onto the kernel of A we get

which doesn’t obviously simplify into a “textbook formula”, but is just as easy to use in calculations.

So, since we’ve done loads of random calculations so far, let’s do one more for the road. Let’s say we want to calculate the orthogonal projection onto the kernel of

which, as a matrix is, diagrammatically

The kernel is thus

and the orthogonal complement of the kernel is

It remains to plug all the bits into the master formula for calculating projections and simplify:

which multiplies out to give

Last week I had the pleasure of visiting the Perimeter Institute for Theoretical Physics. I gave a talk about graphical linear algebra, which was recorded and available to watch online here. I emphasised the “computer science” way of looking at string diagrams as a resource sensitive syntax. That story will eventually make its way onto the blog in full, for now you can only read the beginning.

One interesting thing about the Perimeter Institute is the front entrance.

The Greek inscription translates to

Let no one uninterested in geometry enter here.

The phrase is a tribute to the phrase that was said to adorn the entrance to Plato’s Academy:

Let no one unversed in geometry enter here

I like the small change: the Perimeter Institute, in its mission statement, emphasises pedagogy as well as research.

The Perimeter Institute is the perfect research institute. It is a research factory with the best minds in theoretical physics. Crucially and refreshingly, it was built with researchers in mind. For example: there are blackboards everywhere, the open spaces encourage interactions, the architecture is inspiring and the food in the cafeteria is amazing. The contrast with 21st century British academia was… a bit depressing, and I was sad to leave.

If somebody wants to part with a few hundred million dollars, I’d love to set one up for theoretical computer science and category theory. Get in touch, philanthropic billionaires!

Determinants and the Lindström-Gessel-Vienot Lemma

Solomon is currently a student at the University of Pennsylvania where he is pursuing a Ph.D in Computer Science. He has a keen interest in all things Computer Science and Maths, so he decided to specialize in Programming Languages, an area which applies ideas from diverse areas of Computer Science, Mathematics and Philosophy. One of his favourite things is when connections are made between seemingly disparate areas of inquiry, so he has enjoyed reading the Graphical Linear Algebra blog for the connections between string diagrams and linear algebra. As the Lindström-Gessel-Vienot Lemma shows us, there is also a surprising combinatorial connection. Such occurrences make Solomon very happy!

Consider the familiar diagram for the $2 \times 2$ matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ :

Suppose that we wanted to push some flow from the input nodes $X_1$ and $X_2$ to the output nodes $Y_1$ and $Y_2$ . Moreover, we want to do this in such a way that every output node receives some flow without any congestion in the network. More specifically, want to push flow from the input nodes to the output nodes in such a way that:

no two different paths that are part of the flow intersect at
a vertex (not even endpoints), and
for each output node $Y_j$ , there is a path from some input
node $X_i$ to $Y_j$ .

For example, in the diagram above, we can push $a$ units of flow from $X_1$ to $Y_1$ and $d$ units of flow from $X_2$ to $Y_2$ .

We can also push $c$ units of flow from $X_1$ to $Y_2$ and $b$ units of flow from $X_2$ to $Y_1$ .

We are not allowed to push $a$ units of flow from $X_1$ to $Y_1$ and $b$ units of flow from $X_2$ to $Y_1$ , since then we would have two different paths intersecting at $Y_1$ .

Also, we are not allowed to push only $a$ units of flow from $X_1$ to $Y_1$ , since then there is no path from some $X_i$ to $Y_2$ .

We call a flow that satisfies conditions (1) and (2) a viable flow. Now we will measure the signed weight a viable flow. The weight of a viable flow is simply the product of all weights on the edges that are part of the flow. To get the signed weight, we multiply the result by -1 if the flow links $X_1$ to $Y_2$ . Otherwise, we multiply the result by +1.
For example, in our $2 \times 2$ matrix example, the first viable flow has weight $+ad$ . The unsigned weight of this flow is $ad$ since we push $a$ units of flow from $X_1$ to $Y_1$ and $d$ units of flow from $X_2$ to $Y_2$ . To get the signed weight of this flow, we multiply the unsigned weight $ad$ by +1 since $X_1$ is linked to $Y_1$ . A similar computation shows that the signed weight of the second viable flow is $-bc$ .
Interestingly, when we add signed weights of all the viable flows in the diagram, we get the determinant of the matrix associated with the diagram i.e. $ad-bc$ . Coincidence?

The Lindström-Gessel-Vienot Lemma (LGVL) tells us that this in fact is not a
coincidence. Specifically, LGVL associates to each finite directed acyclic
graph (dag) $G$ a matrix whose determinant is exactly equal to the
sum of the signed weights of the viable flows in $G$ . To make this relationship precise we shall give the statement of LGVL, but first we need to introduce some definitions and notation that we will need in order to give the statement of the theorem.

First, we introduce permutations. For any sets $A=\{a_1, \ldots, a_n\}$ and $B=\{b_1, \ldots, b_n\}$ we call any bijection $\sigma : A \longrightarrow B$ a permutation from $A$ to $B$ . Now assume that $\sigma$ sends $a_1, \ldots, a_n$ to $\sigma(a_1), \ldots, \sigma(a_n)$ respectively. For example, let $A = B = \{1,2,3,4,5\}$ , and let $\sigma$ send $1,2,3,4,5$ to $3,5,2,1,4$ respectively. Write $a_1, \ldots, a_n$ in a row in that order, skip a line, then write $b_i$ below $a_i$ for each $b_i$ . Draw a line from $a_i$ to $\sigma(a_i)$ , and count the number $m$ of crossings in between the ${a_i}'$ s and the ${b_i}'$ s. Define the sign of $\sigma$ be $(-1)^m$ . For the permutation $\sigma$ we have just described, there are six crossings, so the sign of the permutation is $(-1)^6 = 1$ .

The sign of a permutation $\sigma$ can also be computed by raising -1 to the
number $inv(\sigma)$ of inversions of $\sigma$ , where $inv(\sigma)$ is the number of ordered pairs $(i, j)$ such that $i < j$ but $\sigma(i) > \sigma(j)$ . In the example above, the number of inversions of $\sigma$ is 6 = $|\{(1,3), (1,4), (2,3), (2,4), (2,5), (3,4)\}|$ , which gives the same result as before.

For the next few pieces of notation as well as the statement of LGVL, assume that $G$ is a finite directed acyclic graph. Assume further that $G$ has input nodes $X_1, \ldots, X_n$ and output nodes $Y_1, \ldots Y_n$ .

As before, we are interested in viable flows, i.e. flows that have the property that:

no two different paths that are part of the flow intersect at
a vertex (not even endpoints), and
for each output node $Y _j$ , there is a path from some input
node $X_i$ to $Y_j$ .

Each viable flow can described by an $n$ -tuple $P = (P_1, \ldots, P_n)$ of paths , where $P_i$ is a path from the input node $X_i$ to the output node $Y_{\sigma(i)}$ for some permutation $\sigma$ . Let $\sigma(P)$ be this permutation.

For each directed path $P$ between any two vertices in $G$ , let $\omega (P)$ be the product of the weights of the edges of the path. For any two vertices $X_i$ and $Y_j$ , write $e(X_i, Y_j)$ for the sum

$e(X_i,Y_j)=\sum\limits _{P:X_i\to Y_j} \omega (P),$

where $P$ ranges over all paths from $X_i$ to $Y_j$ .
We are finally in a position to give the statement of LGVL, which is that

$\left \vert {\begin{pmatrix}e(X_{1},Y_{1})&e(X_{1},Y_{2})&\cdots &e(X_{1},Y_{n})\\e(X_{2},Y_{1})&e(X_{2},Y_{2})&\cdots &e(X_{2},Y_{n})\\\vdots &\vdots &\ddots &\vdots \\e(X_{n},Y_{1})&e(X_{n},Y_{2})&\cdots &e(X_{n},Y_{n})\end{pmatrix}} \right \vert = \sum\limits _{{P = (P_{1},\ldots ,P_{n})}}{\mathrm {sign}}(\sigma (P))\prod\limits _{{i=1}}^{n}\omega (P_{i}),$

where $P$ ranges over all viable flows in $G$ !

LGVL holds for any finite directed acyclic graph, not just graphs that look like the $2 \times 2$ example. For example consider the following graph:

or equivalently, the following string diagram:

The matrix for this is

$\begin{pmatrix} ag+bh & bi & 0\\ ch & ci + dj & dk\\ 0 & ej & ek + fl \end{pmatrix}$ .

It has determinant $agciek + agcifl + agdjfe + bhdjfe$ , and it is easy to check that each of these terms indeed gives a viable flow from the ${s_i}'$ s to the ${t_i}'$ s. Note also that the sign of each term is +1 since $s_i$ is sent to $t_i$ in each of the viable flows.

Now on the one hand, LGVL gives us a way of computing the sum of the signed weights of viable flows in a dag as the determinant of a certain matrix. On the other hand, given a matrix $M$ , we may compute the determinant of $M$ by doing the following. First, we find a special dag $G$ . Next, we find a set of input vertices $\{X_i\}$ and output vertices $\{Y_j\}$ in $G$ . We check to see that $M_{ij} = e(X_i, Y_j)$ . Then, we (easily!) compute the sum of the signed weights of viable flows in $G$ from $\{X_i\}$ to $\{Y_j\}$ . Finally we conclude that the result is equal to the determinant of $M$ by LGVL. This compositional approach to computing the determinant often leads to useful and surprising results, as we shall see below.

Consider the determinant of the following matrix:

$A = \begin{pmatrix} (1,1) & (1,2) & ... & (1,n) \\ (2,1) & (2,2) & ... & (2,n) \\ \vdots & \vdots & \ddots & \vdots \\(n-1,1) & (n-1,2) & ... & (n-1,n)\\(n,1) & (n,2) & ... & (n,n)\end{pmatrix},$

where $(k, \ell)$ denotes the greatest common divisor of $k$ and $\ell$ .

It is difficult to come up with a closed formula for this determinant by using say row-reductions or the co-factor expansion. However, using the strategy involving LGVL gives a simple and surprising result.

But first, a few more definitions! Given any positive integer $k$ , let $\phi(k)$ denote the number of positive integers less than or equal to $k$ which are relatively prime to $k$ . For example, $\phi(6) = 2$ since the only numbers less than 6 that are relatively prime to 6 are 1 and 5. On the other hand $\phi(7)=6$ since all the positive integers less than 7 are relatively prime to 1. In fact, it is easy to see that $\phi(p)=p-1$ for any prime $p$ .
Also, for the purposes of the proof, we shall need to use the fact that

$\sum\limits_{d|k} \phi(d) = k,$

which is a property first established by Gauss. For example

$6 = \phi(1) +\phi(2) + \phi(3) + \phi(6) = 1 + 1 + 2 + 2$ .

Now we proceed to the actual computation. Consider the graph $G$ which has $3n$ vertices: $\{s_1, \ldots, s_n\}$ are the input vertices, $\{t_1, \ldots, t_n\}$ are the output vertices and $\{v_1, \ldots, v_n\}$ are intermediate vertices. If $d$ divides $k$ , put the edge $s_k \to v_d$ with weight $\phi(d)$ and the edge $v_d \to t_k$ with weight 1.

What is the weight $e(s_k, t_{\ell})$ of all the paths from $s_k$ to $t_{\ell}$ ? Well each path from $s_k$ to $t_{\ell}$ can have only two edges: an edge $s_k \rightarrow v_d$ of weight $\phi(d)$ and an edge $v_d \rightarrow t_{\ell}$ of weight 1, for some $v_d$ , where $d$ divides $k$ and $d$ divides $\ell$ . Of course the weight of this path is $\phi(d)$ multiplied by 1, which is just $\phi(d)$ . Therefore,

$e(s_k, t_{\ell}) = \sum\limits_{\substack{d|k \\ d|\ell}}\phi(d) = \sum\limits_{d|(k, \ell)}\phi(d) = (k, \ell)$

The last equality is justified by Gauss’ result.

By the LGVL, it follows that

$\det(A) = \sum\limits_{P = (P_1,\ldots,P_n)} \mathrm{sign}(\sigma(P)) \prod\limits_{i=1}^n \omega(P_i),$

where $P = (P_1, \ldots P_n)$ ranges over all viable flows from the input vertices $S = \{s_1, \ldots, s_n\}$ to the output vertices $T = \{t_1, \ldots, t_n\}$ .
Now the only path from $s_1$ to $T$ is $s_1 \to v_1 \to t_1$ since 1 has only one divisor, namely 1. As a result, for each prime $1 \leq p \leq n$ , any path from $s_p$ to $T$ in a viable flow must pass through $v_p$ . This is because $p$ has only two divisors, namely 1 and $p$ , but the path through 1 has already been blocked by 1.

Consequently, for any product $pq$ of two primes, any path from $s_{pq}$ to $T$ must pass through $v_{pq}$ . Repeating this argument for all the other numbers less than or equal to $n$ , we conclude that all the viable flows from $S$ to $T$ link $s_k$ to $v_k$ .

Now if there is a viable flow which links $v_k$ to $t_{\ell}$ for some $k \neq \ell$ , then there must be some $k', \ell'$ such that $v_{k'}$ is linked to $t_{\ell}'$ with $k' > \ell'$ (why?). But this means that $k'$ divides $\ell'$ , a contradiction since $k' > \ell$ ! So we must have $v_k = t_k$ , for all $k$ .

Thus, there is a unique viable flow from $S$ to $T$ i.e. the flow for which $s_k$ is sent to $t_k$ via $v_k$ . Thus, we conclude that the determinant of $A$ is $\phi(1) \cdot \ldots \cdot \phi(n)$ , since the weight of each path $P_k$ in this flow is $\phi(k)$ as we showed earlier:

$\left \vert \begin{pmatrix}(1,1) & (1,2) & ... & (1,n) \\ (2,1) & (2,2) & ... & (2,n) \\ \vdots & \vdots & \ddots & \vdots \\(n-1,1) & (n-1,2) & ... & (n-1,n)\\(n,1) & (n,2) & ... & (n,n)\end{pmatrix} \right \vert = \phi(1) \cdot \ldots \cdot \phi(n)$

1st Workshop on String Diagrams in Computation, Logic, and Physics

Aleks Kissinger and I are organising STRING 2017, a workshop for people who use, or are interested in string diagrams. The diagrams used all over this blog are an example, but string diagrams have been appearing all over the place recently: quantum computing, logic, asynchronous circuit analysis, and even linguistics. I’m trying to write a few articles about why this is happening, so if you’re interested, here’s a link to the first part.

I’m pretty excited about the workshop: first, the venue is amazing, it’s the Jericho Tavern (or more precisely, the upstairs concert venue, famous for being the place where Radiohead, while still at uni, had their (first?) gig. I’ve been to one workshop there before, in October 2014, and it was great: the atmosphere is informal, comfortable, and very conducive to discussions. At that workshop I talked about my initial ideas for graphical linear algebra, and had some really useful feedback. So I’m looking forward to going back.

Second, we have some really fantastic invited speakers: Paul-André Mèllies is coming from Paris to talk about his research on logic, which heavily features the use of string diagrams. His work on logical negation is particularly interesting. He’s the modern day Charles Sanders Peirce, who was doing logic with string diagrams (existential graphs) before Frege did it with classical syntax.

Next, Dan Ghica will give a tutorial on applications of string diagrams in circuit design and verification. His work is related but somewhat different from the graphical linear algebra kinds of diagrams, which Filippo, Fabio and I have been using for signal flow graphs. If you don’t know the details of Dan’s latest work, you might remember him from his articles about developing knot theory for eight year olds.

Finally, Bob Coecke, one of the founders of the string-diagrams-as-syntax-for-quantum-physics school of thought, will give a tutorial on the quantum applications. And possibly, perform in the evening as part of The Quantum Dagger Orchestra.

We are soliciting short abstracts, so please consider submitting something: the deadline is end of June, so there is plenty of time. Of course, you can also come along without presenting anything and simply soak up the string diagrammatic goodness. You’ll be able to register at the FSCD website when they eventually put up a registration link; I don’t remember the exact numbers but the whole thing should cost between £100 and £150, with catering included. You’ll have to pay for your own beer, of course (soft drinks are also available, I’m told). The official call for papers with all the details is below.

CALL FOR PAPERS
1st Annual Workshop on String Diagrams in
Computation, Logic, and Physics
(STRING 2017)

http://string2017.cs.ru.nl/index.html

Satellite workshop of FSCD 2017,
co-located with HDRA 2017.

Jericho Tavern, Oxford, UK
September 8-9, 2017

String diagrams are a powerful tool for reasoning about processes and composition. Originally developed as a convenient notation for the arrows of monoidal and higher categories, they are increasingly used in the formal study of digital circuits, control theory, concurrency, quantum and classical computation, natural language processes, logic and more. String diagrams combine the advantages of formal syntax with intuitive aspects: the graphical nature of terms means that they often reflect the topology of systems under consideration. Moreover, diagrammatic reasoning transforms formal arguments into dynamic, moving images, thus building domain specific intuitions, valuable both for practitioners and pedagogy.

This workshop aims to bring together researchers from diverse backgrounds and specialities to collaborate and share their insights, tools, and techniques. It will furthermore provide an informal atmosphere in a unique venue: the upstairs of the Jericho Tavern, a music venue, where famously Radiohead played their first concert. All the usual conference facilities will be provided, and the distinctive location will provide plenty of opportunities to discuss and share ideas. STRING 2017 is a satellite event of FSCD 2017 (http://www.cs.ox.ac.uk/conferences/fscd2017/) and will be co-located with the 3rd Higher-Dimensional Rewriting and Applications (http://hdra.gforge.inria.fr).

Invited Speakers

Paul-André Mèllies (CNRS and Paris Diderot) – joint HDRA and STRING invited speaker
Dan Ghica (Birmingham) – applications of string diagrams in circuit design and verification
Bob Coecke (Oxford) – applications of string diagrams in quantum foundations and quantum computation

Submitting

We warmly welcome all types of contributions, ranging from work-in-progress to original work and/or overviews of mature work published elsewhere, on topics ranging from theory of string diagrams, to applications and tool demos.

Prospective speakers are invited to submit a title and 2 page abstract via the Easychair page at https://easychair.org/conferences/?conf=string2017

We will try to build a programme that is as inclusive and wide-ranging as possible within the fairly limited time available. Hence, speakers will be invited either to give a full-length talk or give a short talk in a “lightning session” style format.

The accepted abstracts will be made available electronically before the workshop.

Important Dates

Abstract deadline: 30 June 2017
Speaker notification: 14 July 2017
Workshop: 8-9 September 2017

Program Committee

Filippo Bonchi (ENS Lyon)
Ross Duncan (Strathclyde)
Fabio Gadducci (Pisa)
Aleks Kissinger (Radboud)
Dan Marsden (Oxford)
Mehrnoosh Sadrzadeh (Queen Mary)
Pawel Sobocinski (Southampton)
David Spivak (MIT)
Noam Zeilberger (Birmingham)

Organisers

Aleks Kissinger – Radboud Universiteit
Pawel Sobocinski – University of Southampton

Publicity

Joshua Holland – University of Southampton

Why string diagrams?

Sometimes I put the URL of this blog into a search engine to see what people are saying. When I first started – having heard about the fabled Internet Hate Machine – I was a little worried that I’d have to deal with tons snarky remarks and negativity. The opposite has happened. I’ve received several touching personal “thank you” emails, and a large majority – maybe even 90% – of social media feedback has been positive and extremely polite. So, unexpectedly, it’s been a bit of an Internet Love Fest. It’s all a bit disconcerting: hey, I enjoy arguments as much as the next guy!

Of the non-positive 10% of remarks, 95% have been quite useful and constructive. They allowed me to look outside my own conceptual bubble and understand the parts of blog that people have found mysterious, unclear or even annoying. In particular, there is one recurring sentiment.

The guy has some interesting insights, but what’s that weird diagrammatic notation all about?

I realised that I spent a lot of time talking about “the rules of the game” for dealing with string diagrams, using them in proofs and explaining some technical facts about what they mean mathematically, but I never really explained the fundamental questions of why people should care about them. This made me re-evaluate the way I present this material: at the recent Midlands Graduate School I spent half of my four lectures explaining the story of string diagrams and why computer scientists (and engineers, physicists, mathematicians, economists, philosophers,…) ought to care about them.

I think that this is an important missing chapter of the story so far. It’s an interesting tale in itself, involving natural language, Universal Algebra, programming languages and higher category theory. I will make the case that string diagrams are not only important, they’re inevitable. And it’s due to something called resource-sensitivity.

What is resource-sensitivity?

It’s a cliché that the digital revolution has changed the way we understand the world. One, somewhat under appreciated aspect is the way that the concept of data has changed the way human beings perceive reality. Sequences of 0s and 1s encode all kinds of information and have spoiled us: millenia of experience of coping with finiteness (food, money, time, etc.) has been augmented with an immensely powerful resource that comes with an unprecedented ability to be copied and thrown away at will. Media companies have been struggling with this disturbing new reality since the days of Napster. Dealing with data is also a conceptual leap that programmers have to make when first learning to program; they learn to navigate this unfamiliar and exotic world, governed by a resource like few other in human experience.

Let’s start with a simple example of resource-insensitivity. The following is a very simple function, in C syntax, that takes an integer x, multiplies it by itself, and returns the result.

int square (int x) {
    return x * x;
}

The same identifier x appears twice in the body of the function (and another time, at the start, where it’s declared as the parameter). That means that the resource that the function deals with, in this case an integer, is implicitly assumed to be copyable. This is fine: an integer is represented by a sequence of 0s and 1s in memory. It’s not a problem to use it several times, copy it somewhere else or even overwrite it.

But not every resource is as permissive as data. One of the first string diagrams on this blog was the recipe for Crema di Mascarpone. The wires were typed with ingredients and there were several generators; for example a crack-egg generator that took an egg and produced a white and a yolk.

Actually, it was subsequently criticised by a colleague on my Facebook for not being truly honest, because it implicitly threw away the shell. But here’s something less controversial: I think everyone would agree that doesn’t make sense to include an egg-copy generator.

So, while copying numbers is ok, copying eggs is not. It’s an amusing idea precisely because we are intuitively aware of the concept of resource-sensitivity. Actually, once you start thinking about it, there are examples everywhere. For example consider the following, grammatically correct sentence in a natural language:

Have your cake and eat it too.

This aphorism is used in situations where you can’t have something both ways. But the literal meaning is referring to a physical object: you can’t have a physical cake, eat it and still have it, since the action of eating it destroys it. So, perhaps, a hypothetical programming lanuage for cake handling should not allow you to write the following routine:

cake handle_cake (cake c) {
    eat c;
    return c;
}

Here executing the command eat c; consumes the resource c, so there is nothing to return. It seems that type cake is quite unlike the type int of integers. We will call the integer kind of data classical: classical data can be copied and discarded, whereas non-classical data (like eggs and cakes) cannot.

The closest thing to classical data that we had before computers is referencing things or concepts. In other words, information, which can be copied (e.g. by monks) and discarded (e.g. in a bonfire). Natural languages like English evolved to express information, and so they have several classical features built-in. In particular, the way that sentences of natural language are constructed are an example of something called classical syntax. Classical syntax is a good fit for dealing with classical data like information, but as we will see, also has some annoying limitations.

What is classical syntax?

Trees are a popular way of organising classical data, and are closely related to the notion of classical syntax.

Unlike traditional trees, like the Hawaiian one in the photo above, abstract trees are usually drawn with the root at the top. For example, the following is the tree representation of the expression 2 × 3 + 4.

The usual convention that × is evaluated before + is captured by the fact that × appears deeper in the tree than +. So actually, trees are better than formulas because they eliminate the need for parentheses. For example, the tree for 2 × (3 + 4) is the following:

Expressions in natural language such as “you can’t have your cake and eat it too” can also be represented as trees. A lot of this is due to work by Noam Chomsky on the theory of context free grammars (CFGs), although some people date work on CFGs and their relationship with natural language back to ancient India. So “have your cake” looks something like this, as a syntax tree:

(Don’t trust me, I never actually studied English grammar. When I went to high school in Australia back in the 90s, teaching formal grammar was unfashionable and considered to “stifle creativity”. What a load of…)

Anyway, as you can see from the tree above, natural language grammars are complicated things. CFGs are also used to define the syntax of programming languages; fortunately, these are actually much simpler to describe than natural language grammars, which are full of exceptions and special cases. So here’s how the cake handling program from before might look like as a tree:

This is roughly the kind of tree that’s constructed by the compiler, or, more precisely, by one of its components, the parser. Here I’m being somewhat reductive: in the programming language context such trees have special features to deal with variables and identifiers, and these kinds of extended trees are referred to as abstract syntax.

The cake c is referred to twice in the body of the function and once in the declaration. But since cake is a non-classical, resource-sensitive type, then perhaps the compiler ought to ensure that the variable appears only once in the function body? So maybe our cake programming language should reject this program as a syntax error?

This doesn’t really work very well. For example, some actions might preserve the resource. For example, following ought to be ok: there’s no problem with looking at your cake and having it too.

cake handle_cake (cake c) {
    look_at c;
    return c;
}

One popular way to go about solving this problem is to build a little type theory for cakes and catch these kinds of bugs during type checking. Although this often works, is often very cute, and is quite similar to the kind of semantic analysis that goes on in our heads when trying to make sense of expressions in natural language, I think that it ignores a deeper issue: we are cramming non-classical data into classical syntax.

Other interesting repercussions that will become clear when we think about the mathematics that governs the behaviour of classical syntax. More about this next time. For now, I’ll highlight some things that I think are symptoms of the widespread disease of classical thinking about non-classical things.

Limitations of classical thinking

Apart from cakes, here are some other examples of non-classical data, in order of decreasing coolness: qubits, money and time. In fact, almost any finite resource you can think of is non-classical, and they all pose similar problems. All these examples are of things that, like eggs and unlike information, cannot be arbitrarily copied and discarded.

But even for ordinary, classical 0s and 1s kind of data, sometimes resource-sensitivity comes into play. A good example is concurrency: the idea that a computation can be split into a number of threads that are executed in parallel. The threads then maybe communicate and synchronise several times during processing in order to perform a task faster than would be possible when using a single thread.

Concurrent programming is the the task of writing programs that can do several things at the same time. It is more important and necessary then ever before, because hardware has been getting increasingly parallel in the last 15 years. Unfortunately, it is also widely feared and considered to be extremely hard to get right. Here, although the data is classical, the actual act of computation is resource-sensitive. Two or more threads accessing and modyifying the same piece of data can have weird consequences, because of the non-determinism: we can’t predict before hand which thread will get to the data first, and it quickly becomes infeasible to examine all the possibilities. Getting this wrong can sometimes have deadly consequences. I’ll return to the topic of concurrency later because it’s one of my favourite things to talk about. For now, I will tell you my own personal hypothesis: one of the main reasons why programming language technology has problems with concurrency is that ordinary, classical syntax has problems dealing with resource-sensitive applications such as concurrency. I think that string diagrams can help. (Incidentally, I recently watched Bartosz Milewski’s introduction to category theory lectures: it was extremely cool to see someone introducing category theory and talking about programming in general, and concurrency in particular! I highly recommend the videos.)

Finally, one of the main points of this blog is that resource-sensitive thinking can actually help in areas that have traditionally been understood in a resource-insensitive way. For example, doing linear algebra with string diagrams reveals all kinds of beautiful symmetries that are hidden or only half visible when expressed in traditional, classical language. This is because the copying and discarding is explicitly a part of the graphical lexicon – it is the black structure of interacting Hopf monoids. Here, although the data is classical, it is nevertheless useful to be explicit about when the copying and discarding is done: only then the full picture becomes visible.

This episode has turned into one long piece of motivational text, with not too much technical content. This will change, because there is quite a bit of nice mathematics involved. So, my plans in the next few episodes are to answer the following questions in more detail:

what are the mathematical properties of classical syntax?
how can classical syntax be made (more) resource-sensitive?
what is true resource-sensitive syntax? (spoiler: it’s string diagrams, just like the ones all over this blog!)
how to convert classical syntax to resource-sensitive syntax?
what are some examples of classical syntax, viewed through resource-sensitive glasses?
what is the history of string diagrams?
what are some future research and technological challenges to be solved?

The moral of the story will be that string diagrams are not just some weird, trendy notation. They are inevitable, once we accept that the world is full of non-classical things that need to be described properly. String diagrams are resource-sensitive syntax.

…, a monoid is a category, a category is a monad, a monad is a monoid, …

In the last few months I taught three graduate mini-courses about string diagrams and graphical linear algebra – at Phd Open in Warsaw, a winter school in Estonia, and a graduate school in Leicester. That and teaching have been keeping me busy, so I haven’t invested as much time as I would have liked on the blog. There was also the need to write down all the work done in Hawaii, and Dusko, Filippo and I just finished writing a meaty technical report, which will soon appear on the arXiv. The time that I actually did spent writing blog stuff was distributed across a number of different articles; so now I have a whole bunch of half-finished stuff. I really need to start managing this whole project a little better!

Here’s something really short and off the cuff. At MGS in Leicester, during the exercise sessions, I talked about distributive laws of props. At the same graduate school, Alexander Kurz was giving a course on category theory. I got the inspiration for this article from one of his lectures.

As part of the distributive laws of props story, you realise that categories are a special case of the concept of monad. But monads are a special case of the concept of monoid. And monoids–of course–are a special case of the concept of category. By transitivity of the “special case of” relation, we should therefore probably just admit that they are all really the same thing: some kind of Platonic form of plugging things into each other. The story is full of pleasant categorical mental gymnastics. Think of it as the categorical equivalent of a 5km fun run.

… , a monoid is a category, …

A category is a collection of objects and arrows. Every object has an identity arrow, and any pair arrows f: A → B, g: B → C, where f‘s codomain is the same as g‘s domain, can be composed. Composition is associative, and composing with identities does nothing.

If we restrict to categories with exactly one object – let’s give it a name, say * – there is just one identity arrow (the identity id_* on *). Every arrow has * as both domain and codomain, since there are no other objects around. So every arrow can be composed with every other arrow. Composition is associative, and composing with id_* on either side leaves any arrow unchanged. In other words, there is a set of arrows Ar with a total, associative, unital operation: i.e. a monoid. The fact that there is an object * adds no interesting information.

So every monoid is a kind of category, a category with one object. Categories, it would seem, are the more general concept. Indeed, some people say that categories are the “categorification” of the concept of monoid. Maybe categories should actually have been called monoidoids?

… , a category is a monad, …

One way to define the concept of monad is as a triple (T, μ, η), where T: C → C is an endofunctor, μ: T² → T is the multiplication natural transformation, and η: 1 → C is the unit natural transformation. Multiplication μ is associative and has η as unit, which is usually said with the aid of three commutative diagrams.

In the famous paper 1970s paper “The formal theory of monads“, Ross Street realised that this popular definition is unnecessarily specific. There’s no reason to be overly specific in mathematics: if you can easily have a more general theory/theorems, then why not? What matters is getting to the core of the issues, and not getting too hung up on extraneous details.

In this specific case, an endofunctor is a 1-cell in the 2-category Cat of categories, functors, and natural transformations. The data defining a monad is thus one 1-cell and two 2-cells, subject to some commutative diagrams of 2-cells. The fact that this data lives in Cat brings nothing important to the story: the definition clearly makes sense in any 2-category.

The 2-category Span(Set) is one of my favourites. It’s actually a bicategory–composition is not strictly associative, but we will ignore that detail for the purposes of this lightning article. It’s really just a bit of bureaucracy and adds nothing to the story. Trust me.

The objects of Span(Set) are sets. The arrows from X to Y are spans of functions

X ← Z → Y

Identities are spans formed of two identity functions. Composition of spans involves taking pullbacks. Finally, the 2-cells are span morphisms; functions Z→Z’ that make the two resulting triangles commute. I won’t bore you with the details which you can find on nlab.

The interesting question is what happens when we consider a monad in Span(Set). An endo-1-cell is span of the form

X ← Y → X (†).

Let’s call the left function δ₀ and the right δ₁. Now we need to give two 2-cells, that is, morphisms of spans. Let’s do the unit first: when you unpack the definitions it is a function

ι: X → Y such that δ₀ιx = δ₁ιx = x.

The multiplication μ goes from the span composed with itself back to itself. When you compute the composition, taking the pullback, the span is

X ← Y×_XY → X

where Y×_XY = { (y,y’) | δ₁(y) = δ₀(y’) }. So μ: Y×_XY → Y, satisfying δ₀μ(y, y’) = δ₀y and δ₁μ(y, y’) = δ₁y’.

So far so good. Associativity says simply that μ(μ(y, y’), y”) = μ(y, μ(y’, y”)) and the unit conditions say that μ(y, ιδ₁ y) = y = μ( ιδ₀y, y).

All this data has another name: a category. Let’s make it clearer with more familiar terminology: X is the set of objects, Y is the set of arrows. δ₀ and δ₁ give, respectively, the domain and codomain of each arrow. The function ι: X→Y picks out the identity arrows. Next, Y×_XY is the set of composable arrows, and μ: Y×_XY → Y defines the composition for any composable pair. The remaining conditions say exactly that composition is associative and identities work as they should. It’s precisely the data we need.

So, a category is just a special case of the more general concept of monad, right?

… , a monad is a monoid, …

Not so fast. We talked about monads in the generality of 2-categories. The theory of monoids can be easily generalised too: there’s no reason for the specificity of “set” M, with unit “element” e and multiplication “function” m, satisfying associativity and the unit equations. The world would be boring if everything was just sets and functions. In fact, the concept of monoid makes sense in any monoidal category. Pick some monoidal category X; the data of a monoid consists of

an object X, and arrows m: X⊗X → X and e: I → X

where I is the monoidal unit object. Associativity and the unit laws can be expressed as three commutative diagrams. The usual monoid-as-a-set version is now a monoid, in the more general sense, instantiated in the monoidal category of sets and functions Set_×. Here the cartesian product serves as monoidal product.

So now let’s consider some 2-category C, pick an object C ∈ C and focus on the set of 1-cells from C to C. Let’s call this thing [C, C]. Actually, it’s not really a set: it’s the full sub-2-category of C on C; i.e. a 2-category with one object. We saw before that to give a category with one object is to give a monoid. In the same spirit, a 2-category with one object has the same information as a (strict) monoidal category. The objects are the 1-cells, and monoidal product (on 1-cells) is composition. I’ll leave the remaining details as a simple exercise.

So what is a monoid in [C, C], considered as a monoidal category? You guessed it – it’s a monad.

So monads are really special kinds of monoids. And monoids are really special kinds of categories. And categories are really special kinds of monads. And …

Aside

Leicester and the battle for universities

This is an off-topic post, but it concerns something very important. More graphical linear algebra is coming next week.

The short story is this: the University of Leicester is going through a top-down restructuring. The management have decided to force a large number of academics to reapply for their jobs. Of course, following this charade, there will be fewer jobs than academics, so the whole thing is a depraved game of musical chairs. In particular, in the excellent mathematics department, around six academics will be made redundant.

It’s very important that we come together and fight, so please sign this petition. In the grand scheme of things, the very concept of the university as an institution is under threat, and this is just the latest battle. Leicester is a reputable British university, but other countries are increasingly heading in the same direction: turning world-leading universities into failing businesses.

If you’re interested in my views on this topic, see the rant below.

Over the last twenty years, British universities have slowly transitioned away from being institutions that deliver learning and research as a public good. Universities were once seen as advancing human knowledge, and this was considered to be a positive contribution, a raison d’etre in its own right. Higher education meant that people could pursue their interests, fulfil their calling, have access to state-of-the-art research and knowledge in the field of their choosing, and perhaps even contribute to its further development.

At some point in the 1980s–coinciding with Thatcher in the UK and Reagan in the US–universities were infected by neoliberal thinking. Abstracts concepts like “advancing human knowledge” were considered suspicious and retrograde; it’s difficult to assign a monetary value to something abstract. Instead, universities were now seen as a engines of future economic growth. This led to investment in fields deemed as economically beneficial, e.g. applied STEM, law, economics, and savage cuts to fields deemed as less useful to the economy, typically meaning the arts, “less marketable” social science, fundamental scientific research, pure mathematics, and so on.

Stefan Collini, writing in the LRB, put it succinctly:

… gradually, what we still call universities are coming to be reshaped as centres of applied expertise and vocational training that are subordinate to a society’s ‘economic strategy’.

Moreover, universities themselves are increasingly being seen as businesses in their own right. The income comes from ever growing student fees, some government support for research and grant income. Academic career progression is now, in reality, often based on how much cash an academic generates. Non-performing (read, non profit-generating) academics are often bullied and threatened, sometimes with tragic consequences.

At the same time, management depressingly talks about “vision”, “strategy”, “stakeholders”, etc. Their emails are often peppered with meaningless middle-management business speak (think David Brent, or if you’re in the US, Michael Scott). Many British universities are expanding and investing millions to build campuses around the world. My own university has built a campus in Malaysia and the jury is still out on whether this was a sound investment. In Sold Out, another excellent LRB article by Stefan Collini, we find an indicative anecdote:

In June 2012 the University of East London in Cyprus promised to offer ‘high quality British degree programmes in one of Europe’s most popular study destinations’ at a ‘stunning new campus’, but in April 2013 it was announced that after recruiting just 17 students UEL Cyprus would be closed. A spokesman for the university, the Times Higher Education reported, ‘would not disclose how much money the university will lose’.

Senior management, of course, keep giving themselves enormous pay rises and bonuses, regardless of performance: this is justified because they think themselves the equivalent of CEOs of large companies, deserving to be paid at “market value”. For example, the vice-chancellor of the University of Leicester takes home a salary of more that £300k a year. This makes the currently proposed redundancies even more perverse.

Even if you’re not based in the UK, if you care about universities, you should still get involved. Here’s Collini again:

Other countries are looking on with a mixture of regret and apprehension: regret because the university system in this country has been widely admired for so long, apprehension because they fear similar policies may soon be coming their way. In many parts of the world English higher education is, to change the metaphor, seen less as a useful pilot experiment and more as the canary in the mine.

So please, go ahead and sign the petition. As Collini writes in the last paragraph of his essay, we must stop first-rate British universities from turning into third-rate companies.

31. Fibonacci and sustainable rabbit farming

Chapter 12 of Fibonacci’s Liber Abaci is full of compelling problems. One is called “On two men with fish and the customs agent”. Another is a riddle about a man who visits a pleasure garden – accessible through seven guarded doors – where it’s possible to collect apples. You can enter freely, but on your way out you must pay each of the seven guards a tax: half of the apples you’re carrying, plus one additional apple.

Artistic impression of Fibonacci’s pleasure garden. The tax at this door is ¹⁰/₂+ 1 = 6 apples.

So, Fibonacci asks, how may apples does the man need to collect in the pleasure garden so that he gets to take one apple home? (spoiler: 382).

Given Fibonacci’s trading roots, his obsession with tax and customs duties is understandable. Without doubt, however, the most famous part of the chapter is not about taxes. It’s a very short exposition entitled “How many pairs of rabbits are created by one pair in one year”, and it starts as follows:

A certain man had one pair of rabbits together in a certain enclosed place, and one wishes to know how many are created from the pair in one year when it is the nature of them in a single month to bear another pair, and in the second month those born to bear also.

Fibonacci doesn’t tell us the name of the certain man. It will become boring to refer to him repeatedly as “the certain man”, so from now on we will change his gender to female and refer to her as Giusy (pronounced Juicy), short for Giuseppa the Rabbit Breeder.

The following helpful table is included in the margin, with the number of rabbit pairs for each month

	beginning	1
	first	2
	second	3
	third	5
	fourth	8
	fifth	13
	sixth	21
	seventh	34
	eighth	55
	ninth	89
	tenth	144
	eleventh	233
	twelfth	377

and the procedure for calculating the successive entries is explained as follows

You can indeed see in the margin how we operated, namely that we added the first number to the second, namely the 1 to the 2, and the second to the third, and the third to the fourth, and the fourth to the fifth, and thus one after another until we added the tenth to the eleventh, namely the 144 to the 233, and we had the above written sum of rabbits, namely 377, and thus you can in order find it for an unending number of months.

This sequence of numbers in the table

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, … ①

is now known as the Fibonacci numbers. As Fibonacci explains, starting with F₀= 1 and F₁= 2, we obtain successive entries by recursively calculating

F_n+2 = F_n+1 + F_n.

This famous series—and the related golden ratio–are the absolute bread and butter of popular mathematics. Many facts, factoids and myths involving pineapples, pine cones, aesthetically pleasing rectangles, renaissance paintings etc. have been assigned to them. They spice up the otherwise vapid pages of Dan Brown novels. As well as books, there are hundreds of videos on YouTube. While many are terrible, there are some gems: for example, this lecture by Keith Devlin is very good since he manages to be engaging, while at the same time challenging some of the most egregious numerological pseudo-science.

In modern presentations, Fibonacci’s famous sequence is usually given as

0, 1, 1, 2, 3, 5, 8, 13, 21, … ②

1, 1, 2, 3, 5, 8, 13, 21, 34, … ③

We will stick with Fibonacci’s original version ①. There’s a good reason for this: Giusy’s first couple of rabbits start breeding already in the first month. So if the first rabbit couple was delivered one month late, Giusy would observe the following series

0, 1, 2, 3, 5, 8, 13, 21, … ④

Fibonacci’s ① starts with the third number of ② and the second of ③. But it wouldn’t really work to simply shift ② or ③ to the left: differently from ④, both ② and ③ contain the sequence 1, 1, which Giusy would never observe.

We can obtain ① by adding ② and ③. Addition works pointwise, scanning vertically, 0+1=1, 1+1=2, 1+2=3 etc, as illustrated below. As we will see, this is the way of defining addition on number sequences. We will also define multiplication on sequences, but it won’t work anything like this.

The story of graphical linear algebra so far has taken us from zero to the natural numbers, to integers and then fractions. For the next few episodes we will shift our focus of from numbers to sequences. My aim for the rest of this episode is to whet your appetite by giving you a taste of the maths that we will use. For now, it will be a high-level overview, we will not go over everything in detail.

Eventually we will discuss some applications in control theory and I will do my best to explain when we come to that, although I’m not an expert. In this episode, though, I’ll talk about computer science motivations; some of the things that I found the most interesting while playing around in this particular neck of the graphical linear algebra woods.

We start by generalising Fibonacci’s rabbit problem.

Giusy originally got a pair of rabbits. No external rabbits get added, nor any taken away at any point afterwards. So we can think of this as the following rabbit input sequence

1, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

with resulting rabbit output sequence

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

Similarly, as we have already discussed, adding the first pair one month late would mean that the shifted rabbit input sequence

0, 1, 0, 0, 0, 0, 0, 0, 0, 0, …

results in the shifted output sequence

0, 1, 2, 3, 5, 8, 13, 21, 34, 55, …

But there’s no reason to add just one pair. For example, what if Giusy started with two pairs of rabbits? And, moreover, let’s suppose that she goes to the rabbit market every second month to buy another pair of rabbits. This rabbit input sequence would then be

2, 0, 1, 0, 1, 0, 1, 0, 1, 0, … ⑤

Following the Fibonacci principle of rabbit reproduction, we would get output sequence

2, 4, 7, 12, 20, 33, 54, 88, 143, 232, … ⑥

The reasoning is as follows: the input sequence 2, 0, 0, 0, 0, … produces the result 2, 4, 6, 10, 16, … obtained by doubling each entry of ①. Twice the rabbits, twice the breeding. Next, 0, 0, 1, 0, 0, 0, 0, … produces 0, 0, 1, 2, 3, 5, …; that is, a shifted ①. Similarly for 0, 0, 0, 0, 1, 0, 0, 0, 0, … and so forth. Adding up all of these input sequences results in ⑤, so adding up all of the result sequences results in ⑥.

We can use the same kind of reasoning to figure out what would happen if Giusy started an ambitious commercial rabbit breeding farm. Say her business plan started with investing in three pairs, then selling one pair in the first month, two in the second, three in the third, and so forth. In this case, the rabbit input sequence would be

3, -1, -2, -3, -4, -5, -6, -7, -8, -9, …

and the result is, unfortunately

3, 5, 5, 5, 3, -1, -9, -23, -47, -87, …

so it looks like Giusy’s rabbit business would collapse sometime in the fifth month: negative rabbits doesn’t sound good. To be a successful rabbit entrepreneur, Giusy needs to understand the principles of sustainable rabbit farming, which we will discover at the end of this episode.

What’s going on here, mathematically speaking?

It looks like we are defining a function that takes number sequences to number sequences. By introducing and taking away rabbits we are causing the total numbers to change. We could try to write down a formula to calculate subsequent entries in the output sequence based on previously outputted values and the current input value. But formulas are boring.

Moreover, I hope that I’ve managed to make you at least a little bit uncomfortable about the pseudo-scientific language of causality. In graphical linear algebra, relations take centre stage. Of course, it is useful to know when a relation describes a function – as we will see below – but functions are not part of our basic setup. Instead, we think of the Fibonacci rules of rabbit breeding as defining a relation between number sequences. For example, it relates

1, 0, 0, 0, 0, 0, 0, 0, 0, 0, … with 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

2, 0, 1, 0, 1, 0, 1, 0, 1, 0, … with 2, 4, 7, 12, 20, 33, 54, 88, 143, 232, …

3, -1, -2, -3, -4, -5, -6, -7, -8, -9, … with 3, 5, 5, 5, 3, -1, -9, -23, -47, -87, …

and so forth.

This relation is a kind of “ratio” between sequences. We have already seen how fractions in graphical linear algebra work and that they can be thought of as ratios. A little reminder example: the fraction ³/₄ is drawn

and, seen as a linear relation, contains (4, 3), (8, 6), (-²/₃, –¹/₂) and so on; all those ordered pairs that satisfy the ratio “three times the first number is four times the second”.

Next, a sneak preview of how we will use graphical linear algebra to capture the Fibonacci sequence ratio with diagrams.

The Fibonacci number sequence can be expressed nicely by something called a generating function. We will discover some of the underlying mathematics in the future. If you can’t wait, the classic generatingfunctionology by Herbert S. Wilf is free to download. Wilf’s Chapter 1 starts with the following, well-known simile:

A generating function is a clothesline on which we hang up a sequence of numbers for display.

The kinds of generating functions that we will use will be fractions of polynomials. So, as we will see, the generating function for ① is

the one for ② is

and the one for ③ is

If you haven’t seen this kind of thing before then this will all look very mysterious. Don’t worry, with graphical linear algebra everything will become crystal clear. At this point, just trust me: there’s a tight relationship between these fractions and the sequences that we saw previously.

I can’t resist making one remark though: look at the three fractions above. Since all of the denominators are the same, following the usual rules of addition of fractions, the first is just the addition of the second and the third, just like ① is ②+③ as sequences. Indeed, an extremely interesting and useful property of generating functions is that their algebra as polynomial fractions is compatible with the algebra of the sequences they represent.

Generating functions go back to the 18th century and Euler. I’ve always thought of them as a kind of “prehistoric” computer science, 2oo years before Turing. Indeed, computer scientists are often faced with coming up with finite specifications (e.g. programming languages, automata, regular expressions, Turing machines, …) for infinite behaviour. Generating functions are a beautiful example of this phenomenon: they are a compact, finite syntax for infinite sequences.

It’s time to start drawing diagrams. The following is the diagram for (1+x)/(1-x-x²), the generating function for Fibonacci’s ①.

You should recognise most of the components of this diagram, apart from all the white x‘s on black backgrounds. They are new generators, and they stand for the indeterminate x of a polynomial. Adding them and their associated equations to our graphical language will give us the ability to write down any polynomial fraction whatsoever. We will go through this properly in the next episode.

Now for a preview of the cool stuff. We already noticed that the Fibonacci sequence ratio can be seen as a function from input rabbit sequences to output rabbit sequences. It turns out that this can be observed just by looking at the diagram. In fact, we can use diagrammatic reasoning to rewire the above into the following

which can be understood as a signal flow graph, a concept to which we will devote quite a bit of time in the future. Here the left boundary can be thought of as input and the right as an output. The denominator of the polynomial has been transformed into a fancy feedback loop.

We can now this latest diagram and turn it into some code. There’s no intelligence involved, a simple procedure takes a signal flow diagram and produces code. Here’s the result in OCaml. We will go through the procedure in due course.

(* "forward" Fibonacci *)
let ffib x = 
    let rec ffibaux r1 r2 r3 x = 
         match x with 
              | [] -> []
              | xh :: xt -> 
                  xh + r1 + r2 + r3 :: 
                      ffibaux xh r3 (xh + r1 + r2 + r3) xt
      in ffibaux 0 0 0 x;;

If you don’t know OCaml, don’t worry, you can still cook along: open this in a fresh browser window. It’s an interactive OCaml shell running right in your browser. Copy and paste the above code into the prompt.

If everything worked as expected, you ought to get the following

val ffib : int list -> int list = <fun>

which says that ffib is a function that takes a list of integers and returns a list of integers. OCaml is nice like that, it figures out the type for you. Now paste the following

# ffib [1; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0];;
- : int list = [1; 2; 3; 5; 8; 13; 21; 34; 55; 89; 144; 233; 377]

which is the list of the first thirteen Fibonacci numbers, as in Liber Abaci. Say Giusy throws in one additional rabbit pair each month, then the numbers will be

# ffib [1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1; 1];;
- : int list = [1; 3; 6; 11; 19; 32; 53; 87; 142; 231; 375; 608; 985]

You get the idea; feel free to try other inputs.

Finally, we return to Giusy’s dream of becoming a successful, sustainable rabbit breeder. She’s in luck because the original diagram can be rewired in another way, again looking like a signal flow graph but this time where the input is on the right and the output in on the left. Now it’s the numerator that turns into a feedback loop.

But it’s still morally the same diagram, the same ratio, relating the same sequences. Just that now we understand it as having its input on the right and its output on the left. Nothing wrong with that: graphical linear algebra is good with symmetries. It’s also why dividing by zero wasn’t a problem.

We can also turn the new diagram into code.

(* "backward" Fibonacci *)
let rfib x = 
    let rec rfibaux r1 r2 r3 x =
        match x with
           | [] -> []
           | xh :: xt -> 
                xh - r1 + r2 + r3 :: 
                    rfibaux (xh - r1 + r2 + r3) r3 (-xh) xt
    in rfibaux 0 0 0 x;;

and, as a function, this now computes the inverse to ffib. So, for example:

# rfib [1; 2; 3; 5; 8; 13; 21; 34; 55; 89; 144; 233; 377];;
- : int list = [1; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0]

Now, let’s say that Giusy has space for 5 rabbit pairs and wants to sell off any overflow. What should her business plan look like?

# rfib [5; 5; 5; 5; 5; 5; 5; 5; 5; 5; 5;];;
- : int list = [5; -5; 0; -5; 0; -5; 0; -5; 0; -5; 0]

And we see that she ought to start with 5 pairs, sell off 5 in the first month, and every second month after that. Giusy’s happy, with both bankruptcy and rabbit plagues avoided. Graphical linear algebra saves the day again.