| Development |
When you think of programming, you might not immediately think of mathematics. In the day-to-day practice of writing software it’s often hard to see much theory behind REST requests and database schema migrations.
There is, however, a rich world of theory that applies to the work we do, from basic data structures to architectural patterns. This is the field of category theory, sometimes described as “the mathematics of mathematics”. Category theory is concerned with structure and the ways of converting between structures.
NOTE: This article is based on an Axosoft Dev Talk I recently gave, also titled Practical Category Theory: Monoids. Watch that video or keep reading!
Hopefully this sounds like the kind of thing you’re familiar with as a programmer, and if it doesn’t, hopefully it just sounds like a good ol’ time. After all, category theory doesn’t care about specifics; it’s abstract, which helps it be applicable to a great many fields.
Software therefore needs to apply the concepts of category theory to match the realities of programs that execute on real computers. You may have heard of some of these, especially if you’ve looked into statically typed functional programming languages such as Haskell, OCaml, F#, or Scala.
In talking about Monoids, we actually need to talk about two structures: the Semigroup and the Monoid. A Monoid is a superset of a Semigroup, so let’s start there. The Semigroup is a simple structure that has to do with combining. In the following example, I’ve combined two strings with an operator such as +:
helloWorld = “Hello” + “world”
And in this example, I’ve created a concatenating list:
parts = concat([“Parsley”, “Sage”, “Rosemary”], [“Thyme”])
Both these examples have some similar properties. They take two values of a certain shape and combine them into a single value of the same shape. We don’t take two strings and combine them to get a number or an array; it’s quite reasonable to expect the result remain a string.
Let’s start with just this single property, that there is some “append” operation that takes in two values of some type and gives back a combined result of that same type. In various languages this might look like:
C#
T Append(T first, T second)
Javascript with Flow
append(first : T, second : T) : T
Haskell, Elm, PureScript
append :: t -> t -> t
This may not seem like a particularly novel thing to do, after all, you probably do this sort of operation all the time on various kinds of data. Remember though, that we’re talking about a concept from category theory, so while the above type signatures are the end of the story in terms of what that compiler can enforce, we are obligated to obey a few more rules.
These rules are “things that must remain true,” and in math we call these laws. You’re already familiar with tons of these laws even if you haven’t heard them referred to as such. One example is with addition. When adding two numbers you can freely swap the order of the arguments, so 1 + 2 is the same as 2 + 1. This is the commutative property, and the law here is that swapping the order doesn’t affect the resulting value.
For Semigroups, we must obey the associative property, which has the law that the order in which you combine things does not matter. That is
(“hello” + “ “) + “world”
is the same as
“hello” + (“ “ + “world”)
As you can see this doesn’t mean we can swap the order, just that we can do our combine operation on any two adjacent elements, and when we’re all done we’ll end up with the same result.
The append operation is whatever you want it to be (as long as it obeys the rules)
Consider CSS classes. They’re kind of like a string, except when you combine them you don’t just shove them together; it’s more like we’re doing a join operation using a comma and space as the separator:
“app” + “button” + “selected”
should turn out to be
“app, button, selected”
CSS classes aren’t the only type where this is true; if you think in terms of Semigroups, there are lots of cases where fundamentally we’re combining two things of the same type, and that type has a unique rule around how to do the combination.
That’s actually all there is to Semigroup, an append operation that is associative. If you’re feeling underwhelmed, don’t feel bad! I was hardly excited when I first encountered the idea. Let’s dig a bit deeper to see how this unassuming structure punches quite a bit above its weight class.
For the purposes of grounding our discussion, let’s pick a few types that satisfy the rules of Semigroup. List is a straightforward data structure that every mainstream language supports (even if it’s named something else), so we’ll talk in terms of Lists moving forward.
Depending on the language you’re using, you might be able to express Semigroup as an interface or protocol, and you might be able to implement that interface/protocol on your language’s List type. If not, don’t fret! These ideas are still useful and applicable even if your language doesn’t give you a way to represent it in your type system (or if you’re using a dynamic language and don’t have a static type system). I’ll use square brackets to represent a list and the ++ operator for my List’s “append” operation.
bandMembers = [“Stevie”, “Lindsay”] ++ [“Christine”, “John”, “Mick”]
The second type we’ll consider is the String. For our discussion, we’ll use the same ++ operator to concatenate Strings.
greeting = “Hello” ++ “ “ ++ “World”
Interesting attribute #1: Semigroups are partition friendly
Consider the following SQL related code:
databaseQuery = selection ++ table ++ constraints
At the end of the day, we want to have a nice query we can use to get a result from the database. If our top level value is a String that makes up our query, then the associativity of Semigroups means that we have a lot of flexibility in building this up.
One example of this might be in building up the selection. We want to end up with something like:
select * from
or if we want specific columns:
select a, b from
It seems reasonable to build up this String from some fixed parts, the “select” and “from” mixed with some customizable parts, the “*” or “a, b”.
selection = “select “ ++ ”a, b” ++ “ from “
The resulting String will be combined with the other elements of the query, but it doesn’t matter when we decide to combine the elements, we could defer the column name conversion to a String by leaving it as a List.
selectionParts = [“select ”, [“a”, “b”], “ from”]
When we’re ready for a selection String, we could combine the column names first, then combine the resulting String with the other two parts. This same flexibility in when to go from a more structured form into a final string, is present in other parts of the query as well.
table = “myTableName” constraintParts = [“where “, constraint1, “ AND “, constraint2] constraint1 = [columnName, “ IS “, value] constraint2 = [columnName2, “ NOT “, value2]
As developers, we can decide when it’s best to go from the representation of values in Lists to a final String form—either for a part or for the whole of our query. We can defer this decision through the layers of functions used to build up the query, and we know we can go from parts to whole for any section of the query—independent of any other part, and nothing will change.
We originally had:
databaseQuery = selection ++ table ++ constraints
Let’s see what the query might look like if we had all of the bits inlined:
databaseQuery = “select “ ++ “a, b” ++ “ from “ ++ “myTableName” ++ “ where “ ++ columnName ++ “ IS “ ++ value ++ “ AND “ ++ columnName2 ++ “ NOT “ ++ value2
If we were to recreate the order of operations in the original version, it would require us to group our values into 3 parts:
databaseQuery = (“select “ ++ “a, b” ++ “ from “) ++ (“myTableName”) ++ (“ where “ ++ columnName ++ “ IS “ ++ value ++ “ AND “ ++ columnName2 ++ “ NOT “ ++ value2)
But because of associativity, these are exactly the same. We can group the individual combinations however we want, which is effectively what we would be doing along the way by collapsing the various query parts into selection, table, and constraints.
This might seem obvious (after all we’re dealing with Strings), but the ability to arbitrarily partition your data and combine it, is useful for more exotic types as well. Imagine you have a type Log that represents a single logging event and is a Semigroup. You might have files filled with Logs spread across files that are rotated, grouped by date or time range, or some other bucketing criteria. Being a Semigroup means you can combine adjacent Logs into an aggregate Log in any order you want. You could split up the work of combining Logs across multiple threads and know that it is still 100% safe to combine the individual thread’s results into a final result.
Interesting attribute #2: Semigroups are “incremental combination” friendly
Carrying on with the idea of Logs, consider a remote logging service that accepts Logs from your various servers and aggregates them down into something you can do reporting on. If we still have a Log type that is a Semigroup, we have a lot of freedom in how we proceed. For example, when a Log is generated by one of your servers, it could send it immediately, or combine some number of Logs before sending them:
Remote Logging Server | Worker Server | |
---|---|---|
Log | <-worker sends Log immediately– | Log |
Log ++ Log | ||
Log |
Remote Logging Server | Worker Server | Local Batching | |
---|---|---|---|
Log | Log | –> | |
Log | Log | ||
Log | Log | –> | Log |
Log | Log ++ Log | ||
…repeat until n logs or time limit reached… | |||
Log | <-worker sends batched Log– | Log |
On the receiving end, you could accept the Logs and combine them with the main log as they arrive, or you could batch them:
Remote Logging Server | Worker Servers | ||
---|---|---|---|
Log | <– | Log | |
Log ++ Log |
Remote Logging Server | Local Batching | Worker Servers | ||
---|---|---|---|---|
Log | <– | Log | ||
Log | Log | |||
Log | Log | <– | Log | |
Log | Log ++ Log | |||
…repeat until n Logs or time limit reached… | ||||
Log | <– | Log | ||
Log ++ Log |
Ok, that’s a lot of things we can do with Semigroups, so where do Monoids fit in? Thankfully, Monoids are, by comparison, very simple. They are everything that a Semigroup is, plus an “empty” value of the type.
For Strings, this would be an empty string; for Lists it would be an empty List; for Logs… well, that’s slightly less clear. If you can define a concept of an empty value for your type though, then congratulations, your type is a Monoid. The types signatures for this sort of thing look like this:
C#
T Empty
Javascript with Flow
empty : T
Haskell, PureScript, Elm
empty :: t
You guessed it! There are laws that go along with Monoids, and this time there are two: the left and right identity. These state that combining the empty element with any value shouldn’t change the meaning of the value. That is, the empty element is an identity value for that type. Looking at the String we see that
“” ++ “some string”
is the same as
“some string”
This works on either the left or right side (hence the left/right identity):
“some string” ++ “”
is the same as
“some string”
This comes in handy when you want to append values but you don’t want to distinguish between having zero and having one already. Consider the case of the remote logging service:
Remote Logging Server | Worker Server | |
---|---|---|
Log | <-worker sends Log immediately– | Log |
Log ++ Log |
Where did that Log that the Remote Logging Service starts with come from? If Log is a Semigroup then we’d need to generate it somehow. However, if Log is a Monoid, we have our answer: the initial Log value on the Remote Logging Server is the “empty” value of Log. Since combining the empty value with a Log doesn’t change the Log, this is safe to assume, and now we can deal exclusively in terms of Log ++ Log operations. Less special cases means happier developers™.
There are lots more fun use cases for Monoids. One example is taking a Monoidal type in a list and running it through a fold/reduce/Aggregate operation, where the fold function is appended and the initial value is empty. You could write a specialized fold that only works for Lists of Monoids but can aggregate them with no other information needed.
Go forth and aggregate!
For something so simple as having an append operation, an empty value, and the associative property, I hope you’ll agree there’s a lot of depth to Semigroups and Monoids. This is just the tip of the categorical iceberg though. Due to its abstract nature, category theory can describe a variety of structures, and best yet, be supported by lots of formal reasoning (aka proofs).
This formal reasoning can give us a lot of confidence that what we’re doing really will work as long as we’re implementing our types in accordance with the laws of the structure. If you’re looking for next steps, Functor is a mild step up in complexity, but it’s equally as rich in terms of applications and reach.