About
I’m a first-year applied category theory PhD student in the Mathematically Structured Programming Group at the University of Strathclyde, Glasgow, United Kingdom. My supervisor is Neil Ghani.
I’m interested in studying nature’s machinery using the most universal language possible - which seems to be the language of category theory. My main interests lie in understanding the theoretical and practical considerations of computation performed by systems we classify as intelligent. I’m a strong advocate of a compositional, rigorous, category-theoretic view of the world.
I wrote my master's thesis on a related topic - the categorification of cycle-consistent generative adversarial networks (CycleGAN) - while studying at the Faculty of Electrical Engineering and Computing (FER) in Zagreb, Croatia.
Research philosophy and goals
I have a long-term research philosophy which I try to keep updated as I learn more about the world. I am also learning how to best express this philosophy. As such, these are still rough notes that I keep coming back to in order to reflect on and improve them. Please do challenge anything that doesn’t come across as clear and compelling. I’m also trying to scale up these endeavours, so if your goals overlap, feel free to get in touch!
One of my main long-term goals is to understand the fundamental principles of intelligence, in the same way we understand the laws of physics. Understanding the principles behind intelligent behaviour largely entails understanding what notions such as life and autopoiesis are, and vice versa: understanding the essence of life and autopoiesis largely entails understanding intelligent behaviour.
None of these terms - intelligence, life, or autopoiesis - is actually well defined. Wikipedia is a good starting point for getting acquainted with the lack of consensus on the definitions of intelligence and life. This means that merely formulating a sensible definition of these terms would be a milestone in itself. As this is one of the most profound questions we can ask ourselves, I do not expect to complete this ambitious goal, only to make progress towards it. What is more, for reasons yet unknown to me, the question itself might turn out to be nonsensical or ill-phrased.
Nevertheless, there exist plenty of subtle clues that tell us what the general shape of the definition, if it exists in any sensible form, might look like. I will describe these clues below. I emphasize that the following is my view of the matter at hand and not necessarily something there exists a general consensus on.
There are five main clues I have identified so far.
- Independence of implementation. Just as a wave is a process unfolding in some underlying substrate, so intelligence seems to be a process, rather than a substance. We can analyze properties of a wave (its wavelength, speed, or frequency) without knowing the underlying substrate it is travelling through. Likewise, we can analyze properties of intelligent behavior without knowing whether it is implemented in biological tissue or a silicon chip. Intelligence seems to be a specific sort of computation that is substrate-agnostic. This alone is an astonishing idea, because it allows us to study intelligence in its own right, separate from its implementation:
“If it is whiteness we want to think about, we must somehow separate it from white horse, white house, white hose, and all the other white things that it invariably must come along with, in order for us to experience it at all.”
(Barry Mazur, When is one thing equal to some other thing?)
This can be placed in the context of the original goal of studying the fundamental principles of intelligence. It means that this goal should encompass an understanding not only of the animal brain, but also of any sort of system (mechanical, human, hybrid, a computer simulation - regardless of its underlying implementation) whose behavior we classify as intelligent. Observe that the question of whether something actually is intelligent is set aside here.
- Generalization. Intelligence encompasses pattern recognition, natural language processing, knowledge representation, metacognition, planning, perception, analogical reasoning, creativity, and many other aspects listed here. In order to describe a system whose behaviors exhibit properties of intelligence, the description itself can't be specific to creativity, pattern recognition, planning, or any of the other mentioned properties. We need to work at a higher level of abstraction whose special cases are those properties.
- Internal model. All intelligent systems seem to have one thing in common: possession of an internal model of their environment, whose accuracy in modelling that environment seems to be correlated with the degree of intelligence of the corresponding system. Intelligent systems tend to organize and layer various hierarchical representations of the world in order to be more efficient at internally simulating it, and thus at predicting its future states. Being tightly related to the notion of autopoiesis, these systems show a tendency not only to maintain themselves, but also to maintain their environment in a state compatible with their existence. One could consider these systems to be good regulators which strive to become isomorphic to the environment - whatever isomorphic might mean in this context.
- No free lunch. In the previous paragraphs I kept mentioning the word "environment" when talking about intelligence. Perhaps the question "Does the behavior of X exhibit properties of intelligence?" is incomplete. Perhaps it only makes sense to study intelligent properties of a system with respect to a given environment. A system we deem intelligent, alive, or autopoietic in one environment might not be considered as such in a completely different one. For instance, consider a giraffe in its natural habitat. Although we can consider a giraffe to be a stable living system in a savanna, a giraffe in an arctic climate is not an autopoietic system. Another example is you, a human being. You consider yourself to be intelligent, but a human being can only survive in a tiny range of all possible environments. Placed at a randomly chosen point in the universe, you most surely wouldn't last very long. So you too are intelligent only with respect to your environment. A natural question to ask, then, is whether there is a system that would be classified as intelligent with respect to any environment. Although I have trouble imagining what a random sample from the space of all environments would look like, there is an interesting insight from optimization we can draw on. The no free lunch theorem states that, when averaged over all possible optimization problems, any two optimization algorithms perform equally well. It is a very interesting result that is not easy to wrap your head around. Does it give us any insight into what it means to be intelligent with respect to all environments? One could conclude that all systems are equally intelligent when averaged over all environments! In other words, a rock and a person are equally intelligent, when averaged over all environments. Even though that might be hard to internalize, there is a good "rebuttal" to the argument.
One might argue that we are not interested in averaging over all possible environments, but rather over a subset of environments we care about. This is exactly the argument for why optimization algorithms work at all: we are not interested in their general performance on arbitrary data, but rather on a specific type of data.
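For concreteness, the no free lunch theorem (due to Wolpert and Macready) can be stated as follows: for any two search algorithms $a_1$ and $a_2$, any number of function evaluations $m$, and the sequence of sampled cost values $d_m^y$,

$$\sum_f P(d_m^y \mid f, m, a_1) \;=\; \sum_f P(d_m^y \mid f, m, a_2),$$

where the sum ranges over all possible objective functions $f$. In other words, once we average over every possible problem, the distribution of observed performance is identical for all algorithms - no algorithm outperforms any other.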
- Deep learning. Another hint about how to approach reasoning about intelligence comes from deep learning. An increasing number of components of a modern deep learning system can themselves be learned. For example, Generative Adversarial Networks learn the cost function. The paper Learning to Learn by gradient descent by gradient descent specifies networks that learn the optimization procedure, and the paper Decoupled Neural Interfaces using Synthetic Gradients shows how gradients themselves can be learned. These are just rough examples, but they give a sense of things to come. As more and more components of these systems stop being fixed throughout training, there is an increasing need for a precise formal specification of the things that do stay fixed. This is not an easy task; the invariants across all these networks seem to be rather abstract and hard to describe.
My research explores the hypothesis that the language of category theory could be well suited to describe intelligent systems in a precise manner. By doing so, I hope to describe and quantify high-level intelligent behavior.
Category theory is becoming a central hub for all of mathematics - but it is also slowly finding applications in chemistry, game theory, neuroscience, causality, and database theory, to name a few. It is unmatched in organizing and layering abstractions across seemingly disparate disciplines.
Category theory, however, is much more than that. I believe we have just begun scratching the surface of general reasoning using category theory, as it can be extremely potent in guiding, structuring, and compressing thought. Category theory and intelligence seem to be deeply linked - both are guided by the goal of organizing and layering abstractions. Automation of exactly that - organizing and layering abstractions - seems to be the quest of machine learning.
The goal of recent efforts in AI is to discover a way to find patterns and concepts in data, and to organize those concepts in a compositional, hierarchical structure such that new concepts can be added and integrated into the body of knowledge the agent already possesses. Surprisingly, category theory seems to be doing exactly that: it is a structured way of adding concepts and information to our network of interconnected concepts.
With the above in mind, I make my goals more precise. I’m interested in understanding the theoretical and practical considerations of computation performed by systems we classify as intelligent, with a focus on those that utilize gradient information during learning. My interests are both at a low level (automatic differentiation, composition of differentiable maps) and at a high level of abstraction (categorically modeling network architectures, training methods, as well as concepts such as ‘generalization’ and ‘analogy’).
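To give a feel for the low-level end of this - composition of differentiable maps - here is a toy sketch of my own (not a fragment of any formalism from the text): forward-mode automatic differentiation with dual numbers, where the chain rule is carried entirely by function composition.

```python
# Toy forward-mode automatic differentiation via dual numbers.
# A dual number a + b*eps (with eps^2 = 0) carries a value and a derivative;
# pushing one through a composite of differentiable maps applies the chain
# rule automatically, purely by composition.

class Dual:
    def __init__(self, value, deriv):
        self.value = value  # the point at which we evaluate
        self.deriv = deriv  # the derivative carried alongside

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # The product rule falls out of eps^2 = 0
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

def derivative(f, x):
    """Differentiate f at x by seeding a dual number with derivative 1."""
    return f(Dual(x, 1.0)).deriv

# Two differentiable maps and their composite
f = lambda x: x * x        # f(x) = x^2,  f'(x) = 2x
g = lambda x: x * x * x    # g(x) = x^3,  g'(x) = 3x^2
h = lambda x: g(f(x))      # h = g . f,   h(x) = x^6, h'(x) = 6x^5

assert derivative(f, 3.0) == 6.0
assert derivative(h, 2.0) == 6 * 2.0 ** 5  # chain rule, handled by composition
```

The point of the sketch is that nothing in `derivative` knows about `h` being a composite: the chain rule is not a special case but simply what composition of these maps does - which is exactly the kind of structure category theory is built to describe.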
This is, of course, just the tip of the iceberg; understanding and implementing this type of computation implies an understanding of plenty of other things that are complex enough by themselves already. This is why I'm also a strong advocate of purely functional programming languages and type-driven development. I believe we should outsource part of our cognitive load to the compiler and use it as a guide while writing programs.
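As a small illustration of what outsourcing cognitive load to a checker can mean (a sketch in Python with type hints - the idea is of course far stronger in a purely functional language such as Haskell or Idris; the function names here are my own invention):

```python
# Type-driven development in miniature: the signature of `compose` alone
# guarantees that two maps can only be plugged together in a valid order.
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """g after f: the output type of f must match the input type of g."""
    return lambda x: g(f(x))

def parse(s: str) -> int:
    return int(s)

def square(n: int) -> int:
    return n * n

parse_then_square = compose(square, parse)  # str -> int: well typed
assert parse_then_square("7") == 49

# compose(parse, square) is rejected by a type checker such as mypy:
# `square` returns int but `parse` expects str. The mistake is caught
# before the program ever runs - the checker did part of the thinking.
```

The checker is doing exactly the kind of bookkeeping I would rather not carry in my head: it tracks which compositions make sense, so I can think about what the composite should do.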
One important thing to note: I'm not trying to impose some special "category-theoretic way" of modeling these systems. What I'm doing is trying to find the most natural, leaky-abstraction-free, and composable way there is to understand these systems. It turns out that I end up doing category theory.