Saturday, March 28, 2026

Topological Spaces

Blog 1 of a series · Topology & the Future of ML Representation

From Points to Waves

Why the Next Generation of AI Representations Should Think in Signals
"A vector tells you where something is. A signal tells you what something does."

A Personal Starting Point

Over the past year, I have been working on problems that sit at the intersection of signals and cognition — including sleep-stage classification, cognitive performance modelling, and participation in the EEG 2025 NeurIPS challenge. These experiences exposed me to biological signals not just as data, but as structured, dynamic processes: oscillations, rhythms, synchrony, and noise interacting over time.

This naturally led me to think about representation from a more neuroscientific perspective. In the brain, information is not encoded as static points — it is carried through patterns of activity, often oscillatory, where timing and phase relationships play a crucial role. That raised a simple but persistent question:

Why do we represent meaning in machine learning as a static point in space?

Language is not static. Images are not static. The world that machine learning models are trying to understand is fundamentally dynamic — things unfold, interact, interfere with each other, and change meaning depending on context and timing. And yet the dominant paradigm in representation learning is to compress all of that into a fixed vector: a point in $\mathbb{R}^d$ that just sits there.

I started thinking about what it would look like to represent multimodal data — text, audio, vision — not as vectors but as signals: functions over time or frequency, things that have phase as well as amplitude, things that can constructively amplify each other or destructively cancel. Then I discovered that a whole family of models — State Space Models, including S4 [1] and Mamba [2] — had already begun moving in this direction, using signal-processing machinery from control theory as the foundation of sequence modelling.

This blog series is an attempt to explore an alternative viewpoint: what if representations were not points, but signals?

A note on process: I am writing this series to document my learning from first principles and to make that journey useful to others exploring similar questions. I have used LLMs for ideation, restructuring, and rephrasing in places, but the core ideas, technical direction, and learning are my own. The mathematical statements describing each space were likewise drafted with LLM assistance.

Part I — The Hierarchy of Mathematical Spaces

Before we can argue that one kind of space is better for representation, we need to understand what a "space" actually is in mathematics, and what properties different spaces add on top of each other. Think of this as a ladder — each rung adds structure. (Reference YouTube video)

1. Topological Space — The Most General Setting

A topological space is the most general notion of a geometric space [3]. You start with a set $X$ and a collection of subsets called open sets, satisfying three axioms: the empty set and $X$ itself are open; arbitrary unions of open sets are open; finite intersections of open sets are open.

That is all. There is no notion of distance, no notion of angle, no notion of size. But you gain the concept of continuity — a map between two topological spaces is continuous if the preimage of every open set is open.

This matters for machine learning because continuity is the prerequisite for anything meaningful to happen. If your embedding function is not continuous, nearby inputs can map to wildly different representations, and generalisation becomes impossible.

2. Metric Space — Adding Distance

A metric space $(X, d)$ is a topological space equipped with a distance function $d : X \times X \to \mathbb{R}_{\geq 0}$ satisfying non-negativity, identity of indiscernibles ($d(x,y) = 0 \iff x = y$), symmetry, and the triangle inequality $d(x,z) \leq d(x,y) + d(y,z)$ [4].

Metric spaces are where most practising ML researchers implicitly live. Euclidean distance, edit distance, and the angular distance behind cosine similarity are all metrics (the common "cosine distance" $1 - \cos\theta$ is not quite one, since it can violate the triangle inequality). But metric spaces say nothing about directions, addition, or scaling. A numerical spot-check of the axioms is sketched below.
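Here is that spot-check as a minimal sketch (assuming only NumPy; the dimension, number of trials, and random points are arbitrary choices of mine):

```python
# Minimal sketch: numerically spot-check the metric axioms for Euclidean
# distance on random triples of points in R^d (illustrative, not a proof).
import numpy as np

rng = np.random.default_rng(0)

def euclidean(x, y):
    return np.linalg.norm(x - y)

d, trials = 16, 10_000
ok = True
for _ in range(trials):
    x, y, z = rng.normal(size=(3, d))
    ok &= euclidean(x, y) >= 0                          # non-negativity
    ok &= np.isclose(euclidean(x, x), 0.0)              # d(x, x) = 0
    ok &= np.isclose(euclidean(x, y), euclidean(y, x))  # symmetry
    ok &= euclidean(x, z) <= euclidean(x, y) + euclidean(y, z) + 1e-12  # triangle inequality
print("All sampled metric axioms hold:", bool(ok))
```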

3. Normed Vector Space — Adding Size and Algebra

A normed vector space $(V, \|\cdot\|)$ is a vector space over $\mathbb{R}$ (or $\mathbb{C}$) equipped with a norm satisfying positive definiteness, absolute homogeneity ($\|\alpha v\| = |\alpha| \|v\|$), and the triangle inequality [4]. This is where the $L^p$ spaces live; the $L^2$ norm, for instance, is:

$$\|f\|_2 = \left(\int_\Omega |f(x)|^2 \, dx\right)^{1/2}$$
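To make the formula concrete, here is a minimal sketch (assuming NumPy; the grid size and test function are arbitrary choices of mine) that approximates the $L^2$ norm of $f(x) = \sin(2\pi x)$ on $[0, 1]$ by a Riemann sum; the exact value is $1/\sqrt{2}$:

```python
# Minimal sketch: approximate ||f||_2 for f(x) = sin(2*pi*x) on [0, 1] by a
# Riemann sum. The exact value is 1/sqrt(2) ≈ 0.7071.
import numpy as np

x = np.linspace(0.0, 1.0, 100_000, endpoint=False)
f = np.sin(2 * np.pi * x)
dx = x[1] - x[0]

l2_norm = np.sqrt(np.sum(np.abs(f) ** 2) * dx)
print(l2_norm)   # ≈ 0.7071
```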

4. Inner Product Space — Adding Geometry and Angles

An inner product space adds a map $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{F}$ that is linear in its first argument, conjugate-symmetric, and positive definite (over $\mathbb{R}$ this is a symmetric, positive-definite bilinear form). The inner product adds angles:

$$\cos\theta = \frac{\langle u, v \rangle}{\|u\| \|v\|}$$

This is the space that current LLMs inhabit. The dot-product attention scores are $QK^\top / \sqrt{d_k}$ [5] — exactly a matrix of inner products between queries and keys. The geometry of meaning, in today's models, is entirely encoded in the angles and magnitudes of vectors in an inner product space.
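A minimal NumPy sketch of scaled dot-product attention (following the formulation in [5]; the shapes and random inputs are illustrative) makes the point explicit: the score matrix is nothing but a matrix of inner products between queries and keys.

```python
# Minimal sketch: scaled dot-product attention. The scores Q K^T / sqrt(d_k)
# form a matrix of inner products <q_i, k_j>; softmax turns each row into
# weights and the output mixes the value vectors accordingly.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_q, n_k) inner products
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
n, d = 5, 8
Q, K, V = rng.normal(size=(3, n, d))
print(attention(Q, K, V).shape)   # (5, 8)
```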

5. Banach Space — Completeness

A Banach space is a normed vector space that is complete: every Cauchy sequence converges to a limit that is still inside the space [4]. Completeness is what lets you take limits of series expansions or iterative approximations and know the result remains in the space. All finite-dimensional normed vector spaces are automatically Banach spaces.

6. Hilbert Space — The Meeting Point of Algebra and Analysis

A Hilbert space $\mathcal{H}$ is a complete inner product space [3, 4]. The canonical example is $L^2(\mathbb{R})$: the space of square-integrable functions on the real line, with:

$$\langle f, g \rangle = \int_{-\infty}^\infty f(x)\overline{g(x)} \, dx$$

On a bounded interval (that is, in $L^2([0,1])$), the complex exponentials $\{e^{2\pi i n x}\}_{n \in \mathbb{Z}}$ form an orthonormal basis — the Fourier basis. On $L^2(\mathbb{R})$, the Fourier transform is a unitary operator: a rotation in Hilbert space. Moving from the time domain to the frequency domain is not a loss of information — it is a change of basis in an infinite-dimensional inner product space.
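A finite-dimensional analogue can be checked numerically. The sketch below (assuming NumPy; the signal length and random signals are arbitrary) verifies that the orthonormally scaled DFT preserves inner products, which is Parseval's theorem, the discrete counterpart of "a change of basis loses nothing":

```python
# Minimal sketch: the orthonormally scaled DFT preserves inner products
# (Parseval's theorem), a discrete analogue of the Fourier transform being a
# unitary change of basis on L^2.
import numpy as np

rng = np.random.default_rng(0)
n = 1024
f = rng.normal(size=n) + 1j * rng.normal(size=n)
g = rng.normal(size=n) + 1j * rng.normal(size=n)

F = np.fft.fft(f, norm="ortho")
G = np.fft.fft(g, norm="ortho")

# <f, g> = sum f * conj(g); np.vdot conjugates its first argument.
print(np.allclose(np.vdot(g, f), np.vdot(G, F)))   # True
```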

7. Other Spaces

Sobolev spaces extend function spaces by incorporating derivatives [6]. They appear in regularisation theory, in physics-informed neural networks, and in the theoretical analysis of approximation by neural networks with smooth activation functions [7].

Minkowski space [8] introduces an indefinite inner product — the geometric setting of special relativity and of the hyperboloid model of hyperbolic space. In machine learning, hyperbolic geometry appears in hierarchical embedding methods [9]: hyperbolic space can represent hierarchical structures exponentially more efficiently than Euclidean space.


Part II — What LLMs Are Actually Doing, and What They Are Missing

Embeddings as Static Vectors

In every major large language model — GPT-4 [10], Llama [11], Mistral — tokens are mapped to vectors in $\mathbb{R}^d$ for some large $d$ (commonly 1024 to 16384). These embeddings are static: the token "bank" always starts as the same point in $\mathbb{R}^d$, regardless of context. This is a rich geometric structure — but it is fundamentally amplitude-only. There is no phase, no timing information, no notion of whether two features are in sync or in opposition.
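The point about static lookup is easy to see in code. Below is a minimal sketch (the toy vocabulary, dimension, and helper `embed` are mine, purely for illustration): the token "bank" retrieves the same row whether its neighbours suggest finance or rivers, and disambiguation is deferred entirely to later layers.

```python
# Minimal sketch: a static (context-independent) embedding table. The same
# token id always maps to the same vector, regardless of context.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "bank": 1, "river": 2, "loan": 3}
E = rng.normal(size=(len(vocab), 8))          # toy embedding matrix, d = 8

def embed(tokens):
    return E[[vocab[t] for t in tokens]]

a = embed(["the", "bank", "loan"])            # financial context
b = embed(["the", "river", "bank"])           # geographic context
print(np.allclose(a[1], b[2]))                # True: identical vector for "bank"
```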

The Magnitude Heuristic and Its Failure Modes

In real-valued networks, the common heuristic is: a feature is important if its activation is large. Magnitude proxies for salience. This creates several problems:

  1. Frequency vs importance ambiguity — A feature that fires frequently in training data accumulates large weights, making it hard to distinguish structural relevance from statistical prevalence.
  2. Superposition — Polysemantic neurons — single neurons that respond to multiple unrelated concepts — have been extensively documented [12].
  3. Limited interaction mechanisms — In real-valued spaces, you cannot have two representations that destructively interfere. The only way to suppress a feature is to add a neuron with the opposite sign.

Hardy Spaces and the Frequency Domain View

The Laplace transform maps a continuous-time signal $f(t)$ to a function of a complex variable $s = \sigma + i\omega$:

$$\mathcal{L}\{f\}(s) = \int_0^\infty f(t) e^{-st} \, dt$$

The Z-transform does the same for discrete sequences:

$$\mathcal{Z}\{x\}(z) = \sum_{n=0}^\infty x[n] z^{-n}$$

The spaces of functions these transforms naturally live in are Hardy spaces $H^2$ [13] — Hilbert spaces of holomorphic functions with $L^2$ boundary behaviour. In these spaces a "signal" carries amplitude (the modulus of its complex values) and phase (their argument), and the inner product measures overlap, synchrony, and coherence between signals.
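As a small illustration (assuming NumPy; the example sequence and sample grid are mine), evaluating the Z-transform of a finite causal sequence on the unit circle $z = e^{i\omega}$ gives complex values whose modulus and argument are exactly the amplitude and phase content of the signal:

```python
# Minimal sketch: evaluate the Z-transform of a short causal sequence on the
# unit circle z = exp(i*omega). Each complex value carries amplitude (modulus)
# and phase (argument).
import numpy as np

x = np.array([1.0, 0.5, 0.25, 0.125])            # x[n] = (1/2)^n, n = 0..3
n = np.arange(len(x))
omega = 2 * np.pi * np.arange(64) / 64            # sample points on the unit circle

X = np.array([np.sum(x * np.exp(1j * w) ** (-n)) for w in omega])  # sum_n x[n] z^{-n}

amplitude = np.abs(X)
phase = np.angle(X)
print(amplitude[:3])
print(phase[:3])
```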


Part III — Signals as Richer Representations

Complex Activations and Phase

In a complex-valued neural network (CVNN) [14], activations are elements of $\mathbb{C}$. A single complex activation $z = re^{i\theta}$ encodes two quantities: the magnitude $r = |z|$ (analogous to standard activation strength) and the phase $\theta = \arg(z)$ — a second, independent channel of information encoding how a feature relates to the system's current state.
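A minimal sketch (NumPy only, with an arbitrary random layer) of the two channels carried by each complex activation:

```python
# Minimal sketch: a complex activation z = r * exp(i*theta) carries two
# independent quantities, magnitude r = |z| and phase theta = arg(z), and can
# be reconstructed from them without loss.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=16) + 1j * rng.normal(size=16)   # one layer of complex activations

r = np.abs(z)          # activation strength
theta = np.angle(z)    # relative "timing" with respect to the rest of the system

print(np.allclose(z, r * np.exp(1j * theta)))        # True: (r, theta) is lossless
```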

Interference: The Mechanism That Real-Valued Networks Lack

When two complex-valued signals are summed:

$$z_1 + z_2 = r_1 e^{i\theta_1} + r_2 e^{i\theta_2}$$

the result depends critically on the phase relationship $\theta_1 - \theta_2$. This gives a principled mechanism for:

  1. Noise suppression — Incoherent noise has an approximately uniform phase distribution, so when many noisy signals are summed their phases cancel in expectation — exactly how phased-array radar works [15]. Real-valued networks have no analogous mechanism (see the numerical sketch after this list).
  2. Feature synchronisation — Semantically related features can be "phase-locked" — assigned similar phases so they constructively amplify each other. This mirrors the binding hypothesis in neuroscience [16].
  3. Geometric stability — Phase-aware models encode relationships as rotations in the complex plane. Rotations are isometries — they preserve distances — which tends to improve generalisation.
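Here is that sketch (assuming only NumPy; magnitudes and sample sizes are arbitrary): two unit-magnitude signals either reinforce or cancel depending on their phase difference, and the mean of many uniformly random phases shrinks towards zero.

```python
# Minimal sketch: phase controls constructive vs destructive interference, and
# averaging many random-phase terms suppresses them (magnitude ~ 1/sqrt(N)).
import numpy as np

rng = np.random.default_rng(0)

z1 = np.exp(1j * 0.0)
print(abs(z1 + np.exp(1j * 0.0)))     # 2.0  -> in phase: constructive
print(abs(z1 + np.exp(1j * np.pi)))   # ~0.0 -> anti-phase: destructive

for N in (10, 1_000, 100_000):
    noise = np.exp(1j * rng.uniform(0, 2 * np.pi, size=N))
    print(N, abs(noise.mean()))       # shrinks roughly like 1/sqrt(N)
```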

Why This Matters for Multimodal Representations

The standard approach to multimodal representation is to train modality-specific encoders and project them into the same $\mathbb{R}^d$ — forcing fundamentally different signal types into the same geometric box. But audio is intrinsically a signal: a pressure wave with frequency, amplitude, and phase. A Hilbert space representation would allow audio, image (via 2D Fourier structure), and text (via sequence dynamics) to be represented as functions — elements of an $L^2$ space — and their cross-modal relationships encoded as inner products and phase relationships.
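As a purely illustrative sketch (this is not a multimodal model; the two synthetic "modalities", the grid, and the coherence score are my own stand-ins), two discretised functions sharing a 5 Hz component are compared with a normalised $L^2$ inner product:

```python
# Minimal sketch (illustrative only): represent two synthetic "modalities" as
# discretised functions in L^2([0, 1]) and compare them with a normalised
# L^2 inner product, a cosine-style coherence score.
import numpy as np

t = np.linspace(0.0, 1.0, 4096, endpoint=False)
dt = t[1] - t[0]

audio_like = np.sin(2 * np.pi * 5 * t)                            # shared 5 Hz structure
text_like = np.sin(2 * np.pi * 5 * t + 0.3) + 0.1 * np.sin(2 * np.pi * 40 * t)

def l2_inner(f, g):
    return np.sum(f * np.conj(g)) * dt        # <f, g> on [0, 1]

coherence = l2_inner(audio_like, text_like) / np.sqrt(
    l2_inner(audio_like, audio_like) * l2_inner(text_like, text_like)
)
print(abs(coherence))   # close to 1: the shared 5 Hz structure dominates
```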


Part IV — State Space Models: The Existing Bridge

When I was developing these ideas, I came across a body of work that had already built something close to what I was imagining: State Space Models (SSMs).

The S4 model [1] parameterises sequence processing using a continuous-time state space:

$$\dot{x}(t) = Ax(t) + Bu(t), \quad y(t) = Cx(t) + Du(t)$$

where $A$ is a structured HiPPO matrix [17] designed to optimally memorise history. In the frequency domain, the system's transfer function $H(s) = C(sI - A)^{-1}B + D$ is a rational function of $s$ — exactly the kind of object that lives naturally in a Hardy space.

Mamba [2] extends this with selective state spaces — input-dependent dynamics that allow the model to choose, at each step, what to remember and what to forget. The computational advantage is stark: standard Transformers scale as $O(n^2)$ in sequence length because of the attention matrix, while SSMs run in $O(n \log n)$ as an FFT-based global convolution (S4) or in $O(n)$ as a recurrent scan (Mamba). Signal representations come with efficient algorithms as a structural gift; a toy linear-time scan is sketched below.
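The sketch uses a toy diagonal state matrix discretised with a zero-order hold; the diagonal parameterisation, the omitted feedthrough term $D$, and all constants are my simplifications, not S4's HiPPO matrix or Mamba's selective mechanism.

```python
# Minimal sketch (toy diagonal SSM): discretise dx/dt = A x + B u, y = C x
# (feedthrough D omitted) with a zero-order hold and run the recurrence as a
# linear-time scan over the sequence.
import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len, dt = 16, 1000, 0.01

A = -np.abs(rng.normal(size=d_state))    # stable diagonal dynamics (entries < 0)
B = rng.normal(size=d_state)
C = rng.normal(size=d_state)

# Zero-order-hold discretisation, elementwise because A is diagonal:
A_bar = np.exp(A * dt)
B_bar = (A_bar - 1.0) / A * B

x = np.zeros(d_state)
u = rng.normal(size=seq_len)             # input sequence
y = np.empty(seq_len)
for t in range(seq_len):                 # O(seq_len) recurrent scan
    x = A_bar * x + B_bar * u[t]
    y[t] = C @ x

print(y.shape)   # (1000,)
```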


Part V — A Taxonomy of Spaces for Representation

| Space | Key Structure | Natural ML Application |
|---|---|---|
| Metric space | Distance only | k-NN, clustering, contrastive learning |
| Normed vector space | Distance + linear algebra | $L^p$ regularisation, weight decay |
| Inner product space | Angles + projections | Dot-product attention, cosine similarity |
| Hilbert space (finite-dim) | Complete inner product | Standard neural network layers |
| Hilbert space (infinite-dim, $L^2$) | Function-valued representations | SSMs, functional neural processes |
| Hardy space $H^2$ | Holomorphic + $L^2$ boundary | Laplace/Z-transform signal representations |
| Sobolev space $W^{k,p}$ | Function + derivative regularity | Physics-informed NNs, smoothness regularisation |
| Hyperbolic / Minkowski | Negative curvature | Hierarchical embedding, knowledge graphs |
| Riemannian manifold | Local Euclidean + curvature | Geometric deep learning, manifold learning |

Current LLMs sit solidly in the finite-dimensional inner product space row. State space models begin to occupy the $L^2$ and Hardy space rows. The full realisation of signal-based multimodal representation would require working fluently across several of these spaces simultaneously.


Conclusion — Rethinking Representation from the Ground Up

The dominant paradigm treats representation as placement: a token is a point, a meaning is a location, similarity is proximity. This is a powerful and productive view, and it has driven remarkable progress. But it is, ultimately, a static view.

Signals offer a dynamic alternative. A representation that carries phase as well as amplitude can encode how a feature relates to the system's current state, not just that it is present. Representations as functions in a Hilbert space can encode temporal and spectral structure natively, without discarding it at the tokenisation stage. Interference gives a principled mechanism for noise suppression and feature binding that has no real-valued analogue.

State space models are an existence proof that this direction is practically viable. But I think the full potential — particularly for multimodal systems where audio, vision, and language need to be represented in a common framework that respects the intrinsic nature of each modality — remains largely unexplored.

Coming up — Blog 2

The Manifold Hypothesis, linearity in high-dimensional spaces, and Vector Symbolic Architectures as a framework for compositional reasoning inside geometric representations.


References

[1] Gu, A., Goel, K., & Ré, C. (2022). Efficiently modeling long sequences with structured state spaces. ICLR 2022.

[2] Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv:2312.00752.

[3] Reed, M., & Simon, B. (1980). Methods of Modern Mathematical Physics, Vol. 1: Functional Analysis. Academic Press.

[4] Kreyszig, E. (1978). Introductory Functional Analysis with Applications. Wiley.

[5] Vaswani, A., et al. (2017). Attention is all you need. NeurIPS 2017.

[6] Evans, L. C. (2010). Partial Differential Equations (2nd ed.). American Mathematical Society.

[7] Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Information Theory, 39(3), 930–945.

[8] Minkowski, H. (1908). Raum und Zeit. Physikalische Zeitschrift, 10, 75–88.

[9] Nickel, M., & Kiela, D. (2017). Poincaré embeddings for learning hierarchical representations. NeurIPS 2017.

[10] OpenAI. (2023). GPT-4 Technical Report. arXiv preprint.

[11] Touvron, H., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint.

[12] Elhage, N., et al. (2022). Toy models of superposition. Transformer Circuits Thread.

[13] Garnett, J. B. (2007). Bounded Analytic Functions. Springer.

[14] Trabelsi, C., et al. (2018). Deep complex networks. ICLR 2018.

[15] Van Trees, H. L. (2002). Optimum Array Processing. Wiley-Interscience.

[16] Singer, W. (1999). Neuronal synchrony: a versatile code for the definition of relations? Neuron, 24(1), 49–65.

[17] Gu, A., et al. (2020). HiPPO: Recurrent memory with optimal polynomial projections. NeurIPS 2020.

representation-learning hilbert-spaces state-space-models signal-processing machine-learning mathematical-foundations embeddings
