Neural Networks: A Mathematical View

The Universal Problem of Function Approximation

Neural networks are often viewed as mysterious “black boxes” — systems that produce results without clear insight into their inner workings. At their core, however, neural networks are powerful tools for approximating functions. Function approximation has long been a central challenge in mathematics, physics, and engineering: whether we are predicting physical systems, modeling complex data, or transforming signals, we continually strive to approximate functions efficiently and accurately.

Historically, methods like Fourier series and Taylor series have provided elegant ways to break down complex functions into simpler components. Fast forward to the age of machine learning, and neural networks offer a modern, algorithmic approach to the same challenge. What many don’t realize is that neural networks share a deep connection with these classical techniques of function decomposition.

In this blog, we’ll explore the mathematical origins of function approximation, from the spectral theorem and truncated series to the principles that naturally lead us to understanding how neural networks approximate functions.


1. The Spectral Theorem: Decomposing Functions into Simpler Parts

The spectral theorem is a powerful tool in mathematics, particularly in linear algebra and functional analysis. It tells us that certain classes of functions or operators can be decomposed into simpler, orthogonal components. These components are often associated with eigenfunctions and eigenvalues, which give us insight into the behavior of the function.

f(x) = Σₙ cₙ φₙ(x)

A function f expressed as a summation of orthogonal functions φₙ, where each coefficient cₙ = ⟨f, φₙ⟩ is the projection of f onto the corresponding basis function.

In simpler terms, the spectral theorem says that a function can be expressed as a sum of orthogonal functions, much like how we can decompose a sound wave into its constituent frequencies using a Fourier transform. These orthogonal functions form a basis, and by taking an infinite sum of them, we can represent any function within that space.
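To make this concrete, here is a minimal NumPy sketch (not from the original post) that projects a function onto an orthogonal sine basis on [0, π] and rebuilds it from those components. The choice of basis, target function, and number of terms are all illustrative assumptions.

```python
import numpy as np

# Orthonormal sine basis on [0, pi]: phi_n(x) = sqrt(2/pi) * sin(n x),
# orthonormal under the inner product <f, g> = integral of f(x) g(x) dx.
x = np.linspace(0, np.pi, 10_000)
dx = x[1] - x[0]

def phi(n):
    return np.sqrt(2 / np.pi) * np.sin(n * x)

# Illustrative target function (vanishes at both endpoints).
f = x * (np.pi - x)

# Spectral coefficients c_n = <f, phi_n>, approximated by a Riemann sum.
coeffs = [(f * phi(n)).sum() * dx for n in range(1, 10)]

# Rebuild f from its orthogonal components.
f_rebuilt = sum(c * phi(n) for n, c in enumerate(coeffs, start=1))

max_err = float(np.max(np.abs(f - f_rebuilt)))
print(f"max reconstruction error with 9 terms: {max_err:.4f}")
```

Because the coefficients of this particular function decay rapidly, even nine terms reconstruct it almost exactly — a preview of the truncation idea in the next section.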

But what happens if we don’t use an infinite number of terms?


2. Truncated Series: Approximating Functions with Finite Sums

Classical function decomposition methods, such as Fourier series or Taylor series, allow us to express a function as an infinite sum of simpler functions (e.g., sines, cosines, or polynomials). However, in practice, we often work with truncated sums — finite sums of the most important components — because infinite sums are impractical for real-world computations.

For instance, in a Fourier series, a function f(x) can be approximated by the first N terms:

f(x) ≈ a₀/2 + Σₙ₌₁ᴺ [aₙ cos(nx) + bₙ sin(nx)]

The larger N is, the better the approximation — but even a few terms often give a good fit. The idea of truncated series is foundational in many areas of applied mathematics.
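The effect of truncation is easy to see numerically. The sketch below (an illustrative example, not from the post) approximates a square wave with its first N odd Fourier harmonics — the square wave's series is (4/π) Σ sin(nx)/n over odd n — and shows the error shrinking as N grows.

```python
import numpy as np

# Square wave on [-pi, pi] and its truncated Fourier series.
x = np.linspace(-np.pi, np.pi, 2001)
square = np.sign(np.sin(x))

def truncated_fourier(N):
    # Sum of the first N odd-harmonic terms of the square-wave series.
    return (4 / np.pi) * sum(np.sin(n * x) / n for n in range(1, 2 * N, 2))

errs = []
for N in (1, 5, 25):
    err = float(np.mean(np.abs(square - truncated_fourier(N))))
    errs.append(err)
    print(f"N = {N:2d}  mean abs error = {err:.3f}")
```

Each additional term refines the approximation, though near the jumps the convergence is slow (the Gibbs phenomenon) — one practical limitation of fixed, predefined bases.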

But what if we could learn the best components to approximate a function instead of using predefined functions like sines and cosines? This is where neural networks come into play.


3. Neural Networks: Learning the Basis Functions

Neural networks are universal function approximators. According to the universal approximation theorem, a network with at least one hidden layer can approximate any continuous function on a compact domain to any desired level of accuracy, given enough neurons:

f(x) ≈ Σᵢ₌₁ᴺ αᵢ σ(wᵢ · x + bᵢ)

where σ is a fixed non-polynomial activation function (e.g., a sigmoid), and the weights wᵢ, biases bᵢ, and output coefficients αᵢ are chosen to fit f.

But how do they do this?

Think of neural networks as a learned, data-driven approach to function approximation. Instead of relying on predefined basis functions as in a Fourier or Taylor series, neural networks learn the components — the “basis functions” — directly from the data. Each hidden neuron contributes one such component: an activation function applied to a learned weighted combination of the inputs.

In a way, neural networks are performing a task similar to the truncated series concept — approximating a function with a finite set of learned components (neurons). As we add more neurons or layers, we increase the network’s ability to capture more complex aspects of the target function, just like adding more terms in a truncated Fourier series improves the approximation.
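The parallel can be made explicit in code. The following sketch (an illustrative example under assumed settings — 20 tanh neurons, plain full-batch gradient descent — not the post's own implementation) fits sin(x) with a one-hidden-layer network. The network's output is literally a finite sum of learned basis functions, Σᵢ aᵢ · tanh(wᵢx + bᵢ), playing the role the first N harmonics play in a truncated series.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

H = 20                                   # hidden neurons = learned "terms"
w = rng.normal(size=(1, H))              # input weights
b = rng.normal(size=(1, H))              # biases
a = rng.normal(size=(H, 1)) * 0.1        # output coefficients
lr = 0.1

for step in range(10_000):
    h = np.tanh(x @ w + b)               # the learned basis functions
    y_hat = h @ a                        # finite sum of those components
    err = y_hat - y
    # Backpropagation for the mean-squared-error loss.
    grad_a = h.T @ err / len(x)
    dh = (err @ a.T) * (1 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
    grad_w = x.T @ dh / len(x)
    grad_b = dh.mean(axis=0, keepdims=True)
    a -= lr * grad_a
    w -= lr * grad_w
    b -= lr * grad_b

mse = float(np.mean((np.tanh(x @ w + b) @ a - y) ** 2))
print(f"final MSE: {mse:.4f}")
```

Unlike the Fourier case, the components here are not fixed in advance: gradient descent shapes each tanh bump to wherever the target function needs it.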


4. Connecting the Dots: From Spectral Theorem to Neural Networks

Let’s put it all together. The spectral theorem tells us that functions can be decomposed into sums of orthogonal components, which can be infinite. In practice, we use truncated series to approximate functions with a finite number of terms, balancing complexity and accuracy. Now, with neural networks, we take this idea one step further by learning the best components directly from data, optimizing for accuracy and flexibility.

Neural networks can be viewed as an adaptive, algorithmic approach to approximating functions. Rather than relying on predefined mathematical functions, they use optimization techniques (like gradient descent) to discover the “best-fit” basis functions for a particular dataset or problem.


5. Why This Matters: Practical Applications of Neural Networks

Neural networks’ ability to approximate complex functions has made them indispensable in fields like:

  • Image and Signal Processing: Just as Fourier series decompose signals into frequency components, neural networks can learn to decompose and recognize patterns in images, sound, and video data.

  • Natural Language Processing: Instead of relying on predefined linguistic rules, neural networks learn the statistical structure of language from data, offering state-of-the-art performance in tasks like translation, summarization, and sentiment analysis.

  • Predictive Analytics: In business, finance, and healthcare, neural networks learn complex relationships in data to provide accurate predictions and recommendations.


6. Conclusion: Neural Networks as the Future of Function Approximation

From the classical spectral theorem to modern neural networks, the idea of breaking down complex functions into simpler components is a unifying theme in mathematics and machine learning. Neural networks represent the next evolution in function approximation — an adaptive, flexible, and powerful tool that can learn the best way to approximate any function from data.

In a way, neural networks offer a fresh perspective on old ideas, combining the mathematical rigor of function decomposition with the practical power of modern algorithms. As we continue to advance in AI, understanding the deep mathematical roots of neural networks gives us insight into their success and potential future applications.