# How does a lens make an image?

I’ve talked before on this blog about the propagation of electromagnetic waves, and that’s not stopping any time soon. This time, I’d like to go through from first principles and demonstrate how a lens forms an image in all the gory details. Beware, there be maths ahead…

## In the beginning there was Maxwell

Now, light is an electromagnetic wave, so to describe it in a source-free dielectric all we need are the following four equations: $\nabla \cdot \left(n^2\mathbf{E}\right) = 0$ $\nabla \cdot \mathbf{B} = 0$ $\nabla \times \mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}$ $\nabla \times \mathbf{B} = \mu_0\epsilon_0n^2\frac{\partial\mathbf{E}}{\partial t}$

These are the equations for the six vector components of the electric and magnetic fields, and they include dielectric materials through the $n^2$ factor (where $n$ is the refractive index). Light is a sinusoidal wave, so we can assume that the only time dependence of these fields is a factor $e^{-i\omega_0 t}$ (the sign is a convention; I’ll pick the one in which phase increases with propagation distance). Using this fact, one can show: $\left(\nabla^2 + n^2k_0^2\right)\mathbf{E} + \nabla\left(\frac{1}{n^2}\left(\mathbf{E}\cdot\nabla n^2 \right)\right) = 0$

where $k_0 = \omega_0/c$ is the vacuum wavenumber of the light. If we further assume that the refractive index varies slowly on the scale of a wavelength, the gradient term can be dropped and we arrive at the Helmholtz equation: $\left(\nabla^2 + k^2\right)\mathbf{E} = 0$

where the refractive index has been absorbed into the definition of the material-dependent wavenumber $k = nk_0$. It is possible to show that if a $\delta$-function source is placed at the origin, the solution of the Helmholtz equation (the Green’s function) is a spherical wave: $G(\mathbf{r}, t) \propto \frac{\exp(i(k|\mathbf{r}| - \omega_0t))}{|\mathbf{r}|}$

Furthermore, in this limit of slow refractive index variations, it can be shown that all the components of the fields vary in the same way. We therefore concentrate on a single component (of the electric field, say); this is scalar diffraction theory. Vector diffraction must be considered when calculating, for example, the field distribution of tightly-focussed laser pulses. The Green’s function simply tells us that the phase of the light wave advances with propagation distance at spatial frequency $k$, and evolves in time at temporal frequency $\omega_0$.
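As a quick sanity check on the Green’s function, here is a minimal NumPy sketch confirming that the spatial part of a spherical wave, $e^{\pm ikr}/r$, satisfies the Helmholtz equation away from the origin. It relies on the fact that for a spherically symmetric field the Laplacian reduces to $\frac{1}{r}\frac{d^2}{dr^2}(rG)$. The wavelength, radii and finite-difference step are arbitrary illustrative choices, and `helmholtz_residual` is a hypothetical helper name.

```python
import numpy as np

wavelength = 0.5e-6          # assumed illustrative value, metres
k = 2 * np.pi / wavelength   # wavenumber in the medium


def helmholtz_residual(sign, r, h):
    """Relative size of (laplacian + k^2) G for G = exp(sign * i k r) / r."""
    def G(rr):
        return np.exp(sign * 1j * k * rr) / rr

    def rG(rr):
        return rr * G(rr)

    # Radial Laplacian of a spherically symmetric field: (1/r) d^2(r G)/dr^2,
    # with the second derivative taken by central finite differences.
    laplacian = (rG(r + h) - 2 * rG(r) + rG(r - h)) / (h**2 * r)
    return np.abs(laplacian + k**2 * G(r)) / np.abs(k**2 * G(r))


r = np.linspace(1e-3, 2e-3, 5)   # radii well away from the source at r = 0
h = wavelength / 1000            # step much shorter than a wavelength

for sign in (+1, -1):
    # Both sign conventions solve the equation; the residual is limited
    # only by the finite-difference error.
    print(sign, helmholtz_residual(sign, r, h).max())
```

Both signs give a residual that is tiny compared with the individual terms, which is just the statement that either sign convention for the phase describes a valid spherical wave.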

The intensity of the light wave we consider is given by the magnitude of the Poynting vector: $|\mathbf{E} \times \mathbf{H}| \propto GG^* \propto \frac{1}{|\mathbf{r}|^2}$

which scales with distance as $r^{-2}$ as expected, and confirms that the Green’s function above represents the amplitude of an energy flux conserved over radial shells of area $4\pi r^2$.

## The setup

Now that we’ve (rather rapidly) travelled from Maxwell’s equations to an expression for the wave emitted by a single point, we need to use this to figure out how a lens builds up an image. I’ll refer to the following conceptual setup in the derivation: We have an object in plane $U_0$, with coordinates $(\xi, \eta)$. The light from this object travels a distance $z_1$ to the plane $U_l(x,y)$, just before the lens. The lens of focal length $f$ applies some magic, and transforms $U_l$ to $U_{l'}(x,y)$. Finally, this transformed wave travels a further distance $z_2$ to form an image at plane $U_i(u,v)$. Let’s get started.

## Paraxial facts

From here on we stipulate a further restriction: the light waves don’t stray too far from the optical axis, i.e. $z_1, z_2 \gg a$, where $a$ is the transverse size of the object/lens/image. This is the paraxial approximation, and it makes things much simpler. In the real world the paraxial approximation is never perfectly valid, and real lens designs have to correct for the aberrations that result. We’ll ignore this inconvenient fact here.

Suppose we propagate a light ray from a point $(\xi, \eta)$ to $(x,y)$ over a longitudinal distance $z$. The total distance the ray travels is: $r = \sqrt{(x - \xi)^2 + (y - \eta)^2 + z^2}$ Since the transverse offsets are much smaller than $z$, we can expand the square root to first order: $r \approx z + \frac{1}{2z}\left( (x - \xi)^2 + (y - \eta)^2\right)$
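To get a feel for how good this expansion is, the following minimal sketch compares the exact and paraxial path lengths for a few transverse offsets. The wavelength, distance and offsets are illustrative assumptions, not values from the setup above.

```python
import numpy as np


def exact_distance(dx, dy, z):
    """Exact path length for transverse offsets (dx, dy) over a distance z."""
    return np.sqrt(dx**2 + dy**2 + z**2)


def paraxial_distance(dx, dy, z):
    """First-order (paraxial) expansion of the same path length."""
    return z + (dx**2 + dy**2) / (2 * z)


wavelength = 0.5e-6   # assumed, metres
z = 0.3               # assumed propagation distance, metres

for offset in (1e-3, 5e-3, 2e-2):
    err = paraxial_distance(offset, 0.0, z) - exact_distance(offset, 0.0, z)
    # What matters for the phase is the path error measured in wavelengths.
    print(f"offset {offset * 1e3:5.1f} mm: path error = {err / wavelength:.2e} wavelengths")
```

The leading error term is $\rho^4/(8z^3)$ for a transverse offset $\rho$, so it grows with the fourth power of the offset. The approximation is safe as long as this stays well below a wavelength.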

Substituting this expression into the Green’s function, in the paraxial approximation we have: $G(x,y,z) \approx \frac{1}{z}\exp\left(\frac{ik}{2z}\left( (x - \xi)^2 + (y - \eta)^2\right)\right)$

We have omitted the phase factor $e^{ikz}$ from the Green’s function: it is constant across the wave field, and constant phase factors drop out when we calculate physical intensities.

## Getting to the lens

In the object plane at $(\xi, \eta)$ the wave amplitude is given by $U_0(\xi, \eta)$. From the Green’s function, we know that this point contributes to the point in the plane $U_l(x,y)$ a factor $U_0(\xi, \eta)G(x,y,z_1)$. Adding up all of the waves emitted by the object we arrive at our first expression for the propagated wave, in this case just before the lens: $U_l(x,y) \propto \frac{1}{z_1}\int\int e^{i\frac{k}{2z_1}\left((x - \xi)^2 + (y - \eta)^2\right)}U_0(\xi, \eta)d\xi d\eta$
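The integral above can be evaluated numerically by brute force. Below is a minimal one-dimensional sketch (one transverse coordinate only, constant prefactors dropped) that sums the paraxial Green’s function over a sampled object. As a check it propagates a Gaussian spot and compares its width with the standard Gaussian-beam result $w(z) = w_0\sqrt{1 + (\lambda z/\pi w_0^2)^2}$, which is a textbook formula rather than something derived in this post. The function name `fresnel_1d` and all parameter values are illustrative assumptions.

```python
import numpy as np


def fresnel_1d(u_in, x_in, x_out, z, wavelength):
    """Propagate a sampled 1D field a distance z by direct summation.

    Each input sample contributes the paraxial phase factor
    exp(i k (x_out - x_in)^2 / (2 z)); overall constants are dropped.
    """
    k = 2 * np.pi / wavelength
    d_in = x_in[1] - x_in[0]                      # uniform sample spacing
    sep = x_out[:, None] - x_in[None, :]          # every (output, input) offset
    kernel = np.exp(1j * k * sep**2 / (2 * z))
    return kernel @ u_in * d_in


wavelength = 0.5e-6   # assumed, metres
w0 = 100e-6           # assumed 1/e amplitude half-width of the object spot
z = 0.1               # assumed propagation distance, metres

xi = np.linspace(-4 * w0, 4 * w0, 801)   # object-plane samples
x = np.linspace(-8 * w0, 8 * w0, 801)    # output-plane samples
U0 = np.exp(-(xi / w0) ** 2)

intensity = np.abs(fresnel_1d(U0, xi, x, z, wavelength)) ** 2

# For intensity proportional to exp(-2 x^2 / w^2) the rms width is w / 2.
sigma = np.sqrt(np.sum(intensity * x**2) / np.sum(intensity))
w_numeric = 2 * sigma
w_theory = w0 * np.sqrt(1 + (wavelength * z / (np.pi * w0**2)) ** 2)
print(w_numeric, w_theory)
```

Direct summation costs one complex exponential per pair of input and output samples, which is fine for a 1D illustration but would be replaced by FFT-based methods for serious 2D work.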

## What the lens does

Let’s consider a simple lens, flat on one side and consisting of a spherical cap of radius of curvature $R$ on the other. Up to an additive constant, the thickness of the lens as a function of position is then given by $\sqrt{R^2 - x^2 - y^2}$

Assuming the radius of curvature is large compared with the width of the lens, we may expand the square root as $R - \frac{x^2 + y^2}{2R}$. If the refractive index of the lens is $n$, the excess phase shift imposed by the lens is: $\Delta\phi = -k(n-1)\frac{x^2 + y^2}{2R} + \text{const.}$

where again we have omitted a constant phase shift. You might recognise from the lensmaker’s equation the definition of the focal length of this plano-convex lens $\frac{1}{f} = \frac{n-1}{R}$

and so $\Delta\phi = -\frac{k(x^2 + y^2)}{2f}$
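To put numbers on this, here is a minimal sketch that computes the focal length of a plano-convex lens and compares the quadratic lens phase with the phase from the exact spherical surface. The refractive index, radius of curvature and wavelength are illustrative assumptions.

```python
import numpy as np

n = 1.5               # assumed refractive index (typical glass)
R = 50e-3             # assumed radius of curvature, metres
wavelength = 0.5e-6   # assumed, metres
k = 2 * np.pi / wavelength

f = R / (n - 1)       # focal length from 1/f = (n - 1)/R


def thickness_exact(rho):
    """Glass thickness of the spherical cap, up to an additive constant."""
    return np.sqrt(R**2 - rho**2)


def lens_phase(rho):
    """Quadratic (thin, paraxial) lens phase -k rho^2 / (2 f)."""
    return -k * rho**2 / (2 * f)


rho = np.linspace(0, 5e-3, 6)   # distance from the optical axis, metres

# Excess phase from the exact surface, with the constant chosen so that
# the phase is zero on the axis.
phase_exact = k * (n - 1) * (thickness_exact(rho) - R)
phase_error = phase_exact - lens_phase(rho)

print(f"f = {f * 1e3:.1f} mm")
for r_mm, err in zip(rho * 1e3, phase_error):
    print(f"rho = {r_mm:.0f} mm: phase error = {err:+.3f} rad")
```

The difference between the two phases grows as $\rho^4$ (the leading term is $-k(n-1)\rho^4/8R^3$), which is the spherical aberration that the quadratic approximation throws away.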

The action of the lens is then to apply a phase shift to the incoming light whose magnitude grows with the square of the distance from the optical axis, and which is proportional to the inverse of the focal length. The light field immediately after the lens is thus: $U_{l'}(x,y) = \exp\left(-i\frac{k(x^2 + y^2)}{2f}\right)U_l(x,y)$

We’ve assumed the two fields at $l$ and $l'$ occupy the same $z$ position, and so implicitly assume the lens is infinitely thin. Unsurprisingly this is known as the thin lens approximation, and is something else which doesn’t necessarily hold in the real world.

## Making an image

We’re almost there. We’ve made it to and through the lens; the last step is to form an image. To avoid writing too many integral signs, let’s define a transfer function $h$ (the field that a single object point produces in the image plane) which dictates how the input field transforms into the output field: $U_i(u,v) = \int \int h(u,v,\xi,\eta)U_0(\xi, \eta)d\xi d\eta$

We have most of the ingredients for $h$ above; we just need to apply a second propagation step over the distance $z_2$. Doing this we end up with the slightly scary expression: $h(u,v,\xi,\eta) \propto e^{i\frac{k}{2z_2}(u^2 + v^2)}e^{i\frac{k}{2z_1}(\xi^2 + \eta^2)}$ $\times \int\int \exp\left(i\frac{k}{2}\left(\frac{1}{z_1} + \frac{1}{z_2} - \frac{1}{f}\right)(x^2 + y^2)\right)$ $\times \exp\left(-ik \left(\left(\frac{\xi}{z_1} + \frac{u}{z_2}\right)x + \left(\frac{\eta}{z_1} + \frac{v}{z_2}\right)y\right)\right)dxdy$
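For anyone who wants the intermediate step, here is a sketch of where this expression comes from, under the same approximations and with constant prefactors dropped. Chain the propagation over $z_1$, the lens phase and the propagation over $z_2$, then expand the squares in the exponent (shown for the $x$ terms; the $y$ terms follow the same pattern with $\eta$ and $v$):

```latex
\begin{align*}
h(u,v,\xi,\eta) &\propto \iint
  e^{i\frac{k}{2z_2}\left((u-x)^2+(v-y)^2\right)}\,
  e^{-i\frac{k}{2f}\left(x^2+y^2\right)}\,
  e^{i\frac{k}{2z_1}\left((x-\xi)^2+(y-\eta)^2\right)}\,dx\,dy \\[4pt]
\frac{(u-x)^2}{z_2}-\frac{x^2}{f}+\frac{(x-\xi)^2}{z_1}
  &= \frac{u^2}{z_2}+\frac{\xi^2}{z_1}
   +\left(\frac{1}{z_1}+\frac{1}{z_2}-\frac{1}{f}\right)x^2
   -2\left(\frac{\xi}{z_1}+\frac{u}{z_2}\right)x
\end{align*}
```

Multiplying the second line by $ik/2$ gives the four pieces: the two terms that do not depend on $x$ come out of the integral, and the quadratic and linear terms in $x$ stay inside it.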

Let’s look at this term-by-term:

1. The first term is a pure phase term in the image coordinates which will disappear when we calculate the intensity, so it can go.
2. The second term is second-order in the object coordinates, so becomes negligible for sufficiently small objects. We’re in the paraxial approximation, so neglecting this is OK.
3. The third term is inside the integral, so does complicated things. However, we notice that if we make the stipulation $f^{-1} = z_1^{-1} + z_2^{-1}$, then this term disappears. Of course, this is the relation in geometric optics relating the object and image distances.

With these terms out of the way, the expression is much more manageable. We have an inkling about the geometric properties of the problem now, so define the image magnification $M = z_2/z_1$ and write $k = 2\pi/\lambda$: $h(u,v,\xi,\eta) \propto \int \exp\left(-i\frac{2\pi}{\lambda z_2} (u + M\xi)x\right) dx \int \exp\left(-i\frac{2\pi}{\lambda z_2} (v + M\eta)y\right) dy$
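As a small worked example of the imaging condition and the magnification, the sketch below solves $f^{-1} = z_1^{-1} + z_2^{-1}$ for the image distance. The helper names and the numerical values are illustrative assumptions.

```python
def image_distance(f, z1):
    """Solve 1/f = 1/z1 + 1/z2 for z2 (all distances in metres)."""
    if z1 == f:
        raise ValueError("object in the focal plane: the image forms at infinity")
    return 1.0 / (1.0 / f - 1.0 / z1)


def magnification(z1, z2):
    """Image magnification M = z2 / z1, as defined in the text."""
    return z2 / z1


f = 0.1  # assumed focal length, metres
for z1 in (0.15, 0.2, 0.3):
    z2 = image_distance(f, z1)
    print(f"z1 = {z1:.2f} m -> z2 = {z2:.3f} m, M = {magnification(z1, z2):.2f}")
```

An object at twice the focal length is imaged at twice the focal length with unit magnification. Moving the object further away pulls the image towards the focal plane and shrinks it.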

Let’s take the limit that $\lambda \rightarrow 0$ – the geometric optics limit. This is equivalent to scaling the integration variables such that the limits go to infinity, and each integral becomes a $\delta$ function: $h(u,v,\xi,\eta) \rightarrow \delta(u + M\xi)\delta(v + M\eta)$

This transfer function tells us that the fields in the object and image planes are the same, as long as we change coordinates such that $u = -M\xi$ and $v = -M\eta$. We therefore have a new plane containing a perfect, inverted, scaled copy of the wave field at the object plane – also known as an image!
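The whole chain can be checked numerically. The following is a minimal one-dimensional sketch, not an optimised simulation: it propagates an asymmetric object (two Gaussian spots of different brightness) to a lens, applies the quadratic lens phase, and propagates on to the plane satisfying the imaging condition. All parameter values and the helper `fresnel_1d` are illustrative assumptions, and the lens necessarily has a finite width because the grid does.

```python
import numpy as np


def fresnel_1d(u_in, x_in, x_out, z, wavelength):
    """Direct-sum paraxial propagation of a sampled 1D field (constants dropped)."""
    k = 2 * np.pi / wavelength
    d_in = x_in[1] - x_in[0]
    sep = x_out[:, None] - x_in[None, :]
    return np.exp(1j * k * sep**2 / (2 * z)) @ u_in * d_in


wavelength = 0.5e-6                 # assumed, metres
k = 2 * np.pi / wavelength
f, z1 = 0.1, 0.3                    # assumed focal length and object distance
z2 = 1.0 / (1.0 / f - 1.0 / z1)     # image distance from the imaging condition
M = z2 / z1                         # magnification

xi = np.linspace(-0.4e-3, 0.4e-3, 401)   # object plane
x = np.linspace(-5e-3, 5e-3, 2001)       # lens plane (10 mm wide lens)
u = np.linspace(-0.3e-3, 0.3e-3, 601)    # image plane


def spot(centre, amplitude, width=20e-6):
    """Gaussian spot in the object plane."""
    return amplitude * np.exp(-(((xi - centre) / width) ** 2))


# Bright spot at +0.2 mm, dimmer spot at -0.1 mm: an asymmetric object,
# so that an inversion is unambiguous.
U0 = spot(0.2e-3, 1.0) + spot(-0.1e-3, 0.5)

U_l = fresnel_1d(U0, xi, x, z1, wavelength)        # just before the lens
U_lp = np.exp(-1j * k * x**2 / (2 * f)) * U_l      # just after the lens
U_i = fresnel_1d(U_lp, x, u, z2, wavelength)       # image plane
intensity = np.abs(U_i) ** 2

peak_strong = u[np.argmax(intensity)]
right_half = u > 0
peak_weak = u[right_half][np.argmax(intensity[right_half])]

print(f"M = {M:.2f}")
print(f"bright spot: object +0.200 mm -> image {peak_strong * 1e3:+.3f} mm")
print(f"dim spot:    object -0.100 mm -> image {peak_weak * 1e3:+.3f} mm")
```

The bright spot ends up on the opposite side of the axis at a distance scaled by $M$, and likewise for the dim one, which is the inverted, scaled copy predicted by the $\delta$-functions.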

If we had included effects like the finite lens aperture, rather than a $\delta$-function we would instead have an Airy pattern (for a circular aperture). The image would then be a convolution of the scaled object field with this pattern, so the resolution falls short of the theoretically perfect one. If the imaging distances don’t fulfil the condition $f^{-1} = z_1^{-1} + z_2^{-1}$, the transfer function is additionally broadened in a complex way which reduces resolution further.
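To see the Airy pattern emerge, note that with the imaging condition satisfied the integral for $h$ is just a Fourier transform over the lens aperture. The sketch below takes the 2D FFT of a circular aperture and locates the first dark ring. The grid size and aperture radius are arbitrary assumptions in pixel units. The standard result for an aperture of diameter $D$ is a first zero at radius $1.22\lambda z_2/D$ in the image plane, which on this grid corresponds to $1.22N/D$ frequency bins.

```python
import numpy as np

N = 1024          # assumed grid size, pixels
radius_px = 32    # assumed aperture radius, pixels

# Circular aperture centred on the grid.
y, x = np.indices((N, N)) - N // 2
pupil = (x**2 + y**2 <= radius_px**2).astype(float)

# The transfer function is the Fourier transform of the aperture;
# its squared modulus is the Airy pattern.
amplitude = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
psf = np.abs(amplitude) ** 2
psf /= psf.max()

# Radial cut from the centre outwards; the first dark ring is where the
# profile stops falling and starts rising again.
profile = psf[N // 2, N // 2:]
first_min = int(np.argmax(np.diff(profile) > 0))
expected = 1.22 * N / (2 * radius_px)

print(f"first dark ring at bin {first_min}, expected about {expected:.1f}")
```

Halving the aperture radius doubles the radius of the dark ring: a smaller lens gives a broader transfer function and a blurrier image.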

## The payoff

Well done for making it this far. If you were eagerly anticipating a fancy graphic or something, I’m afraid you’re out of luck. However, don’t despair! Now I’ve set the groundwork, the next post is pretty much all pretty GIFs. I’ll see you then, assuming you haven’t already unsubscribed.