State-space representations and why should I care?

State-space representations of filters come up frequently in the topic of control systems but do not seem to be a focus when learning-about or implementing regular filter structures. Instead, the focus seems to be on transfer functions. I think this is quite a shame because state space representations both: map conveniently to implementations and provide a much more natural way to see and exploit the mathematics behind a filter. Their representation in terms of matrices opens up a gigantic amount of mathematical tools for manipulating them. Their usefulness is demonstrated in my previous post (Arbitrary polynomial-segment signal generation) and I figured would expand on some of the details in this post… although, this is really just scratching the surface.

I did some googling to see if I could figure out why state space is not completely ubiquitous and stumbled upon this DSP related blog post which appears to ponder the same thing. The author argues that “the reason is how electrical engineers think” – which is probably true – but surely the way we think has been conditioned by our education (and/or work).

I want this page to provide a bit of an overview of state space and hopefully provide some compelling reasons to use it. Let’s see how it goes…

A quick introduction.

At the core, state space representations consist of two matrix equations:

    \[   \begin{array}{@{}ll@{}}     z\hat{q} & = \mathbf{A} \hat{q} + \mathbf{B} \hat{x} \\     \hat{y} & = \mathbf{C} \hat{q} + \mathbf{D} \hat{x}   \end{array} \]

The choice of z (as opposed to s or another variable) is irrelevant – but intentional in this post as we are going to focus on discrete systems. The \mathbf{A} and \mathbf{B} matrices define how the previous N states (stored in the column vector \hat{q}) and input combine to create the next set of states. \mathbf{C} and \mathbf{D} show how the previous states and input are mixed to create the output. The above is equivalent to the difference equation below:

    \[   \begin{array}{@{}ll@{}}     \hat{q}[n] & = \mathbf{A} \hat{q}[n-1] + \mathbf{B} \hat{x}[n] \\     \hat{y}[n] & = \mathbf{C} \hat{q}[n-1] + \mathbf{D} \hat{x}[n]   \end{array} \]

Where \hat{x}, \hat{y} and \hat{q} are column vectors at time n.

When we design and implement IIRs using transfer functions and difference equations, the states are the previous N output samples. In a state space system, we can define the states as previous output samples, but there are many ways we can assign the states. I’m going to suggest that choosing previous output samples as states is actually quite a bad choice: it’s convenient and easy to understand in an implementation, but must imply that the “true” states need to be derived from previous output samples. Consider modelling an analogue electronic system with a capacitor and an inductor: somehow our model is predicting the voltage across the capacitor and current through the inductor using previous output samples. In theory, this is no problem – but on a computer with fixed precision, it can explode into instability depending on the system. This has already been widely discussed with high-order difference equations and is one of the main reasons for using cascades of biquad sections to implement high-order filters. With state space, we can quantify just how badly a system will perform by looking at the condition number of the \mathbf{A} matrix. The condition number is a reflection of how much small changes in state can be amplified when passed through the matrix which is a metric for defining the loss of precision in a process.

Going from a transfer function to state space

While the transfer function of a system is unique, the state space representation is not. There are an infinite number of ways to choose \mathbf{A}, \mathbf{B} and \mathbf{C} that will give exactly the same output.

Choosing good ways to map a transfer function into state space is outside the scope of this post, but we can create a system which is equivalent to the 2nd order direct-form system below:

    \[   \begin{array}{@{}ll@{}}     \frac{Y(z)}{X(z)} & = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}} \\     y[n] & = b_0 x[n] + b_1 x[n-1] + b_2 x[n-2] - a_1 y[n-1] - a_2 y[n-2]   \end{array} \]


    \[   \begin{array}{@{}ll@{}}     \mathbf{A} & = \left[\begin{array}{@{}cc@{}}       -a_1 & -a_2 \\       1 & 0     \end{array}\right] \\     \mathbf{B} & = \left[\begin{array}{@{}cc@{}}       1 \\       0     \end{array}\right] \\     \mathbf{C} & = \left[\begin{array}{@{}cc@{}}       b_1 - a_1 b_0 & b_2 - a_2 b_0     \end{array}\right] \\     \mathbf{D} & = b_0   \end{array} \]

The above is implemeneted in ss_siso_init() on GitHub. As stated in the comments, it’s not a particularly good mapping.

Writing a program to execute the filter is only needs to evaluate the matrix mutliplies. Notice that for a single-input single-output system (SISO), \mathbf{B} is a column vector, \mathbf{C} is a row vector and \mathbf{D} is a scalar – so we are exclusively doing vector-matrix multiplies. If the system happened to be 4th order (i.e. the \mathbf{A} matrix is 4×4) and we were targetting x86 with SSE, we could do the whole state update operation using SSE operations.

Going from state space to a transfer function

Going back to a transfer function from state space is more complicated – although for a 2nd order system (or a block diagonal system of 2×2 and 1×1 blocks), we can work out a closed-form solution by hand. The transfer function of a state space system is given as:

    \[   \begin{array}{@{}ll@{}}     H(z) & = \mathbf{C}\left(\mathbf{I} z - \mathbf{A}\right)^{-1}\mathbf{B} + \mathbf{D} \\      & = \frac{\mathbf{C} \mbox{adj} \left( \mathbf{I} z - \mathbf{A} \right) \mathbf{B}}{\left| \mathbf{I} z - \mathbf{A} \right|}   \end{array} \]

I’m going to argue that going in this direction is almost never necessary, so we aren’t going to think much about it. If we need to figure out the frequency responce, we can do exactly what we do with transfer functions and make the substitution z=e^{j w T_0} – but now we have a matrix inverse to perform.

Changing the states

Let’s make the substitution \hat{q}=\mathbf{W} \hat{v} where \mathbf{W} is some NxN invertible matrix. To remove the \mathbf{W} from the LHS of the state update, we will left-multiply both sides by \mathbf{W}^{-1} to obtain a new system in state space form.

    \[   \begin{array}{@{}ll@{}}     z \hat{v} & = \mathbf{W}^{-1} \mathbf{A} \mathbf{W} \hat{v} + \mathbf{W}^{-1} \mathbf{B} \hat{x} \\     \hat{y} & = \mathbf{C} \mathbf{W} \hat{v} + \mathbf{D} \hat{x}   \end{array} \]

We now have a system which still produces the same output, but the values of the internal states will take on different values (provided \mathbf{W} \neq \mathbf{I}). This can be particularly useful because:

  • We mentioned before that if the matrix \mathbf{A} has a high condition number, small changes in state variables can be amplified substantially during the state update. On a computer with limited precision types, this can lead to errors which propagate to the output through the \mathbf{C} matrix. By making this transformation, we might be able to reduce the condition number of the new state update matrix.
  • Executing the vector-matrix multiply for the state update increases in complexity as the order of the system increases with N^2. If we can find some way to diagonalise or block diagonalise the state update matrix, we could reduce this to a complexity increase in N.

Concatenation of state space systems

When we have the transfer functions of two systems and want to concatenate them together (i.e. send the output of one to the input of another), we can multiply the transfer functions to get the combined system. With state space, there is a bit more process but we can do a similar thing. Denote two systems:

    \[   \begin{array}{@{}ll@{}}     z\hat{q}_0 & = \mathbf{A}_0 \hat{q}_0 + \mathbf{B}_0 \hat{x} \\     \hat{w} & = \mathbf{C}_0 \hat{q}_0 + \mathbf{D}_0 \hat{x} \\     z\hat{q}_1 & = \mathbf{A}_1 \hat{q}_1 + \mathbf{B}_1 \hat{w} \\     \hat{y} & = \mathbf{C}_1 \hat{q}_1 + \mathbf{D}_1 \hat{w}   \end{array} \]

We can join these systems together to obtain:

    \[   \begin{array}{@{}ll@{}}     z\hat{q}_0 & = \mathbf{A}_0 \hat{q}_0 + \mathbf{B}_0 \hat{x} \\     z\hat{q}_1 & = \mathbf{A}_1 \hat{q}_1 + \mathbf{B}_1 \mathbf{C}_0 \hat{q}_0 + \mathbf{B}_1 \mathbf{D}_0 \hat{x} \\     \hat{y} & = \mathbf{C}_1 \hat{q}_1 + \mathbf{D}_1 \mathbf{C}_0 \hat{q}_0 + \mathbf{D}_1 \mathbf{D}_0 \hat{x}   \end{array} \]

If we now define a new column vector q that is the concatenation of the two column vectors q_0 and q_1, we can work out the matrices for the combined system. i.e.

    \[   \begin{array}{@{}ll@{}}     \hat{q} & = \left[\begin{array}{@{}cc@{}}       \hat{q}_0 \\       \hat{q}_1     \end{array}\right] \\     \mathbf{A} & = \left[\begin{array}{@{}cc@{}}       \mathbf{A}_0 & \mathbf{0} \\       \mathbf{B}_1 \mathbf{C}_0 & \mathbf{A}_1     \end{array}\right] \\     \mathbf{B} & = \left[\begin{array}{@{}cc@{}}       \mathbf{B}_0 \\       \mathbf{B}_1 \mathbf{D}_0     \end{array}\right] \\     \mathbf{C} & = \left[\begin{array}{@{}cc@{}}       \mathbf{D}_1 \mathbf{C}_0 & \mathbf{C}_1     \end{array}\right] \\     \mathbf{D} & = \mathbf{D}_1 \mathbf{D}_0   \end{array} \]

As we mentioned before, growing the size of the \mathbf{A} matrix increases complexity in N^2 unless we can somehow figure out a way to block diagonalise the matrix. If we choose to do this when we concatenate systems together, it means that we need to find some similarity transform that satisfies:

    \[   \begin{array}{@{}ll@{}}     \left[\begin{array}{@{}cc@{}}       \mathbf{A}_0 & \mathbf{0} \\       \mathbf{B}_1 \mathbf{C}_0 & \mathbf{A}_1     \end{array}\right] \left[\begin{array}{@{}cc@{}}       \mathbf{W}_{0,0} & \mathbf{W}_{0,1} \\       \mathbf{W}_{1,0} & \mathbf{W}_{1,1}     \end{array}\right] = \left[\begin{array}{@{}cc@{}}       \mathbf{W}_{0,0} & \mathbf{W}_{0,1} \\       \mathbf{W}_{1,0} & \mathbf{W}_{1,1}     \end{array}\right] \left[\begin{array}{@{}cc@{}}       \mathbf{A}_0 & \mathbf{0} \\       0 & \mathbf{A}_1     \end{array}\right]   \end{array} \]

As long as \mathbf{A}_0 and \mathbf{A}_1 share no common eigenvalues, this problem has a unique solution. Expanding out the above into distinct matrix equations, it turns out that:

    \[   \begin{array}{@{}ll@{}}     \mathbf{W}_{0,0} & = \mathbf{I} \\     \mathbf{W}_{0,1} & = \mathbf{0} \\     \mathbf{W}_{1,1} & = \mathbf{I} \\     \mathbf{A}_1 \mathbf{W}_{1,0} - \mathbf{W}_{1,0} \mathbf{A}_0 & = - \mathbf{B}_1 \mathbf{C}_0   \end{array} \]

The final equation is known as a Sylvester equation and there are closed-form methods for solving them.

I have an implementation which performs this entire diagonalising concatenation in the ss_siso_cat_diag() function on GitHub.

This is a really cool result because it means that if we construct a high-order filter using second order state space representations and none of those individual representations have common eigenvalues, we can completely block diagonalise the system i.e. the filters can be run completely in parallel! This is very similar to a partial fraction decomposition of transfer function. Furthermore, the state space representation of the block diagonalised system shows that there is an efficient implementation for running the system on a computer with vector support where the vectors are half the length of the filter order. Denote a 8th order system with a block diagonalised state transition matrix:

    \[   \begin{array}{@{}ll@{}}     \mathbf{A} & = \left[\begin{array}{@{}cccccccc@{}}       a_{0} & b_{0} & 0 & 0 & 0 & 0 & 0 & 0 \\       c_{0} & d_{0} & 0 & 0 & 0 & 0 & 0 & 0 \\       0 & 0 & a_{1} & b_{1} & 0 & 0 & 0 & 0 \\       0 & 0 & c_{1} & d_{1} & 0 & 0 & 0 & 0 \\       0 & 0 & 0 & 0 & a_{2} & b_{2} & 0 & 0 \\       0 & 0 & 0 & 0 & c_{2} & d_{2} & 0 & 0 \\       0 & 0 & 0 & 0 & 0 & 0 & a_{3} & b_{3} \\       0 & 0 & 0 & 0 & 0 & 0 & c_{3} & d_{3}     \end{array}\right] \\     \mathbf{B} & = \left[\begin{array}{@{}c@{}}       e_{0} \\       f_{0} \\       e_{1} \\       f_{1} \\       e_{2} \\       f_{2} \\       e_{3} \\       f_{3}     \end{array}\right] \\     \mathbf{C} & = \left[\begin{array}{@{}cccccccc@{}}       g_{0} & h_{0} & g_{1} & h_{1} & g_{2} & h_{2} & g_{3} & h_{3}     \end{array}\right] \\     \mathbf{D} & = i   \end{array} \]

Each lower-case letter represents some coefficient vector which will exist in memory. With the system laid out like this, we can write some C to implement the filtering operation (this is written using my cop vector abstraction which is fairly opaque):

#include "cop/cop_vec.h"
float filter(const float *coefs, float *state, float input) {
	/* load coefficients */
	v4f a      = v4f_ld(coefs + 0);
	v4f b      = v4f_ld(coefs + 4);
	v4f c      = v4f_ld(coefs + 8);
	v4f d      = v4f_ld(coefs + 12);
	v4f e      = v4f_ld(coefs + 16);
	v4f f      = v4f_ld(coefs + 20);
	v4f g      = v4f_ld(coefs + 24);
	v4f h      = v4f_ld(coefs + 28);
	float i    = coefs[32];
	/* load existing state */
	v4f q0     = v4f_ld(state + 0);
	v4f q1     = v4f_ld(state + 4);
	/* perform state update and compute output*/
	v4f inb    = v4f_broadcast(input);
	v4f new_q0 = v4f_mul(e, inb);
	v4f new_q1 = v4f_mul(f, inb);
	v4f out    = v4f_mul(g, q0);
	new_q0     = v4f_add(new_q0, v4f_mul(a, q0));
	new_q1     = v4f_add(new_q1, v4f_mul(c, q0));
	out        = v4f_add(out, v4f_mul(h, q1));
	new_q0     = v4f_add(new_q0, v4f_mul(b, q1));
	new_q1     = v4f_add(new_q1, v4f_mul(d, q1));
	v4f_st(state + 0, new_q0);
	v4f_st(state + 4, new_q1);
	/* horizontal adddition required here but I don't support it */
	return v4f_hadd_not_implemented(out) + i * input;

The point of including the code is to show how we can filter a sample through an 8th order system efficiently utilising vector parallelism with 8 multiply/MAC operations. Implementing this with cascaded biquads would:

  • Probably lead to pipelining issues as each biquad is dependent on the output of the previous one.
  • Probably not be implementable using vector support.

Multiple input processing

As a final point, if we wanted to derive a way to process multiple input samples through a high-order difference equation, there is a substantial amount of tedious substitution to be done. If we stay in a state space representation, there is a single substitution to make. For example:

    \[   \begin{array}{@{}ll@{}}     \hat{q}[n]        & = \mathbf{A} \hat{q}[n-1] + \mathbf{B} \hat{x}[n] \\     \hat{q}[n+1]      & = \mathbf{A} \hat{q}[n] + \mathbf{B} \hat{x}[n+1] \\                       & = \mathbf{A}^2 \hat{q}[n-1] + \mathbf{A} \mathbf{B} \hat{x}[n] + \mathbf{B} \hat{x}[n+1] \\     \hat{q}[n+\sigma] & = \mathbf{A}^{\sigma+1}\hat{q}[n-1] + \sum_{k=0}^{\sigma}\mathbf{A}^{\sigma-k}\mathbf{B}\hat{x}[n+k]  \\     \hat{y}[n]        & = \mathbf{C} \hat{q}[n-1] + \mathbf{D} \hat{x}[n] \\     \hat{y}[n+1]      & = \mathbf{C} \hat{q}[n] + \mathbf{D} \hat{x}[n+1] \\                       & = \mathbf{C} \mathbf{A} \hat{q}[n-1] + \mathbf{C} \mathbf{B} \hat{x}[n] + \mathbf{D} \hat{x}[n+1] \\     \hat{y}[n+\sigma] & = \mathbf{C} \mathbf{A}^{\sigma} \hat{q}[n-1] + \mathbf{C} \sum_{k=0}^{\sigma-1}\mathbf{A}^{\sigma-1-k}\mathbf{B} \hat{x}[n+k] + \mathbf{D} \hat{x}[n+\sigma]   \end{array} \]

The final expressions above for \hat{q} and \hat{y} are only valid for \sigma > 0.

This may or may not be useful depending on what you’re trying to do, but in my previous post I made heavy use of it to derive simple expressions for when x[n] is a polynomial in n. It would likely also be of use in building resampling filters where many output samples are discarded and most input samples are zero.

Arbitrary polynomial-segment signal generation (e.g. square, triangle, sawtooth waves)

There is a tool which is part of my Open Diapason project called “sampletune” which is designed to allow sample-set creators to tune samples by ear and save the tuning information back into the sampler chunk of the wave file. The utility plays back a fixed tone and there are keyboard controls to shift the pitch of the sample up and down until the beats disappear. I was hoping to prototype a sawtooth wave as the tuning signal, but realised during implementation that creating arbitrary frequency piecewise-linear waveforms is actually somewhat a difficult problem – fundamentally due to aliasing… so I parked that idea and ended up summing several harmonic sinusoids at some arbitrary levels.

Recently, I was thinking about this again and set about solving the problem. It was only in the last couple of weeks that I discovered tools like Band Limited Impulse Trains (see link at bottom of post) which seem to be the common way of generating these signals. These are fundamentally FIR techniques. I approached the problem using IIR filters and came up with a solution which I think is kind-of interesting – please let me know if you’ve seen it done this way so I can make some references!

The idea

If we generate a piecewise linear waveform wave at a frequency which is not a factor of the sampling rate, we will get aliasing. In fact, if we can work out the number of shifted aliasing spectrum copies which we can expect to see by taking f_0 / \text{gcd}(f_0, f_s). If f_0 is irrational, we have an infinite number of copies. Let’s make the assumption that more-often than not, we are going to be generating a very large number of aliases. The next assumption we are going to make is that most of the signals we are interested in generating have a spectral envelope which more-or-less decays monotonically. With this assumption, we can say: if we generate our signal at a much higher sampling rate, we effectively push the level of the aliasing down in the section of the spectrum we care about (i.e. the audible part). Below is an image which hopefully illustrates this; the upper image shows the spectral envelope of the signal we are trying to create in blue, all of the black lines which follow are envelopes of the aliasing. The second image shows what would have happened if we created the same frequency signal at three times the sampling rate. The interesting content again is in blue and everything in black is either aliasing or components we don’t care about. What we can see is that by increasing the sample rate we generate our content at, we effectively push the aliasing components down in the frequency region we care about.

So let’s assume that instead of generating our waveforms at f_s, we will:

  • Generate them at a sample rate of M f_s where M is some large number.
  • Filter out the high-frequency components above f_s / 2 using some form of IIR filter structure.
  • Decimate the output sequence by a factor of M to get back to f_s (which we can do because of the previous operation).

Now that we’ve postulated a method, let’s figure out if it makes any sense by answering the questions “does it have an efficient implementation?” and “does it actually produce reasonable output for typical signals we may want to generate?”.

Does it have an efficient implementation?

Denote a state-space representation of our SISO discrete system as:

    \[   \begin{array}{@{}ll@{}}     \hat{q}\left[n\right] & = \mathbf{A} \hat{q}\left[n-1\right] + \mathbf{B} x\left[n\right] \\     y\left[n\right] & = \mathbf{C} \hat{q}\left[n-1\right] + D x\left[n\right]   \end{array} \]

For clarity, let’s just consider a single update of the system state where \hat{q}\left[-1\right] represents the “previous state” and x\left[0\right] represents the first sample of new data we are pushing in.

    \[   \begin{array}{@{}ll@{}}     \hat{q}\left[0\right] & = \mathbf{A} \hat{q}\left[-1\right] + \mathbf{B} x\left[0\right] \\     y\left[0\right] & = \mathbf{C} \hat{q}\left[-1\right] + D x\left[0\right]   \end{array} \]

We can derive an expression to insert two samples in via substitution and from that derive expressions for inserting groups of samples.

    \[   \begin{array}{@{}ll@{}}     \hat{q}\left[0\right] & = \mathbf{A} \hat{q}\left[-1\right] + \mathbf{B} x\left[0\right] \\     \hat{q}\left[1\right] & = \mathbf{A} \hat{q}\left[0\right] + \mathbf{B} x\left[1\right] \\  & = \mathbf{A}^2 \hat{q}\left[-1\right] + \mathbf{A}\mathbf{B} x\left[0\right] + \mathbf{B} x\left[1\right] \\ \hat{q}\left[\sigma\right] & = \mathbf{A}^{\sigma+1}\hat{q}\left[-1\right] + \sum_{n=0}^{\sigma}\mathbf{A}^{\sigma-n}\mathbf{B}x\left[n\right]  \\   \end{array} \]

    \[   \begin{array}{@{}ll@{}}     y\left[0\right] & = \mathbf{C} \hat{q}\left[-1\right] + D x\left[0\right] \\     y\left[1\right] & = \mathbf{C} \hat{q}\left[0\right] + D x\left[1\right] \\  & = \mathbf{C}\mathbf{A}\hat{q}\left[-1\right] + \mathbf{C}\mathbf{B}x\left[0\right] + D x\left[1\right] \\ y\left[\sigma\right] & = \mathbf{C}\mathbf{A}^{\sigma}\hat{q}\left[-1\right] + \mathbf{C}\sum_{n=0}^{\sigma-1}\mathbf{A}^{\sigma-1-n}\mathbf{B}x\left[n\right] + D x\left[\sigma\right]   \end{array} \]

We’re interested in inserting groups of samples which form linear segments into our filter. So let’s replace x\left[n\right] with some arbitrary polynomial. This gives us some flexibility; we need only a polynomial of zero degree (constant) for square waves, first degree for saw-tooth and triangle – but we could come up with more interesting waveforms which are defined by higher order polynomials. i.e.

    \[ x\left[n\right] = \sum_{c=0}^{C-1} a_c n^c \]

Let’s substitute this into our state expressions:

    \[   \begin{array}{@{}ll@{}}     \hat{q}\left[0\right] & = \mathbf{A} \hat{q}\left[-1\right] + \mathbf{B} a_0 \\     \hat{q}\left[\sigma\right] & = \mathbf{A}^{\sigma+1}\hat{q}\left[-1\right] + \sum_{n=0}^{\sigma}\mathbf{A}^{\sigma-n}\mathbf{B}\sum_{c=0}^{C-1} a_c n^c \\  & = \left(\mathbf{A}^{\sigma+1}\right)\hat{q}\left[-1\right] + \sum_{c=0}^{C-1} a_c \left(\sum_{n=0}^{\sigma}\mathbf{A}^{\sigma-n}\mathbf{B} n^c\right) \\   \end{array} \]

    \[   \begin{array}{@{}ll@{}}     y\left[0\right] & = \mathbf{C} \hat{q}\left[-1\right] + D a_0 \\     y\left[\sigma\right] & = \mathbf{C}\mathbf{A}^{\sigma}\hat{q}\left[-1\right] + \mathbf{C}\sum_{n=0}^{\sigma-1}\mathbf{A}^{\sigma-1-n}\mathbf{B}\sum_{c=0}^{C-1} a_c n^c + D \sum_{c=0}^{C-1} a_c \sigma^c \\  & = \left(\mathbf{C}\mathbf{A}^{\sigma}\right)\hat{q}\left[-1\right] + \sum_{c=0}^{C-1} a_c \left(\sum_{n=0}^{\sigma-1}\mathbf{C}\mathbf{A}^{\sigma-1-n}\mathbf{B} n^c\right) + \sum_{c=0}^{C-1} a_c \left(D \sigma^c\right)   \end{array} \]

There are some interesting things to note here:

  • The first output sample (y\left[0\right]) is only dependent on a_0 and the previous state. If we restrict our attention to always getting the value of the first sample, we don’t need to do any matrix operations to get an output sample – just vector operations.
  • All of the parenthesised bits in the equations have no dependency on the coefficients of the polynomial we are inserting into the filter – they only depend on the number of samples we want to insert (\sigma), the system state matrices and/or the index of the polynomial coefficient. This means that we can build lookup tables for the parenthesised matrices/vectors that are indexed by \sigma and the polynomial coefficient index and effectively insert a large number of samples of any polynomial into the filter state with a single matrix-matrix multiply and a number of scalar-vector multiplies dependent on the order of the polynomial.

It’s looking like we have an efficient implementation – particularly if we are OK with allowing a tiny (almost one sample at f_s) amount of latency into the system. I’ve built a small test program to validate that all of this works, which you can find here

This is 400 lines of C – most of it is purely filter creation stuff. The bulk of the synthesis code is in main(). Some interesting functions are:

  • ss_siso_cat() – concatenates two single-in, single-out state space systems into one large system.
  • ss_siso_build_advancement_table() – construct the \left(\mathbf{A}^{\sigma+1}\right) and \left(\sum_{n=0}^{\sigma}\mathbf{A}^{\sigma-n}\mathbf{B} n^c\right) matrices used to update the system. I call these “advancement tables”.
  • ss_siso_poly3() – given a system and the advancement tables, advance an IIR by a polynomial with some number of samples.

The program dumps output into a 2 channel 16-bit raw PCM file. The output is a sawtooth waveform generated the naive way in the left channel, and using the IIR system in the right.

Does it work?

Below is a screen capture showing the output of the small program linked above being viewed using Adobe Audition. The system is configured using an elliptic filter with 1 dB of passband ripple and a stop-band gain of -60 dB.

Waveform with frequency of 600*pi Hz. Left channel generated using naive modulo, right generated using the proposed scheme.

Waveform with frequency of 600*pi Hz. Left channel generated using naive modulo, right generated using the proposed scheme. The spectrum of the left channel is green.

The waveform has a frequency of 600\pi which is irrational and therefore has an infinite number of aliases when sampled (this is not strictly true in a digital implementation – but it’s still pretty bad). We can see that after the cutoff frequency of the filter, the aliasing components are forced to decay at the rate of the filter – but we predicted this should happen.

Below is another example using an 4th order Bessel filter instead of an elliptic filter. This preserves the shape of the time-domain signal much more closely than the elliptic system, but rolls off much slower. But the performance is still fairly good and might be useful.

Waveform with frequency of 600*pi Hz. Left channel generated using naive modulo, right generated using the proposed scheme with a Bessel filter.

Waveform with frequency of 600*pi Hz. Left channel generated using naive modulo, right generated using the proposed scheme. The spectrum of the left channel is green.

Here is a link to a good paper describing how classic waveforms have been done previously “Alias-Free Digital Synthesis of Classic Analog Waveforms” [
T. Stilson and J. Smith, 1996]

Generating multiple low-quality random numbers… fast.


A while ago I was doing some profiling to find out ways to improve the load performance of my Open Diapason project. The load process is roughly as follows:

  • Load wave file from disk into memory.
  • Apply a linear phase filter to all loaded samples (this is to compensate for the response of the resampler which occurs during playback).
  • Normalise and quantise the filtered audio using TPDF dither. The normalisation ensures that we use the full 16-bits of precision for each sample. Obviously, the scaling coefficient is stored along with the sample to ensure we play the content back at the intended level.

All of these steps are done across multiple threads to maximise performance (there is one thread that is reading audio files from disk ready to spawn threads which apply the filter and requantise the data)… this is one of those times where the speedup was roughly linear with the first couple of threads which I added.

Anyway, back to the profiling: I was expecting that the vast majority of the cycles would be spent in the application of the filter, but this actually turned out not to be the case. The dithering and re-quantisation of the audio – which I assumed would be cheap – turned out to be chewing cycles on the order of 30% of the load time (all of my profiling was done on a mid 2014 MacBook Pro with an i7 processor).

There were multiple things which I did to speed it up, but I think what I pushed in this change list was the most interesting one – so what follows in this blog is a bit of a write up.

The idea

Uniform random numbers are useful and often we need several of them. In my use-case (performing TPDF dither on a sample pair) I needed four random variables. Generating them needs to be fast but the “quality” of the randomness is not that important. The obvious way to do this is to use a Linear Congruential Generator which ends up forming a code pattern like this:

r0 = (state * A + C) % M;
r1 = (r0 * A + C) % M;
r2 = (r1 * A + C) % M;
r3 = (r2 * A + C) % M;
state = r3;

This forms a dependency chain where each random number requires knowledge of the previous one and could end up leading to pipeline issues (this is really dependent on how the compiler re-orders the code and the architecture of the platform). In the code generated on my machine, this turned out to be a bottleneck. We could get around the pipeline issues by creating four different LCGs:

r0 = (state0 * A0 + C0) % M;
r1 = (state1 * A1 + C1) % M;
r2 = (state2 * A2 + C2) % M;
r3 = (state3 * A3 + C3) % M;
state0 = r0;
state1 = r1;
state2 = r2;
state3 = r3;

But this will require more registers to store the different LCG states and possibly more registers or code memory to hold all the new coefficients. It would be nice to have something a bit simpler…

LCGs are really cool toys. We can choose our A_n and C_n values in such a way that the pseudo-random sequence has period M. Given the algorithm generates the next value from the previous, this must mean that the random sequence contains every integer between 0 and M-1. The first part of the idea, is to modify the code as:

r0 = (state0 * A0) % M;
r1 = (state1 * A1) % M;
r2 = (state2 * A2) % M;
r3 = (state3 * A3) % M;
state0 = (r0 + C0 % M) % M;
state1 = (r1 + C1 % M) % M;
state2 = (r2 + C2 % M) % M;
state3 = (r3 + C3 % M) % M;

This produces a different sequence in r_0 to r_3 because the accumulation now happens at the state assignment, but still has a period M. Now for the dodgy part, we change the algorithm to this:

r0 = (state0 * A0) % M;
r1 = (state0 * A1) % M;
r2 = (state0 * A2) % M;
r3 = (state0 * A3) % M;
state0 = (r0 + C0 % M) % M;

Where C_0 = 1, M = 2^{32} and A_0 to A_3 are all different but satisfy the required conditions for an LCG to work. It should not be difficult to infer that r_0 to r_3 still all have period M – but produce different output sequences. I wrote a test program to look at the correlation of these sequences over their period which can be found on GitHub here – they are mostly independent over their period. I doubt they make great random numbers, but for performing dither, they are completely fine. The generated code seems pretty good also:

sampletest[0x10000a99f] <+3503>: imull  $0xd688014d, %r8d, %edi   ; imm = 0xD688014D 
sampletest[0x10000a9a6] <+3510>: imull  $0xdb71f7bd, %r8d, %r15d  ; imm = 0xDB71F7BD 
sampletest[0x10000a9ad] <+3517>: imull  $0xe05f354d, %r8d, %esi   ; imm = 0xE05F354D 
sampletest[0x10000a9b4] <+3524>: imull  $0xe54c7f35, %r8d, %ecx   ; imm = 0xE54C7F35 
sampletest[0x10000a9bb] <+3531>: movss  (%r11,%rdx,2), %xmm2      ; xmm2 = mem[0],zero,zero,zero 
sampletest[0x10000a9c1] <+3537>: mulss  %xmm1, %xmm2
sampletest[0x10000a9c5] <+3541>: movss  (%r10,%rdx,2), %xmm3      ; xmm3 = mem[0],zero,zero,zero 
sampletest[0x10000a9cb] <+3547>: mulss  %xmm1, %xmm3
sampletest[0x10000a9cf] <+3551>: leal   0x1(%rdi), %r8d
sampletest[0x10000a9d3] <+3555>: cvttss2si %xmm2, %r12
sampletest[0x10000a9d8] <+3560>: cvttss2si %xmm3, %r13
sampletest[0x10000a9dd] <+3565>: addq   %rdi, %rsi
sampletest[0x10000a9e0] <+3568>: addq   %r12, %rsi
sampletest[0x10000a9e3] <+3571>: addq   %r15, %rcx
sampletest[0x10000a9e6] <+3574>: addq   %r13, %rcx
sampletest[0x10000a9e9] <+3577>: shrq   $0x21, %rsi
sampletest[0x10000a9ed] <+3581>: shrq   $0x21, %rcx
sampletest[0x10000a9f1] <+3585>: movl   %edx, %edi
sampletest[0x10000a9f3] <+3587>: movw   %si, (%rax,%rdi,2)
sampletest[0x10000a9f7] <+3591>: leal   0x1(%rdx), %esi
sampletest[0x10000a9fa] <+3594>: movw   %cx, (%rax,%rsi,2)
sampletest[0x10000a9fe] <+3598>: addq   $0x2, %rdx
sampletest[0x10000aa02] <+3602>: decl   %ebx
sampletest[0x10000aa04] <+3604>: jne    0x10000a99f

See the code in the original commit for a better understanding of what’s going on. There is a hefty comment explaining the fixed-point dithering process.

Given a vector: find N orthogonal vectors.

When writing eigen solvers, it’s useful to be able to generate an orthonormal basis containing a particular vector. Another way of saying this is: given some fixed vector, find other vectors that are perpendicular to it and each other. I am under the impression from some websites that this is also useful in computer graphics (but I know nothing about that – let me know if you do!).

I’ve seen people ask how to do this on the internet (for example: here and here) and in general the recommended process is:

  • If only one perpendicular vector is required, use a heuristic to come up with something perpendicular (see previous links).
  • If two perpendicular vectors are required, create one perpendicular vector using the above heuristic then use the cross-product to come up with the final vector.
  • For more than two perpendicular vectors, generate random vectors and use Gram Schmidt to (hopefully) orthogonalise them.
  • For 3D graphics, see the “Update” heading below…

The first two usually require branching in the heuristic. The second may not work (if you’re really really unlucky) and requires multiple normalisation operations. Follows is an approach which I haven’t seen documented elsewhere which requires one square root for an arbitrary dimension vector set.

What we want is some technique which creates an orthogonal matrix from a vector. This is exactly what the Householder transformation matrix does:

    \[ \mathbf{P} = \mathbf{I} - 2 \mathbf{v} \mathbf{v}^* \]

Where \mathbf{v} is some unit vector. The matrix \mathbf{P} is hermitian and unitary and is hence orthonormal. If we can find a simple expression for \mathbf{v} which will force the first column of \mathbf{P} to be equal to some unit vector \mathbf{q}, we have all the information we need to create a basis containing \mathbf{q}. This turns out to be trivial:

    \[ \mathbf{q} = \left( \mathbf{I} - 2 \mathbf{v} \mathbf{v}^* \right) \mathbf{e}_1 \]

    \[ \mathbf{q} = \mathbf{e}_1 - 2 \mathbf{v} \mathbf{v}^* \mathbf{e}_1 \]

    \[ \frac{\mathbf{e}_1 - \mathbf{q}}{2} = \mathbf{v} v^*_1 \]

Explicitly, this looks like:

    \[ \begin{pmatrix} \frac{1 - q_1}{2} \\ -\frac{q_2}{2} \\ -\frac{q_3}{2} \\ \vdots \\ -\frac{q_N}{2} \end{pmatrix} = v^*_1 \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_N \end{pmatrix} \]

If \mathbf{q} is real (i.e. v^*_1 = v_1 ), we set:

    \[ v_n = \left\{   \begin{array}{@{}ll@{}}     \sqrt{\frac{1 - q_1}{2}}, & \text{if}\ n = 1 \\     -\frac{q_n}{2 v_1}, & \text{if}\ n > 1   \end{array}\right. \]

Here is a very inefficient C implementation (the square root is not actually necessary – this can be seen in the fact that two v_n entries are always multiplied together and the term in the square root is constant – also the most inner loop over j should start from 1 because when j==0, the value is simply q_{i+1} ):

/* The first n values in m is a normalised vector which is desired to be
 * part of the basis set. Following this vector, n-1 new vectors of length n
 * will be added which are orthogonal to it. */
void house_ortho(double *m, unsigned n)
    const double x = sqrt(1.0 - m[0]);
    /* As m[0] approaches 1, x approaches 0. This detects that and will end
     * up forcing the result to be the identity matrix. */
    const double s = (x < 1e-20) ? 0.0 : (-1.0 / x);
    unsigned i, j;
    /* We first convert the m vector into the v vector. Note that we have
     * incorporated the multiply by 2 present in the Householder operation
     * into v by multiplying every element by sqrt(2). */
    for (m[0] = x, i = 1; i < n; i++)
        m[i] *= s;
    /* Create all of the orthogonal vectors using the definition of the
     * Householder transformation matrix: I - 2 v v*. */
    for (i = 1; i < n; i++) {
        for (j = 0; j < n; j++) {
            m[i*n+j] = ((i == j) ? 1.0 : 0.0) - m[i] * m[j];
    /* Convert the v vector back into m. Probably could have written this
     * code in a way where this didn't need to happen. */
    for (m[0] = 1.0 - x * x, j = 1; j < n; j++)
        m[j] = -m[j] * x;

The termination condition for the for loop (over i) in the middle of the code sample could be modified to generate a smaller number of vectors.

This process could be modified to work with a complex valued \mathbf{q} but is slightly more complicated because \mathbf{P} is hermitian (which implies that all values on the diagonal are real). The modification amounts to finding a unit vector which forces q_1 to be real, multiplying \mathbf{q} by this unit vector then dividing all resultant vectors by the factor. I haven’t tested this, but it should work.


I posted a link to this blog on Twitter and reached out to see if anyone else had used this method. Warren Moore helpfully responded and pointed me at this paper “Building an Orthonormal Basis, Revisited” (Pixar) which describes numerical and implementation improvements to an algorithm proposed in this paper “Building an Orthonormal Basis from a 3D Unit Vector Without Normalization” (Frisvad, J. R.).

The algorithm which I’ve described in terms of finding the vector v which creates the Householder matrix P with the first column equal to some unit vector q, ends up producing exactly the same equations (given a vector of dimension 3) given in Frisvad’s paper up to a few sign changes – albeit I’ve gone about defining the problem in a slightly different way. We can demonstrate the equivalence by expanding out the resulting householder matrix given the v vector we derived earlier:

    \[ v_n = \left\{   \begin{array}{@{}ll@{}}     \sqrt{\frac{1 - q_1}{2}}, & \text{if}\ n = 1 \\     -\frac{q_n}{2 v_1}, & \text{if}\ n > 1   \end{array}\right. \]

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & 1 - 2 v_2 v_2 & -2 v_2 v_3 \\  q_3 & -2 v_2 v_3 & 1 - 2 v_3 v_3 \\ \end{pmatrix} \]

Note: we don’t actually need v_1 because the householder matrix is symmetric. The matrix can be expressed purely in terms of q as:

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & 1 - \frac{q^2_2}{1-q_1} & -\frac{q_2 q_3}{1-q_1} \\  q_3 & -\frac{q_3 q_2}{1-q_1} & 1 - \frac{q^2_3}{1-q_1} \\ \end{pmatrix} \]

The Pixar paper addresses the singularity which occurs as q_1 tends towards 1. We can do the same thing here by negating the definition of the Householder matrix which still produces a matrix with the same properties. i.e.

    \[ \mathbf{P} = 2 \mathbf{v} \mathbf{v}^* - \mathbf{I} \]

In this case, the vector v is now defined as:

    \[ v_n = \left\{   \begin{array}{@{}ll@{}}     \sqrt{\frac{1 + q_1}{2}}, & \text{if}\ n = 1 \\     \frac{q_n}{2 v_1}, & \text{if}\ n > 1   \end{array}\right. \]

Which moves the singularity to occur as q_1 tends towards -1. The matrix still looks very similar:

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & \frac{q^2_2}{1+q_1} - 1 & \frac{q_2 q_3}{1+q_1} \\  q_3 & \frac{q_3 q_2}{1+q_1} & \frac{q^2_3}{1+q_1} - 1 \\ \end{pmatrix} \]

Efficient implementation of 3-dimension basis finder

As shown in the Pixar paper, there is no need to test the sign of q_1 and pick a code path to follow (which would create a branch prediction performance hit) – we can simply extract the sign and use it as a multiplier to flip signs where they need to be flipped (if they need to be flipped).

Modifying some signs in the explicit matrices given previously, we can re-write the matrices as:

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & \frac{q^2_2}{q_1-1} + 1 & \frac{q_2 q_3}{q_1-1} \\  q_3 & \frac{q_3 q_2}{q_1-1} & \frac{q^2_3}{q_1-1} + 1 \\ \end{pmatrix} \]

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & \frac{q^2_2}{q_1+1} - 1 & \frac{q_2 q_3}{q_1+1} \\  q_3 & \frac{q_3 q_2}{q_1+1} & \frac{q^2_3}{q_1+1} - 1 \\ \end{pmatrix} \]

Or more simply:

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & \frac{q^2_2}{q_1+s} - s & \frac{q_2 q_3}{q_1+s} \\  q_3 & \frac{q_3 q_2}{q_1+s} & \frac{q^2_3}{q_1+s} - s \\ \end{pmatrix} \]


    \[ s = \left\{   \begin{array}{@{}ll@{}}     1, & \text{if}\ q_1 > 0 \\     -1, & \text{if}\ q_1 < 0 \\     \pm 1, & \text{if}\ q_1 = 0   \end{array}\right. \]

And here’s the C implementation:

/* The first three values in v are a three point unit vector. The
 * function produces two more three point vectors of the same coordinate
 * space which are orthogonal. */
void ortho3(float *v)
    float q1        = v[0];
    float q2        = v[1];
    float q3        = v[2];
    float sign      = copysignf(1.0f, q1);
    float scale     = 1.0f / (q1 + sign);
    float q2scale   = q2 * scale;
    float q3scale   = q3 * scale;
    float q3q2scale = q2scale * q3;
    v[3]            = q2;
    v[4]            = q2 * q2scale - sign;
    v[5]            = q3q2scale;
    v[6]            = q3;
    v[7]            = q3q2scale;
    v[8]            = q3 * q3scale - sign;