Nick Appleton’s blog and stuff - Understanding the Levinson-Durbin Algorithm

Introduction

I’ve recently been playing with this algorithm and wanted to do a write-up of my understanding on how it works. Better references certainly exist, but I wanted to write down how my brain thinks about the algorithm if for no other reason than for my own future reference.

I’m going to focus on the algorithm and explain how to build it - it won’t be the most efficient version, but will hopefully demonstrate the concepts.

Toeplitz systems

The Levinson-Durbin algorithm finds the solution to a Toeplitz system which has a non-zero main diagonal:

\[ \begin{array}{cc} \mathbf{T}\mathbf{x} = \mathbf{y} \\ \left( \begin{array}{ccccc} t_0 & t_1 & t_2 & \dots & t_{n-1} \\ t_{-1} & t_0 & t_1 & \ddots & \vdots \\ t_{-2} & t_{-1} & t_0 & \ddots & t_2 \\ \vdots & \ddots & \ddots & \ddots & t_1 \\ t_{-(n-1)} & \dots & t_{-2} & t_{-1} & t_0 \end{array} \right) \mathbf{x} = \mathbf{y} \end{array} \]

Toeplitz systems arise frequently in signal processing problems and, whenever I see them, I think “convolution”. For example, given a discrete convolution:

\[ y[n] = \sum_k a[k] b[n-k] \]

Where \(a[k]\) is zero outside of the range \(0\leq k \leq N_a\) and \(b[k]\) is zero outside of the range \(0\leq k \leq N_b\), we can write the equation as a Toeplitz matrix vector product:

\[ \left( \begin{array}{c} y_0 \\ y_1 \\ y_2 \\ y_3 \\ \vdots \end{array}{} \right) = \left( \begin{array}{ccccc} a_0 & 0 & 0 & 0 & \dots \\ a_1 & a_0 & 0 & 0 & \ddots \\ a_2 & a_1 & a_0 & 0 & \ddots \\ a_3 & a_2 & a_1 & a_0 & \ddots \\ \vdots & \ddots & \ddots & \ddots & \ddots \end{array} \right) \left( \begin{array}{c} b_0 \\ b_1 \\ b_2 \\ b_3 \\ \vdots \end{array} \right) \]

Toeplitz matrix multiplications can be implemented using convolutions!

The algorithm

Start by solving a 2x2 Toeplitz system using Cramer’s rule:

\[ \left( \begin{array}{cc} t_0 & t_1 \\ t_{-1} & t_0 \end{array} \right) \left( \begin{array}{c} x_0 \\ x_1 \end{array} \right) = \left( \begin{array}{c} y_0 \\ y_1 \end{array} \right) \]

\[ \left( \begin{array}{c} x_0 \\ x_1 \end{array} \right) = \frac{1}{t_0^2 - t_1 t_{-1}}\left( \begin{array}{cc} t_0 & -t_1 \\ -t_{-1} & t_0 \end{array} \right) \left( \begin{array}{c} y_0 \\ y_1 \end{array} \right) \]

To move to higher order systems, let’s define some new quantities:

\(\mathbf{T}_n\) is the \(n\)-by-\(n\) Topelitz matrix that is taken as the first \(n\) rows and columns of a larger Toeplitz matrix. In the above 2x2 example:

\[ \begin{array}{rl} \mathbf{T}_2 &= \left( \begin{array}{cc} t_0 & t_1 \\ t_{-1} & t_0 \end{array} \right) \\ \mathbf{T}_1 &= t_0 \end{array} \]

\(\mathbf{u}_n\) is the \(n\)-row column vector that when multiplied by \(\mathbf{T}_n\) produces \(\mathbf{e}_1\) (i.e. the column vector which has a 1 in the top row and 0 in all other rows). See \(\mathbf{u}\), think upper - it is orthogonal to all but the top row of \(\mathbf{T}_n\).

\[ \mathbf{u}_2 = \frac{1}{t_0^2 - t_1 t_{-1}} \left( \begin{array}{c} t_0 \\ -t_{-1} \end{array} \right) \]

\(\mathbf{l}_n\) is the \(n\)-row column vector that when multiplied by \(\mathbf{T}_n\) produces \(\mathbf{e}_n\) (i.e. the column vector with a 1 in the bottom row and 0 in all other rows). See \(\mathbf{l}\), think lower - it is orthogonal to all but the bottom row of \(\mathbf{T}_n\).

\[ \mathbf{l}_2 = \frac{1}{t_0^2 - t_1 t_{-1}} \left( \begin{array}{c} -t_1 \\ t_0 \end{array} \right) \]

\(\mathbf{y}_n\) is the top \(n\) rows of the column vector \(\mathbf{y}\) being solved.
\(\mathbf{x}_n\) is the top \(n\) rows of the column vector that satisfies \(\mathbf{T}_n \mathbf{x}_n\) = \(\mathbf{y}_n\).

Let’s go ahead and define a \(\mathbf{T}_3\) matrix now:

\[ \begin{array}{rl} \mathbf{T}_3 &= \left( \begin{array}{ccc} t_0 & t_1 & t_2 \\ t_{-1} & t_0 & t_1 \\ t_{-2} & t_{-1} & t_0 \end{array} \right) \\ &= \left( \begin{array}{c|c} \mathbf{T}_2 & \begin{array}{c} t_2 \\ t_1 \end{array} \\ \hline \begin{array}{cc} t_{-2} & t_{-1} \end{array} & t_0\end{array} \right) \\ &= \left( \begin{array}{c|c} t_0 & \begin{array}{cc} t_{1} & t_{2} \end{array} \\ \hline \begin{array}{c} t_{-1} \\ t_{-2} \end{array} & \mathbf{T}_2 \end{array} \right) \end{array} \]

Notice that \(\mathbf{T}_n\) can be expressed in two different by equivalent ways in terms of \(\mathbf{T}_{n-1}\) with two new variables \(t_{n-1}\) and \(t_{-(n-1)}\) inserted into the matrix (all the other new elements are already present in \(\mathbf{T}_{n-1}\)). All \(\mathbf{T}_n\) matricies contain \(t_0\) on the diagonal.

From here we derive three new quantities \(\mathbf{\hat y}_3\), \(\mathbf{\hat u}_3\) and \(\mathbf{\hat l}_3\), and which are based on \(\mathbf{x}_2\), \(\mathbf{u}_2\) and \(\mathbf{l}_2\) that we defined earlier.

\[ \begin{array}{rl} \mathbf{\hat y}_3 &= \mathbf{T}_3 \left( \begin{array}{c} \mathbf{x}_2 \\ 0 \end{array} \right) = \left( \begin{array}{c|c} \mathbf{T}_2 & \begin{array}{c} t_2 \\ t_1 \end{array} \\ \hline \begin{array}{cc} t_{-2} & t_{-1} \end{array} & t_0\end{array} \right) \left( \begin{array}{c} \mathbf{x}_2 \\ 0 \end{array} \right) = \left( \begin{array}{c} \mathbf{y}_2 \\ r_3 \end{array}\right) \\ \mathbf{\hat u}_3 &= \mathbf{T}_3 \left( \begin{array}{c} \mathbf{u}_2 \\ 0 \end{array} \right) = \left( \begin{array}{c|c} \mathbf{T}_2 & \begin{array}{c} t_2 \\ t_1 \end{array} \\ \hline \begin{array}{cc} t_{-2} & t_{-1} \end{array} & t_0\end{array} \right) \left( \begin{array}{c} \mathbf{u}_2 \\ 0 \end{array} \right) = \left( \begin{array}{c} \mathbf{e}_1 \\ \alpha_3 \end{array}\right) \\ \mathbf{\hat l}_3 &= \mathbf{T}_3 \left( \begin{array}{c} 0 \\ \mathbf{l}_2 \end{array} \right) = \left( \begin{array}{c|c} t_0 & \begin{array}{cc} t_{1} & t_{2} \end{array} \\ \hline \begin{array}{c} t_{-1} \\ t_{-2} \end{array} & \mathbf{T}_2 \end{array} \right) \left( \begin{array}{c} 0 \\ \mathbf{l}_2 \end{array} \right) = \left( \begin{array}{c} \beta_3 \\ \mathbf{e}_2 \end{array}\right) \end{array} \]

\(\mathbf{\hat u}_3\) is almost \(\mathbf{e}_1\) except for the annoying probably-non-zero entry \(\alpha_3\) in the bottom row and \(\mathbf{\hat l}_3\) is almost \(\mathbf{e}_3\) except for a probably-non-zero entry \(\beta_3\) at the top row. These scalars have values of:

\[ \begin{array}{rl} \alpha_3 &= \left(\begin{array}{cc}t_{-2}&t_{-1}\end{array}\right) \mathbf{u}_2 \\ \beta_3 &= \left(\begin{array}{cc}t_{1}&t_{2}\end{array}\right) \mathbf{l}_2 \end{array} \]

It is a straight-forward to show that for \(n\), \(\mathbf{\hat u}_n\) and \(\mathbf{\hat l}_n\) have values of:

\[ \begin{array}{rll} \mathbf{\hat u}_n &= \mathbf{T}_n \left( \begin{array}{c} \mathbf{u}_{n-1} \\ 0 \end{array} \right) &=\left( \begin{array}{c} \mathbf{e}_1 \\ \alpha_n \end{array}\right) \\ \mathbf{\hat l}_n &= \mathbf{T}_n \left( \begin{array}{c} 0 \\ \mathbf{l}_{n-1} \end{array} \right) &= \left( \begin{array}{c} \beta_n \\ \mathbf{e}_{n-1} \end{array}\right) \end{array} \]

These values are useless independently, but because of linearity in matrix multiplication, we can use \(\mathbf{\hat u}_n\) and \(\mathbf{\hat l}_n\) to obtain \(\mathbf{u}_n\) and \(\mathbf{l}_n\):

\[ \mathbf{T}_n \left( \gamma \left( \begin{array}{c} \mathbf{u}_{n-1} \\ 0 \end{array} \right) + \delta \left( \begin{array}{c} 0 \\ \mathbf{l}_{n-1} \end{array} \right) \right) = \gamma \left( \begin{array}{c} \mathbf{e}_1 \\ \alpha_n \end{array}\right) + \delta \left( \begin{array}{c} \beta_n \\ \mathbf{e}_{n-1} \end{array}\right) \]

To make the RHS \(\mathbf{e}_1\), we must solve \(\gamma + \delta \beta_n = 1\) and \(\gamma \alpha_n + \delta=0\):

\[ \begin{array}{rl} \gamma &= \frac{1}{1 - \beta_n \alpha_n} \\ \delta &= - \frac{\alpha_n}{1 - \beta_n \alpha_n} \\ \mathbf{u}_n &= \gamma \left( \begin{array}{c} \mathbf{u}_{n-1} \\ 0 \end{array} \right) + \delta \left( \begin{array}{c} 0 \\ \mathbf{l}_{n-1} \end{array} \right) \\ &= \frac{1}{1 - \beta_n \alpha_n} \left( \left( \begin{array}{c} \mathbf{u}_{n-1} \\ 0 \end{array} \right) - \alpha_n \left( \begin{array}{c} 0 \\ \mathbf{l}_{n-1} \end{array} \right) \right) \end{array} \]

To make the RHS \(\mathbf{e}_n\), we must solve \(\gamma + \delta \beta_n = 0\) and \(\gamma \alpha_n + \delta = 1\):

\[ \begin{array}{rl} \delta &= \frac{1}{1 - \beta_n \alpha_n} \\ \gamma &= - \frac{\beta_n}{1 - \beta_n \alpha_n} \\ \mathbf{l}_n &= \delta \left( \begin{array}{c} 0 \\ \mathbf{l}_{n-1} \end{array} \right) + \gamma \left( \begin{array}{c} \mathbf{u}_{n-1} \\ 0 \end{array} \right) \\ &= \frac{1}{1 - \beta_n \alpha_n} \left( \left( \begin{array}{c} 0 \\ \mathbf{l}_{n-1} \end{array} \right) - \beta_n \left( \begin{array}{c} \mathbf{u}_{n-1} \\ 0 \end{array} \right) \right) \end{array} \]

Because \(\mathbf{T}_n \mathbf{l}_n = \mathbf{e}_n\), we know that it is an orthogonal component i.e. any multiple of it may be added to \(\mathbf{x}_n\) and it will only influence the value of \(y_{n-1}\) i.e. the scalar final row of \(\mathbf{T}_n\mathbf{x}_n\).

\[ \begin{array}{rl} \mathbf{\hat y}_3 &= \mathbf{T}_3 \left( \begin{array}{c} \mathbf{x}_2 \\ 0 \end{array} \right) = \left( \begin{array}{c} \mathbf{y}_2 \\ r_3 \end{array}\right) \\ \mathbf{y}_3 &= \mathbf{T}_3 \left( \left( \begin{array}{c} \mathbf{x}_2 \\ 0 \end{array} \right) + (y_{2} - r_3) \mathbf{l}_3 \right) = \mathbf{T}_3\mathbf{x}_3 \end{array} \]

That is, use \(\mathbf{l}_3\) to remove the erroneous quantity \(r_3\) from \(\mathbf{\hat y}_3\) and set it to the desired value i.e. the last element of \(\mathbf{y}_3\) which is \(y_{2}\). \(\mathbf{x}_3\) is now:

\[ \mathbf{x}_3 = \left( \begin{array}{c} \mathbf{x}_2 \\ 0 \end{array} \right) + (y_{2} - r_3) \mathbf{l}_3 \]

The algorithm continues to replace the \(\mathbf{l}\), \(\mathbf{u}\) and \(\mathbf{x}\) vectors until the complete solution exists in \(\mathbf{x}\).

C Implementation

The following implementation follows the naming of quantities in the above working. l, u and x are first initialised to the quantities \(\mathbf{l}_2\), \(\mathbf{u}_2\) and \(\mathbf{x}_2\). The loop then updates and extends these arrays until the solution is found. u and x get a new value appended to the end for each iteration. l gets a new element pre-pended per iteration which removes the need for a reverse-access read.

/* Solves Tx = y for x where x and y are vectors and T is Toeplitz.
 *
 * `ord` is the number of rows and columns in the matrix.
 * `t` contains the toeplitz diagonal coefficients from the bottom left to the
 *   top right (it must contain 2*ord-1 elements). t[ord-1] must be non-zero.
 * `y` contains the desired output (it must contain ord elements).
 * `x` will contain the computed solution. */
static void levindurb(double *x, const double *t, const double *y, unsigned ord) {
    const int diag_offset = ord-1;
    double    u[16];
    double    l[16];
    double    tmp;
    int       i, j;

    assert(ord < 16);

    tmp      = 1.0 / (t[diag_offset] * t[diag_offset] - t[diag_offset+1] * t[diag_offset-1]);
    u[0]     = tmp * t[diag_offset];
    u[1]     = tmp * -t[diag_offset-1];
    l[ord-2] = tmp * -t[diag_offset+1];
    l[ord-1] = u[0];
    x[0]     = y[0] * u[0] + y[1] * l[ord-2];
    x[1]     = y[0] * u[1] + y[1] * l[ord-1];
    for (i = 2; i < ord; i++) {
        double alpha, beta, r, scale;
        double *p_l = &(l[ord-i]);
        for (j = 0, alpha = 0.0, beta = 0.0, r = 0.0; j < i; j++) {
            r     += x[j]   * t[diag_offset-i+j];
            alpha += u[j]   * t[diag_offset-i+j];
            beta  += p_l[j] * t[diag_offset+1+j];
        }
        scale = 1.0 / (1.0 - beta * alpha);
        for (j = 0; j <= i; j++) {
            double ut = (j < i) ? u[j]     : 0.0;
            double lt = (j > 0) ? p_l[j-1] : 0.0;
            double xt = (j < i) ? x[j]     : 0.0;
            u[j]      = scale * (ut - alpha * lt);
            p_l[j-1]  = scale * (lt - beta * ut);
            x[j]      = xt + (y[i] - r) * p_l[j-1];
        }
    }
}