
An attempt at an intuitive description of the QR decomposition using Householder reflectors

I really like the Householder matrix. I’ve blogged about its ability to generate an orthonormal basis containing a particular vector in a previous blog post. This blog post is a bit of a tutorial which will use the Householder matrix to perform a QR decomposition. Applying Householder reflectors to compute a QR decomposition is nothing new, but I want this blog post to attempt to provide some intuition into how the algorithm works starting from almost nothing. We’ll briefly visit inner products, matrix multiplication, the Householder matrix and then build a QR decomposition in C. I’m not going to use formal language to define these operations unless it’s necessary. I’m going to assume that you understand things like matrix multiplication and what a matrix inverse is, but not much more than that. The post will be written assuming complex numbers are being used (the C implementation will not… maybe I will add one later).

The inner product

The inner product takes two vectors and produces a scalar. You can think of a vector as an array of N values representing coordinates in some N-dimensional space.

    \[ \langle \mathbf{a}, \mathbf{b} \rangle = \sum^{N}_{k=1} a_k \overline{b_k} \]

Note the bar in the above indicates conjugation. If \mathbf{a} and \mathbf{b} are row vectors, we could write the above as:

    \[ \langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{a} \mathbf{b}^H \]

Where \mathbf{b}^H is the conjugate transpose of \mathbf{b}. Two vectors are said to be orthogonal if their inner product is zero. If the inner product of a vector with itself is equal to one, the vector is said to be a unit vector. The inner product of a vector with a unit vector is the proportion of that vector that is pointing in the same direction as the unit vector, i.e. if \mathbf{u} is some unit vector and I define a new vector as:

    \[ \mathbf{b} = \mathbf{a} - \langle \mathbf{a}, \mathbf{u} \rangle \mathbf{u} \]

The vector \mathbf{b} is now orthogonal to \mathbf{u} i.e. \langle\mathbf{b},\mathbf{u}\rangle = 0.

There are some good geometric visualisations of these for two and three dimensional vectors… but everything continues to work in arbitrary dimensions. The above image contains two plots which both contain two perpendicular unit vectors (red and green) and an arbitrary vector (black). In both plots we can verify the red and green vectors are unit vectors by applying the inner product. We can also verify they are orthogonal by applying the inner product. If we compute the inner product of the black vector on the two unit vectors, then multiply each of the scalar results by the corresponding unit vectors used to compute them and sum the results, we should get back the black vector.
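If you want to convince yourself of this numerically, here is a tiny standalone C sketch of my own (the names dot, a, u and b are not part of the decomposition code later in the post) which removes the component of a vector along a unit vector and verifies the result is orthogonal:

#include <math.h>
#include <stdio.h>

/* Inner product of two real vectors of length n. */
static double dot(const double *a, const double *b, int n) {
	int i;
	double acc = 0.0;
	for (i = 0; i < n; i++)
		acc += a[i] * b[i];
	return acc;
}

int main(void) {
	double a[3] = {3.0, 4.0, 5.0};
	double u[3] = {0.6, 0.8, 0.0}; /* unit vector: 0.36 + 0.64 = 1 */
	double p = dot(a, u, 3);       /* proportion of a along u */
	double b[3];
	int i;
	/* b = a - <a, u> u */
	for (i = 0; i < 3; i++)
		b[i] = a[i] - p * u[i];
	/* <b, u> should print as zero (to rounding). */
	printf("<b, u> = %f\n", dot(b, u, 3));
	return 0;
}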

Matrix multiplication

We can define matrix multiplication using inner products of the vectors which make up the rows of the left-hand side matrix with the vectors which make up the columns of the right-hand side matrix.

    \[ \begin{tiny} \begin{pmatrix} \langle r_0, \overline{c_0} \rangle & \langle r_0, \overline{c_1} \rangle & \dots & \langle r_0, \overline{c_{N-1}} \rangle \\ \langle r_1, \overline{c_0} \rangle & \langle r_1, \overline{c_1} \rangle & \dots & \langle r_1, \overline{c_{N-1}} \rangle \\ \vdots & \vdots &  & \vdots \\ \langle r_{M-1}, \overline{c_0} \rangle & \langle r_{M-1}, \overline{c_1} \rangle & \dots & \langle r_{M-1}, \overline{c_{N-1}} \rangle \end{pmatrix} = \begin{pmatrix} \dots & r_0 & \dots \\ \dots & r_1 & \dots \\   & \vdots &  \\ \dots & r_{M-1} & \dots \end{pmatrix} \begin{pmatrix} \vdots & \vdots &  & \vdots \\ c_0 & c_1 & \dots & c_{N-1} \\ \vdots & \vdots &  & \vdots \end{pmatrix} \end{tiny} \]

We can use this to demonstrate that a matrix that is made up of rows (or columns) that are all orthogonal to each other is trivially invertible – particularly if the vectors are unit vectors, in which case the inverse of matrix \mathbf{A} is \mathbf{A}^H. To make this particularly clear:

    \[ \begin{tiny} \begin{pmatrix} \langle r_0, r_0 \rangle & \langle r_0, r_1 \rangle & \dots & \langle r_0, r_{N-1} \rangle \\ \langle r_1, r_0 \rangle & \langle r_1, r_1 \rangle & \dots & \langle r_1, r_{N-1} \rangle \\ \vdots & \vdots &  & \vdots \\ \langle r_{N-1}, r_0 \rangle & \langle r_{N-1}, r_1 \rangle & \dots & \langle r_{N-1}, r_{N-1} \rangle \end{pmatrix} = \begin{pmatrix} \dots & r_0 & \dots \\ \dots & r_1 & \dots \\   & \vdots &  \\ \dots & r_{N-1} & \dots \end{pmatrix} \begin{pmatrix} \vdots & \vdots &  & \vdots \\ \overline{r_0} & \overline{r_1} & \dots & \overline{r_{N-1}} \\ \vdots & \vdots &  & \vdots \end{pmatrix} \end{tiny} \]

If the rows are orthogonal to each other, the inner products off the main diagonal will all be zero. If the rows are unit vectors, the diagonal entries are by definition 1 and \mathbf{I} = \mathbf{A}\mathbf{A}^H.
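As a tiny concrete example, the rows of the following real matrix are orthogonal unit vectors, and multiplying by its conjugate transpose recovers the identity:

    \[ \mathbf{A} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad \mathbf{A}\mathbf{A}^H = \frac{1}{2} \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \mathbf{I} \]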

The Householder matrix

Define a matrix \mathbf{P} as:

    \[\mathbf{P}=\mathbf{I} - 2\mathbf{v}\mathbf{v}^H\]

Where \mathbf{v} is a unit vector. Then:

    \[\begin{array}{@{}ll@{}} \mathbf{P}\mathbf{P} &= \mathbf{I}-2\left(\mathbf{v}\mathbf{v}^H+\mathbf{v}\mathbf{v}^H\right)+4\mathbf{v}\mathbf{v}^H\mathbf{v}\mathbf{v}^H \\ &= \mathbf{I}-4\mathbf{v}\mathbf{v}^H+4\mathbf{v}\left(\mathbf{v}^H\mathbf{v}\right)\mathbf{v}^H \\ & = \mathbf{I} \end{array}\]

Therefore \mathbf{P} is its own inverse (\mathbf{P}=\mathbf{P}^{-1}). Because \mathbf{P} is also Hermitian by construction (\mathbf{P}=\mathbf{P}^H), it must be unitary (\mathbf{P}^H\mathbf{P}=\mathbf{I}).

If we can work out a method to choose \mathbf{v} such that \mathbf{P} will contain a particular unit vector \mathbf{u} and we multiply \mathbf{P} by any scaled version of vector \mathbf{u}, we will get a vector which has only one non-zero entry (because all other rows of the matrix will be orthogonal to \mathbf{u}). I have described this process before in a previous blog post but will repeat it here with some more detail. Define a vector \mathbf{u} that we want the first column of \mathbf{P} to equal:

    \[ \begin{array}{@{}ll@{}} \mathbf{u} &= \left( \mathbf{I} - 2 \mathbf{v} \mathbf{v}^H \right) \mathbf{e}_1 \\ \frac{\mathbf{e}_1-\mathbf{u}}{2} &= \mathbf{v} \overline{v_1} \end{array} \]

If u_1 is real, we can break \mathbf{v} into its individual coordinates to solve for their values:

    \[ \begin{array}{@{}ll@{}} v_1 &= \sqrt{\frac{1-u_1}{2}} \\ v_2 &= \frac{-u_2}{2\sqrt{\frac{1-u_1}{2}}} \\ &\vdots \\ v_n &= \frac{-u_n}{2\sqrt{\frac{1-u_1}{2}}} \end{array} \]

Given the way the matrix is defined, the computation of the square root is unnecessary in an implementation. This is best seen if we expand out the 2 \mathbf{v} \mathbf{v}^H matrix:

    \[ \begin{array}{@{}ll@{}} 2 \mathbf{v} \mathbf{v}^H &= \begin{pmatrix} 2 v_1 \overline{v_1} & 2 v_1 \overline{v_2} & \dots & 2 v_1 \overline{v_n} \\ 2 v_2 \overline{v_1} & 2 v_2 \overline{v_2} & \dots & 2 v_2 \overline{v_n} \\ \vdots & \vdots & \ddots & \vdots \\ 2 v_n \overline{v_1} & 2 v_n \overline{v_2} & \dots & 2 v_n \overline{v_n} \end{pmatrix} \\ &= \begin{pmatrix} 1 - u_1 & -\overline{u_2} & \dots & -\overline{u_n} \\ -u_2 & \frac{u_2 \overline{u_2}}{1-u_1} & \dots & \frac{u_2 \overline{u_n}}{1-u_1} \\ \vdots & \vdots & \ddots & \vdots \\ -u_n & \frac{u_n \overline{u_2}}{1-u_1} & \dots & \frac{u_n \overline{u_n}}{1-u_1} \end{pmatrix} \end{array} \]

Define a scalar \alpha=\frac{1}{1-u_1} and a vector \mathbf{w}:

    \[\mathbf{w} = \begin{pmatrix}1 - u_1 \\ -u_2 \\ \vdots \\ -u_n \end{pmatrix} \]

Then 2 \mathbf{v} \mathbf{v}^H = \alpha \mathbf{w} \mathbf{w}^H. This is a particularly useful formulation because we rarely want to compute \mathbf{P} in its entirety and perform a matrix multiplication. Rather, we compute:

    \[ \mathbf{P}\mathbf{A} = \left(\mathbf{I}-\alpha\mathbf{w}\mathbf{w}^H\right)\mathbf{A}=\mathbf{A}-\alpha\mathbf{w}\mathbf{w}^H\mathbf{A} \]

Which is substantially more efficient. One final change is necessary: if u_1 is close to one (which implies all other coefficients of \mathbf{u} are approaching zero), \alpha will begin to approach infinity. There is a relatively simple fix, which is to recognise that \mathbf{u} and -\mathbf{u} are parallel but point in different directions (their inner product is -1). If we don’t care about the direction of the vector, we can choose its sign to be the most numerically suitable for the computation of \alpha, i.e. if u_1 < 0 use \mathbf{u}, otherwise use -\mathbf{u}.
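As a sketch of how this looks in code (real-valued, with names of my own choosing; the actual decomposition routine appears later), the following builds \mathbf{w} and \alpha from a vector a and applies \mathbf{P} to a vector x without ever forming the matrix. Applying it to a copy of a itself produces (norm, 0, \dots, 0):

#include <math.h>

/* Build the Householder w vector and alpha for the n-length vector a and
 * apply P = I - alpha * w * w^T to x in place. w is scratch of length n. */
static void householder_apply(const double *a, double *w, double *x, int n) {
	int i;
	double norm = 0.0, alpha, acc;
	for (i = 0; i < n; i++)
		norm += a[i] * a[i];
	norm = sqrt(norm);
	/* Flip the sign of the norm so that u_1 = a_1 / norm is never
	 * positive, keeping alpha = 1 / (1 - u_1) away from infinity. */
	if (a[0] > 0.0)
		norm = -norm;
	w[0] = 1.0 - a[0] / norm;
	for (i = 1; i < n; i++)
		w[i] = -a[i] / norm;
	alpha = 1.0 / w[0];
	/* P x = x - alpha * w * (w . x) */
	for (acc = 0.0, i = 0; i < n; i++)
		acc += w[i] * x[i];
	acc *= alpha;
	for (i = 0; i < n; i++)
		x[i] -= acc * w[i];
}

This is exactly the shape of the update which is applied twice per iteration in the decomposition routine below.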

This is actually very useful in many applications, one of which is the QR decomposition.

The QR decomposition

The QR decomposition breaks down a matrix \mathbf{A} into a unitary matrix \mathbf{Q} and an upper-triangular matrix \mathbf{R}. There are a number of ways to do this, but we are going to use the Householder matrix.

    \[\mathbf{A} = \mathbf{Q}\mathbf{R}\]

Let’s start by defining a series of unitary transformations applied to \mathbf{A} as:

    \[\begin{array}{@{}ll@{}} \mathbf{X}_1&=\mathbf{P}_N \mathbf{A} \\ \mathbf{X}_{2} &= \begin{pmatrix} \mathbf{I}_{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{P}_{N-1} \end{pmatrix} \mathbf{X}_{1} \\ \mathbf{X}_{3} &= \begin{pmatrix} \mathbf{I}_{2} & \mathbf{0} \\ \mathbf{0} & \mathbf{P}_{N-2} \end{pmatrix} \mathbf{X}_{2} \\ & \vdots \\ \mathbf{X}_{N-1} &= \begin{pmatrix} \mathbf{I}_{N-2} & \mathbf{0} \\ \mathbf{0} & \mathbf{P}_{2} \end{pmatrix} \mathbf{X}_{N-2} \end{array} \]

Where \mathbf{P}_n are all n-by-n unitary matrices. Notice that the first row of \mathbf{X}_1 will be unmodified by all the transforms that follow. The first two rows of \mathbf{X}_2 will be unmodified by all the transforms that follow – and so on.

If we can somehow force the first row of \mathbf{P}_N to be the first column of \mathbf{A} scaled to be a unit vector (while keeping \mathbf{P}_N unitary), the first column of \mathbf{X}_1 will contain only one non-zero entry. We then set about finding a way to select \mathbf{P}_{N-1} such that the second column contains only two non-zero entries… and so on. Once the process is finished, \mathbf{X}_{N-1} is upper triangular and the unitary matrix which converts \mathbf{A} into this form can be written as:

    \[ \mathbf{Q}^H = \begin{pmatrix} \mathbf{I}_{N-2} & \mathbf{0} \\ \mathbf{0} & \mathbf{P}_{2} \end{pmatrix} \dots \begin{pmatrix} \mathbf{I}_{2} & \mathbf{0} \\ \mathbf{0} & \mathbf{P}_{N-2} \end{pmatrix}  \begin{pmatrix} \mathbf{I}_{1} & \mathbf{0} \\ \mathbf{0} & \mathbf{P}_{N-1} \end{pmatrix} \mathbf{P}_N \]

And:

    \[ \begin{array}{@{}ll@{}} \mathbf{X}_{N-1} &= \mathbf{Q}^H\mathbf{A} \\ \mathbf{Q}\mathbf{X}_{N-1}&=\mathbf{A} \end{array} \]

When we follow the above process and the \mathbf{P}_n matrices are chosen to be Householder matrices, we have performed the QR decomposition using Householder matrices.

What follows is the C code that implements the decomposition. Because the algorithm repeatedly applies transformations to \mathbf{A} to eventually arrive at \mathbf{R}, we will design an API that operates in-place on the supplied matrix. The orthogonal matrix will be built in parallel. The N-1 iterations over the i variable correspond to the N-1 transformation steps presented earlier. I make no claims about the computational performance of this code. The numerical performance could be improved by using a different method to compute the length of the vector (the first thing that happens in the column loop).

/* Perform the QR decomposition of the given square A matrix.
 * A/R is stored and written in columns i.e.
 * a[0] a[3] a[6]
 * a[1] a[4] a[7]
 * a[2] a[5] a[8]
 *
 * q is stored in rows i.e.
 * q[0] q[1] q[2]
 * q[3] q[4] q[5]
 * q[6] q[7] q[8] */
void qr(double *a_r, double *q, int N) {
	int i, j, k;
 
	/* Initialise q to be the identity matrix. */
	for (i = 0; i < N; i++) {
		for (j = 0; j < N; j++) {
			q[i*N+j] = (i == j) ? 1.0 : 0.0;
		}
	}
 
	for (i = 0; i < N-1; i++) {
		double  norm, invnorm, alpha;
		double *ccol = a_r + i*N;
 
		/* Compute length of vector (needed to perform normalisation). */
		for (norm = 0.0, j = i; j < N; j++) {
			norm += ccol[j] * ccol[j];
		}
		norm = sqrt(norm);
 
		/* Flip the sign of the normalisation coefficient to have the
		 * effect of negating the u vector. */
		if (ccol[i] > 0.0)
			norm = -norm;
		invnorm = 1.0 / norm;
 
		/* Normalise the column
		 * Convert the column in-place into the w vector
		 * Compute alpha */
		ccol[i] = (1.0 - ccol[i] * invnorm);
		for (j = i+1; j < N; j++) {
			ccol[j] = -ccol[j] * invnorm;
		}
		alpha = 1.0 / ccol[i];
 
		/* Update the orthogonal matrix */
		for (j = 0; j < N; j++) {
			double acc = 0.0;
			for (k = i; k < N; k++) {
				acc += ccol[k] * q[k+j*N];
			}
			acc *= alpha;
			for (k = i; k < N; k++) {
				q[k+j*N] -= ccol[k] * acc;
			}
		}
 
		/* Update the upper triangular matrix */
		for (j = i+1; j < N; j++) {
			double acc = 0.0;
			for (k = i; k < N; k++) {
				acc += ccol[k] * a_r[k+j*N];
			}
			acc *= alpha;
			for (k = i; k < N; k++) {
				a_r[k+j*N] -= ccol[k] * acc;
			}
		}
 
		/* Overwrite the w vector with the column data. */
		ccol[i] = norm;
		for (j = i+1; j < N; j++) {
			ccol[j] = 0.0;
		}
	}
}

The algorithm follows almost everything described on this page – albeit with some minor obfuscations (the \mathbf{w} vector is stored temporarily in the output matrix to avoid needing any extra storage). It’s worth mentioning that the first step of initialising the \mathbf{Q} matrix to \mathbf{I} isn’t necessary. The initialisation could be performed explicitly in the first iteration of the loop over i – but that introduces more conditional code.
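Here is a hypothetical test harness (not from the original post) which exercises the routine on a small 3-by-3 matrix and verifies \mathbf{A} = \mathbf{Q}\mathbf{R} under the storage conventions described in the code comments:

#include <math.h>
#include <stdio.h>

void qr(double *a_r, double *q, int N);

int main(void) {
	/* Column-major 3x3 test matrix. */
	double a[9] = {12.0, 6.0, -4.0, -51.0, 167.0, 24.0, 4.0, -68.0, -41.0};
	double acopy[9], q[9];
	double err = 0.0;
	int r, c, k;
	for (k = 0; k < 9; k++)
		acopy[k] = a[k];
	qr(a, q, 3); /* a now holds R (in columns), q holds Q (in rows) */
	/* Check A = Q * R element by element. */
	for (r = 0; r < 3; r++) {
		for (c = 0; c < 3; c++) {
			double acc = 0.0;
			for (k = 0; k < 3; k++)
				acc += q[r*3+k] * a[c*3+k];
			if (fabs(acc - acopy[c*3+r]) > err)
				err = fabs(acc - acopy[c*3+r]);
		}
	}
	printf("max reconstruction error: %g\n", err);
	return 0;
}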

Release Alignment in Sampled Pipe Organs – Part 1

At the most basic level, a sample from a digital pipe organ contains:

  • an attack transient leading into
  • a looped sustain block and
  • a release which will be cross-faded into when the note is released.

The release cross-fade must be fast (otherwise it will not sound natural or transient details may be lost) and it must also be phase-aligned to the point where the cross-fade begins.

The necessity for phase alignment

Without phase aligning the release, disturbing artefacts will likely be introduced. The effects are different with short and long cross-fades but are always unpleasant.

The following image shows an ideal cross-fade into a release sample. The crossfade begins at 0.1 seconds and lasts for 0.05 seconds. The release is aligned properly and the signal looks continuous.

A good crossfade into a release.

The following image shows a bad release where the cross-fade lags the ideal release offset by half a period. Some cancellation occurs during the cross-fade and the result will sound either something like a “pluck” for long cross-fades or a “click” for short cross-fades.

A worst-case crossfade into a release.

(The cross-fade used in generating the above data sets was a raised cosine – linear cross-fades can be used but will result in worse distortions).

The problem of aligning release cross-fades in virtual pipe organs is an interesting one. As an example: at the time of writing this article, release alignment in the GrandOrgue project is not particularly good; it uses a lookup-table taking the value and first-order estimated derivative (both quantised heavily) of the last sample of the last played block as keys. This is not optimal as a single sample says nothing about phase and the first-order derivative estimate could be completely incorrect in the presence of noise.

Another approach for handling release alignment

If the pitch of a pipe were completely stable and known (f=\frac{1}{T}), and we knew one point where the release was perfectly aligned (t_r), we could cross-fade into the start of the release at:

    \[ \forall n \in \mathbb Z, t = t_r + T n \]

Hence, for any sample offset we could compute an offset into the release to cross-fade into.
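As a sketch in code (the names here are illustrative only, not from any particular implementation), the mapping from a sustain position to a release offset is just a modulo operation:

#include <math.h>

/* Given a sample position t in the attack/sustain, one known aligned
 * position t_r and a (stable) period T, all in samples, return the
 * offset into the release at which the cross-fade should enter it. */
static double release_offset(double t, double t_r, double T) {
    /* How far past the most recent alignment point we are. */
    double phase = fmod(t - t_r, T);
    if (phase < 0.0)
        phase += T;
    return phase;
}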

In reality, pipe pitch wobbles around a bit and so the above would not strictly hold all the time – that being said, it is true for much of the time. If we could take a pipe sample and find all of the points where the release is aligned we could always find the best way to align the release.

It turns out that a simple way to do this is to find the cross-correlation of the attack and sustain segment with a short portion of the release. Taking the whole release would be problematic because as it decays it becomes less similar to the sustaining segment (which leads to an unhelpful correlation signal).
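A minimal sketch of that search (mono, unnormalised and brute-force; my analysis program additionally normalises and sums the correlations of the two channels) might look like:

/* Cross-correlate a short release segment (rel, rellen samples) against the
 * attack/sustain signal (sig, siglen samples). corr must have space for
 * siglen - rellen + 1 values; corr[i] holds the correlation of the release
 * segment with the signal starting at offset i. */
static void xcorr(const double *sig, int siglen,
                  const double *rel, int rellen, double *corr) {
    int i, j;
    for (i = 0; i + rellen <= siglen; i++) {
        double acc = 0.0;
        for (j = 0; j < rellen; j++)
            acc += sig[i+j] * rel[j];
        corr[i] = acc;
    }
}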

The first 25000 samples of the signal used for the cross-correlation.

The above image shows the attack and some sustain of bottom-C of the St. Augustine’s Closed Horn. This shows visually why single-sample amplitude and derivative matching is a poor way to align releases: during one period of the closed horn there are 14 zero crossings in the signal and 16 obvious zero crossings in its derivative. One sample gives hardly enough information.

A 1024 sample cut from the start of the release.

The above image shows a 1024 sample segment taken from the release marker of the same Closed Horn sample. It contains just over a single period of the horn.

The next image shows the cross-correlation of this release segment with the sample itself. My analysis program does correlation of the left and right channels and sums them to provide an overall correlation. Positive maximums correspond to points where the release will phase-align well. Minimums correspond to points where the signal has the least correlation to the release.

Normalised cross correlation of the signal with the release segment.

Using the correlation and a pitch guesstimate, we could construct a function which, given any sample offset in the attack/sustain, produces an offset into the release to cross-fade into. This is for next time.

Understanding the Modified Discrete Cosine Transform (MDCT)

After playing around with discrete cosine transforms, I thought I would implement an MDCT and document my understanding of how everything works. I use some similar techniques to those used on the Wikipedia page as they are helpful for understanding but will add some scanned drawings which I think help (I’m not even close to being clever enough to get a computer to draw these for me).

Prerequisites

The only real background knowledge which I think is relevant to understanding the MDCT is the data extensions which the DCT-4 transform assumes.

First DCT-4 Basis Function with Shifted 2N Sample Input

I’ll refer to the above image in the Forward Transform overview, but for the time being, only pay attention to the solid quarter wave. This is the first basis function (i.e. k=0) for an N length DCT-4. If the basis is continued past N, it has a repeating symmetrical pattern (the dashed line in the image) which repeats every 4N samples. The symmetry is even around -0.5 and odd around N-0.5, and holds for every basis function of the DCT-4, i.e. the DCT-4 assumes that the input data continues on forever, repeating itself in the following manner: x_n, -x_{N-n-1}, -x_n, x_{N-n-1}.
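For example, with N=2 and input (x_0, x_1), one full period of the assumed extension reads:

    \[ x_0,\ x_1,\ -x_1,\ -x_0,\ -x_0,\ -x_1,\ x_1,\ x_0 \]

which is even about n=-0.5 (the sample at n=-1 matches the sample at n=0) and odd about n=N-0.5=1.5 (the sample at n=2 is the negation of the sample at n=1).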

Forward Transform

The MDCT takes in 2N real data points and produces N real outputs. These inputs are designed to overlap, so the first half of the input data should be the second half of the input data of the previous call. The definition is:

    \[ X_k = \displaystyle\sum\limits_{n=0}^{2N-1} x_n \cos \frac{ \pi \left( n + 0.5 + N/2 \right) \left( k + 0.5 \right) }{N} \]

It should be trivial to see from the above that the MDCT can be computed using a DCT-4 with an extended number of input data points, all of which have been shifted by half a basis. Go back to the crappy drawing and notice the concatenated N/2 length sequences a, b, c and d. The total length of this sequence is 2N and it begins at N/2 (or half the length of a basis function). We need to get b, c and d back into the N point region if we want to compute the MDCT using a DCT-4; this can be achieved with the following concatenated sequence (I will subscript these sequences with r to denote a reversal of the sequence):

    \[ - c_r - d , a - b_r \]

If we take the DCT-4 of this concatenated sequence, we have found the MDCT of the input sequence.

Inverse Transform

The inverse MDCT or IMDCT takes in N real data points and produces 2N real outputs. In this transform, the outputs should overlap such that the first half of the output should be added to the second half of the output data in the previous call. The definition is:

    \[ x_n = \frac{1}{N} \displaystyle\sum\limits_{k=0}^{N-1} X_k \cos \frac{ \pi \left( n + 0.5 + N/2 \right) \left( k + 0.5 \right) }{N} \]

Because we know how the DCT-4 assumes the input and output data repeats in a symmetric pattern, we can get this data trivially in exactly the same fashion as we did in the forward transform. In the following illustration, we take the output from the forward transform and extend it along the basis:

Extended Projection of the MDCT Output on the First DCT-4 Basis

In output row zero, we can see how to extend the input sequence to obtain the 2N points required. We then see in rows two and three how summing the overlapping blocks causes the aliased sequences to cancel in subsequent calls to the IMDCT.

World’s Dumbest C MDCT Implementation

I validated all this actually works with a small C program. Following are the MDCT/IMDCT implementations I came up with… ignore the “twid” input; I cache the modulation factors for the FFT which gets called in the dct4 routine:

/* state should contain twice as many elements as the input buffer (N) and
 * should have all elements initialized to zero prior to the first call. The
 * output buffer is actually the first N elements of state after calling. */
void mdct(double *state, const double *input, double *twid, unsigned lenbits)
{
    unsigned rl = 1u << lenbits;
    unsigned i;
    /* Alias the input data with the previous block. */
    for (i = 0; i < rl / 2; i++) {
        state[i]        = - input[rl/2+i]      - input[rl/2-i-1];
        state[rl/2+i]   =   state[rl+i]        - state[rl+rl-i-1];
    }
    /* Save the input block */
    for (i = 0; i < rl; i++)
        state[rl+i]     = input[i];
    /* DCT-4 */
    dct4(state, lenbits, twid);
}
 
/* state should contain twice as many elements as the input buffer (N) and
 * should have all elements initialized to zero prior to the first call. The
 * output buffer is actually the first N elements of state after calling. */
void imdct(double *state, const double *input, double *twid, unsigned lenbits)
{
    unsigned rl = 1u << lenbits;
    unsigned i;
    /* Collect contributions from the previous frame to the output buffer */
    for (i = 0; i < rl / 2; i++) {
        state[i]        = - state[rl+rl/2-i-1];
        state[rl/2+i]   = - state[rl+i];
    }
    /* Load the input and run the DCT-4 */
    for (i = 0; i < rl; i++)
        state[rl+i]     = input[i];
    dct4(state + rl, lenbits, twid);
    /* Sum contributions from this frame to the output buffer and perform the
     * required scaling. */
    for (i = 0; i < rl / 2; i++) {
        state[i]        = (state[i]      + state[rl+rl/2+i]) / rl;
        state[rl/2+i]   = (state[rl/2+i] - state[rl+rl-i-1]) / rl;
    }
}

Windowed MDCT Implementation

Typical MDCT implementations will window the input and output data (this can also be thought of as windowing the basis functions – which I think is a more helpful way to understand what is happening). It is really important to note that the window function must be carefully chosen to ensure that the basis functions remain orthogonal! The window makes the basis functions always begin and end near zero. The process has the side effect of de-normalising the basis functions (unless the window is rectangular) and means there will be a window-dependent scaling factor which will need to be applied at the output to achieve perfect reconstruction. The following images show the second basis function of the MDCT both un-windowed and windowed with a half-sine window (given at the end of the post).

Second MDCT Basis Function

Sine Windowed Second MDCT Basis Function

In a lossy codec this windowing process is somewhat necessary because if the start and end points are not close to zero, the output is likely to periodically glitch for even the slightest errors in the reconstructed MDCT data. This glitching will occur at the boundaries of the transform (i.e. every N points).

We can work out the necessary conditions for the window to obtain perfect reconstruction using the previous drawings (I’d steer away from equations for this one – it’s easier to validate the results visually) by applying a window function split into 4 segments to each of the input blocks. I’ll do the generic case for a symmetrical window which is applied to both the input and the output. We split the window (which has length 2N) into four segments which will be applied to our original input segments a, b, c and d. Because we are defining this window to be symmetric, we can call the pieces:

    \[ u, v, v_r, u_r \]

Symmetrical Window Impact on MDCT

The above illustration shows how our window segments are applied to the input data and the impact that has on the DCT-4 analysed data blob. Following that are the output segments from two sequential IMDCT calls, with the windows applied to the output here as well.

We need to make the overlapping terms equal the required output segment i.e.

    \[ c = v_r \left( d_r u + c v_r \right) + u \left( c u - d_r v_r \right) \]

    \[ d = u_r \left( d u_r + c_r v \right) + v \left( d v - c_r u_r \right) \]

It is clear from the above that the necessary condition to achieve reconstruction is for v_r^2 + u^2 = 1 (which implies in this case that v^2 + u_r^2 = 1 must also be true).

A simple solution to this is:

    \[ w_n = \sin \frac{ \pi \left( n + 0.5 \right) }{2N} \]

The output requires a scaling factor of 2 for this window.
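As a quick standalone check (my own, not from the original post), the condition v_r^2 + u^2 = 1 for this window reduces elementwise to w_n^2 + w_{n+N}^2 = 1, which the following program verifies numerically:

#include <math.h>
#include <stdio.h>

int main(void) {
    const double pi = 3.14159265358979323846;
    enum { N = 8 }; /* transform length; the window spans 2N points */
    double w[2*N];
    double worst = 0.0;
    int n;
    for (n = 0; n < 2*N; n++)
        w[n] = sin(pi * (n + 0.5) / (2*N));
    /* Perfect reconstruction requires w_n^2 + w_{n+N}^2 = 1. */
    for (n = 0; n < N; n++) {
        double e = fabs(w[n]*w[n] + w[n+N]*w[n+N] - 1.0);
        if (e > worst)
            worst = e;
    }
    printf("worst deviation from the PR condition: %g\n", worst);
    return 0;
}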

Derivation of fast DCT-4 algorithm based on DFT

It’s well known that an N point DCT-4 can be computed using an N/2 point complex FFT. Although the algorithm is widespread, the texts which I have read on the subject have not provided the details as to how it works. I’ve been trying to understand this for a while (I tend not to use algorithms until I understand them) and finally figured it out and thought I would share (please provide comments/links if you can find a better or shorter explanation – this is as good as I could get).

Start with the definition:

    \[ X_k = \displaystyle\sum\limits_{n=0}^{N-1} x_n \cos \frac{ \pi \left( n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N} \]

Trig functions are hard to manipulate in expressions (for me anyway), so knowing that x_n is real, we change the \cos into a complex exponential and obtain the following:

    \[ X_k = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N-1} x_n \mathrm{e}^{\frac{\mathrm{j} \pi \left( n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N}} \right\} \]

It is worth noting that the sign of the exponent is irrelevant here, since x_n is real and we only keep the real part.

We then split the expression up to operate over two half-length sequences composed of x_{2n} and x_{N-1-2n}. The second sequence is reversed and decimated because when n is replaced by N-1-2n in the n + \frac{1}{2} term of the exponential, the whole term is negated rather than the offset being modified. We move the resulting N term into another exponential, which becomes trivial as N cancels the denominator, i.e.:

    \[ X_k = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} x_{2n} \mathrm{e}^{\frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N}} + x_{N-1-2n} \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N}} \mathrm{e}^{\mathrm{j} \pi \left( k + \frac{1}{2} \right)} \right\} \]

Or:

    \[ X_k = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} x_{2n} \mathrm{e}^{\frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N}} + \mathrm{j} {\left( -1 \right)}^{k} x_{N-1-2n} \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N}} \right\} \]

The exponentials still contain no term which resembles a DFT (over length N/2 anyway). To get one, we break up X_k into the terms X_{2k} and X_{N-1-2k}. Again we choose to reverse the second sequence because it keeps the exponential terms in a similar but negated form.

    \[ X_{2k} = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} x_{2n} \mathrm{e}^{\frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} + \mathrm{j} x_{N-1-2n} \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} \right\} \]

    \[ X_{N-1-2k} = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} \mathrm{j} x_{2n} \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} - x_{N-1-2n} \mathrm{e}^{\frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} \right\} \]

Now because we are only interested in the real terms, we can ignore or conjugate anything which contributes to the imaginary term of the above expressions. This means that we can conjugate the exponentials when they are only modulating a real term. Doing this, we obtain:

    \[ X_{2k} = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} \left( x_{2n} + \mathrm{j} x_{N-1-2n} \right) \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} \right\} \]

    \[ X_{N-1-2k} = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} \left( \mathrm{j} x_{2n} - x_{N-1-2n} \right) \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} \right\} \]

Almost there, but to obtain the full benefit we need the inner terms to be the same. Noting that \mathrm{j} x_{2n} - x_{N-1-2n} = \mathrm{j} \left( x_{2n} + \mathrm{j} x_{N-1-2n} \right) and that \Re \left\{ \mathrm{j} z \right\} = - \Im \left\{ z \right\}, we get:

    \[ X_{N-1-2k} = - \Im \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} \left( x_{2n} + \mathrm{j} x_{N-1-2n} \right) \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} \right\} \]

Done. If we expand out the X_{2k} and X_{N-1-2k} terms completely we get:

    \[ X_{2k} = \Re \left\{ \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2k + \frac{1}{2} \right) }{2N}} \displaystyle\sum\limits_{n=0}^{N/2-1} \left( x_{2n} + \mathrm{j} x_{N-1-2n} \right) \mathrm{e}^{- \frac{\mathrm{j} \pi n}{N}} \mathrm{e}^{- \frac{\mathrm{j} 2 \pi n k }{N/2}} \right\} \]

    \[ X_{N-1-2k} = - \Im \left\{ \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2k + \frac{1}{2} \right) }{2N}} \displaystyle\sum\limits_{n=0}^{N/2-1} \left( x_{2n} + \mathrm{j} x_{N-1-2n} \right) \mathrm{e}^{- \frac{\mathrm{j} \pi n}{N}} \mathrm{e}^{- \frac{\mathrm{j} 2 \pi n k }{N/2}} \right\} \]

Which gives us the steps for a DFT-based DCT-4 algorithm:

  • Transform the N point real sequence x_n into the N/2 point complex sequence y_n = x_{2n} + \mathrm{j} x_{N-1-2n}.
  • Multiply each element of the sequence y_n by \mathrm{e}^{- \frac{\mathrm{j} \pi n}{N}}.
  • Find Y, the DFT of the sequence y.
  • Multiply each element of Y_k by \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2k + \frac{1}{2} \right) }{2N}}.
  • The DCT-4 outputs are given by:

        \[ X_k = \left\{ \begin{array}{l l} \Re \left\{ Y_{k/2} \right\} & \quad \text{for even $k$} \\ - \Im \left\{ Y_{(N-1-k)/2} \right\} & \quad \text{for odd $k$} \end{array} \right. \]
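To tie the steps together, here is an unoptimised reference sketch of my own (it uses a direct O(N^2) DFT in place of the N/2 point FFT so that it stays self-contained, and is not the dct4 routine referenced in the MDCT code above):

#include <complex.h>
#include <math.h>

/* DCT-4 of the N point real sequence x (N even), computed in place using
 * the steps above. A real implementation would replace the middle loop
 * with an N/2 point complex FFT. */
static void dct4_ref(double *x, unsigned N) {
    const double pi = 3.14159265358979323846;
    unsigned M = N / 2;
    unsigned n, k;
    double complex y[M], Y[M];
    /* Steps 1 and 2: pack into a complex sequence and pre-twiddle. */
    for (n = 0; n < M; n++)
        y[n] = (x[2*n] + I * x[N-1-2*n]) * cexp(-I * pi * n / N);
    /* Step 3: DFT of length N/2 (naive here, FFT in practice). */
    for (k = 0; k < M; k++) {
        double complex acc = 0.0;
        for (n = 0; n < M; n++)
            acc += y[n] * cexp(-I * 2.0 * pi * (double)(n * k) / M);
        Y[k] = acc;
    }
    /* Steps 4 and 5: post-twiddle and interleave the outputs. */
    for (k = 0; k < M; k++) {
        double complex z = Y[k] * cexp(-I * pi * (2.0 * k + 0.5) / (2.0 * N));
        x[2*k]     =  creal(z);
        x[N-1-2*k] = -cimag(z);
    }
}

Comparing the output of this routine against a direct evaluation of the DCT-4 definition for small N is a straightforward way to validate the derivation.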