
Given a vector: find N orthogonal vectors.

When writing eigensolvers, it’s useful to be able to generate an orthonormal basis containing a particular vector. Another way of saying this is: given some fixed vector, find other vectors that are perpendicular to it and to each other. I am under the impression from some websites that this is also useful in computer graphics (but I know nothing about that – let me know if you do!).

I’ve seen people ask how to do this on the internet (for example: here and here) and in general the recommended process is:

  • If only one perpendicular vector is required, use a heuristic to come up with something perpendicular (see previous links).
  • If two perpendicular vectors are required, create one perpendicular vector using the above heuristic then use the cross-product to come up with the final vector.
  • For more than two perpendicular vectors, generate random vectors and use Gram Schmidt to (hopefully) orthogonalise them.
  • For 3D graphics, see the “Update” heading below…

The first two usually require branching in the heuristic. The third may not work (if you’re really, really unlucky) and requires multiple normalisation operations. What follows is an approach which I haven’t seen documented elsewhere and which requires one square root for a vector set of arbitrary dimension.

What we want is some technique which creates an orthogonal matrix from a vector. This is exactly what the Householder transformation matrix does:

    \[ \mathbf{P} = \mathbf{I} - 2 \mathbf{v} \mathbf{v}^* \]

Where \mathbf{v} is some unit vector. The matrix \mathbf{P} is Hermitian and unitary, so its columns (and rows) form an orthonormal basis. If we can find a simple expression for \mathbf{v} which will force the first column of \mathbf{P} to be equal to some unit vector \mathbf{q}, we have all the information we need to create a basis containing \mathbf{q}. This turns out to be trivial:

    \[ \mathbf{q} = \left( \mathbf{I} - 2 \mathbf{v} \mathbf{v}^* \right) \mathbf{e}_1 \]

    \[ \mathbf{q} = \mathbf{e}_1 - 2 \mathbf{v} \mathbf{v}^* \mathbf{e}_1 \]

    \[ \frac{\mathbf{e}_1 - \mathbf{q}}{2} = \mathbf{v} v^*_1 \]

Explicitly, this looks like:

    \[ \begin{pmatrix} \frac{1 - q_1}{2} \\ -\frac{q_2}{2} \\ -\frac{q_3}{2} \\ \vdots \\ -\frac{q_N}{2} \end{pmatrix} = v^*_1 \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_N \end{pmatrix} \]

If \mathbf{q} is real (i.e. v^*_1 = v_1 ), we set:

    \[ v_n = \left\{   \begin{array}{@{}ll@{}}     \sqrt{\frac{1 - q_1}{2}}, & \text{if}\ n = 1 \\     -\frac{q_n}{2 v_1}, & \text{if}\ n > 1   \end{array}\right. \]

Here is a very inefficient C implementation (the square root is not actually necessary – this can be seen in the fact that two v_n entries are always multiplied together and the term in the square root is constant. Also, the innermost loop over j should start from 1 because when j==0, the value is simply q_{i+1} ):

/* The first n values in m are a normalised vector which is desired to be
 * part of the basis set. Following this vector, n-1 new vectors of length n
 * will be added which are orthogonal to it. */
void house_ortho(double *m, unsigned n)
{
    const double x = sqrt(1.0 - m[0]);
    /* As m[0] approaches 1, x approaches 0. This detects that and will end
     * up forcing the result to be the identity matrix. */
    const double s = (x < 1e-20) ? 0.0 : (-1.0 / x);
    unsigned i, j;
    /* We first convert the m vector into the v vector. Note that we have
     * incorporated the multiply by 2 present in the Householder operation
     * into v by multiplying every element by sqrt(2). */
    for (m[0] = x, i = 1; i < n; i++)
        m[i] *= s;
    /* Create all of the orthogonal vectors using the definition of the
     * Householder transformation matrix: I - 2 v v*. */
    for (i = 1; i < n; i++) {
        for (j = 0; j < n; j++) {
            m[i*n+j] = ((i == j) ? 1.0 : 0.0) - m[i] * m[j];
        }
    }
    /* Convert the v vector back into m. Probably could have written this
     * code in a way where this didn't need to happen. */
    for (m[0] = 1.0 - x * x, j = 1; j < n; j++)
        m[j] = -m[j] * x;
}

The termination condition for the for loop (over i) in the middle of the code sample could be modified to generate a smaller number of vectors.
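A minimal sanity check might look like this: build a basis from a normalised 4-dimensional vector and print the pairwise dot products, which should come out as (approximately) the identity matrix:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double m[16] = {0.5, 0.5, 0.5, 0.5}; /* already normalised */
    unsigned i, j, k;
    house_ortho(m, 4);
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 4; j++) {
            double dot = 0.0;
            for (k = 0; k < 4; k++)
                dot += m[i*4+k] * m[j*4+k];
            printf("%9.6f ", dot);
        }
        printf("\n");
    }
    return 0;
}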

This process could be modified to work with a complex-valued \mathbf{q}, but it is slightly more complicated because \mathbf{P} is Hermitian (which implies that all values on the diagonal are real). The modification amounts to finding a unit-magnitude complex scalar which forces q_1 to be real, multiplying \mathbf{q} by this scalar, then dividing all resultant vectors by the same factor. I haven’t tested this, but it should work.
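For what it’s worth, here is an equally untested sketch of that modification (the function name and the variable ph are purely illustrative): rotate \mathbf{q} by the conjugate phase of q_1 so the first element becomes real, build the columns of the Householder matrix as in the real case, then rotate every resulting vector back by the same factor:

#include <complex.h>
#include <math.h>

/* Untested sketch of the complex variant described above. The first n
 * values in m hold the unit vector q; on return, each row of the n x n
 * matrix m is one vector of an orthonormal basis whose first row is q. */
void house_ortho_cplx(double complex *m, unsigned n)
{
    /* Unit-magnitude scalar carrying the phase of q_1 (1 if q_1 is 0). */
    const double complex ph = (cabs(m[0]) > 0.0) ? m[0] / cabs(m[0]) : 1.0;
    const double x = sqrt(1.0 - cabs(m[0])); /* sqrt(2) v_1 of the rotated q */
    const double s = (x < 1e-20) ? 0.0 : (-1.0 / x);
    unsigned i, j;
    /* Rotate q so its first element is real and build sqrt(2) v in place. */
    m[0] = x;
    for (i = 1; i < n; i++)
        m[i] = conj(ph) * m[i] * s;
    /* Rows 1..n-1 take the columns of ph (I - 2 v v*). */
    for (i = 1; i < n; i++)
        for (j = 0; j < n; j++)
            m[i*n+j] = ph * (((i == j) ? 1.0 : 0.0) - m[j] * conj(m[i]));
    /* Row 0 becomes the original q itself. */
    for (m[0] = ph * (1.0 - x * x), j = 1; j < n; j++)
        m[j] = ph * (-m[j] * x);
}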

Update

I posted a link to this blog on Twitter and reached out to see if anyone else had used this method. Warren Moore helpfully responded and pointed me at this paper “Building an Orthonormal Basis, Revisited” (Pixar) which describes numerical and implementation improvements to an algorithm proposed in this paper “Building an Orthonormal Basis from a 3D Unit Vector Without Normalization” (Frisvad, J. R.).

The algorithm which I’ve described – finding the vector v which creates the Householder matrix P whose first column equals some unit vector q – ends up producing exactly the same equations (given a vector of dimension 3) as those in Frisvad’s paper, up to a few sign changes, albeit I’ve gone about defining the problem in a slightly different way. We can demonstrate the equivalence by expanding out the resulting Householder matrix given the v vector we derived earlier:

    \[ v_n = \left\{   \begin{array}{@{}ll@{}}     \sqrt{\frac{1 - q_1}{2}}, & \text{if}\ n = 1 \\     -\frac{q_n}{2 v_1}, & \text{if}\ n > 1   \end{array}\right. \]

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & 1 - 2 v_2 v_2 & -2 v_2 v_3 \\  q_3 & -2 v_2 v_3 & 1 - 2 v_3 v_3 \\ \end{pmatrix} \]

Note: we don’t actually need v_1 because the Householder matrix is symmetric. The matrix can be expressed purely in terms of q as:

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & 1 - \frac{q^2_2}{1-q_1} & -\frac{q_2 q_3}{1-q_1} \\  q_3 & -\frac{q_3 q_2}{1-q_1} & 1 - \frac{q^2_3}{1-q_1} \\ \end{pmatrix} \]

The Pixar paper addresses the singularity which occurs as q_1 tends towards 1. We can do the same thing here by negating the definition of the Householder matrix which still produces a matrix with the same properties. i.e.

    \[ \mathbf{P} = 2 \mathbf{v} \mathbf{v}^* - \mathbf{I} \]

In this case, the vector v is now defined as:

    \[ v_n = \left\{   \begin{array}{@{}ll@{}}     \sqrt{\frac{1 + q_1}{2}}, & \text{if}\ n = 1 \\     \frac{q_n}{2 v_1}, & \text{if}\ n > 1   \end{array}\right. \]

Which moves the singularity to occur as q_1 tends towards -1. The matrix still looks very similar:

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & \frac{q^2_2}{1+q_1} - 1 & \frac{q_2 q_3}{1+q_1} \\  q_3 & \frac{q_3 q_2}{1+q_1} & \frac{q^2_3}{1+q_1} - 1 \\ \end{pmatrix} \]

Efficient implementation of a 3-dimensional basis finder

As shown in the Pixar paper, there is no need to test the sign of q_1 and pick a code path to follow (which would create a branch prediction performance hit) – we can simply extract the sign and use it as a multiplier to flip signs where they need to be flipped (if they need to be flipped).

Modifying some signs in the explicit matrices given previously, we can re-write the matrices as:

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & \frac{q^2_2}{q_1-1} + 1 & \frac{q_2 q_3}{q_1-1} \\  q_3 & \frac{q_3 q_2}{q_1-1} & \frac{q^2_3}{q_1-1} + 1 \\ \end{pmatrix} \]

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & \frac{q^2_2}{q_1+1} - 1 & \frac{q_2 q_3}{q_1+1} \\  q_3 & \frac{q_3 q_2}{q_1+1} & \frac{q^2_3}{q_1+1} - 1 \\ \end{pmatrix} \]

Or more simply:

    \[ \begin{pmatrix}  q_1 & q_2 & q_3 \\  q_2 & \frac{q^2_2}{q_1+s} - s & \frac{q_2 q_3}{q_1+s} \\  q_3 & \frac{q_3 q_2}{q_1+s} & \frac{q^2_3}{q_1+s} - s \\ \end{pmatrix} \]

Where:

    \[ s = \left\{   \begin{array}{@{}ll@{}}     1, & \text{if}\ q_1 > 0 \\     -1, & \text{if}\ q_1 < 0 \\     \pm 1, & \text{if}\ q_1 = 0   \end{array}\right. \]

And here’s the C implementation:

/* The first three values in v are a 3-dimensional unit vector. The
 * function produces two more 3-dimensional vectors in the same coordinate
 * space which are orthogonal to it and to each other. */
void ortho3(float *v)
{
    float q1        = v[0];
    float q2        = v[1];
    float q3        = v[2];
    float sign      = copysignf(1.0f, q1);
    float scale     = 1.0f / (q1 + sign);
    float q2scale   = q2 * scale;
    float q3scale   = q3 * scale;
    float q3q2scale = q2scale * q3;
    v[3]            = q2;
    v[4]            = q2 * q2scale - sign;
    v[5]            = q3q2scale;
    v[6]            = q3;
    v[7]            = q3q2scale;
    v[8]            = q3 * q3scale - sign;
}
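A minimal usage sketch: pass in a unit vector and check the pairwise dot products, which should all print as (approximately) zero:

#include <math.h>
#include <stdio.h>

int main(void)
{
    float v[9] = {0.6f, 0.8f, 0.0f}; /* unit length */
    ortho3(v);
    printf("v0.v1 = %f\n", v[0]*v[3] + v[1]*v[4] + v[2]*v[5]);
    printf("v0.v2 = %f\n", v[0]*v[6] + v[1]*v[7] + v[2]*v[8]);
    printf("v1.v2 = %f\n", v[3]*v[6] + v[4]*v[7] + v[5]*v[8]);
    return 0;
}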

Three step digital Butterworth design.

This guide omits a fair amount of detail in order to stay quick and practical.

1) Design in the s-domain

Butterworth filters have their poles evenly spaced on a circular arc in the left half of the [s/Laplace/analogue]-domain plane.

For a low-pass filter of order N, the complex valued pole locations are:

    \[ p_n = \omega_0 e^\frac{j \pi (2 n + N + 1)}{2 N} \]

Where 0 \le n < N and \omega_0 is the cutoff frequency in radians per second.

The transfer function in the s domain is:

    \[ H(s) = \frac{\omega_0^N}{\prod\limits_{n=0}^{N-1}(s - p_n)} \]

Because the poles always occur in conjugate pairs for this kind of filter, the denominator can always be expanded out into a polynomial with real-valued coefficients a_n:

    \[ H(s) = \frac{\omega_0^N}{\sum\limits_{n=0}^{N} a_n s^n} \]
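A sketch of that expansion in code (the helper below is illustrative, not from any library): compute each pole from the formula above and multiply out the denominator one factor at a time using complex arithmetic. Because the poles occur in conjugate pairs, the imaginary parts cancel at the end:

#include <complex.h>
#include <math.h>

/* Fill a[0..N] with the real denominator coefficients (ascending powers
 * of s) of an N-th order Butterworth low-pass with cutoff w0. Assumes
 * N <= 31 for the temporary buffer. */
void butter_poly(double *a, unsigned N, double w0)
{
    double complex c[32] = {1.0};
    unsigned n, k;
    for (n = 0; n < N; n++) {
        const double complex p =
            w0 * cexp(I * M_PI * (2.0*n + N + 1.0) / (2.0*N));
        /* Multiply the running polynomial by (s - p). */
        c[n+1] = c[n];
        for (k = n; k > 0; k--)
            c[k] = c[k-1] - p * c[k];
        c[0] *= -p;
    }
    /* Conjugate-pair poles leave only real parts. */
    for (k = 0; k <= N; k++)
        a[k] = creal(c[k]);
}

For example, butter_poly(a, 3, 1.0) should yield a = {1, 2, 2, 1} – the familiar normalised third-order Butterworth denominator – and the constant term a_0 always comes out as \omega_0^N, matching the numerator above.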

2) Convert the s-domain transfer function to the z-domain.

There are a few methods to convert an s-domain transfer function into a [z/digital/discrete]-domain transfer function. The most common is the bilinear transform, which makes the substitution:

    \[ s = \alpha \frac{z - 1}{z + 1} \]

Where:

    \[ \alpha = \frac{2}{T} \]

Or if you’re performing “frequency warping”:

    \[ \alpha = \frac{\omega_0}{\tan(\frac{\omega_0 T}{2})} \]

I won’t go into verbose details here, but you probably want frequency warping. The bilinear transform is a linear approximation to the function s=\frac{1}{T}\ln(z), which maps the s-domain to the digital z-domain; the approximation is most accurate around z=1 (i.e. DC). The frequency warping operation moves the point where most accuracy is preserved to \omega_0. For higher frequency cutoffs this becomes important, so \omega_0 can be set to the cutoff frequency of the filter to improve the frequency response near the transition.

For a third-order system, the conversion yields (see the notes at the end of this post about risks):

    \[ \frac{b_3 s^3 + b_2 s^2 + b_1 s + b_0}{a_3 s^3 + a_2 s^2 + a_1 s + a_0} \leftrightarrow \frac{b_3 \alpha^3 (z - 1)^3 + b_2 \alpha^2 (z - 1)^2 (z + 1) + b_1 \alpha (z - 1) (z + 1)^2 + b_0 (z + 1)^3}{a_3 \alpha^3 (z - 1)^3 + a_2 \alpha^2 (z - 1)^2 (z + 1) + a_1 \alpha (z - 1) (z + 1)^2 + a_0 (z + 1)^3} \]

Which can be expanded into a rational polynomial transfer function in z.
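That expansion is mechanical enough to sketch directly (the helper below is illustrative): expand each (z - 1)^k (z + 1)^{3-k} term into powers of z and accumulate, calling the function once for the numerator coefficients and once for the denominator. \alpha is 2/T, or \omega_0 / \tan(\omega_0 T / 2) with frequency warping:

/* Map third-order s-domain coefficients cs (ascending powers: cs[0] +
 * cs[1] s + cs[2] s^2 + cs[3] s^3) to z-domain coefficients cz
 * (ascending powers of z) under s = alpha (z - 1)/(z + 1). */
void bilinear3(double *cz, const double *cs, double alpha)
{
    const double a1 = alpha, a2 = alpha * alpha, a3 = a2 * alpha;
    /* Ascending-power expansions of the four basis products:
     * (z+1)^3      =  1 + 3z + 3z^2 + z^3
     * (z-1)(z+1)^2 = -1 -  z +  z^2 + z^3
     * (z-1)^2(z+1) =  1 -  z -  z^2 + z^3
     * (z-1)^3      = -1 + 3z - 3z^2 + z^3 */
    cz[0] =     cs[0] - a1*cs[1] + a2*cs[2] -     a3*cs[3];
    cz[1] = 3.0*cs[0] - a1*cs[1] - a2*cs[2] + 3.0*a3*cs[3];
    cz[2] = 3.0*cs[0] + a1*cs[1] - a2*cs[2] - 3.0*a3*cs[3];
    cz[3] =     cs[0] + a1*cs[1] + a2*cs[2] +     a3*cs[3];
}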

3) Convert from the z-domain to a difference equation.

First, you must make your z-domain expression causal: positive powers of z must be removed, otherwise your filter would need to know the future! This is simple – just multiply the numerator and denominator of the transfer function by the inverse of the highest power of z. For example:

    \[ \frac{Y(z)}{X(z)} = H(z) = \frac{z + 4}{2z^2 - 3z + 2} \]

Multiply the numerator and denominator by z^{-2}.

    \[ \frac{Y(z)}{X(z)} = \frac{z^{-1} + 4z^{-2}}{2 - 3z^{-1} + 2z^{-2}} \]

Get Y(z) as a function of X(z).

    \[ Y(z)(2 - 3z^{-1} + 2z^{-2}) = X(z)(z^{-1} + 4z^{-2}) \]

The difference equation can be found by substituting z-powers as offsets to the data they are using. i.e. for the above case:

    \[ 2y[n] = x[n-1] + 4x[n-2] + 3y[n-1] - 2y[n-2] \]

That is what you want to implement in your code.
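As a concrete sketch, the example difference equation above becomes a tiny stateful function (the struct and names are illustrative); the state starts at zero and holds the previous two inputs and outputs:

struct df_state { double x1, x2, y1, y2; };

/* One sample of 2y[n] = x[n-1] + 4x[n-2] + 3y[n-1] - 2y[n-2]. */
double df_step(struct df_state *s, double x)
{
    const double y = (s->x1 + 4.0*s->x2 + 3.0*s->y1 - 2.0*s->y2) / 2.0;
    s->x2 = s->x1; s->x1 = x;
    s->y2 = s->y1; s->y1 = y;
    return y;
}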

Things to keep in mind

High-order systems can start behaving very poorly when designed in this way. Even with floating point, once a system reaches 4th order or higher, if there are poles close to the real axis in the z-domain, the output can become severely degraded. There are ways to mitigate this: the simplest is to break the implementation into separate 2nd order filters (biquads), as sketched below.
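A single biquad section might look something like this (a transposed direct form II sketch with a_0 normalised to 1; the struct layout is illustrative). A high-order filter then becomes a chain of these, with the z-domain polynomials factored into second-order terms:

struct biquad { double b0, b1, b2, a1, a2, z1, z2; };

/* One sample through a single second-order section. z1 and z2 are the
 * state and should start at zero. */
double biquad_step(struct biquad *f, double x)
{
    const double y = f->b0 * x + f->z1;
    f->z1 = f->b1 * x - f->a1 * y + f->z2;
    f->z2 = f->b2 * x - f->a2 * y;
    return y;
}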

Windows Phone, ownCloud and CardDAV and CalDAV

This is a combination of information from the two sources linked later in this post.

What is this?

This page gives an outline of how to get your Windows Phone to sync calendars and contacts with an ownCloud instance running on a server using a self-signed certificate. This whole process is a hack and I’m incredibly disappointed that Windows Phone does not support this natively – especially since they have a CalDAV and CardDAV implementation already available (used by iCloud and Google accounts). If this process stops working at some point, that should be expected.

I successfully got this working on a Lumia 735.

Process

I’m assuming that you’ve got ownCloud installed on a server using SSL. When the certificate was set up, the FQDN must really be the FQDN. i.e. if your ownCloud instance is hosted at “a.b.com”, the certificate must be for “a.b.com” – not “b.com”. I had this wrong on my home server and was getting the “80072F0D” error on the phone. If you have not yet set up ownCloud or have not set up SSL for your server, there are sites already documenting this process, such as How to Create and Install an Apache Self Signed Certificate.

You then need to get the certificate installed on your phone; if you do not, you will get errors when the phone tries to start syncing. You can do this by opening the certificate in Internet Explorer on the phone (easily done by copying the certificate to your ownCloud files, logging into your cloud in IE on the phone and opening the file).

You should then be able to follow the instructions here: Setting up CardDAV and CalDAV on Windows Phone 8.1.

Which roughly reads:

  1. Create a fake iCloud account. Put garbage information in the fields and create it. The phone won’t check anything.
  2. Modify the account, edit the advanced settings, then change the servers for CalDAV and CardDAV (respectively) to:
    [domain]/remote.php/caldav/principals/[username]
    [domain]/remote.php/carddav/principals/[username]

Understanding the Modified Discrete Cosine Transform (MDCT)

After playing around with discrete cosine transforms, I thought I would implement an MDCT and document my understanding of how everything works. I use some similar techniques to those used on the Wikipedia page as they are helpful for understanding but will add some scanned drawings which I think help (I’m not even close to being clever enough to get a computer to draw these for me).

Prerequisites

The only real background knowledge which I think is relevant to understanding the MDCT is the data extensions which the DCT-4 transform assumes.

First DCT-4 Basis Function with Shifted 2N Sample Input

I’ll refer to the above image in the Forward Transform overview, but for the time being, only pay attention to the solid quarter wave. This is the first basis function (i.e. k=0 ) for an N length DCT-4. If the basis function is continued past N, it has a repeating symmetrical pattern (the dashed line in the image) which repeats every 4N samples. The symmetry is even around -0.5 and odd around N-0.5 and holds for every basis function of the DCT-4. i.e. the DCT-4 assumes that the input data continues on forever, repeating itself in the following manner: x_n, -x_{N-n-1}, -x_n, x_{N-n-1}.
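That implied extension is easy to state in code (a purely illustrative snippet):

/* Write the 4N-point periodic extension implied by the DCT-4 of the
 * N-point block x into ext (which must hold 4*N doubles). */
void dct4_extend(double *ext, const double *x, unsigned N)
{
    unsigned n;
    for (n = 0; n < N; n++) {
        ext[n]     =  x[n];
        ext[N+n]   = -x[N-n-1];
        ext[2*N+n] = -x[n];
        ext[3*N+n] =  x[N-n-1];
    }
}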

Forward Transform

The MDCT takes in 2N real data points and produces N real outputs. These inputs are designed to overlap, so the first half of the input data should be the second half of the input data of the previous call. The definition is:

    \[ X_k = \displaystyle\sum\limits_{n=0}^{2N-1} x_n \cos \frac{ \pi \left( n + 0.5 + N/2 \right) \left( k + 0.5 \right) }{N} \]

It should be trivial to see from the above that the MDCT can be computed using a DCT-4 with an extended number of input data points, all of which have been shifted by half a basis. Go back to the crappy drawing and notice the concatenated N/2 length sequences a, b, c and d. The total length of this sequence is 2N and begins at N/2 (or half the length of a basis function). We need to get b, c and d back into the N point region if we want to compute the MDCT using a DCT-4, this can be achieved with the following concatenated sequence (I will subscript these sequences with r to denote a reversal of the sequence):

    \[ - c_r - d , a - b_r \]

If we take the DCT-4 of this concatenated sequence, we have found the MDCT of the input sequence.

Inverse Transform

The inverse MDCT or IMDCT takes in N real data points and produces 2N real outputs. In this transform, the outputs should overlap such that the first half of the output should be added to the second half of the output data in the previous call. The definition is:

    \[ x_n = \frac{1}{N} \displaystyle\sum\limits_{k=0}^{N-1} X_k \cos \frac{ \pi \left( n + 0.5 + N/2 \right) \left( k + 0.5 \right) }{N} \]

Because we know how the DCT-4 assumes the input and output data repeats in a symmetric pattern, we can get this data trivially in exactly the same fashion as we did in the forward transform. In the following illustration, we take the output from the forward transform and extend it along the basis:

Extended Projection of the MDCT Output on the First DCT-4 Basis

In output row zero, we can see how to extend the input sequence to obtain the 2N points required. We then see in rows two and three how summing the overlapping blocks causes the aliased sequences to cancel in subsequent calls to the IMDCT.

World’s Dumbest C MDCT Implementation

I validated all this actually works with a small C program. What follows are the MDCT/IMDCT implementations I came up with… ignore the “twid” input, I cache the modulation factors for the FFT which gets called in the dct4 routine:

/* state should contain twice as many elements as the input buffer (which
 * holds N elements) and should have all elements initialized to zero prior
 * to the first call. The output buffer is actually the first N elements of
 * state after calling. */
void mdct(double *state, const double *input, double *twid, unsigned lenbits)
{
    unsigned rl = 1u << lenbits;
    unsigned i;
    /* Alias the input data with the previous block. */
    for (i = 0; i < rl / 2; i++) {
        state[i]        = - input[rl/2+i]      - input[rl/2-i-1];
        state[rl/2+i]   =   state[rl+i]        - state[rl+rl-i-1];
    }
    /* Save the input block */
    for (i = 0; i < rl; i++)
        state[rl+i]     = input[i];
    /* DCT-4 */
    dct4(state, lenbits, twid);
}
 
/* state should contain twice as many elements as the input buffer (which
 * holds N elements) and should have all elements initialized to zero prior
 * to the first call. The output buffer is actually the first N elements of
 * state after calling. */
void imdct(double *state, const double *input, double *twid, unsigned lenbits)
{
    unsigned rl = 1u << lenbits;
    unsigned i;
    /* Collect contributions from the previous frame to the output buffer */
    for (i = 0; i < rl / 2; i++) {
        state[i]        = - state[rl+rl/2-i-1];
        state[rl/2+i]   = - state[rl+i];
    }
    /* Load the input and run the DCT-4 */
    for (i = 0; i < rl; i++)
        state[rl+i]     = input[i];
    dct4(state + rl, lenbits, twid);
    /* Sum contributions from this frame to the output buffer and perform the
     * required scaling. */
    for (i = 0; i < rl / 2; i++) {
        state[i]        = (state[i]      + state[rl+rl/2+i]) / rl;
        state[rl/2+i]   = (state[rl/2+i] - state[rl+rl-i-1]) / rl;
    }
}
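The dct4 routine itself isn’t shown above. For completeness, here is a naive O(N^2) stand-in (the real implementation uses an FFT, which is what the twid factors are for – they’re unused in this sketch) matching the unnormalised DCT-4 convention the code assumes:

#include <math.h>

/* In-place X_k = sum_n x_n cos(pi (n + 0.5)(k + 0.5) / N). Assumes
 * lenbits <= 12 for the temporary buffer. */
void dct4(double *x, unsigned lenbits, double *twid)
{
    unsigned N = 1u << lenbits, n, k;
    double tmp[1u << 12];
    (void)twid;
    for (k = 0; k < N; k++) {
        double acc = 0.0;
        for (n = 0; n < N; n++)
            acc += x[n] * cos(M_PI * (n + 0.5) * (k + 0.5) / N);
        tmp[k] = acc;
    }
    for (k = 0; k < N; k++)
        x[k] = tmp[k];
}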

Windowed MDCT Implementation

Typical MDCT implementations will window the input and output data (this can also be thought of as windowing the basis functions – which I think is a more helpful way to understand what is happening). It is really important to note that the window function must be carefully chosen to ensure that the basis functions remain orthogonal! The window makes the basis functions always begin and end near zero. The process has the side effect of de-normalising the basis functions (unless the window is rectangular), which means a window-dependent scaling factor will need to be applied at the output to achieve perfect reconstruction. The following images show the second basis function of the MDCT both un-windowed and windowed with a half-sine window (given at the end of the post).

Second MDCT Basis Function

Sine Windowed Second MDCT Basis Function

In a lossy codec this windowing process is somewhat necessary because if the start and end points are not close to zero, the output is likely to periodically glitch for even the slightest errors in the reconstructed MDCT data. This glitching will occur at the boundaries of the transform (i.e. every N points).

We can work out the necessary conditions for the window to obtain perfect reconstruction using the previous drawings (I’d steer away from equations for this one – it’s easier to validate the results visually) by applying a window function split into 4 segments to each of the input blocks. I’ll do the generic case for a symmetrical window which is applied to both the input and the output. We split the window (which has a length 2N ) into four segments which will be applied to our original input segments a, b, c and d. Because we are defining this window to be symmetric, we can call the pieces:

    \[ u, v, v_r, u_r \]

Symmetrical Window Impact on MDCT

The above illustration shows how our window segments are applied to the input data and the impact that has on the DCT-4 analysed data blob. Following that are the output segments from two sequential IMDCT calls, with the windows applied to the output here as well.

We need to make the overlapping terms equal the required output segment i.e.

    \[ c = v_r \left( d_r u + c v_r \right) + u \left( c u - d_r v_r \right) \]

    \[ d = u_r \left( d u_r + c_r v \right) + v \left( d v - c_r u_r \right) \]

It is clear from the above that the necessary condition to achieve reconstruction is for v_r^2 + u^2 = 1 (which implies in this case that v^2 + u_r^2 = 1 must also be true).

A simple solution to this is:

    \[ w_n = \sin \frac{ \pi \left( n + 0.5 \right) }{2N} \]

The output requires a scaling factor of 2 for this window.
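To make the windowing concrete, here is a sketch of a sine-windowed forward MDCT built from the previous and current N-point blocks (the helper names are illustrative; the dct4 routine from earlier is assumed). It windows the 2N-point frame, folds it into the [-c_r - d, a - b_r] arrangement described in the forward transform section, and runs the DCT-4:

#include <math.h>

/* w[n] = sin(pi (n + 0.5) / 2N): the window over the 2N-point frame. */
static double sinewin(unsigned n, unsigned twoN)
{
    return sin(M_PI * (n + 0.5) / twoN);
}

/* Sine-windowed forward MDCT of the frame [prev cur] (each N points).
 * out receives the N coefficients. */
void mdct_windowed(double *out, const double *prev, const double *cur,
                   double *twid, unsigned lenbits)
{
    unsigned N = 1u << lenbits, i;
    for (i = 0; i < N / 2; i++) {
        /* Fold the windowed [a b c d] frame into [-c_r - d, a - b_r]. */
        out[i]     = - cur[N/2-i-1] * sinewin(N + N/2 - i - 1, 2*N)
                     - cur[N/2+i]   * sinewin(N + N/2 + i,     2*N);
        out[N/2+i] =   prev[i]      * sinewin(i,               2*N)
                     - prev[N-i-1]  * sinewin(N - i - 1,       2*N);
    }
    dct4(out, lenbits, twid);
}

On the synthesis side, the same window is applied to the 2N IMDCT outputs before the overlap-add, along with the factor of 2 noted above.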