Derivation of fast DCT-4 algorithm based on DFT

It’s well known that an N point DCT-4 can be computed using an N/2 point complex FFT. Although the algorithm is widespread, the texts which I have read on the subject have not provided the details as to how it works. I’ve been trying to understand this for a while (I tend not to use algorithms until I understand them) and finally figured it out and thought I would share (please provide comments/links if you can find a better or shorter explanation – this is as good as I could get).

Start with the definition:

    \[ X_k = \displaystyle\sum\limits_{n=0}^{N-1} x_n \cos \frac{ \pi \left( n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N} \]

Trig functions are hard to manipulate in expressions (for me anyway), so knowing that x_n is real, we change the \cos into a complex exponential and obtain the following:

    \[ X_k = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N-1} x_n \mathrm{e}^{\frac{\mathrm{j} \pi \left( n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N}} \right\} \]

It is worth noting that the sign of the exponent is irrelevant.

We then split the expression up to operate over two half-length sequences composed of x_{2n} and x_{N-1-2n}. The second sequence is reversed and decimated because when the n is replaced by N-1-2n in the n + \frac{1}{2} term of the exponential, the expression is negated rather than the offset being modified. We move the N term into another exponential which becomes trivial as N cancels the denominator. i.e.:

    \[ X_k = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} x_{2n} \mathrm{e}^{\frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N}} + x_{N-1-2n} \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N}} \mathrm{e}^{\mathrm{j} \pi \left( k + \frac{1}{2} \right)} \right\} \]

Or:

    \[ X_k = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} x_{2n} \mathrm{e}^{\frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N}} + \mathrm{j} {\left( -1 \right)}^{k} x_{N-1-2n} \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( k + \frac{1}{2} \right) }{N}} \right\} \]

The exponentials still contain no term which resembles an DFT (over length N/2 anyway). To get one, we break up X_k into the terms X_{2k} and X_{N-1-2k}. Again we choose to reverse the second sequence because it will keep the exponential terms in a similar but negated form.

    \[ X_{2k} = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} x_{2n} \mathrm{e}^{\frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} + \mathrm{j} x_{N-1-2n} \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} \right\} \]

    \[ X_{N-1-2k} = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} \mathrm{j} x_{2n} \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} - x_{N-1-2n} \mathrm{e}^{\frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} \right\} \]

Now because we are only interested in the real terms, we can ignore or conjugate anything which contributes to the imaginary term of the above expressions. This means that we can conjugate the exponentials when they are only modulating a real term. Doing this, we obtain:

    \[ X_{2k} = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} \left( x_{2n} + \mathrm{j} x_{N-1-2n} \right) \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} \right\} \]

    \[ X_{N-1-2k} = \Re \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} \left( \mathrm{j} x_{2n} - x_{N-1-2n} \right) \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} \right\} \]

Almost there, but to obtain the full benefit we need the inner terms to be the same. If we now multiply the X_{N-1-2k} term by - \mathrm{j}, we get:

    \[ X_{N-1-2k} = - \Im \left\{ \displaystyle\sum\limits_{n=0}^{N/2-1} \left( x_{2n} + \mathrm{j} x_{N-1-2n} \right) \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2n + \frac{1}{2} \right) \left( 2k + \frac{1}{2} \right) }{N}} \right\} \]

Done. If we expand out the X_{2k} and X_{N-1-2k} terms completely we get:

    \[ X_{2k} = \Re \left\{ \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2k + \frac{1}{2} \right) }{2N}} \displaystyle\sum\limits_{n=0}^{N/2-1} \left( x_{2n} + \mathrm{j} x_{N-1-2n} \right) \mathrm{e}^{- \frac{\mathrm{j} \pi n}{N}} \mathrm{e}^{- \frac{\mathrm{j} 2 \pi n k }{N/2}} \right\} \]

    \[ X_{N-1-2k} = - \Im \left\{ \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2k + \frac{1}{2} \right) }{2N}} \displaystyle\sum\limits_{n=0}^{N/2-1} \left( x_{2n} + \mathrm{j} x_{N-1-2n} \right) \mathrm{e}^{- \frac{\mathrm{j} \pi n}{N}} \mathrm{e}^{- \frac{\mathrm{j} 2 \pi n k }{N/2}} \right\} \]

Which give us the steps for a DFT based DCT-4 algorithm:

  • Transform the N point real sequence x_n into the N/2 point complex sequence y_n = x_{2n} + \mathrm{j} x_{N-1-2n}.
  • Multiply each element of the sequence y_n by \mathrm{e}^{- \frac{\mathrm{j} \pi n}{N}}.
  • Find Y, the DFT of the sequence y.
  • Multiply each element of Y_k by \mathrm{e}^{- \frac{\mathrm{j} \pi \left( 2k + \frac{1}{2} \right) }{2N}}.
  • The DCT-4 outputs are given by:

        \[ X_k = \left\{ \begin{array}{l l} \Re \left\{ Y_{k/2} \right\} & \quad \text{for even $k$} \\ - \Im \left\{ Y_{(N-1-k)/2} \right\} & \quad \text{for odd $k$} \end{array} \right. \]

Why you should not use C99 exact-width integer types

Is there really ever a time where you need an integer type containing exactly N-bits? There are C99 types which guarantee at least N-bits. There are even C90 types which guarantee at least 8, 16 and 32 bits (the standard C integer types). Why not use one of those?

I never use C99 exact-width types in code… ever. Chances are that you shouldn’t either because:

Exact width integer types reduce portability

This is because:

1) Exact width integer types do not exist before C99

Sure you could create an abstraction that detects if the standard is less than C99 and introduce the types, but then you would be overriding the POSIX namespace by defining your own integer types suffixed with “_t”. POSIX.1-2008 – The System Interfaces: 2.2.2 The Name Space

GCC also will not like you:

The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names.
GNU libc manual: 1.3.3 Reserved Names

From my own experience using GCC on OS X, the fixed width types are defined even when using --std=c90, meaning you’ll just get errors if you try to redefine them. Bummer.

2) Exact width integer types are not guaranteed to exist at all:

These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32 or 64 bits, it shall define the corresponding typedef names.
ISO/IEC 9899:1999 – 7.18.1.1 Exact-width integer types

Even in C99, the (u)intN_t type does not need to exist unless there is a native integer type of that width. You may argue and say that there are not many platforms which do not have these types – there are: DSPs. If you start using these types, you limit the platforms on which your software can run – and are also probably developing bad habits.

Using exact width integer types could have a negative performance impact

If you need at least N-bits and it does not matter if there are more, why restrict yourself to a type which which could require additional overhead? If you are writing C99 code, use one of the (u)int_fastN_t types. Maybe, you could even use a standard C integer type!

The endianness of exact width integer types is unspecified

I am not not implying that the endianness is specified for other C types. I am just trying to make a point: you cannot even use these types for portable serialisation/de-serialisation without feral-octet-swapping-macro-garbage as the underlying layout of the type is system dependent.

If you are interested in the conditions for when memcpy can be used to copy memory into a particular type, maybe you should check out the abstraction which is part of my digest program. It contains a heap of checks to ensure that memcpy is only used on systems when it is known that it will do the right thing. It tries to deal with potential padding, non 8-bit chars and endianness in a clean way that isn’t broken.

This article deliberately did not discuss the signed variants of these types…

Eaton 5115 powering Ubuntu Server 11.10 with NUT

A couple of months ago, I bought a UPS for my server at home. I’m running more remote services on it and figure that anything I can add to protect it is worth while. I bought an Eaton 5115 500 Watt tower model. Only the “Powerware” branded model is listed on the NUT support page but I was banking on the newer model being compatible. For those out there wondering: yes it is.

  1. Install NUT and then configure as per the instructions given here here. For the Eaton 5115 connected via USB, bcmxcp_usb is the correct driver. On Ubuntu this was just a matter of apt-get’ing the “nut” package and modifying ups.conf, upsd.conf, upsd.users, upsmon.conf and nut.conf as specified on the previously linked documentation page.
  2. Disconnect and reconnect the USB connection from the server to the UPS (I had to do this to get everything working).
  3. Test that everything worked by running:

    $ upsc { ups name }

    Where { ups name } is the name chosen for the UPS in ups.conf.

As for the UPS itself: it seems pretty good. I was concerned when I first plugged it in as the fan whirred quite loudly at full speed for a few minutes – but then it slowed down to a very quiet speed. The voltage regulation feature has kicked in a few times (my line is supposed to be a nominal 240 Volts but more often floats around 250 Volts) which I thought was pretty neat.

I haven’t got figures on battery length… might edit this when I do.