Author Archives: nick

Release Alignment in Sampled Pipe Organs – Part 1

At the most basic level, a sample from a digital pipe organ contains:

  • an attack transient leading into
  • a looped sustain block and
  • a release which will be cross-faded into when the note is released.

The release cross-fade must be fast (otherwise it will not sound natural, or transient details may be lost) and the release must be phase-aligned with the signal at the point where the cross-fade begins.

The necessity for phase alignment

Without phase-aligning the release, disturbing artefacts will likely be introduced. The effects differ between short and long cross-fades but are always unpleasant.

The following image shows an ideal cross-fade into a release sample. The crossfade begins at 0.1 seconds and lasts for 0.05 seconds. The release is aligned properly and the signal looks continuous.

A good crossfade into a release.

The following image shows a bad release where the cross-fade lags the ideal release offset by half a period. Some cancellation occurs during the cross-fade; the result will sound something like a “pluck” for long cross-fades or a “click” for short cross-fades.

A worst-case crossfade into a release.

(The cross-fade used in generating the above data sets was a raised cosine; linear cross-fades can be used but result in worse distortion.)

The problem of aligning release cross-fades in virtual pipe organs is an interesting one. As an example: at the time of writing this article, release alignment in the GrandOrgue project is not particularly good. It uses a lookup table keyed on the value and first-order estimated derivative (both heavily quantised) of the last sample of the last played block. This is not optimal: a single sample says nothing about phase, and the first-order derivative estimate could be completely wrong in the presence of noise.

Another approach for handling release alignment

If the pitch of a pipe were completely stable and known (f=\frac{1}{T}), and we knew one point t_r where the release was perfectly aligned, then we could cross-fade into the start of the release at:

    \[ \forall n \in \mathbb Z, t = t_r + T n \]

Hence, for any sample offset we could compute an offset into the release to cross-fade into.

In reality, pipe pitch wobbles around a bit, so the above does not strictly hold all the time; that said, it is true for much of the time. If we could take a pipe sample and find all of the points where the release is aligned, we could always find the best way to align the release.

It turns out that a simple way to do this is to find the cross-correlation of the attack and sustain segment with a short portion of the release. Taking the whole release would be problematic because as it decays it becomes less similar to the sustaining segment (which leads to an unhelpful correlation signal).

The first 25000 samples of the signal used for the cross-correlation.

The above image shows the attack and some sustain of bottom-C of the St. Augustine’s Closed Horn. It shows visually why single-sample amplitude and derivative matching is a poor way to align releases: during one period of the closed horn there are 14 zero crossings and 16 obvious zero crossings in the derivative. A single sample gives hardly enough information.

A 1024 sample cut from the start of the release.

The above image shows a 1024 sample segment taken from the release marker of the same Closed Horn sample. It contains just over a single period of the horn.

The next image shows the cross-correlation of this release segment with the sample itself. My analysis program correlates the left and right channels separately and sums the results to provide an overall correlation. Positive maxima correspond to points where the release will phase-align well; minima correspond to points where the signal has the least correlation with the release.

Normalised cross correlation of the signal with the release segment.

Using the correlation and a pitch guesstimate, we could construct a function which, given any sample offset in the attack/sustain, produces an offset into the release to cross-fade into. This is for next time.

wxWidgets, C++ libraries and C++11

Building wxWidgets on OS X targeting libc++

It seems right to put this at the top of the post for easy access (probably for my own reference).

To get a configuration of wxWidgets (I am using version 3.0.0) which will use the libc++ as the standard library implementation, the following command line works (using Apple LLVM version 5.0 clang-500.2.79):


../configure --disable-shared --enable-unicode --with-cocoa --with-macosx-version-min=10.7 --with-macosx-sdk=/Developer/SDKs/MacOSX10.7.sdk CXXFLAGS="-std=c++0x -stdlib=libc++" CPPFLAGS="-stdlib=libc++" LIBS=-lc++

  • -std=c++0x (I know, this is deprecated syntax) tells the compiler that we want C++11 features.
  • -stdlib=libc++ tells the compiler we want to use the libc++ standard library implementation (rather than the libstdc++ implementation).

This will produce a static, unicode build of wxWidgets without debug information. The flags will not work with --with-macosx-version-min set to anything less than 10.7 because -stdlib=libc++ requires this as a minimum.

Why build wxWidgets on OS X targeting libc++

OS X currently ships with two C++ libraries, libstdc++ and libc++. libc++ is reasonably new and completely supports C++11. libstdc++ (on OS X anyway) is very old and only supports a subset of C++03. Unless you specify otherwise, building an application with clang will produce object code which expects to link against libstdc++ targeting the C++98 standard. If you are building C++11 code and only add -std=c++0x to your compiler arguments, your application may fail to compile because the standard library might not have all of the features which you require. In short, if you require C++11 support on OS X, you probably want to migrate over to libc++ for your standard library.

If you build a static library with C++98 targeting libstdc++ and try to link it against an application targeting libc++, you are probably going to get errors looking something like (for wxWidgets anyway):


Undefined symbols for architecture x86_64:
  "std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::find_last_of(wchar_t const*, unsigned long, unsigned long) const", referenced from:
      wxFileName::SplitPath(wxString const&, wxString*, wxString*, wxString*, wxString*, bool*, wxPathFormat) in libwx_baseu-3.0.a(baselib_filename.o)
  "std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::find_first_of(wchar_t const*, unsigned long, unsigned long) const", referenced from:
      wxLocale::GetSystemLanguage() in libwx_baseu-3.0.a(baselib_intl.o)
      wxFileName::SplitVolume(wxString const&, wxString*, wxString*, wxPathFormat) in libwx_baseu-3.0.a(baselib_filename.o)
      wxRegExImpl::Replace(wxString*, wxString const&, unsigned long) const in libwx_baseu-3.0.a(baselib_regex.o)
      wxString::find_first_of(char const*, unsigned long) const in libwx_baseu-3.0.a(baselib_mimecmn.o)
      wxString::find_first_of(char const*, unsigned long) const in libwx_osx_cocoau_core-3.0.a(corelib_osx_cocoa_button.o)

… which will continue for several hundred lines.

This is because libstdc++ and libc++ are not fully ABI compatible. When your libc++ application tries to link against a library expecting libstdc++, you are going to have major unresolved symbol issues unless you use a very minimal subset of C++11. Bugger.

Edit: I just found this excellent post Marshall’s C++ Musings – Clang and standard libraries on Mac OS X which is very relevant to the topic.

Jaycar Electronics JV60 Speaker Kit Review

I’ll start off this post by saying that my wife and I had a 5.1-channel satellite and subwoofer setup for quite a long time at home. After a few years I decided to upgrade to a stereo setup with decent left and right drivers. You heard right: upgrade to stereo. Our dominant use-case at home is listening to music rather than watching television and movies and satellite systems absolutely suck for music. If you purchased one of these systems with the puny 1 to 2 inch drivers I am so sorry that you have to listen to such inferior quality audio. I took the bookshelf speakers which I was using for my digital-organ and put them in our living area along with the existing subwoofer to create a 2.1 channel setup. Guess what: it was even better with television as well.

After about 6 months, I really wanted to play organ again without needing to use headphones, so I started searching for some decent speakers which I could replace the bookshelf speakers with. After plenty of googling around and showing the specs to some of my work colleagues, I decided to DIY and get a Jaycar JV60 Speaker Kit (the cabinets are also available from Jaycar). The kit is a three-way design utilising two Vifa P17WJ woofers and a Vifa D25AG tweeter. The supplied crossover is designed such that one woofer provides only sub-200 Hz frequencies and the other provides sub-3 kHz frequencies, i.e. there is an overlap in the bass frequency response. The manual states “This has been done to achieve a strong and extended bass…”

Picture of the JV60 kit packaging.

How the JV60 speaker kit is packed.

Picture of all of the included JV60 kit components

Included JV60 components.

Personally, I don’t understand why the tweeter is in the middle of the cabinet rather than at the top. I figure that the typical use-case for these speakers would be floor standing in a living space featuring an epic couch. Our couch is by no means epic, and is actually quite low. Even so, when I am sitting on the couch, my ears are above the height of the tweeters. I would have expected the most directional driver to be placed either on axis or slightly above ear height, rather than below the head height of even a child sitting on the couch.

Building the Speakers

I thought this was going to be a trivial exercise that would take an hour or so, but it took me quite a bit longer. This is likely because I’ve forgotten how to use a screwdriver. Long gone are the days when I was apprenticing for Peter D Jewkes pipe organ builders… my hands still have blisters.

Cabinets

Picture of the JV60 corner joinery details

The speakers are well built for the price.

Front-on picture of the JV60 cabinets

The cabinets have a gloss finish on the front and matte finish on the rest.

Mounting the Crossover

Picture of the JV60 crossovers exactly as they come with the kit

This is exactly how the JV60 crossovers are shipped with the kit

The first step in the manual is mounting the crossover on the back of the speaker. You are meant to pre-drill holes for the screws to mount it inside on the back wall of the cabinet, but good luck doing this unless you have a miniature drill that fits inside the cabinet. I managed to get the supplied wood-screws straight into the back panel with a bit of force. The kit was missing the nylon spacers which were meant to separate the crossover from the back wall, but fortunately I had some spares in the garage.

Issues with the Speakers

Dead Tweeter

After building the speakers, I set about testing them out. Test one was loud electronic music, which worked beautifully: the bass was clear and the break as mid frequencies lead into the tweeters was fantastic. The second test was some Bach organ chorales, and this test did not go so well. I kept hearing some distortion in the audio; at first I wondered if it was the recording or encoding artefacts, but then I noticed it was only coming from one speaker. So I fired up Adobe Audition, created a sine sweep, and sure enough one of the tweeters was totally distorting in the 2-5 kHz region. I got the speaker replaced and all was fine.

Damaged Grill

When I got the speakers, one grill had damaged connectors which attach to the pins on the speaker cabinets (see the pictures). The plastic connections look like somebody had just hacked at them with a hammer and forced the grill on – even though it wasn’t even close to being aligned to the pins. I managed to fix these up by inserting a screwdriver into the connectors for a while to restore their original shape… but I was still pretty annoyed that they were so broken when I got the cabinets.

Summary

These speakers are amazing… once they are built and working. I doubt that I would be able to get a better sounding pair of speakers for less than 2 to 3 times the price I paid for these. However, if you decide to build this kit: insist on checking that all of the components are present and that there are no defects in the cabinets/grills. Make sure you do heaps of listening once you’ve built them to ensure the speakers are all working and there are no unpleasant distortions or leaks in the box.

What I really love about these speakers is how the mid woofer covers such a large frequency response and how well that response leads into the tweeter region. The drivers themselves are clearly of an epic quality: the tweeter is not harsh at all and both the woofers and the tweeter don’t appear to have any horrible peaks in the frequency response. Even soft organ reeds with frequency content which covers the whole audible spectrum sound really natural.

Understanding the Modified Discrete Cosine Transform (MDCT)

After playing around with discrete cosine transforms, I thought I would implement an MDCT and document my understanding of how everything works. I use some techniques similar to those used on the Wikipedia page as they are helpful for understanding, but will add some scanned drawings which I think help (I’m not even close to being clever enough to get a computer to draw these for me).

Prerequisites

The only real background knowledge which I think is relevant to understanding the MDCT is the data extensions which the DCT-4 transform assumes.

First DCT-4 Basis Function with Shifted 2N Sample Input

I’ll refer to the above image in the Forward Transform overview, but for the moment, only pay attention to the solid quarter wave. This is the first basis function (i.e. k=0 ) for an N length DCT-4. If the basis is continued past N, it has a repeating symmetrical pattern (the dashed line in the image) which repeats every 4N. The symmetry is even around -0.5 and odd around N-0.5, and this holds for every basis function of the DCT-4; i.e. the DCT-4 assumes that the input data continues on forever, repeating itself in the following manner: x_n, -x_{N-n-1}, -x_n, x_{N-n-1}.
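The assumed extension can be captured in a few lines of C. This helper is my own, purely for illustration: it evaluates the periodic, symmetric continuation of an N point input at any integer index:

```c
/* Evaluate the infinite extension that the DCT-4 assumes for an N point
 * input: the pattern x_n, -x_{N-n-1}, -x_n, x_{N-n-1} repeating every 4N,
 * even about -0.5 and odd about N-0.5. Accepts any integer index. */
double dct4_extension(const double *x, int N, long n)
{
    long m = n % (4L * N);
    if (m < 0)
        m += 4L * N;               /* wrap negative indices into [0, 4N) */
    if (m < N)
        return x[m];               /* x_n        */
    if (m < 2L * N)
        return -x[2L * N - m - 1]; /* -x_{N-n-1} */
    if (m < 3L * N)
        return -x[m - 2L * N];     /* -x_n       */
    return x[4L * N - m - 1];      /* x_{N-n-1}  */
}
```

Checking a couple of indices by hand confirms the symmetries: the value at -1 equals the value at 0 (even about -0.5), and the value at N is the negative of the value at N-1 (odd about N-0.5).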

Forward Transform

The MDCT takes in 2N real data points and produces N real outputs. These inputs are designed to overlap, so the first half of the input data should be the second half of the input data of the previous call. The definition is:

    \[ X_k = \displaystyle\sum\limits_{n=0}^{2N-1} x_n \cos \frac{ \pi \left( n + 0.5 + N/2 \right) \left( k + 0.5 \right) }{N} \]

It should be clear from the above that the MDCT can be computed using an N point DCT-4 whose input has been shifted by half a basis function. Go back to the crappy drawing and notice the concatenated N/2 length sequences a, b, c and d. The total length of this sequence is 2N and it begins at N/2 (half the length of a basis function). We need to get b, c and d back into the N point region if we want to compute the MDCT using a DCT-4; this can be achieved with the following concatenated sequence (I will subscript these sequences with r to denote a reversal of the sequence):

    \[ - c_r - d , a - b_r \]

If we take the DCT-4 of this concatenated sequence, we have found the MDCT of the input sequence.

Inverse Transform

The inverse MDCT or IMDCT takes in N real data points and produces 2N real outputs. In this transform, the outputs should overlap such that the first half of the output should be added to the second half of the output data in the previous call. The definition is:

    \[ x_n = \frac{1}{N} \displaystyle\sum\limits_{k=0}^{N-1} X_k \cos \frac{ \pi \left( n + 0.5 + N/2 \right) \left( k + 0.5 \right) }{N} \]

Because we know how the DCT-4 assumes the input and output data repeat in a symmetric pattern, we can obtain this data trivially in exactly the same fashion as in the forward transform. In the following illustration, we take the output from the forward transform and extend it along the basis:

Extended Projection of the MDCT Output on the First DCT-4 Basis

In output row zero, we can see how to extend the input sequence to obtain the 2N points required. We then see in rows two and three how summing the overlapping blocks causes the aliased sequences to cancel in subsequent calls to the IMDCT.

World’s Dumbest C MDCT Implementation

I validated that all of this actually works with a small C program. Following are the MDCT/IMDCT implementations I came up with. Ignore the “twid” input: I cache the modulation factors for the FFT which gets called in the dct4 routine:

/* state should contain twice as many elements as the input buffer (2N) and
 * should have all elements initialized to zero prior to the first call. The
 * output buffer is actually the first N elements of state after calling. */
void mdct(double *state, const double *input, double *twid, unsigned lenbits)
{
    unsigned rl = 1u << lenbits;
    unsigned i;
    /* Alias the input data with the previous block. */
    for (i = 0; i < rl / 2; i++) {
        state[i]        = - input[rl/2+i]      - input[rl/2-i-1];
        state[rl/2+i]   =   state[rl+i]        - state[rl+rl-i-1];
    }
    /* Save the input block */
    for (i = 0; i < rl; i++)
        state[rl+i]     = input[i];
    /* DCT-4 */
    dct4(state, lenbits, twid);
}
 
/* state should contain twice as many elements as the input buffer (2N) and
 * should have all elements initialized to zero prior to the first call. The
 * output buffer is actually the first N elements of state after calling. */
void imdct(double *state, const double *input, double *twid, unsigned lenbits)
{
    unsigned rl = 1u << lenbits;
    unsigned i;
    /* Collect contributions from the previous frame to the output buffer */
    for (i = 0; i < rl / 2; i++) {
        state[i]        = - state[rl+rl/2-i-1];
        state[rl/2+i]   = - state[rl+i];
    }
    /* Load the input and run the DCT-4 */
    for (i = 0; i < rl; i++)
        state[rl+i]     = input[i];
    dct4(state + rl, lenbits, twid);
    /* Sum contributions from this frame to the output buffer and perform the
     * required scaling. */
    for (i = 0; i < rl / 2; i++) {
        state[i]        = (state[i]      + state[rl+rl/2+i]) / rl;
        state[rl/2+i]   = (state[rl/2+i] - state[rl+rl-i-1]) / rl;
    }
}

Windowed MDCT Implementation

Typical MDCT implementations will window the input and output data (this can also be thought of as windowing the basis functions – which I think is a more helpful way to understand what is happening). It is really important to note that the window function must be carefully chosen to ensure that the basis functions remain orthogonal! The window makes the basis functions always begin and end near zero. The process has the side effect of de-normalising the basis functions (unless the window is rectangular) and means there will be a window-dependent scaling factor which will need to be applied at the output to achieve perfect reconstruction. The following images show the second basis function of the MDCT both un-windowed and windowed with a half-sine window (given at the end of the post).

Second MDCT Basis Function

Sine Windowed Second MDCT Basis Function

In a lossy codec this windowing process is somewhat necessary because if the start and end points are not close to zero, the output is likely to periodically glitch for even the slightest errors in the reconstructed MDCT data. This glitching will occur at the boundaries of the transform (i.e. every N points).

We can work out the necessary conditions on the window for perfect reconstruction using the previous drawings (I’d steer away from equations for this one; it’s easier to validate the results visually) by applying a window function, split into four segments, to each of the input blocks. I’ll do the generic case of a symmetrical window applied to both the input and the output. We split the window (which has length 2N ) into four segments which are applied to our original input segments a, b, c and d. Because we are defining this window to be symmetric, we can call the pieces:

    \[ u, v, v_r, u_r \]

Symmetrical Window Impact on MDCT

The above illustration shows how our window segments are applied to the input data and the impact that has on the DCT-4 analysed data blob. Following that is the output segments from two sequential IMDCT calls with the windows applied to the output here as well.

We need to make the overlapping terms equal the required output segments, i.e.

    \[ c = v_r \left( d_r u + c v_r \right) + u \left( c u - d_r v_r \right) \]

    \[ d = u_r \left( d u_r + c_r v \right) + v \left( d v - c_r u_r \right) \]

It is clear from the above that the necessary condition for perfect reconstruction is v_r^2 + u^2 = 1 (which implies that v^2 + u_r^2 = 1 must also hold).

A simple solution to this is:

    \[ w_n = \sin \frac{ \pi \left( n + 0.5 \right) }{2N} \]

The output requires a scaling factor of 2 for this window.