On the perils of cross-fading loops in organ samples

One common strategy when looping problematic organ samples is to employ a cross-fade. This is an irreversible audio modification that gradually transitions the samples leading up to the end of a loop so that they equal the samples that led into the start. The goal is to completely eliminate any sort of impulsive glitch and hopefully also create a “good spectral match”. While eliminating glitches is achievable, creating a good spectral match might not be so simple.

We can think of an organ sample as being a linear combination of two signals:

  • A voiced/correlated/predictable component (choose your word) which represents the tonal part of the sample. Strictly speaking, the tonal component is not entirely predictable – there are continuous subtle variations in the speech of a pipe… but they are typically small and we will assume predictability.
  • An unvoiced/uncorrelated/unpredictable component which represents the pipe-noise of the sample.

Both of these components are necessary for realism in a sample.

The following image shows a cross-fade of two entirely correlated signals. The top picture contains the input signals and the cross-fade transition that will be used; the bottom contains the input signals with the cross-faded signal overlaid on top. There is nothing interesting about this picture: the two input signals were the same, so the cross-fade transitions between two identical signals and the output signal is equal to both of the input signals.

Crossfade of correlated signal

The next image shows a cross-fade of the same correlated signal with some uniform white noise overlaid on top. What is going on in the middle of the output signal? It looks like it has been attenuated a little and no longer appears to have a uniform distribution.

Crossfade of correlated signal with uniform noise

This final image shows a cross-fade between two purely uniform random signals, which makes the effect easier to see.

Crossfade of uniform noise

It turns out that summing two uniformly distributed random variables yields a new random variable with a triangular distribution (this is how noise for triangular dither gets generated). During the cross-fade, the distribution of the uncorrelated components of the signal is actually changed. Not only that, but the power of the signal is reduced in the middle of the transition: when two uncorrelated streams of equal power each have a gain of one half, their sum carries half the power of either input (a drop of about 3 dB). Here is an audio sample of two white noise streams being cross-faded over 2 seconds:

Listen for the level drop in the middle.
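
The level drop can also be checked numerically. The following is a minimal sketch, assuming a linear cross-fade whose two gains sum to one and a 48 kHz sample rate (neither is stated above); all function and variable names are illustrative. For uncorrelated inputs of equal power, gains of g and 1 - g give an output power proportional to g² + (1 - g)², which is smallest (one half) at g = 0.5. A correlated signal faded into itself is unaffected.

```python
import numpy as np

FS = 48000      # assumed sample rate in Hz
N = 2 * FS      # two seconds of audio, as in the audio example

def linear_crossfade(a, b):
    """Fade from a to b using linear gains that always sum to one."""
    g = np.linspace(0.0, 1.0, len(a))
    return (1.0 - g) * a + g * b

def power_db(x):
    """Mean power of x in decibels."""
    return 10.0 * np.log10(np.mean(x ** 2))

def excess_kurtosis(x):
    """Roughly -1.2 for a uniform distribution, -0.6 for a triangular one."""
    d = x - np.mean(x)
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
noise_a = rng.uniform(-1.0, 1.0, N)
noise_b = rng.uniform(-1.0, 1.0, N)
tone = np.sin(2 * np.pi * 100.0 * np.arange(N) / FS)

faded_noise = linear_crossfade(noise_a, noise_b)
faded_tone = linear_crossfade(tone, tone)

# Compare a 0.1 s window in the middle of the fade with the unfaded input.
mid = slice(N // 2 - 2400, N // 2 + 2400)
noise_drop_db = power_db(faded_noise[mid]) - power_db(noise_a)
tone_drop_db = power_db(faded_tone[mid]) - power_db(tone)

# Distribution before the fade and at its midpoint (equal gains of 0.5).
uniform_kurtosis = excess_kurtosis(noise_a)
midpoint_kurtosis = excess_kurtosis(0.5 * (noise_a + noise_b))

print(f"noise level change at midpoint: {noise_drop_db:.2f} dB")
print(f"tone level change at midpoint:  {tone_drop_db:.2f} dB")
```

The kurtosis values are only a convenient one-number summary of the change in distribution shape; a histogram of the midpoint samples would show the triangular shape directly.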

If the cross-fade is too long when looping a sample, it could make the pipe wind-noise duck over the duration of the cross-fade. While the effect is subtle, I have heard it in some samples and it’s not particularly natural. I suppose the TLDR advice I can give is:

  • If you can avoid cross-fading – avoid it.
  • If you cannot avoid cross-fading – make the cross-fade as short as possible (milliseconds) to avoid an obvious level-drop of noise during the transition.

Release Alignment in Sampled Pipe Organs – Part 1

At the most basic level, a sample from a digital pipe organ contains:

  • an attack transient leading into
  • a looped sustain block and
  • a release which will be cross-faded into when the note is released.

The release cross-fade must be fast (otherwise it will not sound natural or transient details may be lost) and it must also be phase-aligned to the point where the cross-fade begins.

The necessity for phase alignment

Without phase aligning the release, disturbing artefacts will likely be introduced. The effects are different with short and long cross-fades but are always unpleasant.

The following image shows an ideal cross-fade into a release sample. The crossfade begins at 0.1 seconds and lasts for 0.05 seconds. The release is aligned properly and the signal looks continuous.

A good crossfade into a release.

The following image shows a bad release where the cross-fade lags the ideal release offset by half a period. Some cancellation occurs during the cross-fade and the result will sound something like a “pluck” for long cross-fades or a “click” for short cross-fades.

A worst-case crossfade into a release.

(The cross-fade used in generating the above data sets was a raised cosine. Linear cross-fades can be used but will result in worse distortions.)
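
The two cases can be reproduced with a short sketch. It uses the timings from the figures (a cross-fade starting at 0.1 seconds and lasting 0.05 seconds) and a raised-cosine fade, but everything else is an assumption made for illustration: a 48 kHz sample rate, a pure 100 Hz sine standing in for the pipe, and a "release" that does not decay. The function names are hypothetical.

```python
import numpy as np

FS = 48000       # assumed sample rate in Hz
PITCH = 100.0    # assumed pitch in Hz (480 samples per period)

def raised_cosine_gain(length):
    """Gain curve rising smoothly from 0 to 1 over `length` samples."""
    k = np.arange(length)
    return 0.5 - 0.5 * np.cos(np.pi * k / (length - 1))

def crossfade_into_release(sustain, release, start, length):
    """Play sustain up to `start`, fade over `length` samples, then play release."""
    gain = raised_cosine_gain(length)
    out = release.copy()
    out[:start] = sustain[:start]
    fade = slice(start, start + length)
    out[fade] = (1.0 - gain) * sustain[fade] + gain * release[fade]
    return out

t = np.arange(FS // 4) / FS
sustain = np.sin(2 * np.pi * PITCH * t)
aligned_release = sustain.copy()                               # in phase
late_release = np.sin(2 * np.pi * PITCH * (t - 0.5 / PITCH))   # half a period late

start, length = FS // 10, FS // 20    # 0.1 s and 0.05 s
good = crossfade_into_release(sustain, aligned_release, start, length)
bad = crossfade_into_release(sustain, late_release, start, length)

# Peak level over one period centred on the middle of the fade.
period = int(FS / PITCH)
centre = start + length // 2
window = slice(centre - period // 2, centre + period // 2)
good_peak = np.max(np.abs(good[window]))
bad_peak = np.max(np.abs(bad[window]))

print(f"peak mid-fade, aligned release:    {good_peak:.3f}")
print(f"peak mid-fade, misaligned release: {bad_peak:.3f}")
```

With the aligned release the output keeps its full amplitude through the fade, while the half-period offset makes the two signals cancel around the midpoint, which is the dip visible in the worst-case figure.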

The problem of aligning release cross-fades in virtual pipe organs is an interesting one. As an example: at the time of writing this article, release alignment in the GrandOrgue project is not particularly good; it uses a lookup-table taking the value and first-order estimated derivative (both quantised heavily) of the last sample of the last played block as keys. This is not optimal as a single sample says nothing about phase and the first-order derivative estimate could be completely incorrect in the presence of noise.

Another approach for handling release alignment

If the pitch of a pipe were completely stable and known (\(f=\frac{1}{T}\)), and we knew one point where the release was perfectly aligned (\(t_r\)), we could cross-fade into the start of the release at:

    \[ \forall n \in \mathbb Z, t = t_r + T n \]

Hence, for any sample offset we could compute an offset into the release to cross-fade into.
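
One way to read this is sketched below, assuming that positions are measured in samples, that the period T is constant, and that the start of the release lines up with the known aligned point. The function name and its parameters are hypothetical and only serve to illustrate the arithmetic.

```python
def release_offset(position, aligned_position, period):
    """Offset into the release (in samples) for a note released at `position`.

    `aligned_position` is a point in the sustain where cross-fading into the
    very start of the release is known to be phase-aligned (t_r above), and
    `period` is the pitch period T in samples. The result is how far the
    sustain has advanced past the most recent aligned point.
    """
    return (position - aligned_position) % period

# Example with a 480-sample period and an aligned point at sample 1000.
print(release_offset(1000 + 3 * 480, 1000, 480))        # exactly on an aligned point
print(release_offset(1000 + 3 * 480 + 100, 1000, 480))  # 100 samples past one
```

Python's `%` returns a non-negative result for a positive period, so positions before the aligned point are handled as well.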

In reality, pipe pitch wobbles around a bit and so the above would not strictly hold all the time – that being said, it is true for much of the time. If we could take a pipe sample and find all of the points where the release is aligned we could always find the best way to align the release.

It turns out that a simple way to do this is to find the cross-correlation of the attack and sustain segment with a short portion of the release. Taking the whole release would be problematic because as it decays it becomes less similar to the sustaining segment (which leads to an unhelpful correlation signal).

The first 25000 samples of the signal used for the cross-correlation.

The above image shows the attack and some sustain of bottom-C of the St. Augustine’s Closed Horn. This shows visually why single-sample amplitude and derivative matching is a poor way to align releases. During one period of the closed horn, there are 14 zero crossings and 16 obvious zero crossings in the derivative. One sample hardly gives enough information.

A 1024 sample cut from the start of the release.

The above image shows a 1024 sample segment taken from the release marker of the same Closed Horn sample. It contains just over a single period of the horn.

The next image shows the cross-correlation of this release segment with the sample itself. My analysis program does correlation of the left and right channels and sums them to provide an overall correlation. Positive maximums correspond to points where the release will phase-align well. Minimums correspond to points where the signal has the least correlation to the release.
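
A minimal sketch of this kind of analysis follows. It is not the author's program: the normalisation, the division of the channel sum by the channel count, the synthetic test signal and all names are assumptions chosen for illustration.

```python
import numpy as np

def normalised_xcorr(signal, segment):
    """Normalised cross-correlation of a short mono segment against a signal.

    Entry k compares segment with signal[k:k + len(segment)] and lies in
    [-1, 1]; values near +1 mark offsets where the two are in phase.
    """
    m = len(segment)
    raw = np.correlate(signal, segment, mode="valid")
    # Energy of every length-m window of the signal via a running sum.
    csum = np.concatenate(([0.0], np.cumsum(signal ** 2)))
    window_energy = csum[m:] - csum[:-m]
    denom = np.sqrt(window_energy * np.sum(segment ** 2))
    return raw / np.maximum(denom, 1e-12)

def stereo_alignment_score(signal, segment):
    """Sum the per-channel correlations, scaled to stay within [-1, 1]."""
    channels = signal.shape[1]
    total = sum(normalised_xcorr(signal[:, c], segment[:, c])
                for c in range(channels))
    return total / channels

# Synthetic stand-in for a pipe sample: 480-sample period, two channels.
n = np.arange(9600)
left = np.sin(2 * np.pi * n / 480) + 0.5 * np.sin(2 * np.pi * 3 * n / 480 + 0.7)
right = 0.8 * np.sin(2 * np.pi * n / 480 + 0.3) + 0.3 * np.sin(2 * np.pi * 2 * n / 480)
stereo = np.column_stack((left, right))

sustain = stereo[:4800]                   # plays the role of attack and sustain
release_cut = stereo[4800:4800 + 1024]    # 1024-sample cut at the release marker

score = stereo_alignment_score(sustain, release_cut)
best = int(np.argmax(score))
print("best alignment offset:", best, "score:", round(float(score[best]), 4))
```

On this perfectly periodic test signal every multiple of the period scores essentially 1. On a real sample the peaks would vary in height as the pitch and tone wander, which is what makes the correlation useful for choosing alignment points.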

Normalised cross correlation of the signal with the release segment.

Using the correlation and a pitch guesstimate, we could construct a function which, given any sample offset in the attack/sustain, produces an offset into the release that we should cross-fade into. This is for next time.