Why you should not use C99 exact-width integer types

Is there really ever a time where you need an integer type containing exactly N-bits? There are C99 types which guarantee at least N-bits. There are even C90 types which guarantee at least 8, 16 and 32 bits (the standard C integer types). Why not use one of those?

I never use C99 exact-width types in code… ever. Chances are that you shouldn’t either because:

Exact width integer types reduce portability

This is because:

1) Exact width integer types do not exist before C99

Sure you could create an abstraction that detects if the standard is less than C99 and introduce the types, but then you would be overriding the POSIX namespace by defining your own integer types suffixed with “_t”. POSIX.1-2008 – The System Interfaces: 2.2.2 The Name Space

The GNU C library manual also forbids it:

The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names.
GNU libc manual: 1.3.3 Reserved Names

From my own experience using GCC on OS X, the fixed width types are defined even when using --std=c90, meaning you’ll just get errors if you try to redefine them. Bummer.

2) Exact width integer types are not guaranteed to exist at all:

These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32 or 64 bits, it shall define the corresponding typedef names.
ISO/IEC 9899:1999 – Exact-width integer types

Even in C99, the (u)intN_t type does not need to exist unless there is a native integer type of that width. You may argue that hardly any platforms lack these types, but they do exist: DSPs are the usual example. If you start using these types, you limit the platforms on which your software can run – and are also probably developing bad habits.

Using exact width integer types could have a negative performance impact

If you need at least N bits and it does not matter if there are more, why restrict yourself to a type which could require additional overhead? If you are writing C99 code, use one of the (u)int_fastN_t types. Maybe you could even use a standard C integer type!
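As a rough sketch of what that looks like in practice, here is a 32-bit rotate written against plain unsigned long, which C90 already guarantees to be at least 32 bits wide. The function name rotl32 is made up for this example. Explicit masking keeps the result correct if the type happens to be wider.

```c
/* Rotate the low 32 bits of x left by n positions (0 <= n < 32).
 * unsigned long is at least 32 bits wide in C90, so this gives the
 * same answer whether the type is 32, 36 or 64 bits wide. */
static unsigned long rotl32(unsigned long x, unsigned n)
{
    x &= 0xFFFFFFFFul;          /* discard anything above bit 31 */
    if (n == 0)
        return x;               /* avoid shifting right by 32 */
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFFul;
}
```

The mask costs nothing on a machine where unsigned long is exactly 32 bits, as a decent compiler will remove it.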

The endianness of exact width integer types is unspecified

I am not implying that the endianness is specified for other C types. I am just trying to make a point: you cannot even use these types for portable serialisation/de-serialisation without feral-octet-swapping-macro-garbage as the underlying layout of the type is system dependent.
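For comparison, a minimal sketch of serialising through shifts and masks instead of relying on the in-memory layout. The names put_le32 and get_le32 are hypothetical, and the octet order (little-endian) is simply an assumption chosen for the example. Nothing here depends on the host byte order, on padding bits, or on char being exactly 8 bits.

```c
/* Store the low 32 bits of v as four little-endian octets. */
static void put_le32(unsigned char *out, unsigned long v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Rebuild a 32-bit value from four little-endian octets.
 * Each element is masked in case char is wider than 8 bits. */
static unsigned long get_le32(const unsigned char *in)
{
    return  (unsigned long)(in[0] & 0xFFu)
         | ((unsigned long)(in[1] & 0xFFu) << 8)
         | ((unsigned long)(in[2] & 0xFFu) << 16)
         | ((unsigned long)(in[3] & 0xFFu) << 24);
}
```

Because the value is only ever handled arithmetically, unsigned long works just as well here as an exact-width type would.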

If you are interested in the conditions for when memcpy can be used to copy memory into a particular type, maybe you should check out the abstraction which is part of my digest program. It contains a heap of checks to ensure that memcpy is only used on systems when it is known that it will do the right thing. It tries to deal with potential padding, non 8-bit chars and endianness in a clean way that isn’t broken.

This article deliberately did not discuss the signed variants of these types…

Eaton 5115 powering Ubuntu Server 11.10 with NUT

A couple of months ago, I bought a UPS for my server at home. I’m running more remote services on it and figure that anything I can add to protect it is worth while. I bought an Eaton 5115 500 Watt tower model. Only the “Powerware” branded model is listed on the NUT support page but I was banking on the newer model being compatible. For those out there wondering: yes it is.

  1. Install NUT and then configure as per the instructions given here. For the Eaton 5115 connected via USB, bcmxcp_usb is the correct driver. On Ubuntu this was just a matter of apt-get’ing the “nut” package and modifying ups.conf, upsd.conf, upsd.users, upsmon.conf and nut.conf as specified on the previously linked documentation page.
  2. Disconnect and reconnect the USB connection from the server to the UPS (I had to do this to get everything working).
  3. Test that everything worked by running:

    $ upsc { ups name }

    Where { ups name } is the name chosen for the UPS in ups.conf.

As for the UPS itself: it seems pretty good. I was concerned when I first plugged it in as the fan whirred quite loudly at full speed for a few minutes – but then it slowed down to a very quiet speed. The voltage regulation feature has kicked in a few times (my line is supposed to be a nominal 240 Volts but more often floats around 250 Volts) which I thought was pretty neat.

I haven’t got figures on battery length… might edit this when I do.

Thank you DrayTek

Our home is a wireless nightmare. We have several simultaneous dual-band routers which we use to spread the wifi throughout the house. All of these are running dd-wrt because their stock firmware cannot do what I want them to do. We have:

  • A Cisco E4200 which is the main internet router and is positioned in the middle of the house. It provides two separate wireless networks.
  • A Cisco E3000 which is bridged to the N network. This provides another G network and also acts as our print server (printer is connected to a USB port on the device).
  • A Netgear WNDR3700 which is bridged to the N network and provides yet another G network. Our TV is also connected to this box so it can talk to our media server.

All of these devices are fantastic. I have never needed to reset any of them (yes, never). They just work all the time. However, the above description of my wireless setup is not the point of this post. While I have had a good experience with the reliability of my “routers”, my experience with modem/router combo devices tends to have been less positive: they have never been particularly reliable when placed under load. Using a router, however, means that a separate modem is required to connect to the internet.

Over the last few years, I have had a bunch of different modems and almost all of them have been complete rubbish. Here are the last four which I have had:

  • A Netgear DGND3300. This is actually a modem/router, but it can be placed into a modem only mode via a hidden page (http://(ROUTER_IP_ADDRESS)/setup.cgi?next_file=mode.htm). I used the device initially in the normal modem/router mode until one of its wireless devices started to fail. At which point, I bought the WNDR3700 and continued to use this device as a modem. It lasted probably a year and then it started to crash (the modem had to be power cycled on a more-than-daily basis) but I have a feeling this was triggered by a surge. I give this device a 3/10 – I don’t blame Netgear for the modem component of the device failing, but I do blame them for the wireless giving up so quickly.
  • A Netgear DM111P. This thing was a complete pile of rubbish. From the day it was bought, it needed to be power cycled at least daily. I give this device a 0/10 – it was almost totally useless. This is also why I started to buy Cisco gear.
  • A Cisco X2000. Again, this is another modem/router which can be placed into bridge mode (much more easily than the DGND3300). I thought it was going to be great when it held the connection for 3 days… then it crashed. We kept the device for a few months and it would last anywhere up to 7 days before crashing. I give it a 1/10. It gets one point just to make it clear that it was “better” than the DM111P.
  • A DrayTek Vigor 120. This is the latest modem I have bought. I’ve had it for a couple of weeks now and it has been rock solid. I get good speeds and it holds the ADSL connection excellently.

Unstable modems suck – especially if you are running remote services from a home server. If you go away from home and the modem dies – you’re stuck being unable to log in until you can manually reset the modem. So yeah, thanks DrayTek for making my wireless setup something I don’t need to think about anymore.

Wave File Format Implementation Errors

I’ve read through and written many wave reader/writer implementations over the years and most of them are wrong. I tend to blame this on a vast number of websites which incorrectly “document” the wave format rather than point to the original IBM/Microsoft specifications (which, to be fair, are also pretty poor).

This post describes some of the main points of contention which exist in the wave format and points out some of the common issues which I’ve seen in implementations. It is a long post but I hope it will be useful to people writing software which uses wave files.

I will refer to the following three documents in this post:

  • IBM and Microsoft’s “Multimedia Programming Interface and Data Specifications” version 1.0 dated August 1991.
  • Microsoft’s “New Multimedia Data Types and Data Techniques” version 3.0 dated April 1994.
  • The MSDN article “Multiple Channel Audio Data and WAVE Files” dated 7th March 2007.

All of these documents (including the MSDN page) are available as PDF downloads here.

Wave chunks are two-byte aligned

All chunks (even the main RIFF chunk) must be two-byte aligned (“Multimedia Programming Interface and Data Specifications”, pp. 11). A broken wave implementation which fails to do this will most of the time still input or output wave files correctly, unless:

  • Writing an 8 bit, mono wave file with an odd number of samples that has chunks following the data chunk
  • Potentially when reading/writing a compressed wave format or
  • Writing more complicated chunks like “associated data lists” which may contain strings with odd numbers of characters.
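A minimal sketch of the alignment rule, assuming the usual RIFF layout of a four-byte chunk ID followed by a four-byte size field. The function name riff_chunk_total_size is invented for this example. The stored chunk size does not count the pad byte, so the reader and the writer both have to add it themselves.

```c
/* Bytes a chunk occupies in the file: the 8-byte header (ID + size),
 * the chunk data, and one pad byte when the data size is odd.
 * ck_size is the value stored in the chunk's size field, which
 * excludes both the header and the pad byte. */
static unsigned long riff_chunk_total_size(unsigned long ck_size)
{
    return 8ul + ck_size + (ck_size & 1ul);
}
```

A reader would skip this many bytes to reach the next chunk. A writer emits a single zero byte after any odd-sized chunk data. Sizes close to the 32-bit limit would need an overflow check that is left out here.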

The format chunk is not a fixed sized record

The format chunk contains the structure given below followed by an optional set of “format-specific-fields” (“Multimedia Programming Interface and Data Specifications”, pp. 56):

struct {
    WORD wFormatTag;        // Format category
    WORD wChannels;         // Number of channels
    DWORD dwSamplesPerSec;  // Sampling rate
    DWORD dwAvgBytesPerSec; // For buffer estimation
    WORD wBlockAlign;       // Data block size
}

For WAVE_FORMAT_PCM types, the “format-specific-fields” is simply a WORD-type named wBitsPerSample which:

… specifies the number of bits of data used to represent each sample of each channel
(“Multimedia Programming Interface and Data Specifications”, pp. 58)

Be aware that this definition is vague and could mean either:

  • The number of valid resolution bits or
  • The bits used by the sample container

The WAVE_FORMAT_EXTENSIBLE format tag solves this confusion by defining wBitsPerSample to be the container size and providing an additional wValidBitsPerSample field (“Multiple Channel Audio Data and WAVE Files”, pp. 3-4).

The original specification did not specify what “format-specific-fields” was for any other format. However in the 1994 update, the use of WAVEFORMATEX became mandated for any wFormatTag which is not WAVE_FORMAT_PCM (“New Multimedia Data Types and Data Techniques”, pp. 19). This structure is defined as follows:

/* general extended waveform format structure */
/* Use this for all NON PCM formats */
/* (information common to all formats) */
typedef struct waveformat_extended_tag {
    WORD wFormatTag;       /* format type */
    WORD nChannels;        /* number of channels (i.e. mono, stereo...) */
    DWORD nSamplesPerSec;  /* sample rate */
    DWORD nAvgBytesPerSec; /* for buffer estimation */
    WORD nBlockAlign;      /* block size of data */
    WORD wBitsPerSample;   /* Number of bits per sample of mono data */
    WORD cbSize;           /* The count in bytes of the extra size */
} WAVEFORMATEX;

What this means is that all wave formats contain the members up to and including wBitsPerSample. Only wave files with formats which are not WAVE_FORMAT_PCM are required to have the cbSize member.
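To make the variable size concrete, here is a rough sketch of a format chunk parser that decides what is present from the chunk size rather than assuming a fixed record. The struct wave_format, the helpers rd16 and rd32 and the function parse_fmt_chunk are all hypothetical names. The input is assumed to be the raw chunk data (little-endian, as in the file) with the eight-byte chunk header already removed.

```c
struct wave_format {                 /* hypothetical in-memory form */
    unsigned      format_tag;
    unsigned      channels;
    unsigned long samples_per_sec;
    unsigned long avg_bytes_per_sec;
    unsigned      block_align;
    unsigned      bits_per_sample;
    unsigned      cb_size;           /* 0 when the field is absent */
};

static unsigned rd16(const unsigned char *p)
{
    return (p[0] & 0xFFu) | ((p[1] & 0xFFu) << 8);
}

static unsigned long rd32(const unsigned char *p)
{
    return (unsigned long)rd16(p) | ((unsigned long)rd16(p + 2) << 16);
}

/* Returns 0 on success, -1 if the chunk is too short or inconsistent. */
static int parse_fmt_chunk(const unsigned char *data, unsigned long size,
                           struct wave_format *out)
{
    if (size < 16ul)
        return -1;                   /* 14 common bytes + wBitsPerSample */
    out->format_tag        = rd16(data);
    out->channels          = rd16(data + 2);
    out->samples_per_sec   = rd32(data + 4);
    out->avg_bytes_per_sec = rd32(data + 8);
    out->block_align       = rd16(data + 12);
    out->bits_per_sample   = rd16(data + 14);
    out->cb_size           = 0;
    if (size >= 18ul) {
        out->cb_size = rd16(data + 16);
        if (out->cb_size > size - 18ul)
            return -1;               /* extra bytes run past the chunk */
    } else if (out->format_tag != 0x0001u) {
        return -1;                   /* non-PCM (tag != 1) requires cbSize */
    }
    return 0;
}
```

This follows the requirement above strictly. A more forgiving reader might choose to accept a non-PCM format chunk that lacks cbSize and treat it as zero.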

When WAVE_FORMAT_EXTENSIBLE should be used

WAVE_FORMAT_EXTENSIBLE should be used when:

  • The channel configuration of the wave file is not mono or left/right. This is because other channel configurations are ambiguous unless WAVE_FORMAT_EXTENSIBLE is used.
  • The valid data bits per sample is not a multiple of 8. This is because the meaning of wBitsPerSample in the wave format is ambiguous unless WAVE_FORMAT_EXTENSIBLE is used.

WAVE_FORMAT_EXTENSIBLE should not be used when:

  • Compatibility with ancient, incorrect and/or broken wave reading implementations is required.
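The two "should be used" rules can be sketched as a small predicate. The function name and its parameters are invented for this example. In particular, is_left_right_pair is an assumed flag that the caller sets when a two-channel stream really is left/right.

```c
/* Returns non-zero when WAVE_FORMAT_EXTENSIBLE is the unambiguous choice:
 * either the channel layout is something other than mono or left/right,
 * or the number of valid bits per sample is not a multiple of 8. */
static int should_use_extensible(unsigned channels, int is_left_right_pair,
                                 unsigned valid_bits)
{
    int plain_layout = (channels == 1u)
                    || (channels == 2u && is_left_right_pair);
    return !plain_layout || (valid_bits % 8u) != 0u;
}
```

The compatibility exception is a policy decision, not a property of the audio, so it is left to the caller.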

FACT chunks are required for any wave format which is not WAVE_FORMAT_PCM

The following dot points are taken from “Multimedia Programming Interface and Data Specifications” (pp. 61) and “New Multimedia Data Types and Data Techniques” (pp. 12) in relation to the ‘fact’ chunk:

  • 1991) “The ‘fact’ chunk is required if the waveform data is contained in a ‘wavl’ LIST chunk and for all compressed audio formats. The chunk is not required for PCM files using the ‘data’ chunk format.”
  • 1994) “The fact chunk is required for all new WAVE formats. The chunk is not required for the standard WAVE_FORMAT_PCM files.”
  • 1991 and 1994) “The ‘fact’ chunk will be expanded to include any other information required by future WAVE formats. Added fields will appear following the dwSampleLength field. Applications can use the chunk size field to determine which fields are present.”

From this information, it is always necessary to include a fact chunk when the wave format chunk has wFormatTag set to anything other than WAVE_FORMAT_PCM (this includes WAVE_FORMAT_IEEE_FLOAT). From recollection, Windows Media Player will not play floating point wave files which do not have a fact chunk – however I am not prepared to install Windows to validate if this is still true.
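A minimal sketch of the rule, together with the smallest possible fact chunk: a single dwSampleLength field. The function names are made up for this example. The chunk layout (four-byte ID, four-byte little-endian size, then the data) is the ordinary RIFF one.

```c
/* WAVE_FORMAT_PCM is format tag 0x0001. Every other tag, including
 * WAVE_FORMAT_IEEE_FLOAT (0x0003), calls for a fact chunk. */
static int needs_fact_chunk(unsigned format_tag)
{
    return format_tag != 0x0001u;
}

/* Fill out[0..11] with a complete fact chunk holding only
 * dwSampleLength: "fact", a chunk size of 4, then the value. */
static void build_fact_chunk(unsigned char out[12],
                             unsigned long sample_length)
{
    out[0] = 'f'; out[1] = 'a'; out[2] = 'c'; out[3] = 't';
    out[4] = 4;   out[5] = 0;   out[6] = 0;   out[7] = 0;
    out[8]  = (unsigned char)(sample_length & 0xFFu);
    out[9]  = (unsigned char)((sample_length >> 8) & 0xFFu);
    out[10] = (unsigned char)((sample_length >> 16) & 0xFFu);
    out[11] = (unsigned char)((sample_length >> 24) & 0xFFu);
}
```

A reader should use the chunk size instead of assuming four bytes, since the specifications allow further fields to follow dwSampleLength.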

On the ordering of wave chunks

I saved the worst point for the end…

Almost every website which I have seen which attempts to define the wave format states: wave chunks can be in any order but the format chunk must precede the data chunk. One website in particular even stated (although it did not recommend) that the format chunk could come after the data chunk – this is plain wrong (“Multimedia Programming Interface and Data Specifications”, pp. 56). Unfortunately in relation to all other chunks, the specification appears to be inconsistent. The original specification states while defining the RIFF format:

… Following the form-type code is a series of subchunks. Which subchunks are present depends on the form type. The definition of a particular RIFF form typically includes the following:

  • A unique four-character code identifying the form type
  • A list of mandatory chunks
  • A list of optional chunks
  • Possibly, a required order for the chunks

Multimedia Programming Interface and Data Specifications, page 12

The presence of the fourth point hints that maybe the ordering of the chunks does not matter. However, on reading the “Extended Notation for Representing RIFF Form Definitions” (page 17 onwards), many examples are given which seem to suggest that mandatory ordering is implied through the grammar. See the sections on <name:type>, [elements], element… and [element]….

The form of a RIFF/WAVE file is defined on page 56 of “Multimedia Programming Interface and Data Specifications” and again on page 12 of “New Multimedia Data Types and Data Techniques” as:

<fmt-ck>            // Format
[<fact-ck>]         // Fact chunk
[<cue-ck>]          // Cue points
[<playlist-ck>]     // Playlist
[<assoc-data-list>] // Associated data list
<wave-data>         // Wave data

Given the notation examples, this grammar would suggest that the chunks listed should be supplied in that order. Chunks which are not listed in the grammar, but still defined for the form type (for example, smpl or inst chunks), presumably can be located anywhere.

That being said, if we look at the grammar for the associated data list chunk (pp. 63 of the wave specification):

<labl-ck> // Label
<note-ck> // Note
<ltxt-ck> // Text with data length
<file-ck> // Media file

It would appear that this list must contain exactly one label, one note, one labelled text and one media file chunk. This is clearly incorrect in the specifications given the purpose of the chunk. The definition given in “New Multimedia Data Types and Data Techniques” (pp. 14) is even worse – it is completely mangled on the page. There is a stray close-brace floating towards the end of the definition (which hints that there may have been an attempt to fix the issue) but it is still incorrect. All this tends to suggest that the grammar used to define RIFF/WAVE chunks is unreliable.

In writing an implementation, I would presume that the most correct solution would be to write the chunks in the order supplied by the grammar with any additional chunks being written anywhere. *sigh* If anyone can provide more information from a reliable source to clarify the chunk ordering issue, I would appreciate it.

That’s enough. I’m going to make dinner.