Category Archives: Software

wxWidgets, C++ libraries and C++11

Building wxWidgets on OS X targeting libc++

It seems right to put this at the top of the post for easy access (probably for my own reference).

To get a configuration of wxWidgets (I am using version 3.0.0) which uses libc++ as the standard library implementation, the following command line works (using Apple LLVM version 5.0, clang-500.2.79):


../configure --disable-shared --enable-unicode --with-cocoa --with-macosx-version-min=10.7 --with-macosx-sdk=/Developer/SDKs/MacOSX10.7.sdk CXXFLAGS="-std=c++0x -stdlib=libc++" CPPFLAGS="-stdlib=libc++" LIBS=-lc++

-std=c++0x (I know, this is deprecated syntax) tells the compiler that we want C++11 features.
-stdlib=libc++ tells the compiler that we want to use the libc++ standard library implementation (rather than the libstdc++ implementation).

This will produce a static, unicode build of wxWidgets without debug information. The flags will not work with --with-macosx-version-min set to anything less than 10.7 because -stdlib=libc++ requires this as a minimum.

Why build wxWidgets on OS X targeting libc++

OS X currently ships with two C++ libraries, libstdc++ and libc++. libc++ is reasonably new and completely supports C++11. libstdc++ (on OS X anyway) is very old and only supports a subset of C++03. Unless you specify otherwise, building an application with clang will produce object code which expects to link against libstdc++ targeting the C++98 standard. If you are building C++11 code and only add -std=c++0x to your compiler arguments, your application may fail to compile because the standard library might not have all of the features which you require. In short, if you require C++11 support on OS X, you probably want to migrate over to libc++ for your standard library.
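On the application side, the same flags need to be given to the compiler so that the application and the library agree on which standard library they target. Something along the lines of the following should work (myapp.cpp is a placeholder, and the wx-config invocations assume the build above is the one on your path):

clang++ -std=c++0x -stdlib=libc++ -mmacosx-version-min=10.7 `wx-config --cxxflags` myapp.cpp `wx-config --libs` -o myapp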

If you build a static library with C++98 targeting libstdc++ and try to link it against an application targeting libc++, you are probably going to get errors looking something like (for wxWidgets anyway):


Undefined symbols for architecture x86_64:
  "std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::find_last_of(wchar_t const*, unsigned long, unsigned long) const", referenced from:
      wxFileName::SplitPath(wxString const&, wxString*, wxString*, wxString*, wxString*, bool*, wxPathFormat) in libwx_baseu-3.0.a(baselib_filename.o)
  "std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::find_first_of(wchar_t const*, unsigned long, unsigned long) const", referenced from:
      wxLocale::GetSystemLanguage() in libwx_baseu-3.0.a(baselib_intl.o)
      wxFileName::SplitVolume(wxString const&, wxString*, wxString*, wxPathFormat) in libwx_baseu-3.0.a(baselib_filename.o)
      wxRegExImpl::Replace(wxString*, wxString const&, unsigned long) const in libwx_baseu-3.0.a(baselib_regex.o)
      wxString::find_first_of(char const*, unsigned long) const in libwx_baseu-3.0.a(baselib_mimecmn.o)
      wxString::find_first_of(char const*, unsigned long) const in libwx_osx_cocoau_core-3.0.a(corelib_osx_cocoa_button.o)

… which will continue for several hundred lines.

This is because libstdc++ and libc++ are not fully ABI compatible. When your libc++ application tries to link against a library expecting libstdc++, you are going to have major unresolved symbol issues unless only a very minimal subset of the standard library crosses that boundary. Bugger.

Edit: I just found this excellent post Marshall’s C++ Musings – Clang and standard libraries on Mac OS X which is very relevant to the topic.

Linking C Static Libraries With Duplicate Symbols

I came across some interesting linker behaviour today. I was vehemently stating to a colleague that if I have two static libraries which both contain a symbol “foo” and I try to link those libraries into an executable, I will get a symbol clash and the link will fail. Interestingly, in the test program I wrote, this did not happen. I read through “man ld” and it seemed to me like the link should fail, so I set about figuring out why my test program linked. I am using GCC 4.6.1 running on Ubuntu 11.10 x64 for all of these results.

What follows are five small source files:

/* foo1.c */
int foo(int x)
{
    return x;
}

/* foo2a.c */
int foo(int x)
{
    return x + 1;
}

/* foo2b.c */
int foo(int x)
{
    return x + 1;
}

int bar(int x)
{
    return x + 10;
}

/* testa.c */
#include <stdio.h>
#include <stdlib.h>

extern int foo(int x);

int main(int argc, char *argv[])
{
    int x = foo(5);
    printf("%d\n", x);
    exit(0);
}

/* testb.c */
#include <stdio.h>
#include <stdlib.h>

extern int foo(int x);
extern int bar(int x);

int main(int argc, char *argv[])
{
    int x = foo(bar(5));
    printf("%d\n", x);
    exit(0);
}

foo1.c, foo2a.c and foo2b.c should be archived as follows:

gcc -c foo1.c -o foo1.o
ar rcs libfoo1.a foo1.o
gcc -c foo2a.c -o foo2a.o
ar rcs libfoo2a.a foo2a.o
gcc -c foo2b.c -o foo2b.o
ar rcs libfoo2b.a foo2b.o

This creates three libraries:

  • libfoo1.a – contains an implementation of the function foo() which returns the argument.
  • libfoo2a.a – contains an implementation of the function foo() which returns the argument plus one.
  • libfoo2b.a – contains an implementation of the function foo() which returns the argument plus one as well as a function bar() which returns the argument plus ten.

All of the libraries contain the symbol “foo” so I would expect the linker to fail in any case where I link more than one of these libraries.

The first test program calls foo(5) and prints the return value. For t1, the executable is linked first with libfoo1 then libfoo2a. For t2, libfoo2a then libfoo1.

nappleton@nickvm:~/Desktop$ gcc -c testa.c -o testa.o
nappleton@nickvm:~/Desktop$ gcc -o t1 testa.o -L. -lfoo1 -lfoo2a && ./t1
5
nappleton@nickvm:~/Desktop$ gcc -o t2 testa.o -L. -lfoo2a -lfoo1 && ./t2
6

This is exactly the test setup which I ran for my colleague. It shows that the program does in fact link and that the ordering of the libraries matters. The foo() implementation that gets used comes from the first library specified on the command line; the second library appears to be ignored.
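As a side note, if you want to see which archive member actually supplied foo, GNU ld can trace a symbol for you (--trace-symbol, also spelled -y). I have not reproduced its output here, but something along these lines should report where foo is referenced and where it was found:

gcc -o t1 testa.o -L. -lfoo1 -lfoo2a -Wl,--trace-symbol=foo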

The second test case is more interesting. The program calls foo(bar(5)) and prints the value. For t3, the executable is linked first with libfoo1 then libfoo2b. For t4, libfoo2b is linked first then libfoo1.

nappleton@nickvm:~/Desktop$ gcc -c testb.c -o testb.o
nappleton@nickvm:~/Desktop$ gcc -o t3 testb.o -L. -lfoo1 -lfoo2b && ./t3
./libfoo2b.a(foo2b.o): In function `foo':
foo2b.c:(.text+0x0): multiple definition of `foo'
./libfoo1.a(foo1.o):foo1.c:(.text+0x0): first defined here
collect2: ld returned 1 exit status
nappleton@nickvm:~/Desktop$ gcc -o t4 testb.o -L. -lfoo2b -lfoo1 && ./t4
16

Based on these results, I am guessing that the linker stops searching libraries once all unresolved symbols have been found. This behaviour would explain why t1, t2 and t4 build successfully without multiple definition errors. t3 fails to build because after linking against libfoo1, foo is found but bar is still unresolved; foo2b is then searched, foo is found again and the linker explodes.

I’m not sure if the behaviour is necessarily bad. It seems reasonable for a linker to stop searching once all symbols have been found. However, it would be nice to have an option to be informed when I am doing something which is likely to be stupid. A warning “libraries x, y and z were not searched because all symbols were already resolved” might be nice.

An interesting note: on OS X, if the -all_load flag is passed to the linker, all of these programs fail to build, as the linker loads every member of every archive even when all of the undefined symbols have already been resolved.
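For what it’s worth, GNU ld has a rough equivalent in --whole-archive, which forces every member of the archives that follow it on the command line to be pulled in. I have not re-run the tests with it, but I would expect something like the following to fail with the same multiple definition error as t3:

gcc -o t5 testa.o -L. -Wl,--whole-archive -lfoo1 -lfoo2a -Wl,--no-whole-archive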

Why you should not use C99 exact-width integer types

Is there really ever a time where you need an integer type containing exactly N-bits? There are C99 types which guarantee at least N-bits. There are even C90 types which guarantee at least 8, 16 and 32 bits (the standard C integer types). Why not use one of those?

I never use C99 exact-width types in code… ever. Chances are that you shouldn’t either because:

Exact width integer types reduce portability

This is because:

1) Exact width integer types do not exist before C99

Sure, you could create an abstraction that detects if the standard is less than C99 and introduces the types, but then you would be encroaching on the POSIX name space by defining your own integer types suffixed with “_t” (POSIX.1-2008 – The System Interfaces: 2.2.2 The Name Space).

GCC also will not like you:

The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names.
GNU libc manual: 1.3.3 Reserved Names

From my own experience using GCC on OS X, the fixed width types are defined even when using --std=c90, meaning you’ll just get errors if you try to redefine them. Bummer.

2) Exact width integer types are not guaranteed to exist at all:

These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32 or 64 bits, it shall define the corresponding typedef names.
ISO/IEC 9899:1999 – 7.18.1.1 Exact-width integer types

Even in C99, the (u)intN_t types do not need to exist unless there is a native integer type of that width. You may argue that there are not many platforms which lack these types – but there are: DSPs. If you start using these types, you limit the platforms on which your software can run – and are also probably developing bad habits.

Using exact width integer types could have a negative performance impact

If you need at least N-bits and it does not matter if there are more, why restrict yourself to a type which could require additional overhead? If you are writing C99 code, use one of the (u)int_fastN_t types. Maybe you could even use a standard C integer type!
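As a sketch of what that looks like (the loop bound here is arbitrary and chosen to fit comfortably in 32 bits): declare the variable with a “fast” type, print it with the matching inttypes.h macro, and let the implementation pick whatever width is cheapest:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* At least 32 bits; the implementation is free to use a wider,
     * faster type (e.g. a native 64-bit register) if it wants to. */
    uint_fast32_t sum = 0;
    uint_fast32_t i;
    for (i = 0; i < 10000; i++)
        sum += i;
    printf("%" PRIuFAST32 "\n", sum);
    return 0;
}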

The endianness of exact width integer types is unspecified

I am not implying that the endianness is specified for other C types. I am just trying to make a point: you cannot even use these types for portable serialisation/de-serialisation without feral-octet-swapping-macro-garbage, as the underlying layout of the type is system dependent.
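As a minimal sketch of what I mean by portable serialisation (the function names are my own, and the output is assumed to be a stream of 8-bit octets): operate on values with shifts and masks rather than on object representations, and the endianness of the in-memory type never enters the picture:

/* Store the low 32 bits of v into buf as four little-endian octets.
 * This works regardless of host byte order and regardless of how wide
 * unsigned long actually is, because it manipulates values rather
 * than memory layouts. */
void write_u32_le(unsigned char *buf, unsigned long v)
{
    buf[0] = (unsigned char)( v        & 0xFFu);
    buf[1] = (unsigned char)((v >> 8)  & 0xFFu);
    buf[2] = (unsigned char)((v >> 16) & 0xFFu);
    buf[3] = (unsigned char)((v >> 24) & 0xFFu);
}

unsigned long read_u32_le(const unsigned char *buf)
{
    return  (unsigned long)buf[0]
         | ((unsigned long)buf[1] << 8)
         | ((unsigned long)buf[2] << 16)
         | ((unsigned long)buf[3] << 24);
}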

If you are interested in the conditions under which memcpy can be used to copy memory into a particular type, maybe you should check out the abstraction which is part of my digest program. It contains a heap of checks to ensure that memcpy is only used on systems where it is known that it will do the right thing. It tries to deal with potential padding, non 8-bit chars and endianness in a clean way that isn’t broken.

This article deliberately did not discuss the signed variants of these types…

Wave File Format Implementation Errors

I’ve read through and written many wave reader/writer implementations over the years and most of them are wrong. I tend to blame this on a vast number of websites which incorrectly “document” the wave format rather than point to the original IBM/Microsoft specifications (which, to be fair, are also pretty poor).

This post describes some of the main points of contention which exist in the wave format and points out some of the common issues which I’ve seen in implementations. It is a long post but I hope it will be useful to people writing software which uses wave files.

I will refer to the following three documents in this post:

  • IBM and Microsoft’s “Multimedia Programming Interface and Data Specifications” version 1.0 dated August 1991.
  • Microsoft’s “New Multimedia Data Types and Data Techniques” version 3.0 dated April 1994.
  • The MSDN article “Multiple Channel Audio Data and WAVE Files” dated 7th March 2007.

All of these documents (including the MSDN page) are available as PDF downloads here.

Wave chunks are two-byte aligned

All chunks (even the main RIFF chunk) must be two-byte aligned (“Multimedia Programming Interface and Data Specifications”, pp. 11). A broken wave implementation which fails to honour this will, most of the time, still read and write wave files correctly, except when:

  • Writing an 8 bit, mono wave file with an odd number of samples that has chunks following the data chunk;
  • Reading or writing (potentially) a compressed wave format; or
  • Writing more complicated chunks like “associated data lists”, which may contain strings with an odd number of characters.
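A minimal read-side sketch of the rule (the helper name is mine, not from any library): after the chunk header has been read, the body plus any pad byte needs to be skipped, otherwise every subsequent chunk will be misread:

#include <stdio.h>

/* Skip the body of a RIFF chunk whose 8-byte header has just been
 * consumed. ck_size is the size field from the header; chunk bodies
 * are padded to an even length, so an odd size means there is one
 * extra pad byte to step over. */
int skip_chunk_body(FILE *f, unsigned long ck_size)
{
    if (ck_size & 1)
        ck_size++;
    return fseek(f, (long)ck_size, SEEK_CUR);
}

The same applies on the write side: an odd-length body must be followed by a single pad byte which is not counted in the chunk’s size field.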

The format chunk is not a fixed sized record

The format chunk contains the structure given below followed by an optional set of “format-specific-fields” (“Multimedia Programming Interface and Data Specifications”, pp. 56):

struct {
    WORD  wFormatTag;        // Format category
    WORD  wChannels;         // Number of channels
    DWORD dwSamplesPerSec;   // Sampling rate
    DWORD dwAvgBytesPerSec;  // For buffer estimation
    WORD  wBlockAlign;       // Data block size
};

For WAVE_FORMAT_PCM types, the “format-specific-fields” is simply a WORD-type named wBitsPerSample which:

… specifies the number of bits of data used to represent each sample of each channel
(“Multimedia Programming Interface and Data Specifications”, pp. 58)

Be aware that this definition is vague and could mean either:

  • The number of valid resolution bits or
  • The bits used by the sample container

The WAVE_FORMAT_EXTENSIBLE format tag solves this confusion by defining wBitsPerSample to be the container size and providing an additional wValidBitsPerSample field (“Multiple Channel Audio Data and WAVE Files”, pp. 3-4).

The original specification did not specify what “format-specific-fields” was for any other format. However in the 1994 update, the use of WAVEFORMATEX became mandated for any wFormatTag which is not WAVE_FORMAT_PCM (“New Multimedia Data Types and Data Techniques”, pp. 19). This structure is defined as follows:

/* general extended waveform format structure */
/* Use this for all NON PCM formats */
/* (information common to all formats) */
typedef struct waveformat_extended_tag {
    WORD  wFormatTag;        /* format type */
    WORD  nChannels;         /* number of channels (i.e. mono, stereo...) */
    DWORD nSamplesPerSec;    /* sample rate */
    DWORD nAvgBytesPerSec;   /* for buffer estimation */
    WORD  nBlockAlign;       /* block size of data */
    WORD  wBitsPerSample;    /* Number of bits per sample of mono data */
    WORD  cbSize;            /* The count in bytes of the extra size */
} WAVEFORMATEX;

What this means is that all wave formats contain the members up to and including wBitsPerSample. Only wave files with formats which are not WAVE_FORMAT_PCM are required to have the cbSize member.
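To make the consequence concrete, here is a sketch of a reader for the fmt chunk payload (buffer-based, little-endian; the names and helpers are my own, not from any API): anything at least 16 bytes long is acceptable for WAVE_FORMAT_PCM, while everything else must also carry cbSize:

#define WAVE_FORMAT_PCM 0x0001u

struct fmt_info {
    unsigned      format_tag;
    unsigned      channels;
    unsigned long samples_per_sec;
    unsigned long avg_bytes_per_sec;
    unsigned      block_align;
    unsigned      bits_per_sample;
    unsigned      cb_size;    /* 0 for WAVE_FORMAT_PCM */
};

/* Little-endian field readers; fmt chunk fields are stored LSB first. */
static unsigned long rd_u16(const unsigned char *b) { return b[0] | ((unsigned long)b[1] << 8); }
static unsigned long rd_u32(const unsigned char *b) { return rd_u16(b) | (rd_u16(b + 2) << 16); }

/* buf points at the start of the fmt chunk payload and len is the
 * chunk size taken from the chunk header. Returns zero on success. */
int parse_fmt(const unsigned char *buf, unsigned long len, struct fmt_info *fi)
{
    if (len < 16)
        return -1; /* every format carries at least these members */
    fi->format_tag        = (unsigned)rd_u16(buf + 0);
    fi->channels          = (unsigned)rd_u16(buf + 2);
    fi->samples_per_sec   =           rd_u32(buf + 4);
    fi->avg_bytes_per_sec =           rd_u32(buf + 8);
    fi->block_align       = (unsigned)rd_u16(buf + 12);
    fi->bits_per_sample   = (unsigned)rd_u16(buf + 14);
    fi->cb_size           = 0;
    if (fi->format_tag != WAVE_FORMAT_PCM) {
        if (len < 18)
            return -1; /* non-PCM formats must supply cbSize */
        fi->cb_size = (unsigned)rd_u16(buf + 16);
    }
    return 0;
}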

When WAVE_FORMAT_EXTENSIBLE should be used

WAVE_FORMAT_EXTENSIBLE should be used when:

  • The channel configuration of the wave file is not mono or left/right. This is because other channel configurations are ambiguous unless WAVE_FORMAT_EXTENSIBLE is used.
  • The valid data bits per sample is not a multiple of 8. This is because the meaning of wBitsPerSample in the wave format is ambiguous unless WAVE_FORMAT_EXTENSIBLE is used.

WAVE_FORMAT_EXTENSIBLE should not be used when:

  • Compatibility with ancient, incorrect and/or broken wave reading implementations is required.
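For reference, the WAVEFORMATEXTENSIBLE structure described in the MSDN article wraps the WAVEFORMATEX structure given above. Reproduced here from memory (check mmreg.h or the article itself for the authoritative definition), it looks roughly like this:

typedef struct {
    WAVEFORMATEX Format;           /* cbSize must be 22 */
    union {
        WORD wValidBitsPerSample;  /* valid bits in each sample container */
        WORD wSamplesPerBlock;     /* used by some compressed formats */
        WORD wReserved;
    } Samples;
    DWORD dwChannelMask;           /* bitmask of speaker positions */
    GUID  SubFormat;               /* 16-byte identifier of the actual format */
} WAVEFORMATEXTENSIBLE;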

FACT chunks are required for any wave format which is not WAVE_FORMAT_PCM

The following dot points are taken from “Multimedia Programming Interface and Data Specifications” (pp. 61) and “New Multimedia Data Types and Data Techniques” (pp. 12) in relation to the ‘fact’ chunk:

  • 1991) “The ‘fact’ chunk is required if the waveform data is contained in a ‘wavl’ LIST chunk and for all compressed audio formats. The chunk is not required for PCM files using the ‘data’ chunk format.”
  • 1994) “The fact chunk is required for all new WAVE formats. The chunk is not required for the standard WAVE_FORMAT_PCM files.”
  • 1991 and 1994) “The ‘fact’ chunk will be expanded to include any other information required by future WAVE formats. Added fields will appear following the dwSampleLength field. Applications can use the chunk size field to determine which fields are present.”

From this information, it is always necessary to include a fact chunk when the wave format chunk has wFormatTag set to anything other than WAVE_FORMAT_PCM (this includes WAVE_FORMAT_IEEE_FLOAT). From recollection, Windows Media Player will not play floating point wave files which do not have a fact chunk – however I am not prepared to install Windows to validate if this is still true.
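Writing one is trivial. A sketch (my own helper names, little-endian writer, and with dwSampleLength taken here to mean the number of sample frames, i.e. samples per channel) might look like:

#include <stdio.h>

static int write_u32_le(FILE *f, unsigned long v)
{
    unsigned char b[4];
    b[0] = (unsigned char)( v        & 0xFFu);
    b[1] = (unsigned char)((v >> 8)  & 0xFFu);
    b[2] = (unsigned char)((v >> 16) & 0xFFu);
    b[3] = (unsigned char)((v >> 24) & 0xFFu);
    return (fwrite(b, 1, 4, f) == 4) ? 0 : -1;
}

/* Emit a minimal twelve-byte fact chunk: id, size (always 4) and
 * dwSampleLength. */
int write_fact_chunk(FILE *f, unsigned long sample_frames)
{
    if (fwrite("fact", 1, 4, f) != 4)
        return -1;
    if (write_u32_le(f, 4))
        return -1;
    return write_u32_le(f, sample_frames);
}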

On the ordering of wave chunks

I saved the worst point for the end…

Almost every website I have seen which attempts to define the wave format states that wave chunks can be in any order but the format chunk must precede the data chunk. One website in particular even stated (although it did not recommend) that the format chunk could come after the data chunk – this is plain wrong (“Multimedia Programming Interface and Data Specifications”, pp. 56). Unfortunately, in relation to all other chunks, the specification appears to be inconsistent. The original specification states, while defining the RIFF format:

… Following the form-type code is a series of subchunks. Which subchunks are present depends on the form type. The definition of a particular RIFF form typically includes the following:

  • A unique four-character code identifying the form type
  • A list of mandatory chunks
  • A list of optional chunks
  • Possibly, a required order for the chunks

Multimedia Programming Interface and Data Specifications, page 12

The presence of the fourth point hints that maybe the ordering of the chunks does not matter. However, on reading the “Extended Notation for Representing RIFF Form Definitions” (page 17 onwards), many examples are given which seem to suggest that mandatory ordering is implied through the grammar. See the sections on <name:type>, [elements], element… and [element]….

The form of a RIFF/WAVE file is defined on page 56 of “Multimedia Programming Interface and Data Specifications” and again on page 12 of “New Multimedia Data Types and Data Techniques” as:

RIFF(
    'WAVE'
    <fmt-ck>             // Format
    [<fact-ck>]          // Fact chunk
    [<cue-ck>]           // Cue points
    [<playlist-ck>]      // Playlist
    [<assoc-data-list>]  // Associated data list
    <wave-data>          // Wave data
)

Given the notation examples, this grammar would suggest that the chunks listed should be supplied in that order. Chunks which are not listed in the grammar, but still defined for the form type (for example, smpl or inst chunks), presumably can be located anywhere.

That being said, if we look at the grammar for the associated data list chunk (pp. 63 of the wave specification):

LIST(
    'adtl'
    <labl-ck>  // Label
    <note-ck>  // Note
    <ltxt-ck>  // Text with data length
    <file-ck>  // Media file
)

It would appear that this list must contain exactly one label, one note, one labelled text and one media file chunk. This is clearly an error in the specification, given the purpose of the chunk. The definition given in “New Multimedia Data Types and Data Techniques” (pp. 14) is even worse – it is completely mangled on the page. There is a stray close-brace floating towards the end of the definition (which hints that there may have been an attempt to fix the issue) but it is still incorrect. All of this tends to suggest that the grammar used to define RIFF/WAVE chunks is unreliable.

In writing an implementation, I would presume that the most correct solution would be to write the chunks in the order supplied by the grammar with any additional chunks being written anywhere. *sigh* If anyone can provide more information from a reliable source to clarify the chunk ordering issue, I would appreciate it.

That’s enough. I’m going to make dinner.