Category Archives: Software

wxWidgets, C++ libraries and C++11

Building wxWidgets on OS X targeting libc++

It seems right to put this at the top of the post for easy access (probably for my own reference).

To get a configuration of wxWidgets (I am using version 3.0.0) which will use the libc++ as the standard library implementation, the following command line works (using Apple LLVM version 5.0 clang-500.2.79):


../configure --disable-shared --enable-unicode --with-cocoa --with-macosx-version-min=10.7 --with-macosx-sdk=/Developer/SDKs/MacOSX10.7.sdk CXXFLAGS="-std=c++0x -stdlib=libc++" CPPFLAGS="-stdlib=libc++" LIBS=-lc++

-std=c++0x (I know, this is deprecated syntax) tells the compiler that we want C++11 features.
-stdlib=libc++ tells the compiler we want to use the libc++ standard library implementation (rather than the libstdc++ implementation.

This will produce a static, unicode build of wxWidgets without debug information. The flags will not work with --with-macosx-version-min set to anything less than 10.7 because -stdlib=libc++ requires this as a minimum.

Why build wxWidgets on OS X targeting libc++

OS X currently ships with two C++ libraries, libstdc++ and libc++. libc++ is reasonably new and completely supports C++11. libstdc++ (on OS X anyway) is very old and only supports a subset of C++03. Unless you specify otherwise, building an application with clang will produce object code which expects to link against libstdc++ targeting the C++98 standard. If you are building C++11 code and only add -std=c++0x to your compiler arguments, your application may fail to compile because the standard library might not have all of the features which you require. In short, if you require C++11 support on OS X, you probably want to migrate over to libc++ for your standard library.

If you build a static library with C++98 targeting libstdc++ and try to link it against an application targeting libc++, you are probably going to get errors looking something like (for wxWidgets anyway):


Undefined symbols for architecture x86_64:
"std::basic_string, std::allocator >::find_last_of(wchar_t const*, unsigned long, unsigned long) const", referenced from:
wxFileName::SplitPath(wxString const&, wxString*, wxString*, wxString*, wxString*, bool*, wxPathFormat) in libwx_baseu-3.0.a(baselib_filename.o)
"std::basic_string, std::allocator >::find_first_of(wchar_t const*, unsigned long, unsigned long) const", referenced from:
wxLocale::GetSystemLanguage() in libwx_baseu-3.0.a(baselib_intl.o)
wxFileName::SplitVolume(wxString const&, wxString*, wxString*, wxPathFormat) in libwx_baseu-3.0.a(baselib_filename.o)
wxRegExImpl::Replace(wxString*, wxString const&, unsigned long) const in libwx_baseu-3.0.a(baselib_regex.o)
wxString::find_first_of(char const*, unsigned long) const in libwx_baseu-3.0.a(baselib_mimecmn.o)
wxString::find_first_of(char const*, unsigned long) const in libwx_osx_cocoau_core-3.0.a(corelib_osx_cocoa_button.o)

... which will continue for several hundred lines.

This is because libstdc++ and libc++ are not fully ABI compatible. When your libc++ application tries to link against a library expecting libstdc++, you are going to have major unresolved symbol issues unless you use a very minimal subset of C++11. Bugger.

Edit: I just found this excellent post Marshall's C++ Musings - Clang and standard libraries on Mac OS X which is very relevant to the topic.

A short story

A few days ago, I had a lengthy reflection on how I used to "write code" when I was a boy (somewhere between 12 and 15 years old); I cannot remember what initialised the reflection, but it seemed interesting enough so I thought I would share it...

When I was a boy, my programs executed slowly. I don't think this is unexpected from programs developed by a kid, but on many occasions, mine were slow deliberately. I had this insane idea that by making my program do something in a complicated and/or convoluted way, it would be "smarter" or "better". It would be more awesome than the version that completed almost instantly.

I'm not sure what brought on this behaviour. Perhaps it was rooted in a desire to boast? i.e. because nobody I grew up with could program at all, I could boast about how complicated the task was which my program was solving and they wouldn't know any better? Another possible reason was related to computer games: I used to play games on my 486 DX and almost every game which I played had some sort of "loading" screen. The loading screen was where I would be anticipating what was to come - very soon, I was going to be placed into the action of the game. Maybe this instilled some idea that anything worth doing or having requires some period of waiting? Deep.

Thankfully, I abandoned this attitude a long time ago. It is a mind-numbingly stupid attitude.

However, I was lead to ponder the question: are there any adult software developers out there who actually write code like this? Maybe it's not the code that ends up being shipped (because it cannot meet performance requirements), but some internal thing that assists in creating the code which will ship? Maybe they think to themselves: "Ah-ha! I will make this code slow and - because they don't understand it - they will think I am a genius! Nobody will ever know!"? It all goes well for them until somebody proves their solution is so utterly inferior that they could not possibly have created it by accident... even the thought that these people could be employed somewhere makes me cringe.

Linking C Static Libraries With Duplicate Symbols

I came across some interesting linker behaviour today. I was vehemently stating to a colleague that: if I have two static libraries which both contain a symbol "foo" and I try to link those libraries into an executable, I will get a symbol clash and the link should fail. Interestingly, in the test program I wrote this did not happen. I read through "man ld" and it seemed to me like the link should fail so I set about figuring out why my test program linked. I am using GCC 4.6.1 running on Ubuntu 11.10 x64 for all of these results.

Follows are 5 small source files:

/* foo1.c */
int foo(int x)
{
return x;
}
 
/* foo2a.c */
int foo(int x)
{
return x + 1;
}
 
/* foo2b.c */
int foo(int x)
{
return x + 1;
}
int bar(int x)
{
return x + 10;
}
 
/* test2a.c */
#include <stdio.h>
#include <stdlib.h>
extern int foo(int x);
int main(int argc, char *argv[])
{
int x = foo(5);
printf("%d\n", x);
exit(0);
}
 
/* test2b.c */
#include <stdio.h>
#include <stdlib.h>
extern int foo(int x);
extern int bar(int x);
int main(int argc, char *argv[])
{
int x = foo(bar(5));
printf("%d\n", x);
exit(0);
}

foo1.c, foo2a.c and foo2b.c should be archived as follows:

gcc -c foo1.c -o foo1.o
ar rcs libfoo1.a foo1.o
gcc -c foo2a.c -o foo2a.o
ar rcs libfoo2a.a foo2a.o
gcc -c foo2b.c -o foo2b.o
ar rcs libfoo2b.a foo2b.o

This creates three libraries:

  • libfoo1.a - contains an implementation of the function foo() which returns the argument.
  • libfoo2a.a - contains an implementation of the function foo() which returns the argument plus one.
  • libfoo2b.a - contains an implementation of the function foo() which returns the argument plus one as well as a function bar() which returns the argument plus ten.

All of the libraries contain the symbol "foo" so I would expect the linker to fail in any case where I link more than one of these libraries.

The first test program calls foo(5) and prints the return value. For t1, the executable is linked first with libfoo1 then libfoo2a. For t2, libfoo2a then libfoo1.

gcc -c foo1.c -o foo1.o
nappleton@nickvm:~/Desktop$ gcc -c testa.c -o testa.o
nappleton@nickvm:~/Desktop$ gcc -o t1 testa.o -L. -lfoo1 -lfoo2a && ./t1
5
nappleton@nickvm:~/Desktop$ gcc -o t2 testa.o -L. -lfoo2a -lfoo1 && ./t2
6

This is exactly the test setup which I ran for my colleague. It shows that the program does in-fact link and that the ordering of the libraries matters. The first library specified on the command line is the one with the foo() implementation which will be used. The second one appears to be ignored.

The second test case is more interesting. The program calls foo(bar(5)) and prints the value. For t3, the executable is linked first with libfoo1 then libfoo2b. For t4, libfoo2b is linked first then libfoo1.

nappleton@nickvm:~/Desktop$ gcc -c testb.c -o testb.o
nappleton@nickvm:~/Desktop$ gcc -o t3 testb.o -L. -lfoo1 -lfoo2b && ./t3
./libfoo2b.a(foo2b.o): In function `foo':
foo2b.c:(.text+0x0): multiple definition of `foo'
./libfoo1.a(foo1.o):foo1.c:(.text+0x0): first defined here
collect2: ld returned 1 exit status
nappleton@nickvm:~/Desktop$ gcc -o t4 testb.o -L. -lfoo2b -lfoo1 && ./t4
16

Based on these results, I am guessing that the linker stops searching libraries once all unresolved symbols have been found. This behaviour would explain why t1, t2 and t4 build successfully without multiple definition errors. t3 fails to build because after linking against libfoo1, foo is found but bar is still unresolved; foo2b is then searched, foo is found again and the linker explodes.

I'm not sure if the behaviour is necessarily bad. It seems reasonable for a linker to stop searching once all symbols have been found. However, it would be nice to have an option to be informed when I am doing something which is likely to be stupid. A warning "libraries x, y and z were not searched because all symbols were already resolved" might be nice.

An interesting note: on OS X, if the -all_load flag is passed to the linker, all of these programs fail to build as the linker tries to add all symbols from all libraries even if all of the unresolved symbols have been found.

Why you should not use C99 exact-width integer types

Is there really ever a time where you need an integer type containing exactly N-bits? There are C99 types which guarantee at least N-bits. There are even C90 types which guarantee at least 8, 16 and 32 bits (the standard C integer types). Why not use one of those?

I never use C99 exact-width types in code... ever. Chances are that you shouldn't either because:

Exact width integer types reduce portability

This is because:

1) Exact width integer types do not exist before C99

Sure you could create an abstraction that detects if the standard is less than C99 and introduce the types, but then you would be overriding the POSIX namespace by defining your own integer types suffixed with "_t". POSIX.1-2008 - The System Interfaces: 2.2.2 The Name Space

GCC also will not like you:

The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names.
GNU libc manual: 1.3.3 Reserved Names

From my own experience using GCC on OS X, the fixed width types are defined even when using --std=c90, meaning you'll just get errors if you try to redefine them. Bummer.

2) Exact width integer types are not guaranteed to exist at all:

These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32 or 64 bits, it shall define the corresponding typedef names.
ISO/IEC 9899:1999 - 7.18.1.1 Exact-width integer types

Even in C99, the (u)intN_t type does not need to exist unless there is a native integer type of that width. You may argue and say that there are not many platforms which do not have these types - there are: DSPs. If you start using these types, you limit the platforms on which your software can run - and are also probably developing bad habits.

Using exact width integer types could have a negative performance impact

If you need at least N-bits and it does not matter if there are more, why restrict yourself to a type which which could require additional overhead? If you are writing C99 code, use one of the (u)int_fastN_t types. Maybe, you could even use a standard C integer type!

The endianness of exact width integer types is unspecified

I am not not implying that the endianness is specified for other C types. I am just trying to make a point: you cannot even use these types for portable serialisation/de-serialisation without feral-octet-swapping-macro-garbage as the underlying layout of the type is system dependent.

If you are interested in the conditions for when memcpy can be used to copy memory into a particular type, maybe you should check out the abstraction which is part of my digest program. It contains a heap of checks to ensure that memcpy is only used on systems when it is known that it will do the right thing. It tries to deal with potential padding, non 8-bit chars and endianness in a clean way that isn't broken.

This article deliberately did not discuss the signed variants of these types...