Fixing GCC's code generation for the MaverickCrunch FPU

Martin Guy <martinwguy@gmail.com> 8 September 2009
Last updated: 25 July 2013

[Image: a blood-and-guts picture found on the "Gears of War" site during a web search for "Maverick 9312"]

Preamble

I've been working on GCC-4 to make it generate working code for the Cirrus Logic MaverickCrunch FPU, as found in their ARM-based EP9302, EP9307, EP9312 and EP9315 chips, making floating-point-intensive code between 2.5 and 4 times faster.

This builds on Hasjim Williams' earlier work with gcc-4.1.2 and 4.2.0, combining a bundle of his more recent ideas with further hacks of my own.

If you want to understand the patches themselves, there is an article about the MaverickCrunch FPU and GCC's problems with it on the Debian wiki, and I have added commentary at the top of each patch file for gcc-4.3.4 and for gcc-4.2.4.

Discussion about this (and other issues with these chips) happens on the linux-cirrus mailing list.

What it does

The 20090908 version

Correctness tests

The compiler passes the following floating-point-intensive test suites:

Speed tests

To compare the execution speed of floating-point-intensive tasks:

The results, on a 200MHz Cirrus Logic EP9307 revision E1 under Debian "armel":

Compiler/options | FFTW (mflops) | LAME (seconds) | libgsm(*) (seconds)
gcc-4.3-crunch -mcpu=ep9312 -O2 -ffast-math (softfloat) | 3.59365 | | 23.5
gcc-4.3-crunch -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -O2 -ffast-math -mieee | 3.83276 | | -
gcc-4.3-crunch -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -O2 -ffast-math | 5.94145 | | 5.72
gcc-4.2-crunch -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -O2 -ffast-math | 6.13138 | | 5.72

In other words, with the full Maverick instruction set LAME runs 2.5 times faster than with softfloat; with just the -mieee subset it runs 25% faster than softfloat, or about half the speed of the full set. Also, gcc-4.2 produces significantly faster code than gcc-4.3.

(*) Although the Crunch build of libgsm is 4 times faster than the softfloat one, libgsm also has a fixed-point encoder, selected with MULHACK='', which is faster still (the same is true of the speex encoder).

Download

The installable binary tarballs, packages and patches are hosted by SimpleMachines.it.

There are also repositories of prebuilt crunch-accelerated Debian packages; see simplemachines.it/debian.

Using it

To get MaverickCrunch instructions, you must add these three options:
  gcc-4.3-crunch -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp
Other relevant options are:
-fno-signed-zeros
(gcc-4.3 only) If your program does not care about the difference between 0 and -0, you can use this flag to enable the Maverick 'negate' instructions for a little extra speed.
-ffinite-math-only
This tells the compiler that NaNs and infinities do not need to be handled; this allows further speed and optimization.
-funsafe-math-optimizations
This enables even more optimizations that may give results that do not conform to the strict IEEE-754 math standard. Among other things, it enables -fno-signed-zeros, and in GCC-4.2 it is the least invasive way to enable the Crunch negate instructions.
-ffast-math
This is the most aggressive math optimization flag, enabling all of the above and more.
-mieee
Most of Crunch's instructions treat denormalized values as zero; this flag enables only the ones that work at full IEEE precision (just multiply and compare).
-mcirrus-di
The FPU also has 64-bit integer instructions, but they appear to be buggy. This flag enables them (load, store, add, subtract, conversion to and from 32-bit, and logical shifts by up to 31 places).
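To see what a particular combination of these options actually produces, you can compile a small floating-point function with -S and count the Crunch instructions in the assembly. This is a minimal sketch, assuming GNU as syntax, in which every MaverickCrunch mnemonic starts with "cf" (cfmuls, cfcmps, cfldrs, ...) and instructions are indented while labels are not; count_crunch_insns is a hypothetical helper name.

```shell
#!/bin/sh
# count_crunch_insns FILE.s: print how many MaverickCrunch instructions
# (indented mnemonics beginning with "cf") appear in an assembly file.
# Hypothetical helper; assumes GNU as syntax. Directives such as "\t.size"
# start with "." and so are not counted.
count_crunch_insns() {
    grep -cE '^[[:space:]]+cf[a-z0-9]+([[:space:]]|$)' "$1"
}

# Example (needs the crunch compiler):
#   gcc-4.3-crunch -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -O2 -S test.c
#   count_crunch_insns test.s
```

A count of zero means the code is still going through the softfloat library calls, which usually means one of the three required options is missing.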
When running configure scripts, I normally use:
  ./configure CC=gcc-4.3-crunch CFLAGS="-mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -ffast-math -O2"
However, it's usually less trouble to make a directory of wrapper scripts that replace all of GCC's command names with the crunch version:
mkdir ~/crunch
# Quote 'EOF' so that "$@" is written into the script literally instead of being expanded now
cat > ~/crunch/gcc << 'EOF'
#! /bin/sh

exec gcc-4.3-crunch -mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -fno-signed-zeros "$@"
EOF
chmod 755 ~/crunch/gcc
ln -s gcc ~/crunch/cc
ln -s gcc ~/crunch/gcc-4.3
ln -s gcc ~/crunch/arm-linux-gnueabi-gcc
and fool the build system into using them:
PATH=~/crunch:$PATH ./configure
PATH=~/crunch:$PATH make
PATH=~/crunch:$PATH make install
or, to build accelerated Debian packages:
apt-get source foobar
sudo apt-get build-dep foobar
cd foobar-*
PATH=~/crunch:$PATH dpkg-buildpackage -rfakeroot -B
cd ..
sudo dpkg -i foobar*.deb
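The wrapper recipe above can be packaged as two shell functions: make_crunch_wrappers writes the gcc wrapper and its symlinks, and crunch_run runs a single command with the wrapper directory at the front of PATH. This is a minimal sketch; the function names and the CRUNCH_DIR, CRUNCH_CC and CRUNCH_FLAGS variables are hypothetical, chosen so that their defaults reproduce the hand-made ~/crunch directory.

```shell
#!/bin/sh
# Hypothetical helpers; CRUNCH_DIR, CRUNCH_CC and CRUNCH_FLAGS are
# illustrative names whose defaults match the hand-written ~/crunch recipe.

make_crunch_wrappers() {
    crunch_dir=${CRUNCH_DIR:-$HOME/crunch}
    crunch_cc=${CRUNCH_CC:-gcc-4.3-crunch}
    crunch_flags=${CRUNCH_FLAGS:-"-mcpu=ep9312 -mfpu=maverick -mfloat-abi=softfp -fno-signed-zeros"}

    mkdir -p "$crunch_dir" || return 1

    # Unquoted EOF: the compiler name and flags are substituted now, while
    # the escaped \$@ leaves a literal "$@" in the wrapper so that every
    # argument is passed through intact at run time.
    cat > "$crunch_dir/gcc" << EOF || return 1
#! /bin/sh
exec $crunch_cc $crunch_flags "\$@"
EOF
    chmod 755 "$crunch_dir/gcc" || return 1

    # Point every name a build system might look for at the same wrapper.
    for name in cc gcc-4.3 arm-linux-gnueabi-gcc; do
        ln -sf gcc "$crunch_dir/$name" || return 1
    done
}

# Run one command with the wrapper directory first in PATH. The function
# body is a subshell, so the caller's PATH is left unchanged.
crunch_run() (
    PATH="${CRUNCH_DIR:-$HOME/crunch}:$PATH"
    export PATH
    exec "$@"
)
```

For example, run make_crunch_wrappers once, then crunch_run ./configure && crunch_run make, or crunch_run dpkg-buildpackage -rfakeroot -B inside an unpacked Debian source tree.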

Building it from source

Resource requirements

GCC keeps on growing. One of gcc-4.3's C source files, insn-recog.c, which is generated automatically during the build, is now over 4MB in size, and gcc-4.3 needs 219MB of virtual memory to compile it with normal optimization.

Memory: If you have less than 160MB of physical RAM plus 64MB of swap, you will need to stop the build, restart it with make CFLAGS=-g so that that one file is compiled without optimization, interrupt it again once that file is done, and then carry on as usual.
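A quick way to tell which case applies is to read /proc/meminfo. This sketch assumes a Linux host and treats the two figures from the text as separate minimums (160MB of RAM and 64MB of swap); enough_memory is a hypothetical helper name.

```shell
#!/bin/sh
# enough_memory [MEMINFO]: succeed if the machine meets the 160MB RAM +
# 64MB swap guideline for compiling insn-recog.c with optimization.
# Hypothetical helper; treating the two figures as separate minimums is an
# interpretation. MEMINFO defaults to /proc/meminfo (Linux only).
enough_memory() {
    awk '
        /^MemTotal:/  { ram_kb  = $2 }   # values are reported in kB
        /^SwapTotal:/ { swap_kb = $2 }
        END {
            if (ram_kb >= 160 * 1024 && swap_kb >= 64 * 1024) exit 0
            exit 1
        }
    ' "${1:-/proc/meminfo}"
}
```

If enough_memory fails, take the make CFLAGS=-g detour for that one file.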

Disk space: The full sources unpack to 500MB (360MB for gcc-4.2), and a further 200MB (140MB for gcc-4.2) is needed to build the C compiler. If you have less space, you can fetch a "gcc-core" source tarball instead, which contains only the C compiler and unpacks to about 200MB, for a total of 400MB when built.
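Adding those figures gives the free space to look for before starting: 500 + 200 = 700MB for gcc-4.3, 360 + 140 = 500MB for gcc-4.2, and about 400MB with the gcc-core tarball. The sketch below checks this against POSIX df output; required_mb and enough_disk are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers; the figures are the ones quoted above
# (unpacked sources + C-only build tree, in MB).
required_mb() {
    case $1 in
        gcc-4.3)  echo $((500 + 200)) ;;
        gcc-4.2)  echo $((360 + 140)) ;;
        gcc-core) echo $((200 + 200)) ;;
        *)        return 1 ;;
    esac
}

# enough_disk VARIANT [DIR]: succeed if DIR (default: .) has room for VARIANT.
enough_disk() {
    need_mb=$(required_mb "$1") || return 1
    # With -P -k, df prints one line per filesystem and the fourth field of
    # the data line is the available space in 1024-byte blocks.
    free_kb=$(df -P -k "${2:-.}" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge $((need_mb * 1024)) ]
}
```

For example, enough_disk gcc-4.3 || enough_disk gcc-core tells you whether the full source tarball fits or whether you should fall back to gcc-core.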

Build procedure

I go:
    wget ftp://sourceware.org/pub/gcc/releases/gcc-4.3.4/gcc-4.3.4.tar.bz2
    tar xjf gcc-4.3.4.tar.bz2
    cd gcc-4.3.4

    for a in `cat ../gcc-4.3.4-patches/series`
    do
	patch -p1 < ../gcc-4.3.4-patches/$a
    done

    cd ..
    mkdir gcc-4.3.4-build
    cd gcc-4.3.4-build

    # The same basic configuration as Debian
    ../gcc-4.3.4/configure --disable-nls --enable-shared --with-system-zlib \
        --without-included-gettext --enable-threads=posix \
        --enable-clocale=gnu --enable-mpfr --disable-libssp \
        --disable-bootstrap --enable-languages=c --with-arch=armv4t \
        --program-suffix=-4.3-crunch armv4tl-crunch-linux-gnueabi
    make
    ../s/install
    ../s/tarball gcc-4.3-crunch
The tarball script dumps a .tar.gz of the essential installed files and another of the source patchset in the ../packages directory.
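The patch loop in the build procedure carries on even if a patch fails to apply. A slightly more careful variant stops at the first failure and skips blank lines and comments in the series file. This is a sketch: apply_series is a hypothetical name, and, like the loop above, it assumes every entry is a -p1 patch listed one per line, applied from the top of the source tree.

```shell
#!/bin/sh
# apply_series PATCHDIR: apply PATCHDIR/series in order from the current
# directory, stopping at the first patch that does not apply.
# Hypothetical helper; extra words after a patch name are ignored.
apply_series() {
    patchdir=$1
    # "|| [ -n ... ]" also handles a last line with no trailing newline.
    while read -r name rest || [ -n "$name" ]; do
        case $name in
            ''|'#'*) continue ;;   # blank line or comment
        esac
        echo "Applying $name"
        patch -p1 < "$patchdir/$name" || {
            echo "FAILED: $name" >&2
            return 1
        }
    done < "$patchdir/series"
}
```

Used from inside the unpacked tree, apply_series ../gcc-4.3.4-patches replaces the for loop.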

There is also a test directory here with some program fragments that I used to probe for the presence and characteristics of hardware bugs.

Patches for other packages

The patches for GCC work fine for all C software that I've tried. Some other software packages are known to need Crunch tweaks as well:

binutils

glibc

gdb

There are fixes to glibc and binutils to solve these and other issues in a message to the linux-cirrus mailing list, though I haven't verified whether this solves the sin() looping problem or not.

Thanks

Thanks to Hasjim Williams for the work that this is based on, to SimpleMachines for funding the initial work on these patches and for hosting the tarballs, and to Arenque Software for encouraging me to complete them.

If you find this useful, please make a donation.

