Using later GCC with 2.4 kernels

Author: Russell King - ARM Linux
Date:  
To: linux-arm-kernel
Subject: Using later GCC with 2.4 kernels
This is a note concerning the use of later versions of GCC with 2.4 kernels.
The comments here apply to any kernel version compiled with a version of
GCC later than the one used by the x86 community to build it.

For those who want to read an executive summary:

:: If you build a kernel with a later version of gcc than the rest of
:: the community, you could find that you have really obscure problems
:: that take a long time to track down and solve, and you may be on
:: your own.


Ok, on to the real thing. Here are several points to note:

1. Both GCC and the Linux kernel contain a lot of code, some of which
dates back around a decade (although the amount that is that old
is gradually getting smaller.)

2. Bugs get fixed and new features get added to both gcc and the kernel.
In gcc's case, "new features" can mean new ways of optimising and
arranging the code.

3. There are such things as "recommended compilers" for Linux kernels.
These are versions which have been used throughout the stabilising
period of the kernel and have proven themselves.

4. The kernel will work around known bugs in the "recommended compilers".

5. Annoyingly for some, there is no published list of "recommended
compilers".

I'd like to bring to people's attention the one major risk they run when
building a kernel with a later compiler than the "recommended compiler."
This is only a risk though; whether it is acceptable or not is something
that you have to decide for yourself.

Take a well tested, stable kernel source tree. Build it with the
recommended compiler. Chances are it'll be stable.

Now take the same kernel source tree and build it with your brand new
GCC 3.2.1. It might appear stable on the surface, but is it really
stable?

To highlight this point, let's take Deepak's GCC 3.2.1 problem.
The kernel contains the following code:

        add_wait_queue(&wq, &__wait);
        for (;;) {
                do { current->state = TASK_UNINTERRUPTIBLE; } while (0);
                if (condition)
                        break;
                schedule();
        }
        current->state = TASK_RUNNING;
        remove_wait_queue(&wq, &__wait);


The above is part of the code that is used throughout the kernel to
put a thread to sleep, and wait for a "condition" to become true.
"condition" could be a flag indicating that a particular NFS packet
has been received, you've touched the touchscreen, a character has
been received on the serial port, or whatever.

Let's take a moment to think about how the above code works, and what
happens when a thread gets interrupted to service a device that
will ultimately satisfy the condition. The following are for a
kernel built with gcc 3.2.0 or earlier:

    Thread                    Interrupt
    ---------------------------------------------------------------------
    add_wait_queue()
                        wake_up
                        current->state = TASK_RUNNING
                        condition = TRUE
    current->state = TASK_UNINTERRUPTIBLE
    if (condition) break
    /* condition is true */
    current->state = TASK_RUNNING
    remove_wait_queue()


=> ok.

    Thread                    Interrupt
    ---------------------------------------------------------------------
    add_wait_queue()
    current->state = TASK_UNINTERRUPTIBLE
                        wake_up
                        current->state = TASK_RUNNING
                        condition = TRUE
    if (condition) break
    /* condition is true */
    current->state = TASK_RUNNING
    remove_wait_queue()


=> ok.

    Thread                    Interrupt
    ---------------------------------------------------------------------
    add_wait_queue()
    current->state = TASK_UNINTERRUPTIBLE
    if (condition) break
    /* condition is false */
                        wake_up
                        current->state = TASK_RUNNING
                        condition = TRUE
    schedule()
    /* schedule returns because current->state = TASK_RUNNING */
    current->state = TASK_RUNNING
    remove_wait_queue()


=> ok.

    Thread                    Interrupt
    ---------------------------------------------------------------------
    add_wait_queue()
    current->state = TASK_UNINTERRUPTIBLE
    if (condition) break
    /* condition is false */
    schedule()
                        wake_up
                        current->state = TASK_RUNNING
                        condition = TRUE
    /* schedule returns only after current->state = TASK_RUNNING */
    current->state = TASK_RUNNING
    remove_wait_queue()


=> ok.

Now let's see what happens when a new compiler (e.g., gcc 3.2.1) is used.
gcc 3.2.1 decides to make an optimisation, and swaps the order of the
load of condition from memory and the store of current->state:

    Thread                    Interrupt
    ---------------------------------------------------------------------
    add_wait_queue()
    register = condition
                        wake_up
                        current->state = TASK_RUNNING
                        condition = TRUE
    current->state = TASK_UNINTERRUPTIBLE
    if (register) break
    /* condition is false */
    schedule()
    /*
     * schedule returns only after current->state = TASK_RUNNING
     * but it's set to TASK_UNINTERRUPTIBLE
     */
    current->state = TASK_RUNNING
    remove_wait_queue()


Oh dear. We're now waiting for an event that may never ever come.

The thing to realise is that this is a _perfectly legal optimisation_ for
the compiler to make, but that optimisation has broken some well-tested
code. This same C code is, after all, being used by everyone running a
2.4 kernel, no matter what the architecture. But because you've used a
compiler that performs a new optimisation, it has broken.

In Deepak's case, it is exactly this that caused the problem:

        bl      add_wait_queue
.L444:                    @ for (;;) {
        ldrb    r3, [r4, #127]        @   read "condition" into register
        bic     r5, sp, #8128
        tst     r3, #1
        bic     r5, r5, #63
        mov     r3, #2            @   TASK_UNINTERRUPTIBLE
        str     r3, [r5, #0]        @   set current->state
        mov     sl, #0
        beq     .L445            @   if (register) break;
        bl      schedule        @   schedule();
        b       .L444            @ }
.L445:
        ldr     r7, .L483
        mov     r0, r6
        mov     r1, r8
        str     sl, [r5, #0]        @ current->state = TASK_RUNNING
        bl      remove_wait_queue


I hope that people see that building large sections of code with new
compiler versions is not something that should be taken lightly. It
should be done in full knowledge that these types of problems may occur,
which will require you to get stuck in and understand the code in any
part of the kernel, or provide someone else with detailed information,
maybe even the actual code.

If you're not fully happy with that, then please take note of the risk
you're taking, and remember that there are always earlier compilers that
might be more suitable for the task you're performing.


The compiler versions _I_ trust for 2.5 are:

    gcc 2.95.3-3_nw1    (NetWinder gcc)
    gcc 3.2            (x86 -> ARM)
    Red Hat rawhide gcc 3.2-4 built for ARM


For 2.4:

    egcs-2.91.66
    gcc 2.95.1
    gcc 2.95.3-3_nw1


How do you get a trustworthy compiler? By building _lots_ and _lots_
of code with it, finding the reason for problems with the resulting
code, and listening to other people's experiences with the same compiler
version building the same code.

One last point. I'm not saying "You must use compiler version X". If
that happened, we'd never move forward. What I'm trying to do here is
open people's eyes to the fact that "the latest and greatest compiler"
is probably not the best thing to compile a kernel with _unless_ you
have very good reasons to be using that version of the compiler.