On i386 and gcc-4.7 I found that __bound_local_new was miscompiled -
look:
#ifdef __i386__
/* return the frame pointer of the caller */
#define GET_CALLER_FP(fp)\
{\
unsigned long *fp1;\
__asm__ __volatile__ ("movl %%ebp,%0" :"=g" (fp1));\
fp = fp1[0];\
}
#endif
/* called when entering a function to add all the local regions */
void FASTCALL __bound_local_new(void *p1)
{
unsigned long addr, size, fp, *p = p1;
GET_CALLER_FP(fp);
for(;;) {
addr = p[0];
if (addr == 0)
break;
addr += fp;
size = p[1];
p += 2;
__bound_new_region((void *)addr, size);
}
}
__bound_local_new:
.LFB40:
.cfi_startproc
pushl %esi
.cfi_def_cfa_offset 8
.cfi_offset 6, -8
pushl %ebx
.cfi_def_cfa_offset 12
.cfi_offset 3, -12
subl $8, %esp // NOTE prologue does not touch %ebp
.cfi_def_cfa_offset 20
#APP
# 235 "lib/bcheck.c" 1
movl %ebp,%edx // %ebp -> fp1
# 0 "" 2
#NO_APP
movl (%edx), %esi // fp1[0] -> fp
movl (%eax), %edx
movl %eax, %ebx
testl %edx, %edx
je .L167
.p2align 2,,3
.L173:
movl 4(%ebx), %eax
addl $8, %ebx
movl %eax, 4(%esp)
addl %esi, %edx
movl %edx, (%esp)
call __bound_new_region
movl (%ebx), %edx
testl %edx, %edx
jne .L173
.L167:
addl $8, %esp
.cfi_def_cfa_offset 12
popl %ebx
.cfi_restore 3
.cfi_def_cfa_offset 8
popl %esi
.cfi_restore 6
.cfi_def_cfa_offset 4
ret
here GET_CALLER_FP() assumed that its using function setups it's stack
frame, i.e. first save, then set %ebp to stack frame start, and then it
has to do perform two lookups: 1) to get current stack frame through
%ebp, and 2) get caller stack frame through (%ebp).
And here is the problem: gcc decided not to setup %ebp for
__bound_local_new and in such case GET_CALLER_FP actually becomes
GET_CALLER_CALLER_FP and oops, wrong regions are registered in bcheck
tables...
The solution is to stop using hand written assembly and rely on gcc's
__builtin_frame_address(1) to get callers frame stack(*). I think for the
builtin gcc should generate correct code, independent of whether it
decides or not to omit frame pointer in using function - it knows it.
(*) judging by gcc history, __builtin_frame_address was there almost
from the beginning - at least it is present in 1992 as seen from the
following commit:
http://gcc.gnu.org/git/?p=gcc.git;a=commit;h=be07f7bdbac76d87d3006c89855491504d5d6202
so we can rely on it being supported by all versions of gcc.
In my environment the assembly of __bound_local_new changes as follows:
diff --git a/bcheck0.s b/bcheck1.s
index 4c02a5f..ef68918 100644
--- a/bcheck0.s
+++ b/bcheck1.s
@@ -1409,20 +1409,17 @@ __bound_init:
__bound_local_new:
.LFB40:
.cfi_startproc
- pushl %esi
+ pushl %ebp // NOTE prologue saves %ebp ...
.cfi_def_cfa_offset 8
- .cfi_offset 6, -8
+ .cfi_offset 5, -8
+ movl %esp, %ebp // ... and reset it to local stack frame
+ .cfi_def_cfa_register 5
+ pushl %esi
pushl %ebx
- .cfi_def_cfa_offset 12
- .cfi_offset 3, -12
subl $8, %esp
- .cfi_def_cfa_offset 20
-#APP
-# 235 "lib/bcheck.c" 1
- movl %ebp,%edx
-# 0 "" 2
-#NO_APP
- movl (%edx), %esi
+ .cfi_offset 6, -12
+ .cfi_offset 3, -16
+ movl 0(%ebp), %esi // stkframe -> stkframe.parent -> fp
movl (%eax), %edx
movl %eax, %ebx
testl %edx, %edx
@@ -1440,13 +1437,13 @@ __bound_local_new:
jne .L173
.L167:
addl $8, %esp
- .cfi_def_cfa_offset 12
popl %ebx
.cfi_restore 3
- .cfi_def_cfa_offset 8
popl %esi
.cfi_restore 6
- .cfi_def_cfa_offset 4
+ popl %ebp
+ .cfi_restore 5
+ .cfi_def_cfa 4, 4
ret
.cfi_endproc
i.e. now it compiles correctly.
Though I do not have x86_64 to test, my guess is that
__builtin_frame_address(1) should work there too. If not - please revert
only x86_64 part of the patch. Thanks.
Cc: Michael Matz <matz@suse.de>
On i386 and gcc-4.7 I found that libc_malloc was miscompiled - look:
static void *libc_malloc(size_t size)
{
void *ptr;
restore_malloc_hooks(); // __malloc_hook = saved_malloc_hook
ptr = malloc(size);
install_malloc_hooks(); // saved_malloc_hook = __malloc_hook, __malloc_hook = __bound_malloc
return ptr;
}
.type libc_malloc, @function
libc_malloc:
.LFB56:
.cfi_startproc
pushl %edx
.cfi_def_cfa_offset 8
movl %eax, (%esp)
call malloc
movl $__bound_malloc, __malloc_hook
movl $__bound_free, __free_hook
movl $__bound_realloc, __realloc_hook
movl $__bound_memalign, __memalign_hook
popl %ecx
.cfi_def_cfa_offset 4
ret
Here gcc inlined both restore_malloc_hooks() and install_malloc_hooks()
and decided that
saved_malloc_hook -> __malloc_hook -> saved_malloc_hook
stores are not needed and could be ommitted. Only it did not know
__molloc_hook affects malloc()...
So add compiler barrier to both install and restore hooks functions and
be done with it - the code is now ok:
diff --git a/bcheck0.s b/bcheck1.s
index 5f50293..4c02a5f 100644
--- a/bcheck0.s
+++ b/bcheck1.s
@@ -42,8 +42,24 @@ libc_malloc:
.cfi_startproc
pushl %edx
.cfi_def_cfa_offset 8
+ movl saved_malloc_hook, %edx
+ movl %edx, __malloc_hook
+ movl saved_free_hook, %edx
+ movl %edx, __free_hook
+ movl saved_realloc_hook, %edx
+ movl %edx, __realloc_hook
+ movl saved_memalign_hook, %edx
+ movl %edx, __memalign_hook
movl %eax, (%esp)
call malloc
+ movl __malloc_hook, %edx
+ movl %edx, saved_malloc_hook
+ movl __free_hook, %edx
+ movl %edx, saved_free_hook
+ movl __realloc_hook, %edx
+ movl %edx, saved_realloc_hook
+ movl __memalign_hook, %edx
+ movl %edx, saved_memalign_hook
movl $__bound_malloc, __malloc_hook
movl $__bound_free, __free_hook
movl $__bound_realloc, __realloc_hook
For barrier I use
__asm__ __volatile__ ("": : : "memory")
which is used as compiler barrier by Linux kernel, and mentioned in gcc
docs and in wikipedia [1].
Without this patch any program compiled with tcc -b crashes in startup
because of infinite recursion in libc_malloc.
[1] http://en.wikipedia.org/wiki/Memory_ordering#Compiler_memory_barrier
Other tests still have issues, currently with weak linking.
One of the primary stumbling blocks on OSX is the lack of support for
mach-o binaries. Therefore all tcc usage on OSX has to be limited to elf
binaries, presumably produced by tcc itself.
Therefore I had to enable building of tiny_libmaker for OSX. Then changed
the make to use tcc and tiny_libmaker to compile the tcclib1.
In order to compile the tests, specifically the parts that use weak linking,
I have had to define MACOSX_DEPLOYMENT_TARGET to 10.2, which seems like a
hack, but extensive searching seems to indicate that this is the only way
to make apple gcc allow weak linking. Using any other value, bigger or smaller
breaks weak linking.
Also added _ANSI_SOURCE define required by some OSX headers, and some cosmetic
gitignore changes. I believe these changes should not impact other platforms.
to the appropriate CFLAGS. In addition, memory hooks are very different
on OSX, so build of bcheck.c had to be disabled for now.
Change of the CFLAGS does affect builds on other platforms, and this needs
to be tested.
- add __builtin_va_arg_types to check how arguments were passed
- move most code of stdarg into libtcc1.c
- remove __builtin_malloc and __builtin_free
- add a test case based on the bug report
(http://www.mail-archive.com/tinycc-devel@nongnu.org/msg03036.html)
(Because GNU's alloca.h unconditionally #undef's alloca)
Also, remove gcc specific sections in headers. and
instead change tests such that gcc does not use them.