On i386 and gcc-4.7 I found that __bound_local_new was miscompiled -
look:
#ifdef __i386__
/* return the frame pointer of the caller */
#define GET_CALLER_FP(fp)\
{\
unsigned long *fp1;\
__asm__ __volatile__ ("movl %%ebp,%0" :"=g" (fp1));\
fp = fp1[0];\
}
#endif
/* called when entering a function to add all the local regions */
void FASTCALL __bound_local_new(void *p1)
{
unsigned long addr, size, fp, *p = p1;
GET_CALLER_FP(fp);
for(;;) {
addr = p[0];
if (addr == 0)
break;
addr += fp;
size = p[1];
p += 2;
__bound_new_region((void *)addr, size);
}
}
__bound_local_new:
.LFB40:
.cfi_startproc
pushl %esi
.cfi_def_cfa_offset 8
.cfi_offset 6, -8
pushl %ebx
.cfi_def_cfa_offset 12
.cfi_offset 3, -12
subl $8, %esp // NOTE prologue does not touch %ebp
.cfi_def_cfa_offset 20
#APP
# 235 "lib/bcheck.c" 1
movl %ebp,%edx // %ebp -> fp1
# 0 "" 2
#NO_APP
movl (%edx), %esi // fp1[0] -> fp
movl (%eax), %edx
movl %eax, %ebx
testl %edx, %edx
je .L167
.p2align 2,,3
.L173:
movl 4(%ebx), %eax
addl $8, %ebx
movl %eax, 4(%esp)
addl %esi, %edx
movl %edx, (%esp)
call __bound_new_region
movl (%ebx), %edx
testl %edx, %edx
jne .L173
.L167:
addl $8, %esp
.cfi_def_cfa_offset 12
popl %ebx
.cfi_restore 3
.cfi_def_cfa_offset 8
popl %esi
.cfi_restore 6
.cfi_def_cfa_offset 4
ret
here GET_CALLER_FP() assumed that its using function setups it's stack
frame, i.e. first save, then set %ebp to stack frame start, and then it
has to do perform two lookups: 1) to get current stack frame through
%ebp, and 2) get caller stack frame through (%ebp).
And here is the problem: gcc decided not to setup %ebp for
__bound_local_new and in such case GET_CALLER_FP actually becomes
GET_CALLER_CALLER_FP and oops, wrong regions are registered in bcheck
tables...
The solution is to stop using hand written assembly and rely on gcc's
__builtin_frame_address(1) to get callers frame stack(*). I think for the
builtin gcc should generate correct code, independent of whether it
decides or not to omit frame pointer in using function - it knows it.
(*) judging by gcc history, __builtin_frame_address was there almost
from the beginning - at least it is present in 1992 as seen from the
following commit:
http://gcc.gnu.org/git/?p=gcc.git;a=commit;h=be07f7bdbac76d87d3006c89855491504d5d6202
so we can rely on it being supported by all versions of gcc.
In my environment the assembly of __bound_local_new changes as follows:
diff --git a/bcheck0.s b/bcheck1.s
index 4c02a5f..ef68918 100644
--- a/bcheck0.s
+++ b/bcheck1.s
@@ -1409,20 +1409,17 @@ __bound_init:
__bound_local_new:
.LFB40:
.cfi_startproc
- pushl %esi
+ pushl %ebp // NOTE prologue saves %ebp ...
.cfi_def_cfa_offset 8
- .cfi_offset 6, -8
+ .cfi_offset 5, -8
+ movl %esp, %ebp // ... and reset it to local stack frame
+ .cfi_def_cfa_register 5
+ pushl %esi
pushl %ebx
- .cfi_def_cfa_offset 12
- .cfi_offset 3, -12
subl $8, %esp
- .cfi_def_cfa_offset 20
-#APP
-# 235 "lib/bcheck.c" 1
- movl %ebp,%edx
-# 0 "" 2
-#NO_APP
- movl (%edx), %esi
+ .cfi_offset 6, -12
+ .cfi_offset 3, -16
+ movl 0(%ebp), %esi // stkframe -> stkframe.parent -> fp
movl (%eax), %edx
movl %eax, %ebx
testl %edx, %edx
@@ -1440,13 +1437,13 @@ __bound_local_new:
jne .L173
.L167:
addl $8, %esp
- .cfi_def_cfa_offset 12
popl %ebx
.cfi_restore 3
- .cfi_def_cfa_offset 8
popl %esi
.cfi_restore 6
- .cfi_def_cfa_offset 4
+ popl %ebp
+ .cfi_restore 5
+ .cfi_def_cfa 4, 4
ret
.cfi_endproc
i.e. now it compiles correctly.
Though I do not have x86_64 to test, my guess is that
__builtin_frame_address(1) should work there too. If not - please revert
only x86_64 part of the patch. Thanks.
Cc: Michael Matz <matz@suse.de>
On i386 and gcc-4.7 I found that libc_malloc was miscompiled - look:
static void *libc_malloc(size_t size)
{
void *ptr;
restore_malloc_hooks(); // __malloc_hook = saved_malloc_hook
ptr = malloc(size);
install_malloc_hooks(); // saved_malloc_hook = __malloc_hook, __malloc_hook = __bound_malloc
return ptr;
}
.type libc_malloc, @function
libc_malloc:
.LFB56:
.cfi_startproc
pushl %edx
.cfi_def_cfa_offset 8
movl %eax, (%esp)
call malloc
movl $__bound_malloc, __malloc_hook
movl $__bound_free, __free_hook
movl $__bound_realloc, __realloc_hook
movl $__bound_memalign, __memalign_hook
popl %ecx
.cfi_def_cfa_offset 4
ret
Here gcc inlined both restore_malloc_hooks() and install_malloc_hooks()
and decided that
saved_malloc_hook -> __malloc_hook -> saved_malloc_hook
stores are not needed and could be ommitted. Only it did not know
__molloc_hook affects malloc()...
So add compiler barrier to both install and restore hooks functions and
be done with it - the code is now ok:
diff --git a/bcheck0.s b/bcheck1.s
index 5f50293..4c02a5f 100644
--- a/bcheck0.s
+++ b/bcheck1.s
@@ -42,8 +42,24 @@ libc_malloc:
.cfi_startproc
pushl %edx
.cfi_def_cfa_offset 8
+ movl saved_malloc_hook, %edx
+ movl %edx, __malloc_hook
+ movl saved_free_hook, %edx
+ movl %edx, __free_hook
+ movl saved_realloc_hook, %edx
+ movl %edx, __realloc_hook
+ movl saved_memalign_hook, %edx
+ movl %edx, __memalign_hook
movl %eax, (%esp)
call malloc
+ movl __malloc_hook, %edx
+ movl %edx, saved_malloc_hook
+ movl __free_hook, %edx
+ movl %edx, saved_free_hook
+ movl __realloc_hook, %edx
+ movl %edx, saved_realloc_hook
+ movl __memalign_hook, %edx
+ movl %edx, saved_memalign_hook
movl $__bound_malloc, __malloc_hook
movl $__bound_free, __free_hook
movl $__bound_realloc, __realloc_hook
For barrier I use
__asm__ __volatile__ ("": : : "memory")
which is used as compiler barrier by Linux kernel, and mentioned in gcc
docs and in wikipedia [1].
Without this patch any program compiled with tcc -b crashes in startup
because of infinite recursion in libc_malloc.
[1] http://en.wikipedia.org/wiki/Memory_ordering#Compiler_memory_barrier
Revert commit 891dfcdf3f since it assumes
*all* architectures supported by tcc have GOT offsets aligned on 2. A
rework of this commit is being done since without it all PLT entries
grow by 4 bytes.
Since commit c6630ef92a, Call to a veneer
when the final symbol to be reached is thumb is made through a blx
instruction. This is a mistake since veneers are ARM instructions and
should thus be called with a simple bl. This commit prevent the bl ->
blx conversion when a veneer is used.
Source fortification now works correctly : it compiles without warning
except unused result and the resulting tcc is working fine. Hence let's
stop disabling source fortification and hide unused result instead.
Generate PLT thumb stub for an ARM PLT entry only when at least one
Thumb instruction branches to that entry.
Warning: To save space, this commit reuses the bit 0 of entries of
got_offsets array. The GOT offset is thus saved in a 31 bit value.
Make sure to divide by 2 (right shift by 1) an offset before storing it
there and conversely to multiply the value by 2 (left shift by 1) before
using it.
the absence of a clean target in tests2/Makefile make the clean target
in the main Makefile fails to complete. This commit create such a target
which removes the only file created when tests pass successfully.
Add support for relocations R_ARM_THM_JUMP24 and R_ARM_THM_CALL. These
are encountered with gcc when compiling for armv6 or greater with
-mthumb flag and a call (conditional or not) is done.
Currently, VLA are not forbidden for static variable. This leads to
problems even if for fixed-size array when the size expression uses the
ternary operator (cond ? then-value : else-value) because it is parsed
as a general expression which leads to code generated in this case.
This commit solve the problem by forbidding VLA for static variables.
Although not required for the fix, avoiding code generation when the
expression is constant would be a nice addition though.
Introduce ARM version for the target architecture in order to determine
if blx instruction can be used or not. Availability of blx instruction
allows for more scenarii supported in R_ARM_CALL relocation. It should
also be useful when introducing support for the R_ARM_THM_CALL
relocation.
With R_ARM_CALL, if target function is to be entered in Thumb mode, the
relocation is supposed to transform bl in blx. This is not the case
actually so this commit is there to fix it.
Add support for relocations R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS as well
as their Thumb2 counterpart R_ARM_THM_MOVW_ABS_NC and
R_ARM_THM_MOVT_ABS. These are encountered with gcc when compiling for
armv7-a and a data is loaded in a register, either in arm or Thumb2
mode. The first half of the data is loaded with movw ; the second half
is loaded with movt.
We are now compatible with the 0.9,25 version though. A special
value for the second (ptr) argument is used to get the simple
behavior as with the 0.9.24 version.
Arm hardfloat variant uses a different ABI than arm and uses thus a
different multiarch directory for headers and libraries. This commit
detect whether the system uses the hardfloat variant and configure the
multiarch directory accordingly.
The code for shifts is now similar to code for binary arithmetic operations,
except that only the first argument is considered, as required by the ISO C
standard.
The intent is for 'make test' to pass cleanly on each platform, and thus easier
spotting of regressions. Linux is best supported by most tests running and
passing. Mac OSX passes mosts tests that do not make/link with binary files,
due to lack of mach-o file support.
!!! I have very limited knowledge of Windows platform, and cannot comment why
all tests(1) fail. I have posted to newsgroup asking for someone to test
Windows platform.
On 2012-06-26 15:07:57 +0200, Vincent Lefevre wrote:
> ISO C99 TC3 says: [6.5.7#3] "The integer promotions are performed on
> each of the operands. The type of the result is that of the promoted
> left operand."
I've written a patch (attached). Now the shift problems no longer
occur with the testcase and with GNU MPFR's "make check".
--
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
The tests are taken almost verbatim from the open source project PicoC. It can
be found at https://code.google.com/p/picoc/.
The tests range from very simple/trivial ones to more complicated. My view is
that the more tests the better. Without tests like this I was very reluctant to
make any changes to tcc for the fear of breaking things.
The tests pass on Win32, OSX, Linux x86 and x86_64. One or two tests fail on
each platform due to differences in the runtime library.
Evaluate configure arguments to reproduce autotools behavior. Autotools
actually only expands a few variable and do it at make time but it makes
the change much simpler.
Using /usr/bin/env tcc doesn't work as it was reported. Revert to
using the full path which fails if the user installs tcc in non-default
location. Then again this is just an example.
This reverts commit 27a428cd0f.
I probably broke that myself earlier. In any case parse_args
needs to start with index 0 because it is is used also recursively
to expand the shebang command from scripts such as
#!/usr/local/bin/tcc -run -L/usr/X11R6/lib -lX11
which arrives at tcc as only two argv's
"tcc" "-run -L/usr/X11R6/lib -lX11"
When using gcc compiler (as opposed to llvm) to build 32 bit tcc, compiler flags
-mpreferred-stack-boundary=2, -march=i386 and -falign-functions=2 were being
used. -march is redundant as -m32 is already being used. The other two seem to
be corrupting stack. I am not sure why this is the case, as the explanation of
the flags states that only running code size should be affected, but it does.
I think that is is safe to remove these flags altogether for all compilers and
platforms, especially since they are not being used for 64 bit builds. However
I do not want to apply such wide change without agreement from the people on the
mailing list.
Loads of VT_LLOCAL values (which effectively represent saved
addresses of lvalues) were done in VT_INT type, loosing the upper
32 bits. Needs to be done in VT_PTR type.
* Add multiarch directories for arm and i386
* Fix detection of biarch: /lib64/ld-linux-x86-64.so.2 is mandated by
ABI and is thus always present, even if there is no biarch
* Define CONFIG_LDDIR directly with the right value in case of multiarch
instead of defining it to /lib and then redifining it.
This patch fix 2 bugs in CONFIG_LDDIR usage:
* CONFIG_LDDIR used for 2 purposes
there is confusion between the directory to find libraries, crt* files
and headers and the directory in which the program interpreter is.
These two directories are not related. The latter is specified by the
ABI and should not be configurable while the former depends on the
system (single arch, biarch, multiarch). This end a longstanding issue
with amd64 program interpreter later propagated to other architecture
interpreters.
* If multiarch is in effect, then the library directory should be /lib.
/lib64 denotes biarch architecture, everything which is here would be
in /lib/x86_64-linux-gnu instead.
TCCs make dependency generator is incompatible with the GNU
depcomp script which is widely used. For TCC it has to go over
the output of -MD, (it detects it as ICC compatible), but the
sed commands it uses are confused by tabs in the output, so that
some rewrites aren't done.
Those tabs will then finally confuse make itself when the
generated .d files are included. It reads them as goal commands
(leading tab), and is totally lost then.
Short of changing depcomp (hard because distributed with all kinds
of software), simply emit spaces for -MD.
Sometimes the result of a comparison is not directly used in a jump,
but in arithmetic or further comparisons. If those further things
do a vswap() with the VT_CMP as current top, and then generate
instructions for the new top, this most probably destroys the flags
(e.g. if it's a bitfield load like in the example).
vswap() must do the same like vsetc() and not allow VT_CMP vtops
to be moved down.
This matters when sizeof is directly used in arithmetic,
ala "uintptr_t t; t &= -sizeof(long)" (for alignment). When sizeof
isn't size_t (as it's specified to be) this masking will truncate
the high bits of the uintptr_t object (if uintptr_t is larger than
uint).
This needs to be accepted:
typedef int foo;
void f (void) { foo: return; }
namespaces for labels and types are different. The problem is that
the block parser always tries to find a decl first and that routine
doesn't peek enough to detect this case. Needs some adjustments
to unget_tok() so that we can call it even when we already called
it once, but next() didn't come around restoring the buffer yet.
(It lazily does so not when the buffer becomes empty, but rather
when the next call detects that the buffer is empty, i.e. it requires
two next() calls until the unget buffer gets switched back).