Previously, long longs were 'lexpand'ed into two registers
always.
Now, it expands
- constants into two constants (lo-part, hi-part)
- variables into two lvalues with offset+4 for the hi-part.
This makes long long operations look a bit nicer.
Also: don't apply i386 'inc/dec' optimization if carry
generation is wanted.
tccgen.c:gv() when loading long long from lvalue, before
was saving all registers which caused problems in the arm
function call register parameter preparation, as with
void foo(long long y, int x);
int main(void)
{
unsigned int *xx[1], x;
unsigned long long *yy[1], y;
foo(**yy, **xx);
return 0;
}
Now only the modified register is saved if necessary,
as in this case where it is used to store the result
of the post-inc:
long long *p, v, **pp;
v = 1;
p = &v;
p[0]++;
printf("another long long spill test : %lld\n", *p);
i386-gen.c :
- found a similar problem with TOK_UMULL caused by the
vstack juggle in tccgen:gen_opl()
(bug seen only when using EBX as 4th register)
BUGZILLA:
interfacing with other compilers
extend the return value to the whole register if necessary.
visual studio and gcc do not always set the whole eax register
when assigning the return value of a function.
We've encountered wrong execution results on i386 platforms with an
application that uses both code compiled with TCC and code compiled
with other compilers (namely: Visual Studio on Windows, and GCC on
Linux).
When calling a function that returns an integer value shorter than 32
bits, TCC reads the return value from the whole EAX register,
although the code generated by the other compilers can only sets AL
for 8 bit values or AX for 16 bits values, and the rest of EAX can be
anything.
We worked around this with the attached patch on i386 for the version
0.9.26, but we did not look at other platforms to find if there are
similar issues.
* Documentation is now in "docs".
* Source code is now in "src".
* Misc. fixes here and there so that everything still works.
I think I got everything in this commit, but I only tested this
on Linux (Make) and Windows (CMake), so I might've messed
something up on other platforms...
Jsut for testing. It works for me (don't break anything)
Small fixes for x86_64-gen.c in "tccpp: fix issues, add tests"
are dropped in flavor of this patch.
Pip Cet:
Okay, here's a first patch that fixes the problem (but I've found
another bug, yet unfixed, in the process), though it's not
particularly pretty code (I tried hard to keep the changes to the
minimum necessary). If we decide to actually get rid of VT_QLONG and
VT_QFLOAT (please, can we?), there are some further simplifications in
tccgen.c that might offset some of the cost of this patch.
The idea is that an integer is no longer enough to describe how an
argument is stored in registers. There are a number of possibilities
(none, integer register, two integer registers, float register, two
float registers, integer register plus float register, float register
plus integer register), and instead of enumerating them I've
introduced a RegArgs type that stores the offsets for each of our
registers (for the other architectures, it's simply an int specifying
the number of registers). If someone strongly prefers an enum, we
could do that instead, but I believe this is a place where keeping
things general is worth it, because this way it should be doable to
add SSE or AVX support.
There is one line in the patch that looks suspicious:
} else {
addr = (addr + align - 1) & -align;
param_addr = addr;
addr += size;
- sse_param_index += reg_count;
}
break;
However, this actually fixes one half of a bug we have when calling a
function with eight double arguments "interrupted" by a two-double
structure after the seventh double argument:
f(double,double,double,double,double,double,double,struct { double
x,y; },double);
In this case, the last argument should be passed in %xmm7. This patch
fixes the problem in gfunc_prolog, but not the corresponding problem
in gfunc_call, which I'll try tackling next.
* fix some macro expansion issues
* add some pp tests in tests/pp
* improved tcc -E output for better diff'ability
* remove -dD feature (quirky code, exotic feature,
didn't work well)
Based partially on ideas / researches from PipCet
Some issues remain with VA_ARGS macros (if used in a
rather tricky way).
Also, to keep it simple, the pp doesn't automtically
add any extra spaces to separate tokens which otherwise
would form wrong tokens if re-read from tcc -E output
(such as '+' '=') GCC does that, other compilers don't.
* cleanups
- #line 01 "file" / # 01 "file" processing
- #pragma comment(lib,"foo")
- tcc -E: forward some pragmas to output (pack, comment(lib))
- fix macro parameter list parsing mess from
a3fc543459a715d7143d
(some coffee might help, next time ;)
- introduce TOK_PPSTR - to have character constants as
written in the file (similar to TOK_PPNUM)
- allow '\' appear in macros
- new functions begin/end_macro to:
- fix switching macro levels during expansion
- allow unget_tok to unget more than one tok
- slight speedup by using bitflags in isidnum_table
Also:
- x86_64.c : fix decl after statements
- i386-gen,c : fix a vstack leak with VLA on windows
- configure/Makefile : build on windows (MSYS) was broken
- tcc_warning: fflush stderr to keep output order (win32)
Author: Philip <pipcet@gmail.com>
Our VLA code can be made a lot simpler (simple enough for
even me to understand it) by giving up on the optimization idea, which
is very tempting. There's a patch to do that attached, feel free to
test and commit it if you like. (It passes all the tests, at least
On Linux 32: sizeof(long)=32 == sizeof(void *)=32
on Linux 64: sizeof(long)=64 == sizeof(void *)=64
on Windows 64: sizeof(long)=32 != sizeof(void *)=64
The common code to move a returned structure packed into
registers into memory on the caller side didn't take the
register size into account when allocating local storage,
so sometimes that lead to stack overwrites (e.g. in 73_arm64.c),
on x86_64. This fixes it by generally making gfunc_sret also return
the register size.
A test program:
/* result of the new version inroduced in 4ad186c5ef: t2a = 44100312 */
#include<stdio.h>
int main() {
int t1 = 176401255;
float f = 0.25f;
int t2a = (int)(t1 * f); // must be 44100313
int t2b = (int)(t1 * (float)0.25f);
printf("t2a=%d t2b=%d \n",t2a,t2b);
return 0;
}
Author: Thomas Preud'homme <robotux@celest.fr>
Date: Tue Dec 31 23:51:20 2013 +0800
Move logic for if (int value) to tccgen.c
Move the logic to do a test of an integer value (ex if (0)) out of
arch-specific code to tccgen.c to avoid code duplication. This also
fixes test of long long value which was only testing the bottom half of
such values on 32 bits architectures.
I don't understand why if () in gtst(i) was removed.
This patch allows to compile a linux kernel v.2.4.26
W/o this patch a tcc simply crashes.
Refactoring (no logical changes):
- use memcpy in tccgen.c:ieee_finite(double d)
- use union to store attribute flags in Sym
Makefile: "CFLAGS+=-fno-strict-aliasing" basically not necessary
anymore but I left it for now because gcc sometimes behaves
unexpectedly without.
Also:
- configure: back to mode 100755
- tcc.h: remove unused variables tdata/tbss_section
- x86_64-gen.c: adjust gfunc_sret for prototype
Variants __fixsfdi/__fixxfdi are not needed for now because
the value is converted to double always.
Also:
- remove __tcc_fpinit for unix as it seems redundant by the
__setfpucw call in the startup code
- avoid reference to s->runtime_main in cross compilers
- configure: fix --with-libgcc help
- tcctok.h: cleanup
The procedure calling standard for ARM architecture mandate the use of
the base standard for variadic function. Therefore, hgen float aggregate
must be returned via stack when greater than 4 bytes and via core
registers else in case of variadic function.
This patch improve gfunc_sret() to take into account whether the
function is variadic or not and make use of gfunc_sret() return value to
determine whether to pass a structure via stack in gfunc_prolog(). It
also take advantage of knowing if a function is variadic or not move
float result value from VFP register to core register in gfunc_epilog().
Move the logic to do a test of an integer value (ex if (0)) out of
arch-specific code to tccgen.c to avoid code duplication. This also
fixes test of long long value which was only testing the bottom half of
such values on 32 bits architectures.
Use comisd / fcompp for float comparison (except TOK_EQ and TOK_NE)
instead of ucomisd / fucompp to detect NaN comparison.
Thanks Vincent Lefèvre for the bug report and for also giving the
solution.
- avoid assumption "ret_align == register_size" which is
false for non-arm targets
- rename symbol "sret" to more descriptive "ret_nregs"
This fixes commit dcec8673f2
Also:
- remove multiple definitions in win32/include/math.h
On ARM with hardfloat calling convention, structure containing 4 fields
or less of the same float type are returned via float registers. This
means that a structure can be returned in up to 4 double registers in a
structure is composed of 4 doubles. This commit adds support for return
of structures in several registers.
- Use runtime function for conversion
- Also initialize fp with tcc -run on windows
This fixes a bug where
double x = 1.0;
double y = 1.0000000000000001;
double z = x < y ? 0 : sqrt (x*x - y*y);
caused a bad sqrt because rounding precision for the x < y comparison
was different to the one used within the sqrt function.
This also fixes a bug where
printf("%d, %d", (int)pow(10, 2), (int)pow(10, 2));
would print
100, 99
Unrelated:
win32: document relative include & lib lookup
win32: normalize_slashes: do not mirror silly gcc behavior
This reverts part of commit 8a81f9e103
winapi: add missing WINAPI decl. for some functions
VLA storage is now freed when it goes out of scope. This makes it
possible to use a VLA inside a loop without consuming an unlimited
amount of memory.
Combining VLAs with alloca() should work as in GCC - when a VLA is
freed, memory allocated by alloca() after the VLA was created is also
freed. There are some exceptions to this rule when using goto: if a VLA
is in scope at the goto, jumping to a label will reset the stack pointer
to where it was immediately after the last VLA was created prior to the
label, or to what it was before the first VLA was created if the label
is outside the scope of any VLA. This means that in some cases combining
alloca() and VLAs will free alloca() memory where GCC would not.
I expect that Linux-x86 is probably fine. All other architectures
except ARM are definitely broken since I haven't yet implemented
gfunc_sret for these, although replicating the current behaviour
should be straightforward.