This improves commit 5cbe03b9c4 by
avoiding a double transfer when the default float ABI is already softfp.
It's also more clean by expliciting that the ABI is simply changed for
runtime ABI functions.
EABI functions to convert an int to a double register take the integer
value in core registers and also give the result in core registers.
It is thus necessary to move the result back to VFP register after the
function call. This only affected integer to double conversion because
integer to float conversion used a VFP instruction to do the conversion
and this obviously left the result in VFP register. Note that the
behavior is left untouched for !EABI as the correct behavior in this
case is unknown to the author of this patch.
The procedure calling standard for ARM architecture mandate the use of
the base standard for variadic function. Therefore, hgen float aggregate
must be returned via stack when greater than 4 bytes and via core
registers else in case of variadic function.
This patch improve gfunc_sret() to take into account whether the
function is variadic or not and make use of gfunc_sret() return value to
determine whether to pass a structure via stack in gfunc_prolog(). It
also take advantage of knowing if a function is variadic or not move
float result value from VFP register to core register in gfunc_epilog().
Move the logic to do a test of an integer value (ex if (0)) out of
arch-specific code to tccgen.c to avoid code duplication. This also
fixes test of long long value which was only testing the bottom half of
such values on 32 bits architectures.
Add infrastructure to support special calling convention for runtime ABI
function no matter what is the current calling convention. This involve
2 changes:
- behave as per base standard in gfunc_call
- move result back in VFP register in gen_cvt_itof1
Fix the address on stack where a structure is copied when it is a
parameter of a function call. This address must be computed from the
stack pointer and a possible padding offset.
On ARM with hardfloat calling convention, structure containing 4 fields
or less of the same float type are returned via float registers. This
means that a structure can be returned in up to 4 double registers in a
structure is composed of 4 doubles. This commit adds support for return
of structures in several registers.
Fix in gfunc_prolog for ARM the counting of the highest numbered VFP
float register used for parameter passing, rounded to 2. It can be
computed from the range of VFP float register with the highest range
start and adding the number of VFP float register occupied. This ensure
that parameter of type struct that spans over more than 2 float
registers are correctly taken into account.
Prior to this commit, params could use some registers that do not appear
in the value stack. Therefore when generating function call, one of such
register could be reused, leading to wrong parameter content. This
happens when a structure is passed via core register, as only the first
register would appear in the value stack.
* Correctly align stack in case of structure split between core
registers and stack
* Correctly reclaim stack space after function call in the case where
the stack needed padding to be aligned at function call.
Wrap runtime_main as per its declaration in tcc.h.
Fix preprocessor check for TCC_ARM_EABI macro definition.
Signed-off-by: Joseph Poirier <jdpoirier@gmail.com>
Allocation of struct in core and/or VFP registers on ARM is made by
manipulating the value stack to create 3 distinct zones: parameters
allocated on stack, parameters of type struct allocated in core
registers and parameters of type struct allocated in VFP registers.
Parameters of primitive type can be in any zone. This commit change the
order of the zones from stack, VFP, core to stack, core, VFP (from
highest addresses to lowest ones) in order to correctly deal the
situation when structures are allocated both in core and VFP registers.
VLA storage is now freed when it goes out of scope. This makes it
possible to use a VLA inside a loop without consuming an unlimited
amount of memory.
Combining VLAs with alloca() should work as in GCC - when a VLA is
freed, memory allocated by alloca() after the VLA was created is also
freed. There are some exceptions to this rule when using goto: if a VLA
is in scope at the goto, jumping to a label will reset the stack pointer
to where it was immediately after the last VLA was created prior to the
label, or to what it was before the first VLA was created if the label
is outside the scope of any VLA. This means that in some cases combining
alloca() and VLAs will free alloca() memory where GCC would not.
abitest now passes; however test1-3 fail in init_test. All other tests
pass. I need to re-test Win32 and Linux-x86.
I've added a dummy implementation of gfunc_sret to c67-gen.c so it
should now compile, and I think it should behave as before I created
gfunc_sret.
I expect that Linux-x86 is probably fine. All other architectures
except ARM are definitely broken since I haven't yet implemented
gfunc_sret for these, although replicating the current behaviour
should be straightforward.
Should fix some warnings wrt. access out of array bounds.
tccelf.c: fix "static function unused" warning
x86_64-gen.c: fix "ctype.ref uninitialzed" warning and cleanup
tcc-win32.txt: remove obsolete limitation notes.
Fix initialization of args_size before doing register allocation.
When adding hardfloat calling convention the initialization stopped
being performed when !defined (TCC_ARM_EABI).
Change the linking of the frames on ARM. Instead of having fp points 12
bytes above where the old fp is stored, let fp points where the old fp
is stored. That is, we switch from:
| . |
| . |
| . |
| |
| params | <-- fp
--------
| oldlr |
--------
| oldip |
--------
| oldfp |
--------
to:
| . |
| . |
| . |
| |
| params |
--------
| oldlr |
--------
| oldip |
--------
| oldfp | <-- fp
--------
A line in gfunc_call in arm-gen.c is referencing vfpr unconditionally.
Yet, this function is only available when TCC_ARM_VFP is set. While this
code is only triggered when TCC_ARM_VFP, it fails at compile time. This
commit fix the problem.
This was already possible using
make NOTALLINONE=1
and is now the default.
To build as previously from one big source, use
make ONE_SOURCE=1
Cross compilers are still build from one source because using
separate objects requires separate build directories one per
platform which currently is not (yet) supported by the makefile.
We could probably use gnu-makeish target variables like
$(I386_CROSS): OUTDIR=build/i386
$(X64_CROSS): OUTDIR=build/x86-64
and so on ...
Also NEED_FLOAT_TYPES for arm-gen is removed. It was about
variables that are referenced from outside (libtcc, tccgen).
We could declare them in tcc.h (as with reg_classes) or have
them twice in arm-gen.c. I chose option 2.
The loop constructs to iterate over the non-overlapping, even
positions of two or three bytes in a word were broken.
This patch fixes the loops. It has been verified to generate the
72 combinations for two and the 80 combinations for three bytes.