mirror of
https://github.com/mirror/tinycc.git
synced 2024-12-28 04:00:06 +08:00
955 lines
26 KiB
Plaintext
955 lines
26 KiB
Plaintext
\input texinfo @c -*- texinfo -*-
|
|
|
|
@settitle Tiny C Compiler Reference Documentation
|
|
@titlepage
|
|
@sp 7
|
|
@center @titlefont{Tiny C Compiler Reference Documentation}
|
|
@sp 3
|
|
@end titlepage
|
|
|
|
@chapter Introduction
|
|
|
|
TinyCC (aka TCC) is a small but hyper fast C compiler. Unlike other C
|
|
compilers, it is meant to be self-suffisant: you do not need an
|
|
external assembler or linker because TCC does that for you.
|
|
|
|
TCC compiles so @emph{fast} that even for big projects @code{Makefile}s may
|
|
not be necessary.
|
|
|
|
TCC not only supports ANSI C, but also most of the new ISO C99
|
|
standard and many GNUC extensions including inline assembly.
|
|
|
|
TCC can also be used to make @emph{C scripts}, i.e. pieces of C source
|
|
that you run as a Perl or Python script. Compilation is so fast that
|
|
your script will be as fast as if it was an executable.
|
|
|
|
TCC can also automatically generate memory and bound checks
|
|
(@xref{bounds}) while allowing all C pointers operations. TCC can do
|
|
these checks even if non patched libraries are used.
|
|
|
|
With @code{libtcc}, you can use TCC as a backend for dynamic code
|
|
generation (@xref{libtcc}).
|
|
|
|
@node invoke
|
|
@chapter Command line invocation
|
|
|
|
@section Quick start
|
|
|
|
@example
|
|
usage: tcc [-c] [-o outfile] [-Bdir] [-bench] [-Idir] [-Dsym[=val]] [-Usym]
|
|
[-g] [-b] [-bt N] [-Ldir] [-llib] [-shared] [-static]
|
|
[--] infile1 [infile2... --] [infile_args...]
|
|
@end example
|
|
|
|
TCC options are a very much like gcc. The main difference is that TCC
|
|
can also execute directly the resulting program and give it runtime
|
|
arguments.
|
|
|
|
Here are some examples to understand the logic:
|
|
|
|
@table @code
|
|
@item tcc a.c
|
|
Compile a.c and execute it directly
|
|
|
|
@item tcc a.c arg1
|
|
Compile a.c and execute it directly. arg1 is given as first argument to
|
|
the @code{main()} of a.c.
|
|
|
|
@item tcc -- a.c b.c -- arg1
|
|
Compile a.c and b.c, link them together and execute them. arg1 is given
|
|
as first argument to the @code{main()} of the resulting program. Because
|
|
multiple C files are specified, @code{--} are necessary to clearly separate the
|
|
program arguments from the TCC options.
|
|
|
|
@item tcc -o myprog a.c b.c
|
|
Compile a.c and b.c, link them and generate the executable myprog.
|
|
|
|
@item tcc -o myprog a.o b.o
|
|
link a.o and b.o together and generate the executable myprog.
|
|
|
|
@item tcc -c a.c
|
|
Compile a.c and generate object file a.o
|
|
|
|
@item tcc -c asmfile.S
|
|
Preprocess with C preprocess and assemble asmfile.S and generate
|
|
object file asmfile.o.
|
|
|
|
@item tcc -c asmfile.s
|
|
Assemble (but not preprocess) asmfile.s and generate object file
|
|
asmfile.o.
|
|
|
|
@item tcc -r -o ab.o a.c b.c
|
|
Compile a.c and b.c, link them together and generate the object file ab.o.
|
|
|
|
@end table
|
|
|
|
Scripting:
|
|
|
|
TCC can be invoked from @emph{scripts}, just as shell scripts. You just
|
|
need to add @code{#!/usr/local/bin/tcc} at the start of your C source:
|
|
|
|
@example
|
|
#!/usr/local/bin/tcc
|
|
#include <stdio.h>
|
|
|
|
int main()
|
|
{
|
|
printf("Hello World\n");
|
|
return 0;
|
|
}
|
|
@end example
|
|
|
|
@section Option summary
|
|
|
|
General Options:
|
|
|
|
@table @samp
|
|
@item -c
|
|
Generate an object file (@samp{-o} option must also be given).
|
|
|
|
@item -o outfile
|
|
Put object file, executable, or dll into output file @file{outfile}.
|
|
|
|
@item -Bdir
|
|
Set the path where the tcc internal libraries can be found (default is
|
|
@file{PREFIX/lib/tcc}).
|
|
|
|
@item -bench
|
|
Output compilation statistics.
|
|
@end table
|
|
|
|
Preprocessor options:
|
|
|
|
@table @samp
|
|
@item -Idir
|
|
Specify an additionnal include path. Include paths are searched in the
|
|
order they are specified.
|
|
|
|
System include paths are always searched after. The default system
|
|
include paths are: @file{/usr/local/include}, @file{/usr/include}
|
|
and @file{PREFIX/lib/tcc/include}. (@code{PREFIX} is usually
|
|
@file{/usr} or @file{/usr/local}).
|
|
|
|
@item -Dsym[=val]
|
|
Define preprocessor symbol 'sym' to
|
|
val. If val is not present, its value is '1'. Function-like macros can
|
|
also be defined: @code{'-DF(a)=a+1'}
|
|
|
|
@item -Usym
|
|
Undefine preprocessor symbol 'sym'.
|
|
@end table
|
|
|
|
Linker options:
|
|
|
|
@table @samp
|
|
@item -Ldir
|
|
Specify an additionnal static library path for the @samp{-l} option. The
|
|
default library paths are @file{/usr/local/lib}, @file{/usr/lib} and @file{/lib}.
|
|
|
|
@item -lxxx
|
|
Link your program with dynamic library libxxx.so or static library
|
|
libxxx.a. The library is searched in the paths specified by the
|
|
@samp{-L} option.
|
|
|
|
@item -shared
|
|
Generate a shared library instead of an executable (@samp{-o} option
|
|
must also be given).
|
|
|
|
@item -static
|
|
Generate a statically linked executable (default is a shared linked
|
|
executable) (@samp{-o} option must also be given).
|
|
|
|
@item -r
|
|
Generate an object file combining all input files (@samp{-o} option must
|
|
also be given).
|
|
|
|
@end table
|
|
|
|
Debugger options:
|
|
|
|
@table @samp
|
|
@item -g
|
|
Generate run time debug information so that you get clear run time
|
|
error messages: @code{ test.c:68: in function 'test5()': dereferencing
|
|
invalid pointer} instead of the laconic @code{Segmentation
|
|
fault}.
|
|
|
|
@item -b
|
|
Generate additionnal support code to check
|
|
memory allocations and array/pointer bounds. @samp{-g} is implied. Note
|
|
that the generated code is slower and bigger in this case.
|
|
|
|
@item -bt N
|
|
Display N callers in stack traces. This is useful with @samp{-g} or
|
|
@samp{-b}.
|
|
|
|
@end table
|
|
|
|
Note: GCC options @samp{-Ox}, @samp{-Wx}, @samp{-fx} and @samp{-mx} are
|
|
ignored.
|
|
|
|
@chapter C language support
|
|
|
|
@section ANSI C
|
|
|
|
TCC implements all the ANSI C standard, including structure bit fields
|
|
and floating point numbers (@code{long double}, @code{double}, and
|
|
@code{float} fully supported).
|
|
|
|
@section ISOC99 extensions
|
|
|
|
TCC implements many features of the new C standard: ISO C99. Currently
|
|
missing items are: complex and imaginary numbers and variable length
|
|
arrays.
|
|
|
|
Currently implemented ISOC99 features:
|
|
|
|
@itemize
|
|
|
|
@item 64 bit @code{'long long'} types are fully supported.
|
|
|
|
@item The boolean type @code{'_Bool'} is supported.
|
|
|
|
@item @code{'__func__'} is a string variable containing the current
|
|
function name.
|
|
|
|
@item Variadic macros: @code{__VA_ARGS__} can be used for
|
|
function-like macros:
|
|
@example
|
|
#define dprintf(level, __VA_ARGS__) printf(__VA_ARGS__)
|
|
@end example
|
|
@code{dprintf} can then be used with a variable number of parameters.
|
|
|
|
@item Declarations can appear anywhere in a block (as in C++).
|
|
|
|
@item Array and struct/union elements can be initialized in any order by
|
|
using designators:
|
|
@example
|
|
struct { int x, y; } st[10] = { [0].x = 1, [0].y = 2 };
|
|
|
|
int tab[10] = { 1, 2, [5] = 5, [9] = 9};
|
|
@end example
|
|
|
|
@item Compound initializers are supported:
|
|
@example
|
|
int *p = (int []){ 1, 2, 3 };
|
|
@end example
|
|
to initialize a pointer pointing to an initialized array. The same
|
|
works for structures and strings.
|
|
|
|
@item Hexadecimal floating point constants are supported:
|
|
@example
|
|
double d = 0x1234p10;
|
|
@end example
|
|
is the same as writing
|
|
@example
|
|
double d = 4771840.0;
|
|
@end example
|
|
|
|
@item @code{'inline'} keyword is ignored.
|
|
|
|
@item @code{'restrict'} keyword is ignored.
|
|
@end itemize
|
|
|
|
@section GNU C extensions
|
|
|
|
TCC implements some GNU C extensions:
|
|
|
|
@itemize
|
|
|
|
@item array designators can be used without '=':
|
|
@example
|
|
int a[10] = { [0] 1, [5] 2, 3, 4 };
|
|
@end example
|
|
|
|
@item Structure field designators can be a label:
|
|
@example
|
|
struct { int x, y; } st = { x: 1, y: 1};
|
|
@end example
|
|
instead of
|
|
@example
|
|
struct { int x, y; } st = { .x = 1, .y = 1};
|
|
@end example
|
|
|
|
@item @code{'\e'} is ASCII character 27.
|
|
|
|
@item case ranges : ranges can be used in @code{case}s:
|
|
@example
|
|
switch(a) {
|
|
case 1 ... 9:
|
|
printf("range 1 to 9\n");
|
|
break;
|
|
default:
|
|
printf("unexpected\n");
|
|
break;
|
|
}
|
|
@end example
|
|
|
|
@item The keyword @code{__attribute__} is handled to specify variable or
|
|
function attributes. The following attributes are supported:
|
|
@itemize
|
|
@item @code{aligned(n)}: align data to n bytes (must be a power of two).
|
|
|
|
@item @code{section(name)}: generate function or data in assembly
|
|
section name (name is a string containing the section name) instead
|
|
of the default section.
|
|
|
|
@item @code{unused}: specify that the variable or the function is unused.
|
|
|
|
@item @code{cdecl}: use standard C calling convention.
|
|
|
|
@item @code{stdcall}: use Pascal-like calling convention.
|
|
|
|
@end itemize
|
|
|
|
Here are some examples:
|
|
@example
|
|
int a __attribute__ ((aligned(8), section(".mysection")));
|
|
@end example
|
|
|
|
align variable @code{'a'} to 8 bytes and put it in section @code{.mysection}.
|
|
|
|
@example
|
|
int my_add(int a, int b) __attribute__ ((section(".mycodesection")))
|
|
{
|
|
return a + b;
|
|
}
|
|
@end example
|
|
|
|
generate function @code{'my_add'} in section @code{.mycodesection}.
|
|
|
|
@item GNU style variadic macros:
|
|
@example
|
|
#define dprintf(fmt, args...) printf(fmt, ## args)
|
|
|
|
dprintf("no arg\n");
|
|
dprintf("one arg %d\n", 1);
|
|
@end example
|
|
|
|
@item @code{__FUNCTION__} is interpreted as C99 @code{__func__}
|
|
(so it has not exactly the same semantics as string literal GNUC
|
|
where it is a string literal).
|
|
|
|
@item The @code{__alignof__} keyword can be used as @code{sizeof}
|
|
to get the alignment of a type or an expression.
|
|
|
|
@item The @code{typeof(x)} returns the type of @code{x}.
|
|
@code{x} is an expression or a type.
|
|
|
|
@item Computed gotos: @code{&&label} returns a pointer of type
|
|
@code{void *} on the goto label @code{label}. @code{goto *expr} can be
|
|
used to jump on the pointer resulting from @code{expr}.
|
|
|
|
@item Inline assembly with asm instruction:
|
|
@example
|
|
static inline void * my_memcpy(void * to, const void * from, size_t n)
|
|
{
|
|
int d0, d1, d2;
|
|
__asm__ __volatile__(
|
|
"rep ; movsl\n\t"
|
|
"testb $2,%b4\n\t"
|
|
"je 1f\n\t"
|
|
"movsw\n"
|
|
"1:\ttestb $1,%b4\n\t"
|
|
"je 2f\n\t"
|
|
"movsb\n"
|
|
"2:"
|
|
: "=&c" (d0), "=&D" (d1), "=&S" (d2)
|
|
:"0" (n/4), "q" (n),"1" ((long) to),"2" ((long) from)
|
|
: "memory");
|
|
return (to);
|
|
}
|
|
@end example
|
|
|
|
TCC includes its own x86 inline assembler with a @code{gas}-like (GNU
|
|
assembler) syntax. No intermediate files are generated. GCC 3.x named
|
|
operands are supported.
|
|
|
|
@end itemize
|
|
|
|
@section TinyCC extensions
|
|
|
|
@itemize
|
|
|
|
@item @code{__TINYC__} is a predefined macro to @code{'1'} to
|
|
indicate that you use TCC.
|
|
|
|
@item @code{'#!'} at the start of a line is ignored to allow scripting.
|
|
|
|
@item Binary digits can be entered (@code{'0b101'} instead of
|
|
@code{'5'}).
|
|
|
|
@item @code{__BOUNDS_CHECKING_ON} is defined if bound checking is activated.
|
|
|
|
@end itemize
|
|
|
|
@chapter TinyCC Assembler
|
|
|
|
Since version 0.9.16, TinyCC integrates its own assembler. TinyCC
|
|
assembler supports a gas-like syntax (GNU assembler). You can
|
|
desactivate assembler support if you want a smaller TinyCC executable
|
|
(the C compiler does not rely on the assembler).
|
|
|
|
TinyCC Assembler is used to handle files with @file{.S} (C
|
|
preprocessed assembler) and @file{.s} extensions. It is also used to
|
|
handle the GNU inline assembler with the @code{asm} keyword.
|
|
|
|
@section Syntax
|
|
|
|
TinyCC Assembler supports most of the gas syntax. The tokens are the
|
|
same as C.
|
|
|
|
@itemize
|
|
|
|
@item C and C++ comments are supported.
|
|
|
|
@item Identifiers are the same as C, so you cannot use '.' or '$'.
|
|
|
|
@item Only 32 bit integer numbers are supported.
|
|
|
|
@end itemize
|
|
|
|
@section Expressions
|
|
|
|
@itemize
|
|
|
|
@item Integers in decimal, octal and hexa are supported.
|
|
|
|
@item Unary operators: +, -, ~.
|
|
|
|
@item Binary operators in decreasing priority order:
|
|
|
|
@enumerate
|
|
@item *, /, %
|
|
@item &, |, ^
|
|
@item +, -
|
|
@end enumerate
|
|
|
|
@item A value is either an absolute number or a label plus an offset.
|
|
All operators accept absolute values except '+' and '-'. '+' or '-' can be
|
|
used to add an offset to a label. '-' supports two labels only if they
|
|
are the same or if they are both defined and in the same section.
|
|
|
|
@end itemize
|
|
|
|
@section Labels
|
|
|
|
@itemize
|
|
|
|
@item All labels are considered as local, except undefined ones.
|
|
|
|
@item Numeric labels can be used as local @code{gas}-like labels.
|
|
They can be defined several times in the same source. Use 'b'
|
|
(backward) or 'f' (forward) as suffix to reference them:
|
|
|
|
@example
|
|
1:
|
|
jmp 1b /* jump to '1' label before */
|
|
jmp 1f /* jump to '1' label after */
|
|
1:
|
|
@end example
|
|
|
|
@end itemize
|
|
|
|
@section Directives
|
|
|
|
All directives are preceeded by a '.'. The following directives are
|
|
supported:
|
|
|
|
@itemize
|
|
@item .align n[,value]
|
|
@item .skip n[,value]
|
|
@item .space n[,value]
|
|
@item .byte value1[,value2...]
|
|
@item .word value1[,value2...]
|
|
@item .short value1[,value2...]
|
|
@item .int value1[,value2...]
|
|
@item .long value1[,value2...]
|
|
@end itemize
|
|
|
|
@section X86 Assembler
|
|
|
|
All X86 opcodes are supported. Only ATT syntax is supported (source
|
|
then destination operand order). If no size suffix is given, TinyCC
|
|
tries to guess it from the operand sizes.
|
|
|
|
Currently, MMX opcodes are supported but not SSE ones.
|
|
|
|
@chapter TinyCC Linker
|
|
|
|
@section ELF file generation
|
|
|
|
TCC can directly output relocatable ELF files (object files),
|
|
executable ELF files and dynamic ELF libraries without relying on an
|
|
external linker.
|
|
|
|
Dynamic ELF libraries can be output but the C compiler does not generate
|
|
position independant code (PIC) code. It means that the dynamic librairy
|
|
code generated by TCC cannot be factorized among processes yet.
|
|
|
|
TCC linker cannot currently suppress unused object code. But TCC
|
|
will soon integrate a novel feature not found in GNU tools: unused code
|
|
will be suppressed at the function or variable level, provided you only
|
|
use TCC to compile your files.
|
|
|
|
@section ELF file loader
|
|
|
|
TCC can load ELF object files, archives (.a files) and dynamic
|
|
libraries (.so).
|
|
|
|
@section GNU Linker Scripts
|
|
|
|
Because on many Linux systems some dynamic libraries (such as
|
|
@file{/usr/lib/libc.so}) are in fact GNU ld link scripts (horrible!),
|
|
TCC linker also support a subset of GNU ld scripts.
|
|
|
|
The @code{GROUP} and @code{FILE} commands are supported.
|
|
|
|
Example from @file{/usr/lib/libc.so}:
|
|
@example
|
|
/* GNU ld script
|
|
Use the shared library, but some functions are only in
|
|
the static library, so try that secondarily. */
|
|
GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a )
|
|
@end example
|
|
|
|
@node bounds
|
|
@chapter TinyCC Memory and Bound checks
|
|
|
|
This feature is activated with the @code{'-b'} (@xref{invoke}).
|
|
|
|
Note that pointer size is @emph{unchanged} and that code generated
|
|
with bound checks is @emph{fully compatible} with unchecked
|
|
code. When a pointer comes from unchecked code, it is assumed to be
|
|
valid. Even very obscure C code with casts should work correctly.
|
|
|
|
To have more information about the ideas behind this method, check at
|
|
@url{http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html}.
|
|
|
|
Here are some examples of catched errors:
|
|
|
|
@table @asis
|
|
|
|
@item Invalid range with standard string function:
|
|
@example
|
|
{
|
|
char tab[10];
|
|
memset(tab, 0, 11);
|
|
}
|
|
@end example
|
|
|
|
@item Bound error in global or local arrays:
|
|
@example
|
|
{
|
|
int tab[10];
|
|
for(i=0;i<11;i++) {
|
|
sum += tab[i];
|
|
}
|
|
}
|
|
@end example
|
|
|
|
@item Bound error in allocated data:
|
|
@example
|
|
{
|
|
int *tab;
|
|
tab = malloc(20 * sizeof(int));
|
|
for(i=0;i<21;i++) {
|
|
sum += tab4[i];
|
|
}
|
|
free(tab);
|
|
}
|
|
@end example
|
|
|
|
@item Access to a freed region:
|
|
@example
|
|
{
|
|
int *tab;
|
|
tab = malloc(20 * sizeof(int));
|
|
free(tab);
|
|
for(i=0;i<20;i++) {
|
|
sum += tab4[i];
|
|
}
|
|
}
|
|
@end example
|
|
|
|
@item Freeing an already freed region:
|
|
@example
|
|
{
|
|
int *tab;
|
|
tab = malloc(20 * sizeof(int));
|
|
free(tab);
|
|
free(tab);
|
|
}
|
|
@end example
|
|
|
|
@end table
|
|
|
|
@node libtcc
|
|
@chapter The @code{libtcc} library
|
|
|
|
The @code{libtcc} library enables you to use TCC as a backend for
|
|
dynamic code generation.
|
|
|
|
Read the @file{libtcc.h} to have an overview of the API. Read
|
|
@file{libtcc_test.c} to have a very simple example.
|
|
|
|
The idea consists in giving a C string containing the program you want
|
|
to compile directly to @code{libtcc}. Then you can access to any global
|
|
symbol (function or variable) defined.
|
|
|
|
@chapter Developper's guide
|
|
|
|
This chapter gives some hints to understand how TCC works. You can skip
|
|
it if you do not intend to modify the TCC code.
|
|
|
|
@section File reading
|
|
|
|
The @code{BufferedFile} structure contains the context needed to read a
|
|
file, including the current line number. @code{tcc_open()} opens a new
|
|
file and @code{tcc_close()} closes it. @code{inp()} returns the next
|
|
character.
|
|
|
|
@section Lexer
|
|
|
|
@code{next()} reads the next token in the current
|
|
file. @code{next_nomacro()} reads the next token without macro
|
|
expansion.
|
|
|
|
@code{tok} contains the current token (see @code{TOK_xxx})
|
|
constants. Identifiers and keywords are also keywords. @code{tokc}
|
|
contains additionnal infos about the token (for example a constant value
|
|
if number or string token).
|
|
|
|
@section Parser
|
|
|
|
The parser is hardcoded (yacc is not necessary). It does only one pass,
|
|
except:
|
|
|
|
@itemize
|
|
|
|
@item For initialized arrays with unknown size, a first pass
|
|
is done to count the number of elements.
|
|
|
|
@item For architectures where arguments are evaluated in
|
|
reverse order, a first pass is done to reverse the argument order.
|
|
|
|
@end itemize
|
|
|
|
@section Types
|
|
|
|
The types are stored in a single 'int' variable. It was choosen in the
|
|
first stages of development when tcc was much simpler. Now, it may not
|
|
be the best solution.
|
|
|
|
@example
|
|
#define VT_INT 0 /* integer type */
|
|
#define VT_BYTE 1 /* signed byte type */
|
|
#define VT_SHORT 2 /* short type */
|
|
#define VT_VOID 3 /* void type */
|
|
#define VT_PTR 4 /* pointer */
|
|
#define VT_ENUM 5 /* enum definition */
|
|
#define VT_FUNC 6 /* function type */
|
|
#define VT_STRUCT 7 /* struct/union definition */
|
|
#define VT_FLOAT 8 /* IEEE float */
|
|
#define VT_DOUBLE 9 /* IEEE double */
|
|
#define VT_LDOUBLE 10 /* IEEE long double */
|
|
#define VT_BOOL 11 /* ISOC99 boolean type */
|
|
#define VT_LLONG 12 /* 64 bit integer */
|
|
#define VT_LONG 13 /* long integer (NEVER USED as type, only
|
|
during parsing) */
|
|
#define VT_BTYPE 0x000f /* mask for basic type */
|
|
#define VT_UNSIGNED 0x0010 /* unsigned type */
|
|
#define VT_ARRAY 0x0020 /* array type (also has VT_PTR) */
|
|
#define VT_BITFIELD 0x0040 /* bitfield modifier */
|
|
|
|
#define VT_STRUCT_SHIFT 16 /* structure/enum name shift (16 bits left) */
|
|
@end example
|
|
|
|
When a reference to another type is needed (for pointers, functions and
|
|
structures), the @code{32 - VT_STRUCT_SHIFT} high order bits are used to
|
|
store an identifier reference.
|
|
|
|
The @code{VT_UNSIGNED} flag can be set for chars, shorts, ints and long
|
|
longs.
|
|
|
|
Arrays are considered as pointers @code{VT_PTR} with the flag
|
|
@code{VT_ARRAY} set.
|
|
|
|
The @code{VT_BITFIELD} flag can be set for chars, shorts, ints and long
|
|
longs. If it is set, then the bitfield position is stored from bits
|
|
VT_STRUCT_SHIFT to VT_STRUCT_SHIFT + 5 and the bit field size is stored
|
|
from bits VT_STRUCT_SHIFT + 6 to VT_STRUCT_SHIFT + 11.
|
|
|
|
@code{VT_LONG} is never used except during parsing.
|
|
|
|
During parsing, the storage of an object is also stored in the type
|
|
integer:
|
|
|
|
@example
|
|
#define VT_EXTERN 0x00000080 /* extern definition */
|
|
#define VT_STATIC 0x00000100 /* static variable */
|
|
#define VT_TYPEDEF 0x00000200 /* typedef definition */
|
|
@end example
|
|
|
|
@section Symbols
|
|
|
|
All symbols are stored in hashed symbol stacks. Each symbol stack
|
|
contains @code{Sym} structures.
|
|
|
|
@code{Sym.v} contains the symbol name (remember
|
|
an idenfier is also a token, so a string is never necessary to store
|
|
it). @code{Sym.t} gives the type of the symbol. @code{Sym.r} is usually
|
|
the register in which the corresponding variable is stored. @code{Sym.c} is
|
|
usually a constant associated to the symbol.
|
|
|
|
Four main symbol stacks are defined:
|
|
|
|
@table @code
|
|
|
|
@item define_stack
|
|
for the macros (@code{#define}s).
|
|
|
|
@item global_stack
|
|
for the global variables, functions and types.
|
|
|
|
@item local_stack
|
|
for the local variables, functions and types.
|
|
|
|
@item global_label_stack
|
|
for the local labels (for @code{goto}).
|
|
|
|
@item label_stack
|
|
for GCC block local labels (see the @code{__label__} keyword).
|
|
|
|
@end table
|
|
|
|
@code{sym_push()} is used to add a new symbol in the local symbol
|
|
stack. If no local symbol stack is active, it is added in the global
|
|
symbol stack.
|
|
|
|
@code{sym_pop(st,b)} pops symbols from the symbol stack @var{st} until
|
|
the symbol @var{b} is on the top of stack. If @var{b} is NULL, the stack
|
|
is emptied.
|
|
|
|
@code{sym_find(v)} return the symbol associated to the identifier
|
|
@var{v}. The local stack is searched first from top to bottom, then the
|
|
global stack.
|
|
|
|
@section Sections
|
|
|
|
The generated code and datas are written in sections. The structure
|
|
@code{Section} contains all the necessary information for a given
|
|
section. @code{new_section()} creates a new section. ELF file semantics
|
|
is assumed for each section.
|
|
|
|
The following sections are predefined:
|
|
|
|
@table @code
|
|
|
|
@item text_section
|
|
is the section containing the generated code. @var{ind} contains the
|
|
current position in the code section.
|
|
|
|
@item data_section
|
|
contains initialized data
|
|
|
|
@item bss_section
|
|
contains uninitialized data
|
|
|
|
@item bounds_section
|
|
@itemx lbounds_section
|
|
are used when bound checking is activated
|
|
|
|
@item stab_section
|
|
@itemx stabstr_section
|
|
are used when debugging is actived to store debug information
|
|
|
|
@item symtab_section
|
|
@itemx strtab_section
|
|
contain the exported symbols (currently only used for debugging).
|
|
|
|
@end table
|
|
|
|
@section Code generation
|
|
|
|
@subsection Introduction
|
|
|
|
The TCC code generator directly generates linked binary code in one
|
|
pass. It is rather unusual these days (see gcc for example which
|
|
generates text assembly), but it allows to be very fast and surprisingly
|
|
not so complicated.
|
|
|
|
The TCC code generator is register based. Optimization is only done at
|
|
the expression level. No intermediate representation of expression is
|
|
kept except the current values stored in the @emph{value stack}.
|
|
|
|
On x86, three temporary registers are used. When more registers are
|
|
needed, one register is flushed in a new local variable.
|
|
|
|
@subsection The value stack
|
|
|
|
When an expression is parsed, its value is pushed on the value stack
|
|
(@var{vstack}). The top of the value stack is @var{vtop}. Each value
|
|
stack entry is the structure @code{SValue}.
|
|
|
|
@code{SValue.t} is the type. @code{SValue.r} indicates how the value is
|
|
currently stored in the generated code. It is usually a CPU register
|
|
index (@code{REG_xxx} constants), but additionnal values and flags are
|
|
defined:
|
|
|
|
@example
|
|
#define VT_CONST 0x00f0
|
|
#define VT_LLOCAL 0x00f1
|
|
#define VT_LOCAL 0x00f2
|
|
#define VT_CMP 0x00f3
|
|
#define VT_JMP 0x00f4
|
|
#define VT_JMPI 0x00f5
|
|
#define VT_LVAL 0x0100
|
|
#define VT_SYM 0x0200
|
|
#define VT_MUSTCAST 0x0400
|
|
#define VT_MUSTBOUND 0x0800
|
|
#define VT_BOUNDED 0x8000
|
|
#define VT_LVAL_BYTE 0x1000
|
|
#define VT_LVAL_SHORT 0x2000
|
|
#define VT_LVAL_UNSIGNED 0x4000
|
|
#define VT_LVAL_TYPE (VT_LVAL_BYTE | VT_LVAL_SHORT | VT_LVAL_UNSIGNED)
|
|
@end example
|
|
|
|
@table @code
|
|
|
|
@item VT_CONST
|
|
indicates that the value is a constant. It is stored in the union
|
|
@code{SValue.c}, depending on its type.
|
|
|
|
@item VT_LOCAL
|
|
indicates a local variable pointer at offset @code{SValue.c.i} in the
|
|
stack.
|
|
|
|
@item VT_CMP
|
|
indicates that the value is actually stored in the CPU flags (i.e. the
|
|
value is the consequence of a test). The value is either 0 or 1. The
|
|
actual CPU flags used is indicated in @code{SValue.c.i}.
|
|
|
|
If any code is generated which destroys the CPU flags, this value MUST be
|
|
put in a normal register.
|
|
|
|
@item VT_JMP
|
|
@itemx VT_JMPI
|
|
indicates that the value is the consequence of a jmp. For VT_JMP, it is
|
|
1 if the jump is taken, 0 otherwise. For VT_JMPI it is inverted.
|
|
|
|
These values are used to compile the @code{||} and @code{&&} logical
|
|
operators.
|
|
|
|
If any code is generated, this value MUST be put in a normal
|
|
register. Otherwise, the generated code won't be executed if the jump is
|
|
taken.
|
|
|
|
@item VT_LVAL
|
|
is a flag indicating that the value is actually an lvalue (left value of
|
|
an assignment). It means that the value stored is actually a pointer to
|
|
the wanted value.
|
|
|
|
Understanding the use @code{VT_LVAL} is very important if you want to
|
|
understand how TCC works.
|
|
|
|
@item VT_LVAL_BYTE
|
|
@itemx VT_LVAL_SHORT
|
|
@itemx VT_LVAL_UNSIGNED
|
|
if the lvalue has an integer type, then these flags give its real
|
|
type. The type alone is not suffisant in case of cast optimisations.
|
|
|
|
@item VT_LLOCAL
|
|
is a saved lvalue on the stack. @code{VT_LLOCAL} should be suppressed
|
|
ASAP because its semantics are rather complicated.
|
|
|
|
@item VT_MUSTCAST
|
|
indicates that a cast to the value type must be performed if the value
|
|
is used (lazy casting).
|
|
|
|
@item VT_SYM
|
|
indicates that the symbol @code{SValue.sym} must be added to the constant.
|
|
|
|
@item VT_MUSTBOUND
|
|
@itemx VT_BOUNDED
|
|
are only used for optional bound checking.
|
|
|
|
@end table
|
|
|
|
@subsection Manipulating the value stack
|
|
|
|
@code{vsetc()} and @code{vset()} pushes a new value on the value
|
|
stack. If the previous @code{vtop} was stored in a very unsafe place(for
|
|
example in the CPU flags), then some code is generated to put the
|
|
previous @code{vtop} in a safe storage.
|
|
|
|
@code{vpop()} pops @code{vtop}. In some cases, it also generates cleanup
|
|
code (for example if stacked floating point registers are used as on
|
|
x86).
|
|
|
|
The @code{gv(rc)} function generates code to evaluate @code{vtop} (the
|
|
top value of the stack) into registers. @var{rc} selects in which
|
|
register class the value should be put. @code{gv()} is the @emph{most
|
|
important function} of the code generator.
|
|
|
|
@code{gv2()} is the same as @code{gv()} but for the top two stack
|
|
entries.
|
|
|
|
@subsection CPU dependent code generation
|
|
|
|
See the @file{i386-gen.c} file to have an example.
|
|
|
|
@table @code
|
|
|
|
@item load()
|
|
must generate the code needed to load a stack value into a register.
|
|
|
|
@item store()
|
|
must generate the code needed to store a register into a stack value
|
|
lvalue.
|
|
|
|
@item gfunc_start()
|
|
@itemx gfunc_param()
|
|
@itemx gfunc_call()
|
|
should generate a function call
|
|
|
|
@item gfunc_prolog()
|
|
@itemx gfunc_epilog()
|
|
should generate a function prolog/epilog.
|
|
|
|
@item gen_opi(op)
|
|
must generate the binary integer operation @var{op} on the two top
|
|
entries of the stack which are guaranted to contain integer types.
|
|
|
|
The result value should be put on the stack.
|
|
|
|
@item gen_opf(op)
|
|
same as @code{gen_opi()} for floating point operations. The two top
|
|
entries of the stack are guaranted to contain floating point values of
|
|
same types.
|
|
|
|
@item gen_cvt_itof()
|
|
integer to floating point conversion.
|
|
|
|
@item gen_cvt_ftoi()
|
|
floating point to integer conversion.
|
|
|
|
@item gen_cvt_ftof()
|
|
floating point to floating point of different size conversion.
|
|
|
|
@item gen_bounded_ptr_add()
|
|
@item gen_bounded_ptr_deref()
|
|
are only used for bound checking.
|
|
|
|
@end table
|
|
|
|
@section Optimizations done
|
|
|
|
Constant propagation is done for all operations. Multiplications and
|
|
divisions are optimized to shifts when appropriate. Comparison
|
|
operators are optimized by maintaining a special cache for the
|
|
processor flags. &&, || and ! are optimized by maintaining a special
|
|
'jump target' value. No other jump optimization is currently performed
|
|
because it would require to store the code in a more abstract fashion.
|
|
|