| \input texinfo @c -*-texinfo-*- |
| @c %**start of header |
| @setfilename g++int.info |
| @settitle G++ internals |
| @setchapternewpage odd |
| @c %**end of header |
| |
| @node Top, Limitations of g++, (dir), (dir) |
| @chapter Internal Architecture of the Compiler |
| |
| This is meant to describe the C++ front-end for gcc in detail. |
| Questions and comments to Benjamin Kosnik @code{<bkoz@@cygnus.com>}. |
| |
| @menu |
| * Limitations of g++:: |
| * Routines:: |
| * Implementation Specifics:: |
| * Glossary:: |
| * Macros:: |
| * Typical Behavior:: |
| * Coding Conventions:: |
| * Templates:: |
| * Access Control:: |
| * Error Reporting:: |
| * Parser:: |
| * Exception Handling:: |
| * Free Store:: |
| * Mangling:: Function name mangling for C++ and Java |
| * Concept Index:: |
| @end menu |
| |
| @node Limitations of g++, Routines, Top, Top |
| @section Limitations of g++ |
| |
| @itemize @bullet |
| @item |
| Limitations on input source code: nesting is limited to about 240 levels |
| with the parser stack size (@code{YYSTACKSIZE}) set to 500 (the default), |
| and each nesting level requires around 16.4k of swap space. The parser |
| needs about 2.09 * number of nesting levels worth of stack space. |
| |
| @cindex pushdecl_class_level |
| @item |
| I suspect there are other uses of pushdecl_class_level that do not call |
| set_identifier_type_value in tandem with the call to |
| pushdecl_class_level. It would seem to be an omission. |
| |
| @cindex access checking |
| @item |
| Access checking is unimplemented for nested types. |
| |
| @cindex @code{volatile} |
| @item |
| @code{volatile} is not implemented in general. |
| |
| @end itemize |
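| |
| The figures in the first item can be checked with a little arithmetic. |
| The sketch below is not compiler code: the constant and function names |
| are made up for illustration, and it assumes that the 2.09 factor is |
| measured in the same units as @code{YYSTACKSIZE}. |

```cpp
#include <cassert>
#include <cmath>

// Figures quoted in the text above; the names are hypothetical.
constexpr double kStackPerLevel = 2.09;  // parser stack use per nesting level
constexpr double kSwapKPerLevel = 16.4;  // swap space (in k) per nesting level

// Approximate parser stack needed for `levels` nesting levels.
inline double stack_needed(int levels) { return kStackPerLevel * levels; }

// Deepest nesting that fits in a parser stack of size `yystacksize`.
inline int max_nesting(int yystacksize) {
  return static_cast<int>(std::floor(yystacksize / kStackPerLevel));
}

// Approximate swap space, in k, for `levels` nesting levels.
inline double swap_needed_k(int levels) { return kSwapKPerLevel * levels; }
```

| Under these assumptions a stack size of 500 allows 239 levels, which |
| agrees with the figure of roughly 240 quoted above, and nesting that |
| deep needs a little under 4000k of swap space. |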
| |
| @node Routines, Implementation Specifics, Limitations of g++, Top |
| @section Routines |
| |
| This section describes some of the routines used in the C++ front-end. |
| |
| @code{build_vtable} and @code{prepare_fresh_vtable} are used only within |
| the @file{cp-class.c} file, and only in @code{finish_struct} and |
| @code{modify_vtable_entries}. |
| |
| @code{build_vtable}, @code{prepare_fresh_vtable}, and |
| @code{finish_struct} are the only routines that set @code{DECL_VPARENT}. |
| |
| @code{finish_struct} can steal the virtual function table from parents; |
| this prohibits @code{related_vslot} from working. When |
| @code{finish_struct} steals, we know that |
| |
| @example |
| get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0) |
| @end example |
| |
| @noindent |
| will get the related binfo. |
| |
| @code{layout_basetypes} does something with the VIRTUALS. |
| |
| Supposedly (according to Tiemann) most of the breadth first searching |
| done, like in @code{get_base_distance} and in @code{get_binfo}, was not |
| because of any design decision. I have since found out that at least one |
| part of the compiler needs the notion of depth first binfo searching. I |
| am going to try to convert the whole thing; it should just work. The |
| term left-most refers to the depth first left-most node. It uses |
| @code{MAIN_VARIANT == type} as the condition to get left-most, because |
| the things that have @code{BINFO_OFFSET}s of zero are shared and will |
| have themselves as their own @code{MAIN_VARIANT}s. The non-shared right |
| ones are copies of the left-most one; hence if it is its own |
| @code{MAIN_VARIANT}, we know it IS a left-most one, and if it is not, it |
| is a non-left-most one. |
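| |
| The difference between the two search orders can be sketched on a toy |
| model. The @code{Binfo} struct and both functions below are hypothetical |
| stand-ins written for this illustration; they are not the compiler's |
| data structures, and the @code{MAIN_VARIANT} test described above is not |
| modelled. |

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a binfo: one node per base-class subobject.
struct Binfo {
  std::string type;                 // name of the class this node describes
  std::vector<const Binfo*> bases;  // direct bases, in declaration order
};

// Depth-first, left-to-right search: returns the left-most node for `type`.
inline const Binfo* find_depth_first(const Binfo* node,
                                     const std::string& type) {
  if (node->type == type) return node;
  for (const Binfo* base : node->bases)
    if (const Binfo* hit = find_depth_first(base, type)) return hit;
  return nullptr;
}

// Breadth-first search: returns the shallowest node for `type`.
inline const Binfo* find_breadth_first(const Binfo* root,
                                       const std::string& type) {
  std::deque<const Binfo*> queue{root};
  while (!queue.empty()) {
    const Binfo* node = queue.front();
    queue.pop_front();
    if (node->type == type) return node;
    for (const Binfo* base : node->bases) queue.push_back(base);
  }
  return nullptr;
}
```

| If @samp{D} has the bases @samp{B} and @samp{A}, in that order, and |
| @samp{B} itself has a base @samp{A}, the depth first search returns the |
| @samp{A} inside @samp{B} (the left-most one), while the breadth first |
| search returns the direct base @samp{A}. |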
| |
| @code{get_base_distance}'s path and distance matter in its use in: |
| |
| @itemize @bullet |
| @item |
| @code{prepare_fresh_vtable} (the code is probably wrong) |
| @item |
| @code{init_vfields} depends upon distance, probably in a safe way; |
| @code{build_offset_ref} might use partial paths to do further lookups; |
| @code{hack_identifier} is probably not properly checking access. |
| |
| @item |
| @code{get_first_matching_virtual} probably should check for |
| @code{get_base_distance} returning -2. |
| |
| @item |
| @code{resolve_offset_ref} should be called in a more deterministic |
| manner. Right now, it is called in some random contexts, like for |
| arguments at @code{build_method_call} time, @code{default_conversion} |
| time, @code{convert_arguments} time, @code{build_unary_op} time, |
| @code{build_c_cast} time, @code{build_modify_expr} time, |
| @code{convert_for_assignment} time, and |
| @code{convert_for_initialization} time. |
| |
| But, there are still more contexts it needs to be called in, one was the |
| ever simple: |
| |
| @example |
| if (obj.*pmi != 7) |
| @dots{} |
| @end example |
| |
| It seems that the problems were due to the fact that @code{TREE_TYPE} of |
| the @code{OFFSET_REF} was not an @code{OFFSET_TYPE}, but rather the type |
| of the referent (like @code{INTEGER_TYPE}). This problem was fixed by |
| changing @code{default_conversion} to check @code{TREE_CODE (x)}, |
| instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it |
| was @code{OFFSET_TYPE}. |
| |
| @end itemize |
| |
| @node Implementation Specifics, Glossary, Routines, Top |
| @section Implementation Specifics |
| |
| @itemize @bullet |
| @item Explicit Initialization |
| |
| The global list @code{current_member_init_list} contains the list of |
| mem-initializers specified in a constructor declaration. For example: |
| |
| @example |
| foo::foo() : a(1), b(2) @{@} |
| @end example |
| |
| @noindent |
| will initialize @samp{a} with 1 and @samp{b} with 2. |
| @code{expand_member_init} places each initialization (a with 1) on the |
| global list. Then, when the fndecl is being processed, |
| @code{emit_base_init} runs down the list, initializing them. It used to |
| be the case that g++ first ran down @code{current_member_init_list}, |
| then ran down the list of members initializing the ones that weren't |
| explicitly initialized. Things were rewritten to perform the |
| initializations in order of declaration in the class. So, for the above |
| example, @samp{a} and @samp{b} will be initialized in the order that |
| they were declared: |
| |
| @example |
| class foo @{ public: int b; int a; foo (); @}; |
| @end example |
| |
| @noindent |
| Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be |
| initialized with 1, regardless of how they're listed in the mem-initializer. |
| |
| @item The Explicit Keyword |
| |
| When @code{explicit} appears on a constructor, @code{grokdeclarator} |
| sets the field @code{DECL_NONCONVERTING_P}. That value is used by |
| @code{build_method_call} and @code{build_user_type_conversion_1} to decide |
| if a particular constructor should be used as a candidate for conversions. |
| |
| @end itemize |
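| |
| Both items above describe behavior that is visible from ordinary C++ |
| source. The following sketch shows it from the user's side; the helper |
| names (@code{Traced}, @code{init_log}, @code{Implicit}, @code{Explicit}) |
| are invented for the illustration and say nothing about how the |
| front-end implements either feature. |

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Records the order in which members are initialized.
inline std::string& init_log() {
  static std::string log;
  return log;
}

// Member type that appends its tag to the log when constructed.
struct Traced {
  Traced(char tag, int v) : value(v) { init_log() += tag; }
  int value;
};

// Mirrors the example above: b is declared before a, but the
// mem-initializer list names a first (compilers may warn about this).
struct foo {
  Traced b;
  Traced a;
  foo() : a('a', 1), b('b', 2) {}
};

struct Implicit { Implicit(int) {} };
struct Explicit { explicit Explicit(int) {} };

// A non-explicit constructor is a candidate for implicit conversions;
// an explicit one is not, although direct initialization still works.
static_assert(std::is_convertible<int, Implicit>::value, "implicit");
static_assert(!std::is_convertible<int, Explicit>::value, "explicit");
static_assert(std::is_constructible<Explicit, int>::value, "direct init");
```

| Constructing a @code{foo} appends @samp{b} and then @samp{a} to the |
| log, matching the declaration order rather than the order of the |
| mem-initializers. |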
| |
| @node Glossary, Macros, Implementation Specifics, Top |
| @section Glossary |
| |
| @table @r |
| @item binfo |
| The main data structure in the compiler used to represent the |
| inheritance relationships between classes. The data in the binfo can be |
| accessed by the BINFO_ accessor macros. |
| |
| @item vtable |
| @itemx virtual function table |
| |
| The virtual function table holds information used in virtual function |
| dispatching. In the compiler, they are usually referred to as vtables, |
| or vtbls. The first index is not used in the normal way; I believe it |
| is probably used for the virtual destructor. |
| |
| @item vfield |
| |
| vfields can be thought of as the base information needed to build |
| vtables. For every vtable that exists for a class, there is a vfield. |
| See also vtable and virtual function table pointer. When a type is used |
| as a base class to another type, the virtual function table for the |
| derived class can be based upon the vtable for the base class, just |
| extended to include the additional virtual methods declared in the |
| derived class. The virtual function table from a virtual base class is |
| never reused in a derived class. @code{is_normal} depends upon this. |
| |
| @item virtual function table pointer |
| |
| These are @code{FIELD_DECL}s that are pointer types that point to |
| vtables. See also vtable and vfield. |
| @end table |
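| |
| The way a derived class's vtable is based upon the base class's vtable |
| can be sketched with a hand-built table of function pointers. This is a |
| toy model with invented names (@code{Vtable}, @code{Object}, |
| @code{call}), not the layout g++ emits; slot 0 is left empty only to |
| echo the remark above that the first index is not used in the normal |
| way. |

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a vtable: a vector of function pointers.
using Fn = int (*)();
using Vtable = std::vector<Fn>;

inline int base_f() { return 1; }
inline int base_g() { return 2; }
inline int derived_g() { return 20; }  // overrides base_g
inline int derived_h() { return 30; }  // new virtual in the derived class

// Slot 0 is unused; the first real slot is 1.
inline Vtable make_base_vtable() { return {nullptr, base_f, base_g}; }

// The derived table starts as a copy of the base table, replaces the
// overridden slot, and appends a slot for the newly declared virtual.
inline Vtable make_derived_vtable() {
  Vtable vt = make_base_vtable();
  vt[2] = derived_g;
  vt.push_back(derived_h);
  return vt;
}

// An object reaches its table through a pointer, playing the role of
// the virtual function table pointer described above.
struct Object { const Vtable* vfield; };

// Dispatch: index the table the object points at and call the slot.
inline int call(const Object& obj, std::size_t slot) {
  return (*obj.vfield)[slot]();
}
```

| Slots inherited from the base keep their index in the derived table, |
| which is what lets code that only knows the base class call through a |
| derived object. |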
| |
| @node Macros, Typical Behavior, Glossary, Top |
| @section Macros |
| |
| This section describes some of the macros used on trees. The list |
| should be alphabetical. Eventually all macros should be documented |
| here. |
| |
| @table @code |
| @item BINFO_BASETYPES |
| A vector of additional binfos for the types inherited by this basetype. |
| The binfos are fully unshared (except for virtual bases, in which |
| case the binfo structure is shared). |
| |
| If this basetype describes type D as inherited in C, |
| and if the basetypes of D are E and F, |
| then this vector contains binfos for inheritance of E and F by C. |
| |
| Has values of: |
| |
| TREE_VECs |
| |
| |
| @item BINFO_INHERITANCE_CHAIN |
| Temporarily used to represent specific inheritances. It usually points |
| to the binfo associated with the lesser derived type, but it can be |
| reversed by reverse_path. For example: |
| |
| @example |
| Z ZbY least derived |
| | |
| Y YbX |
| | |
| X Xb most derived |
| |
| TYPE_BINFO (X) == Xb |
| BINFO_INHERITANCE_CHAIN (Xb) == YbX |
| BINFO_INHERITANCE_CHAIN (Yb) == ZbY |
| BINFO_INHERITANCE_CHAIN (Zb) == 0 |
| @end example |
| |
| Not sure if the above is really true; get_base_distance has it point |
| towards the most derived type, opposite from above. |
| |
| Set by build_vbase_path, recursive_bounded_basetype_p, |
| get_base_distance, lookup_field, lookup_fnfields, and reverse_path. |
| |
| What things can this be used on: |
| |
| TREE_VECs that are binfos |
| |
| |
| @item BINFO_OFFSET |
| The offset where this basetype appears in its containing type. |
| BINFO_OFFSET slot holds the offset (in bytes) from the base of the |
| complete object to the base of the part of the object that is allocated |
| on behalf of this `type'. This is always 0 except when there is |
| multiple inheritance. |
| |
| Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example. |
| |
| |
| @item BINFO_VIRTUALS |
| A unique list of functions for the virtual function table. See also |
| TYPE_BINFO_VIRTUALS. |
| |
| What things can this be used on: |
| |
| TREE_VECs that are binfos |
| |
| |
| @item BINFO_VTABLE |
| Used to find the VAR_DECL that is the virtual function table associated |
| with this binfo. See also TYPE_BINFO_VTABLE. To get the virtual |
| function table pointer, see CLASSTYPE_VFIELD. |
| |
| What things can this be used on: |
| |
| TREE_VECs that are binfos |
| |
| Has values of: |
| |
| VAR_DECLs that are virtual function tables |
| |
| |
| @item BLOCK_SUPERCONTEXT |
| In the outermost scope of each function, it points to the FUNCTION_DECL |
| node. It aids in better DWARF support of inline functions. |
| |
| |
| @item CLASSTYPE_TAGS |
| CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a |
| class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans |
| these and calls pushtag on them.) |
| |
| finish_struct scans these to produce TYPE_DECLs to add to the |
| TYPE_FIELDS of the type. |
| |
| It is expected that the name found in the TREE_PURPOSE slot is unique; |
| resolve_scope_to_name is one such place that depends upon this |
| uniqueness. |
| |
| |
| @item CLASSTYPE_METHOD_VEC |
| The following is true after finish_struct has been called (on the |
| class?) but not before. Before finish_struct is called, things are |
| different to some extent. Contains a TREE_VEC of methods of the class. |
| The TREE_VEC_LENGTH is the number of differently named methods plus one |
| for the 0th entry. The 0th entry is always allocated, and reserved for |
| ctors and dtors. If there are none, TREE_VEC_ELT(N,0) == NULL_TREE. |
| Each entry of the TREE_VEC is a FUNCTION_DECL. For each FUNCTION_DECL, |
| there is a DECL_CHAIN slot. If the FUNCTION_DECL is the last one with a |
| given name, the DECL_CHAIN slot is NULL_TREE. Otherwise it is the next |
| method that has the same name (but a different signature). Because the |
| DECL_CHAIN slot is used in this way, it might seem that we cannot call |
| pushdecl to put the method in the global scope (because that would |
| overwrite the TREE_CHAIN slot); that does not seem to be true, because |
| they use different _CHAINs. finish_struct_methods sets up one version of |
| the TREE_CHAIN slots on the FUNCTION_DECLs. |
| |
| friends are kept in TREE_LISTs, so that there's no need to use their |
| TREE_CHAIN slot for anything. |
| |
| Has values of: |
| |
| TREE_VECs |
| |
| |
| @item CLASSTYPE_VFIELD |
| Seems to be in the process of being renamed TYPE_VFIELD. Use on types |
| to get the main virtual function table pointer. To get the virtual |
| function table use BINFO_VTABLE (TYPE_BINFO ()). |
| |
| Has values of: |
| |
| FIELD_DECLs that are virtual function table pointers |
| |
| What things can this be used on: |
| |
| RECORD_TYPEs |
| |
| |
| @item DECL_CLASS_CONTEXT |
| Identifies the context that the _DECL was found in. For virtual function |
| tables, it points to the type associated with the virtual function |
| table. See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT. |
| |
| The difference between this and DECL_CONTEXT shows up for virtual |
| functions like: |
| |
| @example |
| struct A |
| @{ |
| virtual int f (); |
| @}; |
| |
| struct B : A |
| @{ |
| int f (); |
| @}; |
| |
| DECL_CONTEXT (A::f) == A |
| DECL_CLASS_CONTEXT (A::f) == A |
| |
| DECL_CONTEXT (B::f) == A |
| DECL_CLASS_CONTEXT (B::f) == B |
| @end example |
| |
| Has values of: |
| |
| RECORD_TYPEs, or UNION_TYPEs |
| |
| What things can this be used on: |
| |
| TYPE_DECLs, _DECLs |
| |
| |
| @item DECL_CONTEXT |
| Identifies the context that the _DECL was found in. Can be used on |
| virtual function tables to find the type associated with the virtual |
| function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a |
| better access method. Internally the same as DECL_FIELD_CONTEXT, so |
| don't use both. See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and |
| DECL_CLASS_CONTEXT. |
| |
| Has values of: |
| |
| RECORD_TYPEs |
| |
| |
| What things can this be used on: |
| |
| @display |
| VAR_DECLs that are virtual function tables |
| _DECLs |
| @end display |
| |
| |
| @item DECL_FIELD_CONTEXT |
| Identifies the context that the FIELD_DECL was found in. Internally the |
| same as DECL_CONTEXT, so don't use both. See also DECL_CONTEXT, |
| DECL_FCONTEXT and DECL_CLASS_CONTEXT. |
| |
| Has values of: |
| |
| RECORD_TYPEs |
| |
| What things can this be used on: |
| |
| @display |
| FIELD_DECLs that are virtual function pointers |
| FIELD_DECLs |
| @end display |
| |
| |
| @item DECL_NAME |
| |
| Has values of: |
| |
| @display |
| 0 for things that don't have names |
| IDENTIFIER_NODEs for TYPE_DECLs |
| @end display |
| |
| @item DECL_IGNORED_P |
| A bit that can be set to inform the debug information output routines in |
| the back-end that a certain _DECL node should be totally ignored. |
| |
| Used in cases where it is known that the debugging information will be |
| output in another file, or where a sub-type is known not to be needed |
| because the enclosing type is not needed. |
| |
| A compiler constructed virtual destructor in derived classes that do not |
| define an explicit destructor that was defined explicit in a base class |
| has this bit set as well. Also used on __FUNCTION__ and |
| __PRETTY_FUNCTION__ to mark they are ``compiler generated.'' c-decl and |
| c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,'' |
| and ``user-invisible variable.'' |
| |
| Functions built by the C++ front-end such as default destructors, |
| virtual destructors and default constructors want to be marked that |
| they are compiler generated, but unsure why. |
| |
| Currently, it is used in an absolute way in the C++ front-end, as an |
| optimization, to tell the debug information output routines to not |
| generate debugging information that will be output by another separately |
| compiled file. |
| |
| |
| @item DECL_VIRTUAL_P |
| A flag used on FIELD_DECLs and VAR_DECLs. (Documentation in tree.h is |
| wrong.) Used in VAR_DECLs to indicate that the variable is a vtable. |
| It is also used in FIELD_DECLs for vtable pointers. |
| |
| What things can this be used on: |
| |
| FIELD_DECLs and VAR_DECLs |
| |
| |
| @item DECL_VPARENT |
| Used to point to the parent type of the vtable if there is one, else it |
| is just the type associated with the vtable. Because of the sharing of |
| virtual function tables that goes on, this slot is not very useful, and |
| is in fact, not used in the compiler at all. It can be removed. |
| |
| What things can this be used on: |
| |
| VAR_DECLs that are virtual function tables |
| |
| Has values of: |
| |
| RECORD_TYPEs maybe UNION_TYPEs |
| |
| |
| @item DECL_FCONTEXT |
| Used to find the first baseclass in which this FIELD_DECL is defined. |
| See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT. |
| |
| How it is used: |
| |
| Used when writing out debugging information about vfield and |
| vbase decls. |
| |
| What things can this be used on: |
| |
| FIELD_DECLs that are virtual function pointers |
| FIELD_DECLs |
| |
| |
| @item DECL_REFERENCE_SLOT |
| Used to hold the initializer for the reference. |
| |
| What things can this be used on: |
| |
| PARM_DECLs and VAR_DECLs that have a reference type |
| |
| |
| @item DECL_VINDEX |
| Used for FUNCTION_DECLs in two different ways. Before the structure |
| containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a |
| FUNCTION_DECL in a base class which is the FUNCTION_DECL which this |
| FUNCTION_DECL will replace as a virtual function. When the class is |
| laid out, this pointer is changed to an INTEGER_CST node which is |
| suitable to find an index into the virtual function table. See |
| get_vtable_entry as to how one can find the right index into the virtual |
| function table. The first index, 0, of a virtual function table is not |
| used in the normal way, so the first real index is 1. |
| |
| DECL_VINDEX may be a TREE_LIST, that would seem to be a list of |
| overridden FUNCTION_DECLs. add_virtual_function has code to deal with |
| this when it uses the variable base_fndecl_list, but it would seem that |
| somehow, it is possible for the TREE_LIST to persist until method_call, |
| and it should not. |
| |
| |
| What things can this be used on: |
| |
| FUNCTION_DECLs |
| |
| |
| @item DECL_SOURCE_FILE |
| Identifies what source file a particular declaration was found in. |
| |
| Has values of: |
| |
| "<built-in>" on TYPE_DECLs to mean the typedef is built in |
| |
| |
| @item DECL_SOURCE_LINE |
| Identifies what source line number in the source file the declaration |
| was found at. |
| |
| Has values of: |
| |
| @display |
| 0 for an undefined label |
| |
| 0 for TYPE_DECLs that are internally generated |
| |
| 0 for FUNCTION_DECLs for functions generated by the compiler |
| (not yet, but should be) |
| |
| 0 for ``magic'' arguments to functions, that the user has no |
| control over |
| @end display |
| |
| |
| @item TREE_USED |
| |
| Has values of: |
| |
| 0 for unused labels |
| |
| |
| @item TREE_ADDRESSABLE |
| A flag that is set for any type that has a constructor. |
| |
| |
| @item TREE_COMPLEXITY |
| They seem to be a kludgy way to track recursion, popping, and pushing. |
| They only appear in cp-decl.c and cp-decl2.c, so they are a good |
| candidate for proper fixing, and removal. |
| |
| |
| @item TREE_HAS_CONSTRUCTOR |
| A flag to indicate when a CALL_EXPR represents a call to a constructor. |
| If set, we know that the type of the object is the complete type of the |
| object, and that the value returned is nonnull. When used in this |
| fashion, it is an optimization. Can also be used on SAVE_EXPRs to |
| indicate when they are of fixed type and nonnull. Can also be used on |
| INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor. |
| |
| |
| @item TREE_PRIVATE |
| Set for FIELD_DECLs by finish_struct. But not uniformly set. |
| |
| The following routines do something with PRIVATE access: |
| build_method_call, alter_access, finish_struct_methods, |
| finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType, |
| CWriteUseObject, compute_access, lookup_field, dfs_pushdecl, |
| GNU_xref_member, dbxout_type_fields, dbxout_type_method_1 |
| |
| |
| @item TREE_PROTECTED |
| The following routines do something with PROTECTED access: |
| build_method_call, alter_access, finish_struct, convert_to_aggr, |
| CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject, |
| compute_access, lookup_field, GNU_xref_member, dbxout_type_fields, |
| dbxout_type_method_1 |
| |
| |
| @item TYPE_BINFO |
| Used to get the binfo for the type. |
| |
| Has values of: |
| |
| TREE_VECs that are binfos |
| |
| What things can this be used on: |
| |
| RECORD_TYPEs |
| |
| |
| @item TYPE_BINFO_BASETYPES |
| See also BINFO_BASETYPES. |
| |
| @item TYPE_BINFO_VIRTUALS |
| A unique list of functions for the virtual function table. See also |
| BINFO_VIRTUALS. |
| |
| What things can this be used on: |
| |
| RECORD_TYPEs |
| |
| |
| @item TYPE_BINFO_VTABLE |
| Points to the virtual function table associated with the given type. |
| See also BINFO_VTABLE. |
| |
| What things can this be used on: |
| |
| RECORD_TYPEs |
| |
| Has values of: |
| |
| VAR_DECLs that are virtual function tables |
| |
| |
| @item TYPE_NAME |
| Names the type. |
| |
| Has values of: |
| |
| @display |
| 0 for things that don't have names. |
| should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and |
| ENUM_TYPEs. |
| TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but |
| shouldn't be. |
| TYPE_DECL for typedefs, unsure why. |
| @end display |
| |
| What things can one use this on: |
| |
| @display |
| TYPE_DECLs |
| RECORD_TYPEs |
| UNION_TYPEs |
| ENUM_TYPEs |
| @end display |
| |
| History: |
| |
| It currently points to the TYPE_DECL for RECORD_TYPEs, |
| UNION_TYPEs and ENUM_TYPEs, but it should be history soon. |
| |
| |
| @item TYPE_METHODS |
| Synonym for @code{CLASSTYPE_METHOD_VEC}. Chained together with |
| @code{TREE_CHAIN}. @file{dbxout.c} uses this to get at the methods of a |
| class. |
| |
| |
| @item TYPE_DECL |
| Used to represent typedefs, and used to represent binding layers. |
| |
| Components: |
| |
| DECL_NAME is the name of the typedef. For example, foo would |
| be found in the DECL_NAME slot when @code{typedef int foo;} is |
| seen. |
| |
| DECL_SOURCE_LINE identifies what source line number in the |
| source file the declaration was found at. A value of 0 |
| indicates that this TYPE_DECL is just an internal binding layer |
| marker, and does not correspond to a user supplied typedef. |
| |
| DECL_SOURCE_FILE |
| |
| @item TYPE_FIELDS |
| A linked list (via @code{TREE_CHAIN}) of member types of a class. The |
| list can contain @code{TYPE_DECL}s, but there can also be other things |
| in the list apparently. See also @code{CLASSTYPE_TAGS}. |
| |
| |
| @item TYPE_VIRTUAL_P |
| A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is |
| a virtual function table or a pointer to one. When used on a |
| @code{FUNCTION_DECL}, indicates that it is a virtual function. When |
| used on an @code{IDENTIFIER_NODE}, indicates that a function with this |
| same name exists and has been declared virtual. |
| |
| When used on types, it indicates that the type has virtual functions, or |
| is derived from one that does. |
| |
| Not sure if the above about virtual function tables is still true. See |
| also info on @code{DECL_VIRTUAL_P}. |
| |
| What things can this be used on: |
| |
| FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs |
| |
| |
| @item VF_BASETYPE_VALUE |
| Get the associated type from the binfo that caused the given vfield to |
| exist. This is the least derived class (the most parent class) that |
| needed a virtual function table. It is probably the case that all uses |
| of this field are misguided, but they need to be examined on a |
| case-by-case basis. See history for more information on why the |
| previous statement was made. |
| |
| Set at @code{finish_base_struct} time. |
| |
| What things can this be used on: |
| |
| TREE_LISTs that are vfields |
| |
| History: |
| |
| This field was used to determine if a virtual function table's |
| slot should be filled in with a certain virtual function, by |
| checking to see if the type returned by VF_BASETYPE_VALUE was a |
| parent of the context in which the old virtual function existed. |
| This incorrectly assumes that a given type _could_ not appear as |
| a parent twice in a given inheritance lattice. For single |
| inheritance, this would in fact work, because a type could not |
| possibly appear more than once in an inheritance lattice, but |
| with multiple inheritance, a type can appear more than once. |
| |
| |
| @item VF_BINFO_VALUE |
| Identifies the binfo that caused this vfield to exist. If this vfield |
| is from the first direct base class that has a virtual function table, |
| then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the |
| direct base where the vfield came from. Can use @code{TREE_VIA_VIRTUAL} |
| on result to find out if it is a virtual base class. Related to the |
| binfo found by |
| |
| @example |
| get_binfo (VF_BASETYPE_VALUE (vfield), t, 0) |
| @end example |
| |
| @noindent |
| where @samp{t} is the type that has the given vfield. |
| |
| @example |
| get_binfo (VF_BASETYPE_VALUE (vfield), t, 0) |
| @end example |
| |
| @noindent |
| will return the binfo for the given vfield. |
| |
| May or may not be set at @code{modify_vtable_entries} time. Set at |
| @code{finish_base_struct} time. |
| |
| What things can this be used on: |
| |
| TREE_LISTs that are vfields |
| |
| |
| @item VF_DERIVED_VALUE |
| Identifies the type of the most derived class of the vfield, excluding |
| the class this vfield is for. |
| |
| Set at @code{finish_base_struct} time. |
| |
| What things can this be used on: |
| |
| TREE_LISTs that are vfields |
| |
| |
| @item VF_NORMAL_VALUE |
| Identifies the type of the most derived class of the vfield, including |
| the class this vfield is for. |
| |
| Set at @code{finish_base_struct} time. |
| |
| What things can this be used on: |
| |
| TREE_LISTs that are vfields |
| |
| |
| @item WRITABLE_VTABLES |
| This is an option that can be defined when building the compiler, that |
| will cause the compiler to output vtables into the data segment so that |
| the vtables may be written. This is undefined by default, because |
| normally the vtables should be unwritable. People that implement object |
| I/O facilities, or people that want to change the dynamic type of |
| objects, may want to have the vtables writable. Another way of achieving |
| this would be to make a copy of the vtable into writable memory, but the |
| drawback there is that that method only changes the type for one object. |
| |
| @end table |
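| |
| The history note under @code{VF_BASETYPE_VALUE} turns on the fact that, |
| with multiple inheritance, one type can appear more than once in an |
| inheritance lattice. A small C++ example makes that concrete; the class |
| and function names are invented for the illustration. |

```cpp
#include <cassert>

struct A { int a = 0; virtual ~A() = default; };
struct B : A { int b = 0; };
struct C : A { int c = 0; };
struct D : B, C { int d = 0; };  // A appears twice in D's lattice

// Address of the A subobject reached through B.
inline const A* a_via_b(const D& d) { return static_cast<const B*>(&d); }

// Address of the A subobject reached through C.
inline const A* a_via_c(const D& d) { return static_cast<const C*>(&d); }
```

| The two functions return different addresses for the same @code{D} |
| object, so asking whether @samp{A} is a parent of @samp{D} does not |
| identify a single subobject. This is also the situation in which a |
| base sits at a nonzero offset within the complete object, as the |
| @code{BINFO_OFFSET} entry describes. |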
| |
| @node Typical Behavior, Coding Conventions, Macros, Top |
| @section Typical Behavior |
| |
| @cindex parse errors |
| |
| Whenever seemingly normal code fails with errors like |
| @code{syntax error at `\@{'}, it's highly likely that grokdeclarator is |
| returning a NULL_TREE for whatever reason. |
| |
| @node Coding Conventions, Templates, Typical Behavior, Top |
| @section Coding Conventions |
| |
| It should never be the case that trees are modified in-place by the |
| back-end, @emph{unless} it is guaranteed that the semantics are the same |
| no matter how shared the tree structure is. @file{fold-const.c} still |
| has some cases where this is not true, but rms hypothesizes that this |
| will never be a problem. |
| |
| @node Templates, Access Control, Coding Conventions, Top |
| @section Templates |
| |
| A template is represented by a @code{TEMPLATE_DECL}. The specific |
| fields used are: |
| |
| @table @code |
| @item DECL_TEMPLATE_RESULT |
| The generic decl on which instantiations are based. This looks just |
| like any other decl. |
| |
| @item DECL_TEMPLATE_PARMS |
| The parameters to this template. |
| @end table |
| |
| The generic decl is parsed as much like any other decl as possible, |
| given the parameterization. The template decl is not built up until the |
| generic decl has been completed. For template classes, a template decl |
| is generated for each member function and static data member, as well. |
| |
| Template members of template classes are represented by a TEMPLATE_DECL |
| for the class' parameters around another TEMPLATE_DECL for the member's |
| parameters. |
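| |
| At the source level, the nesting of one @code{TEMPLATE_DECL} around |
| another corresponds to a member template of a class template, whose |
| out-of-class definition carries two template headers. The example below |
| only illustrates the source construct (with the invented names |
| @code{Box} and @code{convert}); it does not show the tree |
| representation itself. |

```cpp
#include <cassert>

// Class template with a member template: two levels of parameters.
template <class T>
struct Box {
  T value;

  template <class U>
  U convert() const;  // member template of a class template
};

// Out-of-class definition: the outer header carries the class's
// parameters, the inner header carries the member's parameters.
template <class T>
template <class U>
U Box<T>::convert() const {
  return static_cast<U>(value);
}
```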
| |
| All declarations that are instantiations or specializations of templates |
| refer to their template and parameters through DECL_TEMPLATE_INFO. |
| |
| How should I handle parsing member functions with the proper param |
| decls? Set them up again or try to use the same ones? Currently we do |
| the former. We can probably do this without any extra machinery in |
| store_pending_inline, by deducing the parameters from the decl in |
| do_pending_inlines. PRE_PARSED_TEMPLATE_DECL? |
| |
| If a base is a parm, we can't check anything about it. If a base is not |
| a parm, we need to check it for name binding. Do finish_base_struct if |
| no bases are parameterized (only if none, including indirect, are |
| parms). Nah, don't bother trying to do any of this until instantiation |
| -- we only need to do name binding in advance. |
| |
| Always set up method vec and fields, inc. synthesized methods. Really? |
| We can't know the types of the copy folks, or whether we need a |
| destructor, or can have a default ctor, until we know our bases and |
| fields. Otherwise, we can assume and fix ourselves later. Hopefully. |
| |
| @node Access Control, Error Reporting, Templates, Top |
| @section Access Control |
| The function compute_access returns one of three values: |
| |
| @table @code |
| @item access_public |
| means that the field can be accessed by the current lexical scope. |
| |
| @item access_protected |
| means that the field cannot be accessed by the current lexical scope |
| because it is protected. |
| |
| @item access_private |
| means that the field cannot be accessed by the current lexical scope |
| because it is private. |
| @end table |
| |
| DECL_ACCESS is used for access declarations; alter_access creates a list |
| of types and accesses for a given decl. |
| |
| Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return |
| codes of compute_access and were used as a cache for compute_access. |
| Now they are not used at all. |
| |
| TREE_PROTECTED and TREE_PRIVATE are used to record the access levels |
| granted by the containing class. BEWARE: TREE_PUBLIC means something |
| completely unrelated to access control! |
| |
| @node Error Reporting, Parser, Access Control, Top |
| @section Error Reporting |
| |
| The C++ front-end uses a call-back mechanism to allow functions to print |
| out reasonable strings for types and functions without putting extra |
| logic in the functions where errors are found. The interface is through |
| the @code{cp_error} function (or @code{cp_warning}, etc.). The |
| syntax is exactly like that of @code{error}, except that a few more |
| conversions are supported: |
| |
| @itemize @bullet |
| @item |
| %C indicates a value of `enum tree_code'. |
| @item |
| %D indicates a *_DECL node. |
| @item |
| %E indicates a *_EXPR node. |
| @item |
| %L indicates a value of `enum languages'. |
| @item |
| %P indicates the name of a parameter (i.e. "this", "1", "2", ...) |
| @item |
| %T indicates a *_TYPE node. |
| @item |
| %O indicates the name of an operator (MODIFY_EXPR -> "operator ="). |
| |
| @end itemize |
| |
| There is some overlap between these; for instance, any of the node |
| options can be used for printing an identifier (though only @code{%D} |
| tries to decipher function names). |
| |
| For a more verbose message (@code{class foo} as opposed to just @code{foo}, |
| including the return type for functions), use @code{%#c}, where @code{c} |
| is one of the conversion letters above. |
| To have the line number on the error message indicate the line of the |
| DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want, |
| use @code{%+D}, or it will default to the first. |
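| |
| As a rough illustration of the call-back idea, the conversions can be |
| pictured as a table that maps each conversion letter to a printing |
| routine. The sketch below is not the front-end's code: @code{Node}, |
| @code{lookup_printer} and @code{format_message} are hypothetical names, |
| and only @code{%D}, @code{%T} and the @code{#} flag are modelled. |
| |
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tree node; the real front end uses `tree'.
struct Node {
  std::string kind;  // e.g. "decl" or "type"
  std::string name;  // e.g. "foo"
};

// One printing call-back per conversion letter.
typedef std::string (*Printer) (const Node &, bool verbose);

static std::string print_decl (const Node &n, bool) { return n.name; }

static std::string print_type (const Node &n, bool verbose)
{
  // The verbose form adds the class key, as in `class foo'.
  return verbose ? "class " + n.name : n.name;
}

// Look up the call-back for a conversion letter; returns 0 if unknown.
static Printer lookup_printer (char c)
{
  switch (c)
    {
    case 'D': return print_decl;
    case 'T': return print_type;
    default:  return 0;
    }
}

// Expand %D, %T, %#D and %#T in FMT, consuming ARGS in order.
std::string format_message (const std::string &fmt,
                            const std::vector<Node> &args)
{
  std::string out;
  std::size_t next = 0;
  for (std::size_t i = 0; i < fmt.size (); ++i)
    {
      if (fmt[i] != '%' || i + 1 >= fmt.size ())
        {
          out += fmt[i];
          continue;
        }
      ++i;
      bool verbose = false;
      if (fmt[i] == '#' && i + 1 < fmt.size ())
        {
          verbose = true;
          ++i;
        }
      Printer p = lookup_printer (fmt[i]);
      if (p && next < args.size ())
        out += p (args[next++], verbose);
      else
        out += fmt[i];  // unknown conversion: copy the letter through
    }
  return out;
}
```
| |
| The point of the design is that the code reporting an error only names |
| the kind of node it has; how that node is rendered lives in one place. |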
| |
| @node Parser, Exception Handling, Error Reporting, Top |
| @section Parser |
| |
| Some comments on the parser: |
| |
| The @code{after_type_declarator} / @code{notype_declarator} hack is |
| necessary in order to allow redeclarations of @code{TYPENAME}s, for |
| instance |
| |
| @example |
| typedef int foo; |
| class A @{ |
| char *foo; |
| @}; |
| @end example |
| |
| In the above, the first @code{foo} is parsed as a @code{notype_declarator}, |
| and the second as an @code{after_type_declarator}. |
| |
| Ambiguities: |
| |
| There are currently four reduce/reduce ambiguities in the parser. They are: |
| |
| 1) Between @code{template_parm} and |
| @code{named_class_head_sans_basetype}, for the tokens @code{aggr |
| identifier}. This situation occurs in code looking like |
| |
| @example |
| template <class T> class A @{ @}; |
| @end example |
| |
| It is ambiguous whether @code{class T} should be parsed as the |
| declaration of a template type parameter named @code{T} or an unnamed |
| constant parameter of type @code{class T}. Section 14.6, paragraph 3 of |
| the January '94 working paper states that the first interpretation is |
| the correct one. This ambiguity results in two reduce/reduce conflicts. |
| |
| 2) Between @code{primary} and @code{type_id} for code like @samp{int()} |
| in places where both can be accepted, such as the argument to |
| @code{sizeof}. Section 8.1 of the pre-San Diego working paper specifies |
| that these ambiguous constructs will be interpreted as @code{typename}s. |
| This ambiguity results in six reduce/reduce conflicts between |
| @samp{absdcl} and @samp{functional_cast}. |
| |
| 3) Between @code{functional_cast} and |
| @code{complex_direct_notype_declarator}, for various token strings. |
| This situation occurs in code looking like |
| |
| @example |
| int (*a); |
| @end example |
| |
| This code is ambiguous; it could be a declaration of the variable |
| @samp{a} as a pointer to @samp{int}, or it could be a functional cast of |
| @samp{*a} to @samp{int}. Section 6.8 specifies that the former |
| interpretation is correct. This ambiguity results in 7 reduce/reduce |
| conflicts. Another aspect of this ambiguity is code like 'int (x[2]);', |
| which is resolved at the '[' and accounts for 6 reduce/reduce conflicts |
| between @samp{direct_notype_declarator} and |
| @samp{primary}/@samp{overqualified_id}. Finally, there are 4 r/r |
| conflicts between @samp{expr_or_declarator} and @samp{primary} over code |
| like 'int (a);', which could probably be resolved but would also |
| probably be more trouble than it's worth. In all, this situation |
| accounts for 17 conflicts. Ack! |
| |
| The second case above is responsible for the failure to parse 'LinppFile |
| ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave |
| Math.h++) as an object declaration, and must be fixed so that it does |
| not resolve until later. |
| |
| 4) Indirectly between @code{after_type_declarator} and @code{parm}, for |
| type names. This occurs in (as one example) code like |
| |
| @example |
| typedef int foo, bar; |
| class A @{ |
| foo (bar); |
| @}; |
| @end example |
| |
| What is @code{bar} inside the class definition? We currently interpret |
| it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an |
| @code{after_type_declarator}. I believe that xlC is correct, in light |
| of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that |
| could possibly be a type name is taken as the @i{decl-specifier-seq} of |
| a @i{declaration}." However, it seems clear that this rule must be |
| violated in the case of constructors. This ambiguity accounts for 8 |
| conflicts. |
| |
| Unlike the others, this ambiguity is not recognized by the Working Paper. |
| |
| @node Exception Handling, Free Store, Parser, Top |
| @section Exception Handling |
| |
| Note, exception handling in g++ is still under development. |
| |
| This section describes the mapping of C++ exceptions in the C++ |
| front-end, into the back-end exception handling framework. |
| |
| The basic mechanism of exception handling in the back-end is |
| unwind-protect a la elisp. This is a general, robust, and language |
| independent representation for exceptions. |
| |
| C++ exceptions are mapped onto the unwind-protect semantics by the C++ |
| front-end. The mapping is described below. |
| |
| When -frtti is used, rtti is used to do exception object type checking; |
| when it isn't used, the encoded name for the type of the object being |
| thrown is used instead. All code that originates exceptions, even code |
| that throws exceptions as a side effect, like dynamic casting, and all |
| code that catches exceptions must be compiled consistently with either |
| -frtti or -fno-rtti. It is not possible to mix rtti-based exception |
| handling objects with code that doesn't use rtti. The exceptions to |
| this are code that doesn't catch or throw exceptions, catch (...), and |
| code that just rethrows an exception. |
| |
| Currently we use the normal mangling used in building function names |
| (@code{int} is "i", @code{const char *} is "PCc") to build the non-rtti |
| base type descriptors for exception handling. These descriptors are |
| just plain NUL-terminated strings, and internally they are passed |
| around as @code{char *}. |
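| |
| A minimal sketch of what matching on such descriptors amounts to, |
| assuming exact matching only. The name @code{descriptors_match} is |
| hypothetical and is not the routine the front-end actually calls. |
| |
```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: compare two non-rtti type descriptors.
// Each descriptor is the NUL-terminated mangled encoding of a type,
// e.g. "i" for int or "PCc" for const char *.
bool descriptors_match (const char *thrown, const char *caught)
{
  // Without rtti there is no class hierarchy to consult, so in this
  // sketch only an exact match of the encodings counts.
  return std::strcmp (thrown, caught) == 0;
}
```
| |
| Because the comparison is purely textual, a handler for a base class |
| would not match a thrown derived object under this scheme. |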
| |
| In C++, all cleanups should be protected by exception regions. The |
| region starts just after the reason why the cleanup is created has |
| ended. For example, with an automatic variable that has a constructor, |
| it would be right after the constructor is run. The region ends just |
| before the finalization is expanded. The backend may expand the |
| cleanup multiple times along different paths: once for normal end of the |
| region, once for non-local gotos, once for returns, etc. It must |
| therefore take special care to protect the finalization expansion if the |
| expansion is for any reason other than normal region end and it is |
| `inline' (that is, inside the exception region). The backend can either |
| move such expansions out of line, or it can create an exception region |
| over the finalization to protect it. In the handler associated with |
| that region, it would not run the finalization as it otherwise would |
| have, but rather just rethrow to the outer handler, taking care to skip |
| the normal handler for the original region. |
| |
| In Ada, they will use the more runtime intensive approach of having |
| fewer regions, but at the cost of additional work at run time, to keep a |
| list of things that need cleanups. When a variable has finished |
| construction, they add the cleanup to the list; when they come to the end |
| of the lifetime of the variable, they run the list down. If they take a |
| hit before the section finishes normally, they examine the list for |
| actions to perform. I hope they add this logic into the back-end, as it |
| would be nice to get that alternative approach in C++. |
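| |
| The list-based approach can be sketched in C++ as follows. This is only |
| an illustration of the idea; @code{CleanupList}, @code{Logged} and |
| @code{log_cleanup} are hypothetical names and nothing here is taken |
| from an Ada runtime. |
| |
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical run-time cleanup list.
class CleanupList {
public:
  typedef void (*Action) (void *);

  // Register a cleanup once a variable has finished construction.
  void push (Action fn, void *arg)
  {
    Entry e = { fn, arg };
    entries_.push_back (e);
  }

  // Run pending cleanups, most recent first, until only DEPTH remain.
  // The same routine serves normal scope exit and an interrupted scope.
  void run_down_to (std::size_t depth)
  {
    while (entries_.size () > depth)
      {
        Entry e = entries_.back ();
        entries_.pop_back ();
        e.fn (e.arg);
      }
  }

  std::size_t size () const { return entries_.size (); }

private:
  struct Entry { Action fn; void *arg; };
  std::vector<Entry> entries_;
};

// Example action: append the variable's name to a log string.
struct Logged { std::string *log; const char *name; };

void log_cleanup (void *p)
{
  Logged *l = static_cast<Logged *> (p);
  *l->log += l->name;
}
```
| |
| The trade-off is the one described above: fewer regions to describe at |
| compile time, in exchange for a push and a pop per variable at run time. |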
| |
| On an rs6000, xlC stores exception objects on the stack, under the try |
| block. When it unwinds down into a handler, the frame pointer is |
| adjusted back to the normal value for the frame in which the handler |
| resides, and the stack pointer is left unchanged from the time at which |
| the object was thrown. This is so that there is always someplace for |
| the exception object, and nothing can overwrite it once we start |
| throwing. The only bad part is that the stack remains large. |
| |
| The following points out some things that work in g++'s exception |
| handling. |
| |
| All completely constructed temps and local variables are cleaned up in |
| all unwound scopes. Completely constructed parts of partially |
| constructed objects are cleaned up. This includes partially built |
| arrays. Exception specifications are now handled. Thrown objects are |
| now cleaned up all the time. We can now tell if we have an active |
| exception being thrown or not (@code{__eh_type != 0}). We use this to |
| call terminate if someone does a @code{throw;} without there being an |
| active exception object. @code{uncaught_exception ()} works. Exception |
| handling should work right if you optimize. Exception handling should |
| work with -fpic or -fPIC. |
| |
| The following points out some flaws in g++'s exception handling, as it |
| now stands. |
| |
| Only exact type matching or reference matching of throw types works when |
| -fno-rtti is used. Exception handling only works on SPARC (like Suns; |
| both -mflat and -mno-flat models work), SPARClite, Hitachi SH, i386, |
| arm, rs6000, PowerPC, Alpha, mips, VAX, m68k and z8k machines. SPARC v9 |
| may not work. HPPA is mostly done, but throwing between a shared |
| library and user code doesn't yet work. Some targets have support for |
| data-driven unwinding. Partial support is in for all other machines, |
| but a stack unwinder called __unwind_function has to be written and |
| added to libgcc2 for them. The new EH code doesn't rely upon the |
| __unwind_function for C++ code; instead it creates per-function |
| unwinders right inside the function. Unfortunately, on many platforms |
| the definition of RETURN_ADDR_RTX in the tm.h file for the machine port |
| is wrong. See below for details on __unwind_function. RTL_EXPRs for EH |
| cond variables for && and || exprs should probably be wrapped in |
| UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved. |
| |
| Only when -frtti is given do we do pointer conversions on exception |
| matching a la 15.3 p2 case 3: `A handler with type T, const T, T&, or |
| const T& is a match for a throw-expression with an object of type E if |
| [3] T is a pointer type and E is a pointer type that can be converted to |
| T by a standard pointer conversion (_conv.ptr_) not involving |
| conversions to pointers to private or protected base classes.' |
| |
| We don't call delete on new expressions that die because the ctor threw |
| an exception. See except/18 for a test case. |
| |
| 15.2 para 13: The exception being handled should be rethrown if control |
| reaches the end of a handler of the function-try-block of a constructor |
| or destructor; right now, it is not. |
| |
| 15.2 para 12: If a return statement appears in a handler of a |
| function-try-block of a constructor, the program is ill-formed, but this |
| isn't diagnosed. |
| |
| 15.2 para 11: If the handlers of a function-try-block contain a jump |
| into the body of a constructor or destructor, the program is ill-formed, |
| but this isn't diagnosed. |
| |
| 15.2 para 9: Check that the fully constructed base classes and members |
| of an object are destroyed before entering the handler of a |
| function-try-block of a constructor or destructor for that object. |
| |
| build_exception_variant should sort the incoming list, so that it |
| implements set compares, not exact list equality. Type smashing should |
| smash exception specifications using set union. |
| |
| Thrown objects are usually allocated on the heap, in the usual way. If |
| one runs out of heap space, throwing an object will probably never work. |
| This could be relaxed some by passing an __in_chrg parameter to track |
| who has control over the exception object. Thrown objects are not |
| allocated on the heap when they are pointer to object types. We should |
| extend it so that all small (<4*sizeof(void*)) objects are stored |
| directly, instead of allocated on the heap. |
| |
| When the backend returns a value, it can create new exception regions |
| that need protecting. The new region should rethrow the object in |
| context of the last associated cleanup that ran to completion. |
| |
| The structure of the code that is generated for C++ exception handling |
| is shown below: |
| |
| @example |
| Ln: throw value; |
| copy value onto heap |
| jump throw (Ln, id, address of copy of value on heap) |
| |
| try @{ |
| +Lstart: the start of the main EH region |
| |... ... |
| +Lend: the end of the main EH region |
| @} catch (T o) @{ |
| ...1 |
| @} |
| Lresume: |
| nop used to make sure there is something before |
| the next region ends, if there is one |
| ... ... |
| |
| jump Ldone |
| [ |
| Lmainhandler: handler for the region Lstart-Lend |
| cleanup |
| ] zero or more, depending upon automatic vars with dtors |
| +Lpartial: |
| | jump Lover |
| +Lhere: |
| rethrow (Lhere, same id, same obj); |
| Lterm: handler for the region Lpartial-Lhere |
| call terminate |
| Lover: |
| [ |
| [ |
| call throw_type_match |
| if (eq) @{ |
| ] these lines disappear when there is no catch condition |
| +Lsregion2: |
| | ...1 |
| | jump Lresume |
| |Lhandler: handler for the region Lsregion2-Leregion2 |
| | rethrow (Lresume, same id, same obj); |
| +Leregion2 |
| @} |
| ] there are zero or more of these sections, depending upon how many |
| catch clauses there are |
| ----------------------------- expand_end_all_catch -------------------------- |
| here we have fallen off the end of all catch |
| clauses, so we rethrow to outer |
| rethrow (Lresume, same id, same obj); |
| ----------------------------- expand_end_all_catch -------------------------- |
| [ |
| L1: maybe throw routine |
| ] depending upon if we have expanded it or not |
| Ldone: |
| ret |
| |
| start_all_catch emits labels: Lresume, |
| |
| @end example |
| |
| The __unwind_function takes a pointer to the throw handler, and is |
| expected to pop the stack frame that was built to call it, as well as |
| the frame underneath and then jump to the throw handler. It must |
| restore all registers to their proper values as well as all other |
| machine state as determined by the context into which we are |
| unwinding. The way I normally start is to compile: |
| |
| @example |
| void *g; |
| void foo (void *a) @{ g = a; @} |
| @end example |
| |
| @noindent |
| with -S, and change the thing that alters the PC (return, or ret |
| usually) to not alter the PC, making sure to leave all other semantics |
| (like adjusting the stack pointer, or frame pointers) in. After that, |
| replicate the prologue once more at the end, again changing the |
| PC-altering instructions, and finally, at the very end, jump to `g'. |
| |
| It takes about a week to write this routine. If someone wants to |
| volunteer to write this routine for any architecture, exception support |
| for that architecture will be added to g++. Please send in those code |
| donations. One other thing that needs to be done is to double check |
| that __builtin_return_address (0) works. |
| |
| @subsection Specific Targets |
| |
| For the alpha, the __unwind_function will be something resembling: |
| |
| @example |
| void |
| __unwind_function(void *ptr) |
| @{ |
| /* First frame */ |
| asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */ |
| asm ("bis $15, $15, $30"); /* reload sp with the fp we found */ |
| |
| /* Second frame */ |
| asm ("ldq $15, 8($30)"); /* fp */ |
| asm ("bis $15, $15, $30"); /* reload sp with the fp we found */ |
| |
| /* Return */ |
| asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */ |
| @} |
| @end example |
| |
| @noindent |
| However, there are a few problems preventing it from working. First of |
| all, the gcc-internal function @code{__builtin_return_address} needs to |
| work given an argument of 0 for the alpha. As it stands as of August |
| 30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c} |
| will definitely not work on the alpha. Instead, we need to define |
| the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe), |
| @code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new |
| definition for @code{RETURN_ADDR_RTX}. |
| |
| In addition (and more importantly), we need a way to reliably find the |
| frame pointer on the alpha. The use of the value 8 above to restore the |
| frame pointer (register 15) is incorrect. On many systems, the frame |
| pointer is consistently offset to a specific point on the stack. On the |
| alpha, however, the frame pointer is pushed last. First the return |
| address is stored, then any other registers are saved (e.g., @code{s0}), |
| and finally the frame pointer is put in place. So @code{fp} could have |
| an offset of 8, but if the calling function saved any registers at all, |
| they add to the offset. |
| |
| The only places the frame size is noted are with the @samp{.frame} |
| directive, for use by the debugger and the OSF exception handling model |
| (useless to us), and in the initial computation of the new value for |
| @code{sp}, the stack pointer. For example, the function may start with: |
| |
| @example |
| lda $30,-32($30) |
| .frame $15,32,$26,0 |
| @end example |
| |
| @noindent |
| The 32 above is exactly the value we need. With this, we can be sure |
| that the frame pointer is stored 8 bytes less---in this case, at 24(sp). |
| The drawback is that there is no way that I (Brendan) have found to let |
| us discover the size of a previous frame @emph{inside} the definition |
| of @code{__unwind_function}. |
| |
| So to accomplish exception handling support on the alpha, we need two |
| things: first, a way to figure out where the frame pointer was stored, |
| and second, a functional @code{__builtin_return_address} implementation |
| for except.c to be able to use it. |
| |
| Or just support DWARF 2 unwind info. |
| |
| @subsection New Backend Exception Support |
| |
| This subsection discusses various aspects of the design of the |
| data-driven model being implemented for the exception handling backend. |
| |
| The goal is to generate enough data during the compilation of user code, |
| such that we can dynamically unwind through functions at run time with a |
| single routine (@code{__throw}) that lives in libgcc.a, built by the |
| compiler, and dispatch into associated exception handlers. |
| |
| This information is generated by the DWARF 2 debugging backend, and |
| includes all of the information __throw needs to unwind an arbitrary |
| frame. It specifies where all of the saved registers and the return |
| address can be found at any point in the function. |
| |
| Major disadvantages when enabling exceptions are: |
| |
| @itemize @bullet |
| @item |
| Code that would otherwise use caller-saved registers cannot do so when |
| flow can be transferred into that code from an exception handler. In |
| high performance code this should not usually be true, so the effects |
| should be minimal. |
| |
| @end itemize |
| |
| @subsection Backend Exception Support |
| |
| The backend must be extended to fully support exceptions. Right now |
| there are a few hooks from the backend into the alpha exception |
| handling backend that resides in the C++ frontend; these hooks allow |
| exception handling to work in g++. An exception region is a segment of |
| generated code that has a handler associated with it. The exception |
| regions are represented in the generated code as address ranges, each |
| given by a starting PC value and an ending PC value. Some of the |
| limitations with this scheme are: |
| |
| @itemize @bullet |
| @item |
| The backend replicates insns for such things as loop unrolling and |
| function inlining. Right now, there are no hooks into the frontend's |
| exception handling backend to handle the replication of insns. When |
| replication happens, a new exception region descriptor needs to be |
| generated for the new region. |
| |
| @item |
| The backend expects to be able to rearrange code, for things like jump |
| optimization. Any rearranging of the code needs to have exception region |
| descriptors updated appropriately. |
| |
| @item |
| The backend can eliminate dead code. Any associated exception region |
| descriptor that refers to fully contained code that has been eliminated |
| should also be removed, although not doing this is harmless in terms of |
| semantics. |
| |
| @end itemize |
| |
| The above is not meant to be exhaustive, but does include all things I |
| have thought of so far. I am sure other limitations exist. |
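| |
| To make the address-range representation concrete, here is a minimal |
| sketch of a region table and a handler lookup. The layout and the names |
| @code{ExceptionRegion} and @code{find_handler} are assumptions made for |
| illustration, as is the choice to treat each range as half-open and to |
| prefer the narrowest enclosing range. |
| |
```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>

// Hypothetical descriptor: a handler covers the PC range [start, end).
struct ExceptionRegion {
  uintptr_t start;
  uintptr_t end;
  uintptr_t handler;
};

// Return the handler of the innermost region containing PC, or 0.
// "Innermost" is taken to mean the narrowest enclosing range.
uintptr_t find_handler (const ExceptionRegion *table, std::size_t n,
                        uintptr_t pc)
{
  const ExceptionRegion *best = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      const ExceptionRegion &r = table[i];
      if (pc < r.start || pc >= r.end)
        continue;  // PC is outside this region
      if (!best || r.end - r.start < best->end - best->start)
        best = &r;
    }
  return best ? best->handler : 0;
}
```
| |
| A table of this kind is only correct while the ranges still describe the |
| code, which is why the replication, rearrangement and deletion cases |
| listed above each need the descriptors to be kept in step. |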
| |
| Below are some notes on the migration of the exception handling code |
| backend from the C++ frontend to the backend. |
| |
| NOTEs are to be used to denote the start of an exception region, and the |
| end of the region. I presume that the interface used to generate these |
| notes in the backend would be two functions, start_exception_region and |
| end_exception_region (or something like that). The frontends are |
| required to call them in pairs. When marking the end of a region, an |
| argument can be passed to indicate the handler for the marked region. |
| This can be passed in many ways, currently a tree is used. Another |
| possibility would be insns for the handler, or a label that denotes a |
| handler. I have a feeling insns might be the best way to pass it. |
| Semantics are, if an exception is thrown inside the region, control is |
| transferred unconditionally to the handler. If control passes through |
| the handler, then the backend is to rethrow the exception, in the |
| context of the end of the original region. The handler is protected by |
| the conventional mechanisms; it is the frontend's responsibility to |
| protect the handler, if special semantics are required. |
| |
| This is a very low level view, and it would be nice if the backend |
| supported a somewhat higher level view in addition to this view. This |
| higher level could include source line number, name of the source file, |
| name of the language that threw the exception and possibly the name of |
| the exception. Kenner may want to rope you into doing more than just |
| the basics required by C++. You will have to resolve this. He may want |
| you to do support for non-local gotos, and to first scan for an |
| exception handler and, if none is found, allow the debugger to be |
| entered without any cleanups being done. To do this, the backend would |
| have to know the difference between a cleanup-rethrower and a real |
| handler; it would also have to have a way to know if a handler `matches' |
| a thrown exception, and this is frontend specific. |
| |
| The stack unwinder is one of the hardest parts to do. It is highly |
| machine dependent. The form that Kenner seems to like was a couple of |
| macros that would do the machine dependent grunt work. One preexisting |
| function that might be of some use is __builtin_return_address (). One |
| macro he seemed to want was __builtin_return_address, and the other |
| would do the hard work of fixing up the registers, adjusting the stack |
| pointer, frame pointer, arg pointer and so on. |
| |
| |
| @node Free Store, Mangling, Exception Handling, Top |
| @section Free Store |
| |
| @code{operator new []} adds a magic cookie to the beginning of arrays |
| for which the number of elements will be needed by @code{operator delete |
| []}. These are arrays of objects with destructors and arrays of objects |
| that define @code{operator delete []} with the optional size_t argument. |
| This cookie can be examined from a program as follows: |
| |
| @example |
| typedef unsigned long size_t; |
| extern "C" int printf (const char *, ...); |
| |
| size_t nelts (void *p) |
| @{ |
| struct cookie @{ |
| size_t nelts __attribute__ ((aligned (sizeof (double)))); |
| @}; |
| |
| cookie *cp = (cookie *)p; |
| --cp; |
| |
| return cp->nelts; |
| @} |
| |
| struct A @{ |
| ~A() @{ @} |
| @}; |
| |
| int main() |
| @{ |
|   A *ap = new A[3]; |
|   printf ("%lu\n", (unsigned long) nelts (ap)); |
|   delete [] ap; |
|   return 0; |
| @} |
| @end example |
| |
| @section Linkage |
| The linkage code in g++ is horribly twisted in order to meet two design goals: |
| |
| 1) Avoid unnecessary emission of inlines and vtables. |
| |
| 2) Support pedantic assemblers like the one in AIX. |
| |
| To meet the first goal, we defer emission of inlines and vtables until |
| the end of the translation unit, where we can decide whether or not they |
| are needed, and how to emit them if they are. |
| |
| @node Mangling, Concept Index, Free Store, Top |
| @section Function name mangling for C++ and Java |
| |
| Both C++ and Java provide overloaded functions and methods, |
| which are methods with the same name but different parameter lists. |
| Selecting the correct version is done at compile time. |
| Though the overloaded functions have the same name in the source code, |
| they need to be translated into different assembler-level names, |
| since typical assemblers and linkers cannot handle overloading. |
| This process of encoding the parameter types with the method name |
| into a unique name is called @dfn{name mangling}. The inverse |
| process is called @dfn{demangling}. |
| |
| It is convenient that C++ and Java use compatible mangling schemes, |
| since this makes life easier for tools such as gdb, and it eases |
| integration between C++ and Java. |
| |
| Note there is also a standard "Java Native Interface" (JNI) which |
| implements a different calling convention, and uses a different |
| mangling scheme. The JNI is a rather abstract ABI so Java can call methods |
| written in C or C++; |
| we are concerned here about a lower-level interface primarily |
| intended for methods written in Java, but that can also be used for C++ |
| (and less easily C). |
| |
| Note that on systems that follow BSD tradition, a C identifier @code{var} |
| would get "mangled" into the assembler name @samp{_var}. On such |
| systems, all other mangled names are also prefixed by a @samp{_} |
| which is not shown in the following examples. |
| |
| @subsection Method name mangling |
| |
| C++ mangles a method by emitting the function name, followed by @code{__}, |
| followed by encodings of any method qualifiers (such as @code{const}), |
| followed by the mangling of the method's class, |
| followed by the mangling of the parameters, in order. |
| |
| For example @code{Foo::bar(int, long) const} is mangled |
| as @samp{bar__C3Fooil}. |
| |
| For a constructor, the method name is left out. |
| That is, @code{Foo::Foo(int, long)} is mangled |
| as @samp{__3Fooil}. |
| |
| GNU Java does the same. |
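| |
| The rule can be sketched as a small function. This is an illustration |
| under simplifying assumptions, not the compiler's implementation: the |
| names @code{encode_class}, @code{encode_primitive} and |
| @code{mangle_method} are hypothetical, only @code{const} is handled as a |
| qualifier, class names are plain ASCII, and only a handful of primitive |
| parameter types are recognized. |
| |
```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Encode a simple ASCII class name as <length><name>, e.g. Foo -> 3Foo.
std::string encode_class (const std::string &name)
{
  std::ostringstream os;
  os << name.size () << name;
  return os.str ();
}

// Encode a few primitive parameter types; this hypothetical helper
// covers only the types needed for the illustration.
std::string encode_primitive (const std::string &type)
{
  if (type == "int") return "i";
  if (type == "long") return "l";
  if (type == "short") return "s";
  if (type == "char") return "c";
  return "?";  // not handled in this sketch
}

// Mangle a method: name, "__", qualifiers, class, then parameters.
// An empty METHOD stands for a constructor, whose name is left out.
std::string mangle_method (const std::string &method, bool is_const,
                           const std::string &klass,
                           const std::vector<std::string> &params)
{
  std::string out = method + "__";
  if (is_const)
    out += "C";
  out += encode_class (klass);
  for (std::size_t i = 0; i < params.size (); ++i)
    out += encode_primitive (params[i]);
  return out;
}
```
| |
| With these definitions, calling @code{mangle_method} with the method |
| name @code{bar}, the @code{const} flag set, class @code{Foo} and |
| parameters @code{int} and @code{long} yields @samp{bar__C3Fooil}. |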
| |
| @subsection Primitive types |
| |
| The C++ types @code{int}, @code{long}, @code{short}, @code{char}, |
| and @code{long long} are mangled as @samp{i}, @samp{l}, |
| @samp{s}, @samp{c}, and @samp{x}, respectively. |
| The corresponding unsigned types have @samp{U} prefixed |
| to the mangling. The type @code{signed char} is mangled @samp{Sc}. |
| |
| The C++ and Java floating-point types @code{float} and @code{double} |
| are mangled as @samp{f} and @samp{d} respectively. |
| |
| The C++ @code{bool} type and the Java @code{boolean} type are |
| mangled as @samp{b}. |
| |
| The C++ @code{wchar_t} and the Java @code{char} types are |
| mangled as @samp{w}. |
| |
| The Java integral types @code{byte}, @code{short}, @code{int} |
| and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i}, |
| and @samp{x}, respectively. |
| |
| C++ code that has included @code{javatypes.h} will mangle |
| the typedefs @code{jbyte}, @code{jshort}, @code{jint} |
| and @code{jlong} as @samp{c}, @samp{s}, @samp{i}, |
| and @samp{x}, respectively. (This has not been implemented yet.) |
| |
| @subsection Mangling of simple names |
| |
| A simple class, package, template, or namespace name is |
| encoded as the number of characters in the name, followed by |
| the actual characters. Thus the class @code{Foo} |
| is encoded as @samp{3Foo}. |
| |
| If any of the characters in the name are not alphanumeric |
| (i.e., not one of the standard ASCII letters, digits, or '_'), |
| or the initial character is a digit, then the name is |
| mangled as a sequence of encoded Unicode letters. |
| A Unicode encoding starts with a @samp{U} to indicate |
| that Unicode escapes are used, followed by the number of |
| bytes used by the Unicode encoding, followed by the bytes |
| representing the encoding. ASCII letters and |
| non-initial digits are encoded without change. However, all |
| other characters (including underscore and initial digits) are |
| translated into a sequence starting with an underscore, |
| followed by the big-endian 4-hex-digit lower-case encoding of the character. |
| |
| If a method name contains Unicode-escaped characters, the |
| entire mangled method name is followed by a @samp{U}. |
| |
| For example, the method @code{X\u0319::M\u002B(int)} is encoded as |
| @samp{M_002b__U6X_0319iU}. |
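| |
| The encoding of a simple name, with and without Unicode escapes, can be |
| sketched as below. The function names are hypothetical, the input is |
| assumed to be a list of 16-bit code points, and the trailing @samp{U} |
| that marks an escaped method name is deliberately left out. |
| |
```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

static bool is_ascii_letter (unsigned c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool is_ascii_digit (unsigned c)
{
  return c >= '0' && c <= '9';
}

// True if NAME (a list of code points) can be emitted unescaped:
// only ASCII letters, '_' and digits, with no digit in first position.
static bool is_plain (const std::vector<unsigned> &name)
{
  for (std::size_t i = 0; i < name.size (); ++i)
    {
      unsigned c = name[i];
      bool ok = is_ascii_letter (c) || c == '_'
                || (is_ascii_digit (c) && i > 0);
      if (!ok)
        return false;
    }
  return true;
}

// Escape NAME: letters and non-initial digits pass through; everything
// else becomes '_' plus four lower-case hex digits.
std::string escape_name (const std::vector<unsigned> &name)
{
  std::string out;
  for (std::size_t i = 0; i < name.size (); ++i)
    {
      unsigned c = name[i];
      if (is_ascii_letter (c) || (is_ascii_digit (c) && i > 0))
        out += static_cast<char> (c);
      else
        {
          char buf[8];
          std::snprintf (buf, sizeof buf, "_%04x", c & 0xffffu);
          out += buf;
        }
    }
  return out;
}

// Encode a simple class name: <length><name>, or U<length><escaped>.
std::string encode_simple_name (const std::vector<unsigned> &name)
{
  std::string body;
  std::string prefix;
  if (is_plain (name))
    {
      for (std::size_t i = 0; i < name.size (); ++i)
        body += static_cast<char> (name[i]);
    }
  else
    {
      body = escape_name (name);
      prefix = "U";
    }
  char len[16];
  std::snprintf (len, sizeof len, "%u",
                 static_cast<unsigned> (body.size ()));
  return prefix + len + body;
}
```
| |
| Note that the length written after @samp{U} counts the escaped text, so |
| the two code points of @code{X\u0319} give a length of 6. |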
| |
| |
| @subsection Pointer and reference types |
| |
| A C++ pointer type is mangled as @samp{P} followed by the |
| mangling of the type pointed to. |
| |
| A C++ reference type is mangled as @samp{R} followed by the |
| mangling of the type referenced. |
| |
| A Java object reference type is equivalent |
| to a C++ pointer parameter, so we mangle such a parameter type |
| as @samp{P} followed by the mangling of the class name. |
| |
| @subsection Squangled type compression |
| |
| Squangling (enabled with the @samp{-fsquangle} option) uses the |
| @samp{B} code to indicate reuse of a previously seen type within an |
| identifier. Types are recognized in a left to right manner and given |
| increasing values, which are appended to the code in the standard |
| manner. That is, multiple digit numbers are delimited by @samp{_} |
| characters. A type is considered to be any non-primitive type, |
| regardless of whether it's a parameter, template parameter, or entire |
| template. Certain codes are considered modifiers of a type, and are not |
| included as part of the type. These are the @samp{C}, @samp{V}, |
| @samp{P}, @samp{A}, @samp{R}, @samp{U} and @samp{u} codes, denoting |
| constant, volatile, pointer, array, reference, unsigned, and restrict. |
| These codes may precede a @samp{B} type in order to make the required |
| modifications to the type. |
| |
| For example: |
| @example |
| template <class T> class class1 @{ @}; |
| |
| template <class T> class class2 @{ @}; |
| |
| class class3 @{ @}; |
| |
| int f(class2<class1<class3> > a ,int b, const class1<class3>&c, class3 *d) @{ @} |
| |
| B0 -> class2<class1<class3> > |
| B1 -> class1<class3> |
| B2 -> class3 |
| @end example |
| @noindent |
| This produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}. |
| The @code{int} parameter is a basic type, and does not receive a B encoding. |
| |
| @subsection Qualified names |
| |
| Both C++ and Java allow a class to be lexically nested inside another |
| class. C++ also supports namespaces (not yet implemented by G++). |
| Java also supports packages. |
| |
| These are all mangled the same way: First the letter @samp{Q} |
| indicates that we are emitting a qualified name. |
| That is followed by the number of parts in the qualified name. |
| If that number is 9 or less, it is emitted with no delimiters. |
| Otherwise, an underscore is written before and after the count. |
| Then follows each part of the qualified name, as described above. |
| |
| For example @code{Foo::\u0319::Bar} is encoded as |
| @samp{Q33FooU5_03193Bar}. |
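| |
| The composition rule for qualified names can be sketched as follows. |
| The function @code{encode_qualified} is a hypothetical name, and it |
| assumes each part has already been encoded as a simple name. |
| |
```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Encode a qualified name from its parts, each already encoded as a
// simple name (e.g. "3Foo").  This sketch assumes callers only use it
// for names that really are qualified.
std::string encode_qualified (const std::vector<std::string> &parts)
{
  std::ostringstream os;
  os << 'Q';
  if (parts.size () <= 9)
    os << parts.size ();                // single digit, no delimiters
  else
    os << '_' << parts.size () << '_';  // larger counts are delimited
  for (std::size_t i = 0; i < parts.size (); ++i)
    os << parts[i];
  return os.str ();
}
```
| |
| The underscores around larger counts keep the count from running into |
| the length prefix of the first part. |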
| |
| Squangling uses the letter @samp{K} to indicate a |
| remembered portion of a qualified name. As qualified names are processed |
| for an identifier, the names are numbered and remembered in a |
| manner similar to the @samp{B} type compression code. |
| Names are recognized left to right, and given increasing values, which are |
| appended to the code in the standard manner. That is, multiple digit numbers |
| are delimited by @samp{_} characters. |
| |
| For example |
| @example |
| class Andrew |
| @{ |
| class WasHere |
| @{ |
| class AndHereToo |
| @{ |
| @}; |
| @}; |
| @}; |
| |
| f(Andrew&r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @} |
| |
| K0 -> Andrew |
| K1 -> Andrew::WasHere |
| K2 -> Andrew::WasHere::AndHereToo |
| @end example |
| Function @samp{f()} would be mangled as: |
| @samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo} |
| |
| On some occasions either a @samp{B} or a @samp{K} code could be chosen; |
| preference is always given to the @samp{B} code. For instance, the example |
| in the section on @samp{B} mangling could have used a @samp{K} code |
| instead of @samp{B2}. |
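| |
| The following C++ sketch reproduces the @samp{K} codes of the |
| @code{Andrew} example. It is a guess at the bookkeeping, not the |
| compiler's own code: the class @code{KTable} and its members are |
| hypothetical, the choice of the longest remembered prefix is an |
| assumption, and the interaction with @samp{B} codes is ignored. |
| |
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical table of remembered qualified names (the K codes).
class KTable {
public:
    // Encode a class name given as its parts, e.g. {"Andrew", "WasHere"}.
    std::string encode(const std::vector<std::string>& parts) {
        assert(!parts.empty());
        // Assumption: reuse the longest leading run already remembered.
        std::size_t used = 0;  // number of parts covered by the K code
        std::size_t code = 0;  // index of that K code
        for (std::size_t i = 0; i < parts.size(); ++i) {
            const std::size_t idx = index_of(join(parts, i + 1));
            if (idx == seen_.size()) break;  // not remembered yet
            used = i + 1;
            code = idx;
        }
        // One piece for the K code (if any) plus one per remaining part.
        const std::size_t pieces = (used ? 1 : 0) + (parts.size() - used);
        std::string out;
        if (pieces > 1) out += "Q" + number(pieces);
        if (used) out += "K" + number(code);
        for (std::size_t i = used; i < parts.size(); ++i) {
            out += std::to_string(parts[i].size()) + parts[i];
            seen_.push_back(join(parts, i + 1));  // remember, left to right
        }
        return out;
    }

private:
    // Multi-digit numbers are delimited by '_' characters.
    static std::string number(std::size_t n) {
        return n <= 9 ? std::to_string(n) : "_" + std::to_string(n) + "_";
    }
    // Join the first n parts with "::".
    static std::string join(const std::vector<std::string>& parts,
                            std::size_t n) {
        std::string s;
        for (std::size_t i = 0; i < n; ++i) {
            if (i) s += "::";
            s += parts[i];
        }
        return s;
    }
    // Position of name in the table, or seen_.size() if absent.
    std::size_t index_of(const std::string& name) const {
        for (std::size_t i = 0; i < seen_.size(); ++i) {
            if (seen_[i] == name) return i;
        }
        return seen_.size();
    }
    std::vector<std::string> seen_;
};
```
| |
| Feeding the three parameter types of @samp{f()} through one |
| @code{KTable} in order gives @samp{6Andrew}, @samp{Q2K07WasHere} and |
| @samp{Q2K110AndHereToo}, each of which follows an @samp{R} in the |
| mangled name shown above. |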
| |
| @subsection Templates |
| |
| A class template instantiation is encoded as the letter @samp{t}, |
| followed by the encoding of the template name, followed by |
| the number of template parameters, followed by the encoding of the template |
| parameters. If a template parameter is a type, it is written |
| as a @samp{Z} followed by the encoding of the type. |
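| |
| As a rough illustration, the rule for type parameters could be written |
| in C++ as below. The function @code{encode_class_template} is a |
| hypothetical helper, not part of G++; it assumes the template name is |
| written as its length followed by the name, that every parameter is a |
| type whose mangling is already known, and that there are at most nine |
| parameters. |
| |
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: encode a class template instantiation whose
// parameters are all types. type_args holds already-mangled types.
std::string encode_class_template(const std::string& name,
                                  const std::vector<std::string>& type_args) {
    // How a count above 9 is written is not covered here, so stay below it.
    assert(!name.empty() && !type_args.empty() && type_args.size() <= 9);
    std::string out = "t" + std::to_string(name.size()) + name;
    out += std::to_string(type_args.size());
    for (const std::string& arg : type_args) {
        out += "Z" + arg;  // 'Z' marks a type parameter
    }
    return out;
}
```
| |
| Applied to @code{class1<class3>} this gives @samp{t6class11Z6class3}, |
| and nesting that result as the parameter of @code{class2} gives |
| @samp{t6class21Zt6class11Z6class3}, the string that appears in the |
| mangled name of the @samp{B} code example. |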
| |
| A function template specialization (either an instantiation or an |
| explicit specialization) is encoded by an @samp{H} followed by the |
| encoding of the template parameters, as described above, followed by an |
| @samp{_}, the encoding of the argument types to the template function |
| (not the specialization), another @samp{_}, and the return type. (Like |
| the argument types, the return type is the return type of the function |
| template, not the specialization.) Template parameters in the argument |
| and return types are encoded by an @samp{X} for type parameters, or a |
| @samp{Y} for constant parameters, an index indicating their position |
| in the template parameter list declaration, and their template depth. |
| |
| @subsection Arrays |
| |
| C++ array types are mangled by emitting @samp{A}, followed by |
| the length of the array, followed by an @samp{_}, followed by |
| the mangling of the element type. Of course, array parameter |
| types normally decay into pointer types, so you |
| don't see this. |
| |
| Java arrays are objects. A Java type @code{T[]} is mangled |
| as if it were the C++ type @code{JArray<T>}. |
| For example @code{java.lang.String[]} is encoded as |
| @samp{Pt6JArray1ZPQ34java4lang6String}. |
| |
| @subsection Static fields |
| |
| Both C++ and Java classes can have static fields. |
| These are allocated statically, and are shared among all instances. |
| |
| The mangling starts with a prefix (@samp{_} on most systems), which is |
| followed by the mangling of the class name, followed by the "joiner", |
| and finally the field name. |
| The joiner (see @code{JOINER} in @file{cp-tree.h}) is a special |
| separator character. For historical reasons (and idiosyncrasies |
| of assembler syntax) it can be @samp{$} or @samp{.} (or even |
| @samp{_} on a few systems). If the joiner is @samp{_} then the prefix |
| is @samp{__static_} instead of just @samp{_}. |
| |
| For example @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax) |
| would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var} |
| (or rarely @samp{__static_Q23Foo3Bar_var}). |
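| |
| The prefix and joiner rules can be summarized in a few lines of C++. |
| This is a sketch only: @code{encode_static_field} is a hypothetical |
| name, the class mangling is passed in ready-made, the usual prefix is |
| assumed to be @samp{_}, and Unicode escapes are not handled. |
| |
```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the symbol for a static field.
// class_mangling is the already-mangled class name; joiner is the
// platform's separator character ('$', '.' or '_').
std::string encode_static_field(const std::string& class_mangling,
                                const std::string& field, char joiner) {
    assert(joiner == '$' || joiner == '.' || joiner == '_');
    // With '_' as the joiner, the longer "__static_" prefix is used.
    const std::string prefix = (joiner == '_') ? "__static_" : "_";
    return prefix + class_mangling + joiner + field;
}
```
| |
| Calling it with @samp{Q23Foo3Bar}, @samp{var} and each of the three |
| joiners reproduces the three encodings listed above. |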
| |
| If the name of a static variable needs Unicode escapes, |
| the Unicode indicator @samp{U} comes before the "joiner". |
| Thus @code{\u1234Foo::var\u3445} becomes @code{_U8_1234FooU.var_3445}. |
| |
| @subsection Table of demangling code characters |
| |
| The following special characters are used in mangling: |
| |
| @table @samp |
| @item A |
| Indicates a C++ array type. |
| |
| @item b |
| Encodes the C++ @code{bool} type, |
| and the Java @code{boolean} type. |
| |
| @item B |
| Used for squangling. Similar in concept to the non-squangled @samp{T} code. |
| |
| @item c |
| Encodes the C++ @code{char} type, and the Java @code{byte} type. |
| |
| @item C |
| A modifier to indicate a @code{const} type. |
| Also used to indicate a @code{const} member function |
| (in which case it precedes the encoding of the method's class). |
| |
| @item d |
| Encodes the C++ and Java @code{double} types. |
| |
| @item e |
| Indicates extra unknown arguments @code{...}. |
| |
| @item E |
| Indicates the opening parenthesis of an expression. |
| |
| @item f |
| Encodes the C++ and Java @code{float} types. |
| |
| @item F |
| Used to indicate a function type. |
| |
| @item H |
| Used to indicate a template function. |
| |
| @item i |
| Encodes the C++ and Java @code{int} types. |
| |
| @item J |
| Indicates a complex type. |
| |
| @item K |
| Used by squangling to compress qualified names. |
| |
| @item l |
| Encodes the C++ @code{long} type. |
| |
| @item n |
| Immediate repeated type. Followed by the repeat count. |
| |
| @item N |
| Repeated type. Followed by the repeat count of the repeated type, |
| followed by the type index of the repeated type. Due to a bug in |
| g++ 2.7.2, this is only generated if the index is 0. Superseded by |
| @samp{n} when squangling. |
| |
| @item P |
| Indicates a pointer type. Followed by the type pointed to. |
| |
| @item Q |
| Used to mangle qualified names, which arise from nested classes. |
| Also used for namespaces. |
| In Java used to mangle package-qualified names, and inner classes. |
| |
| @item r |
| Encodes the GNU C++ @code{long double} type. |
| |
| @item R |
| Indicates a reference type. Followed by the referenced type. |
| |
| @item s |
| Encodes the C++ and Java @code{short} types. |
| |
| @item S |
| A modifier that indicates that the following integer type is signed. |
| Only used with @code{char}. |
| |
| Also used as a modifier to indicate a static member function. |
| |
| @item t |
| Indicates a template instantiation. |
| |
| @item T |
| A back reference to a previously seen type. |
| |
| @item U |
| A modifier that indicates that the following integer type is unsigned. |
| Also used to indicate that the following class or namespace name |
| is encoded using Unicode-mangling. |
| |
| @item u |
| The @code{restrict} type qualifier. |
| |
| @item v |
| Encodes the C++ and Java @code{void} types. |
| |
| @item V |
| A modifier for a @code{volatile} type or method. |
| |
| @item w |
| Encodes the C++ @code{wchar_t} type, and the Java @code{char} type. |
| |
| @item W |
| Indicates the closing parenthesis of an expression. |
| |
| @item x |
| Encodes the GNU C++ @code{long long} type, and the Java @code{long} type. |
| |
| @item X |
| Encodes a template type parameter, when part of a function type. |
| |
| @item Y |
| Encodes a template constant parameter, when part of a function type. |
| |
| @item Z |
| Used for template type parameters. |
| |
| @end table |
| |
| The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p} |
| also seem to be used for obscure purposes ... |
| |
| @node Concept Index, , Mangling, Top |
| |
| @section Concept Index |
| |
| @printindex cp |
| |
| @bye |