blob: 2c554dd979fa24723ecde50047c03c79488a081d [file] [log] [blame]
------------------------------------------------------------------------------
-- --
-- GNAT COMPILER COMPONENTS --
-- --
-- E X P _ U N S T --
-- --
-- S p e c --
-- --
-- Copyright (C) 2014-2015, Free Software Foundation, Inc. --
-- --
-- GNAT is free software; you can redistribute it and/or modify it under --
-- terms of the GNU General Public License as published by the Free Soft- --
-- ware Foundation; either version 3, or (at your option) any later ver- --
-- sion. GNAT is distributed in the hope that it will be useful, but WITH- --
-- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY --
-- or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License --
-- for more details. You should have received a copy of the GNU General --
-- Public License distributed with GNAT; see file COPYING3. If not, go to --
-- http://www.gnu.org/licenses for a complete copy of the license. --
-- --
-- GNAT was originally developed by the GNAT team at New York University. --
-- Extensive contributions were provided by Ada Core Technologies Inc. --
-- --
------------------------------------------------------------------------------
-- Expand routines for unnesting subprograms
with Types; use Types;
package Exp_Unst is
-- -----------------
-- -- The Problem --
-- -----------------
-- Normally, nested subprograms in the source result in corresponding
-- nested subprograms in the resulting tree. We then expect the back end
-- to handle such nested subprograms, including all cases of uplevel
-- references. For example, the GCC back end can do this relatively easily
-- since GNU C (as an extension) allows nested functions with uplevel
-- references, and implements an appropriate static chain approach to
-- dealing with such uplevel references.
-- However, we also want to be able to interface with back ends that do
-- not easily handle such uplevel references. One example is the back end
-- that translates the tree into standard C source code. In the future,
-- other back ends might need the same capability (e.g. a back end that
-- generated LLVM intermediate code).
-- We could imagine simply handling such references in the appropriate
-- back end. For example the back end that generates C could recognize
-- nested subprograms and rig up some way of translating them, e.g. by
-- making a static-link source level visible.
-- Rather than take that approach, we prefer to do a semantics-preserving
-- transformation on the GNAT tree, that eliminates the problem before we
-- hand the tree over to the back end. There are two reasons for preferring
-- this approach:
-- First: the work needs only to be done once for all affected back ends
-- and we can remain within the semantics of the tree. The front end is
-- full of tree transformations, so we have all the infrastructure for
-- doing transformations of this type.
-- Second: given that the transformation will be semantics-preserving,
-- we can still used the standard GCC back end to build code from it.
-- This means we can easily run our full test suite to verify that the
-- transformations are indeed semantics preserving. It is a lot more
-- work to thoroughly test the output of specialized back ends.
-- Looking at the problem, we have three situations to deal with. Note
-- that in these examples, we use all lower case, since that is the way
-- the internal tree is cased.
-- First, cases where there are no uplevel references, for example
-- procedure case1 is
-- function max (m, n : Integer) return integer is
-- begin
-- return integer'max (m, n);
-- end max;
-- ...
-- end case1;
-- Second, cases where there are explicit uplevel references.
-- procedure case2 (b : integer) is
-- procedure Inner (bb : integer);
--
-- procedure inner2 is
-- begin
-- inner(5);
-- end;
--
-- x : integer := 77;
-- y : constant integer := 15 * 16;
-- rv : integer := 10;
--
-- procedure inner (bb : integer) is
-- begin
-- x := rv + y + bb + b;
-- end;
--
-- begin
-- inner2;
-- end case2;
-- In this second example, B, X, RV are uplevel referenced. Y is not
-- considered as an uplevel reference since it is a static constant
-- where references are replaced by the value at compile time.
-- Third, cases where there are implicit uplevel references via types
-- whose bounds depend on locally declared constants or variables:
-- function case3 (x, y : integer) return boolean is
-- subtype dynam is integer range x .. y + 3;
-- subtype static is integer range 42 .. 73;
-- xx : dynam := y;
--
-- type darr is array (dynam) of Integer;
-- type darec is record
-- A : darr;
-- B : integer;
-- end record;
-- darecv : darec;
--
-- function inner (b : integer) return boolean is
-- begin
-- return b in dynam and then darecv.b in static;
-- end inner;
--
-- begin
-- return inner (42) and then inner (xx * 3 - y * 2);
-- end case3;
--
-- In this third example, the membership test implicitly references the
-- the bounds of Dynam, which both involve uplevel references.
-- ------------------
-- -- The Solution --
-- ------------------
-- Looking at the three cases above, the first case poses no problem at
-- all. Indeed the subprogram could have been declared at the outer level
-- (perhaps changing the name). But this style is quite common as a way
-- of limiting the scope of a local procedure called only within the outer
-- procedure. We could move it to the outer level (with a name change if
-- needed), but we don't bother. We leave it nested, and the back end just
-- translates it as though it were not nested.
-- In general we leave nested procedures nested, rather than trying to move
-- them to the outer level (the back end may do that, e.g. as part of the
-- translation to C, but we don't do it in the tree itself). This saves a
-- LOT of trouble in terms of visibility and semantics.
-- But of course we have to deal with the uplevel references. The idea is
-- to rewrite these nested subprograms so that they no longer have any such
-- uplevel references, so by the time they reach the back end, they all are
-- case 1 (no uplevel references) and thus easily handled.
-- To deal with explicit uplevel references (case 2 above), we proceed with
-- the following steps:
-- All entities marked as being uplevel referenced are marked as aliased
-- since they will be accessed indirectly via an activation record as
-- described below.
-- An activation record is created containing system address values
-- for each uplevel referenced entity in a given scope. In the example
-- given before, we would have:
-- type AREC1T is record
-- b : Address;
-- x : Address;
-- rv : Address;
-- end record;
-- AREC1 : aliased AREC1T;
-- type AREC1PT is access all AREC1T;
-- AREC1P : constant AREC1PT := AREC1'Access;
-- The fields of AREC1 are set at the point the corresponding entity
-- is declared (immediately for parameters).
-- Note: the 1 in all these names represents the fact that we are at the
-- outer level of nesting. As we will see later, deeper levels of nesting
-- will use AREC2, AREC3, ...
-- For all subprograms nested immediately within the corresponding scope,
-- a parameter AREC1F is passed, and all calls to these routines have
-- AREC1P added as an additional formal.
-- Now within the nested procedures, any reference to an uplevel entity
-- xxx is replaced by Tnn!(AREC1.xxx).all (where ! represents a call
-- to unchecked conversion to convert the address to the access type
-- and Tnn is a locally declared type that is "access all t", where t
-- is the type of the reference).
-- Note: the reason that we use Address as the component type in the
-- declaration of AREC1T is that we may create this type before we see
-- the declaration of this type.
-- The following shows example 2 above after this translation:
-- procedure case2x (b : aliased Integer) is
-- type AREC1T is record
-- b : Address;
-- x : Address;
-- rv : Address;
-- end record;
--
-- AREC1 : aliased AREC1T;
-- type AREC1PT is access all AREC1T;
-- AREC1P : constant AREC1PT := AREC1'Access;
--
-- AREC1.b := b'Address;
--
-- procedure inner (bb : integer; AREC1F : AREC1PT);
--
-- procedure inner2 (AREC1F : AREC1PT) is
-- begin
-- inner(5, AREC1F);
-- end;
--
-- x : aliased integer := 77;
-- AREC1.x := X'Address;
--
-- y : constant Integer := 15 * 16;
--
-- rv : aliased Integer;
-- AREC1.rv := rv'Address;
--
-- procedure inner (bb : integer; AREC1F : AREC1PT) is
-- begin
-- type Tnn1 is access all Integer;
-- type Tnn2 is access all Integer;
-- type Tnn3 is access all Integer;
-- Tnn1!(AREC1F.x).all :=
-- Tnn2!(AREC1F.rv).all + y + b + Tnn3!(AREC1F.b).all;
-- end;
--
-- begin
-- inner2 (AREC1P);
-- end case2x;
-- And now the inner procedures INNER2 and INNER have no uplevel references
-- so they have been reduced to case 1, which is the case easily handled by
-- the back end. Note that the generated code is not strictly legal Ada
-- because of the assignments to AREC1 in the declarative sequence, but the
-- GNAT tree always allows such mixing of declarations and statements, so
-- the back end must be prepared to handle this in any case.
-- Case 3 where we have uplevel references to types is a bit more complex.
-- That would especially be the case if we did a full transformation that
-- completely eliminated such uplevel references as we did for case 2. But
-- instead of trying to do that, we rewrite the subprogram so that the code
-- generator can easily detect and deal with these uplevel type references.
-- First we distinguish two cases
-- Static types are one of the two following cases:
-- Discrete types whose bounds are known at compile time. This is not
-- quite the same as what is tested by Is_OK_Static_Subtype, in that
-- it allows compile time known values that are not static expressions.
-- Composite types, whose components are (recursively) static types.
-- Dynamic types are one of the two following cases:
-- Discrete types with at least one bound not known at compile time.
-- Composite types with at least one component that is (recursively)
-- a dynamic type.
-- Uplevel references to static types are not a problem, the front end
-- or the code generator fetches the bounds as required, and since they
-- are compile time known values, this value can just be extracted and
-- no actual uplevel reference is required.
-- Uplevel references to dynamic types are a potential problem, since
-- such references may involve an implicit access to a dynamic bound,
-- and this reference is an implicit uplevel access.
-- To fully unnest such references would be messy, since we would have
-- to create local copies of the dynamic types involved, so that the
-- front end or code generator could generate an explicit uplevel
-- reference to the bound involved. Rather than do that, we set things
-- up so that this situation can be easily detected and dealt with when
-- there is an implicit reference to the bounds.
-- What we do is to always generate a local constant for any dynamic
-- bound in a dynamic subtype xx with name xx_FIRST or xx_LAST. The one
-- case where we can skip this is where the bound is For
-- example in the third example above, subtype dynam is expanded as
-- dynam_LAST : constant Integer := y + 3;
-- subtype dynam is integer range x .. dynam_LAST;
-- Now if type dynam is uplevel referenced (as it is this case), then
-- the bounds x and dynam_LAST are marked as uplevel references
-- so that appropriate entries are made in the activation record. Any
-- explicit reference to such a bound in the front end generated code
-- will be handled by the normal uplevel reference mechanism which we
-- described above for case 2. For implicit references by a back end
-- that needs to unnest things, any such implicit reference to one of
-- these bounds can be replaced by an appropriate reference to the entry
-- in the activation record for xx_FIRST or xx_LAST. Thus the back end
-- can eliminate the problematical uplevel reference without the need to
-- do the heavy tree modification to do that at the code expansion level
-- Looking at case 3 again, here is the normal -gnatG expanded code
-- function case3 (x : integer; y : integer) return boolean is
-- dynam_LAST : constant integer := y {+} 3;
-- subtype dynam is integer range x .. dynam_LAST;
-- subtype static is integer range 42 .. 73;
--
-- [constraint_error when
-- not (y in x .. dynam_LAST)
-- "range check failed"]
--
-- xx : dynam := y;
--
-- type darr is array (x .. dynam_LAST) of integer;
-- type darec is record
-- a : darr;
-- b : integer;
-- end record;
-- [type TdarrB is array (x .. dynam_LAST range <>) of integer]
-- freeze TdarrB []
-- darecv : darec;
--
-- function inner (b : integer) return boolean is
-- begin
-- return b in x .. dynam_LAST and then darecv.b in 42 .. 73;
-- end inner;
-- begin
-- return inner (42) and then inner (xx {*} 3 {-} y {*} 2);
-- end case3;
-- Note: the actual expanded code has fully qualified names so for
-- example function inner is actually function case3__inner. For now
-- we ignore that detail to clarify the examples.
-- Here we see that some of the bounds references are expanded by the
-- front end, so that we get explicit references to y or dynamLast. These
-- cases are handled by the normal uplevel reference mechanism described
-- above for case 2. This is the case for the constraint check for the
-- initialization of xx, and the range check in function inner.
-- But the reference darecv.b in the return statement of function
-- inner has an implicit reference to the bounds of dynam, since to
-- compute the location of b in the record, we need the length of a.
-- Here is the full translation of the third example:
-- function case3x (x, y : integer) return boolean is
-- type AREC1T is record
-- x : Address;
-- dynam_LAST : Address;
-- end record;
--
-- AREC1 : aliased AREC1T;
-- type AREC1PT is access all AREC1T;
-- AREC1P : constant AREC1PT := AREC1'Access;
--
-- AREC1.x := x'Address;
--
-- dynam_LAST : constant integer := y {+} 3;
-- AREC1.dynam_LAST := dynam_LAST'Address;
-- subtype dynam is integer range x .. dynam_LAST;
-- xx : dynam := y;
--
-- [constraint_error when
-- not (y in x .. dynam_LAST)
-- "range check failed"]
--
-- subtype static is integer range 42 .. 73;
--
-- type darr is array (x .. dynam_LAST) of Integer;
-- type darec is record
-- A : darr;
-- B : integer;
-- end record;
-- darecv : darec;
--
-- function inner (b : integer; AREC1F : AREC1PT) return boolean is
-- begin
-- type Tnn is access all Integer
-- return b in x .. Tnn!(AREC1F.dynam_LAST).all
-- and then darecv.b in 42 .. 73;
-- end inner;
--
-- begin
-- return inner (42, AREC1P) and then inner (xx * 3, AREC1P);
-- end case3x;
-- And now the back end when it processes darecv.b will access the bounds
-- of darecv.a by referencing the d and dynam_LAST fields of AREC1P.
-----------------------------
-- Multiple Nesting Levels --
-----------------------------
-- In our examples so far, we have only nested to a single level, but the
-- scheme generalizes to multiple levels of nesting and in this section we
-- discuss how this generalization works.
-- Consider this example with two nesting levels
-- To deal with elimination of uplevel references, we follow the same basic
-- approach described above for case 2, except that we need an activation
-- record at each nested level. Basically the rule is that any procedure
-- that has nested procedures needs an activation record. When we do this,
-- the inner activation records have a pointer (uplink) to the immediately
-- enclosing activation record, the normal arrangement of static links. The
-- following shows the full translation of this fourth case.
-- function case4x (x : integer) return integer is
-- type AREC1T is record
-- v1 : Address;
-- end record;
--
-- AREC1 : aliased AREC1T;
-- type AREC1PT is access all AREC1T;
-- AREC1P : constant AREC1PT := AREC1'Access;
--
-- v1 : integer := x;
-- AREC1.v1 := v1'Address;
--
-- function inner1 (y : integer; AREC1F : AREC1PT) return integer is
-- type AREC2T is record
-- AREC1U : AREC1PT := AREC1F;
-- v2 : Address;
-- end record;
--
-- AREC2 : aliased AREC2T;
-- type AREC2PT is access all AREC2T;
-- AREC2P : constant AREC2PT := AREC2'Access;
--
-- type Tnn1 is access all Integer;
-- v2 : integer := Tnn1!(AREC1F.v1).all {+} 1;
-- AREC2.v2 := v2'Address;
--
-- function inner2
-- (z : integer; AREC2F : AREC2PT) return integer
-- is
-- begin
-- type Tnn1 is access all Integer;
-- type Tnn2 is access all Integer;
-- return integer(z {+}
-- Tnn1!(AREC2F.AREC1U.v1).all {+}
-- Tnn2!(AREC2F.v2).all);
-- end inner2;
-- begin
-- type Tnn is access all Integer;
-- return integer(y {+} inner2 (Tnn!(AREC1F.v1).all, AREC2P));
-- end inner1;
-- begin
-- return inner1 (x, AREC1P);
-- end case4x;
-- As can be seen in this example, the level number following AREC in the
-- names avoids any confusion between AREC names at different levels.
-------------------------
-- Name Disambiguation --
-------------------------
-- As described above, the translation scheme would raise issues when the
-- code generator did the actual unnesting if identically named nested
-- subprograms exist. Similarly overloading would cause a naming issue.
-- In fact, the expanded code includes qualified names which eliminate this
-- problem. We omitted the qualification from the exapnded examples above
-- for simplicity. But to see this in action, consider this example:
-- function Mnames return Boolean is
-- procedure Inner is
-- procedure Inner is
-- begin
-- null;
-- end;
-- begin
-- Inner;
-- end;
-- function F (A : Boolean) return Boolean is
-- begin
-- return not A;
-- end;
-- function F (A : Integer) return Boolean is
-- begin
-- return A > 42;
-- end;
-- begin
-- Inner;
-- return F (42) or F (True);
-- end;
-- The expanded code actually looks like:
-- function mnames return boolean is
-- procedure mnames__inner is
-- procedure mnames__inner__inner is
-- begin
-- null;
-- return;
-- end mnames__inner__inner;
-- begin
-- mnames__inner__inner;
-- return;
-- end mnames__inner;
-- function mnames__f (a : boolean) return boolean is
-- begin
-- return not a;
-- end mnames__f;
-- function mnames__f__2 (a : integer) return boolean is
-- begin
-- return a > 42;
-- end mnames__f__2;
-- begin
-- mnames__inner;
-- return mnames__f__2 (42) or mnames__f (true);
-- end mnames;
-- As can be seen from studying this example, the qualification deals both
-- with the issue of clashing names (mnames__inner, mnames__inner__inner),
-- and with overloading (mnames__f, mnames__f__2).
-----------------
-- Subprograms --
-----------------
procedure Check_Uplevel_Reference_To_Type (Typ : Entity_Id);
-- This procedure is called if Sem_Util.Check_Nested_Access detects an
-- uplevel reference to a type or subtype entity Typ. On return there are
-- two cases, if Typ is a static type (defined as a discrete type with
-- static bounds, or a record all of whose components are of a static type,
-- or an array whose index and component types are all static types), then
-- the flag Is_Static_Type (Typ) will be set True, and in this case the
-- flag Has_Uplevel_Reference is not set since we don't need to worry about
-- uplevel references to static types. If on the other hand Typ is not a
-- static type, then the flag Has_Uplevel_Reference will be set, and any
-- non-static bounds referenced by the type will also be marked as having
-- uplevel references (by setting Has_Uplevel_Reference for these bounds).
procedure Note_Uplevel_Reference (N : Node_Id; Subp : Entity_Id);
-- Called in Unnest_Subprogram_Mode when we detect an explicit uplevel
-- reference (node N) to an enclosing subprogram Subp.
procedure Unnest_Subprogram (Subp : Entity_Id; Subp_Body : Node_Id);
-- Subp is a library level subprogram which has nested subprograms, and
-- Subp_Body is the corresponding N_Subprogram_Body node. This procedure
-- declares the AREC types and objects, adds assignments to the AREC record
-- as required, defines the xxxPTR types for uplevel referenced objects,
-- adds the ARECP parameter to all nested subprograms which need it, and
-- modifies all uplevel references appropriately.
end Exp_Unst;