| .. |
| Copyright 1988-2022 Free Software Foundation, Inc. |
| This is part of the GCC manual. |
| For copying conditions, see the copyright.rst file. |
| |
| .. index:: costs of instructions, relative costs, speed of instructions |
| |
| .. _costs: |
| |
| Describing Relative Costs of Operations |
| *************************************** |
| |
| These macros let you describe the relative speed of various operations |
| on the target machine. |
| |
| .. c:macro:: REGISTER_MOVE_COST (mode, from, to) |
| |
| A C expression for the cost of moving data of mode :samp:`{mode}` from a |
| register in class :samp:`{from}` to one in class :samp:`{to}`. The classes are |
| expressed using the enumeration values such as ``GENERAL_REGS``. A |
| value of 2 is the default; other values are interpreted relative to |
| that. |
| |
| It is not required that the cost always equal 2 when :samp:`{from}` is the |
| same as :samp:`{to}`; on some machines it is expensive to move between |
| registers if they are not general registers. |
| |
| If reload sees an insn consisting of a single ``set`` between two |
| hard registers, and if ``REGISTER_MOVE_COST`` applied to their |
| classes returns a value of 2, reload does not check to ensure that the |
| constraints of the insn are met. Setting a cost of other than 2 will |
| allow reload to verify that the constraints are met. You should do this |
| if the :samp:`mov{m}` pattern's constraints do not allow such copying. |
| |
| This macro is obsolete; new ports should use the target hook |
| ``TARGET_REGISTER_MOVE_COST`` instead. |
| |
| .. function:: int TARGET_REGISTER_MOVE_COST (machine_mode mode, reg_class_t from, reg_class_t to) |
| |
| .. hook-start:TARGET_REGISTER_MOVE_COST |
| |
| This target hook should return the cost of moving data of mode :samp:`{mode}` |
| from a register in class :samp:`{from}` to one in class :samp:`{to}`. The classes |
| are expressed using the enumeration values such as ``GENERAL_REGS``. |
| A value of 2 is the default; other values are interpreted relative to |
| that. |
| |
| It is not required that the cost always equal 2 when :samp:`{from}` is the |
| same as :samp:`{to}`; on some machines it is expensive to move between |
| registers if they are not general registers. |
| |
| If reload sees an insn consisting of a single ``set`` between two |
| hard registers, and if ``TARGET_REGISTER_MOVE_COST`` applied to their |
| classes returns a value of 2, reload does not check to ensure that the |
| constraints of the insn are met. Setting a cost of other than 2 will |
| allow reload to verify that the constraints are met. You should do this |
| if the :samp:`mov{m}` pattern's constraints do not allow such copying. |
| |
| The default version of this function returns 2. |
| |
| .. hook-end |
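| |
| As a rough illustration of the cost scale described above, the following |
| minimal sketch models a machine with two register files. The names |
| ``example_reg_class``, ``EX_GENERAL_REGS``, ``EX_FLOAT_REGS`` and |
| ``example_register_move_cost`` are hypothetical stand-ins rather than GCC |
| identifiers, and the cost of 6 is an assumed value. |
| |
```c
/* Stand-in register classes for this sketch; a real port uses the
   values of its own enum reg_class.  */
enum example_reg_class { EX_GENERAL_REGS, EX_FLOAT_REGS };

/* Hypothetical cost function shaped like TARGET_REGISTER_MOVE_COST.
   The mode argument is left out because this sketch ignores it.  */
int
example_register_move_cost (enum example_reg_class from,
                            enum example_reg_class to)
{
  /* A move within one register file keeps the default cost of 2.  */
  if (from == to)
    return 2;

  /* Assumed: crossing register files is three times as expensive.  */
  return 6;
}
```
| |
| Because the cross-file cost differs from 2, reload would verify the insn |
| constraints for a single ``set`` between registers of the two classes. |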
| |
| .. c:macro:: MEMORY_MOVE_COST (mode, class, in) |
| |
| A C expression for the cost of moving data of mode :samp:`{mode}` between a |
| register of class :samp:`{class}` and memory; :samp:`{in}` is zero if the value |
| is to be written to memory, nonzero if it is to be read in. This cost |
| is relative to those in ``REGISTER_MOVE_COST``. If moving between |
| registers and memory is more expensive than between two registers, you |
| should define this macro to express the relative cost. |
| |
| If you do not define this macro, GCC uses a default cost of 4 plus |
| the cost of copying via a secondary reload register, if one is |
| needed. If your machine requires a secondary reload register to copy |
| between memory and a register of :samp:`{class}` but the reload mechanism is |
| more complex than copying via an intermediate, define this macro to |
| reflect the actual cost of the move. |
| |
| GCC defines the function ``memory_move_secondary_cost`` if |
| secondary reloads are needed. It computes the costs due to copying via |
| a secondary register. If your machine copies from memory using a |
| secondary register in the conventional way but the default base value of |
| 4 is not correct for your machine, define this macro to add some other |
| value to the result of that function. The arguments to that function |
| are the same as to this macro. |
| |
| This macro is obsolete; new ports should use the target hook |
| ``TARGET_MEMORY_MOVE_COST`` instead. |
| |
| .. function:: int TARGET_MEMORY_MOVE_COST (machine_mode mode, reg_class_t rclass, bool in) |
| |
| .. hook-start:TARGET_MEMORY_MOVE_COST |
| |
| This target hook should return the cost of moving data of mode :samp:`{mode}` |
| between a register of class :samp:`{rclass}` and memory; :samp:`{in}` is ``false`` |
| if the value is to be written to memory, ``true`` if it is to be read in. |
| This cost is relative to those in ``TARGET_REGISTER_MOVE_COST``. |
| If moving between registers and memory is more expensive than between two |
| registers, you should add this target hook to express the relative cost. |
| |
| If you do not add this target hook, GCC uses a default cost of 4 plus |
| the cost of copying via a secondary reload register, if one is |
| needed. If your machine requires a secondary reload register to copy |
| between memory and a register of :samp:`{rclass}` but the reload mechanism is |
| more complex than copying via an intermediate, use this target hook to |
| reflect the actual cost of the move. |
| |
| GCC defines the function ``memory_move_secondary_cost`` if |
| secondary reloads are needed. It computes the costs due to copying via |
| a secondary register. If your machine copies from memory using a |
| secondary register in the conventional way but the default base value of |
| 4 is not correct for your machine, use this target hook to add some other |
| value to the result of that function. The arguments to that function |
| are the same as to this target hook. |
| |
| .. hook-end |
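| |
| The case where only the base value of 4 is wrong can be sketched as follows. |
| ``example_secondary_cost`` is a hypothetical stand-in for |
| ``memory_move_secondary_cost``; the register classes and all numeric costs |
| are assumptions made for the illustration, and the mode argument is omitted. |
| |
```c
#include <stdbool.h>

/* Stand-in register classes for this sketch.  */
enum example_reg_class { EX_GENERAL_REGS, EX_FLOAT_REGS };

/* Stand-in for memory_move_secondary_cost.  Assumed: only EX_FLOAT_REGS
   needs an intermediate register, which adds one register move (cost 2).  */
static int
example_secondary_cost (enum example_reg_class rclass, bool in)
{
  (void) in;  /* Same penalty for loads and stores in this sketch.  */
  return rclass == EX_FLOAT_REGS ? 2 : 0;
}

/* Hypothetical cost function shaped like TARGET_MEMORY_MOVE_COST.  */
int
example_memory_move_cost (enum example_reg_class rclass, bool in)
{
  /* Assumed base costs replacing the default of 4: loads cost 6,
     stores cost 5.  */
  int base = in ? 6 : 5;

  return base + example_secondary_cost (rclass, in);
}
```
| |
| Keeping the secondary-register term separate from the base value mirrors the |
| structure of the default, so only the part that differs is overridden. |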
| |
| .. c:macro:: BRANCH_COST (speed_p, predictable_p) |
| |
| A C expression for the cost of a branch instruction. A value of 1 is |
| the default; other values are interpreted relative to that. Parameter |
| :samp:`{speed_p}` is true when the branch in question should be optimized |
| for speed. When it is false, ``BRANCH_COST`` should return a value |
| optimal for code size rather than performance. :samp:`{predictable_p}` is |
| true for well-predicted branches. On many architectures, |
| ``BRANCH_COST`` can be reduced for such branches. |
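| |
| A definition that uses both parameters might look like the sketch below. |
| The macro is named ``EXAMPLE_BRANCH_COST`` so that it stays self-contained, |
| and the value 3 for a poorly predicted branch is an assumption. |
| |
```c
/* Hypothetical definition in the style of BRANCH_COST.
   - When optimizing for size, a branch is a single short instruction,
     so keep the default cost of 1.
   - When optimizing for speed, a well-predicted branch stays at 1 and
     a poorly predicted one is assumed to cost 3.  */
#define EXAMPLE_BRANCH_COST(speed_p, predictable_p) \
  (!(speed_p) ? 1 : ((predictable_p) ? 1 : 3))
```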
| |
| Here are additional macros which do not specify precise relative costs, |
| but only that certain actions are more expensive than GCC would |
| ordinarily expect. |
| |
| .. c:macro:: SLOW_BYTE_ACCESS |
| |
| Define this macro as a C expression which is nonzero if accessing less |
| than a word of memory (i.e. a ``char`` or a ``short``) is no |
| faster than accessing a word of memory, i.e., if such accesses |
| require more than one instruction or if there is no difference in cost |
| between byte and (aligned) word loads. |
| |
| When this macro is not defined, the compiler will access a field by |
| finding the smallest containing object; when it is defined, a fullword |
| load will be used if alignment permits. Unless byte accesses are |
| faster than word accesses, using word accesses is preferable since it |
| may eliminate subsequent memory access if subsequent accesses occur to |
| other fields in the same word of the structure, but to different bytes. |
| |
| .. function:: bool TARGET_SLOW_UNALIGNED_ACCESS (machine_mode mode, unsigned int align) |
| |
| .. hook-start:TARGET_SLOW_UNALIGNED_ACCESS |
| |
| This hook returns true if memory accesses described by the |
| :samp:`{mode}` and :samp:`{align}` parameters have a cost many times greater |
| than aligned accesses, for example if they are emulated in a trap handler. |
| This hook is invoked only for unaligned accesses, i.e. when |
| ``align < GET_MODE_ALIGNMENT (mode)``. |
| |
| When this hook returns true, the compiler will act as if |
| ``STRICT_ALIGNMENT`` were true when generating code for block |
| moves. This can cause significantly more instructions to be produced. |
| Therefore, do not make this hook return true if unaligned accesses only |
| add a cycle or two to the time for a memory access. |
| |
| The hook must return true whenever ``STRICT_ALIGNMENT`` is true. |
| The default implementation returns ``STRICT_ALIGNMENT``. |
| |
| .. hook-end |
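| |
| The following sketch shows one possible shape for such a predicate. It takes |
| the width of the mode in bits instead of a ``machine_mode``, and the |
| 32-bit threshold, ``EXAMPLE_STRICT_ALIGNMENT`` and the function name are |
| all assumptions of the example. |
| |
```c
#include <stdbool.h>

/* Stand-in for STRICT_ALIGNMENT; assumed to be 0 on this target.  */
#define EXAMPLE_STRICT_ALIGNMENT 0

/* Hypothetical predicate shaped like TARGET_SLOW_UNALIGNED_ACCESS.
   Assumed: misaligned accesses wider than 32 bits are emulated in a
   trap handler, while narrower ones are handled in hardware.  */
bool
example_slow_unaligned_access (unsigned int mode_bitsize, unsigned int align)
{
  /* The hook is only invoked for accesses already known to be
     under-aligned, so the alignment itself is not inspected here.  */
  (void) align;

  /* Must be true whenever strict alignment is in force.  */
  return EXAMPLE_STRICT_ALIGNMENT || mode_bitsize > 32;
}
```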
| |
| .. c:macro:: MOVE_RATIO (speed) |
| |
| The threshold of number of scalar memory-to-memory move insns, *below* |
| which a sequence of insns should be generated instead of a |
| string move insn or a library call. Increasing the value will always |
| make code faster, but eventually incurs high cost in increased code size. |
| |
| Note that on machines where the corresponding move insn is a |
| ``define_expand`` that emits a sequence of insns, this macro counts |
| the number of such sequences. |
| |
| The parameter :samp:`{speed}` is true if the code is currently being |
| optimized for speed rather than size. |
| |
| If you don't define this, a reasonable default is used. |
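| |
| To make the role of the threshold concrete, here is a minimal sketch of a |
| definition and of the comparison it feeds. ``EXAMPLE_MOVE_RATIO`` and |
| ``example_expand_inline_p`` are hypothetical names, and the values 8 and 2 |
| are assumptions rather than GCC's defaults. |
| |
```c
#include <stdbool.h>

/* Hypothetical definition in the style of MOVE_RATIO: allow longer
   inline sequences when optimizing for speed than for size.  */
#define EXAMPLE_MOVE_RATIO(speed) ((speed) ? 8 : 2)

/* An inline sequence is used only when the number of move insns it
   needs is strictly below the threshold.  */
bool
example_expand_inline_p (unsigned int n_move_insns, bool speed)
{
  return n_move_insns < (unsigned int) EXAMPLE_MOVE_RATIO (speed);
}
```
| |
| With these values, a copy needing 7 moves is expanded inline when optimizing |
| for speed, while one needing 8 falls back to a string move or library call. |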
| |
| .. function:: bool TARGET_USE_BY_PIECES_INFRASTRUCTURE_P (unsigned HOST_WIDE_INT size, unsigned int alignment, enum by_pieces_operation op, bool speed_p) |
| |
| .. hook-start:TARGET_USE_BY_PIECES_INFRASTRUCTURE_P |
| |
| GCC will attempt several strategies when asked to copy between |
| two areas of memory, or to set, clear or store to memory, for example |
| when copying a ``struct``. The ``by_pieces`` infrastructure |
| implements such memory operations as a sequence of load, store or move |
| insns. Alternate strategies are to expand the |
| ``cpymem`` or ``setmem`` optabs, to emit a library call, or to emit |
| unit-by-unit, loop-based operations. |
| |
| This target hook should return true if, for a memory operation with a |
| given :samp:`{size}` and :samp:`{alignment}`, using the ``by_pieces`` |
| infrastructure is expected to result in better code generation. |
| Both :samp:`{size}` and :samp:`{alignment}` are measured in terms of storage |
| units. |
| |
| The parameter :samp:`{op}` is one of: ``CLEAR_BY_PIECES``, |
| ``MOVE_BY_PIECES``, ``SET_BY_PIECES``, ``STORE_BY_PIECES`` or |
| ``COMPARE_BY_PIECES``. These describe the type of memory operation |
| under consideration. |
| |
| The parameter :samp:`{speed_p}` is true if the code is currently being |
| optimized for speed rather than size. |
| |
| Returning true for higher values of :samp:`{size}` can improve code generation |
| for speed if the target does not provide an implementation of the |
| ``cpymem`` or ``setmem`` standard names, if the ``cpymem`` or |
| ``setmem`` implementation would be more expensive than a sequence of |
| insns, or if the overhead of a library call would dominate that of |
| the body of the memory operation. |
| |
| Returning true for higher values of :samp:`{size}` may also cause an increase |
| in code size, for example where the number of insns emitted to perform a |
| move would be greater than that of a library call. |
| |
| .. hook-end |
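| |
| A size-based policy of the kind described above could be sketched like this. |
| The enumeration and function names are hypothetical stand-ins, and every |
| limit is an assumed tuning value, not something taken from a real port. |
| |
```c
#include <stdbool.h>

/* Stand-in for enum by_pieces_operation.  */
enum example_by_pieces_operation
{
  EX_CLEAR_BY_PIECES,
  EX_MOVE_BY_PIECES,
  EX_SET_BY_PIECES,
  EX_STORE_BY_PIECES,
  EX_COMPARE_BY_PIECES
};

/* Hypothetical predicate shaped like
   TARGET_USE_BY_PIECES_INFRASTRUCTURE_P.  Size and alignment are in
   storage units.  */
bool
example_use_by_pieces_p (unsigned long size, unsigned int alignment,
                         enum example_by_pieces_operation op, bool speed_p)
{
  /* Assumed limits: accept larger operations when tuning for speed.  */
  unsigned long limit = speed_p ? 64 : 16;

  /* Assumed: poorly aligned data needs more, narrower pieces.  */
  if (alignment < 4)
    limit /= 2;

  /* Assumed: piecewise compares also emit branches, so be stricter.  */
  if (op == EX_COMPARE_BY_PIECES)
    limit /= 2;

  return size <= limit;
}
```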
| |
| .. function:: bool TARGET_OVERLAP_OP_BY_PIECES_P (void) |
| |
| .. hook-start:TARGET_OVERLAP_OP_BY_PIECES_P |
| |
| This target hook should return true if, when the ``by_pieces`` |
| infrastructure is used, an offset-adjusted unaligned memory operation |
| in the smallest integer mode can be generated for the last piece of a |
| memory region, to avoid performing more than one smaller operation. |
| |
| .. hook-end |
| |
| .. function:: int TARGET_COMPARE_BY_PIECES_BRANCH_RATIO (machine_mode mode) |
| |
| .. hook-start:TARGET_COMPARE_BY_PIECES_BRANCH_RATIO |
| |
| When expanding a block comparison in :samp:`{mode}`, GCC can try to reduce the |
| number of branches at the expense of more memory operations. This hook |
| allows the target to override the default choice. It should return the |
| factor by which branches should be reduced over the plain expansion with |
| one comparison per :samp:`{mode}`-sized piece. A port can also prevent a |
| particular mode from being used for block comparisons by returning a |
| negative number from this hook. |
| |
| .. hook-end |
| |
| .. c:macro:: MOVE_MAX_PIECES |
| |
| A C expression used by ``move_by_pieces`` to determine the largest unit |
| a load or store may use when copying memory. Defaults to ``MOVE_MAX``. |
| |
| .. c:macro:: STORE_MAX_PIECES |
| |
| A C expression used by ``store_by_pieces`` to determine the largest unit |
| a store may use when writing to memory. Defaults to ``MOVE_MAX_PIECES``, or |
| two times the size of ``HOST_WIDE_INT``, whichever is smaller. |
| |
| .. c:macro:: COMPARE_MAX_PIECES |
| |
| A C expression used by ``compare_by_pieces`` to determine the largest unit |
| a load or store may use when comparing memory. Defaults to |
| ``MOVE_MAX_PIECES``. |
| |
| .. c:macro:: CLEAR_RATIO (speed) |
| |
| The threshold of number of scalar move insns, *below* which a sequence |
| of insns should be generated to clear memory instead of a string clear insn |
| or a library call. Increasing the value will always make code faster, but |
| eventually incurs high cost in increased code size. |
| |
| The parameter :samp:`{speed}` is true if the code is currently being |
| optimized for speed rather than size. |
| |
| If you don't define this, a reasonable default is used. |
| |
| .. c:macro:: SET_RATIO (speed) |
| |
| The threshold of number of scalar move insns, *below* which a sequence |
| of insns should be generated to set memory to a constant value, instead of |
| a block set insn or a library call. |
| Increasing the value will always make code faster, but |
| eventually incurs high cost in increased code size. |
| |
| The parameter :samp:`{speed}` is true if the code is currently being |
| optimized for speed rather than size. |
| |
| If you don't define this, it defaults to the value of ``MOVE_RATIO``. |
| |
| .. c:macro:: USE_LOAD_POST_INCREMENT (mode) |
| |
| A C expression used to determine whether a load postincrement is a good |
| thing to use for a given mode. Defaults to the value of |
| ``HAVE_POST_INCREMENT``. |
| |
| .. c:macro:: USE_LOAD_POST_DECREMENT (mode) |
| |
| A C expression used to determine whether a load postdecrement is a good |
| thing to use for a given mode. Defaults to the value of |
| ``HAVE_POST_DECREMENT``. |
| |
| .. c:macro:: USE_LOAD_PRE_INCREMENT (mode) |
| |
| A C expression used to determine whether a load preincrement is a good |
| thing to use for a given mode. Defaults to the value of |
| ``HAVE_PRE_INCREMENT``. |
| |
| .. c:macro:: USE_LOAD_PRE_DECREMENT (mode) |
| |
| A C expression used to determine whether a load predecrement is a good |
| thing to use for a given mode. Defaults to the value of |
| ``HAVE_PRE_DECREMENT``. |
| |
| .. c:macro:: USE_STORE_POST_INCREMENT (mode) |
| |
| A C expression used to determine whether a store postincrement is a good |
| thing to use for a given mode. Defaults to the value of |
| ``HAVE_POST_INCREMENT``. |
| |
| .. c:macro:: USE_STORE_POST_DECREMENT (mode) |
| |
| A C expression used to determine whether a store postdecrement is a good |
| thing to use for a given mode. Defaults to the value of |
| ``HAVE_POST_DECREMENT``. |
| |
| .. c:macro:: USE_STORE_PRE_INCREMENT (mode) |
| |
| This macro is used to determine whether a store preincrement is a good |
| thing to use for a given mode. Defaults to the value of |
| ``HAVE_PRE_INCREMENT``. |
| |
| .. c:macro:: USE_STORE_PRE_DECREMENT (mode) |
| |
| This macro is used to determine whether a store predecrement is a good |
| thing to use for a given mode. Defaults to the value of |
| ``HAVE_PRE_DECREMENT``. |
| |
| .. c:macro:: NO_FUNCTION_CSE |
| |
| Define this macro to be true if it is as good or better to call a constant |
| function address than to call an address kept in a register. |
| |
| .. c:macro:: LOGICAL_OP_NON_SHORT_CIRCUIT |
| |
| Define this macro if a non-short-circuit operation produced by |
| :samp:`fold_range_test ()` is optimal. This macro defaults to true if |
| ``BRANCH_COST`` is greater than or equal to the value 2. |
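| |
| At the source level, the trade-off resembles the difference between the two |
| functions in this sketch. It is only an analogy for what the fold does |
| on trees, and both function names are hypothetical. |
| |
```c
#include <stdbool.h>

/* Short-circuit form: the second comparison is skipped when the first
   fails, which a naive expansion implements with two branches.  */
bool
example_in_range_short_circuit (int x, int lo, int hi)
{
  return x >= lo && x <= hi;
}

/* Non-short-circuit form: both comparisons are always evaluated and
   their results combined with a bitwise AND, leaving one decision.
   This is safe here because neither operand has side effects.  */
bool
example_in_range_combined (int x, int lo, int hi)
{
  return (x >= lo) & (x <= hi);
}
```
| |
| The two forms return the same result; the second tends to pay off when a |
| branch costs more than the extra comparison it avoids skipping. |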
| |
| .. function:: bool TARGET_OPTAB_SUPPORTED_P (int op, machine_mode mode1, machine_mode mode2, optimization_type opt_type) |
| |
| .. hook-start:TARGET_OPTAB_SUPPORTED_P |
| |
| Return true if the optimizers should use optab :samp:`{op}` with |
| modes :samp:`{mode1}` and :samp:`{mode2}` for optimization type :samp:`{opt_type}`. |
| The optab is known to have an associated :samp:`.md` instruction |
| whose C condition is true. :samp:`{mode2}` is only meaningful for conversion |
| optabs; for direct optabs it is a copy of :samp:`{mode1}`. |
| |
| For example, when called with :samp:`{op}` equal to ``rint_optab`` and |
| :samp:`{mode1}` equal to ``DFmode``, the hook should say whether the |
| optimizers should use optab ``rintdf2``. |
| |
| The default hook returns true for all inputs. |
| |
| .. hook-end |
| |
| .. function:: bool TARGET_RTX_COSTS (rtx x, machine_mode mode, int outer_code, int opno, int *total, bool speed) |
| |
| .. hook-start:TARGET_RTX_COSTS |
| |
| This target hook describes the relative costs of RTL expressions. |
| |
| The cost may depend on the precise form of the expression, which is |
| available for examination in :samp:`{x}`, and the fact that :samp:`{x}` appears |
| as operand :samp:`{opno}` of an expression with rtx code :samp:`{outer_code}`. |
| That is, the hook can assume that there is some rtx :samp:`{y}` such |
| that :samp:`GET_CODE ({y}) == {outer_code}` and such that |
| either (a) :samp:`XEXP ({y}, {opno}) == {x}` or |
| (b) :samp:`XVEC ({y}, {opno})` contains :samp:`{x}`. |
| |
| :samp:`{mode}` is :samp:`{x}`'s machine mode, or for cases like ``const_int`` that |
| do not have a mode, the mode in which :samp:`{x}` is used. |
| |
| In implementing this hook, you can use the construct |
| ``COSTS_N_INSNS (n)`` to specify a cost equal to :samp:`{n}` fast |
| instructions. |
| |
| On entry to the hook, ``*total`` contains a default estimate |
| for the cost of the expression. The hook should modify this value as |
| necessary. Traditionally, the default costs are ``COSTS_N_INSNS (5)`` |
| for multiplications, ``COSTS_N_INSNS (7)`` for division and modulus |
| operations, and ``COSTS_N_INSNS (1)`` for all other operations. |
| |
| When optimizing for code size, i.e. when ``speed`` is |
| false, this target hook should be used to estimate the relative |
| size cost of an expression, again relative to ``COSTS_N_INSNS``. |
| |
| The hook returns true when all subexpressions of :samp:`{x}` have been |
| processed, and false when ``rtx_cost`` should recurse. |
| |
| .. hook-end |
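| |
| The protocol of adjusting ``*total`` and signalling whether to recurse can be |
| sketched as below. The example works on a hypothetical ``example_rtx_code`` |
| enumeration and a plain integer value instead of a real ``rtx``; |
| ``EXAMPLE_COSTS_N_INSNS``, its scale factor of 4, and all cost values are |
| assumptions of the sketch. |
| |
```c
#include <stdbool.h>

/* Stand-in for COSTS_N_INSNS; the scale factor is an assumption.  */
#define EXAMPLE_COSTS_N_INSNS(n) ((n) * 4)

/* Stand-in for the rtx codes this sketch distinguishes.  */
enum example_rtx_code { EX_PLUS, EX_MULT, EX_DIV, EX_CONST_INT };

/* Hypothetical cost function shaped like TARGET_RTX_COSTS.  VALUE is
   only meaningful for EX_CONST_INT.  */
bool
example_rtx_costs (enum example_rtx_code code, long value, bool speed,
                   int *total)
{
  switch (code)
    {
    case EX_CONST_INT:
      /* Assumed: constants that fit a signed 16-bit immediate are free;
         larger ones need one extra instruction to materialize.  */
      *total = (value >= -32768 && value <= 32767)
               ? 0 : EXAMPLE_COSTS_N_INSNS (1);
      return true;   /* A constant has no subexpressions.  */

    case EX_MULT:
      /* Assumed: a multiply takes 3 cycles but is a single insn.  */
      *total = speed ? EXAMPLE_COSTS_N_INSNS (3) : EXAMPLE_COSTS_N_INSNS (1);
      return false;  /* Let the caller add the operand costs.  */

    default:
      /* Keep the default estimate already stored in *total.  */
      return false;
    }
}
```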
| |
| .. function:: int TARGET_ADDRESS_COST (rtx address, machine_mode mode, addr_space_t as, bool speed) |
| |
| .. hook-start:TARGET_ADDRESS_COST |
| |
| This hook computes the cost of an addressing mode that contains |
| :samp:`{address}`. If not defined, the cost is computed from |
| the :samp:`{address}` expression and the ``TARGET_RTX_COSTS`` hook. |
| |
| For most CISC machines, the default cost is a good approximation of the |
| true cost of the addressing mode. However, on RISC machines, all |
| instructions normally have the same length and execution time. Hence |
| all addresses will have equal costs. |
| |
| In cases where more than one form of an address is known, the form with |
| the lowest cost will be used. If multiple forms have the same, lowest, |
| cost, the one that is the most complex will be used. |
| |
| For example, suppose an address that is equal to the sum of a register |
| and a constant is used twice in the same basic block. When this hook |
| is not defined, the address will be computed in a register and memory |
| references will be indirect through that register. On machines where |
| the cost of the addressing mode containing the sum is no higher than |
| that of a simple indirect reference, this will produce an additional |
| instruction and possibly require an additional register. Proper |
| specification of this hook eliminates this overhead for such machines. |
| |
| This hook is never called with an invalid address. |
| |
| On machines where an address involving more than one register is as |
| cheap as an address computation involving only one register, defining |
| ``TARGET_ADDRESS_COST`` to reflect this can cause two registers to |
| be live over a region of code where only one would have been if |
| ``TARGET_ADDRESS_COST`` were not defined in that manner. This effect |
| should be considered in the definition of this hook. Equivalent costs |
| should probably only be given to addresses with different numbers of |
| registers on machines with lots of registers. |
| |
| .. hook-end |
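| |
| The register-plus-constant example and the caution about two-register |
| addresses can be captured in a sketch like the following. It classifies |
| addresses with a hypothetical ``example_addr_form`` enumeration instead of |
| inspecting an ``rtx``, and the cost values are assumptions. |
| |
```c
/* Stand-in classification of address shapes for this sketch.  */
enum example_addr_form
{
  EX_ADDR_REG,             /* (reg)                 */
  EX_ADDR_REG_PLUS_CONST,  /* (plus (reg) (const))  */
  EX_ADDR_REG_PLUS_REG     /* (plus (reg) (reg))    */
};

/* Hypothetical cost function shaped like TARGET_ADDRESS_COST.  */
int
example_address_cost (enum example_addr_form form)
{
  switch (form)
    {
    case EX_ADDR_REG:
    case EX_ADDR_REG_PLUS_CONST:
      /* Assumed: adding a constant offset is free, so the sum need
         not be computed into a separate register first.  */
      return 1;

    case EX_ADDR_REG_PLUS_REG:
      /* Assumed: equally fast in hardware, but priced higher because
         it keeps a second register live.  */
      return 2;
    }

  return 2;  /* Be conservative about any other form.  */
}
```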
| |
| .. function:: int TARGET_INSN_COST (rtx_insn *insn, bool speed) |
| |
| .. hook-start:TARGET_INSN_COST |
| |
| This target hook describes the relative costs of RTL instructions. |
| |
| In implementing this hook, you can use the construct |
| ``COSTS_N_INSNS (n)`` to specify a cost equal to :samp:`{n}` fast |
| instructions. |
| |
| When optimizing for code size, i.e. when ``speed`` is |
| false, this target hook should be used to estimate the relative |
| size cost of an instruction, again relative to ``COSTS_N_INSNS``. |
| |
| .. hook-end |
| |
| .. function:: unsigned int TARGET_MAX_NOCE_IFCVT_SEQ_COST (edge e) |
| |
| .. hook-start:TARGET_MAX_NOCE_IFCVT_SEQ_COST |
| |
| This hook returns a value in the same units as ``TARGET_RTX_COSTS``, |
| giving the maximum acceptable cost for a sequence generated by the RTL |
| if-conversion pass when conditional execution is not available. |
| The RTL if-conversion pass attempts to convert conditional operations |
| that would require a branch to a series of unconditional operations and |
| :samp:`mov{mode}cc` insns. This hook returns the maximum cost of the |
| unconditional instructions and the :samp:`mov{mode}cc` insns. |
| RTL if-conversion is cancelled if the cost of the converted sequence |
| is greater than the value returned by this hook. |
| |
| ``e`` is the edge from the basic block containing the conditional |
| branch to the basic block that would be executed if the condition |
| were true. |
| |
| The default implementation of this hook uses the |
| ``max-rtl-if-conversion-[un]predictable`` parameters if they are set, |
| and uses a multiple of ``BRANCH_COST`` otherwise. |
| |
| .. hook-end |
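| |
| The acceptance test described above can be sketched as follows for the case |
| where no parameter is set. The function takes a predictability flag instead |
| of an ``edge``; the ``EXAMPLE_`` macros, the function names and the |
| multiplier are hypothetical and are not GCC's actual defaults. |
| |
```c
#include <stdbool.h>

/* Stand-ins for COSTS_N_INSNS and BRANCH_COST; values are assumed.  */
#define EXAMPLE_COSTS_N_INSNS(n) ((n) * 4)
#define EXAMPLE_BRANCH_COST(speed_p, predictable_p) ((predictable_p) ? 1 : 3)

/* Hypothetical limit shaped like TARGET_MAX_NOCE_IFCVT_SEQ_COST: a
   multiple of the branch cost, in the same units as the rtx costs.  */
unsigned int
example_max_noce_ifcvt_seq_cost (bool predictable_p)
{
  /* Assumed multiplier: replacing a branch is worth two fast insns
     per unit of branch cost.  */
  return EXAMPLE_BRANCH_COST (true, predictable_p) * EXAMPLE_COSTS_N_INSNS (2);
}

/* If-conversion goes ahead unless the converted sequence costs more
   than the limit.  */
bool
example_ifcvt_allowed_p (unsigned int seq_cost, bool predictable_p)
{
  return seq_cost <= example_max_noce_ifcvt_seq_cost (predictable_p);
}
```
| |
| With these values, a poorly predicted branch justifies a replacement |
| sequence three times as costly as a well-predicted one does. |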
| |
| .. function:: bool TARGET_NOCE_CONVERSION_PROFITABLE_P (rtx_insn *seq, struct noce_if_info *if_info) |
| |
| .. hook-start:TARGET_NOCE_CONVERSION_PROFITABLE_P |
| |
| This hook returns true if the instruction sequence ``seq`` is a good |
| candidate as a replacement for the if-convertible sequence described in |
| ``if_info``. |
| |
| .. hook-end |
| |
| .. function:: bool TARGET_NEW_ADDRESS_PROFITABLE_P (rtx memref, rtx_insn * insn, rtx new_addr) |
| |
| .. hook-start:TARGET_NEW_ADDRESS_PROFITABLE_P |
| |
| Return ``true`` if it is profitable to replace the address in |
| :samp:`{memref}` with :samp:`{new_addr}`. This allows targets to prevent the |
| scheduler from undoing address optimizations. The instruction containing the |
| memref is :samp:`{insn}`. The default implementation returns ``true``. |
| |
| .. hook-end |
| |
| .. function:: bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void) |
| |
| .. hook-start:TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P |
| |
| This predicate controls the use of the eager delay slot filler to disallow |
| speculatively executed instructions being placed in delay slots. Targets |
| such as certain MIPS architectures possess branches both with and without |
| delay slots. As the eager delay slot filler can decrease performance, |
| disabling it is beneficial when ordinary branches are available. Use of |
| delay slot branches filled using the basic filler is often still desirable |
| as the delay slot can hide a pipeline bubble. |
| |
| .. hook-end |
| |
| .. function:: HOST_WIDE_INT TARGET_ESTIMATED_POLY_VALUE (poly_int64 val, poly_value_estimate_kind kind) |
| |
| .. hook-start:TARGET_ESTIMATED_POLY_VALUE |
| |
| Return an estimate of the runtime value of :samp:`{val}`, for use in |
| things like cost calculations or profiling frequencies. :samp:`{kind}` is used |
| to ask for the minimum, maximum, and likely estimates of the value through |
| the ``POLY_VALUE_MIN``, ``POLY_VALUE_MAX`` and |
| ``POLY_VALUE_LIKELY`` values. The default |
| implementation returns the lowest possible value of :samp:`{val}`. |
| |
| .. hook-end |
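| |
| The three kinds of estimate can be illustrated with a simplified model of a |
| polynomial value ``coeffs[0] + coeffs[1] * x``, where ``x`` is a |
| non-negative quantity known only at run time. The structure, the |
| enumeration, the function name and the bounds on ``x`` are all |
| assumptions of this sketch, not GCC definitions. |
| |
```c
/* Stand-in for poly_value_estimate_kind.  */
enum example_poly_value_estimate_kind
{
  EX_POLY_VALUE_MIN,
  EX_POLY_VALUE_MAX,
  EX_POLY_VALUE_LIKELY
};

/* Simplified stand-in for poly_int64 with one runtime indeterminate:
   the value is coeffs[0] + coeffs[1] * x, with x >= 0.  */
struct example_poly_int64
{
  long long coeffs[2];
};

/* Hypothetical estimator shaped like TARGET_ESTIMATED_POLY_VALUE.
   Assumes coeffs[1] is non-negative, so x == 0 gives the minimum.  */
long long
example_estimated_poly_value (struct example_poly_int64 val,
                              enum example_poly_value_estimate_kind kind)
{
  /* Assumed properties of the target: x never exceeds 15, and the
     hardware being tuned for most likely has x == 1.  */
  const long long x_max = 15;
  const long long x_likely = 1;

  switch (kind)
    {
    case EX_POLY_VALUE_MIN:
      return val.coeffs[0];
    case EX_POLY_VALUE_MAX:
      return val.coeffs[0] + val.coeffs[1] * x_max;
    case EX_POLY_VALUE_LIKELY:
      return val.coeffs[0] + val.coeffs[1] * x_likely;
    }

  return val.coeffs[0];  /* Fall back to the lowest possible value.  */
}
```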