ipa-cp: Fix updating of profile counts and self-gen value evaluation
IPA-CP does not do a reasonable job when it is updating profile counts
after it has created clones of recursive functions. This patch
addresses that by:
1. Only updating counts for special-context clones. When a clone is
created for all contexts, the original is going to be dead and the
cgraph machinery has copied counts to the new node, which is the right
thing to do. Therefore updating counts has been moved from
create_specialized_node to decide_about_value.
2. The current profile updating code artificially increased the assumed
old count when the sum of counts of incoming edges to both the
original and new node was bigger than the count of the original
node. This always happened when a self-recursive edge from the clone
was also redirected to the clone, because both the original edge and
its clone had the original high counts. This kludge was removed and
replaced by the next point.
3. When cloning also redirects a self-recursive edge to the clone
itself, new logic has been added to divide the counts brought by such
recursive edges between the original node and the clone. This is
impossible to do well without special knowledge about the function and
which non-recursive entry calls are responsible for what portion of
recursion depth, so the approach taken is rather crude.
For local nodes, we detect the case when the original node is never
called (in the training run at least) with another value and if so,
steal all its counts as if it were dead. If that is not the case, we
try to divide the count brought by recursive edges (or rather not
brought by direct edges) proportionally to the counts brought by
non-recursive edges - but with artificial limits in place so that we
do not take too many or too few, because that was happening with
detrimental effect in mcf_r.
4. When cloning creates extra clones for values brought by a formerly
self-recursive edge with an arithmetic pass-through jump function on
it, as happens in exchange2_r, all such clones are processed at
once rather than one after another. The counts of all such nodes are
distributed evenly (modulo the counts of edges that were non-recursive
even before cloning) and the whole situation is then fixed up so that
the edge counts fit. This is what the new function
update_counts_for_self_gen_clones does.
5. When values brought by a formerly self-recursive edge with an
arithmetic pass-through jump function on it are evaluated, the
heuristics assume that the vast majority of node counts are the result
of recursive calls, so we simply divide those by the number of
clones there would be if we created another one.
6. The mechanisms in init_caller_stats, gather_caller_stats and
get_info_about_necessary_edges were enhanced to gather data required
for the above, and a missing check not to count dead incoming edges was
also added.
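The count division described in point 3 can be sketched as follows. This is a simplified, hypothetical model rather than the code in ipa-cp.c: counts are plain 64-bit integers instead of profile_count, and the function split_recursive_count, its parameters and the percentage limits min_pct/max_pct are illustrative assumptions. The actual limits used by the patch are not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model: counts are plain integers, not GCC's
// profile_count, and the limits below are illustrative, not GCC's.
struct CountSplit {
  uint64_t orig;   // count left with the original node
  uint64_t clone;  // count given to the new clone
};

// orig_count:   count of the original node before cloning
// clone_nonrec: counts of non-recursive edges redirected to the clone
// orig_nonrec:  counts of non-recursive edges staying with the original
// is_local:     hypothetical flag, true if all callers are known
// min_pct/max_pct: assumed bounds (in percent) on the clone's share of the
//                  count not brought by direct edges
CountSplit split_recursive_count(uint64_t orig_count, uint64_t clone_nonrec,
                                 uint64_t orig_nonrec, bool is_local,
                                 uint64_t min_pct, uint64_t max_pct) {
  assert(min_pct <= max_pct && max_pct <= 100);

  // A local node that is never called with another value: take all of its
  // count, as if the original node were dead.
  if (is_local && orig_nonrec == 0)
    return {0, orig_count};

  // The part of the count not brought by direct (non-recursive) edges.
  uint64_t direct = clone_nonrec + orig_nonrec;
  uint64_t rec_part = orig_count > direct ? orig_count - direct : 0;

  // Divide it proportionally to the non-recursive counts...
  uint64_t share = direct ? rec_part * clone_nonrec / direct : 0;
  // ...but never take too little or too much of it.
  uint64_t lo = rec_part * min_pct / 100;
  uint64_t hi = rec_part * max_pct / 100;
  share = std::max(lo, std::min(share, hi));

  // Guard against inconsistent profiles where direct counts exceed the node.
  uint64_t clone = std::min(clone_nonrec + share, orig_count);
  return {orig_count - clone, clone};
}
```

With orig_count 1000 and direct counts of 10 (clone) and 390 (original), the proportional share of the remaining 600 would be only 15, so the lower limit of 10% lifts it to 60.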
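The even distribution from point 4 can be approximated by the following hypothetical sketch. distribute_self_gen_counts is an invented name, not GCC's update_counts_for_self_gen_clones, and it only performs the even split. It does not attempt the subsequent fix-up of edge counts.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not GCC's update_counts_for_self_gen_clones.
// total:  count of the original node before cloning
// nonrec: nonrec[0] is the count brought to the original node, nonrec[i]
//         (i > 0) the count brought to the i-th self-generated clone, by
//         edges that were non-recursive even before cloning
// Returns the new count of each node, in the same order.
std::vector<uint64_t> distribute_self_gen_counts(
    uint64_t total, const std::vector<uint64_t>& nonrec) {
  assert(!nonrec.empty());

  uint64_t nonrec_sum = 0;
  for (uint64_t c : nonrec)
    nonrec_sum += c;

  // Whatever direct edges do not explain is attributed to recursion and
  // split evenly among all the nodes.
  uint64_t rec_part = total > nonrec_sum ? total - nonrec_sum : 0;
  uint64_t n = nonrec.size();

  std::vector<uint64_t> result(n);
  for (uint64_t i = 0; i < n; ++i)
    result[i] = nonrec[i] + rec_part / n;
  // Give the rounding remainder to the original node so nothing is lost.
  result[0] += rec_part % n;
  return result;
}
```

Because the remainder goes to the original node, the returned counts sum to total whenever the direct edges do not already exceed it.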
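The goodness estimate of point 5 reduces to a single division. In this hypothetical sketch, the divisor is read literally as the number of existing self-generated clones plus the one that would be created. Whether ipa-cp.c also counts the original node is not stated above, so treat the divisor as an assumption. The function name estimate_self_gen_count is likewise invented.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the estimate from point 5: assume (nearly) all of
// node_count comes from recursive calls and would be shared evenly by the
// existing self-generated clones plus the one being considered.
uint64_t estimate_self_gen_count(uint64_t node_count,
                                 uint64_t existing_clones) {
  uint64_t would_be_clones = existing_clones + 1;
  assert(would_be_clones != 0);  // only fails if existing_clones wrapped
  return node_count / would_be_clones;
}
```

Unlike summing the counts of the formerly self-recursive edges, which tend to be high, this estimate shrinks as more clones are created.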
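Point 6 can be illustrated with a hypothetical, simplified version of caller-statistics gathering. Edge, CallerStats and gather_stats are stand-ins invented for this sketch, not GCC's cgraph_edge or struct caller_statistics. The field names mirror the ChangeLog entry (rec_count_sum, n_nonrec_calls, itself), but their exact semantics in ipa-cp.c may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an incoming call-graph edge.
struct Edge {
  int caller;      // id of the calling node
  uint64_t count;  // profile count of the call
  bool dead;       // true if the edge is known to be dead
};

// Hypothetical stand-in for the caller statistics.
struct CallerStats {
  uint64_t count_sum = 0;      // counts of calls not coming from `itself`
  uint64_t rec_count_sum = 0;  // counts of calls coming from `itself`
  int n_calls = 0;             // all live calls
  int n_nonrec_calls = 0;      // live calls not coming from `itself`
  int itself = -1;             // node whose self-calls are kept apart; -1 = none
};

void gather_stats(const std::vector<Edge>& callers, CallerStats& stats) {
  for (const Edge& e : callers) {
    assert(e.caller >= 0);
    if (e.dead)
      continue;  // do not count dead incoming edges
    stats.n_calls++;
    if (stats.itself >= 0 && e.caller == stats.itself) {
      stats.rec_count_sum += e.count;
    } else {
      stats.count_sum += e.count;
      stats.n_nonrec_calls++;
    }
  }
}
```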
2021-10-15 Martin Jambor <email@example.com>
* ipa-cp.c (struct caller_statistics): New fields rec_count_sum,
n_nonrec_calls and itself, document all fields.
(init_caller_stats): Initialize the above new fields.
(gather_caller_stats): Gather self-recursive counts and the number of calls.
(get_info_about_necessary_edges): Gather counts of self-recursive and
other edges bringing in the requested value separately.
(dump_profile_updates): Rework to dump info about a single node only.
(lenient_count_portion_handling): New function.
(struct gather_other_count_struct): New type.
(gather_count_of_non_rec_edges): New function.
(struct desc_incoming_count_struct): New type.
(analyze_clone_icoming_counts): New function.
(update_specialized_profile): Adjust call to dump_profile_updates.
(create_specialized_node): Do not update profiling info.
(decide_about_value): New parameter self_gen_clones, either push new
clones into it or update their profile counts. For self-recursively
generated values, use a portion of the node count instead of count
from self-recursive edges to estimate goodness.
(decide_whether_version_node): Gather clones for self-generated values
in a new vector, update their profiles at once at the end.
1 file changed