gnu/gcc/a36f0edbec6f2ef36792eda245fa7e512d032872 libgomp: let plugins handle allocating the target variable table

While examining BabelStream results on AMD GCN, I found that, for each
BabelStream kernel execution, we spend significant time allocating and
initializing memory in gomp_map_vars (~55µs, whereas the actual
BabelStream code executes in ~746µs, meaning this alone increases the
time BabelStream measures by about 7%).

Upon further examination, I found that the only reason gomp_map_vars
allocates and maps any memory in the first place is that it is
constructing the table of pointers to variables on the target, which
I've taken to calling the "target variable table". Given that the GCN
plugin must already perform some memory allocation before starting up a
kernel, namely to allocate kernel arguments, it would be beneficial if
we could merge this allocation with the kernel arguments allocation.
In addition, since the kernel arguments live in host memory, populating
them can be performed using string functions, without any need for
expensive host2dev copies.

This patch introduces an opaque type for "offload sessions". This type
is defined by each plugin and allows it to store data related to a
single offload job. The sessions are allocated and managed by libgomp,
and initialized and utilized by the plugin. Their lifetime starts with
a call to GOMP_OFFLOAD_session_start, and ends with
GOMP_OFFLOAD_{openacc_{async_,}exec,{async_,}run}.
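
The split of responsibilities described above (storage owned by libgomp,
layout owned by the plugin) can be sketched roughly as follows. This is
only an illustration: every demo_* name, the struct fields and the
function signatures are hypothetical and do not reproduce the real
GOMP_OFFLOAD_session_size / GOMP_OFFLOAD_session_start interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Runtime-side view: the session type is opaque, only its name is known.  */
struct demo_session;

/* Plugin-side definition (hypothetical fields, for illustration only).  */
struct demo_session
{
  int device;             /* device the offload job runs on */
  void *target_var_table; /* filled in later, once the table exists */
};

/* Hypothetical plugin hook: tells the runtime how much storage to reserve.  */
size_t
demo_plugin_session_size (void)
{
  return sizeof (struct demo_session);
}

/* Hypothetical plugin hook: initializes storage provided by the runtime.  */
void
demo_plugin_session_start (struct demo_session *s, int device)
{
  assert (s != NULL);
  s->device = device;
  s->target_var_table = NULL;
}

/* Runtime side: allocate the number of bytes the plugin asked for, then
   let the plugin initialize them.  Returns NULL if allocation fails.  */
struct demo_session *
demo_session_new (int device)
{
  struct demo_session *s = malloc (demo_plugin_session_size ());
  if (s == NULL)
    return NULL;
  demo_plugin_session_start (s, device);
  return s;
}
```

In this sketch the caller releases the session with free once the
offload job has been handed to the run/exec entry point.
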
The patch then uses this framework to make management of the target
variable table more flexible: the plugin may elect to implement
GOMP_OFFLOAD_session_allocate_target_var_table, which allows the plugin
to attempt to allocate the target variable table in host memory.
If it fails, or if the plugin does not provide this function, libgomp
will perform this allocation as it does today - in target memory - and
tell the session about it using
GOMP_OFFLOAD_session_set_target_var_table.
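
The allocate-or-fall-back decision described above might look roughly
like the sketch below. It is a simplified model, not the code in
target.c: the demo_* names, the hook signatures and the use of malloc
as a stand-in for a device allocation are all assumptions made for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_session
{
  void *tvt;        /* where the target variable table lives */
  bool tvt_on_host; /* true if the plugin allocated it in host memory */
};

struct demo_plugin
{
  /* Optional hook: NULL if the plugin does not implement it; may also
     return NULL to signal that host allocation failed.  */
  void *(*allocate_tvt) (struct demo_session *, size_t count);
  /* Mandatory hook: tells the session about a table the runtime made.  */
  void (*set_tvt) (struct demo_session *, void *dev_ptr);
};

/* Example hooks a plugin could provide (hypothetical).  */
void *
demo_host_alloc (struct demo_session *s, size_t count)
{
  s->tvt = calloc (count, sizeof (void *));
  s->tvt_on_host = s->tvt != NULL;
  return s->tvt;
}

void *
demo_failing_alloc (struct demo_session *s, size_t count)
{
  (void) s;
  (void) count;
  return NULL; /* pretend the host allocation did not succeed */
}

void
demo_set_tvt (struct demo_session *s, void *dev_ptr)
{
  s->tvt = dev_ptr;
  s->tvt_on_host = false;
}

/* Runtime side: prefer the plugin's host allocation and fall back to a
   "device" allocation (malloc stands in for it here).  */
void *
demo_get_tvt (const struct demo_plugin *p, struct demo_session *s,
	      size_t count)
{
  if (p->allocate_tvt != NULL)
    {
      void *host = p->allocate_tvt (s, count);
      if (host != NULL)
	return host;
    }
  void *dev = malloc (count * sizeof (void *));
  if (dev != NULL)
    p->set_tvt (s, dev);
  return dev;
}
```

Keeping the hook optional means a plugin that gains nothing from a
host-side table only has to implement the "set" side.
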
In the case of AMD GCN, upon a call to
GOMP_OFFLOAD_session_allocate_target_var_table, the plugin will
immediately allocate kernel arguments with enough space for the target
variable table, no matter what size libgomp asks for[1], and return
that pointer to libgomp.

This results in the runtime of gomp_map_vars effectively disappearing
from traces.

[1] It may be beneficial to limit this to some fixed amount, so that
the future allocation cache has a higher hit rate. It may also
depend on whether hsa_memory_allocate for kernel arguments takes
time proportional to the number of bytes it needs to allocate.
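
To illustrate the GCN approach, here is a minimal sketch of kernel
arguments that carry the target variable table in a flexible array
member, with the table capacity rounded up to a multiple of 64 pointers
as the ChangeLog below mentions. The demo_* names and the struct layout
are hypothetical, malloc stands in for the HSA kernarg allocation, and
the treatment of a zero-length table is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Granule taken from the ChangeLog text: a multiple of 64 pointers.  */
#define DEMO_TVT_GRANULE 64

struct demo_kernargs
{
  int fixed_args[4];   /* placeholder for the regular kernel arguments */
  size_t tvt_capacity; /* number of pointer slots actually allocated */
  void *tvt[];         /* flexible array member holding the table */
};

/* Round COUNT up to the next multiple of DEMO_TVT_GRANULE.  */
size_t
demo_round_up_tvt (size_t count)
{
  return (count + DEMO_TVT_GRANULE - 1) / DEMO_TVT_GRANULE
	 * DEMO_TVT_GRANULE;
}

/* One allocation covers both the kernel arguments and the table.  */
struct demo_kernargs *
demo_alloc_kernargs (size_t count)
{
  size_t capacity = demo_round_up_tvt (count);
  struct demo_kernargs *k
    = malloc (sizeof *k + capacity * sizeof (void *));
  if (k == NULL)
    return NULL;
  memset (k->fixed_args, 0, sizeof k->fixed_args);
  k->tvt_capacity = capacity;
  return k;
}

/* The table is in host-visible memory, so a plain memcpy fills it.  */
void
demo_fill_tvt (struct demo_kernargs *k, void *const *ptrs, size_t count)
{
  assert (count <= k->tvt_capacity);
  memcpy (k->tvt, ptrs, count * sizeof (void *));
}
```

Rounding the capacity to a fixed granule keeps the number of distinct
allocation sizes small, which is the property footnote [1] is concerned
with for a future allocation cache.
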
include/ChangeLog:
* gomp-constants.h (GOMP_VERSION): Bump. Signature of
GOMP_OFFLOAD_run et al changed.
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_run, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_async_run, GOMP_OFFLOAD_openacc_async_exec): Pass
session in place of target variable table and devices.
(struct gomp_offload_session): New.
(GOMP_OFFLOAD_session_size): New.
(GOMP_OFFLOAD_check_session_struct): New.
(GOMP_OFFLOAD_session_boilerplate): New.
(GOMP_OFFLOAD_session_start): New.
(GOMP_OFFLOAD_session_allocate_target_var_table): New.
(GOMP_OFFLOAD_session_set_target_var_table): New.
* libgomp.h (struct gomp_target_task): Add offload_session
field.
(struct gomp_device_descr): Add offload session management
functions.
(gomp_offload_session_new): New.
(goacc_map_vars): Add SESSION to signature.
* oacc-host.c (struct gomp_offload_session): Define, for host
offload fallback case.
(host_session_size): New. Implements GOMP_OFFLOAD_session_size.
(host_session_start): New. Implements
GOMP_OFFLOAD_session_start.
(host_session_set_target_var_table): New. Implements
GOMP_OFFLOAD_session_set_target_var_table.
(host_run): Adjust to match GOMP_OFFLOAD_run.
(host_openacc_exec): Adjust to match GOMP_OFFLOAD_openacc_exec.
(host_openacc_async_exec): Adjust to match
GOMP_OFFLOAD_openacc_async_exec.
* oacc-mem.c (acc_map_data): Adjust call to goacc_map_vars.
(goacc_enter_datum): Ditto.
(goacc_enter_data_internal): Ditto.
* oacc-parallel.c (GOACC_parallel_keyed): Allocate and pass
offload session.
(GOACC_data_start): Adjust call to goacc_map_vars.
* plugin/plugin-gcn.c (struct kernel_dispatch): Remove
kernarg_cache_node.
(struct kernargs): Add a flexible array member for the target
variable table.
(struct kernel_launch): Store an offload session rather than
target var. table pointer.
(print_kernel_dispatch): Receive kernargs as parameter.
(struct gomp_offload_session): Define.
(init_session): New.
(GOMP_OFFLOAD_session_start): Implement, using init_session.
(release_session): New.
(alloc_kernargs_on_agent): Rename to...
(allocate_session_kernargs): ... this, store result in
passed-in SESSION, and allocate extra room for target variable
table (rounding it up to nearest multiple of 64 pointers).
(GOMP_OFFLOAD_session_allocate_target_var_table): Implement
using the previous function.
(GOMP_OFFLOAD_session_set_target_var_table): Ditto.
(create_kernel_dispatch): Remove kernarg allocation, instead
receiving it as an argument.
(release_kernel_dispatch): Receive kernargs as an argument,
don't release them.
(run_kernel): Adjust to use sessions.
(destroy_module): Ditto.
(GOMP_OFFLOAD_load_image): Ditto.
(execute_queue_entry): Adjust to match changed struct
kernel_launch.
(queue_push_launch): Ditto.
(gcn_exec): Receive and pass along session.
(GOMP_OFFLOAD_run): Ditto.
(GOMP_OFFLOAD_async_run): Ditto.
(GOMP_OFFLOAD_openacc_exec): Ditto.
(GOMP_OFFLOAD_openacc_async_exec): Ditto.
* plugin/plugin-nvptx.c (struct gomp_offload_session): Define.
(GOMP_OFFLOAD_session_start): Implement.
(GOMP_OFFLOAD_session_set_target_var_table): Implement.
(GOMP_OFFLOAD_openacc_exec): Adjust to receive session.
(GOMP_OFFLOAD_openacc_async_exec): Ditto.
(GOMP_OFFLOAD_run): Ditto.
* target.c (gomp_get_tvt_size): Extract helper from...
(gomp_map_vars_internal): ... here. Receive SESSION, iff doing
target offload. Use a target variable table on the host
allocated by GOMP_OFFLOAD_session_allocate_target_var_table if
possible, or call GOMP_OFFLOAD_session_set_target_var_table with
an allocated device pointer otherwise.
(gomp_map_vars): Update to pass along session.
(goacc_map_vars): Ditto.
(GOMP_target): Allocate and pass along session.
(GOMP_target_ext): Ditto.
(gomp_target_data_fallback): Adjust call to gomp_map_vars.
(GOMP_target_data): Ditto.
(GOMP_target_data_ext): Ditto.
(GOMP_target_enter_exit_data): Ditto.
(gomp_target_task_fn): Start and pass along session, the storage
for which is allocated by gomp_create_target_task.
(DLSYM2): Rename from DLSYM, adding a new parameter for the
variable to populate, akin to DLSYM_OPT.
(DLSYM): Delegate to DLSYM2.
(gomp_load_plugin_for_device): Populate session-related fields.
* task.c (gomp_create_target_task): Allocate enough storage for
an offload session.
* testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c: New test.
* testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c: New test.
12 files changed