{
  "commit": "a36f0edbec6f2ef36792eda245fa7e512d032872",
  "tree": "2cf82354b2f0ef9e457cca04c9d25adbb3a91410",
  "parents": [
    "c04011b3eac8475debd8c0add7a881becb598377"
  ],
  "author": {
    "name": "Arsen Arsenović",
    "email": "aarsenovic@baylibre.com",
    "time": "Thu Apr 23 10:47:38 2026 +0000"
  },
  "committer": {
    "name": "Arsen Arsenović",
    "email": "arsen@gcc.gnu.org",
    "time": "Mon Jun 01 11:02:23 2026 +0200"
  },
  "message": "libgomp: let plugins handle allocating the target variable table\n\nIn my examination of BabelStream results on AMD GCN, I\u0027ve found that,\nfor each BabelStream kernel execution, we spend significant time in\nallocating and initializing memory in gomp_map_vars (~55µs, whereas the\nactual BabelStream code executes in ~746µs, meaning we increase the time\nBabelStream measures by 7% just on that).\n\nUpon further examination, I\u0027ve found that the only reason gomp_map_vars\ndecides to allocate and map any memory in the first place is because it\nis constructing the table of pointers to variables on the target, which\nI\u0027ve taken to calling the \"target variable table\".  Given that the GCN\nplugin already must perform some memory allocation before starting up a\nkernel, namely to allocate kernel arguments, it would be beneficial if\nwe could merge this allocation with the kernel arguments allocation.\n\nIn addition, since the kernel arguments live in host memory, populating\nthem can be performed using string functions, without any need to call\nfor expensive host2dev copies.\n\nThis patch introduces an opaque type for \"offload sessions\".  This type\nis defined by each plugin and allows it to store data related to a\nsingle offload job.  The sessions are allocated and managed by libgomp,\nand initialized and utilized by the plugin.  
Their lifetime starts with\na call to GOMP_OFFLOAD_session_start, and ends with\nGOMP_OFFLOAD_{openacc_{async_,}exec,{async_,}run}.\n\nThe patch then uses this framework to make management of the target\nvariable table more flexible: the plugin may elect to implement\nGOMP_OFFLOAD_session_allocate_target_var_table, which allows the plugin\nto attempt to allocate the target variable table in host memory.\n\nIf it fails, or if the plugin does not provide this function, libgomp\nwill perform this allocation as it does today - in target memory - and\ntell the session about it using\nGOMP_OFFLOAD_session_set_target_var_table.\n\nIn the case of AMD GCN, upon a call to\nGOMP_OFFLOAD_session_allocate_target_var_table, the plugin will\nimmediately allocate kernel arguments with enough space for the target\nvariable table, no matter what size the plugin asks for[1], and return\nthat pointer to libgomp.\n\nThis results in the runtime of gomp_map_vars effectively disappearing\nfrom traces.\n\n[1] It may be beneficial to limit this, to some fixed amount, to make it\n    so that the future allocation cache has a higher cache hit rate.  It\n    may also depend on whether hsa_memory_allocate for kernel arguments\n    takes runtime proportional to the number of bytes it needs to\n    allocate.\n\ninclude/ChangeLog:\n\n\t* gomp-constants.h (GOMP_VERSION): Bump.  
Signature of\n\tGOMP_OFFLOAD_run et al changed.\n\nlibgomp/ChangeLog:\n\n\t* libgomp-plugin.h (GOMP_OFFLOAD_run, GOMP_OFFLOAD_exec)\n\t(GOMP_OFFLOAD_async_run, GOMP_OFFLOAD_openacc_async_exec): Pass\n\tsession in place of target variable table and devices.\n\t(struct gomp_offload_session): New.\n\t(GOMP_OFFLOAD_session_size): New\n\t(GOMP_OFFLOAD_check_session_struct): New.\n\t(GOMP_OFFLOAD_session_boilerplate): New.\n\t(GOMP_OFFLOAD_session_start): New.\n\t(GOMP_OFFLOAD_session_allocate_target_var_table): New.\n\t(GOMP_OFFLOAD_session_set_target_var_table): New.\n\t* libgomp.h (struct gomp_target_task): Add offload_session\n\tfield.\n\t(struct gomp_device_descr): Add offload session management\n\tfunctions.\n\t(gomp_offload_session_new): New.\n\t(goacc_map_vars): Add SESSION to signature\n\t* oacc-host.c (struct gomp_offload_session): Define, for host\n\toffload fallback case.\n\t(host_session_size): New.  Implements GOMP_OFFLOAD_session_size.\n\t(host_session_start): New.  Implements\n\tGOMP_OFFLOAD_session_start.\n\t(host_session_set_target_var_table): New.  Implements\n\tGOMP_OFFLOAD_session_set_target_var_table.\n\t(host_run): Adjust to match GOMP_OFFLOAD_run.\n\t(host_openacc_exec): Adjust to match GOMP_OFFLOAD_openacc_exec.\n\t(host_openacc_async_exec): Adjust to match\n\tGOMP_OFFLOAD_openacc_async_exec.\n\t* oacc-mem.c (acc_map_data): Adjust call to goacc_map_vars.\n\t(goacc_enter_datum): Ditto.\n\t(goacc_enter_data_internal): Ditto.\n\t* oacc-parallel.c (GOACC_parallel_keyed): Allocate and pass\n\toffload session.\n\t(GOACC_data_start): Adjust call to goacc_map_vars.\n\t* plugin/plugin-gcn.c (struct kernel_dispatch): Remove\n\tkernarg_cache_node.\n\t(struct kernargs): Add a flexible array member for the target\n\tvariable table.\n\t(struct kernel_launch): Store an offload session rather than\n\ttarget var. 
table pointer.\n\t(print_kernel_dispatch): Receive kernargs as parameter.\n\t(struct gomp_offload_session): Define.\n\t(init_session): New.\n\t(GOMP_OFFLOAD_session_start): Implement, using init_session.\n\t(release_session): New.\n\t(alloc_kernargs_on_agent): Rename to...\n\t(allocate_session_kernargs): ... this, store result in\n\tpassed-in SESSION, and allocate extra room for target variable\n\ttable (rounding it up to nearest multiple of 64 pointers).\n\t(GOMP_OFFLOAD_session_allocate_target_var_table): Implement\n\tusing the previous function.\n\t(GOMP_OFFLOAD_session_set_target_var_table): Ditto.\n\t(create_kernel_dispatch): Remove kernarg allocation, instead\n\treceiving it as an argument.\n\t(release_kernel_dispatch): Receive kernargs as an argument,\n\tdon\u0027t release them.\n\t(run_kernel): Adjust to use sessions.\n\t(destroy_module): Ditto.\n\t(GOMP_OFFLOAD_load_image): Ditto.\n\t(execute_queue_entry): Adjust to match changed struct\n\tkernel_launch.\n\t(queue_push_launch): Ditto.\n\t(gcn_exec): Receive and pass along session.\n\t(GOMP_OFFLOAD_run): Ditto.\n\t(GOMP_OFFLOAD_async_run): Ditto.\n\t(GOMP_OFFLOAD_openacc_exec): Ditto.\n\t(GOMP_OFFLOAD_openacc_async_exec): Ditto.\n\t* plugin/plugin-nvptx.c (struct gomp_offload_session): Define.\n\t(GOMP_OFFLOAD_session_start): Implement.\n\t(GOMP_OFFLOAD_session_set_target_var_table): Implement.\n\t(GOMP_OFFLOAD_openacc_exec): Adjust to receive session.\n\t(GOMP_OFFLOAD_openacc_async_exec): Ditto.\n\t(GOMP_OFFLOAD_run): Ditto.\n\t* target.c (gomp_get_tvt_size): Extract helper from...\n\t(gomp_map_vars_internal): ... here.  Receive SESSION, iff doing\n\ttarget offload.  
Use a target variable table on the host\n\tallocated by GOMP_OFFLOAD_session_allocate_target_var_table if\n\tpossible, or call GOMP_OFFLOAD_session_set_target_var_table with\n\tan allocated device pointer otherwise.\n\t(gomp_map_vars): Update to pass along session.\n\t(goacc_map_vars): Ditto.\n\t(GOMP_target): Allocate and pass along session.\n\t(GOMP_target_ext): Ditto.\n\t(gomp_target_data_fallback): Adjust call to gomp_map_vars.\n\t(GOMP_target_data): Ditto.\n\t(GOMP_target_data_ext): Ditto.\n\t(GOMP_target_enter_exit_data): Ditto.\n\t(gomp_target_task_fn): Start and pass along session, the storage\n\tfor which is allocated by gomp_create_target_task.\n\t(DLSYM2): Rename from DLSYM, adding a new parameter for the\n\tvariable to populate, akin to DLSYM_OPT.\n\t(DLSYM): Delegate to DLSYM2.\n\t(gomp_load_plugin_for_device): Populate session-related fields.\n\t* task.c (gomp_create_target_task): Allocate enough storage for\n\tan offload session.\n\t* testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c: New test.\n\t* testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c: New test.\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "4cfda5ee063d7ba7042ea6dfee4e455c2e2d754a",
      "old_mode": 33188,
      "old_path": "include/gomp-constants.h",
      "new_id": "def838a949d45c72f766993c210a0a354a22aeb1",
      "new_mode": 33188,
      "new_path": "include/gomp-constants.h"
    },
    {
      "type": "modify",
      "old_id": "bb4d577b66d7d2b9f527f94cab0a98b590401522",
      "old_mode": 33188,
      "old_path": "libgomp/libgomp-plugin.h",
      "new_id": "39c322255fd6d4019f312e70601a7bbd137a6c36",
      "new_mode": 33188,
      "new_path": "libgomp/libgomp-plugin.h"
    },
    {
      "type": "modify",
      "old_id": "4b31564a6b24484e91a9111ca6651a970e4dd339",
      "old_mode": 33188,
      "old_path": "libgomp/libgomp.h",
      "new_id": "be496560591aece931f56ccf11bd1e3c5eaa1312",
      "new_mode": 33188,
      "new_path": "libgomp/libgomp.h"
    },
    {
      "type": "modify",
      "old_id": "028a5c943b7eba6e74d4009ec4f316a4523d80ee",
      "old_mode": 33188,
      "old_path": "libgomp/oacc-host.c",
      "new_id": "2cc7e6464eae7ef844cc7050d6225ed4e6d58e46",
      "new_mode": 33188,
      "new_path": "libgomp/oacc-host.c"
    },
    {
      "type": "modify",
      "old_id": "738281f5701c32641fdfd538d0c1d030fae64a7e",
      "old_mode": 33188,
      "old_path": "libgomp/oacc-mem.c",
      "new_id": "5601daf13957ca50b192537c03938ad65d8806f5",
      "new_mode": 33188,
      "new_path": "libgomp/oacc-mem.c"
    },
    {
      "type": "modify",
      "old_id": "9f48c8b7f64482051917aa4827cace099cbac79f",
      "old_mode": 33188,
      "old_path": "libgomp/oacc-parallel.c",
      "new_id": "08f969d7d7ac2cf4137207ef1e8630dc2459bf2b",
      "new_mode": 33188,
      "new_path": "libgomp/oacc-parallel.c"
    },
    {
      "type": "modify",
      "old_id": "e415eea273dfdaf717abad94bc00f10d70f8e7ad",
      "old_mode": 33188,
      "old_path": "libgomp/plugin/plugin-gcn.c",
      "new_id": "cf5f7bf3415c96b1d81e12ecc3e262149b2cf86d",
      "new_mode": 33188,
      "new_path": "libgomp/plugin/plugin-gcn.c"
    },
    {
      "type": "modify",
      "old_id": "377893e6b703b0e0dcc9a7aea4d2c65cb59db4f9",
      "old_mode": 33188,
      "old_path": "libgomp/plugin/plugin-nvptx.c",
      "new_id": "34203d14daf837fa83f5bcd1c66406079f47d92d",
      "new_mode": 33188,
      "new_path": "libgomp/plugin/plugin-nvptx.c"
    },
    {
      "type": "modify",
      "old_id": "7543065d106980fdb6e4521b3f6e79f929681c20",
      "old_mode": 33188,
      "old_path": "libgomp/target.c",
      "new_id": "0c24072019d8a59e13d08f16ed8198af9ee096ad",
      "new_mode": 33188,
      "new_path": "libgomp/target.c"
    },
    {
      "type": "modify",
      "old_id": "cbba28516e3fa074c83bd84e94eababb34285e33",
      "old_mode": 33188,
      "old_path": "libgomp/task.c",
      "new_id": "89dafb8722086d5525a8917e14f2a082c531f00a",
      "new_mode": 33188,
      "new_path": "libgomp/task.c"
    },
    {
      "type": "add",
      "old_id": "0000000000000000000000000000000000000000",
      "old_mode": 0,
      "old_path": "/dev/null",
      "new_id": "7494c5a5f4c8ccc58773ea0ea11b7b258a0165a6",
      "new_mode": 33188,
      "new_path": "libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-no-tvt-alloc.c"
    },
    {
      "type": "add",
      "old_id": "0000000000000000000000000000000000000000",
      "old_mode": 0,
      "old_path": "/dev/null",
      "new_id": "ab5ed2dc43369ff616e944f3b6ce193c4604dac0",
      "new_mode": 33188,
      "new_path": "libgomp/testsuite/libgomp.c-c++-common/gcn-kernel-launch-tvt-alloc.c"
    }
  ]
}
