)]}'
{
  "commit": "33179c5b57c9fc89c061deb1d387f972b8604b49",
  "tree": "8b68cd0f5c846575def1369d11b6b8f061062173",
  "parents": [
    "7d1fd6713582f2436a9a569ac7a8700c75e35a42"
  ],
  "author": {
    "name": "Pedro Alves",
    "email": "pedro@palves.net",
    "time": "Fri Dec 01 17:45:21 2023 +0000"
  },
  "committer": {
    "name": "Pedro Alves",
    "email": "pedro@palves.net",
    "time": "Wed Dec 20 21:20:20 2023 +0000"
  },
  "message": "Fix handling of vanishing threads that were stepping/stopping\n\nDownstream, AMD is carrying a testcase\n(gdb.rocm/continue-over-kernel-exit.exp) that exposes a couple issues\nwith the amd-dbgapi target\u0027s handling of exited threads.  The test\ncan\u0027t be added upstream yet, unfortunately, due to dependency on DWARF\nextensions that can\u0027t be upstreamed yet.  However, it can be found on\nthe mailing list on the same series as this patch.\n\nThe test spawns a kernel with a number of waves.  The waves do nothing\nbut exit.  There is a breakpoint on the s_endpgm instruction.  Once\nthat breakpoint is hit, the test issues a \"continue\" command.  We\nshould see one breakpoint hit per wave, and then the whole program\nexiting.  We do see that, however we also see this:\n\n [New AMDGPU Wave ?:?:?:1 (?,?,?)/?]\n [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]\n *repeat for other waves*\n ...\n [Thread 0x7ffff626f640 (LWP 3048491) exited]\n [Thread 0x7fffeb7ff640 (LWP 3048488) exited]\n [Inferior 1 (process 3048475) exited normally]\n\nThat \"New AMDGPU Wave\" output comes from infrun.c itself adding the\nthread to the GDB thread list, because it got an event for a thread\nnot on the thread list yet.  The output shows \"?\"s instead of proper\ncoordinates, because the event was a TARGET_WAITKIND_THREAD_EXITED,\ni.e., the wave was already gone when infrun.c added the thread to the\nthread list.\n\nThat shouldn\u0027t ever happen for the amd-dbgapi target, threads should\nonly ever be added by the backend.\n\nNote \"New AMDGPU Wave ?:?:?:1\" is for wave 1.  What happened was that\nwave 1 terminated previously, and a previous call to\namd_dbgapi_target::update_thread_list() noticed the wave had vanished\nand removed it from the GDB thread list.  However, because the wave\nwas stepping when it terminated (due to the displaced step over the\ns_endpgm) instruction, it is guaranteed that the amd-dbgapi library\nqueues a WAVE_COMMAND_TERMINATED event for the exit.\n\nWhen we process that WAVE_COMMAND_TERMINATED event, in\namd-dbgapi-target.c:process_one_event, we return it to the core as a\nTARGET_WAITKIND_THREAD_EXITED event:\n\n static void\n process_one_event (amd_dbgapi_event_id_t event_id,\n\t\t    amd_dbgapi_event_kind_t event_kind)\n {\n ...\n\t if (status \u003d\u003d AMD_DBGAPI_STATUS_ERROR_INVALID_WAVE_ID\n\t     \u0026\u0026 event_kind \u003d\u003d AMD_DBGAPI_EVENT_KIND_WAVE_COMMAND_TERMINATED)\n\t   ws.set_thread_exited (0);\n ...\n }\n\nRecall the wave is already gone from the GDB thread list.  So when GDB\nsees that TARGET_WAITKIND_THREAD_EXITED event for a thread it doesn\u0027t\nknow about, it adds the thread to the thread list, resulting in that:\n\n [New AMDGPU Wave ?:?:?:1 (?,?,?)/?]\n\nand then, because it was a TARGET_WAITKIND_THREAD_EXITED event, GDB\nmarks the thread exited right afterwards:\n\n [AMDGPU Wave ?:?:?:1 (?,?,?)/? exited]\n\nThe fix is to make amd_dbgapi_target::update_thread_list() _not_\ndelete vanishing waves iff they were stepping or in progress of being\nstopped.  These two cases are the ones dbgapi guarantees will result\nin a WAVE_COMMAND_TERMINATED event if the wave terminates:\n\n  /**\n   * A command for a wave was not able to complete because the wave has\n   * terminated.\n   *\n   * Commands that can result in this event are ::amd_dbgapi_wave_stop and\n   * ::amd_dbgapi_wave_resume in single step mode.  Since the wave terminated\n   * before stopping, this event will be reported instead of\n   * ::AMD_DBGAPI_EVENT_KIND_WAVE_STOP.\n   *\n   * The wave that terminated is available by the ::AMD_DBGAPI_EVENT_INFO_WAVE\n   * query.  However, the wave will be invalid since it has already terminated.\n   * It is the client\u0027s responsibility to know what command was being performed\n   * and was unable to complete due to the wave terminating.\n   */\n  AMD_DBGAPI_EVENT_KIND_WAVE_COMMAND_TERMINATED \u003d 2,\n\nAs the comment says, it\u0027s GDB\u0027s responsability to know whether the\nwave was stepping or being stopped.  Since we now have a wave_info map\nwith one entry for each wave, that seems like the place to store that\ninformation.  However, I still decided to put all the coordinate\ninformation in its own structure.  I.e., basically renamed the\nexisting wave_info to wave_coordinates, and then added a new wave_info\nstructure that holds the new state, plus a wave_coordinates object.\nThis seemed cleaner as there are places where we only need to\ninstantiate a wave_coordinates object.\n\nThere\u0027s an extra twist.  The testcase also exercises stopping at a new\nkernel right after the first kernel fully exits.  In that scenario, we\nwere hitting this assertion after the first kernel fully exits and the\nhit of the breakpoint at the second kernel is handled:\n\n [amd-dbgapi] process_event_queue: Pulled event from dbgapi: event_id.handle \u003d 26, event_kind \u003d WAVE_STOP\n [amd-dbgapi-lib] suspending queue_3, queue_2, queue_1 (refresh wave list)\n ../../src/gdb/amd-dbgapi-target.c:1625: internal-error: amd_dbgapi_thread_deleted: Assertion `it !\u003d info-\u003ewave_info_map.end ()\u0027 failed.\n A problem internal to GDB has been detected,\n further debugging may prove unreliable.\n\nThis is the exact same problem as above, just a different\nmanifestation.  In this scenario, we end up in update_thread_list\nsuccessfully deleting the exited thread (because it was no longer the\ncurrent thread) that was incorrectly added by infrun.c.  Because it\nwas added by infrun.c and not by amd-dbgapi-target.c:add_gpu_thread,\nit doesn\u0027t have an entry in the wave_info map, so\namd_dbgapi_thread_deleted trips on this assertion:\n\n      gdb_assert (it !\u003d info-\u003ewave_info_map.end ());\n\nhere:\n\n  ...\n  -\u003e stop_all_threads\n   -\u003e update_thread_list\n    -\u003e target_update_thread_list\n     -\u003e amd_dbgapi_target::update_thread_list\n      -\u003e thread_db_target::update_thread_list\n       -\u003e linux_nat_target::update_thread_list\n\t-\u003e delete_exited_threads\n\t -\u003e delete_thread\n\t  -\u003e delete_thread_1\n\t   -\u003e gdb::observers::observable\u003cthread_info*\u003e::notify\n\t    -\u003e amd_dbgapi_thread_deleted\n\t     -\u003e internal_error_loc\n\nThe testcase thus tries both running to exit after the first kernel\nexits, and running to a breakpoint in a second kernel after the first\nkernel exits.\n\nApproved-By: Lancelot Six \u003clancelot.six@amd.com\u003e (amdgpu)\nChange-Id: I43a66f060c35aad1fe0d9ff022ce2afd0537f028\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "be5e9ff93019305e229ce6d1c4c66271de5d8d7d",
      "old_mode": 33188,
      "old_path": "gdb/amd-dbgapi-target.c",
      "new_id": "9aa8db2a0a1a1524bbbb9d55176ce2a757362c96",
      "new_mode": 33188,
      "new_path": "gdb/amd-dbgapi-target.c"
    }
  ]
}
