)]}'
{
  "commit": "b4f36efbd7cc0abc6cbbf74b9f36b4739b037fd0",
  "tree": "5b5b8cfa176b4bf124c8864dbab192ac963b5280",
  "parents": [
    "d5998d61eef27c98baa0307b2302e2475b9a3d64"
  ],
  "author": {
    "name": "Tamar Christina",
    "email": "tamar.christina@arm.com",
    "time": "Wed May 27 10:50:05 2026 +0100"
  },
  "committer": {
    "name": "Tamar Christina",
    "email": "tamar.christina@arm.com",
    "time": "Wed Jun 17 12:51:28 2026 +0100"
  },
  "message": "AArch64: fix the SVE-\u003eSIMD lowering optimization [PR125148]\n\nThe optimization added in g:210d06502f22964c7214586c54f8eb54a6965bfd has an\nimplementation bug which makes it generate bogus code.\n\nThe optimization was supposed to convert SVE loads with a known predicate into\nAdv. SIMD loads without the predicate.\n\nThe current implementation is done at expansion time where the predicate is\nstill clearly available.\n\nIt does this by rewriting the loads to an Adv. SIMD load and then taking a\nparadoxical subreg of the result into an SVE vector.\n\ni.e. (subreg:VNx16QI (reg:QI 111) 0)  for a byte load with a VL1 predicate.\n\nThe issue is that the SVE loads were UNSPEC before and they didn\u0027t get optimized\nby passes like forwprop and cse.  Adv. SIMD loads are.\n\nAs such, in cases where you have such a pattern:\n\nchar[] p \u003d {1,2,3,3};\nload (p, VL1)\n\nwe used to generate\n\n        mov     w0, 1\n        strb    w0, [x19]\n        ptrue   p7.b, vl1\n        ld1b    z30.b, p7/z, [x19]\n\nwhich was dumb, but valid.  With the above optimization the load now gets\neliminated and the constants folded.  However, in particular for scalars,\nAArch64 has an optimization that\u0027s been around for ages in which scalar FPR\nconstants are created using vector broadcasting operations.  It assumes scalars\nare accessed as scalars (as in, in the mode that created them).\n\nSo the above gets optimized to\n\n        movi    v30.8b, 0x1\n\nwhich is invalid.  The original load requires the inactive elements to be zero,\nwhereas by using the paradoxical subreg it\u0027s relying on the implicit (as in,\nnot modelled in RTL) assumption that the load zeros the top bits, but doesn\u0027t\nkeep in mind that the load can be optimized away.\n\nThis patch fixes it by creating a full SVE vector of 0s and writing only the\nvalues we want to set using an INSR. (i.e. using VL2 of bytes writes a short).\n\nIt then provides patterns to optimize this:\n\n1. if it\u0027s still following a load, just emit the load.\n2. if it\u0027s not, then optimize it to a zeroing operation, so e.g. HI mode\n   issues an fmov h0, h0 and so clears the top bits to zero.\n\nI chose this representation because even without the above operations it is\nsemantically valid and will generate correct code.\n\nThe alternative would be to delay this optimization to e.g. combine, however we\nhave two problems there:\n\n1. It\u0027s quite late, so the above constant cases for instance don\u0027t get optimized\n   and we keep the pointless store and loads.\n2. Our RTX costs don\u0027t model predicates, and so it may not accept the\n   combination since the replacement is more expensive.\n\nSo I chose to keep the optimization early, but just replace the paradoxical\nsubreg with a zero-extend.\n\ngcc/ChangeLog:\n\n\tPR target/125148\n\t* config/aarch64/aarch64-sve.md\n\t(*aarch64_vec_shl_insert_into_zero_\u003cmode\u003e,\n\t*aarch64_vec_shl_insert_into_zero_vnx16qi,\n\t*aarch64_vec_shl_insert_from_load_\u003cmode\u003e): New.\n\t* config/aarch64/aarch64.cc (aarch64_emit_load_store_through_mode):\n\tReplace paradoxical subreg with zero-extend.\n\ngcc/testsuite/ChangeLog:\n\n\tPR target/125148\n\t* gcc.target/aarch64/sve/highway_run.c: New test.\n\n(cherry picked from commit a6ee91793b9f4d28ccd3fcc6f607f646d305a39e)\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "019630eb8d21941b0ca718838ae6ef23e8aaedab",
      "old_mode": 33188,
      "old_path": "gcc/config/aarch64/aarch64-sve.md",
      "new_id": "e7d98c3754f157efc7711395364e0533775eebdc",
      "new_mode": 33188,
      "new_path": "gcc/config/aarch64/aarch64-sve.md"
    },
    {
      "type": "modify",
      "old_id": "6b5196b21eb5ec575c391e297213eab5277a3e98",
      "old_mode": 33188,
      "old_path": "gcc/config/aarch64/aarch64.cc",
      "new_id": "05e582220c1992ac3c389b9bdd2d05ea1e84b644",
      "new_mode": 33188,
      "new_path": "gcc/config/aarch64/aarch64.cc"
    },
    {
      "type": "add",
      "old_id": "0000000000000000000000000000000000000000",
      "old_mode": 0,
      "old_path": "/dev/null",
      "new_id": "b73fd51c63bed6f9b4e710546924ca60736d10e5",
      "new_mode": 33188,
      "new_path": "gcc/testsuite/gcc.target/aarch64/sve/highway_run.c"
    }
  ]
}
