analyzer: revamp of heap-allocated regions [PR106473]

PR analyzer/106473 reports a false positive from -Wanalyzer-malloc-leak
on:

  void foo(char **args[], int *argc) {
      *argc = 1;
      (*args)[0] = __builtin_malloc(42);
  }

The issue is that at the write to *argc we don't know whether argc could
point within *args, and so we conservatively set *args to be unknown.
At the write "(*args)[0] = __builtin_malloc(42)" we have the result of
the allocation written through an unknown pointer, so we mark the
heap_allocated_region as having escaped.
Unfortunately, within store::canonicalize we overzealously purge the
heap-allocated region, losing the information that it has escaped, and
thus erroneously report a leak.
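
A hypothetical caller (names and setup are illustrative, not taken from
the PR) shows why the escaped block must not be reported as leaking: the
analyzer cannot tell whether argc aliases into *args, and the block
remains reachable through the caller's array after foo returns:

```c
#include <stdlib.h>

/* Illustrative caller, not from the PR.  After foo returns, the block
   malloc'd through the (to the analyzer, unknown) pointer is still
   reachable via argv_buf, so a leak report at the write in foo would
   be a false positive.  */

static char *argv_buf[4];

void foo (char **args[], int *argc)
{
  *argc = 1;
  (*args)[0] = malloc (42);
}

int call_foo (void)
{
  char **args = argv_buf;
  int argc = 0;
  foo (&args, &argc);
  return argc;  /* argv_buf[0] now holds the allocated block */
}
```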

The first part of the fix is to update store::canonicalize so that it
doesn't purge heap_allocated_regions that are marked as escaping.

Doing so fixes the leak false positive, but leads to various state
explosions wherever we have a malloc/free pair in a loop: the analysis
of such loops had only been reaching a fixed state due to a bug in the
state-merger code that erroneously merged state about the region
allocated in one iteration with that of another.  With that bug fixed,
the analyzer fails to reach a fixed state on any loop containing a
malloc/free pair, since each analysis of a malloc call was creating a
new heap_allocated_region instance.

Hence the second part of the fix is to revamp how heap_allocated_regions
are managed within the analyzer.  Rather than creating a new one at each
analysis of a malloc call, we reuse instances across the analysis,
creating a new one only if the current path's state references all of
the existing ones.  Hence the heap_allocated_region instances get
used in a fixed order along every analysis path, so e.g. at:

  if (flag)
    p = malloc (4096);
  else
    p = malloc (1024);

both paths now use the same heap_allocated_region for their malloc
calls - but we still end up with two enodes after the CFG merger, by
rejecting merger of states with non-equal dynamic extents.
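
For loops, the reuse is what restores convergence.  A sketch of the
problematic shape (illustrative only, not one of the testsuite cases):

```c
#include <stdlib.h>

/* Illustrative sketch, not from the testsuite: a malloc/free pair in a
   loop.  With region reuse, every iteration's malloc call is given the
   same heap_allocated_region (the previous one was freed, so it is
   unreferenced in the path's state), and the analyzer reaches a fixed
   state instead of minting a fresh region per iteration.  */
int sum_first_n (int n)
{
  int total = 0;
  for (int i = 0; i < n; i++)
    {
      int *p = malloc (sizeof (int));
      if (!p)
	return -1;
      *p = i;
      total += *p;
      free (p);  /* freed each iteration: region is reusable next time */
    }
  return total;
}
```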

gcc/analyzer/ChangeLog:
	PR analyzer/106473
	* call-summary.cc
	(call_summary_replay::convert_region_from_summary_1): Update for
	change to creation of heap-allocated regions.
	* program-state.cc (test_program_state_1): Likewise.
	(test_program_state_merging): Likewise.
	* region-model-impl-calls.cc (kf_calloc::impl_call_pre): Likewise.
	(kf_malloc::impl_call_pre): Likewise.
	(kf_operator_new::impl_call_pre): Likewise.
	(kf_realloc::impl_call_postsuccess_with_move::update_model): Likewise.
	* region-model-manager.cc
	(region_model_manager::create_region_for_heap_alloc): Convert
	to...
	(region_model_manager::get_or_create_region_for_heap_alloc):
	...this, reusing an existing region if it's unreferenced in the
	client state.
	* region-model-manager.h (region_model_manager::get_num_regions): New.
	(region_model_manager::create_region_for_heap_alloc): Convert to...
	(region_model_manager::get_or_create_region_for_heap_alloc): ...this.
	* region-model.cc (region_to_value_map::can_merge_with_p): Reject
	merger when the values are different.
	(region_model::create_region_for_heap_alloc): Convert to...
	(region_model::get_or_create_region_for_heap_alloc): ...this.
	(region_model::get_referenced_base_regions): New.
	(selftest::test_state_merging): Update for change to creation of
	heap-allocated regions.
	(selftest::test_malloc_constraints): Likewise.
	(selftest::test_malloc): Likewise.
	* region-model.h: Include "sbitmap.h".
	(region_model::create_region_for_heap_alloc): Convert to...
	(region_model::get_or_create_region_for_heap_alloc): ...this.
	(region_model::get_referenced_base_regions): New decl.
	* store.cc (store::canonicalize): Don't purge a heap-allocated region
	that's been marked as escaping.

gcc/testsuite/ChangeLog:
	PR analyzer/106473
	* gcc.dg/analyzer/aliasing-pr106473.c: New test.
	* gcc.dg/analyzer/allocation-size-2.c: Add
	-fanalyzer-fine-grained.
	* gcc.dg/analyzer/allocation-size-3.c: Likewise.
	* gcc.dg/analyzer/explode-1.c: Mark leak with XFAIL.
	* gcc.dg/analyzer/explode-3.c: New test.
	* gcc.dg/analyzer/malloc-reuse.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
diff --git a/gcc/analyzer/call-summary.cc b/gcc/analyzer/call-summary.cc
index 4c4694b..3167473 100644
--- a/gcc/analyzer/call-summary.cc
+++ b/gcc/analyzer/call-summary.cc
@@ -726,7 +726,9 @@
 	/* If we have a heap-allocated region in the summary, then
 	   it was allocated within the callee.
 	   Create a new heap-allocated region to summarize this.  */
-	return mgr->create_region_for_heap_alloc ();
+	auto_sbitmap heap_regs_in_use (mgr->get_num_regions ());
+	get_caller_model ()->get_referenced_base_regions (heap_regs_in_use);
+	return mgr->get_or_create_region_for_heap_alloc (heap_regs_in_use);
       }
       break;
     case RK_ALLOCA:
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index a0585f5..037dbec 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -1733,7 +1733,7 @@
   const svalue *size_in_bytes
     = mgr->get_or_create_unknown_svalue (size_type_node);
   const region *new_reg
-    = model->create_region_for_heap_alloc (size_in_bytes, NULL);
+    = model->get_or_create_region_for_heap_alloc (size_in_bytes, NULL);
   const svalue *ptr_sval = mgr->get_ptr_svalue (ptr_type_node, new_reg);
   model->set_value (model->get_lvalue (p, NULL),
 		    ptr_sval, NULL);
@@ -1790,7 +1790,7 @@
   const svalue *size_in_bytes
     = mgr->get_or_create_unknown_svalue (size_type_node);
   const region *new_reg
-    = model0->create_region_for_heap_alloc (size_in_bytes, NULL);
+    = model0->get_or_create_region_for_heap_alloc (size_in_bytes, NULL);
   const svalue *ptr_sval = mgr->get_ptr_svalue (ptr_type_node, new_reg);
   model0->set_value (model0->get_lvalue (p, &ctxt),
 		     ptr_sval, &ctxt);
diff --git a/gcc/analyzer/region-model-impl-calls.cc b/gcc/analyzer/region-model-impl-calls.cc
index d3f2bf8..37cb09f 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -634,7 +634,7 @@
     = mgr->get_or_create_binop (size_type_node, MULT_EXPR,
 				nmemb_sval, size_sval);
   const region *new_reg
-    = model->create_region_for_heap_alloc (prod_sval, cd.get_ctxt ());
+    = model->get_or_create_region_for_heap_alloc (prod_sval, cd.get_ctxt ());
   const region *sized_reg
     = mgr->get_sized_region (new_reg, NULL_TREE, prod_sval);
   model->zero_fill_region (sized_reg);
@@ -837,7 +837,7 @@
   region_model_manager *mgr = cd.get_manager ();
   const svalue *size_sval = cd.get_arg_svalue (0);
   const region *new_reg
-    = model->create_region_for_heap_alloc (size_sval, cd.get_ctxt ());
+    = model->get_or_create_region_for_heap_alloc (size_sval, cd.get_ctxt ());
   if (cd.get_lhs_type ())
     {
       const svalue *ptr_sval
@@ -1067,7 +1067,7 @@
     region_model_manager *mgr = cd.get_manager ();
     const svalue *size_sval = cd.get_arg_svalue (0);
     const region *new_reg
-      = model->create_region_for_heap_alloc (size_sval, cd.get_ctxt ());
+      = model->get_or_create_region_for_heap_alloc (size_sval, cd.get_ctxt ());
     if (cd.get_lhs_type ())
       {
 	const svalue *ptr_sval
@@ -1255,7 +1255,7 @@
 
       /* Create the new region.  */
       const region *new_reg
-	= model->create_region_for_heap_alloc (new_size_sval, ctxt);
+	= model->get_or_create_region_for_heap_alloc (new_size_sval, ctxt);
       const svalue *new_ptr_sval
 	= mgr->get_ptr_svalue (cd.get_lhs_type (), new_reg);
       if (!model->add_constraint (new_ptr_sval, NE_EXPR, old_ptr_sval,
diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc
index 08bf5d2..d9a7ae9 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -1678,11 +1678,22 @@
   return new_reg;
 }
 
-/* Return a new region describing a heap-allocated block of memory.  */
+/* Return a region describing a heap-allocated block of memory.
+   Reuse an existing heap_allocated_region if its id is not within
+   BASE_REGS_IN_USE.  */
 
 const region *
-region_model_manager::create_region_for_heap_alloc ()
+region_model_manager::
+get_or_create_region_for_heap_alloc (const sbitmap &base_regs_in_use)
 {
+  /* Try to reuse an existing region, if it's unreferenced in the
+     client state.  */
+  for (auto existing_reg : m_managed_dynamic_regions)
+    if (!bitmap_bit_p (base_regs_in_use, existing_reg->get_id ()))
+      if (existing_reg->get_kind () == RK_HEAP_ALLOCATED)
+	return existing_reg;
+
+  /* All existing ones (if any) are in use; create a new one.  */
   region *reg
     = new heap_allocated_region (alloc_region_id (), &m_heap_region);
   m_managed_dynamic_regions.safe_push (reg);
diff --git a/gcc/analyzer/region-model-manager.h b/gcc/analyzer/region-model-manager.h
index 0ff253b..22f9800 100644
--- a/gcc/analyzer/region-model-manager.h
+++ b/gcc/analyzer/region-model-manager.h
@@ -100,6 +100,7 @@
   const svalue *create_unique_svalue (tree type);
 
   /* region consolidation.  */
+  unsigned get_num_regions () const { return m_next_region_id; }
   const stack_region * get_stack_region () const { return &m_stack_region; }
   const heap_region *get_heap_region () const { return &m_heap_region; }
   const code_region *get_code_region () const { return &m_code_region; }
@@ -152,7 +153,8 @@
   /* Dynamically-allocated region instances.
      The number of these within the analysis can grow arbitrarily.
      They are still owned by the manager.  */
-  const region *create_region_for_heap_alloc ();
+  const region *
+  get_or_create_region_for_heap_alloc (const sbitmap &base_regs_in_use);
   const region *create_region_for_alloca (const frame_region *frame);
 
   void log_stats (logger *logger, bool show_objs) const;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 92f8b94..7f2c0b6 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -209,8 +209,9 @@
    to OUT.
 
    For now, write (region, value) mappings that are in common between THIS
-   and OTHER to OUT, effectively taking the intersection, rather than
-   rejecting differences.  */
+   and OTHER to OUT, effectively taking the intersection.
+
+   Reject merger of different values.  */
 
 bool
 region_to_value_map::can_merge_with_p (const region_to_value_map &other,
@@ -222,8 +223,12 @@
       const svalue *iter_sval = iter.second;
       const svalue * const * other_slot = other.get (iter_reg);
       if (other_slot)
-	if (iter_sval == *other_slot)
-	  out->put (iter_reg, iter_sval);
+	{
+	  if (iter_sval == *other_slot)
+	    out->put (iter_reg, iter_sval);
+	  else
+	    return false;
+	}
     }
   return true;
 }
@@ -5498,19 +5503,52 @@
 	}
 }
 
-/* Return a new region describing a heap-allocated block of memory.
-   Use CTXT to complain about tainted sizes.  */
+/* Return a region describing a heap-allocated block of memory.
+   Use CTXT to complain about tainted sizes.
+
+   Reuse an existing heap_allocated_region if it's not being referenced by
+   this region_model; otherwise create a new one.  */
 
 const region *
-region_model::create_region_for_heap_alloc (const svalue *size_in_bytes,
-					    region_model_context *ctxt)
+region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
+						   region_model_context *ctxt)
 {
-  const region *reg = m_mgr->create_region_for_heap_alloc ();
+  /* Determine which regions are referenced in this region_model, so that
+     we can reuse an existing heap_allocated_region if it's not in use on
+     this path.  */
+  auto_sbitmap base_regs_in_use (m_mgr->get_num_regions ());
+  get_referenced_base_regions (base_regs_in_use);
+  const region *reg
+    = m_mgr->get_or_create_region_for_heap_alloc (base_regs_in_use);
   if (compat_types_p (size_in_bytes->get_type (), size_type_node))
     set_dynamic_extents (reg, size_in_bytes, ctxt);
   return reg;
 }
 
+/* Populate OUT_IDS with the set of IDs of those base regions which are
+   reachable in this region_model.  */
+
+void
+region_model::get_referenced_base_regions (auto_sbitmap &out_ids) const
+{
+  reachable_regions reachable_regs (const_cast<region_model *> (this));
+  m_store.for_each_cluster (reachable_regions::init_cluster_cb,
+			    &reachable_regs);
+  /* Get regions for locals that have explicitly bound values.  */
+  for (store::cluster_map_t::iterator iter = m_store.begin ();
+       iter != m_store.end (); ++iter)
+    {
+      const region *base_reg = (*iter).first;
+      if (const region *parent = base_reg->get_parent_region ())
+	if (parent->get_kind () == RK_FRAME)
+	  reachable_regs.add (base_reg, false);
+    }
+
+  bitmap_clear (out_ids);
+  for (auto iter_reg : reachable_regs)
+    bitmap_set_bit (out_ids, iter_reg->get_id ());
+}
+
 /* Return a new region describing a block of memory allocated within the
    current frame.
    Use CTXT to complain about tainted sizes.  */
@@ -7608,7 +7646,7 @@
     tree size = build_int_cst (size_type_node, 1024);
     const svalue *size_sval = mgr.get_or_create_constant_svalue (size);
     const region *new_reg
-      = model0.create_region_for_heap_alloc (size_sval, &ctxt);
+      = model0.get_or_create_region_for_heap_alloc (size_sval, &ctxt);
     const svalue *ptr_sval = mgr.get_ptr_svalue (ptr_type_node, new_reg);
     model0.set_value (model0.get_lvalue (p, &ctxt),
 		      ptr_sval, &ctxt);
@@ -7996,7 +8034,8 @@
 
   const svalue *size_in_bytes
     = mgr.get_or_create_unknown_svalue (size_type_node);
-  const region *reg = model.create_region_for_heap_alloc (size_in_bytes, NULL);
+  const region *reg
+    = model.get_or_create_region_for_heap_alloc (size_in_bytes, NULL);
   const svalue *sval = mgr.get_ptr_svalue (ptr_type_node, reg);
   model.set_value (model.get_lvalue (p, NULL), sval, NULL);
   model.set_value (q, p, NULL);
@@ -8225,7 +8264,8 @@
 
   /* "p = malloc (n * 4);".  */
   const svalue *size_sval = model.get_rvalue (n_times_4, &ctxt);
-  const region *reg = model.create_region_for_heap_alloc (size_sval, &ctxt);
+  const region *reg
+    = model.get_or_create_region_for_heap_alloc (size_sval, &ctxt);
   const svalue *ptr = mgr.get_ptr_svalue (int_star, reg);
   model.set_value (model.get_lvalue (p, &ctxt), ptr, &ctxt);
   ASSERT_EQ (model.get_capacity (reg), size_sval);
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 4413f55..86c42a2 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -26,6 +26,7 @@
       (Zhongxing Xu, Ted Kremenek, and Jian Zhang)
      http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf  */
 
+#include "sbitmap.h"
 #include "selftest.h"
 #include "analyzer/svalue.h"
 #include "analyzer/region.h"
@@ -434,10 +435,12 @@
 		       region_model_context *ctxt,
 		       rejected_constraint **out);
 
-  const region *create_region_for_heap_alloc (const svalue *size_in_bytes,
-					      region_model_context *ctxt);
+  const region *
+  get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
+				       region_model_context *ctxt);
   const region *create_region_for_alloca (const svalue *size_in_bytes,
 					  region_model_context *ctxt);
+  void get_referenced_base_regions (auto_sbitmap &out_ids) const;
 
   tree get_representative_tree (const svalue *sval) const;
   tree get_representative_tree (const region *reg) const;
diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index 636d4aa..99939b7 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -3069,6 +3069,15 @@
       binding_cluster *cluster = (*iter).second;
       if (base_reg->get_kind () == RK_HEAP_ALLOCATED)
 	{
+	  /* Don't purge a heap-allocated region that's been marked as
+	     escaping, since this could be recording that a ptr to it
+	     was written to an unknown symbolic region along this
+	     path, and so we don't know whether it's referenced or
+	     not, and hence shouldn't report it as leaking
+	     (PR analyzer/106473).  */
+	  if (cluster->escaped_p ())
+	    continue;
+
 	  if (cluster->empty_p ())
 	    if (!s.m_regs.contains (base_reg))
 	      purgeable_regions.add (base_reg);
diff --git a/gcc/testsuite/gcc.dg/analyzer/aliasing-pr106473.c b/gcc/testsuite/gcc.dg/analyzer/aliasing-pr106473.c
new file mode 100644
index 0000000..afd1492
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/aliasing-pr106473.c
@@ -0,0 +1,5 @@
+void foo(char **args[], int *argc)
+{
+  *argc = 1;
+  (*args)[0] = __builtin_malloc(42);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
index 42c39e2..2cf64e9 100644
--- a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fanalyzer-fine-grained" }
+   -fanalyzer-fine-grained is currently required; see PR analyzer/107851.  */
+
 #include <stdlib.h>
 #include <stdio.h>
 #include <stdint.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
index 9590e313..6751441 100644
--- a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fanalyzer-fine-grained" }
+   -fanalyzer-fine-grained is currently required; see PR analyzer/107851.  */
+
 /* { dg-additional-options -Wno-analyzer-out-of-bounds } */
 
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/explode-1.c b/gcc/testsuite/gcc.dg/analyzer/explode-1.c
index 9b95afd..7d05f40 100644
--- a/gcc/testsuite/gcc.dg/analyzer/explode-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/explode-1.c
@@ -47,7 +47,8 @@
 	{
 	default:
 	case 0:
-	  *pp = malloc (16); /* { dg-warning "leak" } */
+	  *pp = malloc (16);  /* { dg-warning "leak" "" { xfail *-*-* } } */
+	  // TODO: xfail
 	  break;
 	case 1:
 	  free (*pp);
diff --git a/gcc/testsuite/gcc.dg/analyzer/explode-3.c b/gcc/testsuite/gcc.dg/analyzer/explode-3.c
new file mode 100644
index 0000000..1657836
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/explode-3.c
@@ -0,0 +1,43 @@
+/* { dg-additional-options "-Wno-analyzer-too-complex" } */
+
+#include <stdlib.h>
+
+extern int get (void);
+
+/* In theory each of p0...p1 can be in various malloc states,
+   independently, so the total combined number of states
+   at any program point within the loop is NUM_VARS * NUM_STATES.  */
+
+void test (void)
+{
+  void *p0, *p1;
+  void **pp;
+  while (get ())
+    {
+      switch (get ())
+	{
+	default:
+	case 0:
+	  pp = &p0;
+	  break;
+	case 1:
+	  pp = &p1;
+	  break;
+	}
+
+      switch (get ())
+	{
+	default:
+	case 0:
+	  *pp = malloc (16); /* { dg-warning "leak" "" { xfail *-*-* } } */
+	  // TODO: xfail
+	  break;
+	case 1:
+	  free (*pp);
+	  break;
+	case 2:
+	  /* no-op.  */
+	  break;
+	}
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-reuse.c b/gcc/testsuite/gcc.dg/analyzer/malloc-reuse.c
new file mode 100644
index 0000000..4575ff5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/malloc-reuse.c
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target alloca } */
+
+#include <stdlib.h>
+#include "analyzer-decls.h"
+
+/* Multiple calls to malloc (without a free) should return non-equal pointers.  */
+
+void test_1 (void)
+{
+  void *p, *q;
+  p = malloc (1024);
+  if (!p)
+    return;
+  q = malloc (1024);
+  __analyzer_eval (p == q); /* { dg-warning "FALSE" } */
+  free (p);
+  free (q);
+}
+
+/* Multiple calls to malloc with a free might or might not
+   return the same pointer.  */
+
+void test_2 (void)
+{
+  void *p, *q;
+  p = malloc (1024);
+  if (!p)
+    return;
+  free (p);
+  
+  q = malloc (1024);
+  __analyzer_eval (p == q); /* { dg-warning "UNKNOWN" "ideal" { xfail *-*-* } } */
+  /* { dg-bogus "FALSE" "status quo" { xfail *-*-* } .-1 } */
+  // TODO: ideally this should be UNKNOWN
+  free (q);
+}
+
+void test_two_malloc_sites_same_size (int flag)
+{
+  void *p;
+  if (flag)
+    p = malloc (1024);
+  else
+    p = malloc (1024);
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
+  free (p);
+}
+
+void test_two_malloc_sites_different_sizes (int flag)
+{
+  void *p;
+  if (flag)
+    p = malloc (4096);
+  else
+    p = malloc (1024);
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "2 processed enodes" } */
+  free (p);
+}