Use OpenACC code to process OpenMP target regions

This is a backport of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619003.html

This patch implements '-fopenmp-target=acc', which makes the compiler internally
handle a subset of OpenMP target regions as OpenACC parallel regions. The
subset consists of the target, teams, parallel, distribute, and for/do
constructs, plus atomics.

In essence, the internal construct kinds are changed to their OpenACC
counterparts, and the existing OpenACC code paths handle them, with the
necessary adjustments throughout the middle end and the nvptx back end. In
this "OMPACC" mode, if a target region uses something the patch does not
handle, the compiler issues a warning and falls back to normal processing for
that target region.
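For illustration, here is a hypothetical test case of the kind this mode targets. It is one combined target region that uses only the constructs listed above. This message does not establish whether every clause here is accepted in OMPACC mode or instead triggers the fallback warning. The compile line in the comment is also an assumption: it presumes a GCC built with nvptx offloading support.

```c
/* Hypothetical OMPACC-mode test case.  Assumed compile line:
     gcc -fopenmp -fopenmp-target=acc -foffload=nvptx-none sum.c
   Without -fopenmp the pragmas are ignored and the loop runs serially.  */
#define N 1024

/* Sum 0..N-1 inside a target region built only from constructs the
   commit message lists: target, teams, distribute, parallel, for, atomic.  */
long
sum_on_device (void)
{
  long sum = 0;
#pragma omp target teams distribute parallel for map(tofrom: sum)
  for (int i = 0; i < N; i++)
    {
      /* Every iteration updates the same mapped scalar, so guard it.  */
#pragma omp atomic
      sum += i;
    }
  return sum;
}
```

The region may take the OMPACC path, normal offloading, or host fallback. In every case, sum_on_device () should return 523776, the sum of 0..1023.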

gcc/ChangeLog:

	* builtins.cc (expand_builtin_omp_builtins): New function.
	(expand_builtin): Add expand cases for BUILT_IN_GOMP_BARRIER,
	BUILT_IN_OMP_GET_THREAD_NUM, BUILT_IN_OMP_GET_NUM_THREADS,
	BUILT_IN_OMP_GET_TEAM_NUM, and BUILT_IN_OMP_GET_NUM_TEAMS using
	expand_builtin_omp_builtins, enabled under -fopenmp-target=acc.
	* cgraphunit.cc (analyze_functions): Add call to
	omp_ompacc_attribute_tagging, enabled under -fopenmp-target=acc.
	* common.opt (fopenmp-target=): Add new option and enums.
	* config/nvptx/mkoffload.cc (main): Handle -fopenmp-target=.
	* config/nvptx/nvptx-protos.h (nvptx_expand_omp_get_num_threads): New
	prototype.
	(nvptx_mem_shared_p): Likewise.
	* config/nvptx/nvptx.cc (omp_num_threads_sym): New global static RTX
	symbol for number of threads in team.
	(omp_num_threads_align): New var for alignment of omp_num_threads_sym.
	(need_omp_num_threads): New bool, true if any function references
	omp_num_threads_sym.
	(nvptx_option_override): Initialize omp_num_threads_sym/align.
	(write_as_kernel): Disable normal OpenMP kernel entry under OMPACC mode.
	(nvptx_declare_function_name): Disable shim function under OMPACC mode.
	Disable soft-stack under OMPACC mode. Add generation of neutering init
	code under OMPACC mode.
	(nvptx_output_set_softstack): Return "" under OMPACC mode.
	(nvptx_expand_call): Set parallelism to vector for function calls with
	"ompacc for" attached.
	(nvptx_expand_oacc_fork): Set mode to GOMP_DIM_VECTOR under OMPACC mode.
	(nvptx_expand_oacc_join): Likewise.
	(nvptx_expand_omp_get_num_threads): New function.
	(nvptx_mem_shared_p): New function.
	(nvptx_mach_max_workers): Return 1 under OMPACC mode.
	(nvptx_mach_vector_length): Return 32 under OMPACC mode.
	(nvptx_single): Adjust for OMPACC mode, which has parallel-construct
	forks/joins and regions of code where neutering is determined
	dynamically.
	(nvptx_reorg): Enable neutering under OMPACC mode when "ompacc for"
	attribute is attached to function. Disable uniform-simt when under
	OMPACC mode.
	(nvptx_file_end): Write __nvptx_omp_num_threads out when needed.
	(nvptx_goacc_fork_join): Return true under OMPACC mode.
	* config/nvptx/nvptx.h (struct GTY(()) machine_function): Add
	omp_parallel_predicate and omp_fn_entry_num_threads_reg fields.
	* config/nvptx/nvptx.md (unspecv): Add UNSPECV_GET_TID,
	UNSPECV_GET_NTID, UNSPECV_GET_CTAID, UNSPECV_GET_NCTAID,
	UNSPECV_OMP_PARALLEL_FORK, UNSPECV_OMP_PARALLEL_JOIN entries.
	(nvptx_shared_mem_operand): New predicate.
	(gomp_barrier): New expand pattern.
	(omp_get_num_threads): New expand pattern.
	(omp_get_num_teams): New insn pattern.
	(omp_get_thread_num): Likewise.
	(omp_get_team_num): Likewise.
	(get_ntid): Likewise.
	(nvptx_omp_parallel_fork): Likewise.
	(nvptx_omp_parallel_join): Likewise.

	* flag-types.h (omp_target_mode_kind): New flag value enum.
	* gimplify.cc (struct gimplify_omp_ctx): Add 'bool ompacc' field.
	(gimplify_scan_omp_clauses): Handle OMP_CLAUSE__OMPACC_.
	(gimplify_adjust_omp_clauses): Likewise.
	(gimplify_omp_ctx_ompacc_p): New function.
	(gimplify_omp_for): Handle combined loops under OMPACC.

	* lto-wrapper.cc (append_compiler_options): Add OPT_fopenmp_target_.
	* omp-builtins.def (BUILT_IN_OMP_GET_THREAD_NUM): Remove CONST.
	(BUILT_IN_OMP_GET_NUM_THREADS): Likewise.
	* omp-expand.cc (remove_exit_barrier): Disable addressable-var
	processing for parallel construct child functions under OMPACC mode.
	(expand_oacc_for): Add OMPACC mode handling.
	(get_target_arguments): Force thread_limit clause value to 1 under
	OMPACC mode.
	(expand_omp): Under OMPACC mode, avoid child function expanding of
	GIMPLE_OMP_PARALLEL.
	* omp-general.cc (omp_extract_for_data): Adjustments for OMPACC mode.
	* omp-low.cc (struct omp_context): Add 'bool ompacc_p' field.
	(scan_sharing_clauses): Handle OMP_CLAUSE__OMPACC_.
	(ompacc_ctx_p): New function.
	(scan_omp_parallel): Handle OMPACC mode, avoid creating child function.
	(scan_omp_target): Tag "ompacc"/"ompacc for" attributes for target
	construct child function, remove OMP_CLAUSE__OMPACC_ clauses.
	(lower_oacc_head_mark): Handle OMPACC mode cases.
	(lower_omp_for): Adjust OMP_FOR kind from OpenMP to OpenACC kinds, add
	vector/gang clauses as needed. Add other OMPACC handling.
	(lower_omp_taskreg): Add call to lower_oacc_head_tail for the OMPACC case.
	(lower_omp_target): Do OpenACC gang privatization in the OMPACC case.
	(lower_omp_teams): Forward OpenACC privatization variables to outer
	target region under OMPACC mode.
	(lower_omp_1): Do OpenACC gang privatization in the OMPACC case for
	GIMPLE_BIND.
	* omp-offload.cc (ompacc_supported_clauses_p): New function.
	(struct target_region_data): New struct type for tree walk.
	(scan_fndecl_for_ompacc): New function.
	(scan_omp_target_region_r): New function.
	(scan_omp_target_construct_r): New function.
	(omp_ompacc_attribute_tagging): New function.
	(oacc_dim_call): Add OMPACC case handling.
	(execute_oacc_device_lower): Explicitly restrict OpenACC-only parts to
	OpenACC mode.
	(pass_oacc_device_lower::gate): Enable pass under OMPACC mode.
	* omp-offload.h (omp_ompacc_attribute_tagging): New prototype.
	* opts.cc (finish_options): Only allow -fopenmp-target= together with
	-fopenmp and without -fopenacc.
	* target-insns.def (gomp_barrier): New defined insn pattern.
	(omp_get_thread_num): Likewise.
	(omp_get_num_threads): Likewise.
	(omp_get_team_num): Likewise.
	(omp_get_num_teams): Likewise.
	* tree-core.h (enum omp_clause_code): Add new OMP_CLAUSE__OMPACC_ entry
	for internal clause.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Handle
	OMP_CLAUSE__OMPACC_.
	* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE__OMPACC_.
	* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE__OMPACC_ entry.
	(omp_clause_code_name): Likewise.
	* tree.h (OMP_CLAUSE__OMPACC__FOR): New macro for OMP_CLAUSE__OMPACC_.

	* tree-ssa-loop.cc (pass_oacc_only::gate): Enable pass under OMPACC
	mode cases.

libgomp/ChangeLog:

	* config/nvptx/team.c (__nvptx_omp_num_threads): New global variable in
	shared memory.
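The libgomp change only defines __nvptx_omp_num_threads. The nvptx.cc changes listed above maintain it. As a reading aid, the following host-side C model mimics that bookkeeping, as read from the diff. It is neither PTX nor code from this patch, and all model_* names are hypothetical. In the model:

- target entry functions store 1;
- a parallel fork stores %ntid.x;
- a join restores the value seen at function entry;
- other "ompacc" functions neuter lanes other than lane 0 only when the stored value is 1.

```c
/* Host-side model only (not PTX, not part of the patch).  The model_*
   names are hypothetical; they mimic how the nvptx changes appear to use
   the shared variable __nvptx_omp_num_threads.  */
#include <stdbool.h>

/* Stands in for ".shared .u32 __nvptx_omp_num_threads".  */
static unsigned model_num_threads;

/* Prologue of a target entry function: all threads store 1.  */
static void
model_kernel_entry (void)
{
  model_num_threads = 1;
}

/* "omp parallel fork": store the thread-block size (%ntid.x).  */
static void
model_parallel_fork (unsigned ntid)
{
  model_num_threads = ntid;
}

/* "omp parallel join": restore the value seen at function entry.  */
static void
model_parallel_join (unsigned entry_num_threads)
{
  model_num_threads = entry_num_threads;
}

/* Prologue of a non-entry "ompacc" function: lane TID is neutered only
   if the caller was not inside a parallel region and TID is not lane 0.  */
static bool
model_lane_neutered (unsigned tid)
{
  bool not_parallel_mode = (model_num_threads == 1);
  bool non_zero_lane = (tid != 0);
  return not_parallel_mode && non_zero_lane;
}
```

For example, after model_kernel_entry (), lane 5 is neutered. After model_parallel_fork (32) it is not, and model_parallel_join (1) restores the sequential state.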
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index b8cd75d..f36fe15 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6785,6 +6785,62 @@
   return target;
 }
 
+static rtx
+expand_builtin_omp_builtins (tree exp, rtx target, int ignore)
+{
+  rtx ret = NULL;
+  rtx_insn *(*gen_fn) (rtx) = NULL;
+
+  switch (DECL_FUNCTION_CODE (get_callee_fndecl (exp)))
+    {
+    case BUILT_IN_GOMP_BARRIER:
+      if (targetm.have_gomp_barrier ())
+	{
+	  emit_insn (targetm.gen_gomp_barrier ());
+	  return target;
+	}
+      break;
+
+    case BUILT_IN_OMP_GET_THREAD_NUM:
+      if (targetm.have_omp_get_thread_num ())
+	gen_fn = targetm.gen_omp_get_thread_num;
+      break;
+
+    case BUILT_IN_OMP_GET_NUM_THREADS:
+      if (targetm.have_omp_get_num_threads ())
+	gen_fn = targetm.gen_omp_get_num_threads;
+      break;
+
+    case BUILT_IN_OMP_GET_TEAM_NUM:
+      if (targetm.have_omp_get_team_num ())
+	gen_fn = targetm.gen_omp_get_team_num;
+      break;
+
+    case BUILT_IN_OMP_GET_NUM_TEAMS:
+      if (targetm.have_omp_get_num_teams ())
+	gen_fn = targetm.gen_omp_get_num_teams;
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  if (ignore)
+    return const0_rtx;
+
+  if (gen_fn)
+    {
+      rtx reg = (MEM_P (target)
+		 ? gen_reg_rtx (GET_MODE (target))
+		 : target);
+      emit_insn (gen_fn (reg));
+      if (reg != target)
+	emit_move_insn (target, reg);
+      ret = target;
+    }
+  return ret;
+}
+
 /* Expand a string compare operation using a sequence of char comparison
    to get rid of the calling overhead, with result going to TARGET if
    that's convenient.
@@ -8113,6 +8169,21 @@
     case BUILT_IN_GOACC_PARLEVEL_SIZE:
       return expand_builtin_goacc_parlevel_id_size (exp, target, ignore);
 
+    case BUILT_IN_GOMP_BARRIER:
+    case BUILT_IN_OMP_GET_THREAD_NUM:
+    case BUILT_IN_OMP_GET_NUM_THREADS:
+    case BUILT_IN_OMP_GET_TEAM_NUM:
+    case BUILT_IN_OMP_GET_NUM_TEAMS:
+      if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+	  && lookup_attribute ("ompacc",
+			       DECL_ATTRIBUTES (current_function_decl)))
+	{
+	  target = expand_builtin_omp_builtins (exp, target, ignore);
+	  if (target)
+	    return target;
+	}
+      break;
+
     case BUILT_IN_SPECULATION_SAFE_VALUE_PTR:
       return expand_speculation_safe_value (VOIDmode, exp, target, ignore);
 
diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index b949f61..4d45ab3 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -1174,7 +1174,12 @@
   build_type_inheritance_graph ();
 
   if (flag_openmp && first_time)
-    omp_discover_implicit_declare_target ();
+    {
+      omp_discover_implicit_declare_target ();
+
+      if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+	omp_ompacc_attribute_tagging ();
+    }
 
   /* Analysis adds static variables that in turn adds references to new functions.
      So we need to iterate the process until it stabilize.  */
diff --git a/gcc/common.opt b/gcc/common.opt
index e682cea..0caa645 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2233,6 +2233,19 @@
 EnumValue
 Enum(target_simd_clone_device) String(any) Value(OMP_TARGET_SIMD_CLONE_ANY)
 
+fopenmp-target=
+Common Joined RejectNegative Enum(openmp_target) Var(flag_openmp_target) Init(OMP_TARGET_MODE_DEFAULT)
+Execution model used for OpenMP target regions.
+
+Enum
+Name(openmp_target) Type(int)
+
+EnumValue
+Enum(openmp_target) String(default) Value(OMP_TARGET_MODE_DEFAULT)
+
+EnumValue
+Enum(openmp_target) String(acc) Value(OMP_TARGET_MODE_OMPACC)
+
 fopt-info
 Common Var(flag_opt_info) Optimization
 Enable all optimization info dumps on stderr.
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 5d89ba8..82ea313 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -603,6 +603,7 @@
 
   /* Scan the argument vector.  */
   bool fopenmp = false;
+  bool fopenmp_target = false;
   bool fopenacc = false;
   bool fPIC = false;
   bool fpic = false;
@@ -622,6 +623,9 @@
 #undef STR
       else if (strcmp (argv[i], "-fopenmp") == 0)
 	fopenmp = true;
+      else if (strncmp (argv[i], "-fopenmp-target=",
+			strlen ("-fopenmp-target=")) == 0)
+	fopenmp_target = true;
       else if (strcmp (argv[i], "-fopenacc") == 0)
 	fopenacc = true;
       else if (strcmp (argv[i], "-fPIC") == 0)
@@ -639,6 +643,15 @@
   if (!(fopenacc ^ fopenmp))
     fatal_error (input_location, "either %<-fopenacc%> or %<-fopenmp%> "
 		 "must be set");
+  if (fopenmp_target)
+    {
+      if (fopenacc)
+	fatal_error (input_location, "%<-fopenacc%> not compatible with "
+		     "%<-fopenmp-target=%>");
+      if (!fopenmp)
+	fatal_error (input_location, "%<-fopenmp-target=%> requires "
+		     "%<-fopenmp%>");
+    }
 
   struct obstack argv_obstack;
   obstack_init (&argv_obstack);
diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index dfa08ec..a86514b 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -50,6 +50,7 @@
 extern void nvptx_expand_oacc_fork (unsigned);
 extern void nvptx_expand_oacc_join (unsigned);
 extern void nvptx_expand_call (rtx, rtx);
+extern void nvptx_expand_omp_get_num_threads (rtx);
 extern rtx nvptx_gen_shuffle (rtx, rtx, rtx, nvptx_shuffle_kind);
 extern rtx nvptx_expand_compare (rtx);
 extern const char *nvptx_ptx_type_from_mode (machine_mode, bool);
@@ -63,5 +64,6 @@
 extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int);
 extern bool nvptx_mem_local_p (rtx);
 extern bool nvptx_mem_maybe_shared_p (const_rtx);
+extern bool nvptx_mem_shared_p (const_rtx);
 #endif
 #endif
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 9c284ed..3b2bfd3 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -176,6 +176,9 @@
 static GTY(()) rtx gang_private_shared_sym;
 static hash_map<tree_decl_hash, unsigned int> gang_private_shared_hmap;
 
+static GTY(()) rtx omp_num_threads_sym;
+static unsigned omp_num_threads_align;
+
 /* Global lock variable, needed for 128bit worker & gang reductions.  */
 static GTY(()) tree global_lock_var;
 
@@ -187,6 +190,9 @@
 static bool need_unisimt_decl;
 static bool have_unisimt_decl;
 
+/* True if any function references __nvptx_omp_num_threads.  */
+static bool need_omp_num_threads;
+
 static int nvptx_mach_max_workers ();
 
 /* Allocate a new, cleared machine_function structure.  */
@@ -393,6 +399,10 @@
   SET_SYMBOL_DATA_AREA (gang_private_shared_sym, DATA_AREA_SHARED);
   gang_private_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
+  omp_num_threads_sym = gen_rtx_SYMBOL_REF (Pmode, "__nvptx_omp_num_threads");
+  SET_SYMBOL_DATA_AREA (omp_num_threads_sym, DATA_AREA_SHARED);
+  omp_num_threads_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
   diagnose_openacc_conflict (TARGET_GOMP, "-mgomp");
   diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack");
   diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt");
@@ -961,7 +971,8 @@
 {
   return (lookup_attribute ("kernel", attrs) != NULL_TREE
 	  || (lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE
-	      && lookup_attribute ("oacc function", attrs) != NULL_TREE));
+	      && (lookup_attribute ("oacc function", attrs) != NULL_TREE
+		  || lookup_attribute ("ompacc", attrs) != NULL_TREE)));
   /* For OpenMP target regions, the corresponding kernel entry is emitted from
      write_omp_entry as a separate function.  */
 }
@@ -1495,6 +1506,7 @@
 			DECL_ATTRIBUTES (decl)))
     force_public = true;
   if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))
+      && !lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl))
       && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl)))
     {
       char *buf = (char *) alloca (strlen (name) + sizeof ("$impl"));
@@ -1548,7 +1560,7 @@
   HOST_WIDE_INT sz = get_frame_size ();
   bool need_frameptr = sz || cfun->machine->has_chain;
   int alignment = crtl->stack_alignment_needed / BITS_PER_UNIT;
-  if (!TARGET_SOFT_STACK)
+  if (!TARGET_SOFT_STACK || lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl)))
     {
       /* Declare a local var for outgoing varargs.  */
       if (cfun->machine->has_varadic)
@@ -1619,6 +1631,45 @@
     nvptx_init_unisimt_predicate (file);
   if (cfun->machine->bcast_partition || cfun->machine->sync_bar)
     nvptx_init_oacc_workers (file);
+
+  if (offloading_function_p ((tree) decl)
+      && lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl))
+      && !lookup_attribute ("ompacc seq", DECL_ATTRIBUTES (decl)))
+    {
+      int nthr_regno = REGNO (cfun->machine->omp_fn_entry_num_threads_reg);
+      if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl)))
+	{
+	  fprintf (file, "\t{\n");
+	  if (cfun->machine->omp_parallel_predicate)
+	    {
+	      /* Borrow num-threads regno as temp register.  */
+	      fprintf (file, "\t\tmov.u32 %%r%d, %%tid.x;\n", nthr_regno);
+	      fprintf (file, "\t\tsetp.ne.u32 %%r%d, %%r%d, 0;\n",
+		       REGNO (cfun->machine->omp_parallel_predicate), nthr_regno);
+	    }
+	  fprintf (file, "\t\tmov.u32 %%r%d, 1;\n", nthr_regno);
+	  fprintf (file, "\t\tst.shared.u32 [__nvptx_omp_num_threads], %%r%d;\n", nthr_regno);
+	  fprintf (file, "\t}\n");
+	  need_omp_num_threads = true;
+	}
+      else
+	{
+	  fprintf (file, "\t\tld.shared.u32 %%r%d, [__nvptx_omp_num_threads];\n", nthr_regno);
+	  if (cfun->machine->omp_parallel_predicate)
+	    {
+	      fprintf (file, "\t{\n");
+	      fprintf (file, "\t\t.reg.u32 %%tmp1;\n");
+	      fprintf (file, "\t\t.reg.pred %%not_parallel_mode, %%v1_lane;\n");
+	      fprintf (file, "\t\tsetp.eq.u32 %%not_parallel_mode, %%r%d, 1;\n", nthr_regno);
+	      fprintf (file, "\t\tmov.u32 %%tmp1, %%tid.x;\n");
+	      fprintf (file, "\t\tsetp.ne.u32 %%v1_lane, %%tmp1, 0;\n");
+	      fprintf (file, "\t\tand.pred %%r%d, %%not_parallel_mode, %%v1_lane;\n",
+		       REGNO (cfun->machine->omp_parallel_predicate));
+	      fprintf (file, "\t}\n");
+	      need_omp_num_threads = true;
+	    }
+	}
+    }
 }
 
 /* Output code for switching uniform-simt state.  ENTERING indicates whether
@@ -1736,6 +1787,10 @@
 const char *
 nvptx_output_set_softstack (unsigned src_regno)
 {
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+      && lookup_attribute ("ompacc",
+			   DECL_ATTRIBUTES (current_function_decl)))
+    return "";
   if (cfun->machine->has_softstack && !crtl->is_leaf)
     {
       fprintf (asm_out_file, "\tst.shared.u%d\t[%s], ",
@@ -1854,20 +1909,29 @@
 	  if (DECL_STATIC_CHAIN (decl))
 	    cfun->machine->has_chain = true;
 
-	  tree attr = oacc_get_fn_attrib (decl);
-	  if (attr)
+	  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
 	    {
-	      tree dims = TREE_VALUE (attr);
-
-	      parallel = GOMP_DIM_MASK (GOMP_DIM_MAX) - 1;
-	      for (int ix = 0; ix != GOMP_DIM_MAX; ix++)
+	      if (lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl))
+		  && !lookup_attribute ("ompacc seq", DECL_ATTRIBUTES (decl)))
+		parallel = GOMP_DIM_MASK (GOMP_DIM_VECTOR);
+	    }
+	  else
+	    {
+	      tree attr = oacc_get_fn_attrib (decl);
+	      if (attr)
 		{
-		  if (TREE_PURPOSE (dims)
-		      && !integer_zerop (TREE_PURPOSE (dims)))
-		    break;
-		  /* Not on this axis.  */
-		  parallel ^= GOMP_DIM_MASK (ix);
-		  dims = TREE_CHAIN (dims);
+		  tree dims = TREE_VALUE (attr);
+
+		  parallel = GOMP_DIM_MASK (GOMP_DIM_MAX) - 1;
+		  for (int ix = 0; ix != GOMP_DIM_MAX; ix++)
+		    {
+		      if (TREE_PURPOSE (dims)
+			  && !integer_zerop (TREE_PURPOSE (dims)))
+			break;
+		      /* Not on this axis.  */
+		      parallel ^= GOMP_DIM_MASK (ix);
+		      dims = TREE_CHAIN (dims);
+		    }
 		}
 	    }
 	}
@@ -1930,15 +1994,27 @@
 void
 nvptx_expand_oacc_fork (unsigned mode)
 {
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+    mode = GOMP_DIM_VECTOR;
   nvptx_emit_forking (GOMP_DIM_MASK (mode), false);
 }
 
 void
 nvptx_expand_oacc_join (unsigned mode)
 {
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+    mode = GOMP_DIM_VECTOR;
   nvptx_emit_joining (GOMP_DIM_MASK (mode), false);
 }
 
+void
+nvptx_expand_omp_get_num_threads (rtx target)
+{
+  rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym);
+  emit_insn (gen_rtx_SET (target, mem));
+  need_omp_num_threads = true;
+}
+
 /* Generate instruction(s) to unpack a 64 bit object into 2 32 bit
    objects.  */
 
@@ -2879,6 +2955,13 @@
   return area == DATA_AREA_SHARED || area == DATA_AREA_GENERIC;
 }
 
+bool
+nvptx_mem_shared_p (const_rtx x)
+{
+  nvptx_data_area area = nvptx_mem_data_area (x);
+  return area == DATA_AREA_SHARED;
+}
+
 /* Print an operand, X, to FILE, with an optional modifier in CODE.
 
    Meaning of CODE:
@@ -3483,6 +3566,11 @@
 static int ATTRIBUTE_UNUSED
 nvptx_mach_max_workers ()
 {
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+      && lookup_attribute ("ompacc",
+			   DECL_ATTRIBUTES (current_function_decl)))
+    return 1;
+
   if (!cfun->machine->axis_dim_init_p)
     init_axis_dim ();
   return cfun->machine->axis_dim[MACH_MAX_WORKERS];
@@ -3491,6 +3579,11 @@
 static int ATTRIBUTE_UNUSED
 nvptx_mach_vector_length ()
 {
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+      && lookup_attribute ("ompacc",
+			   DECL_ATTRIBUTES (current_function_decl)))
+    return 32;
+
   if (!cfun->machine->axis_dim_init_p)
     init_axis_dim ();
   return cfun->machine->axis_dim[MACH_VECTOR_LENGTH];
@@ -4873,11 +4966,27 @@
   rtx_insn *tail = BB_END (to);
   unsigned skip_mask = mask;
 
+  rtx_insn *join = NULL;
+  rtx_insn *fork = NULL;
+
   while (true)
     {
       /* Find first insn of from block.  */
-      while (head != BB_END (from) && !needs_neutering_p (head))
-	head = NEXT_INSN (head);
+      while (true)
+	{
+	  if (INSN_P (head)
+	      && recog_memoized (head) == CODE_FOR_nvptx_join)
+	    {
+	      /* Record join if we see it.  */
+	      gcc_assert (!join);
+	      join = head;
+	    }
+
+	  if (head != BB_END (from) && !needs_neutering_p (head))
+	    head = NEXT_INSN (head);
+	  else
+	    break;
+	}
 
       if (from == to)
 	break;
@@ -4895,8 +5004,46 @@
 
   /* Find last insn of to block */
   rtx_insn *limit = from == to ? head : BB_HEAD (to);
-  while (tail != limit && !INSN_P (tail) && !LABEL_P (tail))
-    tail = PREV_INSN (tail);
+  while (true)
+    {
+      if (INSN_P (tail)
+	  && recog_memoized (tail) == CODE_FOR_nvptx_fork)
+	{
+	  /* Record fork if we see it.  */
+	  gcc_assert (!fork);
+	  fork = tail;
+	}
+
+      if (tail != limit && !INSN_P (tail) && !LABEL_P (tail))
+	tail = PREV_INSN (tail);
+      else
+	break;
+    }
+
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+    {
+      if (join
+	  /* We do not set/restore parallel state across function calls.  */
+	  && !(INTVAL (XVECEXP (PATTERN (join), 0, 0)) & (1 << GOMP_DIM_MAX)))
+	{
+	  rtx reg = cfun->machine->omp_fn_entry_num_threads_reg;
+	  rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym);
+	  emit_insn_before (gen_nvptx_omp_parallel_join (mem, reg), head);
+	  need_omp_num_threads = true;
+	  head = PREV_INSN (head);
+	}
+
+      if (fork
+	  /* We do not set/restore parallel state across function calls.  */
+	  && !(INTVAL (XVECEXP (PATTERN (fork), 0, 0)) & (1 << GOMP_DIM_MAX)))
+	{
+	  rtx reg = gen_reg_rtx (SImode);
+	  rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym);
+	  emit_insn_before (gen_get_ntid (reg), tail);
+	  emit_insn_before (gen_nvptx_omp_parallel_fork (mem, reg), tail);
+	  need_omp_num_threads = true;
+	}
+    }
 
   /* Detect if tail is a branch.  */
   rtx tail_branch = NULL_RTX;
@@ -4943,16 +5090,31 @@
     if (GOMP_DIM_MASK (mode) & skip_mask)
       {
 	rtx_code_label *label = gen_label_rtx ();
-	rtx pred = cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER];
 	rtx_insn **mode_jump
 	  = mode == GOMP_DIM_VECTOR ? &vector_jump : &worker_jump;
 	rtx_insn **mode_label
 	  = mode == GOMP_DIM_VECTOR ? &vector_label : &worker_label;
 
-	if (!pred)
+	rtx pred;
+
+	if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+	    && mode == GOMP_DIM_VECTOR)
 	  {
-	    pred = gen_reg_rtx (BImode);
-	    cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER] = pred;
+	    pred = cfun->machine->omp_parallel_predicate;
+	    if (!pred)
+	      {
+		pred = gen_reg_rtx (BImode);
+		cfun->machine->omp_parallel_predicate = pred;
+	      }
+	  }
+	else
+	  {
+	    pred = cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER];
+	    if (!pred)
+	      {
+		pred = gen_reg_rtx (BImode);
+		cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER] = pred;
+	      }
 	  }
 
 	rtx br;
@@ -5067,7 +5229,38 @@
 	  rtx tmp = gen_reg_rtx (BImode);
 	  emit_insn_before (gen_movbi (tmp, const0_rtx),
 			    bb_first_real_insn (from));
-	  emit_insn_before (gen_rtx_SET (tmp, pvar), label);
+
+	  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+	    {
+	      rtx nthr = cfun->machine->omp_fn_entry_num_threads_reg;
+	      rtx single_p = gen_reg_rtx (BImode);
+
+	      rtx_code_label *lbl_copy_tmp_pvar = gen_label_rtx ();
+	      LABEL_NUSES (lbl_copy_tmp_pvar) = 1;
+
+	      rtx_insn *lbl_fallthru = NEXT_INSN (tail);
+	      gcc_assert (lbl_fallthru);
+	      if (!LABEL_P (lbl_fallthru))
+		{
+		  rtx_code_label *nlbl = gen_label_rtx ();
+		  LABEL_NUSES (nlbl) = 1;
+		  emit_label_before (nlbl, lbl_fallthru);
+		  lbl_fallthru = nlbl;
+		}
+	      emit_insn_before
+		(gen_rtx_SET (single_p,
+			      gen_rtx_EQ (BImode, nthr, GEN_INT (1))),
+		 label);
+	      emit_insn_before
+		(gen_br_true (single_p, lbl_copy_tmp_pvar), label);
+	      emit_jump_insn_before (copy_rtx (tail_branch), label);
+	      emit_insn_before (gen_jump (lbl_fallthru), label);
+	      emit_label_before (lbl_copy_tmp_pvar, label);
+	      emit_insn_before (gen_rtx_SET (tmp, pvar), label);
+	    }
+	  else
+	    emit_insn_before (gen_rtx_SET (tmp, pvar), label);
+
 	  emit_insn_before (gen_rtx_SET (pvar, tmp), tail);
 #endif
 	  emit_insn_before (nvptx_gen_warp_bcast (pvar), tail);
@@ -5826,10 +6019,29 @@
       delete pars;
     }
 
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+      && offloading_function_p (current_function_decl)
+      && lookup_attribute ("ompacc",
+			   DECL_ATTRIBUTES (current_function_decl))
+      && !lookup_attribute ("ompacc seq",
+			    DECL_ATTRIBUTES (current_function_decl)))
+    {
+      cfun->machine->omp_fn_entry_num_threads_reg = gen_reg_rtx (SImode);
+
+      /* Discover & process partitioned regions.  */
+      parallel *pars = nvptx_discover_pars (&bb_insn_map);
+      nvptx_process_pars (pars);
+      nvptx_neuter_pars (pars, GOMP_DIM_MASK (GOMP_DIM_VECTOR), 0);
+      delete pars;
+    }
+
   /* Replace subregs.  */
   nvptx_reorg_subreg ();
 
-  if (TARGET_UNIFORM_SIMT)
+  if (TARGET_UNIFORM_SIMT
+      && (flag_openmp_target != OMP_TARGET_MODE_OMPACC
+	  || !lookup_attribute ("ompacc",
+				DECL_ATTRIBUTES (current_function_decl))))
     nvptx_reorg_uniform_simt ();
 
 #if WORKAROUND_PTXJIT_BUG_2
@@ -6076,6 +6288,12 @@
       write_var_marker (asm_out_file, false, true, "__nvptx_uni");
       fprintf (asm_out_file, ".extern .shared .u32 __nvptx_uni[32];\n");
     }
+  if (need_omp_num_threads)
+    {
+      write_var_marker (asm_out_file, false, true, "__nvptx_omp_num_threads");
+      fprintf (asm_out_file,
+	       ".extern .shared .u32 __nvptx_omp_num_threads;\n");
+    }
 }
 
 /* Expander for the shuffle builtins.  */
@@ -6732,6 +6950,9 @@
   tree arg = gimple_call_arg (call, 2);
   unsigned axis = TREE_INT_CST_LOW (arg);
 
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+    return true;
+
   /* We only care about worker and vector partitioning.  */
   if (axis < GOMP_DIM_WORKER)
     return false;
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index d815081..59580d2 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -267,6 +267,9 @@
      for per-lane storage in OpenMP SIMD regions.  */
   unsigned HOST_WIDE_INT simt_stack_size;
   unsigned HOST_WIDE_INT simt_stack_align;
+
+  rtx omp_parallel_predicate;
+  rtx omp_fn_entry_num_threads_reg;
 };
 #endif
 
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index d271265..1d1a857 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -80,6 +80,14 @@
    UNSPECV_SIMT_EXIT
 
    UNSPECV_RED_PART
+
+   UNSPECV_GET_TID
+   UNSPECV_GET_NTID
+   UNSPECV_GET_CTAID
+   UNSPECV_GET_NCTAID
+
+   UNSPECV_OMP_PARALLEL_FORK
+   UNSPECV_OMP_PARALLEL_JOIN
 ])
 
 (define_attr "subregs_ok" "false,true"
@@ -123,6 +131,12 @@
           : immediate_operand (op, mode));
 })
 
+(define_predicate "nvptx_shared_mem_operand"
+  (match_code "mem")
+{
+  return nvptx_mem_shared_p (op);
+})
+
 (define_predicate "const0_operand"
   (and (match_code "const_int")
        (match_test "op == const0_rtx")))
@@ -1774,6 +1788,60 @@
   return asms[INTVAL (operands[1])];
 })
 
+(define_expand "gomp_barrier"
+  [(const_int 1)]
+  "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+{
+  emit_insn (gen_nvptx_barsync (GEN_INT (0), GEN_INT (0)));
+  DONE;
+})
+
+(define_expand "omp_get_num_threads"
+  [(match_operand 0 "nvptx_register_operand" "=R")]
+  "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+{
+  nvptx_expand_omp_get_num_threads (operands[0]);
+  DONE;
+})
+
+(define_insn "omp_get_num_teams"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_NCTAID))]
+  "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+  "%.\\tmov.u32\\t%0, %%nctaid.x;")
+
+(define_insn "omp_get_thread_num"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+        (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_TID))]
+  "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+  "%.\\tmov.u32\\t%0, %%tid.x;")
+
+(define_insn "omp_get_team_num"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unspec_volatile:SI [(const_int 0)] UNSPECV_GET_CTAID))]
+  "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+  "%.\\tmov.u32\\t%0, %%ctaid.x;")
+
+(define_insn "get_ntid"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unspec_volatile:SI [(const_int 0)] UNSPECV_GET_NTID))]
+  "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+  "%.\\tmov.u32\\t%0, %%ntid.x;")
+
+(define_insn "nvptx_omp_parallel_fork"
+  [(set (match_operand:SI 0 "nvptx_shared_mem_operand" "=m")
+	(unspec_volatile:SI [(match_operand:SI 1 "nvptx_register_operand" "R")]
+			    UNSPECV_OMP_PARALLEL_FORK))]
+  "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+  "%.\\tst.shared.u32\\t%0, %1; //omp parallel fork")
+
+(define_insn "nvptx_omp_parallel_join"
+  [(set (match_operand:SI 0 "nvptx_shared_mem_operand" "=m")
+	(unspec_volatile:SI [(match_operand:SI 1 "nvptx_register_operand" "R")]
+			    UNSPECV_OMP_PARALLEL_JOIN))]
+  "flag_openmp_target == OMP_TARGET_MODE_OMPACC"
+  "%.\\tst.shared.u32\\t%0, %1; //omp parallel join")
+
 (define_insn "nvptx_fork"
   [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")]
 		       UNSPECV_FORK)]
diff --git a/gcc/expr.cc b/gcc/expr.cc
index e0a0b80..58a596e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10532,7 +10532,8 @@
       /* Allow accel compiler to handle variables that require special
 	 treatment, e.g. if they have been modified in some way earlier in
 	 compilation by the adjust_private_decl OpenACC hook.  */
-      if (flag_openacc && targetm.goacc.expand_var_decl)
+      if ((flag_openacc || flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+	  && targetm.goacc.expand_var_decl)
 	{
 	  temp = targetm.goacc.expand_var_decl (exp);
 	  if (temp)
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 2bfab98..518caad 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -519,6 +519,12 @@
   OMP_TARGET_SIMD_CLONE_ANY = 3
 };
 
+enum omp_target_mode_kind
+{
+  OMP_TARGET_MODE_DEFAULT = 0,
+  OMP_TARGET_MODE_OMPACC = 1
+};
+
 #endif
 
 #endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 6a3bd68..4274c33 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -253,6 +253,7 @@
   bool order_concurrent;
   bool has_depend;
   bool in_for_exprs;
+  bool ompacc;
   int defaultmap[5];
   hash_map<tree, oacc_array_mapping_info> *decl_data_clause;
 };
@@ -11345,6 +11346,10 @@
 	case OMP_CLAUSE_USES_ALLOCATORS:
 	  break;
 
+	case OMP_CLAUSE__OMPACC_:
+	  ctx->ompacc = true;
+	  break;
+
 	case OMP_CLAUSE_ORDER:
 	  ctx->order_concurrent = true;
 	  break;
@@ -12657,6 +12662,7 @@
 	case OMP_CLAUSE_FINALIZE:
 	case OMP_CLAUSE_INCLUSIVE:
 	case OMP_CLAUSE_EXCLUSIVE:
+	case OMP_CLAUSE__OMPACC_:
 	case OMP_CLAUSE_TILE:
 	case OMP_CLAUSE_UNROLL_FULL:
 	case OMP_CLAUSE_UNROLL_NONE:
@@ -13250,6 +13256,21 @@
     }
 }
 
+/* Return true if in an omp_context in OMPACC mode.  */
+static bool
+gimplify_omp_ctx_ompacc_p (void)
+{
+  if (cgraph_node::get (current_function_decl)->offloadable
+      && lookup_attribute ("ompacc",
+			   DECL_ATTRIBUTES (current_function_decl)))
+    return true;
+  struct gimplify_omp_ctx *ctx;
+  for (ctx = gimplify_omp_ctxp; ctx; ctx = ctx->outer_context)
+    if (ctx->ompacc)
+      return true;
+  return false;
+}
+
 /* Gimplify the gross structure of an OMP_FOR statement.  */
 
 static enum gimplify_status
@@ -13281,6 +13302,18 @@
 	  *expr_p = NULL_TREE;
 	  return GS_ERROR;
 	}
+
+      if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+	  && gimplify_omp_ctx_ompacc_p ())
+	{
+	  gcc_assert (inner_for_stmt && TREE_CODE (for_stmt) == OMP_DISTRIBUTE);
+	  *expr_p = OMP_FOR_BODY (for_stmt);
+	  tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_GANG);
+	  OMP_CLAUSE_CHAIN (c) = OMP_FOR_CLAUSES (inner_for_stmt);
+	  OMP_FOR_CLAUSES (inner_for_stmt) = c;
+	  return GS_OK;
+	}
+
       if (data[2] && OMP_FOR_PRE_BODY (*data[2]))
 	{
 	  append_to_statement_list_force (OMP_FOR_PRE_BODY (*data[2]),
diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 3d57643..3c833fc 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -733,6 +733,7 @@
 	case OPT_fcommon:
 	case OPT_fgnu_tm:
 	case OPT_fopenmp:
+	case OPT_fopenmp_target_:
 	case OPT_fopenacc:
 	case OPT_fopenacc_dim_:
 	case OPT_foffload_abi_:
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index b3715b9..6d7e9d3 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -71,9 +71,9 @@
 DEF_GOMP_BUILTIN (BUILT_IN_OMP_IS_INITIAL_DEVICE, "omp_is_initial_device",
 		  BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_THREAD_NUM, "omp_get_thread_num",
-		  BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
+		  BT_FN_INT, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_NUM_THREADS, "omp_get_num_threads",
-		  BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
+		  BT_FN_INT, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_TEAM_NUM, "omp_get_team_num",
 		  BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_NUM_TEAMS, "omp_get_num_teams",
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index afe0006..d7e61c1 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -1050,11 +1050,16 @@
 	     from within current function (this would be easy to check)
 	     or from some function it calls and gets passed an address
 	     of such a variable.  */
+	  gomp_parallel *parallel_stmt
+	    = as_a <gomp_parallel *> (last_stmt (region->entry));
+	  tree child_fun = gimple_omp_parallel_child_fn (parallel_stmt);
+
+	  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+	      && child_fun == NULL_TREE)
+	    any_addressable_vars = 0;
+
 	  if (any_addressable_vars < 0)
 	    {
-	      gomp_parallel *parallel_stmt
-		= as_a <gomp_parallel *> (last_stmt (region->entry));
-	      tree child_fun = gimple_omp_parallel_child_fn (parallel_stmt);
 	      tree local_decls, block, decl;
 	      unsigned ix;
 
@@ -7773,6 +7778,17 @@
       /* The SSA parallelizer does gang parallelism.  */
       gwv = build_int_cst (integer_type_node, GOMP_DIM_MASK (GOMP_DIM_GANG));
     }
+  else if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+    {
+      tree clauses = gimple_omp_for_clauses (for_stmt);
+      int omp_mask = 0;
+      if (omp_find_clause (clauses, OMP_CLAUSE_GANG))
+	omp_mask |= GOMP_DIM_MASK (GOMP_DIM_GANG);
+      if (omp_find_clause (clauses, OMP_CLAUSE_VECTOR))
+	omp_mask |= GOMP_DIM_MASK (GOMP_DIM_VECTOR);
+      gcc_assert (omp_mask);
+      gwv = build_int_cst (integer_type_node, omp_mask);
+    }
 
   if (fd->collapse > 1 || fd->tiling)
     {
@@ -9816,6 +9832,13 @@
     t = OMP_CLAUSE_THREAD_LIMIT_EXPR (c);
   else
     t = integer_minus_one_node;
+
+  /* Currently, OMPACC mode has a limitation of only one warp thread.  */
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+      && lookup_attribute
+           ("ompacc", DECL_ATTRIBUTES (gimple_omp_target_child_fn (tgt_stmt))))
+    t = integer_one_node;
+
   push_target_argument_according_to_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL,
 					   GOMP_TARGET_ARG_THREAD_LIMIT, t,
 					   &args);
@@ -10698,6 +10721,44 @@
       switch (region->type)
 	{
 	case GIMPLE_OMP_PARALLEL:
+	  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+	    {
+	      struct omp_region *r;
+	      for (r = region->outer; r; r = r->outer)
+		if (r->type == GIMPLE_OMP_TARGET)
+		  {
+		    gomp_target *tgt
+		      = as_a <gomp_target *> (last_stmt (r->entry));
+		    tree tgtfn_attrs
+		      = DECL_ATTRIBUTES (gimple_omp_target_child_fn (tgt));
+		    if (!lookup_attribute ("ompacc", tgtfn_attrs))
+		      r = NULL;
+		    break;
+		  }
+	      if (r != NULL
+		  || (lookup_attribute
+		      ("ompacc", DECL_ATTRIBUTES (current_function_decl))))
+		{
+		  gimple_stmt_iterator gsi;
+		  gsi = gsi_last_nondebug_bb (region->entry);
+		  gcc_assert (!gsi_end_p (gsi)
+			      && gimple_code
+			      (gsi_stmt (gsi)) == GIMPLE_OMP_PARALLEL);
+		  gsi_remove (&gsi, true);
+
+		  if (region->exit)
+		    {
+		      gsi = gsi_last_nondebug_bb (region->exit);
+		      gcc_assert (!gsi_end_p (gsi)
+				  && gimple_code
+				  (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN);
+		      gsi_remove (&gsi, true);
+		    }
+		  break;
+		}
+	    }
+	  /* Fallthrough.  */
+
 	case GIMPLE_OMP_TASK:
 	  expand_omp_taskreg (region);
 	  break;
diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index abaae12..9823535 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -202,8 +202,12 @@
   struct omp_for_data_loop dummy_loop;
   location_t loc = gimple_location (for_stmt);
   bool simd = gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_SIMD;
-  bool distribute = gimple_omp_for_kind (for_stmt)
-		    == GF_OMP_FOR_KIND_DISTRIBUTE;
+  bool distribute =
+    (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_DISTRIBUTE
+     || (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+	 && gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP
+	 && omp_find_clause (gimple_omp_for_clauses (for_stmt),
+			     OMP_CLAUSE_GANG)));
   bool taskloop = gimple_omp_for_kind (for_stmt)
 		  == GF_OMP_FOR_KIND_TASKLOOP;
   bool order_reproducible = false;
@@ -441,7 +445,8 @@
       loop->n2 = gimple_omp_for_final (for_stmt, i);
       gcc_assert (loop->cond_code != NE_EXPR
 		  || (gimple_omp_for_kind (for_stmt)
-		      != GF_OMP_FOR_KIND_OACC_LOOP));
+		      != GF_OMP_FOR_KIND_OACC_LOOP)
+		  || flag_openmp_target == OMP_TARGET_MODE_OMPACC);
       if (TREE_CODE (loop->n2) == TREE_VEC)
 	{
 	  if (loop->outer)
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index bb4d148..9a569df 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -187,6 +187,10 @@
      than teams is strictly nested in it.  */
   bool nonteams_nested_p;
 
+  /* Indicates that context is in OMPACC mode, set after _ompacc_ internal
+     clauses are removed.  */
+  bool ompacc_p;
+
   /* Candidates for adjusting OpenACC privatization level.  */
   vec<tree> oacc_privatization_candidates;
 
@@ -2039,6 +2043,7 @@
 	case OMP_CLAUSE_TASK_REDUCTION:
 	case OMP_CLAUSE_ALLOCATE:
 	case OMP_CLAUSE_ALLOCATOR:
+	case OMP_CLAUSE__OMPACC_:
 	  break;
 
 	case OMP_CLAUSE_ALIGNED:
@@ -2263,6 +2268,7 @@
 	case OMP_CLAUSE_FILTER:
 	case OMP_CLAUSE__CONDTEMP_:
 	case OMP_CLAUSE_ALLOCATOR:
+	case OMP_CLAUSE__OMPACC_:
 	  break;
 
 	case OMP_CLAUSE__CACHE_:
@@ -2332,6 +2338,21 @@
   return false;
 }
 
+static bool
+ompacc_ctx_p (omp_context *ctx)
+{
+  if (cgraph_node::get (current_function_decl)->offloadable
+      && lookup_attribute ("ompacc",
+			   DECL_ATTRIBUTES (current_function_decl)))
+    return true;
+  for (; ctx; ctx = ctx->outer)
+    if (is_gimple_omp_offloaded (ctx->stmt))
+      return (ctx->ompacc_p
+	      || omp_find_clause (gimple_omp_target_clauses (ctx->stmt),
+				  OMP_CLAUSE__OMPACC_));
+  return false;
+}
+
 /* Build a decl for the omp child function.  It'll not contain a body
    yet, just the bare decl.  */
 
@@ -2641,8 +2662,28 @@
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx->record_type) = name;
   TYPE_ARTIFICIAL (ctx->record_type) = 1;
-  create_omp_child_function (ctx, false);
-  gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn);
+
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+      && ompacc_ctx_p (ctx))
+    {
+      tree data_name = get_identifier (".omp_data_i_par");
+      tree t = build_decl (gimple_location (stmt), VAR_DECL, data_name,
+			   ptr_type_node);
+      DECL_ARTIFICIAL (t) = 1;
+      DECL_NAMELESS (t) = 1;
+      DECL_CONTEXT (t) = current_function_decl;
+      DECL_SEEN_IN_BIND_EXPR_P (t) = 1;
+      DECL_CHAIN (t) = ctx->block_vars;
+      ctx->block_vars = t;
+      TREE_USED (t) = 1;
+      TREE_READONLY (t) = 1;
+      ctx->receiver_decl = t;
+    }
+  else
+    {
+      create_omp_child_function (ctx, false);
+      gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn);
+    }
 
   scan_sharing_clauses (gimple_omp_parallel_clauses (stmt), ctx);
   scan_omp (gimple_omp_body_ptr (stmt), ctx);
@@ -3565,6 +3606,24 @@
   scan_sharing_clauses (clauses, ctx, base_pointers_restrict);
   scan_omp (gimple_omp_body_ptr (stmt), ctx);
 
+  if (offloaded && flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+    {
+      for (tree *cp = gimple_omp_target_clauses_ptr (stmt); *cp;
+	   cp = &OMP_CLAUSE_CHAIN (*cp))
+	if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE__OMPACC_)
+	  {
+	    DECL_ATTRIBUTES (gimple_omp_target_child_fn (stmt))
+	      = tree_cons (get_identifier ("ompacc"), NULL_TREE,
+			   DECL_ATTRIBUTES (gimple_omp_target_child_fn (stmt)));
+	    /* The attribute now records OMPACC mode; unlink the clause.  */
+	    *cp = OMP_CLAUSE_CHAIN (*cp);
+
+	    /* Record that this context is in OMPACC mode.  */
+	    ctx->ompacc_p = true;
+	    break;
+	  }
+    }
+
   if (TYPE_FIELDS (ctx->record_type) == NULL)
     ctx->record_type = ctx->receiver_decl = NULL;
   else
@@ -8947,6 +9006,9 @@
     gcc_unreachable ();
   else if (is_oacc_kernels_decomposed_part (tgt))
     ;
+  else if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+	   && is_omp_target (tgt->stmt))
+    ;
   else
     gcc_unreachable ();
 
@@ -8975,7 +9037,13 @@
 		 != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE);
     }
 
-  if (tag & OLF_TILE)
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+      && gimple_code (ctx->stmt) == GIMPLE_OMP_PARALLEL
+      && tgt
+      && ompacc_ctx_p (tgt))
+    levels = 1;
+  else
+    if (tag & OLF_TILE)
     /* Tiling could use all 3 levels.  */ 
     levels = 3;
   else
@@ -12460,6 +12528,23 @@
 
   push_gimplify_context ();
 
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx))
+    {
+      enum omp_clause_code code = OMP_CLAUSE_ERROR;
+      if (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_FOR)
+	code = OMP_CLAUSE_VECTOR;
+      else if (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_DISTRIBUTE)
+	code = OMP_CLAUSE_GANG;
+      if (code)
+	{
+	  /* Adjust into OACC loop kind with vector/gang clause.  */
+	  gimple_omp_for_set_kind (stmt, GF_OMP_FOR_KIND_OACC_LOOP);
+	  tree c = build_omp_clause (UNKNOWN_LOCATION, code);
+	  OMP_CLAUSE_CHAIN (c) = gimple_omp_for_clauses (stmt);
+	  gimple_omp_for_set_clauses (stmt, c);
+	}
+    }
+
   if (is_gimple_omp_oacc (ctx->stmt))
     oacc_privatization_scan_clause_chain (ctx, gimple_omp_for_clauses (stmt));
 
@@ -12481,7 +12566,9 @@
       gbind *inner_bind
 	= as_a <gbind *> (gimple_seq_first_stmt (omp_for_body));
       tree vars = gimple_bind_vars (inner_bind);
-      if (is_gimple_omp_oacc (ctx->stmt))
+      if (is_gimple_omp_oacc (ctx->stmt)
+	  || (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+	      && ompacc_ctx_p (ctx)))
 	oacc_privatization_scan_decl_chain (ctx, vars);
       gimple_bind_append_vars (new_stmt, vars);
       /* bind_vars/BLOCK_VARS are being moved to new_stmt/block, don't
@@ -12597,7 +12684,8 @@
   lower_omp (gimple_omp_body_ptr (stmt), ctx);
 
   gcall *private_marker = NULL;
-  if (is_gimple_omp_oacc (ctx->stmt)
+  if ((is_gimple_omp_oacc (ctx->stmt)
+       || (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx)))
       && !gimple_seq_empty_p (omp_for_body))
     private_marker = lower_oacc_private_marker (ctx);
 
@@ -12652,15 +12740,16 @@
   /* Once lowered, extract the bounds and clauses.  */
   omp_extract_for_data (stmt, &fd, NULL);
 
-  bool oacc_kernels_parloops = false;
-  if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
-      || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS)
-    oacc_kernels_parloops = ctx_in_oacc_kernels_region (ctx);
-  if (is_gimple_omp_oacc (ctx->stmt) && !oacc_kernels_parloops)
+  if (flag_openacc)
     {
-      lower_oacc_head_tail (gimple_location (stmt),
-			    gimple_omp_for_clauses (stmt), private_marker,
-			    NULL, NULL, &oacc_head, &oacc_tail, ctx);
+      bool oacc_kernels_parloops = false;
+      if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS
+	  || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS)
+	oacc_kernels_parloops = ctx_in_oacc_kernels_region (ctx);
+      if (is_gimple_omp_oacc (ctx->stmt) && !oacc_kernels_parloops)
+	lower_oacc_head_tail (gimple_location (stmt),
+			      gimple_omp_for_clauses (stmt), private_marker,
+			      NULL, NULL, &oacc_head, &oacc_tail, ctx);
     }
 
   /* Add OpenACC partitioning and reduction markers just before the loop.  */
@@ -13447,9 +13536,20 @@
     bind = gimple_build_bind (NULL, NULL, make_node (BLOCK));
   else
     bind = gimple_build_bind (NULL, NULL, gimple_bind_block (par_bind));
+
+  gimple_seq oacc_head = NULL, oacc_tail = NULL;
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+      && gimple_code (stmt) == GIMPLE_OMP_PARALLEL
+      && ompacc_ctx_p (ctx))
+    lower_oacc_head_tail (gimple_location (stmt), clauses,
+			  NULL, NULL, NULL, &oacc_head, &oacc_tail,
+			  ctx);
+
   gsi_replace (gsi_p, dep_bind ? dep_bind : bind, true);
   gimple_bind_add_seq (bind, ilist);
+  gimple_bind_add_seq (bind, oacc_head);
   gimple_bind_add_stmt (bind, stmt);
+  gimple_bind_add_seq (bind, oacc_tail);
   gimple_bind_add_seq (bind, olist);
 
   pop_gimplify_context (NULL);
@@ -15320,7 +15420,9 @@
       gimple_seq fork_seq = NULL;
       gimple_seq join_seq = NULL;
 
-      if (offloaded && is_gimple_omp_oacc (ctx->stmt))
+      if (offloaded && (is_gimple_omp_oacc (ctx->stmt)
+			|| (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+			    && ompacc_ctx_p (ctx))))
 	{
 	  /* If there are reductions on the offloaded region itself, treat
 	     them as a dummy GANG loop.  */
@@ -15456,6 +15558,22 @@
   lower_omp (gimple_omp_body_ptr (teams_stmt), ctx);
   lower_reduction_clauses (gimple_omp_teams_clauses (teams_stmt), &olist,
 			   NULL, ctx);
+
+  if (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx))
+    {
+      /* Forward the team/gang-wide variables to outer target region.  */
+      struct omp_context *tgt = ctx;
+      while (tgt && !is_gimple_omp_offloaded (tgt->stmt))
+	tgt = tgt->outer;
+      if (tgt)
+	{
+	  int i;
+	  tree decl;
+	  FOR_EACH_VEC_ELT (ctx->oacc_privatization_candidates, i, decl)
+	    tgt->oacc_privatization_candidates.safe_push (decl);
+	}
+    }
+
   gimple_seq_add_stmt (&bind_body, teams_stmt);
 
   gimple_seq_add_seq (&bind_body, gimple_omp_body (teams_stmt));
@@ -15620,7 +15738,9 @@
 		 ctx);
       break;
     case GIMPLE_BIND:
-      if (ctx && is_gimple_omp_oacc (ctx->stmt))
+      if (ctx && (is_gimple_omp_oacc (ctx->stmt)
+		  || (flag_openmp_target == OMP_TARGET_MODE_OMPACC
+		      && ompacc_ctx_p (ctx))))
 	{
 	  tree vars = gimple_bind_vars (as_a <gbind *> (stmt));
 	  oacc_privatization_scan_decl_chain (ctx, vars);
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index b18f28f..9dae07c 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -388,6 +388,269 @@
   lang_hooks.decls.omp_finish_decl_inits ();
 }
 
+static bool ompacc_supported_clauses_p (tree clauses)
+{
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    switch (OMP_CLAUSE_CODE (c))
+      {
+      case OMP_CLAUSE_COLLAPSE:
+      case OMP_CLAUSE_NOWAIT:
+	continue;
+      default:
+	return false;
+      }
+  return true;
+}
+
+struct target_region_data
+{
+  tree func_decl;
+  bool has_omp_for;
+  bool has_omp_parallel;
+  bool ompacc_invalid;
+  auto_vec<const char *> warning_msgs;
+  auto_vec<location_t> warning_locs;
+  target_region_data (void)
+    : func_decl (NULL_TREE),
+      has_omp_for (false), has_omp_parallel (false), ompacc_invalid (false),
+      warning_msgs (), warning_locs () {}
+};
+
+static tree scan_omp_target_region_r (tree *, int *, void *);
+
+static void
+scan_fndecl_for_ompacc (tree decl, target_region_data *tgtdata)
+{
+  target_region_data td;
+  td.func_decl = decl;
+  walk_tree_without_duplicates (&DECL_SAVED_TREE (decl),
+				scan_omp_target_region_r, &td);
+  tree v;
+  if ((v = lookup_attribute ("omp declare variant base",
+			     DECL_ATTRIBUTES (decl)))
+      || (v = lookup_attribute ("omp declare variant variant",
+				DECL_ATTRIBUTES (decl))))
+    {
+      td.ompacc_invalid = true;
+      td.warning_msgs.safe_push ("declare variant not supported for OMPACC");
+      td.warning_locs.safe_push (DECL_SOURCE_LOCATION (decl));
+    }
+  if (tgtdata)
+    {
+      tgtdata->has_omp_for |= td.has_omp_for;
+      tgtdata->has_omp_parallel |= td.has_omp_parallel;
+      tgtdata->ompacc_invalid |= td.ompacc_invalid;
+      for (unsigned i = 0; i < td.warning_msgs.length (); i++)
+	tgtdata->warning_msgs.safe_push (td.warning_msgs[i]);
+      for (unsigned i = 0; i < td.warning_locs.length (); i++)
+	tgtdata->warning_locs.safe_push (td.warning_locs[i]);
+    }
+
+  if (!td.ompacc_invalid
+      && !lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl)))
+    {
+      DECL_ATTRIBUTES (decl)
+	= tree_cons (get_identifier ("ompacc"), NULL_TREE,
+		     DECL_ATTRIBUTES (decl));
+      if (!td.has_omp_parallel)
+	DECL_ATTRIBUTES (decl)
+	  = tree_cons (get_identifier ("ompacc seq"), NULL_TREE,
+		       DECL_ATTRIBUTES (decl));
+    }
+}
+
+static tree
+scan_omp_target_region_r (tree *tp, int *walk_subtrees, void *data)
+{
+  target_region_data *tgtdata = (target_region_data *) data;
+
+  if (TREE_CODE (*tp) == FUNCTION_DECL
+      && !(fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_THREAD_NUM)
+	   || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_NUM_THREADS)
+	   || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_TEAM_NUM)
+	   || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_NUM_TEAMS)
+	   || id_equal (DECL_NAME (*tp), "omp_get_thread_num")
+	   || id_equal (DECL_NAME (*tp), "omp_get_num_threads")
+	   || id_equal (DECL_NAME (*tp), "omp_get_team_num")
+	   || id_equal (DECL_NAME (*tp), "omp_get_num_teams"))
+      && *tp != tgtdata->func_decl)
+    {
+      tree decl = *tp;
+      symtab_node *node = symtab_node::get (*tp);
+      if (node)
+	{
+	  node = node->ultimate_alias_target ();
+	  decl = node->decl;
+	}
+
+      if (!DECL_EXTERNAL (decl) && DECL_SAVED_TREE (decl))
+	{
+	  scan_fndecl_for_ompacc (decl, tgtdata);
+	}
+      else
+	{
+	  tgtdata->warning_msgs.safe_push ("referencing external function");
+	  tgtdata->warning_locs.safe_push (DECL_SOURCE_LOCATION (*tp));
+	  tgtdata->ompacc_invalid = true;
+	}
+      *walk_subtrees = 0;
+      return NULL_TREE;
+    }
+
+  switch (TREE_CODE (*tp))
+    {
+    case OMP_FOR:
+      if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp)))
+	{
+	  tgtdata->ompacc_invalid = true;
+	  tgtdata->warning_msgs.safe_push ("clauses not supported");
+	  tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+	}
+      else if (OMP_FOR_NON_RECTANGULAR (*tp))
+	{
+	  tgtdata->ompacc_invalid = true;
+	  tgtdata->warning_msgs.safe_push ("non-rectangular loops not supported");
+	  tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+	}
+      else
+	tgtdata->has_omp_for = true;
+      break;
+
+    case OMP_PARALLEL:
+      if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp)))
+	{
+	  tgtdata->ompacc_invalid = true;
+	  tgtdata->warning_msgs.safe_push ("clauses not supported");
+	  tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+	}
+      else
+	tgtdata->has_omp_parallel = true;
+      break;
+
+    case OMP_DISTRIBUTE:
+    case OMP_TEAMS:
+      if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp)))
+	{
+	  tgtdata->ompacc_invalid = true;
+	  tgtdata->warning_msgs.safe_push ("clauses not supported");
+	  tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+	}
+      /* Fallthru.  */
+
+    case OMP_ATOMIC:
+    case OMP_ATOMIC_READ:
+    case OMP_ATOMIC_CAPTURE_OLD:
+    case OMP_ATOMIC_CAPTURE_NEW:
+      break;
+
+    case OMP_SIMD:
+    case OMP_TASK:
+    case OMP_LOOP:
+    case OMP_TASKLOOP:
+    case OMP_TASKGROUP:
+    case OMP_SECTION:
+    case OMP_MASTER:
+    case OMP_MASKED:
+    case OMP_ORDERED:
+    case OMP_CRITICAL:
+    case OMP_SCAN:
+    case OMP_METADIRECTIVE:
+      tgtdata->ompacc_invalid = true;
+      tgtdata->warning_msgs.safe_push ("construct not supported");
+      tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+      *walk_subtrees = 0;
+      break;
+
+    case OMP_TARGET:
+      tgtdata->ompacc_invalid = true;
+      tgtdata->warning_msgs.safe_push ("nested target/reverse offload "
+				       "not supported");
+      tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp));
+      *walk_subtrees = 0;
+      break;
+
+    default:
+      break;
+    }
+  return NULL_TREE;
+}
+
+static tree
+scan_omp_target_construct_r (tree *tp, int *walk_subtrees,
+			     void *data)
+{
+  if (TREE_CODE (*tp) == OMP_TARGET)
+    {
+      target_region_data td;
+      td.func_decl = (tree) data;
+      walk_tree_without_duplicates (&OMP_TARGET_BODY (*tp),
+				    scan_omp_target_region_r, &td);
+      for (tree c = OMP_TARGET_CLAUSES (*tp); c; c = OMP_CLAUSE_CHAIN (c))
+	{
+	  switch (OMP_CLAUSE_CODE (c))
+	    {
+	    case OMP_CLAUSE_MAP:
+	      continue;
+	    default:
+	      td.ompacc_invalid = true;
+	      td.warning_msgs.safe_push ("clause not supported");
+	      td.warning_locs.safe_push (OMP_CLAUSE_LOCATION (c));
+	      break;
+	    }
+	  break;
+	}
+      if (!td.ompacc_invalid)
+	{
+	  tree c = build_omp_clause (EXPR_LOCATION (*tp), OMP_CLAUSE__OMPACC_);
+	  if (!td.has_omp_parallel)
+	    OMP_CLAUSE__OMPACC__SEQ (c) = 1;
+	  OMP_CLAUSE_CHAIN (c) = OMP_TARGET_CLAUSES (*tp);
+	  OMP_TARGET_CLAUSES (*tp) = c;
+	}
+      else
+	{
+	  warning_at (EXPR_LOCATION (*tp), 0, "target region not suitable for "
+		      "OMPACC mode");
+	  for (unsigned i = 0; i < td.warning_locs.length (); i++)
+	    warning_at (td.warning_locs[i], 0, "%s", td.warning_msgs[i]);
+	}
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+void
+omp_ompacc_attribute_tagging (void)
+{
+  cgraph_node *node;
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (DECL_SAVED_TREE (node->decl))
+      {
+	if (DECL_STRUCT_FUNCTION (node->decl)
+	    && DECL_STRUCT_FUNCTION (node->decl)->has_omp_target)
+	  walk_tree_without_duplicates (&DECL_SAVED_TREE (node->decl),
+					scan_omp_target_construct_r,
+					node->decl);
+
+	for (cgraph_node *cgn = first_nested_function (node);
+	     cgn; cgn = next_nested_function (cgn))
+	  if (omp_declare_target_fn_p (cgn->decl))
+	    {
+	      scan_fndecl_for_ompacc (cgn->decl, NULL);
+
+	      if (lookup_attribute ("ompacc", DECL_ATTRIBUTES (cgn->decl))
+		  && !lookup_attribute ("noinline", DECL_ATTRIBUTES (cgn->decl)))
+		{
+		  DECL_ATTRIBUTES (cgn->decl)
+		    = tree_cons (get_identifier ("noinline"),
+				 NULL, DECL_ATTRIBUTES (cgn->decl));
+		  DECL_ATTRIBUTES (cgn->decl)
+		    = tree_cons (get_identifier ("noipa"),
+				 NULL, DECL_ATTRIBUTES (cgn->decl));
+		}
+	    }
+      }
+}
 
 /* Create new symbols containing (address, size) pairs for global variables,
    marked with "omp declare target" attribute, as well as addresses for the
@@ -480,6 +743,22 @@
 static tree
 oacc_dim_call (bool pos, int dim, gimple_seq *seq)
 {
+  if (flag_openmp && flag_openmp_target == OMP_TARGET_MODE_OMPACC)
+    {
+      enum built_in_function fn;
+      if (dim == GOMP_DIM_VECTOR)
+	fn = pos ? BUILT_IN_OMP_GET_THREAD_NUM : BUILT_IN_OMP_GET_NUM_THREADS;
+      else if (dim == GOMP_DIM_GANG)
+	fn = pos ? BUILT_IN_OMP_GET_TEAM_NUM : BUILT_IN_OMP_GET_NUM_TEAMS;
+      else
+	gcc_unreachable ();
+      tree size = create_tmp_var (integer_type_node);
+      gimple *call = gimple_build_call (builtin_decl_explicit (fn), 0);
+      gimple_call_set_lhs (call, size);
+      gimple_seq_add_stmt (seq, call);
+      return size;
+    }
+
   tree arg = build_int_cst (unsigned_type_node, dim);
   tree size = create_tmp_var (integer_type_node);
   enum internal_fn fn = pos ? IFN_GOACC_DIM_POS : IFN_GOACC_DIM_SIZE;
@@ -2776,15 +3055,19 @@
 static unsigned int
 execute_oacc_device_lower ()
 {
-  tree attrs = oacc_get_fn_attrib (current_function_decl);
-
-  if (!attrs)
-    /* Not an offloaded function.  */
-    return 0;
-
+  tree attrs;
   int dims[GOMP_DIM_MAX];
-  for (unsigned i = 0; i < GOMP_DIM_MAX; i++)
-    dims[i] = oacc_get_fn_dim_size (current_function_decl, i);
+
+  if (flag_openacc)
+    {
+      attrs = oacc_get_fn_attrib (current_function_decl);
+      if (!attrs)
+	/* Not an offloaded function.  */
+	return 0;
+
+      for (unsigned i = 0; i < GOMP_DIM_MAX; i++)
+	dims[i] = oacc_get_fn_dim_size (current_function_decl, i);
+    }
 
   hash_map<tree, tree> adjusted_vars;
 
@@ -2853,7 +3136,8 @@
 
 		case IFN_UNIQUE_OACC_FORK:
 		case IFN_UNIQUE_OACC_JOIN:
-		  if (integer_minus_onep (gimple_call_arg (call, 2)))
+		  if (flag_openacc
+		      && integer_minus_onep (gimple_call_arg (call, 2)))
 		    remove = true;
 		  else if (!targetm.goacc.fork_join
 			   (call, dims, kind == IFN_UNIQUE_OACC_FORK))
@@ -3150,7 +3434,8 @@
   /* TODO If this were gated on something like '!(fun->curr_properties &
      PROP_gimple_oaccdevlow)', then we could easily have several instances
      in the pass pipeline? */
-  virtual bool gate (function *) { return flag_openacc; };
+  virtual bool gate (function *)
+  { return flag_openacc || (flag_openmp && flag_openmp_target == OMP_TARGET_MODE_OMPACC); };
 
   virtual unsigned int execute (function *)
     {
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index f6556af..581893f 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -31,6 +31,7 @@
 
 extern void omp_finish_file (void);
 extern void omp_discover_implicit_declare_target (void);
+extern void omp_ompacc_attribute_tagging (void);
 extern tree oacc_extract_loop_call (gcall *call);
 
 
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 3fbfca9..019ec97 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -1393,6 +1393,14 @@
     }
 
 
+  if (opts_set->x_flag_openmp_target)
+    {
+      if (opts->x_flag_openacc)
+	error ("%<-fopenacc%> not compatible with %<-fopenmp-target=%>");
+      if (!opts->x_flag_openmp)
+	error ("%<-fopenmp-target=%> requires %<-fopenmp%>");
+    }
+
   diagnose_options (opts, opts_set, loc);
 }
 
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index de8c009..e146140 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -68,6 +68,11 @@
 DEF_TARGET_INSN (oacc_dim_size, (rtx x0, rtx x1))
 DEF_TARGET_INSN (oacc_fork, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (oacc_join, (rtx x0, rtx x1, rtx x2))
+DEF_TARGET_INSN (gomp_barrier, (void))
+DEF_TARGET_INSN (omp_get_thread_num, (rtx x0))
+DEF_TARGET_INSN (omp_get_num_threads, (rtx x0))
+DEF_TARGET_INSN (omp_get_team_num, (rtx x0))
+DEF_TARGET_INSN (omp_get_num_teams, (rtx x0))
 DEF_TARGET_INSN (omp_simt_enter, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (omp_simt_exit, (rtx x0))
 DEF_TARGET_INSN (omp_simt_lane, (rtx x0))
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 7a5a87a..fcfa87a0 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -498,6 +498,10 @@
      loop or not.  */
   OMP_CLAUSE__SIMT_,
 
+  /* Internal-only clause: flags whether this is an "ompacc"
+     target region.  */
+  OMP_CLAUSE__OMPACC_,
+
   /* OpenACC clause: independent.  */
   OMP_CLAUSE_INDEPENDENT,
 
diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc
index 777f85f..c6b23ef 100644
--- a/gcc/tree-nested.cc
+++ b/gcc/tree-nested.cc
@@ -1491,6 +1491,7 @@
 	case OMP_CLAUSE_BIND:
 	case OMP_CLAUSE__CONDTEMP_:
 	case OMP_CLAUSE__SCANTEMP_:
+	case OMP_CLAUSE__OMPACC_:
 	  break;
 
 	  /* The following clause belongs to the OpenACC cache directive, which
@@ -2287,6 +2288,7 @@
 	case OMP_CLAUSE_BIND:
 	case OMP_CLAUSE__CONDTEMP_:
 	case OMP_CLAUSE__SCANTEMP_:
+	case OMP_CLAUSE__OMPACC_:
 	  break;
 
 	  /* The following clause belongs to the OpenACC cache directive, which
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 30c8f7b..df1e860 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -1421,6 +1421,12 @@
       pp_string (pp, "_simt_");
       break;
 
+    case OMP_CLAUSE__OMPACC_:
+      pp_string (pp, "_ompacc_");
+      if (OMP_CLAUSE__OMPACC__SEQ (clause))
+	pp_string (pp, "(seq)");
+      break;
+
     case OMP_CLAUSE_GANG:
       pp_string (pp, "gang");
       if (OMP_CLAUSE_GANG_EXPR (clause) != NULL_TREE)
diff --git a/gcc/tree-ssa-loop.cc b/gcc/tree-ssa-loop.cc
index b7a5a0f..ee651ce 100644
--- a/gcc/tree-ssa-loop.cc
+++ b/gcc/tree-ssa-loop.cc
@@ -282,6 +282,11 @@
 
   /* opt_pass methods: */
   virtual bool gate (function *fn) {
+    if (flag_openmp
+	&& flag_openmp_target == OMP_TARGET_MODE_OMPACC
+	&& lookup_attribute ("ompacc", DECL_ATTRIBUTES (fn->decl)))
+      return true;
+
     if (!flag_openacc)
       return false;
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index aed566f..0192fe3 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -340,6 +340,7 @@
   1, /* OMP_CLAUSE_FILTER  */
   1, /* OMP_CLAUSE__SIMDUID_  */
   0, /* OMP_CLAUSE__SIMT_  */
+  0, /* OMP_CLAUSE__OMPACC_  */
   0, /* OMP_CLAUSE_INDEPENDENT  */
   1, /* OMP_CLAUSE_WORKER  */
   1, /* OMP_CLAUSE_VECTOR  */
@@ -437,6 +438,7 @@
   "filter",
   "_simduid_",
   "_simt_",
+  "_ompacc_",
   "independent",
   "worker",
   "vector",
diff --git a/gcc/tree.h b/gcc/tree.h
index 177f350..0b917d0 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1918,6 +1918,9 @@
 #define OMP_CLAUSE__SIMDUID__DECL(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__SIMDUID_), 0)
 
+#define OMP_CLAUSE__OMPACC__SEQ(NODE) \
+  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__OMPACC_)->base.public_flag)
+
 #define OMP_CLAUSE_SCHEDULE_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SCHEDULE)->omp_clause.subcode.schedule_kind)
 
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index b30b8df..146907a 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -34,6 +34,9 @@
 struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
 int __gomp_team_num __attribute__((shared,nocommon));
 
+/* Number of active target threads in team, used in OMPACC mode.  */
+unsigned int __nvptx_omp_num_threads __attribute__((shared,nocommon));
+
 static void gomp_thread_start (struct gomp_thread_pool *);
 
 /* There should be some .shared space reserved for us.  There's no way to
diff --git a/libgomp/testsuite/libgomp.c-c++-common/for-17.c b/libgomp/testsuite/libgomp.c-c++-common/for-17.c
new file mode 100644
index 0000000..9771aaf
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/for-17.c
@@ -0,0 +1,69 @@
+/* { dg-options "-fopenmp-target=acc" } */
+/* { dg-additional-options "-std=gnu99" { target c } } */
+
+#define M(x, y, z) O(x, y, z)
+#define O(x, y, z) x ## _ ## y ## _ ## z
+
+#define DO_PRAGMA(x) _Pragma (#x)
+
+#undef OMPFROM
+#undef OMPTO
+#define OMPFROM(v) DO_PRAGMA (omp target update from(v))
+#define OMPTO(v) DO_PRAGMA (omp target update to(v))
+
+#pragma omp declare target
+
+#define OMPTGT DO_PRAGMA (omp target)
+#define F parallel for
+#define G pf
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+#undef OMPTGT
+
+#pragma omp end declare target
+
+#define F target parallel for
+#define G tpf
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+
+#define F target teams distribute
+#define G ttd
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+
+#define F target teams distribute parallel for
+#define G ttdpf
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+
+int
+main ()
+{
+  if (test_pf_ompacc ()
+      || test_tpf_ompacc ()
+      || test_ttd_ompacc ()
+      || test_ttdpf_ompacc ())
+    __builtin_abort ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/for-18.c b/libgomp/testsuite/libgomp.c-c++-common/for-18.c
new file mode 100644
index 0000000..2486d3a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/for-18.c
@@ -0,0 +1,5 @@
+/* { dg-options "-fopenmp-target=acc" } */
+/* { dg-additional-options "-std=gnu99" { target c } } */
+
+#define CONDNE
+#include "for-17.c"