Use OpenACC code to process OpenMP target regions

This is a backport of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619003.html

This patch implements '-fopenmp-target=acc', which makes the compiler handle
a subset of OpenMP target regions internally as OpenACC parallel regions. The
subset covers the target, teams, parallel, distribute, and for/do constructs,
plus atomics.

In this "OMPACC" mode, the internal construct kinds are changed to their
OpenACC equivalents so that the existing OpenACC code paths process them, with
the necessary adjustments in the middle-end and the nvptx backend. If a target
region contains something the patch does not handle, the compiler issues a
warning and falls back to normal OpenMP processing for that target region.
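As an illustration only, the sketch below shows the kind of target regions this
mode is aimed at: a combined target/teams/distribute/parallel/for loop and a
loop containing an atomic update. The function names (saxpy, count_positive)
are hypothetical and do not come from the patch or its testsuite, and whether
these exact map clauses fall inside the supported subset is an assumption; an
unsupported region would only trigger the warning and the fallback described
above.

```c
#include <assert.h>

/* Hypothetical example: y[i] += a * x[i] for 0 <= i < n, written with the
   combined target + teams + distribute + parallel + for construct.  */
void
saxpy (int n, float a, const float *x, float *y)
{
  assert (n >= 0);
#pragma omp target teams distribute parallel for \
  map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}

/* Hypothetical example: count the positive elements of x, using an atomic
   update on a counter that is mapped to and from the device.  */
int
count_positive (int n, const float *x)
{
  int count = 0;

  assert (n >= 0);
#pragma omp target teams distribute parallel for \
  map(to: x[0:n]) map(tofrom: count)
  for (int i = 0; i < n; i++)
    if (x[i] > 0.0f)
      {
#pragma omp atomic update
	count++;
      }
  return count;
}
```

With a compiler containing this patch, such a file would presumably be built
with '-fopenmp -fopenmp-target=acc' (the option is rejected without -fopenmp
or together with -fopenacc). Without OpenMP enabled the pragmas are ignored and
the loops run serially with the same results.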

gcc/ChangeLog:

	* builtins.cc (expand_builtin_omp_builtins): New function.
	(expand_builtin): Add expand cases for BUILT_IN_GOMP_BARRIER,
	BUILT_IN_OMP_GET_THREAD_NUM, BUILT_IN_OMP_GET_NUM_THREADS,
	BUILT_IN_OMP_GET_TEAM_NUM, and BUILT_IN_OMP_GET_NUM_TEAMS using
	expand_builtin_omp_builtins, enabled under -fopenmp-target=acc.
	* cgraphunit.cc (analyze_functions): Add call to
	omp_ompacc_attribute_tagging, enabled under -fopenmp-target=acc.
	* common.opt (fopenmp-target=): Add new option and enums.
	* config/nvptx/mkoffload.cc (main): Handle -fopenmp-target=.
	* config/nvptx/nvptx-protos.h (nvptx_expand_omp_get_num_threads): New
	prototype.
	(nvptx_mem_shared_p): Likewise.
	* config/nvptx/nvptx.cc (omp_num_threads_sym): New global static RTX
	symbol for number of threads in team.
	(omp_num_threads_align): New var for alignment of omp_num_threads_sym.
	(need_omp_num_threads): New bool recording whether any function
	references omp_num_threads_sym.
	(nvptx_option_override): Initialize omp_num_threads_sym/align.
	(write_as_kernel): Disable normal OpenMP kernel entry under OMPACC mode.
	(nvptx_declare_function_name): Disable shim function under OMPACC mode.
	Disable soft-stack under OMPACC mode. Add generation of neutering init
	code under OMPACC mode.
	(nvptx_output_set_softstack): Return "" under OMPACC mode.
	(nvptx_expand_call): Set parallelism to vector for function calls with
	"ompacc for" attached.
	(nvptx_expand_oacc_fork): Set mode to GOMP_DIM_VECTOR under OMPACC mode.
	(nvptx_expand_oacc_join): Likewise.
	(nvptx_expand_omp_get_num_threads): New function.
	(nvptx_mem_shared_p): New function.
	(nvptx_mach_max_workers): Return 1 under OMPACC mode.
	(nvptx_mach_vector_length): Return 32 under OMPACC mode.
	(nvptx_single): Add adjustments for OMPACC mode, which has
	parallel-construct fork/joins and regions of code where neutering is
	dynamically determined.
	(nvptx_reorg): Enable neutering under OMPACC mode when "ompacc for"
	attribute is attached to function. Disable uniform-simt when under
	OMPACC mode.
	(nvptx_file_end): Write __nvptx_omp_num_threads out when needed.
	(nvptx_goacc_fork_join): Return true under OMPACC mode.
	* config/nvptx/nvptx.h (struct GTY(()) machine_function): Add
	omp_parallel_predicate and omp_fn_entry_num_threads_reg fields.
	* config/nvptx/nvptx.md (unspecv): Add UNSPECV_GET_TID,
	UNSPECV_GET_NTID, UNSPECV_GET_CTAID, UNSPECV_GET_NCTAID,
	UNSPECV_OMP_PARALLEL_FORK, UNSPECV_OMP_PARALLEL_JOIN entries.
	(nvptx_shared_mem_operand): New predicate.
	(gomp_barrier): New expand pattern.
	(omp_get_num_threads): New expand pattern.
	(omp_get_num_teams): New insn pattern.
	(omp_get_thread_num): Likewise.
	(omp_get_team_num): Likewise.
	(get_ntid): Likewise.
	(nvptx_omp_parallel_fork): Likewise.
	(nvptx_omp_parallel_join): Likewise.

	* flag-types.h (omp_target_mode_kind): New flag value enum.
	* gimplify.cc (struct gimplify_omp_ctx): Add 'bool ompacc' field.
	(gimplify_scan_omp_clauses): Handle OMP_CLAUSE__OMPACC_.
	(gimplify_adjust_omp_clauses): Likewise.
	(gimplify_omp_ctx_ompacc_p): New function.
	(gimplify_omp_for): Handle combined loops under OMPACC.

	* lto-wrapper.cc (append_compiler_options): Add OPT_fopenmp_target_.
	* omp-builtins.def (BUILT_IN_OMP_GET_THREAD_NUM): Remove CONST.
	(BUILT_IN_OMP_GET_NUM_THREADS): Likewise.
	* omp-expand.cc (remove_exit_barrier): Disable addressable-var
	processing for parallel construct child functions under OMPACC mode.
	(expand_oacc_for): Add OMPACC mode handling.
	(get_target_arguments): Force thread_limit clause value to 1 under
	OMPACC mode.
	(expand_omp): Under OMPACC mode, avoid child function expanding of
	GIMPLE_OMP_PARALLEL.
	* omp-general.cc (omp_extract_for_data): Adjustments for OMPACC mode.
	* omp-low.cc (struct omp_context): Add 'bool ompacc_p' field.
	(scan_sharing_clauses): Handle OMP_CLAUSE__OMPACC_.
	(ompacc_ctx_p): New function.
	(scan_omp_parallel): Handle OMPACC mode, avoid creating child function.
	(scan_omp_target): Tag "ompacc"/"ompacc for" attributes for target
	construct child function, remove OMP_CLAUSE__OMPACC_ clauses.
	(lower_oacc_head_mark): Handle OMPACC mode cases.
	(lower_omp_for): Adjust OMP_FOR kind from OpenMP to OpenACC kinds, add
	vector/gang clauses as needed. Add other OMPACC handling.
	(lower_omp_taskreg): Add call to lower_oacc_head_tail for OMPACC case.
	(lower_omp_target): Do OpenACC gang privatization under OMPACC case.
	(lower_omp_teams): Forward OpenACC privatization variables to outer
	target region under OMPACC mode.
	(lower_omp_1): Do OpenACC gang privatization under OMPACC case for
	GIMPLE_BIND.
	* omp-offload.cc (ompacc_supported_clauses_p): New function.
	(struct target_region_data): New struct type for tree walk.
	(scan_fndecl_for_ompacc): New function.
	(scan_omp_target_region_r): New function.
	(scan_omp_target_construct_r): New function.
	(omp_ompacc_attribute_tagging): New function.
	(oacc_dim_call): Add OMPACC case handling.
	(execute_oacc_device_lower): Make parts explicitly only OpenACC enabled.
	(pass_oacc_device_lower::gate): Enable pass under OMPACC mode.
	* omp-offload.h (omp_ompacc_attribute_tagging): New prototype.
	* opts.cc (finish_options): Only allow -fopenmp-target= when -fopenmp
	and no -fopenacc.
	* target-insns.def (gomp_barrier): New defined insn pattern.
	(omp_get_thread_num): Likewise.
	(omp_get_num_threads): Likewise.
	(omp_get_team_num): Likewise.
	(omp_get_num_teams): Likewise.
	* tree-core.h (enum omp_clause_code): Add new OMP_CLAUSE__OMPACC_ entry
	for internal clause.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Handle
	OMP_CLAUSE__OMPACC_.
	* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE__OMPACC_.
	* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE__OMPACC_ entry.
	(omp_clause_code_name): Likewise.
	* tree.h (OMP_CLAUSE__OMPACC__FOR): New macro for OMP_CLAUSE__OMPACC_.

	* tree-ssa-loop.cc (pass_oacc_only::gate): Enable pass under OMPACC
	mode cases.

libgomp/ChangeLog:

	* config/nvptx/team.c (__nvptx_omp_num_threads): New global variable in
	shared memory.