|  | <chapter xmlns="http://docbook.org/ns/docbook" version="5.0" | 
|  | xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode"> | 
|  | <?dbhtml filename="parallel_mode.html"?> | 
|  |  | 
|  | <info><title>Parallel Mode</title> | 
|  | <keywordset> | 
|  | <keyword>C++</keyword> | 
|  | <keyword>library</keyword> | 
|  | <keyword>parallel</keyword> | 
|  | </keywordset> | 
|  | </info> | 
|  |  | 
|  |  | 
|  |  | 
|  | <para> The libstdc++ parallel mode is an experimental parallel | 
|  | implementation of many algorithms of the C++ Standard Library. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | Several of the standard algorithms, for instance | 
|  | <function>std::sort</function>, are made parallel using OpenMP | 
|  | annotations. These parallel mode constructs can be invoked by | 
|  | explicit source declaration or by compiling existing sources with a | 
|  | specific compiler flag. | 
|  | </para> | 
|  |  | 
|  | <note> | 
|  | <para> | 
|  | The parallel mode has not been kept up to date with recent C++ standards | 
|  | and so it only conforms to the C++03 requirements. | 
|  | That means that move-only predicates may not work with parallel mode | 
|  | algorithms, and for C++20 most of the algorithms cannot be used in | 
|  | <code>constexpr</code> functions. | 
|  | </para> | 
|  | <para> | 
|  | For C++17 and above there are new overloads of the standard algorithms | 
|  | which take an execution policy argument. You should consider using those | 
|  | instead of the non-standard parallel mode extensions. | 
|  | </para> | 
|  | </note> | 
|  |  | 
|  | <section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info> | 
|  |  | 
|  |  | 
|  | <para>The following library components in the include | 
|  | <filename class="headerfile">numeric</filename> are included in the parallel mode:</para> | 
|  | <itemizedlist> | 
|  | <listitem><para><function>std::accumulate</function></para></listitem> | 
|  | <listitem><para><function>std::adjacent_difference</function></para></listitem> | 
|  | <listitem><para><function>std::inner_product</function></para></listitem> | 
|  | <listitem><para><function>std::partial_sum</function></para></listitem> | 
|  | </itemizedlist> | 
|  |  | 
|  | <para>The following library components in the include | 
|  | <filename class="headerfile">algorithm</filename> are included in the parallel mode:</para> | 
|  | <itemizedlist> | 
|  | <listitem><para><function>std::adjacent_find</function></para></listitem> | 
|  | <listitem><para><function>std::count</function></para></listitem> | 
|  | <listitem><para><function>std::count_if</function></para></listitem> | 
|  | <listitem><para><function>std::equal</function></para></listitem> | 
|  | <listitem><para><function>std::find</function></para></listitem> | 
|  | <listitem><para><function>std::find_if</function></para></listitem> | 
|  | <listitem><para><function>std::find_first_of</function></para></listitem> | 
|  | <listitem><para><function>std::for_each</function></para></listitem> | 
|  | <listitem><para><function>std::generate</function></para></listitem> | 
|  | <listitem><para><function>std::generate_n</function></para></listitem> | 
|  | <listitem><para><function>std::lexicographical_compare</function></para></listitem> | 
|  | <listitem><para><function>std::mismatch</function></para></listitem> | 
|  | <listitem><para><function>std::search</function></para></listitem> | 
|  | <listitem><para><function>std::search_n</function></para></listitem> | 
|  | <listitem><para><function>std::transform</function></para></listitem> | 
|  | <listitem><para><function>std::replace</function></para></listitem> | 
|  | <listitem><para><function>std::replace_if</function></para></listitem> | 
|  | <listitem><para><function>std::max_element</function></para></listitem> | 
|  | <listitem><para><function>std::merge</function></para></listitem> | 
|  | <listitem><para><function>std::min_element</function></para></listitem> | 
|  | <listitem><para><function>std::nth_element</function></para></listitem> | 
|  | <listitem><para><function>std::partial_sort</function></para></listitem> | 
|  | <listitem><para><function>std::partition</function></para></listitem> | 
|  | <listitem><para><function>std::random_shuffle</function></para></listitem> | 
|  | <listitem><para><function>std::set_union</function></para></listitem> | 
|  | <listitem><para><function>std::set_intersection</function></para></listitem> | 
|  | <listitem><para><function>std::set_symmetric_difference</function></para></listitem> | 
|  | <listitem><para><function>std::set_difference</function></para></listitem> | 
|  | <listitem><para><function>std::sort</function></para></listitem> | 
|  | <listitem><para><function>std::stable_sort</function></para></listitem> | 
|  | <listitem><para><function>std::unique_copy</function></para></listitem> | 
|  | </itemizedlist> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | <section xml:id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"><info><title>Semantics</title></info> | 
|  | <?dbhtml filename="parallel_mode_semantics.html"?> | 
|  |  | 
|  |  | 
|  | <para> The parallel mode STL algorithms are currently not exception-safe, | 
|  | i.e. user-defined functors must not throw exceptions. | 
|  | Also, the order of execution is not guaranteed for some functions, of course. | 
|  | Therefore, user-defined functors should not have any concurrent side effects. | 
|  | </para> | 
|  |  | 
|  | <para> Since the current GCC OpenMP implementation does not support | 
|  | OpenMP parallel regions in concurrent threads, | 
|  | it is not possible to call parallel STL algorithm in | 
|  | concurrent threads, either. | 
|  | It might work with other compilers, though.</para> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | <section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info> | 
|  | <?dbhtml filename="parallel_mode_using.html"?> | 
|  |  | 
|  |  | 
|  | <section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info> | 
|  |  | 
|  |  | 
|  | <para> | 
|  | Any use of parallel functionality requires additional compiler | 
|  | and runtime support, in particular support for OpenMP. Adding this support is | 
|  | not difficult: just compile your application with the compiler | 
|  | flag <literal>-fopenmp</literal>. This will link | 
|  | in <code>libgomp</code>, the | 
|  | <link xmlns:xlink="http://www.w3.org/1999/xlink" | 
|  | xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and | 
|  | Multi Processing Runtime Library</link>, | 
|  | whose presence is mandatory. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | In addition, hardware that supports atomic operations and a compiler | 
|  | capable of producing atomic operations is mandatory: GCC defaults to no | 
|  | support for atomic operations on some common hardware | 
|  | architectures. Activating atomic operations may require explicit | 
|  | compiler flags on some targets (like sparc and x86), such | 
|  | as <literal>-march=i686</literal>, | 
|  | <literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See | 
|  | the GCC manual for more information. | 
|  | </para> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | <section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info> | 
|  |  | 
|  |  | 
|  | <para> | 
|  | To use the libstdc++ parallel mode, compile your application with | 
|  | the prerequisite flags as detailed above, and in addition | 
|  | add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all | 
|  | use of the standard (sequential) algorithms to the appropriate parallel | 
|  | equivalents. Please note that this doesn't necessarily mean that | 
|  | everything will end up being executed in a parallel manner, but | 
|  | rather that the heuristics and settings coded into the parallel | 
|  | versions will be used to determine if all, some, or no algorithms | 
|  | will be executed using parallel variants. | 
|  | </para> | 
|  |  | 
|  | <para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the | 
|  | sizes and behavior of standard class templates such as | 
|  | <function>std::search</function>, and therefore one can only link code | 
|  | compiled with parallel mode and code compiled without parallel mode | 
|  | if no instantiation of a container is passed between the two | 
|  | translation units. Parallel mode functionality has distinct linkage, | 
|  | and cannot be confused with normal mode symbols. | 
|  | </para> | 
|  | </section> | 
|  |  | 
|  | <section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info> | 
|  |  | 
|  |  | 
|  | <para>When it is not feasible to recompile your entire application, or | 
|  | only specific algorithms need to be parallel-aware, individual | 
|  | parallel algorithms can be made available explicitly. These | 
|  | parallel algorithms are functionally equivalent to the standard | 
|  | drop-in algorithms used in parallel mode, but they are available in | 
|  | a separate namespace as GNU extensions and may be used in programs | 
|  | compiled with either release mode or with parallel mode. | 
|  | </para> | 
|  |  | 
|  |  | 
|  | <para>An example of using a parallel version | 
|  | of <function>std::sort</function>, but no other parallel algorithms, is: | 
|  | </para> | 
|  |  | 
|  | <programlisting> | 
|  | #include <vector> | 
|  | #include <parallel/algorithm> | 
|  |  | 
|  | int main() | 
|  | { | 
|  | std::vector<int> v(100); | 
|  |  | 
|  | // ... | 
|  |  | 
|  | // Explicitly force a call to parallel sort. | 
|  | __gnu_parallel::sort(v.begin(), v.end()); | 
|  | return 0; | 
|  | } | 
|  | </programlisting> | 
|  |  | 
|  | <para> | 
|  | Then compile this code with the prerequisite compiler flags | 
|  | (<literal>-fopenmp</literal> and any necessary architecture-specific | 
|  | flags for atomic operations.) | 
|  | </para> | 
|  |  | 
|  | <para> The following table provides the names and headers of all the | 
|  | parallel algorithms that can be used in a similar manner: | 
|  | </para> | 
|  |  | 
|  | <table frame="all" xml:id="table.parallel_algos"> | 
|  | <title>Parallel Algorithms</title> | 
|  |  | 
|  | <tgroup cols="4" align="left" colsep="1" rowsep="1"> | 
|  | <colspec colname="c1"/> | 
|  | <colspec colname="c2"/> | 
|  | <colspec colname="c3"/> | 
|  | <colspec colname="c4"/> | 
|  |  | 
|  | <thead> | 
|  | <row> | 
|  | <entry>Algorithm</entry> | 
|  | <entry>Header</entry> | 
|  | <entry>Parallel algorithm</entry> | 
|  | <entry>Parallel header</entry> | 
|  | </row> | 
|  | </thead> | 
|  |  | 
|  | <tbody> | 
|  | <row> | 
|  | <entry><function>std::accumulate</function></entry> | 
|  | <entry><filename class="headerfile">numeric</filename></entry> | 
|  | <entry><function>__gnu_parallel::accumulate</function></entry> | 
|  | <entry><filename class="headerfile">parallel/numeric</filename></entry> | 
|  | </row> | 
|  | <row> | 
|  | <entry><function>std::adjacent_difference</function></entry> | 
|  | <entry><filename class="headerfile">numeric</filename></entry> | 
|  | <entry><function>__gnu_parallel::adjacent_difference</function></entry> | 
|  | <entry><filename class="headerfile">parallel/numeric</filename></entry> | 
|  | </row> | 
|  | <row> | 
|  | <entry><function>std::inner_product</function></entry> | 
|  | <entry><filename class="headerfile">numeric</filename></entry> | 
|  | <entry><function>__gnu_parallel::inner_product</function></entry> | 
|  | <entry><filename class="headerfile">parallel/numeric</filename></entry> | 
|  | </row> | 
|  | <row> | 
|  | <entry><function>std::partial_sum</function></entry> | 
|  | <entry><filename class="headerfile">numeric</filename></entry> | 
|  | <entry><function>__gnu_parallel::partial_sum</function></entry> | 
|  | <entry><filename class="headerfile">parallel/numeric</filename></entry> | 
|  | </row> | 
|  | <row> | 
|  | <entry><function>std::adjacent_find</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::adjacent_find</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::count</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::count</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::count_if</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::count_if</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::equal</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::equal</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::find</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::find</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::find_if</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::find_if</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::find_first_of</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::find_first_of</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::for_each</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::for_each</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::generate</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::generate</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::generate_n</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::generate_n</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::lexicographical_compare</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::lexicographical_compare</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::mismatch</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::mismatch</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::search</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::search</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::search_n</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::search_n</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::transform</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::transform</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::replace</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::replace</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::replace_if</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::replace_if</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::max_element</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::max_element</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::merge</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::merge</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::min_element</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::min_element</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::nth_element</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::nth_element</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::partial_sort</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::partial_sort</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::partition</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::partition</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::random_shuffle</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::random_shuffle</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::set_union</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::set_union</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::set_intersection</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::set_intersection</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::set_symmetric_difference</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::set_symmetric_difference</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::set_difference</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::set_difference</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::sort</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::sort</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::stable_sort</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::stable_sort</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  |  | 
|  | <row> | 
|  | <entry><function>std::unique_copy</function></entry> | 
|  | <entry><filename class="headerfile">algorithm</filename></entry> | 
|  | <entry><function>__gnu_parallel::unique_copy</function></entry> | 
|  | <entry><filename class="headerfile">parallel/algorithm</filename></entry> | 
|  | </row> | 
|  | </tbody> | 
|  | </tgroup> | 
|  | </table> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | <section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info> | 
|  | <?dbhtml filename="parallel_mode_design.html"?> | 
|  |  | 
|  | <para> | 
|  | </para> | 
|  | <section xml:id="parallel_mode.design.intro" xreflabel="Intro"><info><title>Interface Basics</title></info> | 
|  |  | 
|  |  | 
|  | <para> | 
|  | All parallel algorithms are intended to have signatures that are | 
|  | equivalent to the ISO C++ algorithms replaced. For instance, the | 
|  | <function>std::adjacent_find</function> function is declared as: | 
|  | </para> | 
|  | <programlisting> | 
|  | namespace std | 
|  | { | 
|  | template<typename _FIter> | 
|  | _FIter | 
|  | adjacent_find(_FIter, _FIter); | 
|  | } | 
|  | </programlisting> | 
|  |  | 
|  | <para> | 
|  | Which means that there should be something equivalent for the parallel | 
|  | version. Indeed, this is the case: | 
|  | </para> | 
|  |  | 
|  | <programlisting> | 
|  | namespace std | 
|  | { | 
|  | namespace __parallel | 
|  | { | 
|  | template<typename _FIter> | 
|  | _FIter | 
|  | adjacent_find(_FIter, _FIter); | 
|  |  | 
|  | ... | 
|  | } | 
|  | } | 
|  | </programlisting> | 
|  |  | 
|  | <para>But.... why the ellipses? | 
|  | </para> | 
|  |  | 
|  | <para> The ellipses in the example above represent additional overloads | 
|  | required for the parallel version of the function. These additional | 
|  | overloads are used to dispatch calls from the ISO C++ function | 
|  | signature to the appropriate parallel function (or sequential | 
|  | function, if no parallel functions are deemed worthy), based on either | 
|  | compile-time or run-time conditions. | 
|  | </para> | 
|  |  | 
|  | <para> The available signature options are specific for the different | 
|  | algorithms/algorithm classes.</para> | 
|  |  | 
|  | <para> The general view of overloads for the parallel algorithms look like this: | 
|  | </para> | 
|  | <itemizedlist> | 
|  | <listitem><para>ISO C++ signature</para></listitem> | 
|  | <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem> | 
|  | <listitem><para>ISO C++ signature + algorithm-specific tag type | 
|  | (several signatures)</para></listitem> | 
|  | </itemizedlist> | 
|  |  | 
|  | <para> Please note that the implementation may use additional functions | 
|  | (designated with the <code>_switch</code> suffix) to dispatch from the | 
|  | ISO C++ signature to the correct parallel version. Also, some of the | 
|  | algorithms do not have support for run-time conditions, so the last | 
|  | overload is therefore missing. | 
|  | </para> | 
|  |  | 
|  |  | 
|  | </section> | 
|  |  | 
|  | <section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info> | 
|  |  | 
|  |  | 
|  |  | 
|  | <section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info> | 
|  |  | 
|  |  | 
|  | <para> | 
|  | Several aspects of the overall runtime environment can be manipulated | 
|  | by standard OpenMP function calls. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | To specify the number of threads to be used for the algorithms globally, | 
|  | use the function <function>omp_set_num_threads</function>. An example: | 
|  | </para> | 
|  |  | 
|  | <programlisting> | 
|  | #include <stdlib.h> | 
|  | #include <omp.h> | 
|  |  | 
|  | int main() | 
|  | { | 
|  | // Explicitly set number of threads. | 
|  | const int threads_wanted = 20; | 
|  | omp_set_dynamic(false); | 
|  | omp_set_num_threads(threads_wanted); | 
|  |  | 
|  | // Call parallel mode algorithms. | 
|  |  | 
|  | return 0; | 
|  | } | 
|  | </programlisting> | 
|  |  | 
|  | <para> | 
|  | Some algorithms allow the number of threads being set for a particular call, | 
|  | by augmenting the algorithm variant. | 
|  | See the next section for further information. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | Other parts of the runtime environment able to be manipulated include | 
|  | nested parallelism (<function>omp_set_nested</function>), schedule kind | 
|  | (<function>omp_set_schedule</function>), and others. See the OpenMP | 
|  | documentation for more information. | 
|  | </para> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | <section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info> | 
|  |  | 
|  |  | 
|  | <para> | 
|  | To force an algorithm to execute sequentially, even though parallelism | 
|  | is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>, | 
|  | add <classname>__gnu_parallel::sequential_tag()</classname> to the end | 
|  | of the algorithm's argument list. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | Like so: | 
|  | </para> | 
|  |  | 
|  | <programlisting> | 
|  | std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag()); | 
|  | </programlisting> | 
|  |  | 
|  | <para> | 
|  | Some parallel algorithm variants can be excluded from compilation by | 
|  | preprocessor defines. See the doxygen documentation on | 
|  | <code>compiletime_settings.h</code> and <code>features.h</code> for details. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | For some algorithms, the desired variant can be chosen at compile-time by | 
|  | appending a tag object. The available options are specific to the particular | 
|  | algorithm (class). | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | For the "embarrassingly parallel" algorithms, there is only one "tag object | 
|  | type", the enum _Parallelism. | 
|  | It takes one of the following values, | 
|  | <code>__gnu_parallel::parallel_tag</code>, | 
|  | <code>__gnu_parallel::balanced_tag</code>, | 
|  | <code>__gnu_parallel::unbalanced_tag</code>, | 
|  | <code>__gnu_parallel::omp_loop_tag</code>, | 
|  | <code>__gnu_parallel::omp_loop_static_tag</code>. | 
|  | This means that the actual parallelization strategy is chosen at run-time. | 
|  | (Choosing the variants at compile-time will come soon.) | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | For the following algorithms in general, we have | 
|  | <code>__gnu_parallel::parallel_tag</code> and | 
|  | <code>__gnu_parallel::default_parallel_tag</code>, in addition to | 
|  | <code>__gnu_parallel::sequential_tag</code>. | 
|  | <code>__gnu_parallel::default_parallel_tag</code> chooses the default | 
|  | algorithm at compiletime, as does omitting the tag. | 
|  | <code>__gnu_parallel::parallel_tag</code> postpones the decision to runtime | 
|  | (see next section). | 
|  | For all tags, the number of threads desired for this call can optionally be | 
|  | passed to the respective tag's constructor. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | The <code>multiway_merge</code> algorithm comes with the additional choices, | 
|  | <code>__gnu_parallel::exact_tag</code> and | 
|  | <code>__gnu_parallel::sampling_tag</code>. | 
|  | Exact and sampling are the two available splitting strategies. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | For the <code>sort</code> and <code>stable_sort</code> algorithms, there are | 
|  | several additional choices, namely | 
|  | <code>__gnu_parallel::multiway_mergesort_tag</code>, | 
|  | <code>__gnu_parallel::multiway_mergesort_exact_tag</code>, | 
|  | <code>__gnu_parallel::multiway_mergesort_sampling_tag</code>, | 
|  | <code>__gnu_parallel::quicksort_tag</code>, and | 
|  | <code>__gnu_parallel::balanced_quicksort_tag</code>. | 
|  | Multiway mergesort comes with the two splitting strategies for multi-way | 
|  | merging. The quicksort options cannot be used for <code>stable_sort</code>. | 
|  | </para> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | <section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info> | 
|  |  | 
|  |  | 
|  | <para> | 
|  | The default parallelization strategy, the choice of specific algorithm | 
|  | strategy, the minimum threshold limits for individual parallel | 
|  | algorithms, and aspects of the underlying hardware can be specified as | 
|  | desired via manipulation | 
|  | of <classname>__gnu_parallel::_Settings</classname> member data. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | First off, the choice of parallelization strategy: serial, parallel, | 
|  | or heuristically deduced. This corresponds | 
|  | to <code>__gnu_parallel::_Settings::algorithm_strategy</code> and is a | 
|  | value of enum <type>__gnu_parallel::_AlgorithmStrategy</type> | 
|  | type. Choices | 
|  | include: <type>heuristic</type>, <type>force_sequential</type>, | 
|  | and <type>force_parallel</type>. The default is <type>heuristic</type>. | 
|  | </para> | 
|  |  | 
|  |  | 
|  | <para> | 
|  | Next, the sub-choices for algorithm variant, if not fixed at compile-time. | 
|  | Specific algorithms like <function>find</function> or <function>sort</function> | 
|  | can be implemented in multiple ways: when this is the case, | 
|  | a <classname>__gnu_parallel::_Settings</classname> member exists to | 
|  | pick the default strategy. For | 
|  | example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can | 
|  | have any values of | 
|  | enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>, | 
|  | or <type>QS_BALANCED</type>. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | Likewise for setting the minimal threshold for algorithm | 
|  | parallelization.  Parallelism always incurs some overhead. Thus, it is | 
|  | not helpful to parallelize operations on very small sets of | 
|  | data. Because of this, measures are taken to avoid parallelizing below | 
|  | a certain, pre-determined threshold. For each algorithm, a minimum | 
|  | problem size is encoded as a variable in the | 
|  | active <classname>__gnu_parallel::_Settings</classname> object.  This | 
|  | threshold variable follows the following naming scheme: | 
|  | <code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>.  So, | 
|  | for <function>fill</function>, the threshold variable | 
|  | is <code>__gnu_parallel::_Settings::fill_minimal_n</code>, | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | Finally, hardware details like L1/L2 cache size can be hardwired | 
|  | via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | All these configuration variables can be changed by the user, if | 
|  | desired. | 
|  | There exists one global instance of the class <classname>_Settings</classname>, | 
|  | i. e. it is a singleton. It can be read and written by calling | 
|  | <code>__gnu_parallel::_Settings::get</code> and | 
|  | <code>__gnu_parallel::_Settings::set</code>, respectively. | 
|  | Please note that the first call return a const object, so direct manipulation | 
|  | is forbidden. | 
|  | See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/index.html"> | 
|  | <filename class="headerfile"><parallel/settings.h></filename></link> | 
|  | for complete details. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | A small example of tuning the default: | 
|  | </para> | 
|  |  | 
|  | <programlisting> | 
|  | #include <parallel/algorithm> | 
|  | #include <parallel/settings.h> | 
|  |  | 
|  | int main() | 
|  | { | 
|  | __gnu_parallel::_Settings s; | 
|  | s.algorithm_strategy = __gnu_parallel::force_parallel; | 
|  | __gnu_parallel::_Settings::set(s); | 
|  |  | 
|  | // Do work... all algorithms will be parallelized, always. | 
|  |  | 
|  | return 0; | 
|  | } | 
|  | </programlisting> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | <section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info> | 
|  |  | 
|  |  | 
|  | <para> One namespace contain versions of code that are always | 
|  | explicitly sequential: | 
|  | <code>__gnu_serial</code>. | 
|  | </para> | 
|  |  | 
|  | <para> Two namespaces contain the parallel mode: | 
|  | <code>std::__parallel</code> and <code>__gnu_parallel</code>. | 
|  | </para> | 
|  |  | 
|  | <para> Parallel implementations of standard components, including | 
|  | template helpers to select parallelism, are defined in <code>namespace | 
|  | std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in | 
|  | <function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel | 
|  | implementations are injected into <code>namespace | 
|  | __gnu_parallel</code> with using declarations. | 
|  | </para> | 
|  |  | 
|  | <para> Support and general infrastructure is in <code>namespace | 
|  | __gnu_parallel</code>. | 
|  | </para> | 
|  |  | 
|  | <para> More information, and an organized index of types and functions | 
|  | related to the parallel mode on a per-namespace basis, can be found in | 
|  | the generated source documentation. | 
|  | </para> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | </section> | 
|  |  | 
|  | <section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info> | 
|  | <?dbhtml filename="parallel_mode_test.html"?> | 
|  |  | 
|  |  | 
|  | <para> | 
|  | Both the normal conformance and regression tests and the | 
|  | supplemental performance tests work. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | To run the conformance and regression tests with the parallel mode | 
|  | active, | 
|  | </para> | 
|  |  | 
|  | <screen> | 
|  | <userinput>make check-parallel</userinput> | 
|  | </screen> | 
|  |  | 
|  | <para> | 
|  | The log and summary files for conformance testing are in the | 
|  | <filename class="directory">testsuite/parallel</filename> directory. | 
|  | </para> | 
|  |  | 
|  | <para> | 
|  | To run the performance tests with the parallel mode active, | 
|  | </para> | 
|  |  | 
|  | <screen> | 
|  | <userinput>make check-performance-parallel</userinput> | 
|  | </screen> | 
|  |  | 
|  | <para> | 
|  | The result file for performance testing are in the | 
|  | <filename class="directory">testsuite</filename> directory, in the file | 
|  | <filename>libstdc++_performance.sum</filename>. In addition, the | 
|  | policy-based containers have their own visualizations, which have | 
|  | additional software dependencies than the usual bare-boned text | 
|  | file, and can be generated by using the <code>make | 
|  | doc-performance</code> rule in the testsuite's Makefile. | 
|  | </para> | 
|  | </section> | 
|  |  | 
|  | <bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info> | 
|  |  | 
|  |  | 
|  | <biblioentry> | 
|  | <citetitle> | 
|  | Parallelization of Bulk Operations for STL Dictionaries | 
|  | </citetitle> | 
|  |  | 
|  | <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author> | 
|  | <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author> | 
|  |  | 
|  | <copyright> | 
|  | <year>2007</year> | 
|  | <holder/> | 
|  | </copyright> | 
|  |  | 
|  | <publisher> | 
|  | <publishername> | 
|  | Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS) | 
|  | </publishername> | 
|  | </publisher> | 
|  | </biblioentry> | 
|  |  | 
|  | <biblioentry> | 
|  | <citetitle> | 
|  | The Multi-Core Standard Template Library | 
|  | </citetitle> | 
|  |  | 
|  | <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author> | 
|  | <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author> | 
|  | <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author> | 
|  |  | 
|  | <copyright> | 
|  | <year>2007</year> | 
|  | <holder/> | 
|  | </copyright> | 
|  |  | 
|  | <publisher> | 
|  | <publishername> | 
|  | Euro-Par 2007: Parallel Processing. (LNCS 4641) | 
|  | </publishername> | 
|  | </publisher> | 
|  | </biblioentry> | 
|  |  | 
|  | </bibliography> | 
|  |  | 
|  | </chapter> |