| <chapter xmlns="http://docbook.org/ns/docbook" version="5.0" |
| xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode"> |
| <?dbhtml filename="parallel_mode.html"?> |
| |
| <info><title>Parallel Mode</title> |
| <keywordset> |
| <keyword>C++</keyword> |
| <keyword>library</keyword> |
| <keyword>parallel</keyword> |
| </keywordset> |
| </info> |
| |
| |
| |
| <para> The libstdc++ parallel mode is an experimental parallel |
| implementation of many algorithms of the C++ Standard Library. |
| </para> |
| |
<para>
Several of the standard algorithms, for instance
<function>std::sort</function>, are made parallel using OpenMP
annotations. These parallel mode constructs can be invoked either
explicitly in the source code or implicitly, by compiling existing
sources with a specific compiler flag.
</para>
| |
| <note> |
| <para> |
| The parallel mode has not been kept up to date with recent C++ standards |
| and so it only conforms to the C++03 requirements. |
| That means that move-only predicates may not work with parallel mode |
| algorithms, and for C++20 most of the algorithms cannot be used in |
| <code>constexpr</code> functions. |
| </para> |
| <para> |
| For C++17 and above there are new overloads of the standard algorithms |
| which take an execution policy argument. You should consider using those |
| instead of the non-standard parallel mode extensions. |
| </para> |
| </note> |
| |
| <section xml:id="manual.ext.parallel_mode.intro" xreflabel="Intro"><info><title>Intro</title></info> |
| |
| |
<para>The following library components in the header
<filename class="headerfile">numeric</filename> are included in the parallel mode:</para>
| <itemizedlist> |
| <listitem><para><function>std::accumulate</function></para></listitem> |
| <listitem><para><function>std::adjacent_difference</function></para></listitem> |
| <listitem><para><function>std::inner_product</function></para></listitem> |
| <listitem><para><function>std::partial_sum</function></para></listitem> |
| </itemizedlist> |
| |
<para>The following library components in the header
<filename class="headerfile">algorithm</filename> are included in the parallel mode:</para>
| <itemizedlist> |
| <listitem><para><function>std::adjacent_find</function></para></listitem> |
| <listitem><para><function>std::count</function></para></listitem> |
| <listitem><para><function>std::count_if</function></para></listitem> |
| <listitem><para><function>std::equal</function></para></listitem> |
| <listitem><para><function>std::find</function></para></listitem> |
| <listitem><para><function>std::find_if</function></para></listitem> |
| <listitem><para><function>std::find_first_of</function></para></listitem> |
| <listitem><para><function>std::for_each</function></para></listitem> |
| <listitem><para><function>std::generate</function></para></listitem> |
| <listitem><para><function>std::generate_n</function></para></listitem> |
| <listitem><para><function>std::lexicographical_compare</function></para></listitem> |
| <listitem><para><function>std::mismatch</function></para></listitem> |
| <listitem><para><function>std::search</function></para></listitem> |
| <listitem><para><function>std::search_n</function></para></listitem> |
| <listitem><para><function>std::transform</function></para></listitem> |
| <listitem><para><function>std::replace</function></para></listitem> |
| <listitem><para><function>std::replace_if</function></para></listitem> |
| <listitem><para><function>std::max_element</function></para></listitem> |
| <listitem><para><function>std::merge</function></para></listitem> |
| <listitem><para><function>std::min_element</function></para></listitem> |
| <listitem><para><function>std::nth_element</function></para></listitem> |
| <listitem><para><function>std::partial_sort</function></para></listitem> |
| <listitem><para><function>std::partition</function></para></listitem> |
| <listitem><para><function>std::random_shuffle</function></para></listitem> |
| <listitem><para><function>std::set_union</function></para></listitem> |
| <listitem><para><function>std::set_intersection</function></para></listitem> |
| <listitem><para><function>std::set_symmetric_difference</function></para></listitem> |
| <listitem><para><function>std::set_difference</function></para></listitem> |
| <listitem><para><function>std::sort</function></para></listitem> |
| <listitem><para><function>std::stable_sort</function></para></listitem> |
| <listitem><para><function>std::unique_copy</function></para></listitem> |
| </itemizedlist> |
| |
| </section> |
| |
| <section xml:id="manual.ext.parallel_mode.semantics" xreflabel="Semantics"><info><title>Semantics</title></info> |
| <?dbhtml filename="parallel_mode_semantics.html"?> |
| |
| |
<para> The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
Also, for some functions the order of execution is not guaranteed.
Therefore, user-defined functors should not have side effects that
could conflict when they are invoked concurrently.
</para>
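<para>
The following is a minimal sketch of these rules. The names <code>IsEven</code> and
<code>count_even</code> are hypothetical. The predicate reads only its
argument and cannot throw, so the result is the same whether the call runs
sequentially or in parallel (when compiled with <literal>-fopenmp -D_GLIBCXX_PARALLEL</literal>).
The functor in the trailing comment shows the kind of shared side effect to avoid.
</para>

<programlisting>
#include <algorithm>
#include <cstddef>
#include <vector>

// Side-effect-free predicate: it reads only its argument and never
// throws, so it may be called from several threads in any order.
struct IsEven
{
  bool operator()(int x) const { return x % 2 == 0; }
};

// Counts the even elements of v. In parallel mode this call may be
// dispatched to a parallel implementation; otherwise it is sequential.
// The result does not depend on which one runs.
std::ptrdiff_t count_even(const std::vector<int>& v)
{
  return std::count_if(v.begin(), v.end(), IsEven());
}

// NOT safe in parallel mode: several threads may increment *calls at
// the same time (a data race), and the call order is unspecified.
//
//   struct CountingIsEven
//   {
//     int* calls;
//     bool operator()(int x) const { ++*calls; return x % 2 == 0; }
//   };
</programlisting>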
| |
<para> Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.</para>
| |
| </section> |
| |
| <section xml:id="manual.ext.parallel_mode.using" xreflabel="Using"><info><title>Using</title></info> |
| <?dbhtml filename="parallel_mode_using.html"?> |
| |
| |
| <section xml:id="parallel_mode.using.prereq_flags"><info><title>Prerequisite Compiler Flags</title></info> |
| |
| |
| <para> |
| Any use of parallel functionality requires additional compiler |
| and runtime support, in particular support for OpenMP. Adding this support is |
| not difficult: just compile your application with the compiler |
| flag <literal>-fopenmp</literal>. This will link |
| in <code>libgomp</code>, the |
| <link xmlns:xlink="http://www.w3.org/1999/xlink" |
| xlink:href="http://gcc.gnu.org/onlinedocs/libgomp/">GNU Offloading and |
| Multi Processing Runtime Library</link>, |
| whose presence is mandatory. |
| </para> |
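<para>
A translation unit can check whether OpenMP support was enabled, because
GCC predefines the macro <constant>_OPENMP</constant> when
<literal>-fopenmp</literal> is given. The sketch below uses hypothetical function
names. It reports whether the flag was used and how many threads the OpenMP
runtime would use for a new parallel region by default. The <code>omp_get_max_threads</code> call
is made only when <literal>-fopenmp</literal> (and therefore <code>libgomp</code>) is in use.
</para>

<programlisting>
#ifdef _OPENMP
#include <omp.h>
#endif

// True if this translation unit was compiled with -fopenmp.
bool compiled_with_openmp()
{
#ifdef _OPENMP
  return true;
#else
  return false;
#endif
}

// Default number of threads for a new OpenMP parallel region;
// 1 when built without OpenMP support.
int max_threads_available()
{
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}
</programlisting>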
| |
<para>
In addition, hardware that supports atomic operations and a compiler
capable of producing atomic operations are mandatory. On some common
hardware architectures, GCC does not generate atomic operations by
default. Activating them may require explicit compiler flags on some
targets (such as sparc and x86), for example
<literal>-march=i686</literal>,
<literal>-march=native</literal> or <literal>-mcpu=v9</literal>. See
the GCC manual for more information.
</para>
| |
| </section> |
| |
| <section xml:id="parallel_mode.using.parallel_mode"><info><title>Using Parallel Mode</title></info> |
| |
| |
| <para> |
| To use the libstdc++ parallel mode, compile your application with |
| the prerequisite flags as detailed above, and in addition |
| add <constant>-D_GLIBCXX_PARALLEL</constant>. This will convert all |
| use of the standard (sequential) algorithms to the appropriate parallel |
| equivalents. Please note that this doesn't necessarily mean that |
| everything will end up being executed in a parallel manner, but |
| rather that the heuristics and settings coded into the parallel |
| versions will be used to determine if all, some, or no algorithms |
| will be executed using parallel variants. |
| </para> |
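<para>
For example, the following ordinary C++ function contains nothing specific to
parallel mode. The name <code>sorted_descending</code> and the optional
<code>#error</code> guard are illustrative assumptions, not part of libstdc++.
Whether its <function>std::sort</function> call is sequential or a parallel-mode
candidate is decided only by the flags used to build it.
</para>

<programlisting>
// Optional safety net: parallel mode needs OpenMP, so refuse to build
// with -D_GLIBCXX_PARALLEL but without -fopenmp.
#if defined(_GLIBCXX_PARALLEL) && !defined(_OPENMP)
#error "_GLIBCXX_PARALLEL requires -fopenmp"
#endif

#include <algorithm>
#include <functional>
#include <vector>

// Plain standard C++. Built with -fopenmp -D_GLIBCXX_PARALLEL, this
// std::sort call uses the parallel-mode version, whose heuristics decide
// at run time whether to actually run in parallel. Built without those
// flags, it is the ordinary sequential std::sort.
std::vector<int> sorted_descending(std::vector<int> v)
{
  std::sort(v.begin(), v.end(), std::greater<int>());
  return v;
}
</programlisting>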
| |
<para>Note that the <constant>_GLIBCXX_PARALLEL</constant> define may change the
sizes and behavior of standard class templates, as well as the behavior of
function templates such as <function>std::search</function>. Therefore, code
compiled with parallel mode and code compiled without parallel mode can only
be linked together if no instantiation of a container is passed between the
two translation units. Parallel mode functionality has distinct linkage,
so its symbols cannot be confused with normal mode symbols.
</para>
| </section> |
| |
| <section xml:id="parallel_mode.using.specific"><info><title>Using Specific Parallel Components</title></info> |
| |
| |
| <para>When it is not feasible to recompile your entire application, or |
| only specific algorithms need to be parallel-aware, individual |
| parallel algorithms can be made available explicitly. These |
| parallel algorithms are functionally equivalent to the standard |
| drop-in algorithms used in parallel mode, but they are available in |
| a separate namespace as GNU extensions and may be used in programs |
| compiled with either release mode or with parallel mode. |
| </para> |
| |
| |
| <para>An example of using a parallel version |
| of <function>std::sort</function>, but no other parallel algorithms, is: |
| </para> |
| |
| <programlisting> |
| #include <vector> |
| #include <parallel/algorithm> |
| |
| int main() |
| { |
| std::vector<int> v(100); |
| |
| // ... |
| |
| // Explicitly force a call to parallel sort. |
| __gnu_parallel::sort(v.begin(), v.end()); |
| return 0; |
| } |
| </programlisting> |
| |
<para>
Then compile this code with the prerequisite compiler flags
(<literal>-fopenmp</literal> and any necessary architecture-specific
flags for atomic operations).
</para>
| |
| <para> The following table provides the names and headers of all the |
| parallel algorithms that can be used in a similar manner: |
| </para> |
| |
| <table frame="all" xml:id="table.parallel_algos"> |
| <title>Parallel Algorithms</title> |
| |
| <tgroup cols="4" align="left" colsep="1" rowsep="1"> |
| <colspec colname="c1"/> |
| <colspec colname="c2"/> |
| <colspec colname="c3"/> |
| <colspec colname="c4"/> |
| |
| <thead> |
| <row> |
| <entry>Algorithm</entry> |
| <entry>Header</entry> |
| <entry>Parallel algorithm</entry> |
| <entry>Parallel header</entry> |
| </row> |
| </thead> |
| |
| <tbody> |
| <row> |
| <entry><function>std::accumulate</function></entry> |
| <entry><filename class="headerfile">numeric</filename></entry> |
| <entry><function>__gnu_parallel::accumulate</function></entry> |
| <entry><filename class="headerfile">parallel/numeric</filename></entry> |
| </row> |
| <row> |
| <entry><function>std::adjacent_difference</function></entry> |
| <entry><filename class="headerfile">numeric</filename></entry> |
| <entry><function>__gnu_parallel::adjacent_difference</function></entry> |
| <entry><filename class="headerfile">parallel/numeric</filename></entry> |
| </row> |
| <row> |
| <entry><function>std::inner_product</function></entry> |
| <entry><filename class="headerfile">numeric</filename></entry> |
| <entry><function>__gnu_parallel::inner_product</function></entry> |
| <entry><filename class="headerfile">parallel/numeric</filename></entry> |
| </row> |
| <row> |
| <entry><function>std::partial_sum</function></entry> |
| <entry><filename class="headerfile">numeric</filename></entry> |
| <entry><function>__gnu_parallel::partial_sum</function></entry> |
| <entry><filename class="headerfile">parallel/numeric</filename></entry> |
| </row> |
| <row> |
| <entry><function>std::adjacent_find</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::adjacent_find</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::count</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::count</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::count_if</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::count_if</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::equal</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::equal</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::find</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::find</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::find_if</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::find_if</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::find_first_of</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::find_first_of</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::for_each</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::for_each</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::generate</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::generate</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::generate_n</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::generate_n</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::lexicographical_compare</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::lexicographical_compare</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::mismatch</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::mismatch</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::search</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::search</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::search_n</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::search_n</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::transform</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::transform</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::replace</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::replace</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::replace_if</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::replace_if</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::max_element</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::max_element</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::merge</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::merge</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::min_element</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::min_element</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::nth_element</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::nth_element</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::partial_sort</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::partial_sort</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::partition</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::partition</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::random_shuffle</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::random_shuffle</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::set_union</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::set_union</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::set_intersection</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::set_intersection</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::set_symmetric_difference</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::set_symmetric_difference</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::set_difference</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::set_difference</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::sort</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::sort</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::stable_sort</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::stable_sort</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| |
| <row> |
| <entry><function>std::unique_copy</function></entry> |
| <entry><filename class="headerfile">algorithm</filename></entry> |
| <entry><function>__gnu_parallel::unique_copy</function></entry> |
| <entry><filename class="headerfile">parallel/algorithm</filename></entry> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
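<para>
The following sketch mixes an explicit parallel component with portable code.
The function name <code>sum_elements</code> is hypothetical, and the
<constant>_OPENMP</constant> guard is an assumption of this sketch rather than
a library requirement. When the file is compiled with <literal>-fopenmp</literal>, the function calls
<function>__gnu_parallel::accumulate</function> from the table above. Otherwise it
falls back to <function>std::accumulate</function>.
</para>

<programlisting>
#include <numeric>
#include <vector>
#ifdef _OPENMP
#include <parallel/numeric>  // declares __gnu_parallel::accumulate
#endif

// Sums the elements of v. With OpenMP, the explicit parallel component
// is used; it may still choose to run sequentially for small inputs.
long long sum_elements(const std::vector<int>& v)
{
#ifdef _OPENMP
  return __gnu_parallel::accumulate(v.begin(), v.end(), 0LL);
#else
  return std::accumulate(v.begin(), v.end(), 0LL);
#endif
}
</programlisting>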
| |
| </section> |
| |
| </section> |
| |
| <section xml:id="manual.ext.parallel_mode.design" xreflabel="Design"><info><title>Design</title></info> |
| <?dbhtml filename="parallel_mode_design.html"?> |
| |
| <section xml:id="parallel_mode.design.intro" xreflabel="Intro"><info><title>Interface Basics</title></info> |
| |
| |
| <para> |
| All parallel algorithms are intended to have signatures that are |
| equivalent to the ISO C++ algorithms replaced. For instance, the |
| <function>std::adjacent_find</function> function is declared as: |
| </para> |
| <programlisting> |
| namespace std |
| { |
| template<typename _FIter> |
| _FIter |
| adjacent_find(_FIter, _FIter); |
| } |
| </programlisting> |
| |
<para>
Accordingly, the parallel version has an equivalent declaration:
</para>
| |
| <programlisting> |
| namespace std |
| { |
| namespace __parallel |
| { |
| template<typename _FIter> |
| _FIter |
| adjacent_find(_FIter, _FIter); |
| |
| ... |
| } |
| } |
| </programlisting> |
| |
<para>But why the ellipsis?
</para>

<para> The ellipsis in the example above represents additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel function is deemed worthwhile), based on either
compile-time or run-time conditions.
</para>
| |
<para> The available signature options are specific to each
algorithm or class of algorithms.</para>

<para> In general, the overloads of a parallel algorithm look like this:
</para>
| <itemizedlist> |
| <listitem><para>ISO C++ signature</para></listitem> |
| <listitem><para>ISO C++ signature + sequential_tag argument</para></listitem> |
| <listitem><para>ISO C++ signature + algorithm-specific tag type |
| (several signatures)</para></listitem> |
| </itemizedlist> |
| |
<para> Please note that the implementation may use additional functions
(designated with the <code>_switch</code> suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not support run-time conditions, so the last
overload is missing for them.
</para>
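<para>
To illustrate this overload scheme, here is a simplified sketch in a
hypothetical namespace <code>sketch</code>. It is not the libstdc++
implementation. The plain ISO C++ signature dispatches to either the
<code>sequential_tag</code> overload or the algorithm-specific tag overload.
The dispatch is based on a made-up size threshold that stands in for the library's
compile-time and run-time conditions.
</para>

<programlisting>
#include <algorithm>
#include <iterator>

// Hypothetical sketch; the namespace, tags and threshold are made up.
namespace sketch
{
  // Selects the sequential version unconditionally.
  struct sequential_tag { };

  // Allows a parallel version; optionally carries a desired thread count.
  struct parallel_tag
  {
    explicit parallel_tag(int n = 0) : num_threads(n) { }
    int num_threads;  // 0 means "use the default"
  };

  // ISO C++ signature + sequential_tag argument.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last, sequential_tag)
    { return std::adjacent_find(first, last); }

  // ISO C++ signature + algorithm-specific tag type. A real parallel
  // version would split [first, last) among threads; to stay short,
  // this sketch simply delegates to the sequential algorithm.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last, parallel_tag)
    { return std::adjacent_find(first, last); }

  // Plain ISO C++ signature: picks one of the overloads above. The
  // 1000-element threshold is an arbitrary stand-in.
  template<typename FIter>
    FIter
    adjacent_find(FIter first, FIter last)
    {
      if (std::distance(first, last) < 1000)
        return sketch::adjacent_find(first, last, sequential_tag());
      return sketch::adjacent_find(first, last, parallel_tag());
    }
}
</programlisting>

<para>
The calls inside the dispatcher are qualified with <code>sketch::</code> so that
argument-dependent lookup cannot pick up the three-argument
<function>std::adjacent_find</function> that takes a predicate.
</para>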
| |
| |
| </section> |
| |
| <section xml:id="parallel_mode.design.tuning" xreflabel="Tuning"><info><title>Configuration and Tuning</title></info> |
| |
| |
| |
| <section xml:id="parallel_mode.design.tuning.omp" xreflabel="OpenMP Environment"><info><title>Setting up the OpenMP Environment</title></info> |
| |
| |
| <para> |
| Several aspects of the overall runtime environment can be manipulated |
| by standard OpenMP function calls. |
| </para> |
| |
| <para> |
| To specify the number of threads to be used for the algorithms globally, |
| use the function <function>omp_set_num_threads</function>. An example: |
| </para> |
| |
<programlisting>
#include <omp.h>

int main()
{
  // Disable dynamic adjustment of the team size, so the runtime is not
  // free to choose fewer threads than requested.
  omp_set_dynamic(0);

  // Explicitly set the number of threads for subsequent parallel regions.
  const int threads_wanted = 20;
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}
</programlisting>
| |
<para>
Some algorithms also allow the number of threads to be set for an
individual call, by passing it to the constructor of a tag argument.
See the next section for further information.
</para>
| |
<para>
Other parts of the runtime environment that can be manipulated include
nested parallelism (<function>omp_set_nested</function>), the schedule kind
(<function>omp_set_schedule</function>), and others. See the OpenMP
documentation for more information.
</para>
| |
| </section> |
| |
| <section xml:id="parallel_mode.design.tuning.compile" xreflabel="Compile Switches"><info><title>Compile Time Switches</title></info> |
| |
| |
| <para> |
| To force an algorithm to execute sequentially, even though parallelism |
| is switched on in general via the macro <constant>_GLIBCXX_PARALLEL</constant>, |
| add <classname>__gnu_parallel::sequential_tag()</classname> to the end |
| of the algorithm's argument list. |
| </para> |
| |
| <para> |
| Like so: |
| </para> |
| |
| <programlisting> |
| std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag()); |
| </programlisting> |
| |
| <para> |
| Some parallel algorithm variants can be excluded from compilation by |
| preprocessor defines. See the doxygen documentation on |
| <code>compiletime_settings.h</code> and <code>features.h</code> for details. |
| </para> |
| |
<para>
For some algorithms, the desired variant can be chosen at compile time by
appending a tag object. The available options are specific to the particular
algorithm or class of algorithms.
</para>
| |
<para>
For the "embarrassingly parallel" algorithms, there is only one tag object
type, the enum <type>_Parallelism</type>.
It takes one of the following values:
<code>__gnu_parallel::parallel_tag</code>,
<code>__gnu_parallel::balanced_tag</code>,
<code>__gnu_parallel::unbalanced_tag</code>,
<code>__gnu_parallel::omp_loop_tag</code>, or
<code>__gnu_parallel::omp_loop_static_tag</code>.
This means that the actual parallelization strategy is chosen at run time.
(Choosing the variants at compile time will come soon.)
</para>
| |
<para>
In general, the algorithms below support
<code>__gnu_parallel::parallel_tag</code> and
<code>__gnu_parallel::default_parallel_tag</code>, in addition to
<code>__gnu_parallel::sequential_tag</code>.
<code>__gnu_parallel::default_parallel_tag</code> chooses the default
algorithm at compile time, as does omitting the tag.
<code>__gnu_parallel::parallel_tag</code> postpones the decision to run time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.
</para>
| |
<para>
The <code>multiway_merge</code> algorithm offers two additional choices,
<code>__gnu_parallel::exact_tag</code> and
<code>__gnu_parallel::sampling_tag</code>,
which select the exact and the sampling splitting strategy, respectively.
</para>
| |
| <para> |
| For the <code>sort</code> and <code>stable_sort</code> algorithms, there are |
| several additional choices, namely |
| <code>__gnu_parallel::multiway_mergesort_tag</code>, |
| <code>__gnu_parallel::multiway_mergesort_exact_tag</code>, |
| <code>__gnu_parallel::multiway_mergesort_sampling_tag</code>, |
| <code>__gnu_parallel::quicksort_tag</code>, and |
| <code>__gnu_parallel::balanced_quicksort_tag</code>. |
| Multiway mergesort comes with the two splitting strategies for multi-way |
| merging. The quicksort options cannot be used for <code>stable_sort</code>. |
| </para> |
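<para>
The following sketch chooses a variant explicitly. It assumes the file may or may not be
compiled with <literal>-fopenmp</literal>, and the function names are hypothetical.
With OpenMP, one function requests multiway mergesort and passes a thread
count to the tag's constructor. The other forces the sequential version with
<code>__gnu_parallel::sequential_tag</code>. Without OpenMP, both fall back to
<function>std::sort</function>.
</para>

<programlisting>
#include <algorithm>
#include <vector>
#ifdef _OPENMP
#include <parallel/algorithm>  // __gnu_parallel::sort and the tag types
#endif

// Requests parallel multiway mergesort with the given number of threads,
// passed through the tag's constructor.
void sort_with_mwms(std::vector<int>& v, int num_threads)
{
#ifdef _OPENMP
  __gnu_parallel::sort(v.begin(), v.end(),
                       __gnu_parallel::multiway_mergesort_tag(num_threads));
#else
  (void)num_threads;  // unused without OpenMP
  std::sort(v.begin(), v.end());
#endif
}

// Forces the sequential implementation.
void sort_sequentially(std::vector<int>& v)
{
#ifdef _OPENMP
  __gnu_parallel::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
#else
  std::sort(v.begin(), v.end());
#endif
}
</programlisting>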
| |
| </section> |
| |
| <section xml:id="parallel_mode.design.tuning.settings" xreflabel="_Settings"><info><title>Run Time Settings and Defaults</title></info> |
| |
| |
| <para> |
| The default parallelization strategy, the choice of specific algorithm |
| strategy, the minimum threshold limits for individual parallel |
| algorithms, and aspects of the underlying hardware can be specified as |
| desired via manipulation |
| of <classname>__gnu_parallel::_Settings</classname> member data. |
| </para> |
| |
<para>
First, the choice of parallelization strategy: serial, parallel,
or heuristically deduced. This corresponds
to <code>__gnu_parallel::_Settings::algorithm_strategy</code>, which holds a
value of the enum type <type>__gnu_parallel::_AlgorithmStrategy</type>.
Possible values include <type>heuristic</type>, <type>force_sequential</type>,
and <type>force_parallel</type>. The default is <type>heuristic</type>.
</para>
| |
| |
<para>
Next, the choice of algorithm variant, if it is not fixed at compile time.
Specific algorithms like <function>find</function> or <function>sort</function>
can be implemented in multiple ways. In such cases,
a <classname>__gnu_parallel::_Settings</classname> member exists to
pick the default strategy. For
example, <code>__gnu_parallel::_Settings::sort_algorithm</code> can
take any value of the
enum <type>__gnu_parallel::_SortAlgorithm</type>: <type>MWMS</type>, <type>QS</type>,
or <type>QS_BALANCED</type>.
</para>
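<para>
For instance, the default <function>sort</function> variant could be switched to
parallel quicksort along the following lines. This is a sketch: the function name
<code>prefer_quicksort</code> is hypothetical, and the <constant>_OPENMP</constant>
guard only lets the file also build without parallel support. Because
<code>_Settings::get</code> returns a const reference, the sketch copies the
current settings, modifies the copy, and installs it with <code>_Settings::set</code>.
</para>

<programlisting>
#ifdef _OPENMP
#include <parallel/algorithm>
#include <parallel/settings.h>  // __gnu_parallel::_Settings
#endif

// Makes parallel quicksort (QS) the default sort variant for later
// parallel-mode sort calls that do not fix a variant at compile time.
// Returns false, and changes nothing, when built without OpenMP.
bool prefer_quicksort()
{
#ifdef _OPENMP
  // get() returns a const reference: copy, modify, then install.
  __gnu_parallel::_Settings s = __gnu_parallel::_Settings::get();
  s.sort_algorithm = __gnu_parallel::QS;
  __gnu_parallel::_Settings::set(s);
  return true;
#else
  return false;
#endif
}
</programlisting>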
| |
<para>
The minimal threshold for parallelizing each algorithm can be set in the same
way. Parallelism always incurs some overhead, so it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active <classname>__gnu_parallel::_Settings</classname> object. These
threshold variables follow the naming scheme
<code>__gnu_parallel::_Settings::[algorithm]_minimal_n</code>. So,
for <function>fill</function>, the threshold variable
is <code>__gnu_parallel::_Settings::fill_minimal_n</code>.
</para>
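<para>
Changing a threshold affects every later call in the program. When a threshold
should apply only to one part of the code, a scoped guard can save and restore
the settings. The class below is a hypothetical helper, not part of libstdc++.
It uses <code>sort_minimal_n</code>, the threshold for <function>sort</function>
under the naming scheme above, and is compiled only when
<constant>_OPENMP</constant> is defined.
</para>

<programlisting>
#ifdef _OPENMP
#include <parallel/algorithm>
#include <parallel/settings.h>

// Hypothetical helper: while an instance is alive, the minimal problem
// size for parallel sort is raised to minimal_n; the destructor
// reinstalls the settings saved at construction.
class ScopedSortThreshold
{
public:
  explicit ScopedSortThreshold(unsigned long long minimal_n)
  : saved_(__gnu_parallel::_Settings::get())
  {
    __gnu_parallel::_Settings s = saved_;
    s.sort_minimal_n = minimal_n;  // [algorithm]_minimal_n for sort
    __gnu_parallel::_Settings::set(s);
  }

  ~ScopedSortThreshold()
  { __gnu_parallel::_Settings::set(saved_); }

private:
  // Snapshot of the settings taken by the constructor.
  __gnu_parallel::_Settings saved_;

  // Not copyable (C++03 style): a copy would restore the snapshot twice.
  ScopedSortThreshold(const ScopedSortThreshold&);
  ScopedSortThreshold& operator=(const ScopedSortThreshold&);
};
#endif
</programlisting>

<para>
Because the destructor reinstalls the whole snapshot, any other settings
changed while a guard is alive are rolled back too. That trade-off keeps the
helper simple.
</para>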
| |
| <para> |
| Finally, hardware details like L1/L2 cache size can be hardwired |
| via <code>__gnu_parallel::_Settings::L1_cache_size</code> and friends. |
| </para> |
| |
| |
<para>
All these configuration variables can be changed by the user, if
desired.
There exists one global instance of the class <classname>_Settings</classname>,
i.e. it is a singleton. It can be read and written by calling
<code>__gnu_parallel::_Settings::get</code> and
<code>__gnu_parallel::_Settings::set</code>, respectively.
Please note that <code>get</code> returns a const reference, so the settings
cannot be modified through it directly. Instead, copy them, modify the copy, and pass
the copy to <code>set</code>.
See <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/index.html">
<filename class="headerfile"><parallel/settings.h></filename></link>
for complete details.
</para>
| |
| <para> |
| A small example of tuning the default: |
| </para> |
| |
<programlisting>
#include <parallel/algorithm>
#include <parallel/settings.h>

int main()
{
  // Copy the current settings (get() returns a const reference), so that
  // any values tuned earlier are kept.
  __gnu_parallel::_Settings s = __gnu_parallel::_Settings::get();
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
</programlisting>
| |
| </section> |
| |
| </section> |
| |
| <section xml:id="parallel_mode.design.impl" xreflabel="Impl"><info><title>Implementation Namespaces</title></info> |
| |
| |
<para> One namespace contains versions of code that are always
explicitly sequential:
<code>__gnu_serial</code>.
</para>
| |
| <para> Two namespaces contain the parallel mode: |
| <code>std::__parallel</code> and <code>__gnu_parallel</code>. |
| </para> |
| |
| <para> Parallel implementations of standard components, including |
| template helpers to select parallelism, are defined in <code>namespace |
| std::__parallel</code>. For instance, <function>std::transform</function> from <filename class="headerfile">algorithm</filename> has a parallel counterpart in |
| <function>std::__parallel::transform</function> from <filename class="headerfile">parallel/algorithm</filename>. In addition, these parallel |
| implementations are injected into <code>namespace |
| __gnu_parallel</code> with using declarations. |
| </para> |
| |
| <para> Support and general infrastructure is in <code>namespace |
| __gnu_parallel</code>. |
| </para> |
| |
| <para> More information, and an organized index of types and functions |
| related to the parallel mode on a per-namespace basis, can be found in |
| the generated source documentation. |
| </para> |
| |
| </section> |
| |
| </section> |
| |
| <section xml:id="manual.ext.parallel_mode.test" xreflabel="Testing"><info><title>Testing</title></info> |
| <?dbhtml filename="parallel_mode_test.html"?> |
| |
| |
| <para> |
| Both the normal conformance and regression tests and the |
| supplemental performance tests work. |
| </para> |
| |
| <para> |
| To run the conformance and regression tests with the parallel mode |
| active, |
| </para> |
| |
| <screen> |
| <userinput>make check-parallel</userinput> |
| </screen> |
| |
| <para> |
| The log and summary files for conformance testing are in the |
| <filename class="directory">testsuite/parallel</filename> directory. |
| </para> |
| |
| <para> |
| To run the performance tests with the parallel mode active, |
| </para> |
| |
| <screen> |
| <userinput>make check-performance-parallel</userinput> |
| </screen> |
| |
<para>
The results of performance testing are in the
<filename class="directory">testsuite</filename> directory, in the file
<filename>libstdc++_performance.sum</filename>. In addition, the
policy-based containers have their own visualizations. These have
more software dependencies than the usual bare-bones text
file, and can be generated by using the <code>make
doc-performance</code> rule in the testsuite's Makefile.
</para>
| </section> |
| |
| <bibliography xml:id="parallel_mode.biblio"><info><title>Bibliography</title></info> |
| |
| |
| <biblioentry> |
| <citetitle> |
| Parallelization of Bulk Operations for STL Dictionaries |
| </citetitle> |
| |
| <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author> |
| <author><personname><firstname>Leonor</firstname><surname>Frias</surname></personname></author> |
| |
| <copyright> |
| <year>2007</year> |
| <holder/> |
| </copyright> |
| |
| <publisher> |
| <publishername> |
| Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS) |
| </publishername> |
| </publisher> |
| </biblioentry> |
| |
| <biblioentry> |
| <citetitle> |
| The Multi-Core Standard Template Library |
| </citetitle> |
| |
| <author><personname><firstname>Johannes</firstname><surname>Singler</surname></personname></author> |
| <author><personname><firstname>Peter</firstname><surname>Sanders</surname></personname></author> |
| <author><personname><firstname>Felix</firstname><surname>Putze</surname></personname></author> |
| |
| <copyright> |
| <year>2007</year> |
| <holder/> |
| </copyright> |
| |
| <publisher> |
| <publishername> |
| Euro-Par 2007: Parallel Processing. (LNCS 4641) |
| </publishername> |
| </publisher> |
| </biblioentry> |
| |
| </bibliography> |
| |
| </chapter> |