| \input texinfo @c -*-texinfo-*- |
| |
| @c %**start of header |
| @setfilename libitm.info |
| @settitle GNU libitm |
| @c %**end of header |
| |
| |
| @copying |
| Copyright @copyright{} 2011-2018 Free Software Foundation, Inc. |
| |
| Permission is granted to copy, distribute and/or modify this document |
| under the terms of the GNU Free Documentation License, Version 1.2 or |
| any later version published by the Free Software Foundation; with no |
| Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. |
| A copy of the license is included in the section entitled ``GNU |
| Free Documentation License''. |
| @end copying |
| |
| @ifinfo |
| @dircategory GNU Libraries |
| @direntry |
| * libitm: (libitm). GNU Transactional Memory Library |
| @end direntry |
| |
| This manual documents the GNU Transactional Memory Library. |
| |
| @insertcopying |
| @end ifinfo |
| |
| |
| @setchapternewpage odd |
| |
| @titlepage |
| @title The GNU Transactional Memory Library |
| @page |
| @vskip 0pt plus 1filll |
| @comment For the @value{version-GCC} Version* |
| @sp 1 |
| @insertcopying |
| @end titlepage |
| |
| @summarycontents |
| @contents |
| @page |
| |
| |
| @node Top |
| @top Introduction |
| @cindex Introduction |
| |
| This manual documents the usage and internals of libitm, the GNU Transactional |
| Memory Library. It provides transaction support for accesses to a process' |
| memory, enabling easy-to-use synchronization of accesses to shared memory by |
| several threads. |
| |
| |
| @comment |
| @comment When you add a new menu item, please keep the right hand |
| @comment aligned to the same column. Do not use tabs. This provides |
| @comment better formatting. |
| @comment |
| @menu |
| * Enabling libitm:: How to enable libitm for your applications. |
| * C/C++ Language Constructs for TM:: |
| Notes on the language-level interface supported |
| by gcc. |
| * The libitm ABI:: Notes on the external ABI provided by libitm. |
| * Internals:: Notes on libitm's internal synchronization. |
| * GNU Free Documentation License:: |
| How you can copy and share this manual. |
| * Library Index:: Index of this documentation. |
| @end menu |
| |
| |
| @c --------------------------------------------------------------------- |
| @c Enabling libitm |
| @c --------------------------------------------------------------------- |
| |
| @node Enabling libitm |
| @chapter Enabling libitm |
| |
| To activate support for TM in C/C++, the compile-time flag @option{-fgnu-tm} |
| must be specified. This enables TM language-level constructs such as |
| transaction statements (e.g., @code{__transaction_atomic}, @pxref{C/C++ |
| Language Constructs for TM} for details). |
| |
| @c --------------------------------------------------------------------- |
| @c C/C++ Language Constructs for TM |
| @c --------------------------------------------------------------------- |
| |
| @node C/C++ Language Constructs for TM |
| @chapter C/C++ Language Constructs for TM |
| |
| Transactions are supported in C++ and C in the form of transaction statements, |
| transaction expressions, and function transactions. In the following example, |
| both @code{a} and @code{b} will be read and the difference will be written to |
| @code{c}, all atomically and isolated from other transactions: |
| |
| @example |
| __transaction_atomic @{ c = a - b; @} |
| @end example |
| |
| Therefore, another thread can use the following code to concurrently update |
| @code{b} without ever causing @code{c} to hold a negative value (and without |
| having to use other synchronization constructs such as locks or C++11 |
| atomics): |
| |
| @example |
| __transaction_atomic @{ if (a > b) b++; @} |
| @end example |
| |
| GCC follows the @uref{https://sites.google.com/site/tmforcplusplus/, Draft |
| Specification of Transactional Language Constructs for C++ (v1.1)} in its |
| implementation of transactions. |
| |
| The precise semantics of transactions are defined in terms of the C++11/C11 |
| memory model (see the specification). Roughly, transactions provide |
| synchronization guarantees that are similar to what would be guaranteed when |
| using a single global lock as a guard for all transactions. Note that like |
| other synchronization constructs in C/C++, transactions rely on a |
| data-race-free program (e.g., a nontransactional write that is concurrent |
| with a transactional read to the same memory location is a data race). |
| |
| @c --------------------------------------------------------------------- |
| @c The libitm ABI |
| @c --------------------------------------------------------------------- |
| |
| @node The libitm ABI |
| @chapter The libitm ABI |
| |
| The ABI provided by libitm is basically equal to the Linux variant of Intel's |
| current TM ABI specification document (Revision 1.1, May 6 2009) but with the |
| differences listed in this chapter. It would be good if these changes would |
| eventually be merged into a future version of this specification. To ease |
| look-up, the following subsections mirror the structure of this specification. |
| |
| @section [No changes] Objectives |
| @section [No changes] Non-objectives |
| |
| @section Library design principles |
| @subsection [No changes] Calling conventions |
| @subsection [No changes] TM library algorithms |
| @subsection [No changes] Optimized load and store routines |
| @subsection [No changes] Aligned load and store routines |
| |
| @subsection Data logging functions |
| |
| The memory locations accessed with transactional loads and stores and the |
| memory locations whose values are logged must not overlap. This required |
| separation only extends to the scope of the execution of one transaction |
| including all the executions of all nested transactions. |
| |
| The compiler must be consistent (within the scope of a single transaction) |
| about which memory locations are shared and which are not shared with other |
| threads (i.e., data must be accessed either transactionally or |
| nontransactionally). Otherwise, non-write-through TM algorithms would not work. |
| |
| For memory locations on the stack, this requirement extends to only the |
| lifetime of the stack frame that the memory location belongs to (or the |
| lifetime of the transaction, whichever is shorter). Thus, memory that is |
| reused for several stack frames could be target of both data logging and |
| transactional accesses; however, this is harmless because these stack frames' |
| lifetimes will end before the transaction finishes. |
| |
| @subsection [No changes] Scatter/gather calls |
| @subsection [No changes] Serial and irrevocable mode |
| @subsection [No changes] Transaction descriptor |
| @subsection Store allocation |
| |
| There is no @code{getTransaction} function. |
| |
| @subsection [No changes] Naming conventions |
| |
| @subsection Function pointer encryption |
| |
| Currently, this is not implemented. |
| |
| |
| @section Types and macros list |
| |
| @code{_ITM_codeProperties} has changed, @pxref{txn-code-properties,,Starting a |
| transaction}. |
| @code{_ITM_srcLocation} is not used. |
| |
| |
| @section Function list |
| |
| @subsection Initialization and finalization functions |
| These functions are not part of the ABI. |
| |
| @subsection [No changes] Version checking |
| @subsection [No changes] Error reporting |
| @subsection [No changes] inTransaction call |
| |
| @subsection State manipulation functions |
| There is no @code{getTransaction} function. Transaction identifiers for |
| nested transactions will be ordered but not necessarily sequential (i.e., for |
| a nested transaction's identifier @var{IN} and its enclosing transaction's |
| identifier @var{IE}, it is guaranteed that @math{IN >= IE}). |
| |
| @subsection [No changes] Source locations |
| |
| @subsection Starting a transaction |
| |
| @subsubsection Transaction code properties |
| |
| @anchor{txn-code-properties} |
| The bit @code{hasNoXMMUpdate} is instead called @code{hasNoVectorUpdate}. |
| Iff it is set, vector register save/restore is not necessary for any target |
| machine. |
| |
| The @code{hasNoFloatUpdate} bit (@code{0x0010}) is new. Iff it is set, floating |
| point register save/restore is not necessary for any target machine. |
| |
| @code{undoLogCode} is not supported and a fatal runtime error will be raised |
| if this bit is set. It is not properly defined in the ABI why barriers |
| other than undo logging are not present; Are they not necessary (e.g., a |
| transaction operating purely on thread-local data) or have they been omitted by |
| the compiler because it thinks that some kind of global synchronization |
| (e.g., serial mode) might perform better? The specification suggests that the |
| latter might be the case, but the former seems to be more useful. |
| |
| The @code{readOnly} bit (@code{0x4000}) is new. @strong{TODO} Lexical or dynamic |
| scope? |
| |
| @code{hasNoRetry} is not supported. If this bit is not set, but |
| @code{hasNoAbort} is set, the library can assume that transaction |
| rollback will not be requested. |
| |
| It would be useful if the absence of externally-triggered rollbacks would be |
| reported for the dynamic scope as well, not just for the lexical scope |
| (@code{hasNoAbort}). Without this, a library cannot exploit this together |
| with flat nesting. |
| |
| @code{exceptionBlock} is not supported because exception blocks are not used. |
| |
| @subsubsection [No changes] Windows exception state |
| @subsubsection [No changes] Other machine state |
| |
| @subsubsection [No changes] Results from beginTransaction |
| |
| @subsection Aborting a transaction |
| |
| @code{_ITM_rollbackTransaction} is not supported. @code{_ITM_abortTransaction} |
| is supported but the abort reasons @code{exceptionBlockAbort}, |
| @code{TMConflict}, and @code{userRetry} are not supported. There are no |
| exception blocks in general, so the related cases also do not have to be |
| considered. To encode @code{__transaction_cancel [[outer]]}, compilers must |
| set the new @code{outerAbort} bit (@code{0x10}) additionally to the |
| @code{userAbort} bit in the abort reason. |
| |
| @subsection Committing a transaction |
| |
| The exception handling (EH) scheme is different. The Intel ABI requires the |
| @code{_ITM_tryCommitTransaction} function that will return even when the |
| commit failed and will have to be matched with calls to either |
| @code{_ITM_abortTransaction} or @code{_ITM_commitTransaction}. In contrast, |
| gcc relies on transactional wrappers for the functions of the Exception |
| Handling ABI and on one additional commit function (shown below). This allows |
| the TM to keep track of EH internally and thus it does not have to embed the |
| cleanup of EH state into the existing EH code in the program. |
| @code{_ITM_tryCommitTransaction} is not supported. |
| @code{_ITM_commitTransactionToId} is also not supported because the |
| propagation of thrown exceptions will not bypass commits of nested |
| transactions. |
| |
| @example |
| void _ITM_commitTransactionEH(void *exc_ptr) ITM_REGPARM; |
| void *_ITM_cxa_allocate_exception (size_t); |
| void _ITM_cxa_free_exception (void *exc_ptr); |
| void _ITM_cxa_throw (void *obj, void *tinfo, void *dest); |
| void *_ITM_cxa_begin_catch (void *exc_ptr); |
| void _ITM_cxa_end_catch (void); |
| @end example |
| |
| The EH scheme changed in version 6 of GCC. Previously, the compiler |
| added a call to @code{_ITM_commitTransactionEH} to commit a transaction if |
| an exception could be in flight at this position in the code; @code{exc_ptr} is |
| the address of the current exception and must be non-zero. Now, the |
| compiler must catch all exceptions that are about to be thrown out of a |
| transaction and call @code{_ITM_commitTransactionEH} from the catch clause, |
| with @code{exc_ptr} being zero. |
| |
| Note that the old EH scheme never worked completely in GCC's implementation; |
| libitm currently does not try to be compatible with the old scheme. |
| |
| The @code{_ITM_cxa...} functions are transactional wrappers for the respective |
| @code{__cxa...} functions and must be called instead of these in transactional |
| code. @code{_ITM_cxa_free_exception} is new in GCC 6. |
| |
| To support this EH scheme, libstdc++ needs to provide one additional function |
| (@code{_cxa_tm_cleanup}), which is used by the TM to clean up the exception |
| handling state while rolling back a transaction: |
| |
| @example |
| void __cxa_tm_cleanup (void *unthrown_obj, void *cleanup_exc, |
| unsigned int caught_count); |
| @end example |
| |
| Since GCC 6, @code{unthrown_obj} is not used anymore and always null; |
| prior to that, @code{unthrown_obj} is non-null if the program called |
| @code{__cxa_allocate_exception} for this exception but did not yet called |
| @code{__cxa_throw} for it. @code{cleanup_exc} is non-null if the program is |
| currently processing a cleanup along an exception path but has not caught this |
| exception yet. @code{caught_count} is the nesting depth of |
| @code{__cxa_begin_catch} within the transaction (which can be counted by the TM |
| using @code{_ITM_cxa_begin_catch} and @code{_ITM_cxa_end_catch}); |
| @code{__cxa_tm_cleanup} then performs rollback by essentially performing |
| @code{__cxa_end_catch} that many times. |
| |
| |
| |
| @subsection Exception handling support |
| |
| Currently, there is no support for functionality like |
| @code{__transaction_cancel throw} as described in the C++ TM specification. |
| Supporting this should be possible with the EH scheme explained previously |
| because via the transactional wrappers for the EH ABI, the TM is able to |
| observe and intercept EH. |
| |
| |
| @subsection [No changes] Transition to serial--irrevocable mode |
| @subsection [No changes] Data transfer functions |
| @subsection [No changes] Transactional memory copies |
| |
| @subsection Transactional versions of memmove |
| |
| If either the source or destination memory region is to be accessed |
| nontransactionally, then source and destination regions must not be |
| overlapping. The respective @code{_ITM_memmove} functions are still |
| available but a fatal runtime error will be raised if such regions do overlap. |
| To support this functionality, the ABI would have to specify how the |
| intersection of the regions has to be accessed (i.e., transactionally or |
| nontransactionally). |
| |
| @subsection [No changes] Transactional versions of memset |
| @subsection [No changes] Logging functions |
| |
| @subsection User-registered commit and undo actions |
| |
| Commit actions will get executed in the same order in which the respective |
| calls to @code{_ITM_addUserCommitAction} happened. Only |
| @code{_ITM_noTransactionId} is allowed as value for the |
| @code{resumingTransactionId} argument. Commit actions get executed after |
| privatization safety has been ensured. |
| |
| Undo actions will get executed in reverse order compared to the order in which |
| the respective calls to @code{_ITM_addUserUndoAction} happened. The ordering of |
| undo actions w.r.t. the roll-back of other actions (e.g., data transfers or |
| memory allocations) is undefined. |
| |
| @code{_ITM_getThreadnum} is not supported currently because its only purpose |
| is to provide a thread ID that matches some assumed performance tuning output, |
| but this output is not part of the ABI nor further defined by it. |
| |
| @code{_ITM_dropReferences} is not supported currently because its semantics and |
| the intention behind it is not entirely clear. The |
| specification suggests that this function is necessary because of certain |
| orderings of data transfer undos and the releasing of memory regions (i.e., |
| privatization). However, this ordering is never defined, nor is the ordering of |
| dropping references w.r.t. other events. |
| |
| @subsection [New] Transactional indirect calls |
| |
| Indirect calls (i.e., calls through a function pointer) within transactions |
| should execute the transactional clone of the original function (i.e., a clone |
| of the original that has been fully instrumented to use the TM runtime), if |
| such a clone is available. The runtime provides two functions to |
| register/deregister clone tables: |
| |
| @example |
| struct clone_entry |
| @{ |
| void *orig, *clone; |
| @}; |
| |
| void _ITM_registerTMCloneTable (clone_entry *table, size_t entries); |
| void _ITM_deregisterTMCloneTable (clone_entry *table); |
| @end example |
| |
| Registered tables must be writable by the TM runtime, and must be live |
| throughout the life-time of the TM runtime. |
| |
| @strong{TODO} The intention was always to drop the registration functions |
| entirely, and create a new ELF Phdr describing the linker-sorted table. Much |
| like what currently happens for @code{PT_GNU_EH_FRAME}. |
| This work kept getting bogged down in how to represent the @var{N} different |
| code generation variants. We clearly needed at least two---SW and HW |
| transactional clones---but there was always a suggestion of more variants for |
| different TM assumptions/invariants. |
| |
| The compiler can then use two TM runtime functions to perform indirect calls in |
| transactions: |
| @example |
| void *_ITM_getTMCloneOrIrrevocable (void *function) ITM_REGPARM; |
| void *_ITM_getTMCloneSafe (void *function) ITM_REGPARM; |
| @end example |
| |
| If there is a registered clone for supplied function, both will return a |
| pointer to the clone. If not, the first runtime function will attempt to switch |
| to serial--irrevocable mode and return the original pointer, whereas the second |
| will raise a fatal runtime error. |
| |
| @subsection [New] Transactional dynamic memory management |
| |
| @example |
| void *_ITM_malloc (size_t) |
| __attribute__((__malloc__)) ITM_PURE; |
| void *_ITM_calloc (size_t, size_t) |
| __attribute__((__malloc__)) ITM_PURE; |
| void _ITM_free (void *) ITM_PURE; |
| @end example |
| |
| These functions are essentially transactional wrappers for @code{malloc}, |
| @code{calloc}, and @code{free}. Within transactions, the compiler should |
| replace calls to the original functions with calls to the wrapper functions. |
| |
| libitm also provides transactional clones of C++ memory management functions |
| such as global operator new and delete. They are part of libitm for historic |
| reasons but do not need to be part of this ABI. |
| |
| |
| @section [No changes] Future Enhancements to the ABI |
| |
| @section Sample code |
| |
| The code examples might not be correct w.r.t. the current version of the ABI, |
| especially everything related to exception handling. |
| |
| |
| @section [New] Memory model |
| |
| The ABI should define a memory model and the ordering that is guaranteed for |
| data transfers and commit/undo actions, or at least refer to another memory |
| model that needs to be preserved. Without that, the compiler cannot ensure the |
| memory model specified on the level of the programming language (e.g., by the |
| C++ TM specification). |
| |
| For example, if a transactional load is ordered before another load/store, then |
| the TM runtime must also ensure this ordering when accessing shared state. If |
| not, this might break the kind of publication safety used in the C++ TM |
| specification. Likewise, the TM runtime must ensure privatization safety. |
| |
| |
| |
| @c --------------------------------------------------------------------- |
| @c Internals |
| @c --------------------------------------------------------------------- |
| |
| @node Internals |
| @chapter Internals |
| |
| @section TM methods and method groups |
| |
| libitm supports several ways of synchronizing transactions with each other. |
| These TM methods (or TM algorithms) are implemented in the form of |
| subclasses of @code{abi_dispatch}, which provide methods for |
| transactional loads and stores as well as callbacks for rollback and commit. |
| All methods that are compatible with each other (i.e., that let concurrently |
| running transactions still synchronize correctly even if different methods |
| are used) belong to the same TM method group. Pointers to TM methods can be |
| obtained using the factory methods prefixed with @code{dispatch_} in |
| @file{libitm_i.h}. There are two special methods, @code{dispatch_serial} and |
| @code{dispatch_serialirr}, that are compatible with all methods because they |
| run transactions completely in serial mode. |
| |
| @subsection TM method life cycle |
| |
| The state of TM methods does not change after construction, but they do alter |
| the state of transactions that use this method. However, because |
| per-transaction data gets used by several methods, @code{gtm_thread} is |
| responsible for setting an initial state that is useful for all methods. |
| After that, methods are responsible for resetting/clearing this state on each |
| rollback or commit (of outermost transactions), so that the transaction |
| executed next is not affected by the previous transaction. |
| |
| There is also global state associated with each method group, which is |
| initialized and shut down (@code{method_group::init()} and @code{fini()}) |
| when switching between method groups (see @file{retry.cc}). |
| |
| @subsection Selecting the default method |
| |
| The default method that libitm uses for freshly started transactions (but |
| not necessarily for restarted transactions) can be set via an environment |
| variable (@env{ITM_DEFAULT_METHOD}), whose value should be equal to the name |
| of one of the factory methods returning abi_dispatch subclasses but without |
| the "dispatch_" prefix (e.g., "serialirr" instead of |
| @code{GTM::dispatch_serialirr()}). |
| |
| Note that this environment variable is only a hint for libitm and might not |
| be supported in the future. |
| |
| |
| @section Nesting: flat vs. closed |
| |
| We support two different kinds of nesting of transactions. In the case of |
| @emph{flat nesting}, the nesting structure is flattened and all nested |
| transactions are subsumed by the enclosing transaction. In contrast, |
| with @emph{closed nesting}, nested transactions that have not yet committed |
| can be rolled back separately from the enclosing transactions; when they |
| commit, they are subsumed by the enclosing transaction, and their effects |
| will be finally committed when the outermost transaction commits. |
| @emph{Open nesting} (where nested transactions can commit independently of the |
| enclosing transactions) are not supported. |
| |
| Flat nesting is the default nesting mode, but closed nesting is supported and |
| used when transactions contain user-controlled aborts |
| (@code{__transaction_cancel} statements). We assume that user-controlled |
| aborts are rare in typical code and used mostly in exceptional situations. |
| Thus, it makes more sense to use flat nesting by default to avoid the |
| performance overhead of the additional checkpoints required for closed |
| nesting. User-controlled aborts will correctly abort the innermost enclosing |
| transaction, whereas the whole (i.e., outermost) transaction will be restarted |
| otherwise (e.g., when a transaction encounters data conflicts during |
| optimistic execution). |
| |
| |
| @section Locking conventions |
| |
| This section documents the locking scheme and rules for all uses of locking |
| in libitm. We have to support serial(-irrevocable) mode, which is implemented |
| using a global lock as explained next (called the @emph{serial lock}). To |
| simplify the overall design, we use the same lock as catch-all locking |
| mechanism for other infrequent tasks such as (de)registering clone tables or |
| threads. Besides the serial lock, there are @emph{per-method-group locks} that |
| are managed by specific method groups (i.e., groups of similar TM concurrency |
| control algorithms), and lock-like constructs for quiescence-based operations |
| such as ensuring privatization safety. |
| |
| Thus, the actions that participate in the libitm-internal locking are either |
| @emph{active transactions} that do not run in serial mode, @emph{serial |
| transactions} (which (are about to) run in serial mode), and management tasks |
| that do not execute within a transaction but have acquired the serial mode |
| like a serial transaction would do (e.g., to be able to register threads with |
| libitm). Transactions become active as soon as they have successfully used the |
| serial lock to announce this globally (@pxref{serial-lock-impl,,Serial lock |
| implementation}). Likewise, transactions become serial transactions as soon as |
| they have acquired the exclusive rights provided by the serial lock (i.e., |
| serial mode, which also means that there are no other concurrent active or |
| serial transactions). Note that active transactions can become serial |
| transactions when they enter serial mode during the runtime of the |
| transaction. |
| |
| @subsection State-to-lock mapping |
| |
| Application data is protected by the serial lock if there is a serial |
| transaction and no concurrently running active transaction (i.e., non-serial). |
| Otherwise, application data is protected by the currently selected method |
| group, which might use per-method-group locks or other mechanisms. Also note |
| that application data that is about to be privatized might not be allowed to be |
| accessed by nontransactional code until privatization safety has been ensured; |
| the details of this are handled by the current method group. |
| |
| libitm-internal state is either protected by the serial lock or accessed |
| through custom concurrent code. The latter applies to the public/shared part |
| of a transaction object and most typical method-group-specific state. |
| |
| The former category (protected by the serial lock) includes: |
| @itemize @bullet |
| @item The list of active threads that have used transactions. |
| @item The tables that map functions to their transactional clones. |
| @item The current selection of which method group to use. |
| @item Some method-group-specific data, or invariants of this data. For example, |
| resetting a method group to its initial state is handled by switching to the |
| same method group, so the serial lock protects such resetting as well. |
| @end itemize |
| In general, such state is immutable whenever there exists an active |
| (non-serial) transaction. If there is no active transaction, a serial |
| transaction (or a thread that is not currently executing a transaction but has |
| acquired the serial lock) is allowed to modify this state (but must of course |
| be careful to not surprise the current method group's implementation with such |
| modifications). |
| |
| @subsection Lock acquisition order |
| |
| To prevent deadlocks, locks acquisition must happen in a globally agreed-upon |
| order. Note that this applies to other forms of blocking too, but does not |
| necessarily apply to lock acquisitions that do not block (e.g., trylock() |
| calls that do not get retried forever). Note that serial transactions are |
| never return back to active transactions until the transaction has committed. |
| Likewise, active transactions stay active until they have committed. |
| Per-method-group locks are typically also not released before commit. |
| |
| Lock acquisition / blocking rules: |
| @itemize @bullet |
| |
| @item Transactions must become active or serial before they are allowed to |
| use method-group-specific locks or blocking (i.e., the serial lock must be |
| acquired before those other locks, either in serial or nonserial mode). |
| |
| @item Any number of threads that do not currently run active transactions can |
| block while trying to get the serial lock in exclusive mode. Note that active |
| transactions must not block when trying to upgrade to serial mode unless there |
| is no other transaction that is trying that (the latter is ensured by the |
| serial lock implementation. |
| |
| @item Method groups must prevent deadlocks on their locks. In particular, they |
| must also be prepared for another active transaction that has acquired |
| method-group-specific locks but is blocked during an attempt to upgrade to |
| being a serial transaction. See below for details. |
| |
| @item Serial transactions can acquire method-group-specific locks because there |
| will be no other active nor serial transaction. |
| |
| @end itemize |
| |
| There is no single rule for per-method-group blocking because this depends on |
| when a TM method might acquire locks. If no active transaction can upgrade to |
| being a serial transaction after it has acquired per-method-group locks (e.g., |
| when those locks are only acquired during an attempt to commit), then the TM |
| method does not need to consider a potential deadlock due to serial mode. |
| |
| If there can be upgrades to serial mode after the acquisition of |
| per-method-group locks, then TM methods need to avoid those deadlocks: |
| @itemize @bullet |
| @item When upgrading to a serial transaction, after acquiring exclusive rights |
| to the serial lock but before waiting for concurrent active transactions to |
| finish (@pxref{serial-lock-impl,,Serial lock implementation} for details), |
| we have to wake up all active transactions waiting on the upgrader's |
| per-method-group locks. |
| @item Active transactions blocking on per-method-group locks need to check the |
| serial lock and abort if there is a pending serial transaction. |
| @item Lost wake-ups have to be prevented (e.g., by changing a bit in each |
| per-method-group lock before doing the wake-up, and only blocking on this lock |
| using a futex if this bit is not group). |
| @end itemize |
| |
| @strong{TODO}: Can reuse serial lock for gl-*? And if we can, does it make |
| sense to introduce further complexity in the serial lock? For gl-*, we can |
| really only avoid an abort if we do -wb and -vbv. |
| |
| |
| @subsection Serial lock implementation |
| @anchor{serial-lock-impl} |
| |
| The serial lock implementation is optimized towards assuming that serial |
| transactions are infrequent and not the common case. However, the performance |
| of entering serial mode can matter because when only few transactions are run |
| concurrently or if there are few threads, then it can be efficient to run |
| transactions serially. |
| |
| The serial lock is similar to a multi-reader-single-writer lock in that there |
| can be several active transactions but only one serial transaction. However, |
| we do want to avoid contention (in the lock implementation) between active |
| transactions, so we split up the reader side of the lock into per-transaction |
| flags that are true iff the transaction is active. The exclusive writer side |
| remains a shared single flag, which is acquired using a CAS, for example. |
| On the fast-path, the serial lock then works similar to Dekker's algorithm but |
| with several reader flags that a serial transaction would have to check. |
| A serial transaction thus requires a list of all threads with potentially |
| active transactions; we can use the serial lock itself to protect this list |
| (i.e., only threads that have acquired the serial lock can modify this list). |
| |
| We want starvation-freedom for the serial lock to allow for using it to ensure |
| progress for potentially starved transactions (@pxref{progress-guarantees,, |
| Progress Guarantees} for details). However, this is currently not enforced by |
| the implementation of the serial lock. |
| |
| Here is pseudo-code for the read/write fast paths of acquiring the serial |
| lock (read-to-write upgrade is similar to write_lock: |
| @example |
| // read_lock: |
| tx->shared_state |= active; |
| __sync_synchronize(); // or STLD membar, or C++0x seq-cst fence |
| while (!serial_lock.exclusive) |
| if (spinning_for_too_long) goto slowpath; |
| |
| // write_lock: |
| if (CAS(&serial_lock.exclusive, 0, this) != 0) |
| goto slowpath; // writer-writer contention |
| // need a membar here, but CAS already has full membar semantics |
| bool need_blocking = false; |
| for (t: all txns) |
| @{ |
| for (;t->shared_state & active;) |
| if (spinning_for_too_long) @{ need_blocking = true; break; @} |
| @} |
| if (need_blocking) goto slowpath; |
| @end example |
| |
| Releasing a lock in this spin-lock version then just consists of resetting |
| @code{tx->shared_state} to inactive or clearing @code{serial_lock.exclusive}. |
| |
| However, we can't rely on a pure spinlock because we need to get the OS |
| involved at some time (e.g., when there are more threads than CPUs to run on). |
| Therefore, the real implementation falls back to a blocking slow path, either |
| based on pthread mutexes or Linux futexes. |
| |
| |
| @subsection Reentrancy |
| |
| libitm has to consider the following cases of reentrancy: |
| @itemize @bullet |
| |
| @item Transaction calls unsafe code that starts a new transaction: The outer |
| transaction will become a serial transaction before executing unsafe code. |
| Therefore, nesting within serial transactions must work, even if the nested |
| transaction is called from within uninstrumented code. |
| |
| @item Transaction calls either a transactional wrapper or safe code, which in |
| turn starts a new transaction: It is not yet defined in the specification |
| whether this is allowed. Thus, it is undefined whether libitm supports this. |
| |
| @item Code that starts new transactions might be called from within any part |
| of libitm: This kind of reentrancy would likely be rather complex and can |
| probably be avoided. Therefore, it is not supported. |
| |
| @end itemize |
| |
| @subsection Privatization safety |
| |
| Privatization safety is ensured by libitm using a quiescence-based approach. |
| Basically, a privatizing transaction waits until all concurrent active |
| transactions will either have finished (are not active anymore) or operate on |
| a sufficiently recent snapshot to not access the privatized data anymore. This |
| happens after the privatizing transaction has stopped being an active |
| transaction, so waiting for quiescence does not contribute to deadlocks. |
| |
| In method groups that need to ensure publication safety explicitly, active |
| transactions maintain a flag or timestamp in the public/shared part of the |
| transaction descriptor. Before blocking, privatizers need to let the other |
| transactions know that they should wake up the privatizer. |
| |
| @strong{TODO} Ho to implement the waiters? Should those flags be |
| per-transaction or at a central place? We want to avoid one wake/wait call |
| per active transactions, so we might want to use either a tree or combining |
| to reduce the syscall overhead, or rather spin for a long amount of time |
| instead of doing blocking. Also, it would be good if only the last transaction |
| that the privatizer waits for would do the wake-up. |
| |
| @subsection Progress guarantees |
| @anchor{progress-guarantees} |
| |
| Transactions that do not make progress when using the current TM method will |
| eventually try to execute in serial mode. Thus, the serial lock's progress |
| guarantees determine the progress guarantees of the whole TM. Obviously, we at |
| least need deadlock-freedom for the serial lock, but it would also be good to |
| provide starvation-freedom (informally, all threads will finish executing a |
| transaction eventually iff they get enough cycles). |
| |
| However, the scheduling of transactions (e.g., thread scheduling by the OS) |
| also affects the handling of progress guarantees by the TM. First, the TM |
| can only guarantee deadlock-freedom if threads do not get stopped. Likewise, |
| low-priority threads can starve if they do not get scheduled when other |
| high-priority threads get those cycles instead. |
| |
| If all threads get scheduled eventually, correct lock implementations will |
| provide deadlock-freedom, but might not provide starvation-freedom. We can |
| either enforce the latter in the TM's lock implementation, or assume that |
| the scheduling is sufficiently random to yield a probabilistic guarantee that |
| no thread will starve (because eventually, a transaction will encounter a |
| scheduling that will allow it to run). This can indeed work well in practice |
| but is not necessarily guaranteed to work (e.g., simple spin locks can be |
| pretty efficient). |
| |
| Because enforcing stronger progress guarantees in the TM has a higher runtime |
| overhead, we focus on deadlock-freedom right now and assume that the threads |
| will get scheduled eventually by the OS (but don't consider threads with |
| different priorities). We should support starvation-freedom for serial |
| transactions in the future. Everything beyond that is highly related to proper |
| contention management across all of the TM (including with TM method to |
| choose), and is future work. |
| |
| @strong{TODO} Handling thread priorities: We want to avoid priority inversion |
| but it's unclear how often that actually matters in practice. Workloads that |
| have threads with different priorities will likely also require lower latency |
| or higher throughput for high-priority threads. Therefore, it probably makes |
| not that much sense (except for eventual progress guarantees) to use |
| priority inheritance until the TM has priority-aware contention management. |
| |
| |
| @c --------------------------------------------------------------------- |
| @c GNU Free Documentation License |
| @c --------------------------------------------------------------------- |
| |
| @include fdl.texi |
| |
| @c --------------------------------------------------------------------- |
| @c Index |
| @c --------------------------------------------------------------------- |
| |
| @node Library Index |
| @unnumbered Library Index |
| |
| @printindex cp |
| |
| @bye |