| <chapter xmlns="http://docbook.org/ns/docbook" version="5.0" |
| xml:id="std.iterators" xreflabel="Iterators"> |
| <?dbhtml filename="iterators.html"?> |
| |
| <info><title> |
| Iterators |
| <indexterm><primary>Iterators</primary></indexterm> |
| </title> |
| <keywordset> |
| <keyword>ISO C++</keyword> |
| <keyword>library</keyword> |
| </keywordset> |
| </info> |
| |
| |
| |
| <!-- Sect1 01 : Predefined --> |
| <section xml:id="std.iterators.predefined" xreflabel="Predefined"><info><title>Predefined</title></info> |
| |
| |
| <section xml:id="iterators.predefined.vs_pointers" xreflabel="Versus Pointers"><info><title>Iterators vs. Pointers</title></info> |
| |
| <para> |
| The following |
| FAQ <link linkend="faq.iterator_as_pod">entry</link> points out that |
| iterators are not implemented as pointers. They are a generalization |
| of pointers, but they are implemented in libstdc++ as separate |
| classes. |
| </para> |
| <para> |
| Keeping that simple fact in mind as you design your code will |
| prevent a whole lot of difficult-to-understand bugs. |
| </para> |
| <para> |
| You can think of it the other way 'round, even. Since iterators |
| are a generalization, that means |
| that <emphasis>pointers</emphasis> are |
| <emphasis>iterators</emphasis>, and that pointers can be used |
| whenever an iterator would be. All those functions in the |
| Algorithms section of the Standard will work just as well on plain |
| arrays and their pointers. |
| </para> |
| <para> |
| That doesn't mean that when you pass in a pointer, it gets |
| wrapped into some special delegating iterator-to-pointer class |
| with a layer of overhead. (If you think that's the case |
| anywhere, you don't understand templates to begin with...) Oh, |
| no; if you pass in a pointer, then the compiler will instantiate |
| that template using T* as a type, and good old high-speed |
| pointer arithmetic as its operations, so the resulting code will |
| be doing exactly the same things as it would be doing if you had |
| hand-coded it yourself (for the 273rd time). |
| </para> |
| <para> |
| How much overhead <emphasis>is</emphasis> there when using an |
| iterator class? Very little. Most of the layering classes |
| contain nothing but typedefs, and typedefs are |
| "meta-information" that simply tell the compiler some |
| nicknames; they don't create code. That information gets passed |
| down through inheritance, so while the compiler has to do work |
| looking up all the names, your runtime code does not. (This has |
| been a prime concern from the beginning.) |
| </para> |
| |
| |
| </section> |
| |
| <section xml:id="iterators.predefined.end" xreflabel="end() Is One Past the End"><info><title>One Past the End</title></info> |
| |
| |
| <para>This starts off sounding complicated, but is actually very easy, |
| especially towards the end. Trust me. |
| </para> |
| <para>Beginners usually have a little trouble understand the whole |
| 'past-the-end' thing, until they remember their early algebra classes |
| (see, they <emphasis>told</emphasis> you that stuff would come in handy!) and |
| the concept of half-open ranges. |
| </para> |
| <para>First, some history, and a reminder of some of the funkier rules in |
| C and C++ for builtin arrays. The following rules have always been |
| true for both languages: |
| </para> |
| <orderedlist inheritnum="ignore" continuation="restarts"> |
| <listitem> |
| <para>You can point anywhere in the array, <emphasis>or to the first element |
| past the end of the array</emphasis>. A pointer that points to one |
| past the end of the array is guaranteed to be as unique as a |
| pointer to somewhere inside the array, so that you can compare |
| such pointers safely. |
| </para> |
| </listitem> |
| <listitem> |
| <para>You can only dereference a pointer that points into an array. |
| If your array pointer points outside the array -- even to just |
| one past the end -- and you dereference it, Bad Things happen. |
| </para> |
| </listitem> |
| <listitem> |
| <para>Strictly speaking, simply pointing anywhere else invokes |
| undefined behavior. Most programs won't puke until such a |
| pointer is actually dereferenced, but the standards leave that |
| up to the platform. |
| </para> |
| </listitem> |
| </orderedlist> |
| <para>The reason this past-the-end addressing was allowed is to make it |
| easy to write a loop to go over an entire array, e.g., |
| while (*d++ = *s++);. |
| </para> |
| <para>So, when you think of two pointers delimiting an array, don't think |
| of them as indexing 0 through n-1. Think of them as <emphasis>boundary |
| markers</emphasis>: |
| </para> |
| <programlisting> |
| |
| beginning end |
| | | |
| | | This is bad. Always having to |
| | | remember to add or subtract one. |
| | | Off-by-one bugs very common here. |
| V V |
| array of N elements |
| |---|---|--...--|---|---| |
| | 0 | 1 | ... |N-2|N-1| |
| |---|---|--...--|---|---| |
| |
| ^ ^ |
| | | |
| | | This is good. This is safe. This |
| | | is guaranteed to work. Just don't |
| | | dereference 'end'. |
| beginning end |
| |
| </programlisting> |
| <para>See? Everything between the boundary markers is chapter of the array. |
| Simple. |
| </para> |
| <para>Now think back to your junior-high school algebra course, when you |
| were learning how to draw graphs. Remember that a graph terminating |
| with a solid dot meant, "Everything up through this point," |
| and a graph terminating with an open dot meant, "Everything up |
| to, but not including, this point," respectively called closed |
| and open ranges? Remember how closed ranges were written with |
| brackets, <emphasis>[a,b]</emphasis>, and open ranges were written with parentheses, |
| <emphasis>(a,b)</emphasis>? |
| </para> |
| <para>The boundary markers for arrays describe a <emphasis>half-open range</emphasis>, |
| starting with (and including) the first element, and ending with (but |
| not including) the last element: <emphasis>[beginning,end)</emphasis>. See, I |
| told you it would be simple in the end. |
| </para> |
| <para>Iterators, and everything working with iterators, follows this same |
| time-honored tradition. A container's <code>begin()</code> method returns |
| an iterator referring to the first element, and its <code>end()</code> |
| method returns a past-the-end iterator, which is guaranteed to be |
| unique and comparable against any other iterator pointing into the |
| middle of the container. |
| </para> |
| <para>Container constructors, container methods, and algorithms, all take |
| pairs of iterators describing a range of values on which to operate. |
| All of these ranges are half-open ranges, so you pass the beginning |
| iterator as the starting parameter, and the one-past-the-end iterator |
| as the finishing parameter. |
| </para> |
| <para>This generalizes very well. You can operate on sub-ranges quite |
| easily this way; functions accepting a <emphasis>[first,last)</emphasis> range |
| don't know or care whether they are the boundaries of an entire {array, |
| sequence, container, whatever}, or whether they only enclose a few |
| elements from the center. This approach also makes zero-length |
| sequences very simple to recognize: if the two endpoints compare |
| equal, then the {array, sequence, container, whatever} is empty. |
| </para> |
| <para>Just don't dereference <code>end()</code>. |
| </para> |
| |
| </section> |
| </section> |
| |
| <!-- Sect1 02 : Stream --> |
| |
| </chapter> |