Collections with stable iterators

Oh oracle of the interwebs –

I am writing a GPCE paper, and one of the contributions is a model of sequences (AKA collections) that has stable unshifting positions. Standard collections use an array abstraction where access is by integer ordinal position, which shifts under insertion and deletion. This manifests as the fact that iterators are broken by insertion and deletion operations. Linked list iterators also break under deletion.

Does anyone know of alternative proposals with stable positions/iterators? I don’t know of any, but I can’t be the first person to do this.

[Thomas lord points out Emacs markers. I also know about Operational Transformation and Darcs. But have such ideas ever been abstracted into a general purpose collection class?]

[OK – I seem to have confused everyone. Lets say you have the sequence: {a,b,c,d}. You are iterating along and see a, then b. Just then, some other code drives by and decides to insert something at the front: {foo, a, b, c, d}.
Or maybe they decide to delete the first element, leaving: {b, c, d}. Then you advance your iterator. In all cases you should get c as the result, not b, not d, and not an InvalidState exception.]

[Thanks, everyone. Got some great references to related work. Maybe I should call this “blogademics”. Some people have asked about the use case for such a thing. It is to abstract editing operations. Collaborative editors have this problem, because edits can’t be propagated when their location is unstable. Version control systems like Darcs need something like it reorder patches. And I need it to express composable transformations on code that don’t shift each other’s position.]

14 Replies to “Collections with stable iterators”

  1. I think it depends on what you mean by “position” and “iterator”. What are the operations that can be performed on these values, w.r.t the collection and each other?

    [Nat – I want an iterator that keeps pointing at the same place in the sequence even after insertions and deletions. – Jonathan]

  2. What else can I do with an iterator apart from point?

    Can I move the iterator forwards and backwards? E.g., is it stateful? Or does the “next” operation (if there is one) return a new iterator?

    And what does “insert” and “delete” mean? Are collections mutable or not?

    Must I use an iterator to delete an element or is it done through the collections API? If the latter, how are elements referred to, if not by iterators — an index? What is the difference between an index and an iterator? Are indices stable w.r.t. deletion and insertion? Can I convert an index into an iterator and vice versa? Can I find the distance between two iterators? Can I use two iterators to identify a range?

    What does “keep pointing at the same place” mean when you delete the element that is pointed at by the iterator?

    There is prior art, but what is relevant depends on the features you’ve selected for your iterator and collection abstractions. For example, functional languages mostly sidestep the problem altogether. Java lets you delete elements through an iterator — you can either step to the next element or delete the current element — and so avoids the problem of stale iterators. C++ (STL) algorithms use two iterators to identify a range of elements. Databases return “result sets” that iterate over queries and are isolated from insertions and deletions.

    [Nat – Mutable collections. Iterator.next gives you object A. But if someone comes in first with a different iterator and does a delete or insert anywhere, you should still get object A, unless A was deleted, in which case you should get the next element after A. Nailed you down yet? – Jonathan]

  3. I wonder if this could be implemented using a concept similar toconsistent hashing — if you put your entries in a 64-bit space, you can keep their positions therein sorted. Deletion pretty obviously keeps the list sorted; insertion could simply be done by halving the distance between the position of an object and its next value. That might be the limiting case: how many consecutive inserts can you do before some overarching adjustment would need to take place?

    Of course, there could be some really glaring flaw in this implementation that I’m not seeing since I’m shooting from the hip. I don’t think I’ve seen any iterator implementations based on this concept, though.

  4. You might want to check out Boost.MultiIndex

    Beware – many algorithms will need to be changed, since you’re not removing elements any more, but you reorder an index. (The price of stable iterators…)

  5. Ah, I misread — you still want next to return object A, even if an insertion from another iterator has changed the structure of the collection?

    So, you want an Iterator that reflects the state of your Collection upon creation, that keeps pace with other Iterator’s deletes, but ignores other Iterator’s inserts?

  6. I am not really sure this is a solvable problem because there are too many variables and situations to account for and no answer is right for all situations. We all have different needs.

    Do you want iteration begun to reflect the same results regardless of further operations on the collection? Then copy it.

    Why do you handle insert changes differently than delete changes?

    What about reorderings?

  7. Thanks Jonathon. That’s much clearer.

    This is implemented in the CursorableLinkedList class in the Apache Collections library (org/apache/commons/collections/CursorableLinkedList.html)

    Related work might include http://www.cs.cornell.edu/andru/papers/jmatch2.pdf.

    [Nat – that is exactly the kind of references I was looking for. Thanks – Jonathan]

  8. Flexichain provides ‘cursors’ which are both stable positions and iterators.

    [Kevin – Great, that’s the kind of stuff I am looking for. That also lead me to javax.swing.text.Position. Thanks – Jonathan]

  9. Maybe your requirement is more significant than this, but the relatively ancient doubly-linked list easily supported this sort of behavior. When you moved from the current position, you simply checked the “next” or “previous” pointer, and that was your new item.

  10. I am just making this up. But, can’t we use versioning of a collection?

    In this hypothetical scenario, each collection will have a version number. When one creates an iterator to a collection, we return them an iterator to the latest version of the collection. If some other code adds updates or delete a few elements in the collection the version number is updated and all iterators are notified of the change. Then it’s upto the iterator to decide if it should continue using the old iterator or should fail and start all over again with the new iterator.

    The big question is: “how we can efficiently maintain multiple versions of the collection while maintaining the same complexity?”. That’s a research problem i guess.

  11. I have solved this to my understanding with a prototype here: http://codepad.org/WodYoClk

    This only ever yields the same entry once, but still works if you insert the same object more than once (and yields it the appropriate number of times). It works on the simple rule of yielding you the next object in order which you have not yet received through that iterator. So, deletes and inserts before the cursor don’t change iteration, but within items you haven’t seen you, they do and you get the most recent items in the container. If you sort the container, so items move within it (rather than being deleted and inserted as two separate operations), you still don’t receive the same object twice.

  12. An iterator is just one metaphor. Software engineering researchers / computer scientists who write papers on the Interposition problem in byte-code engineering libraries (ASM, BCEL, etc.) also use the visitor metaphor e.g. making sure a GOTO instruction still works as specified in the source code file. There’s a lot of functionality that is often duplicated, primarily because problem domain metamodel constraints prohibit using an existing component for performance reasons, particularly in circumstances where heuristics are used to improve performance or when more than one thread traverses the list using an iterator. See: http://www.priorartdatabase.com/IPCOM/000021751/

    In his C5 collection library for .NET, Peter Sestoft uses the notion of a “zero-item view” as an “inter-item cursor”. Insertions can be done on the zero-item view, using it as an inter-item pointer. Doing so slides the offsets of all other views. Here – http://channel9.msdn.com/ShowPost.aspx?PostID=370368 – watch this video and your question should be answered.

    [Z-Bo – Thanks – that is exactly what I was looking for. – Jonathan]

  13. This is an issue I have with the Scala IDE I was working on. Basically, we have a “position” in the text buffer that needs to remain stable (the same) as the text buffer is being edited. I wound up throwing all interesting positions in a linked list so that insertion and deletion are non-disruptive to existing positions. A single cursor is then used to navigate/update the list so that the positions are accessed efficiently and the list isn’t trampled.

Comments are closed.