Hacker News new | past | comments | ask | show | jobs | submit login

Great explanation. If anyone has good references for learning this by making a minimal editor like sed from scratch, please share. Wait, now that I think about it, does the name “Zed” draw inspiration from “sed”? If yes, that’s rad.



It draws from the original UNIX text editor called "ed".

https://zed.dev/faq#why-zed


I have a simple (well, simple by my standards because I've spent a long time with it) implementation of a rope in Standard ML [0], OCaml [1] and F# [2]. (See tiny_rope.sml, brolib.fs or lib/tiny_rope.ml for implementation if you can read any of those languages; the F# implementation is probably easiest to read.)

[0] https://github.com/hummy123/brolib-sml

[1] https://github.com/hummy123/brolib

[2] https://github.com/hummy123/brolib-fs

The essence of a data structure is a binary tree where the internal nodes (the N2 case) contains a pointer to the left subtree, an integer containing the total length of the strings in the left subtree and a pointer to the right subtree. Then there are leaf nodes (the N0 case) which contain simply strings. (There are some other cases in the type like N1, N3 and L2, but those are solely for balancing because I built my ropes on top of 1-2 Brother Trees as described by Ralf Hinze, and those aren't essential to the rope data structure.)

When indexing (which is necessary for the insertion and deletion operations), you have a simple recursive algorithm which can be best seen in the recursive "ins" function. In the internal N2 nodes, the algorithm is to compare the index (given as an argument) with the left metadata. If the index argument is less than the left metadata, recurse to the left subtree passing the same index; otherwise, recurse to the right subtree, subtracting the index argument with the left metadata.

By the end, when you eventually reach the leaf case, the index argument is equal to the position you want to insert into in the current node. (I haven't tried to understand the maths behind this but it's how the data structure works.) At that point, all you do is insert into the leaf node's string (this is the same as inserting at an arbitrary index in any normal string) if you can without exceeding the maximum limit, or else you can insert another node.

Then you unroll the recursion. Unrolling the recursion involves updating the left subtree metadata when you reach the parent, and it also involves balancing. (I'm using 1-2 Brother Trees for balancing but ropes don't really care which balancing you use or if you use one at all.)

That's pretty much all there is to ropes. The deletion and substring algorithms just require minor modifications (the user might specify a range that includes more than one subtree, so you might need to recurse on both subtrees).

You can extend the idea behind ropes to hold more metadata too. For example, rope.sml (Standard ML) also tracks line metadata to allow indexing by line number. The changes required for this are: store an array at the leaf nodes containing indices of line breaks in the string also at this node, and at internal N2 nodes you should also store an additional integer indicating the number of lines in the left subtree.

There is an idea I haven't found too useful which is, if two leaf nodes have strings that can be joined without reaching the maximum limit, then join them. I haven't found this idea to improve performance much although it theoretically should.

I want to give a shout out to the MLton compiler for Standard ML here - the two rope implementations compiled with it handily beat the fastest ropes in Rust which is surprising. (My code performs well with F# and OCaml, but MLton takes it a level beyond that.)


Max line length can be aggregated by using a 1-2-3 element type, for example, using scalar, 2-tuple and 3-tuple:

  open length |
  { open left length, open right length } |
  { open left length, max contained line, open right length }
The cases are obvious depending on the number of newlines in the substring (think of newlines as the commas in the tuples :)

  0 newlines:  open length
  1 newline:   { open left length, open right length } 
  2+ newlines: { open left length, max contained line, open right length }
Merging (summing) two maxline objects is also fairly obvious. There are 3*3 cases: contract (add) adjacent open counts, max with any contained line counts, preserve left and right open lengths.

The maxline ops are (obviously) non-commutative, but are associative, so you can apply them in any grouping, left-to-right, right-to-left, and even in parallel if you feel the need.

At the root of the tree, the left and right open lengths become actual lines (beginning and end of buffer are virtual line delimiters), so to get the global maximum line length, just max the 1, 2, or 3 values in the root maxline summary.

I like to separate newlines from other substrings as leaves in the tree. So the leaf maxlines are just:

  substring:   length
  newline:     {0,0}
Then sum as given above across any number of child nodes.

The Zed blog claims to have such a thing, but didn't explain it. Apologies if it's in one of the repos. I didn't check.


I assume you know this from your reference to Hinze, but just to make the point clear:

In a pointer-to-mutable-memory language, a rope becomes a lot of recursive pointer-chasing inside a tree structure, typically referenced from the root down.

In a functional language, a rope may be built as a tree (or several subtree edits), but is processed from local cursors that have a bottom-up view from a concrete leaf position in the buffer.

A rope in a purely functional language is better described as a Finger Tree (Hinze, Paterson) [1] or Zipper (Huet) [2]:

[1] https://www.staff.city.ac.uk/~ross/papers/FingerTree.html

[2] https://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced...


I assumed it was the alternative pronunciation of Zero.


can't edit my comment anymore, but I meant "Z", not zero. I think my phone autocorrected or something.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: