January 11, 2013

Recursive-descent in Python, using generators

Writing recursive descent parsers is easy, especially in Python (just like everything is easy in Python). But Python defines a low limit on the number of recursive calls that can be made, and recursive descent parsers are prone to exceed this limit.

We should thus write our parser without real recursion. Fortunately, Python offers us a way out: Coroutines, introduced in Python 2.5 as per PEP 342. Using coroutines and a simple trampoline function, we can convert every mutually recursive set of functions into a set of coroutines that require constant stack space.

Let’s say our parser looks like this (tokens being an iterator over characters):

def parse(tokens):
    token = next(tokens)
    if token.isalnum():
        return token
    elif token == '(':
         result = parse(tokens)
         if next(tokens) != ')': raise ...
         return result
         raise ...

We now apply the following transformations: return X => yield (X) parse() => (yield parse())

That is, we yield the value we want to return, and we yield a generator whenever we want to call (using a yield expression). Our rewritten parser reads:

def parse(tokens):
    token = next(tokens)
    if token.isalnum():
        yield token
    elif token == '(':
         result = yield parse(tokens)
         if next(tokens) != ')': raise ...
         return result
         raise ...

We obviously cannot call that generator like the previous function. What we need to introduce is a trampoline. A trampoline is a function that manages that yielded generators, calls them, and passes their result upwards. It looks like this:

def trampoline(generator):
    """Execute the generator using trampolining. """
    queue = collections.deque()

    def schedule(coroutine, stack=(), value=None):
        def resume():
            if 0:
                global prev
                now = stack_len(stack)
                if now < prev:
                    print("Length", now)
                    prev = -1
                elif prev != -1:
                    prev = now
            result = coroutine.send(value)
            if isinstance(result, types.GeneratorType):     # Normal call
                schedule(result, (coroutine, stack))
            elif isinstance(result, Tail):                  # Tail call (if you want to)
                schedule(result.value, stack)
            elif stack:                                     # Return to parent
                schedule(stack[0], stack[1], result)
            else:                                           # Final Return
                return result



    result = None
    while queue:
        func = queue.popleft()
        result = func()

    return result

This function is based on the code in PEP 342, the difference being that

  • we do not correctly propagate exceptions through the stack, but directly unwind to the caller of the parser (we don’t handle exceptions inside our parser generators anyway)
  • the code actually compiles (code in PEP used ‘value = coroutine.send(value)’ which does not work)
  • the code returns a value (code in PEP was missing a return in schedule)
  • we don’t use a class, and allow only one function to be scheduled at once (yes, we could get rid of the queue)
  • we allow tail calls [where Tail = namedtuple(“Tail”, [“result”])] to save some more memory.

For a more generic version of that, you might want to re-add exception passing, but the exceptions will then have long tracebacks, so I’m not sure how well they will work if you have deep recursion.

Now, the advantage of that is that our parser now requires constant stack space, because the actual real stack is stored in the heap using tuples which are used like singly-linked lists in scheme here. So, the only recursion limit is available memory.

A disadvantage of that transformation is that the code will run slightly slower for small inputs that could be handled using a normally recursive parser.

Reactions from Mastodon

Copyright © 2018-2020 Julian Andres Klode, articles licensed under CC BY-SA 4.0.
Comments are provided by Mastodon and copyright of their authors.

This website does not store any personally identifiable information. As part of standard web server access_log logging, it stores requests and the user agents and shortened IP addresses used to make them. It does, however, load some avatars from mastodon.

Powered by Hugo, and the Ernest theme.