Writing a Brainfuck interpreter in Python.

Leave a comment

September 2, 2012 by Vedat

For those unaware, Brainfuck is a programming language.
This is how wikipedia describes Brainfuck:

Brainfuck is an esoteric programming language noted for its extreme minimalism. It is a Turing tarpit, designed to challenge and amuse programmers, and was not made to be suitable for practical use. It was created in 1993 by Urban Müller.

The name of the language is generally not capitalized except at the start of a sentence, although it is a proper noun.

I discovered the language a while ago, when I got interested in esoteric languages. Since the language is very simple, I figured that it wouldn’t be too hard to write an interpreter for it.

Brainfuck has 8 commands:

  1. <  :  Increment the data pointer (go to the next cell).
  2. >  :  Decrement the data pointer (go to the previous cell).
  3. +  :  Increment the value in current cell.
  4. –    :  Decrement the value in current cell.
  5. ,    : Read a character from stdin, and write it to the current cell.
  6. .    : Print the character in the current cell.
  7. [   : If the value in the current cell is greater than 0, go read the next instruction else jump to the closing “]”.
  8. ]   : Jump to the opening “[“.

Of course, each cell represents a char and the tape which contains the cell is a char array of the length 30000.

The Primitives
The first thing to do is representing the cells and the data pointer.  Quite simply we can do that by using a list to represent the cells and an integer to represent the current value of the data pointer.

# simulate the cells with a list
tape = [0]*(3*10**4)
# the data pointer
ptr = 0

The next thing to do is to represent the first 6 commands.
Since they have no scope, they could be handled directly by simple functions.
We have to be careful while implementing “” to make sure that the data pointer doesn’t leave the tape (this would cause a segmentation fault).

def lt():
    """ performs a < """
    global ptr
    if ptr <= 0:
        raise ValueError, "Segmentation fault!"
    ptr -= 1

def gt():
    """" performs a > """
    global ptr
    if ptr >= 3*10**4 - 1:
        raise ValueError, "Segmentation fault!"
    ptr += 1

While implementing + and – we should make sure, that the value stays in the range 0,256. If the value of a cell gets smaller than 0 o greater than 256 we interpret it as an underflow or as an overflow and quitely let it pass.

def plus():
    """ performs a + """
    global ptr
    tape[ptr] = (tape[ptr] + 1) % 256

def minus():
    """ perfoms a - """
    global ptr
    tape[ptr] = (tape[ptr] - 1) % 256

Implementing “,” and “.” is quite straight forward as well. The only edge case is handling the EOFs while reading from stdin. Since the ASCII code for EOF is 26, we can simply test whether the byte value of the newly read character is 26 and write it to the cell only after we make sure, that this isn’t the case.

def dot():
    """ performs a . """
    global ptr
    sys.stdout.write(chr(tape[ptr]))

def comma():
    """ performs a , """
    global ptr
    c = ord(sys.stdin.read(1))
    if c != 26:
        tape[ptr] = c

Handling Loops
(Observation, the “[“s and “]”s can be interpretted as while loops where the condition is tape[ptr] > 0 with a scope from “[” to “]”.)
Now we must implement the commands “[” and “]”. To do this, we must map the “[“s on the corresponding “]”s (we map the indices of the opening brackets, to the indices of the balancing closing brackets). The way we do this is quite straight forward: We keep a stack for the indices of the “[“s and every time we encounter a “[“, we push its index to the stack. Every time we encounter a “]” we pop an index from the stack and map the current index to it (`it` being the index of the “]” encountered).

The reason why this does the right thing is quite simple. Assume we haven’t yet encountered a “]” but have encountered n “[“s. The stack must contain n indices. Obviously the first “]” we encounter always balances the last “[” encountered, so its index should be mapped on it. For the second “]” we encounter this is also similiar, the first “]” we encounter balanced the last “[” so the second “[” we encounter should balance the “[” we encountered second from last, which is the one that is currently on top of the stack. It isn’t hard to imagine why this also works when we encounter a further “[” popping all the indices from the stack or when we encounter a further “]”. There are multiple ways of proving this formally, but the one I like the most is simply induction over the language of balanced brackets (the Dyck Language), which could be generated by the CFG with the terminals “[“, “]” the nonterminal and the start symbol S and the production S \Rightarrow \epsilon|[S]|SS. I will spare the proof and just give the code.

But before doing that let’s think about the two cases where this can go south:

  1. The stack is empty, and we encounter a “]”. The program cannot possibly be a valid Brainfuck program, because the brackets aren’t balanced.
  2. After reading the whole program, the stack isn’t empty. This means that there are more “[“s then “]”s. This also indicates that the program cannot possibly be valid, as having more “[“s then “]”s is a syntactic violation.

We should therefore test whether these are present in the code we are to read:

def parse(code):
    """ maps the "["s to the corresponding "]"s """
    # stack to contain the indices of the opening brackets
    opening = []
    # dict which maps the indices of the opening brackets
    # to the closing brackets
    loop = {}
    for i,c in enumerate(code):
        if c == "[":
            opening.append(i)
        elif c == "]":
            try:
                begin = opening.pop()
                loop[begin] = i
            except IndexError:
                raise ValueError, "Supplied string isn't balanced, too many ]s!"
    # if the stack isn't empty, the string cannot be balanced
    if opening != []:
        raise ValueError, "Supplied string isn't balanced, too many [s"
    else:
        return loop

We could of course also use the same dictionary to map the indices of the opening brackets to the closing brackets, the resulting eval code might be faster, but I like this version better.

Evaluation

Now we must write a function which evaluates valid Brainfuck programs which should also be quite simple.
Whenever we encounter a brainfuck command which isn’t “[” or “]” we handle it by the functions we have already written, else whenver we encounter a “[” we check whether the current cell contains a value greater than 0 if so, we push the current index to the stack and enter the loop body, else we jump to the instruction to the right of the corresponding balancing “]”. Whenever we encounter a “]” we pop the index of the corresponding “[” from the stack and jump there.

This is how it looks like

# primitives which can directly be handled
handle_directly = { "." : dot, "," : comma, "<" : lt, ">" : gt, "+" : plus, "-" : minus}


def eval_bf(code):
    """ evaluates brainfuck code """
    global ptr
    # get the scopes of the "[", "]"s
    loop = parse(code)
    # initialize the program counter
    pc = 0
    # a stack to store the pc for loops
    stack = []
    while pc < len(code):
        instruction = code[pc]
        # handle the primitives directly
        if instruction in handle_directly:
            apply(handle_directly[instruction])
        elif instruction == "[":
            # if loop condition is fullfiled
            # enter loop block
            if tape[ptr] > 0:
                stack.append(pc)
            else: # else go to the end of the block
                pc = loop[pc]
        elif instruction == "]":
            # jump back where you came from!
            pc = stack.pop() - 1
        pc += 1

Notice the “-1” on line 28, if it weren’t for that we would jump directly in the loop block (because of the incrementation in line 29), thus would never have any terminating loops (i.e for [] the program would first read the “[” then read the “]”, and continually jump on “]”).

So this was all.
The whole code is in my github, though it differs from the one I posted here in minor ways. The script in my github can also be used as an interactive brainfuck shell.

Note to self: Use pygments the next time you want to post code on your blog.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

September 2012
M T W T F S S
« Aug    
 12
3456789
10111213141516
17181920212223
24252627282930

Categories

%d bloggers like this: