Emulating Python’s Lexical Scope Semantics
That’s the sound of my head exploding.
“I slept on it. I won’t make any more changes to ex35. It was a learning exercise and I’m still learning from it, but the existing interpreter is hairy enough that I’d need to rewrite the entire thing from scratch. Instead, I’m going to start a new language, a baby BASIC, with the knowledge I gained from ex35.
Thanks for all the memories.”
That was earlier this morning. Draft saved. Then came the pivot. I dove back into LMPTHW ex35 for a quick hack. And hack I did. Problem solved. Surprisingly, it was easy. Only a few lines of the existing interpreter needed a rewrite.
In my previous solution to emulating Python’s lexical scope semantics, I believed that each function definition was bundled into a tuple with a copy of the environment’s symbol table. This meant that every time the function was called, I was in fact calling a closure of that function. The environment variables, and in particular the free variables, would not reflect any changes after the function was defined. The resulting semantics would have looked something like this:
VVV TEST CODE STARTS BELOW VVV def outside(): free_var = 1 print("outside free_var ==", free_var) def inside(): print("inside free_var == ", free_var) inside() free_var = 2 print("outside free_var ==", free_var) inside() VVV TEST OUTPUT STARTS BELOW VVV outside free_var == 1 inside free_var == 1 outside free_var == 2 inside free_var == 1
THOSE ARE NOT PYTHON SEMANTICS! My understanding was wrong, wrong, wrong. The second call to inside() should print
free_var == 2. (There’s a whole story about how I got it wrong, but I’ll save it for a future post with code and diagrams.)
Here’s how I fixed it.
Once you get variable shadowing, know the difference between local variables and free variables, and understand how function definitions are stored in the symbol table, the solution is pretty simple. There were three main mechanisms I had to fix. First:
1. Reference the Definition’s Enclosing ScopeFrame.
The ScopeFrame is the class which has a symbol table attribute and a parent ScopeFrame attribute. The ScopeFrame represents the scope of the current function. The symbol table is a dict of which the keys are variable/function identifiers and the values are values, expressions and functions declared within scope. The parent Scopeframe attribute is a reference to the ScopeFrame which spawned the current ScopeFrame. A ScopeFrame with no parent is the root/global ScopeFrame.
class ScopeFrame(object): def __init__(self, parent=None): self.parent = parent self.symbol_table = dict()
When a function is defined within a scope, the symbol table should store a reference to that scope to be bundled with that function definition as a tuple: (function_definition, ref_to_enclosing_scope). Here’s the important change: Originally, the parent ScopeFrame spawned a copy of itself for bundling as (function_definition, copy_of_parent_scope) like this
def spawn_scope_frame(self): new_symbol_table = dict(self.symbol.table) new_scope_frame = ScopeFrame() new_scope_frame.symbol_table = new_symbol_table
This copies the parent table’s keys into the enclosed ScopeFrame. It also copies references to the parent table’s corresponding values. If any values are changed in the scope of the parent table, the references only change in the parent, not in the child. Enclosed free variables get frozen in time, at the moment of spawning, for the scope of the child. This is how we make a closure. We don’t want this, so we remove the spawning routine from the function definition (aka FuncDef) (function, scope) bundling process, but we’ll save it. It is critical for the next stage.
2. Calling the Function
When we call a function, a few things need to happen. To more accurately emulate Python’s lexical scoping, the FuncCall production uses the ScopeFrame’s spawn_scope_frame method to “push” a new ScopeFrame onto a “call stack” (it’s not really a call stack in the strictest sense, but the ScopeFrame hierarchy fills a similar role). I changed the ScopeFrame spawning to this:
def spawn_scope_frame(self): return ScopeFrame(self)
self argument is how the current ScopeFrame gets the child ScopeFrame to remember it as the calling “parent” (ie, the enclosing lexical scope) for future symbol lookups (more on that soon).
Now that there’s a scope ready for the callee, the function call (FuncCall) arguments can be matched up with the FuncDef’s parameters to be inserted and assigned in the symbol table of the child ScopeFrame. At this point, we can pass the new scope to the FuncDef’s
3. Variable Shadowing: Lookup from Leaf to Root
So far, we have laid the groundwork to ensure that our ScopeFrame hierarchy supports Python-like semantics.
When our interpreter encounters a variable expression (ExprID) in the parse tree (aka AST), the ExprID calls the ScopeFrame’s lookup_symbol method and so it can return the value associated with the variable (or function) name.
Initially, because all the environment symbol tables were being copied into enclosed symbol tables, symbol lookup was as simple as accessing a dict key-value pair from:
class ScopeFrame(object): def lookup_symbol(self, name): return self.symbol_table[name]
Unfortunately, performing the lookup in only the local scope gives us the problematic semantics we saw at the beginning of this post.
We need a “leaf to root” (definitely not compsci jargon, but I’m treading water here) symbol table lookup so that if the symbol can’t be locally found, ScopeFrame recursively performs lookups back up the tree until we either find the symbol, or we run out of parents (ie, reach the root ScopeFrame).
class ScopeFrame(object): def lookup_symbol(self, name): try: value = self.symbol_table[name] except: value = self.parent.symbol_lookup(name) return value
…and then my head exploded
Once these changes were made, I ran a test. Reading of the output was followed by a suitably emotional display.
VVV CODE STARTS BELOW VVV def outside(): local_var = 1 free_var = 2 def inside(): print(local_var) local_var = 3 print(local_var) print(free_var) local_var = 4 print(local_var) print(free_var) inside() print(local_var) print(free_var) local_var = 7 inside() outside() VVV TEST OUTPUT STARTS BELOW (comments added) VVV # outside(): local_var = 1 1 # outside(): free_var = 2 2 # inside() lookup to outside: local_var = 1 1 # inside() declares its own local_var = 3 3 # inside() lookup to outside: free_var = 2 2 # inside() sets its own local_var = 4 before exiting (no output) # outside() lookup to its own local_var (unchanged) 1 # outside() lookup to its own free_var (unchanged) 2 # outside() reassigns local_var = 7 (no output) # inside() lookup to outside()'s reassigned local_var # because inside()'s previous local_var = 4 is now out of scope 7 # inside() assigns a new local_var = 3 3 # inside() lookup to outside()'s unchanged free_var = 2 2
I’ve run a few variations of this test, and the interpreter passed them all.
Thank you for reading my story!
If you’d like to see how the interpreter is stapled and duct-taped together, please visit my GitHub repo and start experimenting on test_interpreter.py. Just keep in mind that the parser can’t handle strings and the print() builtin function only accepts a single argument. Also, please excuse the outdated code commenting. Enjoy!