links: JS MOC


What’s the scope?

By the time you have written the first few programs, you’re likely getting somewhat comfortable with creating variables and storing values in them.

But you may not have considered very closely the underlying mechanisms used by the engine to manage these variables. I don’t mean how the memory is allocated on the computer, but rather: how does JS know which variables are accessible by any given statement, and how does it handle two variables of the same name?

The answers to questions like these take the form of well-defined rules called scope. This book will dig through all aspects of scope (how it works, what it’s useful for, gotchas to avoid) and then point toward common scope patterns that guide the structure of programs.

Our first step is to uncover how the JS engine processes our program before it runs.

About this book

Our focus will be the first of the three pillars of the JS language: the scope system and its function closures, as well as the power of the module design pattern.

JS is typically classified as an interpreted scripting language, so it’s assumed by most that JS programs are processed in a single, top-down pass. But JS is in fact parsed/compiled in a separate phase before execution begins. The code author’s decisions on where to place variables, functions, and blocks with respect to each other are analyzed according to the rules of scope during the initial parsing/compilation phase. The resulting scope structure is generally unaffected by runtime conditions.

JS functions are themselves first-class values; they can be assigned and passed around just like numbers or strings. But since these functions hold and access variables, they maintain their original scope no matter where in the program the functions are eventually executed. This is called closure.
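A minimal sketch of closure may help here. The names makeGreeter, greet, and sayHello are made up for illustration and do not come from the book.

```javascript
// Hypothetical example: all names here are illustrative.
function makeGreeter(greeting) {
  // `greeting` belongs to makeGreeter's scope.
  return function greet(name) {
    // greet keeps access to `greeting` wherever it is eventually called.
    return greeting + ", " + name + "!";
  };
}

const sayHello = makeGreeter("Hello");

// The function is passed around like any other value (here, to map).
const messages = ["Kyle", "Suzy"].map(sayHello);
console.log(messages); // [ 'Hello, Kyle!', 'Hello, Suzy!' ]
```

By the time map(..) invokes sayHello, makeGreeter(..) has already returned, yet greeting is still reachable.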

Modules are a code organization pattern characterized by public methods that have privileged access (via closure) to hidden variables and functions in the internal scope of the module.
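As a rough sketch of that pattern (createStudentRoster, records, and the method names are assumptions made for this note, not an API from the book):

```javascript
// Hypothetical module factory: every name here is illustrative.
function createStudentRoster() {
  // Hidden state and helper: not reachable from outside the factory.
  const records = [];

  function findIndex(id) {
    return records.findIndex((record) => record.id === id);
  }

  // Public methods: they reach the hidden variables via closure.
  return {
    add(id, name) {
      if (findIndex(id) === -1) {
        records.push({ id, name });
      }
    },
    getName(id) {
      const index = findIndex(id);
      return index === -1 ? undefined : records[index].name;
    },
  };
}

const roster = createStudentRoster();
roster.add(73, "Suzy");
console.log(roster.getName(73)); // Suzy
console.log(roster.records);     // undefined
```

Only add(..) and getName(..) are exposed; records and findIndex(..) stay private to the module instance.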

Compiled vs Interpreted

Read more about Compiled vs Interpreted here

Are these two processing models mutually exclusive? Generally, yes. However, the issue is more nuanced, because interpretation can actually take other forms than just operating line by line on source code text. Modern JS engines actually employ numerous variations of both compilation and interpretation in the handling of JS programs.

Compiling code

But first, why does it even matter whether JS is compiled or not?

Scope is determined during compilation, so understanding how compilation and execution relate is key to mastering scope.

In classic compiler theory, a program is processed by a compiler in three basic stages:

  1. Tokenizing/Lexing: breaking up a string of characters into meaningful (to the language) chunks, called tokens. For instance, consider the program var a = 2;. This program would likely be broken up into the following tokens: var, a, =, 2, and ;. Whitespace may or may not be persisted as a token, depending on whether it’s meaningful or not.

    (The difference between tokenizing and lexing is subtle and academic, but it centers on whether or not these tokens are identified in a stateful or stateless way. Put simply, if the tokenizer were to invoke stateful parsing rules to figure out whether a should be considered a distinct token or just part of another token, that would be lexing.)

  2. Parsing: taking a stream (array) of tokens and turning it into a tree of nested elements, which collectively represent the grammatical structure of the program. This is called an Abstract Syntax Tree (AST).

    For example, the tree for var a = 2; might start with a top-level node called VariableDeclaration, with a child node called Identifier (whose value is a), and another child called AssignmentExpression, which itself has a child called NumericLiteral (whose value is 2).

  3. Code Generation: taking an AST and turning it into executable code. This part varies depending on the language, the platform it’s targeting, and other factors.

    The JS engine takes the just described AST for var a = 2; and turns it into a set of machine instructions to actually create a variable called a (including reserving memory etc), and then store a value into a.
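The first two stages can be sketched with a toy tokenizer and parser for the single statement var a = 2;. This is only an illustration: tokenize and parseVarDeclaration are made-up helpers, and the node shape mirrors the description above rather than the format of any real JS parser.

```javascript
// Toy tokenizer: recognizes only identifiers/keywords, integers, "=" and ";".
// Whitespace is simply dropped here.
function tokenize(source) {
  return source.match(/[A-Za-z_$][\w$]*|\d+|[=;]/g) || [];
}

// Toy parser: accepts exactly `var <name> = <integer> ;`.
function parseVarDeclaration(tokens) {
  const [keyword, name, equals, value, semicolon] = tokens;
  if (keyword !== "var" || equals !== "=" || semicolon !== ";") {
    throw new SyntaxError("expected: var <name> = <integer>;");
  }
  return {
    type: "VariableDeclaration",
    children: [
      { type: "Identifier", value: name },
      {
        type: "AssignmentExpression",
        children: [{ type: "NumericLiteral", value: Number(value) }],
      },
    ],
  };
}

const tokens = tokenize("var a = 2;");
console.log(tokens); // [ 'var', 'a', '=', '2', ';' ]

const ast = parseVarDeclaration(tokens);
console.log(ast.children[0]); // { type: 'Identifier', value: 'a' }
```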

    Note:
    The implementation details of a JS engine (utilizing system memory resources, etc.) are much deeper than we will dig here. We’ll keep our focus on the observable behavior of our programs and let the JS engine manage those deeper system-level abstractions.

    The JS engine is vastly more complex than just these three stages. In the process of parsing and code generation, there are steps to optimize the performance of the execution (e.g., collapsing redundant elements). In fact, code can even be re-compiled and re-optimized during the progression of execution.

    JS engines don’t have the luxury of an abundance of time to perform their work and optimizations, because JS compilation doesn’t happen in a build step ahead of time, as with other languages. It usually must happen in mere microseconds (or less!) right before the code is executed. To ensure the fastest performance under these constraints, JS engines use all kinds of tricks (like JITs, which lazy compile and even hot re-compile); these are well beyond the “scope” of our discussion here.

    Required: Two Phases

    The most important observation we can make about the processing of JS programs is that it occurs in (at least) two phases: parsing/compilation first, then execution.

    The separation of the parsing/compilation phase from the subsequent execution phase is an observable fact, not theory or opinion. While the JS specification does not require “compilation” explicitly, it requires behavior that is essentially only practical with a compile-then-execute approach.

    There are three program characteristics you can observe to prove this to yourself: syntax errors, early errors, and hoisting.

    Syntax errors from the start

    Consider this program:

var greeting = "Hello"
 
console.log(greeting)
 
greeting = ."Hi"
// SyntaxError: unexpected token .

This program produces no output (Hello is not printed), but instead throws a SyntaxError about the unexpected . token before the "Hi" string. Since the syntax error happens after the well-formed console.log(..) statement, if JS were executing top-down, line by line, one would expect the Hello message to be printed before the syntax error is thrown. That doesn’t happen.

In fact, the only way the JS engine could know about the syntax error on the third line, before executing the first and second lines, is by first parsing the entire program before any of it is executed.
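One way to observe this from inside a running program is to hand the same source text to the Function constructor, which parses its whole body before creating a function. This is just a sketch; the output array and the variable names are assumptions made for the demo.

```javascript
const output = [];

// Same program as above, with the log call replaced by output.push(..)
// so we can check afterwards whether any statement ran.
const source = `
  var greeting = "Hello";
  output.push(greeting);
  greeting = ."Hi";
`;

let errorName = "none";
try {
  // The body fails to parse, so no function is created and none of it runs.
  const program = new Function("output", source);
  program(output);
} catch (err) {
  errorName = err.name;
}

console.log(errorName);     // SyntaxError
console.log(output.length); // 0
```

The first two statements are well formed, yet neither has any effect, because the error is reported while parsing.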

Early errors

Next, consider:

console.log("Howdy")
 
saySomething("Hi", "Hello")
// Uncaught SyntaxError: duplicate parameter name
// not allowed in this context
 
function saySomething(greeting, greeting) {
	"use strict";
	console.log(greeting)
}

The Howdy message is not printed, despite being a well-formed statement.

Instead, just like the snippet in the previous section, the SyntaxError is thrown before the program is executed. In this case, it’s because strict-mode (opted in only for the saySomething(..) function here) forbids, among many other things, functions from having duplicate parameter names; this has always been allowed in non-strict-mode.

The error is not a syntax error in the sense of being a malformed string of tokens (like ."Hi" prior), but in strict-mode it is nonetheless required by the specification to be thrown as an “early error” before any execution begins.

But how does the JS engine know that the greeting parameter has been duplicated? How does it know that the saySomething(..) function is even in strict-mode while processing the parameter list (the "use strict" pragma only appears later, in the function body)?

Again the only reasonable explanation is that the code must be fully parsed before any execution occurs.
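The early error can be checked without running the function at all. In this sketch, parses is a hypothetical helper that only asks the engine to parse some source text via the Function constructor and never calls the result.

```javascript
// Hypothetical helper: true if the source parses, false on a SyntaxError.
function parses(source) {
  try {
    new Function(source); // parse only; the body is never called
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err;
  }
}

const sloppy =
  "function saySomething(greeting, greeting) { return greeting; }";
const strict =
  'function saySomething(greeting, greeting) { "use strict"; return greeting; }';

console.log(parses(sloppy)); // true
console.log(parses(strict)); // false
```

The only difference between the two strings is the "use strict" pragma inside the body, which appears after the parameter list it invalidates.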

Hoisting

Finally, consider:

function saySomething() {
	var greeting = "Hello"
	{
		greeting = "Howdy" // error comes from here
		let greeting = "Hi"
		console.log(greeting)
	}
}
 
saySomething()
// ReferenceError: Cannot access 'greeting' before
// initialization

The noted ReferenceError occurs from the line with the statement greeting = "Howdy". What’s happening is that the greeting variable for that statement belongs to the declaration on the next line, let greeting = "Hi", rather than to the previous var greeting = "Hello" statement.

The only way the JS engine could know, at the line where the error is thrown, that the next statement would declare a block-scoped variable of the same name (greeting) is if the JS engine had already processed this code in an earlier pass, and already set up all the scopes and their variable associations. The processing of scopes and declarations can only accurately be accomplished by parsing the program before execution.

The ReferenceError technically comes from greeting = "Howdy" accessing the greeting variable too early, a conflict referred to as the Temporal Dead Zone (TDZ).

In spirit and practice, what the JS engine is doing in processing JS programs is much more like compilation than not.

Classifying JS as a compiled language is not concerned with the distribution model for its binary (byte-code) executable representations, but rather with keeping a clear distinction in our minds about the phase where JS code is processed and analyzed; this phase observably and indisputably happens before the code starts to be executed.
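A variation of the snippet that catches the error makes the association visible. This is a sketch: tdzDemo and its return shape are made up for illustration.

```javascript
function tdzDemo() {
  var greeting = "Hello";
  var errorName = "none";
  {
    try {
      // Refers to the block-scoped greeting declared below (still in its
      // TDZ), not to the function-scoped var above.
      greeting = "Howdy";
    } catch (err) {
      errorName = err.name;
    }
    let greeting = "Hi";
  }
  // Back outside the block, greeting is the var again, and it is untouched.
  return { errorName: errorName, outerGreeting: greeting };
}

const result = tdzDemo();
console.log(result.errorName);     // ReferenceError
console.log(result.outerGreeting); // Hello
```

The outer var greeting never receives "Howdy", which confirms that the assignment was bound to the inner let declaration before that declaration had executed.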

We need proper mental models of how the JS engine treats our code if we want to understand JS and scope effectively.

Compiler speak

With awareness of the two-phase processing of a JS program (compile, then execute), let’s turn our attention to how the JS engine identifies variables and determines the scopes of a program as it is compiled.

First, let’s examine a simple JS program to use for analysis over the next several chapters:

var students = [
    { id: 14, name: "Kyle" },
    { id: 73, name: "Suzy" },
    { id: 112, name: "Frank" },
    { id: 6, name: "Sarah" }
];
 
 
function getStudentName(studentID) {
	for(let student of students) {
		if(student.id === studentID) {
			return student.name
		}
	}
}
 
var nextStudent = getStudentName(73)
 
console.log(nextStudent)
// Suzy

Other than declarations, all occurrences of variables/identifiers in a program serve in one of two “roles”: either they’re the target of an assignment or they’re the source of a value.

How do you know if a variable is a target? Check if there is a value being assigned to it; if so, it’s a target. If not, then the variable is a source.

For the JS engine to properly handle a JS program’s variables, it must first label each occurrence of a variable as a target or a source. We’ll dig in now to how each role is determined.

Targets

What makes a variable a target? Consider:

students = [ // ...

This statement is clearly an assignment operation; remember, the var students part is handled entirely as a declaration at compile time, and is thus irrelevant during execution, so we left it out for clarity and focus. The same goes for the nextStudent = getStudentName(73) statement.

But there are three other target assignment operations in the code that are perhaps less obvious. One of them:

for(let student of students)

That statement assigns a value to student for each iteration of the loop. Another target reference:

getStudentName(73)

But how is that an assignment to a target? Look closely: the argument 73 is assigned to the parameter studentID.

And there’s one last (subtle) target reference in our program.

function getStudentName(studentID)

A function declaration is a special case of a target reference. You can think of it sort of like var getStudentName = function(studentID), but that’s not exactly accurate. An identifier getStudentName is declared (at compile time), but the = function(studentID) part is also handled at compilation; the association between getStudentName and the function is automatically set up at the beginning of the scope rather than waiting for an = assignment statement to be executed.

Sources

The remaining variable references in the program are source references.

In the statement for(let student of students), we said that student is a target, but students is a source reference. In the statement if(student.id === studentID), both student and studentID are source references. student is also a source reference in return student.name.

You can think of it like this:

var studentIDOriginal = student.id
var studentIDToCompare = studentID
if(studentIDOriginal === studentIDToCompare)

In getStudentName(73), getStudentName is a source reference (which we hope resolves to a function reference value). In console.log(nextStudent), console is a source reference, as is nextStudent.

What’s the practical importance of understanding targets vs. sources? A variable’s role affects how it is looked up (specifically, what happens if the lookup fails).
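To pull the roles together, here is the same students program again with each reference labeled in a comment. The labels are this note’s annotations; the code itself is unchanged apart from semicolons.

```javascript
// target: students
var students = [
  { id: 14, name: "Kyle" },
  { id: 73, name: "Suzy" },
  { id: 112, name: "Frank" },
  { id: 6, name: "Sarah" },
];

// target: getStudentName (function declaration)
// target: studentID (assigned from the argument on each call)
function getStudentName(studentID) {
  // target: student, source: students
  for (let student of students) {
    // sources: student, studentID
    if (student.id === studentID) {
      // source: student
      return student.name;
    }
  }
}

// target: nextStudent, source: getStudentName
var nextStudent = getStudentName(73);

// sources: console, nextStudent
console.log(nextStudent); // Suzy
```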
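As a preview of that difference, the sketch below runs three tiny snippets and reports what happens when a lookup fails. The outcome helper and the variable names missingSource and missingTarget are assumptions made for this demo; the Function constructor is used so that each snippet picks its own mode.

```javascript
// Hypothetical helper: run a snippet, return its value or the error name.
function outcome(body) {
  try {
    return new Function(body)();
  } catch (err) {
    return err.name;
  }
}

// A source lookup that fails throws a ReferenceError.
console.log(outcome("return missingSource;")); // ReferenceError

// A target lookup that fails in strict-mode also throws.
console.log(outcome('"use strict"; missingTarget = 1;')); // ReferenceError

// A target lookup that fails in non-strict-mode silently creates a global.
console.log(outcome("missingTarget = 1; return typeof missingTarget;")); // number

// Clean up the accidental global created by the last snippet.
delete globalThis.missingTarget;
```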

Cheating: Runtime Scope Modifications

It is clear that scope is determined as the program is compiled, and should not generally be affected by runtime conditions. However, in non-strict-mode, there are technically still two ways to cheat this rule, modifying a program’s scope during runtime.

Neither of these techniques should be used. They’re both dangerous and confusing, and you should be using strict-mode (where they are disallowed) anyway. But it’s important to be aware of them in case you run across them in some programs.

The eval(..) function receives a string of code to compile and execute on the fly during the program runtime. If that string of code has a var or function declaration in it, those declarations will modify the scope that the eval(..) is currently executing in:

function badIdea() {
	eval("var foo = 'Ugh!';")
	console.log(foo)
}
 
badIdea() // Ugh!

If the eval(..) call had not been present, the foo variable would not exist when console.log(foo) runs, and a ReferenceError would be thrown. But eval(..) modifies the scope of the badIdea() function at runtime. This is bad for many reasons, including the performance hit of modifying the already compiled and optimized scope every time badIdea() runs.

The second cheat is the with keyword, which essentially turns an object into a local scope dynamically. Its properties are treated as identifiers in that new scope’s block.

var badIdea = { foo: 'Ugh!' }
with (badIdea) {
	console.log(foo) // Ugh!
}

The global scope is not modified here, but badIdea was turned into a scope at runtime rather than compile time, and its property foo becomes a variable in that scope. Again, this is a terrible idea, for performance and readability reasons.

At all costs, avoid eval(..) and with. Again, neither of these cheats is available in strict-mode. So if you just use strict-mode (you should!), the temptation goes away.
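The strict-mode behavior can be checked with a small sketch. It assumes the Function constructor as a way to give each snippet its own mode (such bodies are non-strict unless they opt in); sloppyEval, strictEval, and withError are names made up here. Note that eval(..) can still be called in strict-mode; what changes is that its declarations stay inside the eval’s own scope.

```javascript
// Non-strict: the var declared by eval(..) lands in the calling scope.
const sloppyEval = new Function(`
  eval("var foo = 'Ugh!';");
  return typeof foo;
`);

// Strict: the same declaration stays inside the eval's own scope.
const strictEval = new Function(`
  "use strict";
  eval("var foo = 'Ugh!';");
  return typeof foo;
`);

console.log(sloppyEval()); // string
console.log(strictEval()); // undefined

// In strict-mode, `with` is rejected while parsing, before anything runs.
let withError = "none";
try {
  new Function('"use strict"; with ({ foo: 1 }) { foo; }');
} catch (err) {
  withError = err.name;
}
console.log(withError); // SyntaxError
```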

Lexical Scope

JS’s scope is determined at compile time; the term for this kind of scope is “lexical scope”. “Lexical” is associated with the “lexing” stage of compilation, as discussed earlier.

The key idea of “lexical scope” is that it’s controlled entirely by the placement of functions, blocks, and variable declarations, in relation to one another.

If you place a variable declaration inside a function, the compiler handles this declaration as it’s parsing the function, and associates that declaration with the function’s scope. If a variable is block-scoped (let/const), then it’s associated with the nearest enclosing {..} block, rather than its enclosing function (as with var).

Furthermore, a reference (source or target role) for a variable must be resolved as coming from one of the scopes that are lexically available to it; otherwise the variable is said to be “undeclared” (which usually results in an error!). If the variable is not declared in the current scope, the next outer/enclosing scope will be consulted. This process of stepping out one level of scope nesting continues until either a matching variable declaration can be found, or the global scope is reached and there’s nowhere else to go.
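A small sketch of that outward lookup, using made-up names (school, classroom, teacher, student, principal):

```javascript
const school = "JS Academy"; // outermost scope in this sketch

function classroom() {
  const teacher = "Kyle"; // function scope
  {
    const student = "Suzy"; // block scope
    // student: found in this block
    // teacher: found one level out, in classroom's scope
    // school:  found two levels out
    return student + " / " + teacher + " / " + school;
  }
}

console.log(classroom()); // Suzy / Kyle / JS Academy

function undeclaredLookup() {
  try {
    // principal is declared in no enclosing scope, so the lookup fails.
    return principal;
  } catch (err) {
    return err.name;
  }
}

console.log(undeclaredLookup()); // ReferenceError
```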

It’s important to note that compilation doesn’t actually do anything in terms of reserving memory for scopes and variables. None of the program has been executed yet.

Instead, compilation creates a map of all the lexical scopes that lays out what the program needs while it executes. You can think of this plan as inserted code for use at runtime, which defines all the scopes (aka, “lexical environments”) and registers all the identifiers (variables) for each scope.

In other words, while scopes are identified during compilation, they’re not actually created until runtime, each time a scope needs to run.
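One observable consequence, sketched with a hypothetical makeCounter function: the scope of makeCounter is laid out once at compile time, but a fresh instance of it (with its own count) is created on every call.

```javascript
function makeCounter() {
  // A new `count` is created each time makeCounter's scope is instantiated.
  let count = 0;
  return function increment() {
    count = count + 1;
    return count;
  };
}

const counterA = makeCounter();
const counterB = makeCounter();

counterA();
counterA();
console.log(counterA()); // 3
console.log(counterB()); // 1
```

counterA and counterB share the same source code and the same compile-time scope plan, but each holds a separate runtime scope instance.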


tags: fundamentals