
ReactJS in PHP: Writing Compilers Is Easy and Fun!

Syndicated on August 25, 2017

I used to use an extension called XHP. It enables HTML-in-PHP syntax for generating front-end markup. I reached for it recently, and was surprised to find that it was no longer officially supported for modern PHP versions.

So, I decided to implement a user-land version of it, using a basic state-machine compiler. It seemed like it would be a fun project to do with you!

The code for this tutorial can be found on GitHub.


Creating Compilers

Many developers avoid writing their own compilers or interpreters, thinking that the topic is too complex or difficult to explore properly. I used to feel like that too. Compilers can be hard to make well, and the theory behind them runs deep. But that doesn’t mean you can’t make a compiler.

Making a compiler is like making a sandwich. Anyone can get the ingredients and put them together. You can make a sandwich. You can also go to chef school and learn how to make the best damn sandwich the world has ever seen. You can study the art of sandwich making for years, and people can talk about your sandwiches in other lands. You’re not going to let the breadth and complexity of sandwich-making prevent you from making your first sandwich, are you?

Compilers (and interpreters) begin with humble string manipulation and temporary variables. When they’re sufficiently popular (or sufficiently slow), the experts can step in to replace the string manipulation and temporary variables with unicorn tears and cynicism.

At a fundamental level, compilers take a string of code and run it through a couple of steps:

  1. The code is split into tokens – meaningful characters and sub-strings – which the compiler will use to derive meaning. The statement if (isEmergency) alert("there is an emergency") could be considered to contain tokens like if, isEmergency, alert, and "there is an emergency"; and these all mean something to the compiler.

    The first step is to split the entire source code up into these meaningful bits, so that the compiler can start to organize them in a logical hierarchy, so it knows what to do with the code.

  2. The tokens are arranged into the logical hierarchy (sometimes called an Abstract Syntax Tree) which represents what needs to be done in the program. The previous statement could be understood as “Work out if the condition (isEmergency) evaluates to true. If it does, run the function (alert) with the parameter ("there is an emergency")”.
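To make those two steps concrete, here is a rough sketch in PHP. It is only an illustration: `tokenize()` is a hypothetical helper that understands nothing beyond identifiers, string literals, and parentheses, and the hierarchy is written by hand with node shapes I made up for this example. It is not the tokenizer we build later in this tutorial.

```php
<?php

// Hypothetical helper (an assumption for this sketch): split a statement into
// tokens using a single regular expression. It recognizes double-quoted
// strings, identifiers, and parentheses, and skips everything else.
function tokenize(string $code): array
{
    preg_match_all('/"[^"]*"|[A-Za-z_]\w*|[()]/', $code, $matches);

    return $matches[0];
}

// Step 1: split the code into tokens.
$tokens = tokenize('if (isEmergency) alert("there is an emergency")');

echo implode(" | ", $tokens), PHP_EOL;
// if | ( | isEmergency | ) | alert | ( | "there is an emergency" | )

// Step 2, done by hand here: the kind of hierarchy a parser might build from
// those tokens. The keys ("type", "condition", "body"...) are illustrative.
$tree = [
    "type" => "if",
    "condition" => ["type" => "variable", "name" => "isEmergency"],
    "body" => [
        "type" => "call",
        "function" => "alert",
        "arguments" => [
            ["type" => "string", "value" => "there is an emergency"],
        ],
    ],
];
```

Notice that the flat list of tokens says nothing about which token belongs to which; the hierarchy is what records that `alert` only runs when the condition holds.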

Using this hierarchy, the code can be immediately executed (in the case of an interpreter or virtual machine) or translated into other languages (in the case of languages like CoffeeScript and TypeScript, which are both compile-to-JavaScript languages).
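Both options come down to walking the same hierarchy. As a minimal sketch, assuming a hand-written array for the `isEmergency` statement and two hypothetical functions, `evaluate()` and `translate()`, the difference could look something like this:

```php
<?php

// Hand-written hierarchy for: if (isEmergency) alert("there is an emergency")
// The node shapes are assumptions made for this sketch.
$tree = [
    "type" => "if",
    "condition" => ["type" => "variable", "name" => "isEmergency"],
    "body" => [
        "type" => "call",
        "function" => "alert",
        "arguments" => [["type" => "string", "value" => "there is an emergency"]],
    ],
];

// Interpreter-style: walk the hierarchy and act on it immediately.
// $scope holds variable values; $functions maps names to PHP callables.
function evaluate(array $node, array $scope, array $functions)
{
    switch ($node["type"]) {
        case "if":
            if (evaluate($node["condition"], $scope, $functions)) {
                return evaluate($node["body"], $scope, $functions);
            }
            return null;
        case "variable":
            return $scope[$node["name"]];
        case "string":
            return $node["value"];
        case "call":
            $arguments = [];
            foreach ($node["arguments"] as $argument) {
                $arguments[] = evaluate($argument, $scope, $functions);
            }
            return $functions[$node["function"]](...$arguments);
    }

    throw new InvalidArgumentException("Unknown node type: " . $node["type"]);
}

// Compiler-style: walk the same hierarchy, but emit code in another language
// (JavaScript-like text here) instead of running anything.
function translate(array $node): string
{
    switch ($node["type"]) {
        case "if":
            return "if (" . translate($node["condition"]) . ") { " . translate($node["body"]) . "; }";
        case "variable":
            return $node["name"];
        case "string":
            return json_encode($node["value"]);
        case "call":
            return $node["function"] . "(" . implode(", ", array_map("translate", $node["arguments"])) . ")";
    }

    throw new InvalidArgumentException("Unknown node type: " . $node["type"]);
}

$functions = [
    "alert" => function (string $message) {
        return "ALERT: {$message}";
    },
];

echo evaluate($tree, ["isEmergency" => true], $functions), PHP_EOL;
// ALERT: there is an emergency

echo translate($tree), PHP_EOL;
// if (isEmergency) { alert("there is an emergency"); }
```

The two functions share the same shape: a `switch` on the node type and a recursive call for each child. Only what they return differs, which is why one hierarchy can feed either an interpreter or a translator.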

In our case, we want to maintain most of the PHP syntax, but we also want to add our own little bit of syntax on top. We could create a whole new interpreter…or we could preprocess the new syntax, compiling it to syntactically valid PHP code.

I’ve written about preprocessing PHP before, and it’s my favorite approach to adding new syntax. In this case, we need to write a more complex script; so we’re going to deviate from how we’ve previously added new syntax.

Generating Tokens

Let’s create a function to split code into tokens. It begins like this:

function tokens($code) {
    $tokens = [];

    $length = strlen($code);
    $cursor = 0;

    while ($cursor < $length) {
        if ($code[$cursor] === "{") {
            print "ATTRIBUTE STARTED ({$cursor})" . PHP_EOL;
        }

        if ($code[$cursor] === "}") {
            print "ATTRIBUTE ENDED ({$cursor})" . PHP_EOL;
        }

        if ($code[$cursor] === "<") {
            print "ELEMENT STARTED ({$cursor})" . PHP_EOL;
        }

        if ($code[$cursor] === ">") {
            print "ELEMENT ENDED ({$cursor})" . PHP_EOL;
        }

        $cursor++;
    }
}

$code = '
    <?php

    $classNames = "foo bar";
    $message = "hello world";

    $thing = (
        <div
            className={() => { return "outer-div"; }}
            nested={<span className={"nested-span"}>with text</span>}
        >
            a bit of text before
            <span>
                {$message} with a bit of extra tex

Truncated by Planet PHP, read more at the original (another 12830 bytes)