Subscribe to PHP Freaks RSS

Case Study: Optimizing CommonMark Markdown Parser with Blackfire.io

syndicated from planet-php.net on November 23, 2017

As you may know, I am the author and maintainer of the PHP League's CommonMark Markdown parser. This project has three primary goals:

  1. fully support the entire CommonMark spec
  2. match the behavior of the JS reference implementation
  3. be well-written and super-extensible so that others can add their own functionality.

This last goal is perhaps the most challenging, especially from a performance perspective. Other popular Markdown parsers are built using single classes with massive regex functions. As you can see from this benchmark, it makes them lightning fast:

Library Avg. Parse Time File/Class Count
Parsedown 1.6.0 2 ms 1
PHP Markdown 1.5.0 4 ms 4
PHP Markdown Extra 1.5.0 7 ms 6
CommonMark 0.12.0 46 ms 117

Unfortunately, because of the tightly-coupled design and overall architecture, it's difficult (if not impossible) to extend these parsers with custom logic.

For the League's CommonMark parser, we chose to prioritize extensibility over performance. This led to a decoupled object-oriented design which users can easily customize. This has enabled others to build their own integrations, extensions, and other custom projects.

The library's performance is still decent --- the end user probably can't differentiate between 42ms and 2ms (you should be caching your rendered Markdown anyway). Nevertheless, we still wanted to optimize our parser as much as possible without compromising our primary goals. This blog post explains how we used Blackfire to do just that.

Profiling with Blackfire

Blackfire is a fantastic tool from the folks at SensioLabs. You simply attach it to any web or CLI request and get this awesome, easy-to-digest performance trace of your application's request. In this post, we'll be examining how Blackfire was used to identify and optimize two performance issues found in version 0.6.1 of the league/commonmark library.

Let's start by profiling the time it takes league/commonmark to parse the contents of the CommonMark spec document:

Initial benchmark of league/commonark 0.6.1

Later on we'll compare this benchmark to our changes in order to measure the performance improvements.

Quick side-note: Blackfire adds overhead while profiling things, so the execution times will always be much higher than usual. Focus on the relative percentage changes instead of the absolute "wall clock" times.

Optimization 1

Looking at our initial benchmark, you can easily see that inline parsing with InlineParserEngine::parse() accounts for a whopping 43.75% of the execution time. Clicking this method reveals more information about why this happens:

Detailed view of InlineParseEngine::parse()

Here we see that InlineParserEngine::parse() is calling Cursor::getCharacter() 79,194 times --- once for every single character in the Markdown text. Here's a partial (slightly-modified) excerpt of this method from 0.6.1:

public function parse(ContextInterface $context, Cursor $cursor)
{
    // Iterate through every single character in the current line
    while (($character = $cursor->getCharacter()) !== null) {
        // Check to see whether this character is a special Markdown character
        // If so, let it try to parse this part of the string
        foreach ($matchingParsers as $parser) {
            if ($res = $parser->parse($context, $inlineParserContext)) {
                continue 2;
            }
   

Truncated by Planet PHP, read more at the original (another 2728 bytes)