Apache Allura™ / Tickets / #6734 Research using re2 as a replacement for re module

#6734 Research using re2 as a replacement for re module

Milestone: v1.0.1

Status: closed

Owner: Tim Van Steenburgh

Labels: sf-2 (994)

Component: General

Reviewer: nobody

Updated: 2015-08-20

Created: 2013-10-03

Creator: Tim Van Steenburgh

Private: No

In researching slow-rendering discussion threads, it was discovered that some individual posts are taking a long time to render. The posts in question are sizable (23k) chunks of plain HTML. See if re2 can be used to speed up the rendering.

Discussion

Tim Van Steenburgh - 2013-10-03

status: in-progress --> closed

Tim Van Steenburgh - 2013-10-03

allura:tv/6734

TL;DR: re2 turns out to be 20x slower than re for our use cases.

Added md_perf.py script for timing/profiling discussion thread rendering.

re2 can't be used as a drop-in replacement for re in Markdown because back-referencing regexes are prevalent in Markdown, and re2 does not support backreferences.
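For illustration, here is a minimal sketch of the kind of back-referencing pattern that blocks a drop-in swap. The pattern and the `find_emphasis` helper are hypothetical and not copied from Markdown's source; they only show what `\1` does in the standard `re` module. RE2 rejects backreferences by design, so a pattern like this would not compile there.

```python
import re

# Hypothetical emphasis-style pattern (an assumption for illustration only).
# \1 is a backreference: the closing delimiter must equal whatever group 1
# matched as the opening delimiter ("*" or "**").
BACKREF = re.compile(r"(\*{1,2})(.+?)\1")


def find_emphasis(text):
    """Return (delimiter, body) pairs for *em* and **strong** style spans."""
    return [(m.group(1), m.group(2)) for m in BACKREF.finditer(text)]


if __name__ == "__main__":
    print(find_emphasis("a *b* and **c**"))
```

Rewriting such a pattern without the backreference means enumerating each delimiter as a separate alternative, which is why the swap is not mechanical.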

Profiling showed that for the slow-to-render posts, all the time was being spent regex matching in the HtmlPattern class in markdown.inlinepatterns. I tried to use re2 just in this class, but the resulting performance was 20x slower than with re.
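A rough sketch of how this kind of profiling can be done with the standard library follows. It is not the md_perf.py script from the branch; `TAG_RE`, `count_tags`, and `profile_call` are hypothetical stand-ins, and the sample input is synthetic HTML of roughly the size mentioned above.

```python
import cProfile
import io
import pstats
import re

# Hypothetical stand-in for an inline HTML pattern (assumption); the real
# pattern is defined in markdown.inlinepatterns and is more involved.
TAG_RE = re.compile(r"<[^>]+>")


def count_tags(text):
    """Count tag-like matches in text using the stand-in pattern."""
    return sum(1 for _ in TAG_RE.finditer(text))


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, stats_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
    return result, buf.getvalue()


# Synthetic post: 900 lines of 26 characters each, about 23k of plain HTML.
SAMPLE = "<p>hello <b>world</b></p>\n" * 900

if __name__ == "__main__":
    tags, report = profile_call(count_tags, SAMPLE)
    print(tags)
    print(report)
```

Sorting by cumulative time is what makes a single hot spot, such as one pattern class, stand out in the report.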

Conclusion: Speeding up our MD-rendering won't be as easy as just dropping in re2. Rendering with ForgeExtensions is about 50% slower than with a vanilla Markdown() instance, so perf gains there may be possible. Best option may be to simply cache the MD-converted values instead of rendering them on-the-fly.
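The caching idea could look something like the sketch below. This is an assumption about one possible shape, not how Allura implements it: `RenderCache` is a hypothetical in-memory cache keyed by a hash of the source text, and the `render` callable stands in for the real Markdown conversion.

```python
import hashlib


class RenderCache:
    """Cache rendered output keyed by a hash of the source text.

    In-memory sketch only; a real deployment would likely persist the
    rendered value alongside the post (assumption).
    """

    def __init__(self, render):
        self._render = render  # callable: source text -> rendered HTML
        self._store = {}
        self.misses = 0        # number of times render was actually called

    def get(self, source):
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._render(source)
        return self._store[key]


if __name__ == "__main__":
    # Placeholder renderer standing in for the Markdown conversion.
    cache = RenderCache(lambda text: "<p>" + text + "</p>")
    cache.get("hello")
    cache.get("hello")
    print(cache.misses)
```

Keying on the content hash means an edited post gets a new key and is re-rendered once, while unchanged posts skip the expensive regex matching entirely.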
