Research into slow-rendering discussion threads showed that some individual posts take a long time to render. The posts in question are sizable (23k) chunks of plain HTML. See if re2 can be used to speed up the rendering.
TL;DR: re2 turns out to be 20x slower than re for our use cases.
Added md_perf.py script for timing/profiling discussion thread rendering.
re2 can't be used as a drop-in replacement in Markdown b/c back-referencing regexes are prevalent in Markdown, and not supported by re2.
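For illustration, here is a minimal sketch of the kind of back-referencing regex meant here, written against the standard `re` module. The pattern and the `find_emphasis` helper are hypothetical, not copied from Markdown's source; the point is only that `\1` requires the closing delimiter to equal whatever the first group matched, which is the construct re2 does not support.

```python
import re

# Hypothetical pattern for illustration only (not Markdown's actual regex):
# the closing delimiter must be the same character as the opening one,
# which is expressed with the backreference \1.
EMPHASIS = re.compile(r'(\*|_)(.+?)\1')


def find_emphasis(text):
    """Return the text found between matching emphasis delimiters."""
    return [m.group(2) for m in EMPHASIS.finditer(text)]
```

A mismatched pair such as `*d_` is not matched, because `\1` would have to be `*` again.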
Profiling showed that for the slow-to-render posts, all the time was being spent regex matching in the HtmlPattern class in markdown.inlinepatterns. I tried to use re2 just in this class, but the resulting performance was 20x slower than with re.
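A comparison like this can be timed with a small harness along the following lines. This is a sketch under assumptions, not the contents of md_perf.py: `time_pattern`, the sample pattern, and the synthetic HTML are all hypothetical, and only the standard `re` module is exercised. Swapping in another engine's compiled pattern would be the way to compare the two.

```python
import re
import time


def time_pattern(pattern, text, repeat=5):
    """Return the best wall-clock time (seconds) to scan text with pattern."""
    best = None
    for _ in range(repeat):
        start = time.perf_counter()
        # Exhaust the iterator so every match is actually computed.
        for _match in pattern.finditer(text):
            pass
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


# Hypothetical stand-in for an inline-HTML pattern, plus a synthetic post
# of roughly the size mentioned above.
SAMPLE_PATTERN = re.compile(r'<[^>]+>')
SAMPLE_HTML = '<p>some <b>plain</b> html</p>\n' * 800
```

Calling `time_pattern(SAMPLE_PATTERN, SAMPLE_HTML)` gives a single number per engine, which is enough to spot an order-of-magnitude difference.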
Conclusion: Speeding up our MD-rendering won't be as easy as just dropping in re2. Rendering with ForgeExtensions is about 50% slower than with a vanilla Markdown() instance, so perf gains there may be possible. Best option may be to simply cache the MD-converted values instead of rendering them on-the-fly.
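The caching idea could be sketched roughly as follows. Everything here is an assumption for illustration: `RenderCache` is a hypothetical name, the cache is an in-memory dict keyed by a hash of the source text, and the render function is passed in rather than tied to any particular Markdown instance. A real version would more likely store the converted value alongside the post.

```python
import hashlib


class RenderCache:
    """Cache rendered output, keyed by a hash of the Markdown source."""

    def __init__(self, render):
        self._render = render  # any callable: source text -> rendered HTML
        self._store = {}       # hypothetical in-memory store
        self.misses = 0        # number of times render was actually called

    def get(self, source):
        key = hashlib.sha256(source.encode('utf-8')).hexdigest()
        if key not in self._store:
            # Render only the first time this exact source text is seen.
            self.misses += 1
            self._store[key] = self._render(source)
        return self._store[key]
```

Because the key is derived from the source text, an edited post hashes to a new key and is re-rendered once, while unchanged posts are served from the cache.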