(I've used {
instead of [
around the plain
command, so that it renders properly in this ticket)
The {plain} tags inserted by MDHTMLParser end up inside of tags (e.g. links). This causes the rendered text to have lots of line breaks.
Example:
http://screencast.com/t/RDaualrZG
{plain}It’s almost time for the annual trek to the {/plain}[ {plain}California Wine Country{/plain}](http://bit.ly/GOs13s) {plain}, known as the {/plain}[ {plain}Open Source Think Tank{/plain}](http://bit.ly/wh3qVg) {plain}. Open Source Think Tank, without a doubt, is one of the highlight events of every year for me, and one I really look forward to attending.{/plain}
Another example, using a tweet text:
{plain}OpenatAdobe: Check out the typekit projects on github.{/plain} {plain}http://t.co/jywtK23g #opensource #adobe{/plain} [link](http://twitter.com/OpenatAdobe/statuses/204578104633597952)
We should either make the {plain}
tags not cause line breaks when rendered, or adjust MDHTMLParser so they are not inserted inside of other tags that cause problems.
These are a few examples, changes should be tested with more real rss feeds.
I think perhaps if we convert the following list of characters to HTML character entities (�) we could just have MDHTMLParser escape the Markdown instead of using the [plain] tag: *, _, >, (, [, =, -, +, |, !, #, and any number at the start of a line. Am I missing any?
created #83: [#4345] Forgeblog import formatting issues (1cp)
I think there should be proper escaping, we'll see how to fix this in #83.
Related
Tickets:
#4345Originally by: tramadolmen
should i support old style [plain] tags for the web?
What do you mean by old style plain tags?
Originally by: tramadolmen
I mean in Allura/allura/lib/markdown_extensions.py for
PLAINTEXT_BLOCK_RE = re.compile( \ r'(?P<bplain>[plain])(?P
.*?)(?P<eplain>[\/plain])',
re.MULTILINE|re.DOTALL
)
leave old regexp for [plain] tag or only parce {plain}?
You can keep that as it is. I was using
{plain}
in this ticket only, because[plain]
tags were rendering weirdly. I actually meant to keep square brackets everywhere.closed #83, changes are in 42cc_4345
I added some tests to exemplify an issue I came across, in commit [b92144].
The last test case shows the issue I came across; if you send in HTML with lines wrapped in <p> or <div> tags (and possibly others), it triggers the raw HTML mode, which apparently has a bug that causes the htmlStash to not restore all of the placeholders. It ends up with fragments like
\x02wzxhzdk:5\x03
in the output. I think adding an extra instance ofRawHtmlPostprocessor
might work, but is a slightly hacky solution, so if you can come up with a better one, it would be preferred.(I also refactored
MDHTMLParser.handle_data()
in [8d6c61] to be a bit clearer; as far as I can tell it shouldn't change the semantics and the other tests pass. But if I missed anything, feel free to correct it or let me know.)I just realized that I forgot the html2text step in the last two tests, which may negate this bug. Looking into it further.
Ok, that one failing test was my mistake. I fixed and expanded the tests, which uncovered that the text/html processing was disabled to temporarily work around this issue.
Then I tested with these feeds: http://blogs.adobe.com/labs/feed, https://twitter.com/statuses/user_timeline/openatadobe.atom and verified that the formatting was working correctly.
Looks good now, but please review the tests, as they should have been added originally.