(I've used {
instead of [
around the plain
command, so that it renders properly in this ticket)
The {plain} tags inserted by MDHTMLParser end up inside of tags (e.g. links). This causes the rendered text to have lots of line breaks.
Example:
http://screencast.com/t/RDaualrZG
{plain}It’s almost time for the annual trek to the {/plain}[ {plain}California Wine Country{/plain}](http://bit.ly/GOs13s) {plain}, known as the {/plain}[ {plain}Open Source Think Tank{/plain}](http://bit.ly/wh3qVg) {plain}. Open Source Think Tank, without a doubt, is one of the highlight events of every year for me, and one I really look forward to attending.{/plain}
Another example, using a tweet text:
{plain}OpenatAdobe: Check out the typekit projects on github.{/plain} {plain}http://t.co/jywtK23g #opensource #adobe{/plain} [link](http://twitter.com/OpenatAdobe/statuses/204578104633597952)
We should either make the {plain}
tags not cause line breaks when rendered, or adjust MDHTMLParser so they are not inserted inside of other tags that cause problems.
These are a few examples, changes should be tested with more real rss feeds.
I think perhaps if we convert the following list of characters to HTML character entities (�) we could just have MDHTMLParser escape the Markdown instead of using the [plain] tag: *, _, >, (, [, =, -, +, |, !, #, and any number at the start of a line. Am I missing any?
created #83: [#4345] Forgeblog import formatting issues (1cp)
I think there should be proper escaping, we'll see how to fix this in #83.
Related
Tickets:
#4345View and moderate all "tickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally by: tramadolmen
should i support old style [plain] tags for the web?
What do you mean by old style plain tags?
View and moderate all "tickets Discussion" comments posted by this user
Mark all as spam, and block user from posting to "Tickets"
Originally by: tramadolmen
I mean in Allura/allura/lib/markdown_extensions.py for
PLAINTEXT_BLOCK_RE = re.compile( \ r'(?P<bplain>[plain])(?P
.*?)(?P<eplain>[\/plain])',
re.MULTILINE|re.DOTALL
)
leave old regexp for [plain] tag or only parce {plain}?
You can keep that as it is. I was using
{plain}
in this ticket only, because[plain]
tags were rendering weirdly. I actually meant to keep square brackets everywhere.closed #83, changes are in 42cc_4345
I added some tests to exemplify an issue I came across, in commit [b92144].
The last test case shows the issue I came across; if you send in HTML with lines wrapped in <p> or <div> tags (and possibly others), it triggers the raw HTML mode, which apparently has a bug that causes the htmlStash to not restore all of the placeholders. It ends up with fragments like
\x02wzxhzdk:5\x03
in the output. I think adding an extra instance ofRawHtmlPostprocessor
might work, but is a slightly hacky solution, so if you can come up with a better one, it would be preferred.(I also refactored
MDHTMLParser.handle_data()
in [8d6c61] to be a bit clearer; as far as I can tell it shouldn't change the semantics and the other tests pass. But if I missed anything, feel free to correct it or let me know.)I just realized that I forgot the html2text step in the last two tests, which may negate this bug. Looking into it further.
Ok, that one failing test was my mistake. I fixed and expanded the tests, which uncovered that the text/html processing was disabled to temporarily work around this issue.
Then I tested with these feeds: http://blogs.adobe.com/labs/feed, https://twitter.com/statuses/user_timeline/openatadobe.atom and verified that the formatting was working correctly.
Looks good now, but please review the tests, as they should have been added originally.