#4345 Forgeblog import formatting issues

v1.0.0
closed
nobody
None
General
Cory Johns
2015-08-20
2012-06-11
No

(I've used { instead of [ around the plain command, so that it renders properly in this ticket)

The {plain} tags inserted by MDHTMLParser end up inside of tags (e.g. links). This causes the rendered text to have lots of line breaks.

Example:

http://screencast.com/t/RDaualrZG

{plain}It’s almost time for the annual trek to the {/plain}[ {plain}California Wine Country{/plain}](http://bit.ly/GOs13s) {plain}, known as the {/plain}[ {plain}Open Source Think Tank{/plain}](http://bit.ly/wh3qVg) {plain}. Open Source Think Tank, without a doubt, is one of the highlight events of every year for me, and one I really look forward to attending.{/plain}

Another example, using a tweet text:

{plain}OpenatAdobe: Check out the typekit projects on github.{/plain} {plain}http://t.co/jywtK23g #opensource #adobe{/plain}
 [link](http://twitter.com/OpenatAdobe/statuses/204578104633597952)

We should either make the {plain} tags not cause line breaks when rendered, or adjust MDHTMLParser so they are not inserted inside of other tags that cause problems.

These are a few examples, changes should be tested with more real rss feeds.

Related

Tickets: #4254
Tickets: #4345

Discussion

  • Cory Johns - 2012-06-11

    I think perhaps if we convert the following list of characters to HTML character entities (�) we could just have MDHTMLParser escape the Markdown instead of using the [plain] tag: *, _, >, (, [, =, -, +, |, !, #, and any number at the start of a line. Am I missing any?

     
  • Yaroslav Luzin - 2012-06-13

    created #83: [#4345] Forgeblog import formatting issues (1cp)

    I think there should be proper escaping, we'll see how to fix this in #83.

     

    Related

    Tickets: #4345

  • Anonymous - 2012-06-14

    Originally by: tramadolmen

    should i support old style [plain] tags for the web?

     
  • Dave Brondsema

    Dave Brondsema - 2012-06-14

    What do you mean by old style plain tags?

     
  • Anonymous - 2012-06-15

    Originally by: tramadolmen

    I mean in Allura/allura/lib/markdown_extensions.py for

    PLAINTEXT_BLOCK_RE = re.compile( \ r'(?P<bplain>[plain])(?P.*?)(?P<eplain>[\/plain])',
    re.MULTILINE|re.DOTALL
    )

    leave old regexp for [plain] tag or only parce {plain}?

     
  • Dave Brondsema

    Dave Brondsema - 2012-06-15

    You can keep that as it is. I was using {plain} in this ticket only, because [plain] tags were rendering weirdly. I actually meant to keep square brackets everywhere.

     
  • Yaroslav Luzin - 2012-06-19

    closed #83, changes are in 42cc_4345

     
  • Yaroslav Luzin - 2012-06-19
    • status: open --> code-review
     
  • Cory Johns - 2012-06-21

    I added some tests to exemplify an issue I came across, in commit [b92144].


    The last test case shows the issue I came across; if you send in HTML with lines wrapped in <p> or <div> tags (and possibly others), it triggers the raw HTML mode, which apparently has a bug that causes the htmlStash to not restore all of the placeholders. It ends up with fragments like \x02wzxhzdk:5\x03 in the output. I think adding an extra instance of RawHtmlPostprocessor might work, but is a slightly hacky solution, so if you can come up with a better one, it would be preferred.


    (I also refactored MDHTMLParser.handle_data() in [8d6c61] to be a bit clearer; as far as I can tell it shouldn't change the semantics and the other tests pass. But if I missed anything, feel free to correct it or let me know.)

    • status: code-review --> open
    • qa: Cory Johns
     
  • Cory Johns - 2012-06-22

    I just realized that I forgot the html2text step in the last two tests, which may negate this bug. Looking into it further.

     
  • Cory Johns - 2012-06-22

    Ok, that one failing test was my mistake. I fixed and expanded the tests, which uncovered that the text/html processing was disabled to temporarily work around this issue.


    Then I tested with these feeds: http://blogs.adobe.com/labs/feed, https://twitter.com/statuses/user_timeline/openatadobe.atom and verified that the formatting was working correctly.

    Looks good now, but please review the tests, as they should have been added originally.

    • status: open --> closed
     
  • Cory Johns - 2012-06-22
    • milestone: forge-backlog --> forge-jun-29
     

Log in to post a comment.