Apache Allura™

Forge software for hosting software projects

Brought to you by: alexluberg, brondsem, ccruz, deshani, and 11 others

#4345 Forgeblog import formatting issues

Milestone: v1.0.0

Status: closed

Owner: nobody

Labels: None

Component: General

Reviewer: Cory Johns

Updated: 2015-08-20

Created: 2012-06-11

Creator: Dave Brondsema

Private: No

(I've used { instead of [ around the plain command, so that it renders properly in this ticket)

The {plain} tags inserted by MDHTMLParser end up inside of tags (e.g. links). This causes the rendered text to have lots of line breaks.

Example:

http://screencast.com/t/RDaualrZG

{plain}It’s almost time for the annual trek to the {/plain}[ {plain}California Wine Country{/plain}](http://bit.ly/GOs13s) {plain}, known as the {/plain}[ {plain}Open Source Think Tank{/plain}](http://bit.ly/wh3qVg) {plain}. Open Source Think Tank, without a doubt, is one of the highlight events of every year for me, and one I really look forward to attending.{/plain}

Another example, using a tweet text:

{plain}OpenatAdobe: Check out the typekit projects on github.{/plain} {plain}http://t.co/jywtK23g #opensource #adobe{/plain}
 [link](http://twitter.com/OpenatAdobe/statuses/204578104633597952)

We should either make the {plain} tags not cause line breaks when rendered, or adjust MDHTMLParser so they are not inserted inside of other tags that cause problems.

These are a few examples, changes should be tested with more real rss feeds.

Related

Tickets: ~~#4254~~
Tickets: ~~#4345~~

Discussion

Cory Johns - 2012-06-11

I think perhaps if we convert the following list of characters to HTML character entities () we could just have MDHTMLParser escape the Markdown instead of using the [plain] tag: *, _, >, (, [, =, -, +, |, !, #, and any number at the start of a line. Am I missing any?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Yaroslav Luzin - 2012-06-13

created #83: [#4345] Forgeblog import formatting issues (1cp)

I think there should be proper escaping, we'll see how to fix this in #83.

Related

Tickets: ~~#4345~~

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2012-06-14

Originally by: tramadolmen

should i support old style [plain] tags for the web?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2012-06-14

What do you mean by old style plain tags?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2012-06-15

Originally by: tramadolmen

I mean in Allura/allura/lib/markdown_extensions.py for

PLAINTEXT_BLOCK_RE = re.compile( \ r'(?P<bplain>[plain])(?P.*?)(?P<eplain>[\/plain])', re.MULTILINE|re.DOTALL )
leave old regexp for [plain] tag or only parce {plain}?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2012-06-15

You can keep that as it is. I was using {plain} in this ticket only, because [plain] tags were rendering weirdly. I actually meant to keep square brackets everywhere.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Yaroslav Luzin - 2012-06-19

closed #83, changes are in 42cc_4345

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Yaroslav Luzin - 2012-06-19

status: open --> code-review
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Cory Johns - 2012-06-21

I added some tests to exemplify an issue I came across, in commit [b92144].

The last test case shows the issue I came across; if you send in HTML with lines wrapped in <p> or <div> tags (and possibly others), it triggers the raw HTML mode, which apparently has a bug that causes the htmlStash to not restore all of the placeholders. It ends up with fragments like \x02wzxhzdk:5\x03 in the output. I think adding an extra instance of RawHtmlPostprocessor might work, but is a slightly hacky solution, so if you can come up with a better one, it would be preferred.

(I also refactored MDHTMLParser.handle_data() in [8d6c61] to be a bit clearer; as far as I can tell it shouldn't change the semantics and the other tests pass. But if I missed anything, feel free to correct it or let me know.)

status: code-review --> open

qa: Cory Johns
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Cory Johns - 2012-06-22

I just realized that I forgot the html2text step in the last two tests, which may negate this bug. Looking into it further.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Cory Johns - 2012-06-22

Ok, that one failing test was my mistake. I fixed and expanded the tests, which uncovered that the text/html processing was disabled to temporarily work around this issue.

Then I tested with these feeds: http://blogs.adobe.com/labs/feed, https://twitter.com/statuses/user_timeline/openatadobe.atom and verified that the formatting was working correctly.

Looks good now, but please review the tests, as they should have been added originally.

status: open --> closed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Cory Johns - 2012-06-22

milestone: forge-backlog --> forge-jun-29
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.