Apache Allura™ / Tickets / #4771 TOC macro doesn't add headers of included pages [ss246]

Chris Tsai - 2012-09-26

labels: support --> support, p3
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2014-07-01

Originally by: ligc

HI,

We are running into the same issue with the TOC supporting included pages, see https://sourceforge.net/p/forge/site-support/8032/.

This is very important for us, we are using included pages everywhere, without the TOC for these included pages, our documentation seems quite messed up.

Any idea on when this bug will fixed? Thank you.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-08-15

labels: support, p3 --> support, p3, 42cc

status: open --> in-progress

assigned_to: Igor Bondarenko
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-08-22

Milestone: forge-backlog --> forge-sep-5
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-08-22

Size: --> 2
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2014-08-26

status: in-progress --> code-review
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2014-08-26

Phew! This was a tough one. Make sure you test it against different wiki pages with different markup, macros, etc, since it changes markdown pipeline a bit.

Closed #635. je/42cc_4771

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-09-08

Milestone: forge-sep-5 --> forge-sep-19
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-09-22

Milestone: forge-sep-19 --> forge-oct-3
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-10-06

Milestone: forge-oct-3 --> forge-oct-17
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-10-18

QA: Dave Brondsema
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-10-18

Can you comment (here or in the code) about what the change in ForgeMacroPattern really does? The inline comment seems to be about the strip() and join() but it's not obvious to me what calling etree.fromstring() does in this context.

The custom slugify function took me a bit to get (maybe just because its a friday afternoon) but looks like its the same as the original but with better unicode handling.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2014-10-20

The change in ForgeMacroPattern is essentially about making included page's content available to TOC extension. We need to include processed page contents (html) into parent page, so that TOC will find it. Old code just put macro result onto html stash, which is processed after all the extensions (and there's no way around it), so it was hidden from TOC extension.

Returning html as a string does not work in this context. Actually, it includes page contents as needed, but TOC extension still can't find it (don't sure why). If we return html as a etree, then TOC can find it and do the job.

The except might happen if macro returned error, which is just text and can't be parsed to html, so it's ok to return that text, since there will not be any headers that we need to include in TOC anyway.

Regarding slugify you're completely right, it is only for better unicode handling. I was getting some "TypeError: must be unicode, not str" errors from within markdown, bacause etree that I return from ForgeMacroPattern is always returns a bytestring, when coerced into string, and slugify needed unicode.

Hope, this clarifies it.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-10-20

status: code-review --> in-progress
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-10-20

If the include macro is used to include a repo file, the line numbers get squashed together. HTML changed from

<div class="linenodiv"><pre>1 2</pre></div>

to:

<div class="linenodiv"> <pre>12</pre> </div>
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-10-20

Milestone: forge-oct-17 --> forge-oct-31
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-10-30

Labels: support, p3, 42cc, sf-2 --> p3, support, 42cc, sf-current, sf-2
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2014-10-31

status: in-progress --> review
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2014-10-31

Closed #672. ib/4771

I have added comments and fixed the case when page includes file from a repo directly, but there's still a problem if you include a wiki page, which, in turn, includes a file from a repo. I managed to fix issues with the line numbers in that case, but not with file contents (code). It gets extra <br>'s which break layout. I've tried a lot of things to fix that, but nothing works.

Entire approach is very fragile... Any ideas on how to tackle it in more robust way and fix the issue with multi-level includes?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Dave Brondsema - 2014-11-07
  
  I'm actually not seeing the extra <br>s with a double include. Current example: http://sf-dbrondsema-1015.sb.sf.net/p/testit/wiki/Home/
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Igor Bondarenko - 2014-11-07
    
    I forgot to mention it. Add a header to the included page. If there's no headers entire TOC-related logic just skipped to avoid those nasty effects
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Latest commits cause text to be omitted sometimes. Specifically it seems like a child page with a header will only include the text from the first line after the header.

Test case improvement:

diff --git Allura/allura/tests/test_globals.py Allura/allura/tests/test_globals.py
index a68096d..5631026 100644
--- Allura/allura/tests/test_globals.py
+++ Allura/allura/tests/test_globals.py
@@ -256,34 +256,32 @@ def test_macro_project_admins_one_br():
 @td.with_wiki
 def test_macro_include_no_extra_br():
     p_nbhd = M.Neighborhood.query.get(name='Projects')
     p_test = M.Project.query.get(shortname='test', neighborhood_id=p_nbhd._id)
     wiki = p_test.app_instance('wiki')
     with h.push_context(p_test._id, app_config_id=wiki.config._id):
         p = WM.Page.upsert(title='Include_1')
-        p.text = 'included page 1'
+        p.text = '# page 1 title\nincluded page 1 line 1\nincluded page 1 line 2\nincluded page 1 line 3'
         p.commit()
         p = WM.Page.upsert(title='Include_2')
         p.text = 'included page 2'
         p.commit()
         p = WM.Page.upsert(title='Include_3')
         p.text = 'included page 3'
         p.commit()
         ThreadLocalORMSession.flush_all()
         md = '[[include ref=Include_1]]\n[[include ref=Include_2]]\n[[include ref=Include_3]]'
         html = g.markdown_wiki.convert(md)

     expected_html = '''
 <div class="markdown_content">
-<p>
-<div><div class="markdown_content"><p>included page 1</p></div></div>
+<div><div class="markdown_content"><h1 id="page-1-title">page 1 title</h1>
+<p>included page 1 line 1<br />included page 1 line 2<br />included page 1 line 3</p></div></div>
 <div><div class="markdown_content"><p>included page 2</p></div></div>
 <div><div class="markdown_content"><p>included page 3</p></div></div>
-</p>
-<p></p>
 </div>
 '''.strip().replace('\n', '')
     assert html.strip().replace('\n', '') == expected_html, html

 @with_setup(setUp, tearDown)
 @td.with_wiki
 @td.with_tool('test', 'Wiki', 'wiki2')

Dave Brondsema - 2014-11-11

status: review --> in-progress
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-11-11

I agree this approach is fragile, and getting quite messy. I don't really have a complete picture of how all the pieces fit together, so I can't really help with a specific recommendation. What it looks like to me though is the challenge is getting the includes to run before the TOC. And you've worked on moving includes earlier in the process but that is causing problems with how they get rendered. What about moving the TOC to later in the process, and leave the includes how they originally were? Would that be any better?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Igor Bondarenko - 2014-11-12
  
  That was my first thought too. Unfortunately, I don't see how it's possible. According to markdown's code treeprocessors run before postprocessors. The TOC is implemented as treeprocessor. But output of "include" macros inserted into final page only on postprocessors stage.
  
  The only way we can make TOC run later is re-implementing it from scratch as a postprocessor, which would search for headers and modify final html string to include table of contents. This approach would be also error-prone and hard to maintain...
  
  Last edit: Igor Bondarenko 2014-11-12
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Dave Brondsema - 2014-11-12
    
    It makes sense to me that TOC be a treeprocessor, that seems appropriate.
    
    I think ideally the "include" functionality would happen very early, so that all the content would be present for all the rest of the stages. "include" is a macro, which is identified relatively as one of the inlinePatterns, but as you say the output isn't inserted until postprocessor stage when the htmlStash items get inserted. That makes sense for all the other macros, as well as including a file (since it isn't markdown). But for including another markdown page, I think we should grab the child markdown content and insert it into the current text right there (not using htmlStash). Then the rest of the markdown pipeline will apply to it, and not even know it came from a child source. How does that sound? Perhaps the "include markdown" handling would have to be moved forward as a pre-processor, not sure.
    
    That all said, I am willing to put this back on the backlog if we don't have a good solution. I don't think we should spend tons of time or use a fragile solution.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Igor Bondarenko - 2014-11-13
      
      Sounds good. Do you want me to try it now? If this will cause any subtle issues we can postpone it then.
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Dave Brondsema - 2014-11-13
        
        Sounds like a good plan to me.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Igor Bondarenko - 2014-11-20
      
      This approach works better. I've removed a lot of html-joggling code, so entire approach is more robust and understandable.
      
      However, there're also some subtle issues, that I can't fix quickly:
      
      page with some html in it (especially <pre> tags) gets extra newlines when included example of it here
      
      specifying custom attributes like [[include ref=page id=foo]] does not work, since there's no <div>-wrapper around included page. I've tried to add that, but it breaks page rendering (see commented code in my latest commit).
      
      This code is in ib/4771a (there are test failures)
      
      Last edit: Igor Bondarenko 2014-11-20
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2014-11-20

status: in-progress --> review
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-11-20

status: review --> open
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-11-20

Yeah looks like further work is needed to adjust the markdown pipeline properly so that included pages don't get extraneous processing. I'm going to set this ticket back to 'open' and somebody can revisit in the future if they want.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-12-01

labels: p3, support, 42cc, sf-current, sf-2 --> p3, support

assigned_to: Igor Bondarenko --> nobody

Reviewer: Dave Brondsema --> nobody
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Apache Allura™

Forge software for hosting software projects

Milestone

Searches

Help

#4771 TOC macro doesn't add headers of included pages [ss246]

Discussion