#4771 TOC macro doesn't add headers of included pages [ss246]

unreleased
open
nobody
General
nobody
2015-02-22
2012-08-23
Chris Tsai
No

[forge:site-support:#246]

On the page above there is a table of contents just underneath the Versions Heading, and then each of the headings after that are included from seperate pages per version. I would expect that the table of contents to display the headings from the included pages as well as the headings from the current page. This does not happen.

It may just be me not quite understanding it, however it would be nice to have

I've put together a page demonstrating this here: https://sourceforge.net/p/strawhat/wiki/top/

Discussion

  • Chris Tsai - 2012-09-26
    • labels: support --> support, p3
     
  • Anonymous - 2014-07-01

    Originally by: ligc

    HI,

    We are running into the same issue with the TOC supporting included pages, see https://sourceforge.net/p/forge/site-support/8032/.

    This is very important for us, we are using included pages everywhere, without the TOC for these included pages, our documentation seems quite messed up.

    Any idea on when this bug will fixed? Thank you.

     
  • Dave Brondsema

    Dave Brondsema - 2014-08-15
    • labels: support, p3 --> support, p3, 42cc
    • status: open --> in-progress
    • assigned_to: Igor Bondarenko
     
  • Dave Brondsema

    Dave Brondsema - 2014-08-22
    • Milestone: forge-backlog --> forge-sep-5
     
  • Dave Brondsema

    Dave Brondsema - 2014-08-22
    • Size: --> 2
     
  • Igor Bondarenko - 2014-08-26
    • status: in-progress --> code-review
     
  • Igor Bondarenko - 2014-08-26

    Phew! This was a tough one. Make sure you test it against different wiki pages with different markup, macros, etc, since it changes markdown pipeline a bit.

    Closed #635. je/42cc_4771

     
  • Dave Brondsema

    Dave Brondsema - 2014-09-08
    • Milestone: forge-sep-5 --> forge-sep-19
     
  • Dave Brondsema

    Dave Brondsema - 2014-09-22
    • Milestone: forge-sep-19 --> forge-oct-3
     
  • Dave Brondsema

    Dave Brondsema - 2014-10-06
    • Milestone: forge-oct-3 --> forge-oct-17
     
  • Dave Brondsema

    Dave Brondsema - 2014-10-18
    • QA: Dave Brondsema
     
  • Dave Brondsema

    Dave Brondsema - 2014-10-18

    Can you comment (here or in the code) about what the change in ForgeMacroPattern really does? The inline comment seems to be about the strip() and join() but it's not obvious to me what calling etree.fromstring() does in this context.

    The custom slugify function took me a bit to get (maybe just because its a friday afternoon) but looks like its the same as the original but with better unicode handling.

     
  • Igor Bondarenko - 2014-10-20

    The change in ForgeMacroPattern is essentially about making included page's content available to TOC extension. We need to include processed page contents (html) into parent page, so that TOC will find it. Old code just put macro result onto html stash, which is processed after all the extensions (and there's no way around it), so it was hidden from TOC extension.

    Returning html as a string does not work in this context. Actually, it includes page contents as needed, but TOC extension still can't find it (don't sure why). If we return html as a etree, then TOC can find it and do the job.

    The except might happen if macro returned error, which is just text and can't be parsed to html, so it's ok to return that text, since there will not be any headers that we need to include in TOC anyway.

    Regarding slugify you're completely right, it is only for better unicode handling. I was getting some "TypeError: must be unicode, not str" errors from within markdown, bacause etree that I return from ForgeMacroPattern is always returns a bytestring, when coerced into string, and slugify needed unicode.

    Hope, this clarifies it.

     
  • Dave Brondsema

    Dave Brondsema - 2014-10-20
    • status: code-review --> in-progress
     
  • Dave Brondsema

    Dave Brondsema - 2014-10-20

    If the include macro is used to include a repo file, the line numbers get squashed together. HTML changed from

    <div class="linenodiv"><pre>1
    2</pre></div>
    

    to:

    <div class="linenodiv">
    <pre>12</pre>
    </div>
    
     
  • Dave Brondsema

    Dave Brondsema - 2014-10-20
    • Milestone: forge-oct-17 --> forge-oct-31
     
  • Dave Brondsema

    Dave Brondsema - 2014-10-30
    • Labels: support, p3, 42cc, sf-2 --> p3, support, 42cc, sf-current, sf-2
     
  • Igor Bondarenko - 2014-10-31
    • status: in-progress --> review
     
  • Igor Bondarenko - 2014-10-31

    Closed #672. ib/4771

    I have added comments and fixed the case when page includes file from a repo directly, but there's still a problem if you include a wiki page, which, in turn, includes a file from a repo. I managed to fix issues with the line numbers in that case, but not with file contents (code). It gets extra <br>'s which break layout. I've tried a lot of things to fix that, but nothing works.

    Entire approach is very fragile... Any ideas on how to tackle it in more robust way and fix the issue with multi-level includes?

     
    • Dave Brondsema

      Dave Brondsema - 2014-11-07

      I'm actually not seeing the extra <br>s with a double include. Current example: http://sf-dbrondsema-1015.sb.sf.net/p/testit/wiki/Home/

       
      • Igor Bondarenko - 2014-11-07

        I forgot to mention it. Add a header to the included page. If there's no headers entire TOC-related logic just skipped to avoid those nasty effects

         
  • Dave Brondsema

    Dave Brondsema - 2014-11-11

    Latest commits cause text to be omitted sometimes. Specifically it seems like a child page with a header will only include the text from the first line after the header.

    Test case improvement:

    diff --git Allura/allura/tests/test_globals.py Allura/allura/tests/test_globals.py
    index a68096d..5631026 100644
    --- Allura/allura/tests/test_globals.py
    +++ Allura/allura/tests/test_globals.py
    @@ -256,34 +256,32 @@ def test_macro_project_admins_one_br():
     @td.with_wiki
     def test_macro_include_no_extra_br():
         p_nbhd = M.Neighborhood.query.get(name='Projects')
         p_test = M.Project.query.get(shortname='test', neighborhood_id=p_nbhd._id)
         wiki = p_test.app_instance('wiki')
         with h.push_context(p_test._id, app_config_id=wiki.config._id):
             p = WM.Page.upsert(title='Include_1')
    -        p.text = 'included page 1'
    +        p.text = '# page 1 title\nincluded page 1 line 1\nincluded page 1 line 2\nincluded page 1 line 3'
             p.commit()
             p = WM.Page.upsert(title='Include_2')
             p.text = 'included page 2'
             p.commit()
             p = WM.Page.upsert(title='Include_3')
             p.text = 'included page 3'
             p.commit()
             ThreadLocalORMSession.flush_all()
             md = '[[include ref=Include_1]]\n[[include ref=Include_2]]\n[[include ref=Include_3]]'
             html = g.markdown_wiki.convert(md)
    
         expected_html = '''
     <div class="markdown_content">
    -<p>
    -<div><div class="markdown_content"><p>included page 1</p></div></div>
    +<div><div class="markdown_content"><h1 id="page-1-title">page 1 title</h1>
    +<p>included page 1 line 1<br />included page 1 line 2<br />included page 1 line 3</p></div></div>
     <div><div class="markdown_content"><p>included page 2</p></div></div>
     <div><div class="markdown_content"><p>included page 3</p></div></div>
    -</p>
    -<p></p>
     </div>
     '''.strip().replace('\n', '')
         assert html.strip().replace('\n', '') == expected_html, html
    
     @with_setup(setUp, tearDown)
     @td.with_wiki
     @td.with_tool('test', 'Wiki', 'wiki2')
    
     
  • Dave Brondsema

    Dave Brondsema - 2014-11-11
    • status: review --> in-progress
     
  • Dave Brondsema

    Dave Brondsema - 2014-11-11

    I agree this approach is fragile, and getting quite messy. I don't really have a complete picture of how all the pieces fit together, so I can't really help with a specific recommendation. What it looks like to me though is the challenge is getting the includes to run before the TOC. And you've worked on moving includes earlier in the process but that is causing problems with how they get rendered. What about moving the TOC to later in the process, and leave the includes how they originally were? Would that be any better?

     
    • Igor Bondarenko - 2014-11-12

      That was my first thought too. Unfortunately, I don't see how it's possible. According to markdown's code treeprocessors run before postprocessors. The TOC is implemented as treeprocessor. But output of "include" macros inserted into final page only on postprocessors stage.

      The only way we can make TOC run later is re-implementing it from scratch as a postprocessor, which would search for headers and modify final html string to include table of contents. This approach would be also error-prone and hard to maintain...

       

      Last edit: Igor Bondarenko 2014-11-12
      • Dave Brondsema

        Dave Brondsema - 2014-11-12

        It makes sense to me that TOC be a treeprocessor, that seems appropriate.

        I think ideally the "include" functionality would happen very early, so that all the content would be present for all the rest of the stages. "include" is a macro, which is identified relatively as one of the inlinePatterns, but as you say the output isn't inserted until postprocessor stage when the htmlStash items get inserted. That makes sense for all the other macros, as well as including a file (since it isn't markdown). But for including another markdown page, I think we should grab the child markdown content and insert it into the current text right there (not using htmlStash). Then the rest of the markdown pipeline will apply to it, and not even know it came from a child source. How does that sound? Perhaps the "include markdown" handling would have to be moved forward as a pre-processor, not sure.

        That all said, I am willing to put this back on the backlog if we don't have a good solution. I don't think we should spend tons of time or use a fragile solution.

         
        • Igor Bondarenko - 2014-11-13

          Sounds good. Do you want me to try it now? If this will cause any subtle issues we can postpone it then.

           
          • Dave Brondsema

            Dave Brondsema - 2014-11-13

            Sounds like a good plan to me.

             
        • Igor Bondarenko - 2014-11-20

          This approach works better. I've removed a lot of html-joggling code, so entire approach is more robust and understandable.

          However, there're also some subtle issues, that I can't fix quickly:

          • page with some html in it (especially <pre> tags) gets extra newlines when included example of it here
          • specifying custom attributes like [[include ref=page id=foo]] does not work, since there's no <div>-wrapper around included page. I've tried to add that, but it breaks page rendering (see commented code in my latest commit).

          This code is in ib/4771a (there are test failures)

           

          Last edit: Igor Bondarenko 2014-11-20
  • Igor Bondarenko - 2014-11-20
    • status: in-progress --> review
     
  • Dave Brondsema

    Dave Brondsema - 2014-11-20
    • status: review --> open
     
  • Dave Brondsema

    Dave Brondsema - 2014-11-20

    Yeah looks like further work is needed to adjust the markdown pipeline properly so that included pages don't get extraneous processing. I'm going to set this ticket back to 'open' and somebody can revisit in the future if they want.

     
  • Dave Brondsema

    Dave Brondsema - 2014-12-01
    • labels: p3, support, 42cc, sf-current, sf-2 --> p3, support
    • assigned_to: Igor Bondarenko --> nobody
    • Reviewer: Dave Brondsema --> nobody
     

Log in to post a comment.