#6534 Wiki importer for github

v1.1.0
closed
nobody
General
2015-08-20
2013-08-07
No

GitHub wikis are git repositories and can be cloned directly (for example, git clone https://github.com/OpenRefine/OpenRefine.wiki). Check the main repo API first to see whether the repo has its wiki enabled. For reference, see https://sourceforge.net/p/googlecodewikiimporter/git/ as an example of another wiki importer. It lives in a separate repo because it needs the "html2text" package to convert HTML to markdown, and that is a GPL library.
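
A minimal sketch of that check, assuming the repository JSON has already been fetched from the GitHub API and decoded into a dict. The function name wiki_clone_url is hypothetical; has_wiki and html_url are fields of the API's repository payload. Note that has_wiki only says the feature is enabled, so the clone may still fail if no wiki page was ever created.

```python
def wiki_clone_url(repo):
    """Return the wiki git URL for a GitHub repo payload, or None.

    `repo` is the decoded JSON dict for a repository, as returned by
    the GitHub API (assumed to be fetched elsewhere).
    """
    if not repo.get('has_wiki'):
        return None
    # Wiki repos sit next to the main repo with a ".wiki" suffix,
    # e.g. https://github.com/OpenRefine/OpenRefine.wiki
    return repo['html_url'].rstrip('/') + '.wiki'


repo = {'html_url': 'https://github.com/OpenRefine/OpenRefine', 'has_wiki': True}
print(wiki_clone_url(repo))
```

The importer could skip the clone entirely when this returns None.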

GitHub supports many markup types. Find a full list and determine the best way to convert each of them to markdown. My guess is that few formats have tools that convert them directly to markdown, so my likely recommendation is to render them as HTML (using pypeline as a generic way to handle many of those formats) and then run html2text to get markdown.
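
A rough sketch of that two-step idea, with both converters passed in as plain callables so the pipeline can be tested before choosing libraries. The names to_markdown, render_html and html_to_markdown are hypothetical; in practice they would presumably wrap pypeline and html2text, whose exact calls are not shown here.

```python
def to_markdown(source, markup, render_html, html_to_markdown):
    """Convert `source` written in `markup` to markdown by way of HTML.

    render_html(source, markup) -> HTML string      (hypothetical hook)
    html_to_markdown(html)      -> markdown string  (hypothetical hook)
    """
    html = render_html(source, markup)
    return html_to_markdown(html)


# Stand-in converters, only to show how the pieces plug together.
def fake_render(source, markup):
    return '<p>%s</p>' % source


def fake_html_to_markdown(html):
    return html.replace('<p>', '').replace('</p>', '\n')


print(to_markdown('Hello wiki', 'textile', fake_render, fake_html_to_markdown))
```

Keeping the converters pluggable also makes it easy to swap html2text out if the GPL dependency turns out to be a problem.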

If html2text or any other GPL library is needed, this will have to be a separate repo from the main Allura repo. So please evaluate & test the conversion options first, before putting code into place.

A second phase to all this (i.e. do it separately, after the basic import is all working) would be to handle revision history. This would mean going through each commit in the wiki git repo, and converting & updating every file that changes. This may be very time consuming, so when we get to it, we may want it to be a checkbox option, so users only do it if they want it.

Related

Tickets: #6534

Discussion

<< < 1 2 (Page 2 of 2)
  • Igor Bondarenko - 2013-09-26
    • status: in-progress --> code-review
     
  • Igor Bondarenko - 2013-09-26

    Closed #439, #442.

    All changes in je/42cc_6534

     
  • Dave Brondsema - 2013-09-28
    • status: code-review --> in-progress
     
  • Dave Brondsema - 2013-09-28

    I've rebased this and pushed it to branch db/6534. I resolved several conflicts, so please start further work off of that branch. All specific examples below are from https://github.com/mxcl/homebrew/wiki/

    • When a wiki is deleted, the title is changed. This should be refactored into a delete() method on the model. Even though it's just one line, it shouldn't be repeated in multiple places.
    • In convert_markup()
      • name_and_ext = filename.split('.', 1) should be changed to use os.path.splitext (or at least rsplit). Splitting on the first dot causes some pages to be mishandled: "Not a wiki page Homebrew-0.9.3.md. Skipping"
      • we need to handle markdown specially: don't do any conversion. We lose some formatting when it goes through render_any_markup and then back through html2text (e.g. External-Commands.md loses its table structure). We'll also need to keep the original markdown anyway, for [#6622].
    • alignment of "Import wiki history" checkbox on the individual import form is weird
    • it looks like gollum is case-insensitive, e.g. [Tips n' Tricks] on Home.textile. Can we cleanly support that too?
    • Acceptable-Formulae.textile
      • extra newlines are inserted (iirc, fix with: html2text.BODY_WIDTH = 0)
      • "&" in gollum tag doesn't work
    • textile-specific issues (mostly from Acceptable-Formulae.textile). These can be a separate ticket that we merge later. I want to merge this main wiki branch soon :)
      • after "There are good reasons for this:" should be a numbered list
      • table structure is lost
      • Niche Stuff <a name="Niche_Stuff"></a>
      • *[[this checklist|Troubleshooting]]* doesn't convert right (Home.textile)
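
To illustrate the filename point above: a small sketch comparing the first-dot split with os.path.splitext on the page name from the error message. The helper name split_page_filename is hypothetical and is not the actual convert_markup() code.

```python
import os.path


def split_page_filename(filename):
    """Split a wiki filename into (page name, extension without the dot)."""
    name, ext = os.path.splitext(filename)
    return name, ext.lstrip('.')


# Splitting on the first dot cuts version-like names in the wrong place.
print('Homebrew-0.9.3.md'.split('.', 1))         # ['Homebrew-0', '9.3.md']

# splitext only looks at the last dot, so the name stays intact.
print(split_page_filename('Homebrew-0.9.3.md'))  # ('Homebrew-0.9.3', 'md')
```

rsplit('.', 1) gives the same result for this name; splitext additionally copes with files that have no extension at all.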
     

    Related

    Tickets: #6622

  • Igor Bondarenko - 2013-09-30

    Created:

    • #449: [#6534] Wiki importer for github: markup fixes (3cp)
    • #450: [#6534] Wiki importer for github: textile markup fixes (3cp)

    After #449 is done you'll be able to merge this and then we can work on textile-related issues in #450.

     

    Related

    Tickets: #6534

  • Igor Bondarenko - 2013-10-02
    • status: in-progress --> code-review
     
  • Igor Bondarenko - 2013-10-02

    Closed #449. Force-pushed je/42cc_6534

    You can review the fixes now. We'll address the textile-related issues later in #450.

     
  • Dave Brondsema - 2013-10-04

    Merged je/42cc_6534 to master.

     
  • Dave Brondsema - 2013-10-04
    • status: code-review --> in-progress
     
  • Igor Bondarenko - 2013-10-11
    • status: in-progress --> code-review
     
  • Igor Bondarenko - 2013-10-11

    Closed #450. je/42cc_6534_textile_fix

    • table structure is lost

    It is lost for all markups (except markdown, which we're not converting at all),
    not just textile. It's caused by html2text. If we need to fix it, it could be a
    separate ticket, since we'd need to fix it for all markups. It may require changes
    in html2text, though I didn't investigate further, so I'm not sure.

    • Niche Stuff <a name="Niche_Stuff"></a>

    If you look at the source of the textile page, you'll see that the <a> tag is
    actually in the source, so it's not a converter bug. https://github.com/mxcl/homebrew/wiki/Acceptable-Formulae/_edit

     
  • Dave Brondsema - 2013-10-14

    Looks like this commit tried to create 2 wiki pages with the same title (since the extension is dropped).

    * commit 95faaa840a1676378ead4bf87310d04428ffa3a2
    | Author: berinle <berinle@gmail.com>
    | Date:   Thu Jun 16 09:40:38 2011 -0700
    | 
    |     Updated Unexpected Error (markdown => textile)
    | 
    |  Unexpected-Error.md => Unexpected-Error.textile | 2 ++
    |  1 file changed, 2 insertions(+)
    
     
  • Dave Brondsema - 2013-10-15

    Created [#6758] separately for table support. Guess it's not easy.

     

    Related

    Tickets: #6758

  • Dave Brondsema - 2013-10-15
    • status: code-review --> in-progress
     
  • Dave Brondsema - 2013-10-15

    Merged je/42cc_6534_textile_fix. Going to leave open for the above issue Tim posted, and then we'll be done :)

     
  • Igor Bondarenko - 2013-10-16

    Created #460: [#6534] GH wiki importer: duplicate key error (1cp) for that

     

    Related

    Tickets: #6534

  • Igor Bondarenko - 2013-10-21
    • status: in-progress --> code-review
     
  • Igor Bondarenko - 2013-10-21

    Closed #460. je/42cc_6534_last_fix

    Basically, the issue was multiple deletions of the same file in consecutive commits. It happens when there is a sequence of commits like "delete file -> revert -> delete the same file", or a sequence of commits with file renames that change only the file extension, like "Page.md -> Page.textile -> Page.md". taskd processes such commits within one second, so we ended up creating pages with the same title (like 'Page 12:12:12 ...'). Fixed that by adding microseconds to deleted pages' titles.

    Also added some logic to handle page renames better.
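
A small sketch of why the microseconds matter, assuming the deleted page's title is just the original title plus a timestamp. The helper deleted_title and the exact title format are hypothetical; only the idea of adding microseconds comes from the fix described above.

```python
from datetime import datetime


def deleted_title(title, when, with_microseconds=True):
    """Return a title for a deleted page, suffixed with a timestamp.

    The format is illustrative only, not the one used by the importer.
    """
    fmt = '%H:%M:%S.%f' if with_microseconds else '%H:%M:%S'
    return '{} {}'.format(title, when.strftime(fmt))


# Two deletions of the same page processed within the same second.
first = datetime(2013, 10, 21, 12, 12, 12, 1000)
second = datetime(2013, 10, 21, 12, 12, 12, 2000)

# With second resolution the titles collide.
print(deleted_title('Page', first, False) == deleted_title('Page', second, False))  # True

# With microseconds they stay distinct.
print(deleted_title('Page', first) == deleted_title('Page', second))  # False
```

Two deletions landing on the same microsecond would still collide, so this narrows the window rather than closing it.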

     
  • Dave Brondsema - 2013-10-25
    • status: code-review --> closed
    • Milestone: forge-backlog --> forge-nov-01
     