#6458 Create wiki importer for Google Code

v1.0.1
closed
General
2015-08-20
2013-07-12
Cory Johns
No

Create an importer for wiki pages for Google Code projects. The importer should follow the framework discussed on the mailing list and integrate with the project importer from [#6456].

The list of pages will have to be parsed out from the list page (e.g., https://code.google.com/p/support/w/list with a jQuery selector of #resultstable tbody a). The page contents and comments will need to be extracted from the elements with an id of wikicontents and commentlist, respectively.
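A minimal sketch of the parsing described above, using only Python's standard-library `html.parser`. The class and function names (`WikiListParser`, `ElementTextById`, `page_links`, `element_text`) are hypothetical and not taken from the importer; the sketch also assumes that `#resultstable` is a `<table>` with no nested tables and that only the text of `wikicontents` / `commentlist` is wanted.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect depth counting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class WikiListParser(HTMLParser):
    """Collect hrefs matching the selector `#resultstable tbody a`."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_tbody = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id") == "resultstable":
            self.in_table = True
        elif self.in_table and tag == "tbody":
            self.in_tbody = True
        elif self.in_tbody and tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "table":
            self.in_table = False


class ElementTextById(HTMLParser):
    """Collect the text inside the first element with the given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # > 0 while inside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def page_links(list_html):
    """Return wiki page hrefs from the list page, de-duplicated in order."""
    parser = WikiListParser()
    parser.feed(list_html)
    return list(dict.fromkeys(parser.links))


def element_text(page_html, element_id):
    """Return the text of the element with the given id ('' if absent)."""
    parser = ElementTextById(element_id)
    parser.feed(page_html)
    return "".join(parser.chunks)
```

Each href returned by `page_links` would then be fetched, and `element_text(html, "wikicontents")` and `element_text(html, "commentlist")` called on the result. A real importer would likely keep the inner HTML rather than flattening it to text, since the content still has to be converted to Markdown.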

Support for a user-mapping might be nice, but is unlikely to be terribly useful, so comments should probably just be attributed to *anonymous, perhaps with a link prepended to the comment that links to the Google Code user profile page. (Or maybe we should only import the page contents?)
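One way the attribution idea could look, as a sketch only: the comment is posted as *anonymous and a Markdown link to the original author is prepended to the body. The function name `attribute_comment`, the "Originally posted by" wording, and the profile URL pattern `https://code.google.com/u/<user>/` are all assumptions for illustration, not something the ticket specifies.

```python
from urllib.parse import quote

# ASSUMPTION: Google Code profile pages live at this URL pattern.
PROFILE_URL = "https://code.google.com/u/%s/"


def attribute_comment(author, text):
    """Prepend a Markdown link to the original author's profile page."""
    profile = PROFILE_URL % quote(author, safe="")
    return "Originally posted by [%s](%s)\n\n%s" % (author, profile, text)
```

For example, `attribute_comment("jane.doe", "Thanks!")` yields a comment body that starts with the profile link, followed by a blank line and the original text.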

Related

Tickets: #6456

Discussion

  • Dave Brondsema - 2013-07-15
    • Milestone: forge-backlog --> forge-jul-26
     
  • Dave Brondsema - 2013-07-15
    • Size: --> 4
     
  • Cory Johns - 2013-07-15

    The HTML to Markdown library is GPL so we'll want to put this in a separate library / package, and we should move the MediaWiki importer with it.

    Also, there is some example code parsing the wiki data here

     
  • Cory Johns - 2013-07-15

    The example code only parses out Featured Wiki links from the project summary, not the actual wiki content.

     
  • Cory Johns - 2013-07-15
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,4 @@
    -Create an importer for wiki pages for Google Code projects.  The importer should follow the framework discussed in the [mailing list](http://mail-archives.apache.org/mod_mbox/incubator-allura-dev/201307.mbox/%3CCAEMb8zUg7Kem2aDxVzAqF3U4aKEj7jL3UO=UpX=2+NfY_P8kXQ@mail.gmail.com%3E).
    +Create an importer for wiki pages for Google Code projects.  The importer should follow the framework discussed on the [mailing list](http://mail-archives.apache.org/mod_mbox/incubator-allura-dev/201307.mbox/%3CCAEMb8zUg7Kem2aDxVzAqF3U4aKEj7jL3UO=UpX=2+NfY_P8kXQ@mail.gmail.com%3E) and integrate with the project importer from [#6456].
    
     The list of pages will have to be parsed out from the list page (e.g., https://code.google.com/p/support/w/list with a jQuery selector of `#resultstable tbody a`).  The page contents and comments  will need to be extracted from the elements with an `id` of `wikicontents` and `commentlist`, respectively.
    
    • Milestone: forge-jul-26 --> forge-backlog
     

    Related

    Tickets: #6456

  • Cory Johns - 2013-07-15
    • Milestone: forge-backlog --> forge-jul-12
     
  • Cory Johns - 2013-07-15
    • Milestone: forge-jul-12 --> forge-jul-26
     
    • status: open --> in-progress
    • assigned_to: Tim Van Steenburgh
     
  • Dave Brondsema - 2013-07-26
    • Milestone: forge-jul-26 --> forge-aug-09
     
    • status: in-progress --> code-review
     
  • forge:tv/6458
    googlecodewikiimporter:master

     
  • Dave Brondsema - 2013-08-06
    • QA: Dave Brondsema
     
  • Dave Brondsema - 2013-08-06
    • status: code-review --> in-progress
     
  • Dave Brondsema - 2013-08-06
    • It seems a little odd for the Allura commit to add a few bits of wiki support to the main GoogleCodeProjectExtractor. Some of those changes are certainly good (adding BASE_URL and setting self.gc_project_name), but the wiki-specific parts would fit better in the googlecodewikiimporter repo (e.g., as a subclass of GoogleCodeProjectExtractor).
    • Links between wiki pages should get special handling so that they don't stay as, e.g., href="/p/modwsgi/wiki/InstallationInstructions", which happens to still work because the URL structure matches but breaks if the project or mount point is changed.
    • The GC wiki has a notion of a "main page" that you reach by clicking the Wiki tab in the title bar. Can we detect that and set the wiki's home page setting to it? Then when you go to the wiki on Allura, you'll see the new content right away instead of it being hidden under "Browse Pages".
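
The link-handling point above could be addressed roughly as follows. This is a sketch under stated assumptions, not the fix that was merged: `rewrite_wiki_links` and its parameters are hypothetical names, and the target wiki URL (`wiki_url`) is assumed to be known at import time.

```python
import re


def rewrite_wiki_links(html, gc_project, wiki_url):
    """Point Google Code wiki hrefs at the imported wiki's own URL.

    gc_project: Google Code project name, e.g. "modwsgi".
    wiki_url:   base URL of the target wiki, ending in "/" (assumed known).
    """
    pattern = re.compile(
        r'href="/p/%s/wiki/([^"#?]+)' % re.escape(gc_project)
    )
    # A function replacement avoids backslash handling in wiki_url.
    return pattern.sub(
        lambda m: 'href="%s%s' % (wiki_url, m.group(1)), html
    )
```

Only links into the same Google Code project are rewritten; any `#fragment` or query string after the page name is left in place, and links to other projects are untouched.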
     
    • status: in-progress --> code-review
     
  • Changes on:

    forge:tv/6458
    googlecodewikiimporter:tv/6458

     
  • Dave Brondsema - 2013-08-08
    • status: code-review --> closed
     
