#4186 Import from mediawiki into wiki tool

Milestone: v1.0.0
Status: closed
Owner: nobody
Category: General
Updated: 2015-08-20
Created: 2012-05-10
Private: No

We should have a script to import from a mediawiki database dump into a wiki tool. Maybe use sqlite to load the database dump? But we should also have an option to connect to a live database, since we'll be able to take advantage of that in some situations.

See scripts/teamforge-import.py for reference on how to create wiki pages programmatically. That script uses two phases: first extract the data and save it, then load the data into Allura. We should do that here too, but make the data files free from mediawiki details, so the loading script can be generic. Some day people can write extractors for other wiki software and use the same loading script.
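For illustration, the extract phase could write each page out as a wiki-software-neutral JSON file that the load phase reads back. A minimal sketch, where the file layout and field names are hypothetical:

    import json
    import os

    def save_page(dump_dir, page):
        # 'page' is a plain dict, e.g. {'title': ..., 'text': ..., 'history': [...]},
        # with no wiki-engine specifics left in it
        pages_dir = os.path.join(dump_dir, 'pages')
        if not os.path.exists(pages_dir):
            os.makedirs(pages_dir)
        with open(os.path.join(pages_dir, '%s.json' % page['title']), 'w') as f:
            json.dump(page, f)

    def load_pages(dump_dir):
        # the loader only ever sees these neutral files, never the source wiki
        pages_dir = os.path.join(dump_dir, 'pages')
        for name in os.listdir(pages_dir):
            with open(os.path.join(pages_dir, name)) as f:
                yield json.load(f)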

This code should be within the ForgeWiki tool, and ideally exposed as a paster command.
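A minimal sketch of such a paster command, using Paste Script's command API (the class name and options here are illustrative; the real command would hook into Allura's command infrastructure):

    from paste.script import command

    class WikiImportCommand(command.Command):
        summary = 'Import a mediawiki dump into a ForgeWiki instance'
        usage = 'CONFIG_FILE'
        group_name = 'ForgeWiki'
        parser = command.Command.standard_parser(verbose=True)
        parser.add_option('-d', '--dump-dir', dest='dump_dir',
                          help='directory holding the extracted dump data')

        def command(self):
            # load the ini file named in self.args, then run the
            # extract and load phases
            print('importing from %s' % self.options.dump_dir)

It would then be registered under the [paste.paster_command] entry point group in setup.py.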

The import should handle as much as possible: wiki pages (including converting the format to markdown - not sure whether any existing conversion libraries are available), attachments, the history of each page (our wiki pages are versioned too), permissions, "Talk" pages (these can go into the discussion for a page - not sure of the best way to split them up into separate comments), other config options, etc.

Related

Tickets: #4186
Tickets: #4660
Tickets: #5190

Discussion

  • Dave Brondsema - 2012-05-10
    • labels: --> import
    • milestone: limbo --> forge-backlog
     
  • Dave Brondsema - 2012-05-14
    • Description has changed:

    Diff:

    --- old 
    +++ new 
    @@ -1,5 +1,5 @@
     We should have a script to import from a mediawiki database dump into a discussion tool.  Maybe use sqlite to load the database dump?  But also have an option to connect to a live database, since we'll be able to take advantage of that in some situations.
    
    -See scripts/teamforge-import.py for reference on how to create wiki pages programmatically.
    +See scripts/teamforge-import.py for reference on how to create wiki pages programmatically.  That script uses 2 phases, first to extract the data and save it, second to load the data into Allura.  We should do that, but make the data files be free from phpbb details, so the loading script can be generic.  Some day people can write extractors from other wiki software and use the same loading script.
    
     This code should be within the ForgeWiki tool, and ideally exposed as a paster command.
    
     
  • Dave Brondsema - 2012-05-18
    • Description has changed:

    Diff:

    --- old 
    +++ new 
    @@ -3,3 +3,6 @@
     See scripts/teamforge-import.py for reference on how to create wiki pages programmatically.  That script uses 2 phases, first to extract the data and save it, second to load the data into Allura.  We should do that, but make the data files be free from phpbb details, so the loading script can be generic.  Some day people can write extractors from other wiki software and use the same loading script.
    
     This code should be within the ForgeWiki tool, and ideally exposed as a paster command.
    +
    +The import should handle as much as is possible: wiki pages (including converting the format to markdown - not sure if there is any existing conversion libraries available), attachments, history of each page (our wiki pages are versioned too), permissions, "Talk" pages (can go into the discussion for a page - not sure the best way to split up into separate comments), other config options, etc.
    +
    
     
  • Dave Brondsema - 2012-05-18
    • labels: import --> import, 42cc
     
  • Yaroslav Luzin - 2012-05-21

    I've created the following tickets:
    - #57: [#4186] Convert mediawiki to markdown (3cp)
    - #58: [#4186] Create basic paster command skeleton to import from mediawiki (2cp)
    - #59: [#4186] Import pages (1cp)
    - #60: [#4186] Import history of each page (1cp)
    - #61: [#4186] Import attachments (1cp)
    - #62: [#4186] Import talk pages (1cp)
    - #63: [#4186] Import refactoring and optimization (2cp)
    Total: 11cp

     


  • Anonymous - 2012-05-22

    Originally by: tramadolmen

    Can I use the third-party package https://github.com/zikzakmedia/python-mediawiki for converting mediawiki to markdown?

    In requirements-common.txt it would be something like:
    -e git+https://github.com/zikzakmedia/python-mediawiki.git...

     
  • Dave Brondsema - 2012-05-22

    Yes, you can use that package for the conversion. Our deployment process requires that libraries be packaged up, so the github reference won't actually work for us. For now can you just manually install python-mediawiki? And then when we're done, we can make a package of it for our deployment.
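
    For what it's worth, the whole pipeline could be as small as rendering the mediawiki markup to HTML with that package and then feeding the HTML to html2text. A rough sketch; the wiki2html signature is an assumption based on the package's docs:

    import html2text
    from mediawiki import wiki2html

    def mediawiki2markdown(source):
        # mediawiki markup -> HTML -> markdown; the second argument to
        # wiki2html is assumed to toggle TOC rendering
        html = wiki2html(source, True)
        return html2text.html2text(html)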

     
  • Dave Brondsema - 2012-05-22

    Looks like https://github.com/erikrose/mediawiki-parser might be another option. python-mediawiki is GPL, and I'm not sure what license mediawiki-parser uses. It would be preferable not to depend on a GPL-licensed package, but if it is the only option, it's okay for this.

     
  • Anonymous - 2012-05-23

    Originally by: tramadolmen

    mediawiki-parser depends on pijnu. pijnu has a GPL license and can't be installed without patching its setup.py file.

     
    Last edit: Anonymous 2015-07-08
  • Yaroslav Luzin - 2012-05-28

    closed #57

     
  • Dave Brondsema - 2012-05-29

    Looking good so far. I'm not exactly clear why you need bbcode, but if it works, great :)

    I would recommend adding a test specifically for mediawiki formatting rather than bbcode, though. For example:

    from forgewiki import converters

    def test_mediawiki2markdown_formatting():
        mw_formatting = "'''bold''' ''italics''"
        mw_output = converters.mediawiki2markdown(mw_formatting)
        assert "**bold** _italics_" in mw_output
     
  • Anonymous - 2012-05-30

    Originally by: tramadolmen

    I added bbcode because the ticket says:

    We should do that, but make the data files be free from phpbb details, so the loading script can be generic.

     
  • Dave Brondsema - 2012-05-30

    Oh, my fault :( I made a very similar ticket for migrating phpbb, and then copied it for this mediawiki ticket. I meant to say "make the data files be free from mediawiki details, so the loading script can be generic."

     
  • Yaroslav Luzin - 2012-05-30

    closed #58

     
  • Anonymous - 2012-06-04

  • Dave Brondsema - 2012-06-04

    We need the file itself attached to the wiki page.

    And then any links or references (e.g. image tags) need to be converted to appropriate Markdown references. Here's how you link to an attachment: [link text here](PageNameHere/attachment/FileNameHere.pdf). And to use an attachment as an inline image, do: [[img src=attached-image.jpg alt=foobar]]
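
    As a sketch of the kind of rewriting that means (the helper and regexes below are hypothetical, and real mediawiki link syntax has more variations than this handles):

    import re

    def convert_file_links(text, page_name):
        # [[Media:Foo.pdf|link text]] -> [link text](PageName/attachment/Foo.pdf)
        text = re.sub(r'\[\[Media:([^|\]]+)\|([^\]]+)\]\]',
                      lambda m: '[%s](%s/attachment/%s)' % (m.group(2), page_name, m.group(1)),
                      text)
        # [[File:Foo.jpg|alt text]] -> [[img src=Foo.jpg alt=alt text]]
        text = re.sub(r'\[\[File:([^|\]]+)\|([^\]]+)\]\]',
                      lambda m: '[[img src=%s alt=%s]]' % (m.group(1), m.group(2)),
                      text)
        return text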

     
  • Yaroslav Luzin - 2012-06-05

    closed #59

     
  • Yaroslav Luzin - 2012-06-05

    closed #60

     
  • Yaroslav Luzin - 2012-06-05

    closed #62

     
  • Yaroslav Luzin - 2012-06-06

    closed #61, changes are in 42cc_4186

    We have one ticket left regarding this feature - #63 [#4186] Import refactoring and optimization.

    Please review our changes, or give us some test data so we can check the whole import procedure.

    • status: open --> code-review
     


  • Dave Brondsema - 2012-06-11
    • status: code-review --> open
     
  • Dave Brondsema - 2012-06-11

    Sent test data in email.

     
  • Yaroslav Luzin - 2012-06-13

    closed #63 and pushed changes to 42cc_4186

    After a complete review we decided to rewrite some parts of the code. That was already almost done in #63, but we need to finish a few new tickets:
    - #85: [#4186] Import/export talk, attachments and history (2cp)
    - #86: [#4186] Extract data from sqlite (1cp)

    At the moment it works with a MySQL mediawiki database directly and imports pages only. Here's how we run it:

    $ paster wiki2markdown ../Allura/development.ini -d dump_dir -n Projects -p test -s mysql --user root --password qwerty --db_name mediawiki_test
    
    • status: open --> in-progress
     


  • Dave Brondsema - 2012-06-14

    If loading the mysql dump into sqlite is not really easy, I think we have options for reading from mysql directly. I can imagine there are syntax differences which would be annoying to deal with (and perhaps a lot of work to truly parse & translate correctly).
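
    Since MySQLdb and sqlite3 both follow the Python DB-API, the extraction queries themselves could be shared and only the connection setup would branch. A rough sketch; the option names and namespace filter are illustrative:

    import sqlite3

    def connect(options):
        if options.source == 'mysql':
            import MySQLdb
            return MySQLdb.connect(host=options.host, user=options.user,
                                   passwd=options.password, db=options.db_name)
        return sqlite3.connect(options.db_path)

    def extract_pages(conn):
        # namespace 0 is the main article namespace in mediawiki's schema
        cur = conn.cursor()
        cur.execute('SELECT page_id, page_title FROM page'
                    ' WHERE page_namespace = 0')
        return cur.fetchall()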

     
  • Yaroslav Luzin - 2012-06-25

    ok, that's easier, closed #86: [#4186] Extract data from sqlite

     


