Apache Allura™ ticket #6534: Wiki importer for github

Anton Kasyanov - 2013-08-13

Created:
#415 [#6534] Wiki importer for github: basic (3cp)
#416 [#6534] Wiki importer for github: revision history (3cp)

Anton Kasyanov - 2013-08-13

status: open --> in-progress

Igor Bondarenko - 2013-09-05

Closed #415. je/42cc_6534

Igor Bondarenko - 2013-09-10

Found some bugs in markup conversion. Created #435: [#6534] Wiki importer for github: convert markup properly (3cp)

Dave Brondsema - 2013-09-10

Perhaps similar to [#6622]?

Igor Bondarenko - 2013-09-10

Exactly that :) Will do that separately, then.

Igor Bondarenko - 2013-09-10

status: in-progress --> code-review

Igor Bondarenko - 2013-09-10

Closed #416. Force-pushed je/42cc_6534

We'll implement proper markup conversion in [#6622].

Dave Brondsema - 2013-09-13

QA: Dave Brondsema

Dave Brondsema - 2013-09-13

status: code-review --> in-progress

Dave Brondsema - 2013-09-13

The docstring for tool_option is copied from another method; it should have its own.

Minor style nitpick: git is a third-party library, so 'import git' belongs in the second section of imports (third-party), not the first (standard library).

The forgewiki/templates/wiki/page_history.html change doesn't seem right. It previously showed the revision date; now it seems to show the previous revision's date.

The GitHub project name needs to allow uppercase characters (e.g. OpenRefine/OpenRefine).
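A minimal sketch of the check, assuming the importer validates a single owner/repo string with a regular expression. GITHUB_PROJECT and valid_project_name are hypothetical names, and the allowed character set (letters of either case, digits, '-', '_', '.') is an assumption rather than GitHub's documented rule:

```python
import re

# Hypothetical pattern: "owner/repo", where each part may contain letters of
# either case, digits, '-', '_' and '.'.
GITHUB_PROJECT = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def valid_project_name(name: str) -> bool:
    """Return True if the whole string looks like an owner/repo name."""
    # fullmatch (not match + '$') so a trailing newline is rejected too.
    return GITHUB_PROJECT.fullmatch(name) is not None


print(valid_project_name("OpenRefine/OpenRefine"))  # True
print(valid_project_name("mxcl/homebrew"))          # True
print(valid_project_name("not a/project"))          # False
```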

Formatting

render_any_markup returns an HTML string. If we're handling Markdown input, we should keep the Markdown and not render it at all (just apply special conversions later, in [#6622]). For all other formats we might want to run the HTML through html2text so that it becomes Markdown, but so far things look pretty good when we just save HTML in the wiki Markdown content, and staying free of the html2text dependency is nice.
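A minimal sketch of that split, assuming the importer knows each page's file name. page_content and MARKDOWN_EXTS are hypothetical, and render is a stand-in for h.render_any_markup rather than its real signature:

```python
import posixpath
from typing import Callable

# Assumption: an illustrative list of extensions treated as Markdown.
MARKDOWN_EXTS = {".md", ".markdown", ".mdown", ".mkdn"}


def page_content(filename: str, source: str,
                 render: Callable[[str, str], str]) -> str:
    """Keep Markdown sources verbatim; render every other format to HTML.

    `render` stands in for h.render_any_markup; its exact signature in
    Allura is not assumed here.
    """
    ext = posixpath.splitext(filename)[1].lower()
    if ext in MARKDOWN_EXTS:
        # Leave GitHub-specific syntax for the later conversions in #6622.
        return source
    return render(filename, source)


def fake_render(name: str, text: str) -> str:
    return "<p>%s</p>" % text


print(page_content("Home.md", "# Hi", fake_render))          # # Hi
print(page_content("Notes.textile", "h1. Hi", fake_render))  # <p>h1. Hi</p>
```

Passing the renderer in keeps the sketch independent of Allura and makes the Markdown branch easy to test on its own.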

I tested with https://github.com/mxcl/homebrew/wiki/_pages and found a few issues:

There's a MediaWiki page, and MediaWiki isn't supported by pypeline at all: http://pypeline.sourceforge.net/tour.html#getting-started

For all the formats supported by GitHub but not by pypeline, can you evaluate adding support to pypeline? For MediaWiki specifically, we already have a mediawiki2markdown function which we might want to use as a special case. http://pypeline.sourceforge.net/tour.html#extending-pypeline shows how to extend pypeline, and we could do that in Allura, but I'd rather see the support added to pypeline itself. So if there are good conversion methods we can create, let's go ahead and add them to pypeline directly.

Textile pages end up displaying as plain text because each line of HTML has a tab in front of it, and that indentation triggers Markdown's preformatted (code block) mode. Can you figure out where that's coming from and make sure we don't get leading whitespace on lines?
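Until the source of the tab is found, a post-processing workaround could look like this sketch (strip_leading_whitespace is a hypothetical helper, not existing Allura code). It leaves lines inside <pre> blocks alone, since stripping there would change the preformatted content:

```python
def strip_leading_whitespace(html: str) -> str:
    """Remove leading whitespace from each line, except inside <pre> blocks."""
    out = []
    in_pre = False
    for line in html.splitlines():
        if not in_pre:
            # Indentation of 4 spaces or a tab would make Markdown
            # treat the line as a code block, so drop it.
            line = line.lstrip()
        lowered = line.lower()
        if "<pre" in lowered:
            in_pre = True
        if "</pre>" in lowered:
            in_pre = False
        out.append(line)
    return "\n".join(out)


print(strip_leading_whitespace("\t<p>a</p>\n\t<p>b</p>"))
```

Tracking <pre> with substring checks is deliberately crude; it assumes <pre> tags are not nested and not split across lines.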

Page links

Links still point back to GitHub. We should rewrite all links that match the wiki URL prefix. I think you've already done this for the Trac import, so that technique can be reused (perhaps factored out into a helper).
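The Trac importer's helper isn't reproduced here; this is an independent sketch of the same idea for HTML content, assuming links appear as href="..." or href='...' attributes. rewrite_links and the example prefixes are hypothetical:

```python
import re

# href="..." or href='...'; group 1 is the quote, group 2 the URL.
HREF = re.compile(r"""href=(["'])(.*?)\1""")


def rewrite_links(html: str, old_prefix: str, new_prefix: str) -> str:
    """Point links under old_prefix at new_prefix; leave other links alone."""
    def replace(match: "re.Match[str]") -> str:
        quote, url = match.group(1), match.group(2)
        if url.startswith(old_prefix):
            url = new_prefix + url[len(old_prefix):]
        return "href=%s%s%s" % (quote, url, quote)

    return HREF.sub(replace, html)


html = ('<a href="https://github.com/mxcl/homebrew/wiki/FAQ">FAQ</a> '
        '<a href="https://example.com/x">x</a>')
print(rewrite_links(html, "https://github.com/mxcl/homebrew/wiki/",
                    "/p/homebrew/wiki/"))
```

External links are left untouched because only URLs starting with the wiki prefix are rewritten.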

Many page names have dashes instead of spaces. I haven't investigated this fully enough to know how we want to handle it.

Dave Brondsema - 2013-09-13

There are also gollum tags (e.g. links to other wiki pages) that can be in any source format. We'll need to handle those.
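A sketch of the simplest case, plain page links ([[Page]] and [[Link text|Page]]), applied to source text. How GitHub page names should map to Allura page names is still open (see the dashes question above), so page_url is a hypothetical hook that defaults to URL-quoting the name. Tags with file options and [[_TOC_]] are not handled here; convert_page_links and GOLLUM_TAG are hypothetical names:

```python
import re
from typing import Callable
from urllib.parse import quote

# A [[...]] tag whose body contains no brackets.
GOLLUM_TAG = re.compile(r"\[\[([^\[\]]+)\]\]")


def convert_page_links(text: str,
                       page_url: Callable[[str], str] = quote) -> str:
    """Turn [[Page]] and [[Link text|Page]] into Markdown links."""
    def replace(match: "re.Match[str]") -> str:
        inner = match.group(1)
        if inner.strip() == "_TOC_":
            return match.group(0)  # left for a separate TOC pass
        label, sep, page = inner.partition("|")
        if not sep:
            label = page = inner
        return "[%s](%s)" % (label.strip(), page_url(page.strip()))

    return GOLLUM_TAG.sub(replace, text)


print(convert_page_links("See [[Home]] and [[Docs|Getting Started]]."))
```

Making page_url a parameter lets the dash/space decision be plugged in later without touching the parsing.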

That could be done as part of [#6622], but so far [#6622] only covers GitHub Markdown -> Allura Markdown conversions, while gollum tags need to be handled for every source format (Markdown included?).

Igor Bondarenko - 2013-09-16

OK, here's what I found about the GitHub-supported formats that pypeline can't handle. We can add support for a couple of them (AsciiDoc, MediaWiki) to pypeline fairly easily, but the rest would require quite a lot of work.

ASCIIDoc: .asciidoc

http://asciidoc.org/asciidocapi.html

Uses GPLv2

Requires installing the asciidoc package system-wide. The API is distributed as a standalone Python script, so it would need to be included directly in the pypeline repo or installed manually.

I think pypeline support can be done in 2-3 cp.

Org Mode: .org

There are a couple of Org mode parsing libraries for Python, but all of them seem to just parse Org mode files into a tree of Orgnode objects, with no support for converting that tree into HTML. I'm not familiar with this format at all, and I think adding such support might be pretty heavy.

Pod: .pod

Perl documentation system. It seems only Perl tools exist for converting it. It should be possible to write a Python wrapper around the command-line tool and use it in pypeline, but this may take a while.

RDoc: .rdoc

Ruby documentation system. It seems only Ruby tools exist for converting it. A wrapper around the command-line tool could be created here too, I think.

MediaWiki: .mediawiki, .wiki

Uses GPLv3

We can add support to pypeline using python-mediawiki; Allura's mediawiki2markdown already uses it.

Alternatively, we can convert directly to Allura Markdown using mediawiki2markdown. Neither option should be hard to implement: 1-2 cp, I guess.
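For Pod (and, in principle, RDoc), the command-line wrapper mentioned above could be as small as this sketch. It assumes pod2html reads POD on standard input and writes HTML to standard output by default; no particular RDoc command is assumed. run_converter and POD_COMMAND are hypothetical names, and the demo uses a tiny Python program as a stand-in converter so it runs without Perl:

```python
import subprocess
import sys
from typing import Optional, Sequence

# Assumption: pod2html reads POD from stdin and writes HTML to stdout.
POD_COMMAND = ["pod2html"]


def run_converter(command: Sequence[str], source: str,
                  timeout: float = 30.0) -> Optional[str]:
    """Pipe source through an external converter; None if it is unavailable or fails."""
    try:
        result = subprocess.run(
            list(command), input=source, capture_output=True,
            text=True, timeout=timeout, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError,
            subprocess.TimeoutExpired):
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in converter: upper-cases its input.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_converter(upper, "=head1 NAME"))
```

Returning None instead of raising lets the importer treat a missing Perl or Ruby tool as "format not supported" rather than as a failed import.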

Igor Bondarenko - 2013-09-16

Created:

#438: [#6534] Wiki importer for github: small fixes (1cp)

#439: [#6534] Wiki importer for github: textile pages fix (1cp)

#440: [#6534] Wiki importer for github: handle links (2cp)

#441: [#6534] Wiki importer for github: handle gollum tags (4cp)

Dave Brondsema - 2013-09-16

To get a ballpark figure on popularity, I searched just README files (e.g. https://github.com/search?q=path%3AREADME.asciidoc&type=Code&ref=searchresults) and got:

asciidoc: 1,000

org: 3,750

pod: 2,200

rdoc: 135,000

mediawiki: 800

RDoc is definitely popular and MediaWiki is not. However, since we have an easy approach for MediaWiki (which may be more popular for wikis than for READMEs), let's just do MediaWiki, using the mediawiki2markdown approach. Remember that it depends on optional GPL libraries, so keep this conversion optional too.

Let's leave the rest for later, we'll see what demand is for them.

If it's possible to list the supported formats on the import form's description text, that would be great.
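A sketch of how optional formats and the import form's help text could be tied together, assuming a simple extension registry. BASE_FORMATS, OPTIONAL_FORMATS, and the module name "mediawiki" are placeholders, not Allura's or pypeline's actual configuration:

```python
import importlib
from typing import Callable, Dict, List, Tuple

# Hypothetical: formats assumed to be always available (e.g. through pypeline).
BASE_FORMATS: Dict[str, List[str]] = {
    "Markdown": [".md", ".markdown"],
    "Textile": [".textile"],
    "reStructuredText": [".rst"],
}

# Optional (GPL-licensed) formats, each with the module it needs.
# "mediawiki" is a placeholder module name, not a confirmed package.
OPTIONAL_FORMATS: Dict[str, Tuple[List[str], str]] = {
    "MediaWiki": ([".mediawiki", ".wiki"], "mediawiki"),
}


def module_available(name: str) -> bool:
    """Return True if the module can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def supported_formats(
        is_available: Callable[[str], bool] = module_available,
) -> Dict[str, List[str]]:
    """Base formats plus any optional formats whose dependency is installed."""
    formats = dict(BASE_FORMATS)
    for label, (exts, module) in OPTIONAL_FORMATS.items():
        if is_available(module):
            formats[label] = exts
    return formats


def describe_formats(formats: Dict[str, List[str]]) -> str:
    """One-line summary suitable for an import form's description text."""
    parts = ["%s (%s)" % (label, ", ".join(exts))
             for label, exts in formats.items()]
    return "Supported formats: " + "; ".join(parts)


print(describe_formats(supported_formats()))
```

Deriving the help text from the same registry the importer uses means the form can't advertise MediaWiki on a site where the GPL dependency isn't installed.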

Which reminds me: a separate issue is that we need an individual tool importer for the GitHub wiki, specifically a GitHubWikiImportController set on the importer's controller attribute.

Igor Bondarenko - 2013-09-17

Created:

#442: [#6534] Wiki importer for github: handle mediawiki (2cp)

#443: [#6534] Wiki importer for github: import into existing project (1cp)

Igor Bondarenko - 2013-09-17

Closed #438. Force-pushed je/42cc_6534

Igor Bondarenko - 2013-09-18

Gollum tags can embed images and link to images and files, both external and internal: https://github.com/gollum/gollum/wiki#file-links. Should we handle those too? It would require importing all the images and files from the GitHub wiki repo into Allura, as attachments or something similar. Should we do this right now?

Igor Bondarenko - 2013-09-18

I now think we should convert the HTML rendered by h.render_any_markup() with html2text, for all input formats. There are several reasons for that:

It simplifies handling of gollum tags: for every format we could just convert them to the appropriate Markdown. With the current approach we have to handle two cases: one where the wiki is in Markdown and we keep it, and another where the wiki is in some other markup and we keep the HTML.

We'll be able to convert the gollum [[_TOC_]] tag directly into the Markdown [TOC] tag.
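A one-line sketch of that replacement, assuming the tag survives literally up to this point (a Markdown renderer may read _TOC_ as emphasis, so the replacement may need to run on the source instead). [TOC] is the default marker of Python-Markdown's toc extension; TOC_TAG and convert_toc are hypothetical names:

```python
import re

# Assumption: the tag appears as [[_TOC_]], possibly with inner spaces.
TOC_TAG = re.compile(r"\[\[\s*_TOC_\s*\]\]")


def convert_toc(markdown_text: str) -> str:
    """Replace gollum's table-of-contents tag with the Markdown [TOC] marker."""
    return TOC_TAG.sub("[TOC]", markdown_text)


print(convert_toc("[[_TOC_]]\n\n# Intro"))
```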

It will keep the imported history cleaner (diffs of generated HTML aren't very pleasant to read).

Are you ok with that?

Dave Brondsema - 2013-09-18

Yep, makes sense.

As I've mentioned before, html2text needs to stay optional, so keep that conversion (and everything that must happen after it) optional, depending on whether html2text is present. If it's not installed, you'll just get a simpler, less complete conversion.
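A sketch of keeping html2text optional, assuming everything after the HTML-to-Markdown step is bundled into one post_process callable. convert_rendered_html is hypothetical; html2text.html2text() is the module's top-level conversion function:

```python
from typing import Any, Callable

try:
    import html2text  # optional dependency; may be absent
except ImportError:
    html2text = None


def convert_rendered_html(html: str,
                          post_process: Callable[[str], str],
                          converter: Any = html2text) -> str:
    """Convert rendered HTML to Markdown when html2text is available.

    Without it, the HTML is returned unchanged and the Markdown-only
    post-processing steps (e.g. gollum tag conversion) are skipped.
    """
    if converter is None:
        return html
    markdown_text = converter.html2text(html)
    return post_process(markdown_text)
```

The converter parameter defaults to the real module but can be swapped for a stub, which makes both branches testable whether or not html2text is installed.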

Dave Brondsema - 2013-09-18

What sort of internal files are possible? Files in the git repo? Or can you have images and files in the wiki itself? Do you have any examples?

I think external references can be kept as-is.

Igor Bondarenko - 2013-09-19

You can put any type of file in the wiki repo and reference it from a page. When a page references an image, the image ends up embedded in the page; when it references anything else (e.g. a PDF), it's just displayed as a link to the file.

I've created an example page showing these capabilities: https://github.com/jetmind/dot/wiki/Files

Source looks like this:

[[Example pdf|example.pdf|width=400px]] [[Link to image|/image.jpg|width=400px]] [[image.jpg|frame|alt=hello|width=400px]] [[http://eofdreams.com/data_images/dreams/image/image-07.jpg|width=400px]]

Also, when embedding an image there are a couple of available options (show in a frame, resize, align, etc.). I'm not sure we can convert all of those.

I think we can upload such files as attachments to the pages that reference them, and convert the link and embed tags to Markdown.
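A sketch of the conversion half of that plan, parsing the example tags above. It assumes a bare image reference (no link text) means "embed" and anything with link text means "link", and it drops sizing and framing options. Uploading attachments and rewriting targets to attachment URLs are left out; parse_file_tag, file_tag_to_markdown, IMAGE_EXTS, and FLAG_OPTIONS are hypothetical:

```python
import posixpath
from typing import Dict, List, Optional, Tuple

# Assumptions: which extensions count as images, and which bare (no '=')
# options exist. Only 'frame' appears in the example page.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}
FLAG_OPTIONS = {"frame"}


def parse_file_tag(inner: str) -> Tuple[Optional[str], str, Dict[str, str]]:
    """Split the body of a [[...]] file tag into (text, target, options)."""
    options: Dict[str, str] = {}
    rest: List[str] = []
    for part in (p.strip() for p in inner.split("|")):
        if "=" in part and not part.startswith(("http://", "https://")):
            key, _, value = part.partition("=")
            options[key.strip()] = value.strip()
        elif part in FLAG_OPTIONS:
            options[part] = ""
        else:
            rest.append(part)
    target = rest[-1] if rest else ""
    text = rest[0] if len(rest) > 1 else None
    return text, target, options


def file_tag_to_markdown(inner: str) -> str:
    """Embed bare image references; turn everything else into a plain link.

    Sizing and framing options are dropped (no Markdown equivalent).
    """
    text, target, options = parse_file_tag(inner)
    ext = posixpath.splitext(target)[1].lower()
    if text is None and ext in IMAGE_EXTS:
        return "![%s](%s)" % (options.get("alt", ""), target)
    return "[%s](%s)" % (text or target, target)


print(file_tag_to_markdown("image.jpg|frame|alt=hello|width=400px"))
```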

External references can be kept as-is, indeed.

Igor Bondarenko - 2013-09-19

Closed #443. je/42cc_6534

Dave Brondsema - 2013-09-19

Ok, I see. I made [#6673] for it. We should definitely track this need, but I don't want to try to do too much all at once on this ticket :)

Igor Bondarenko - 2013-09-20

Closed #441, #440. je/42cc_6534
