GitHub wikis are git repositories and can be cloned directly, e.g. git clone https://github.com/OpenRefine/OpenRefine.wiki
Check the main repo API first to see whether the repo has its wiki enabled. See https://sourceforge.net/p/googlecodewikiimporter/git/ for reference as an example of another wiki importer. It lives in a separate repo because it needs the "html2text" package to convert HTML to Markdown, and that is a GPL library.
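A minimal sketch of the "check the API first" step. It assumes the GitHub REST API repository endpoint and its has_wiki field, and the ".wiki" clone URL pattern from the example above. The helper names are hypothetical. Only the pure function is exercised offline; fetch_repo_info makes a real network call.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com/repos/{owner}/{repo}"


def fetch_repo_info(owner, repo):
    """Fetch repository metadata from the GitHub REST API (network call)."""
    with urllib.request.urlopen(GITHUB_API.format(owner=owner, repo=repo)) as resp:
        return json.load(resp)


def wiki_clone_url(owner, repo, repo_info):
    """Return the wiki's git clone URL, or None if the wiki is disabled."""
    if not repo_info.get("has_wiki"):
        return None
    return "https://github.com/{}/{}.wiki".format(owner, repo)
```

has_wiki only says the feature is turned on. A wiki with no pages may not have a git repo yet, so the subsequent git clone can still fail and should be handled.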
GitHub supports many markup types. Find a full list and determine the best way to convert each of them to Markdown. My guess is that few formats will have tools that convert them directly to Markdown, so my likely recommendation would be to render them as HTML (using pypeline as a generic way to handle many of those formats) and then use html2text to get them into Markdown.
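As a starting point for the "full list", a wiki checkout can be classified by file extension. The table below is an assumption, written from memory of the github/markup README; re-check it against that README before relying on it. The function names are hypothetical.

```python
import posixpath

# Extension -> markup format (assumed list; verify against github/markup).
GITHUB_MARKUPS = {
    ".markdown": "markdown", ".mdown": "markdown", ".mkdn": "markdown", ".md": "markdown",
    ".textile": "textile",
    ".rdoc": "rdoc",
    ".org": "org",
    ".creole": "creole",
    ".mediawiki": "mediawiki", ".wiki": "mediawiki",
    ".rst": "rst",
    ".asciidoc": "asciidoc", ".adoc": "asciidoc", ".asc": "asciidoc",
    ".pod": "pod",
}


def markup_format(filename):
    """Return the markup format for a wiki file, or None (e.g. for images)."""
    return GITHUB_MARKUPS.get(posixpath.splitext(filename)[1].lower())


def group_by_format(filenames):
    """Group wiki files by markup format; non-markup files go under None."""
    groups = {}
    for name in filenames:
        groups.setdefault(markup_format(name), []).append(name)
    return groups
```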
If html2text or any other GPL library is needed, this will have to be a separate repo from the main Allura repo. So please evaluate & test the conversion options first, before putting code into place.
A second phase to all this (i.e. do it separately, after the basic import is all working) would be to handle revision history. This would mean going through each commit in the wiki git repo, and converting & updating every file that changes. This may be very time consuming, so when we get to it, we may want it to be a checkbox option, so users only do it if they want it.
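A minimal sketch of the history replay. It assumes the commits have already been read out of the wiki repo, oldest first, as (sha, {path: new content, or None if deleted}) pairs. With GitPython that list might be built from repo.iter_commits(reverse=True), but that step is not shown here. The full_history flag stands in for the proposed checkbox. All names are hypothetical.

```python
from collections import defaultdict


def page_histories(commits, is_page, full_history=True):
    """Collect the revisions of each wiki page from a list of commits.

    commits: iterable of (sha, changes) pairs, oldest first, where changes
        maps a file path to its new content, or None if it was deleted.
    is_page: predicate deciding which paths are wiki pages (vs. images etc.).
    full_history: if False, keep only each page's latest revision (the
        cheaper basic-import behaviour).
    """
    history = defaultdict(list)
    for sha, changes in commits:
        for path in sorted(changes):
            if is_page(path):
                history[path].append((sha, changes[path]))
    if not full_history:
        return {path: revs[-1:] for path, revs in history.items()}
    return dict(history)
```

Each (sha, content) pair would then be converted and saved as one page revision. That per-revision conversion is where the extra time goes.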
Created
#415 [#6534] Wiki importer for github: basic (3cp)
#416 [#6534] Wiki importer for github: revision history (3cp)
Closed #415. je/42cc_6534
Found some bugs in markup conversion. Created #435: [#6534] Wiki importer for github: convert markup properly (3cp)
Perhaps similar to [#6622]?
Exactly that :) Will do that separately, then.
Closed #416. Force-pushed je/42cc_6534
We'll implement proper markup conversion in [#6622].
- tool_option is copied from another method; it should be its own.
- import git is a 3rd-party lib, so it should be in the 2nd section of imports, not the 1st section.
- The forgewiki/templates/wiki/page_history.html change doesn't seem right. It previously showed the revision date, and now it seems to show the previous revision's date.
- formatting: render_any_markup returns an HTML string. If we're handling Markdown input, we should keep the Markdown and not render it at all (just special conversions later in [#6622]). For all others, we might want to run it through html2text so that it can be Markdown instead of HTML, but so far things are looking pretty good just saving HTML in the wiki markdown content, and staying free from the html2text dependency is nice.
- mediawiki2markdown is a function we might want to use as a special case. http://pypeline.sourceforge.net/tour.html#extending-pypeline shows how to extend pypeline, and we can do that in Allura, but I'd rather see the support added to pypeline itself, so if there are good conversion methods we can create, let's go ahead and add them to pypeline directly.
- page links
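The formatting rule from the review can be sketched as a small dispatch: keep Markdown source untouched, render everything else to HTML, and optionally turn that HTML into Markdown. Here render_html is a stand-in for h.render_any_markup (its real signature is not asserted). html_to_markdown is an optional hypothetical callback, and the extension set is an assumption.

```python
import posixpath

MARKDOWN_EXTS = {".md", ".markdown", ".mdown", ".mkdn"}  # assumed set


def convert_page(filename, text, render_html, html_to_markdown=None):
    """Return page content to store in the Allura wiki.

    render_html(filename, text) -> HTML string (stand-in for the real
    renderer); html_to_markdown(html) -> Markdown, or None to keep HTML.
    """
    ext = posixpath.splitext(filename)[1].lower()
    if ext in MARKDOWN_EXTS:
        return text  # keep Markdown as-is; special conversions come later
    html = render_html(filename, text)
    if html_to_markdown is not None:
        return html_to_markdown(html)
    return html  # store the rendered HTML in the page body
```

Passing the converter in, rather than importing it here, keeps this code path free of any GPL dependency.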
There are also gollum tags (e.g. links to other wiki pages) that can appear in any source format. We'll need to handle those.
That could be done as part of [#6622], but so far [#6622] is just for handling GitHub Markdown -> Allura Markdown conversions, and gollum tags need to be handled for all conversion formats (Markdown included?).
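Because gollum page-link tags look the same in every source format, they can be rewritten before any format-specific rendering. This sketch assumes the Gollum forms [[Page Name]] and [[Link Text|Page Name]]. The output, a standard Markdown link with a URL-quoted page name, is an assumption; Allura may prefer its own short-link form. The TOC tag and file/URL targets are deliberately left alone.

```python
import re
from urllib.parse import quote

# Gollum page link tag: [[Page Name]] or [[Link Text|Page Name]]
GOLLUM_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def convert_page_links(text):
    """Rewrite Gollum page-link tags as Markdown links (assumed target form)."""
    def replace(match):
        first, second = match.group(1).strip(), match.group(2)
        if second is None:
            label, page = first, first
        else:
            label, page = first, second.strip()
        # Leave the TOC tag and file/URL links for other conversions.
        if page == "_TOC_" or page.startswith(("/", "http://", "https://")):
            return match.group(0)
        return "[{}]({})".format(label, quote(page))

    return GOLLUM_LINK.sub(replace, text)
```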
OK, here's what I found about GitHub-supported formats that pypeline can't handle. We can add support for a couple of formats to pypeline pretty easily (AsciiDoc, MediaWiki), but the rest would require quite a lot of work.
AsciiDoc: .asciidoc
http://asciidoc.org/asciidocapi.html
Uses GPLv2
Requires installing the asciidoc package system-wide. The API is distributed as a standalone Python script, so it would need to be included directly in the pypeline repo or installed manually.
I think pypeline support can be done in 2-3 cp.
Org Mode: .org
There are a couple of Org mode parsing libraries for Python, but it seems that all of them just parse Org mode files into a tree of Orgnode objects, with no support for converting that tree into HTML. I'm not familiar with this format at all, and I think adding such support might be pretty heavy.
Pod: .pod
Perl documentation system. It seems only Perl tools exist for converting it. It should be possible to write a Python wrapper around the command-line tool and use it in pypeline, but this may take a while.
RDoc: .rdoc
Ruby documentation system. It seems only Ruby tools exist for converting it. A wrapper around the command-line tool could also be created, I think.
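For Pod and RDoc, the "wrapper around the command-line tool" could be as thin as piping the source through a subprocess. This sketch assumes the converter reads markup on stdin and writes HTML to stdout. pod2html is believed to read STDIN when no --infile is given, and rdoc has a --pipe mode, but verify both against each tool's docs. The function name is hypothetical.

```python
import subprocess


def run_converter(command, text, timeout=60):
    """Pipe text to an external converter and return its stdout.

    command: argv list, e.g. ["pod2html"] (tool behaviour assumed).
    Raises subprocess.CalledProcessError if the tool exits non-zero and
    subprocess.TimeoutExpired if it runs longer than timeout seconds.
    """
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Shelling out keeps Perl/Ruby out of the Python dependency list, at the cost of requiring those tools on the server. That cost is part of why these formats "may take a while".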
MediaWiki: .mediawiki, .wiki
Uses GPLv3
We can add support to pypeline using python-mediawiki; Allura's mediawiki2markdown already uses it. We can also convert directly to Allura Markdown using mediawiki2markdown. Neither approach should be hard to implement: 1-2 cp, I guess.
Created:
Using searches on README files only, to get a ballpark idea of popularity (e.g. https://github.com/search?q=path%3AREADME.asciidoc&type=Code&ref=searchresults), I get:
RDoc is definitely popular and MediaWiki is not. However, since we have an easy approach for MediaWiki (which may be more popular for wikis than for READMEs), let's just do MediaWiki. Go with the mediawiki2markdown approach. Remember that it depends on optional GPL libraries, so keep this conversion optional too. Let's leave the rest for later; we'll see what the demand is for them.
If it's possible to list the supported formats on the import form's description text, that would be great.
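One way to keep the form's description honest is to build it from whichever converters are actually importable. This is a minimal sketch assuming each format declares the optional module it needs. The table and the "mediawiki" module name are hypothetical placeholders, not the real package names.

```python
import importlib.util

# Hypothetical table: format name -> (extensions, optional module or None)
FORMATS = {
    "Markdown": ((".md", ".markdown"), None),
    "MediaWiki": ((".mediawiki", ".wiki"), "mediawiki"),  # placeholder name
}


def available_formats(formats):
    """List formats whose optional dependency (if any) can be imported."""
    names = []
    for name, (exts, module) in formats.items():
        if module is None or importlib.util.find_spec(module) is not None:
            names.append("{} ({})".format(name, ", ".join(exts)))
    return names


def form_description(formats):
    """Text for the import form listing the currently supported formats."""
    return "Supported page formats: " + "; ".join(available_formats(formats)) + "."
```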
Which reminds me, a separate issue is that we need an individual tool importer for GitHub wiki. That is, specifically, a GitHubWikiImportController set on the importer's controller attribute.
Created:
Closed #438. Force-pushed je/42cc_6534
It's possible to embed images and link to images and files (external and internal) using gollum tags (https://github.com/gollum/gollum/wiki#file-links). Should we handle those too? It would require importing all the images/files from the GitHub wiki repo into Allura as attachments or something. Should we do this right now?
Now I think we should go with html2text conversion of the HTML rendered by h.render_any_markup() for all input formats. There are several reasons for that:
[[_TOC_]] tag directly into markdown [TOC] tag.
Are you OK with that?
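The [[_TOC_]] -> [TOC] conversion needs some care if the tag has to survive a render-to-HTML and html2text round trip. One approach, offered as an assumption rather than a tested design, is to swap the tag for a plain alphanumeric placeholder before rendering and restore it as [TOC] afterwards. The constant and function names are hypothetical.

```python
TOC_TAG = "[[_TOC_]]"
# Plain alphanumeric token, chosen on the assumption that neither the
# markup renderers nor html2text will escape or alter it.
TOC_PLACEHOLDER = "GOLLUMTOCPLACEHOLDER"


def protect_toc(source):
    """Swap Gollum's TOC tag for a placeholder before rendering to HTML."""
    return source.replace(TOC_TAG, TOC_PLACEHOLDER)


def restore_toc(markdown):
    """Turn the placeholder into the Markdown [TOC] tag after html2text."""
    return markdown.replace(TOC_PLACEHOLDER, "[TOC]")
```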
Yep, makes sense.
Like I've mentioned before, html2text needs to stay optional, so keep that conversion (and everything that must happen after it) optional, depending on whether html2text is installed. If it's not installed, you'll just get a simpler, less complete conversion.
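The usual Python pattern for an optional dependency is a guarded import with a fallback. This sketch uses html2text.html2text() from the html2text package. When the package is missing, it returns the HTML unchanged as the "simpler, less complete conversion". The wrapper name is hypothetical.

```python
try:
    import html2text  # GPL; optional dependency
except ImportError:
    html2text = None


def html_to_markdown(html):
    """Convert rendered HTML to Markdown when html2text is installed.

    Without html2text the HTML is returned unchanged, so it can still be
    saved in the page body as-is.
    """
    if html2text is None:
        return html
    return html2text.html2text(html)
```

Any post-html2text steps (such as tag fix-ups) belong inside the same "html2text is not None" branch, so that they are skipped along with it.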
What sort of internal files are possible? To files in the git repo? Or can you have images & files in the wiki itself? Have any examples?
I think external references can be kept as-is.
You can have any type of file in the wiki repo and reference it from a page. When an image is referenced from the page text, it ends up embedded in the page; when something else is referenced (e.g. a PDF), it is just displayed as a link to the file.
I've created an example page showing these capabilities: https://github.com/jetmind/dot/wiki/Files
Source looks like this:
Also, when embedding an image, there are a couple of available options (like show in frame, resize, align, etc.). I'm not sure we can convert all of those.
I think we can upload such files as attachments to the pages they are referenced from, and convert the links/embed tags to Markdown format.
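A rough sketch of the proposed conversion. It assumes the Gollum forms [[/path/image.png|option|alt=Text]] for images and [[Link Text|/path/file.pdf]] for files. attachment_url is a hypothetical callback mapping a repo path to the uploaded attachment's URL. Only alt= is kept; frame, align and size options are dropped. External URLs are kept as-is. The function also returns the internal paths that would need uploading.

```python
import posixpath
import re

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}  # assumed set
GOLLUM_TAG = re.compile(r"\[\[([^\]]+)\]\]")


def is_external(target):
    return target.startswith(("http://", "https://"))


def convert_file_links(text, attachment_url):
    """Convert Gollum image/file tags to Markdown.

    Returns (converted_text, internal_paths); internal_paths lists repo
    files that would need to be uploaded as page attachments.
    """
    internal = []

    def url_for(target):
        if is_external(target):
            return target  # external references are kept as-is
        internal.append(target)
        return attachment_url(target)

    def replace(match):
        parts = [p.strip() for p in match.group(1).split("|")]
        if posixpath.splitext(parts[0])[1].lower() in IMAGE_EXTS:
            # Image embed: keep only alt=, drop the other options.
            alt = posixpath.basename(parts[0])
            for opt in parts[1:]:
                if opt.startswith("alt="):
                    alt = opt[len("alt="):]
            return "![{}]({})".format(alt, url_for(parts[0]))
        target = parts[-1]
        if target.startswith("/") or is_external(target):
            # File link: [[Link Text|/path/to/file]] or [[/path/to/file]]
            label = parts[0] if len(parts) > 1 else posixpath.basename(target)
            return "[{}]({})".format(label, url_for(target))
        return match.group(0)  # page links and other tags: left unchanged

    return GOLLUM_TAG.sub(replace, text), internal
```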
External references can be kept as-is, indeed.
Closed #443. je/42cc_6534
Ok, I see. I made [#6673] for it. We should definitely track this need, but I don't want to try to do too much all at once on this ticket :)
Closed #441, #440. je/42cc_6534