#6464 Create tracker importer for Google Code using CSV and scraping

Cory Johns - 2013-07-24

Milestone: forge-aug-09 --> forge-backlog

Dave Brondsema - 2013-07-24

Milestone: forge-aug-09 --> forge-backlog

Cory Johns - 2013-07-26

Milestone: forge-backlog --> forge-aug-09

Description has changed:

Diff:

--- old
+++ new
@@ -2,6 +2,6 @@

 The importer should follow the framework discussed on the [mailing list](http://mail-archives.apache.org/mod_mbox/incubator-allura-dev/201307.mbox/%3CCAEMb8zUg7Kem2aDxVzAqF3U4aKEj7jL3UO=UpX=2+NfY_P8kXQ@mail.gmail.com%3E) and integrate with the project importer from [#6456].

-The list of tickets and their metadata can be retrieved via the CSV export list, e.g., https://code.google.com/p/modwsgi/issues/csv but the ticket body and comments will need to be scraped from the web interface.  The description and comments can be retrieved from, e.g., https://code.google.com/p/modwsgi/issues/detail?id=22 by iterating over the items with `id="hc\d+"` or `class="issuedescription|issuecomment"`.
+        The list of tickets and their metadata can be retrieved via the CSV export list, e.g., https://code.google.com/p/modwsgi/issues/csv but the ticket body and comments will need to be scraped from the web interface.  The description and comments can be retrieved from, e.g., https://code.google.com/p/modwsgi/issues/detail?id=22 by iterating over the items with `id="hc\d+"` or `class="issuedescription|issuecomment"`.

 The description and comments on issues don't support wiki syntax or HTML, so we can just convert them to text.  User mapping will have the same issues, so whatever we end up doing in [#6461] will apply here.

Size: --> 2

Tickets: ~~#6456~~
Tickets: ~~#6461~~
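
The two-step retrieval described in the diff above (issue list from the CSV export, body and comments scraped from the detail page) could be sketched roughly as follows. This is only an illustration using the standard library, not the importer's code: the names `CommentCollector`, `parse_issue_list`, and `scrape_comments` are hypothetical, the sample column names are assumed, and it assumes each description or comment lives in an element whose `id` matches `hc\d+`.

```python
import csv
import io
import re
from html.parser import HTMLParser

# Assumption: description/comment containers carry ids like "hc0", "hc1", ...
COMMENT_ID = re.compile(r'^hc\d+$')


class CommentCollector(HTMLParser):
    """Collect the plain text inside every element whose id matches hc<N>."""

    # Void elements never get a closing tag, so they must not affect depth.
    VOID = {'br', 'img', 'hr', 'input', 'meta', 'link'}

    def __init__(self):
        super().__init__()
        self.comments = []   # finished comment texts, in page order
        self._depth = 0      # nesting depth inside the current comment
        self._chunks = []    # text fragments of the current comment

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self._depth:
            self._depth += 1
        elif COMMENT_ID.match(dict(attrs).get('id') or ''):
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in self.VOID or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.comments.append(''.join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def parse_issue_list(csv_text):
    """Return one dict per issue row of the CSV export text."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def scrape_comments(html):
    """Return the text of each hc<N> element found in a detail page."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments
```

Fetching the two URLs is left out on purpose; the functions take already-downloaded text so they can be exercised against saved fixtures.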

Cory Johns - 2013-07-29

assigned_to: Cory Johns

Cory Johns - 2013-08-07

status: open --> code-review

Cory Johns - 2013-08-07

Why did you add a bunch of space to that line?

Cory Johns - 2013-08-07

Not sure why my comment didn't post when I changed the status (got a random 500), but here it is:

allura:cj/6464

I will probably add more tests with some actual HTML data but this is working and ready for review.

Dave Brondsema - 2013-08-07

QA: Dave Brondsema

Dave Brondsema - 2013-08-07

status: code-review --> in-progress

Dave Brondsema - 2013-08-07

Failure against https://code.google.com/p/google-code-feed-gadget/issues/detail?id=1 and http://code.google.com/p/modwsgi/issues/detail?id=11

      File "/home/dbrondsema/dbrondsema-1019/forge/ForgeImporters/forgeimporters/google/tracker.py", line 53, in import_tool
        self.process_fields(ticket, issue)
      File "/home/dbrondsema/dbrondsema-1019/forge/ForgeImporters/forgeimporters/google/tracker.py", line 82, in process_fields
        owner=issue.get_issue_owner(),
      File "/home/dbrondsema/dbrondsema-1019/forge/ForgeImporters/forgeimporters/google/__init__.py", line 166, in get_issue_owner
        return UserLink(self.page.find(id='issuemeta').find('th', text=re.compile('Owner:')).findNext().a)
      File "/home/dbrondsema/dbrondsema-1019/forge/ForgeImporters/forgeimporters/google/__init__.py", line 185, in __init__
        self.name = tag.string.strip()
    AttributeError: 'NoneType' object has no attribute 'string'
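
The traceback suggests the owner cell has no `<a>` element on these issues (presumably because they have no owner), so `tag` arrives as `None`. One possible guard is sketched below. The class here is a simplified, hypothetical stand-in and not the actual `UserLink` from `forgeimporters/google/__init__.py`; only the `name` attribute is taken from the traceback.

```python
class UserLink:
    """Hypothetical stand-in for the importer's wrapper around a user <a> tag."""

    def __init__(self, tag):
        # tag may be None when the "Owner:" cell contains no link at all,
        # and tag.string may be None when the link has no single text child.
        text = getattr(tag, 'string', None)
        self.name = text.strip() if text else None
```

Callers would then need to treat `name is None` as "no owner" instead of relying on an exception.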

Would we want to convert # of stars to # of upvotes?

Fields for type, priority, opsys, component (more possible?) should be added as custom fields and converted.

Need to use skip_mod_date (grep for examples) to preserve the mod_date you set.

Need to disable notifications. googlecodewikiimporter does this already, and for the Trac importer I suggested looking at a way to make it happen for all importers.

Need to call g.post_event('project_updated')

Everything is done as the current user. Would it be better to do it as *anonymous? That's what some of our other importers do.

Since GC tickets and comments are plain text, whitespace is significant and should be preserved, and special Markdown characters need to be escaped. See, e.g., http://code.google.com/p/modwsgi/issues/detail?id=1 and http://code.google.com/p/modwsgi/issues/detail?id=4#c5. To do so, use forgeblog.command.rssfeeds.plain2markdown(). That needs html2text, which is GPL'd, so make sure you handle the lack of html2text gracefully. (And if you refactor plain2markdown to a more generic place, make sure you update the reference to it in SF's forge-classic code.)
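
To illustrate the kind of conversion meant here, a minimal sketch follows. It is not Allura's `plain2markdown`: the function name `plain_to_markdown` is hypothetical, the set of escaped characters is assumed (the backslash-escapable characters from the original Markdown syntax), and it deliberately uses only the standard library, so it shows what a fallback might look like when html2text is unavailable. Line breaks and HTML entities are not handled.

```python
import re

# Assumption: escape the characters the original Markdown syntax documents
# as backslash-escapable: \ ` * _ { } [ ] ( ) # + - . !
MD_SPECIAL = re.compile(r'([\\`*_{}\[\]()#+\-.!])')
# Runs of two or more spaces, which Markdown/HTML rendering would collapse.
MULTI_SPACE = re.compile(r' {2,}')


def plain_to_markdown(text, preserve_multiple_spaces=False):
    """Backslash-escape Markdown specials; optionally keep runs of spaces."""
    escaped = MD_SPECIAL.sub(r'\\\1', text)
    if preserve_multiple_spaces:
        # Turn each space of a run into a non-breaking-space entity.
        escaped = MULTI_SPACE.sub(
            lambda m: '&nbsp;' * len(m.group()), escaped)
    return escaped
```

Escaping every `.` and `-` is heavier than strictly necessary but keeps the sketch simple; a real implementation would likely be more selective.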

Comments aren't posted on the Allura ticket in sequential order. They seem random.

Attachment on a comment didn't get imported (from modwsgi #1)

Cory Johns - 2013-08-13

status: in-progress --> code-review

Cory Johns - 2013-08-13

Changes force-pushed to:
allura:cj/6464

Dave Brondsema - 2013-08-14

status: code-review --> in-progress

Dave Brondsema - 2013-08-14

Needs to be rebased, there are some significant conflicts with master.

And then check to see if https://pypi.python.org/pypi/GoogleCodeWikiImporter needs corresponding changes too.

Better to call h.plain2markdown(..., preserve_multiple_spaces=True) than h.plain2markdown(..., True)

ForgeBlog/forgeblog/tests/test_commands.py:test_plain2markdown should be moved to Allura's helper test file. And would be very good to have a test case for the \\ you added to md_chars_matcher_all (was it just a typo fix?)

Our internal forge-classic repo needs changes to correspond to the plain2markdown/re_preserve_spaces changes.

Cory Johns - 2013-08-14

status: in-progress --> code-review

Cory Johns - 2013-08-14

allura:cj/6464
forge-classic:cj/6464
googlecodewikiimporter:cj/6464

Dave Brondsema - 2013-08-15

status: code-review --> in-progress

Dave Brondsema - 2013-08-15

HTML over-escaping on https://sf-dbrondsema-1015.sb.sf.net/p/modwsgi3r/tickets/11/

error on googlecodewikiimporter:cj/6464 https://sourceforge.net/p/allura/pastebin/520bf37f85540d536afb3f54

error: https://sourceforge.net/p/allura/pastebin/520cf2a9d46bb44f8e65e3c3

Cory Johns - 2013-08-19

status: in-progress --> code-review

Cory Johns - 2013-08-19

Force-pushed.
allura:cj/6464
forge-classic:cj/6464
googlecodewikiimporter:cj/6464

Dave Brondsema - 2013-08-20

~~over-encoded summary line~~

voting needs to be enabled on the ticket for you to see the votes

close milestones if all the tickets are closed

sort the milestones before saving them, so they show up sorted

wiki import is broken - I'm checking on this

preserve original ticket #s
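
The two milestone notes above (close a milestone when all of its tickets are closed, and sort milestones before saving) might be sketched like this. The function `build_milestones`, the `(milestone_name, is_closed)` input shape, and the `name`/`complete` keys are all hypothetical and are not taken from Allura's tracker model.

```python
def build_milestones(tickets):
    """Derive sorted milestone records from (milestone_name, is_closed) pairs."""
    all_closed = {}
    for name, is_closed in tickets:
        if not name:
            continue  # ticket without a milestone
        # A milestone stays "complete" only while every ticket seen is closed.
        all_closed[name] = all_closed.get(name, True) and is_closed
    # Plain lexicographic sort; version-aware ordering is out of scope here.
    return [{'name': name, 'complete': all_closed[name]}
            for name in sorted(all_closed)]
```

Note that a lexicographic sort puts `1.10` before `1.9`, so a real importer may want a version-aware key.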

Cory Johns - 2013-08-20

Fixes pushed.

Dave Brondsema - 2013-08-21

status: code-review --> closed

Milestone: forge-aug-09 --> forge-aug-23

Cory Johns - 2013-08-21

Oh man, that's nice to see. :-)
