#7124 Validate Trac URLs before importing

v1.2.0
closed
General
2015-08-20
2014-01-31
Cory Johns
No

https://sourceforge.net/nf/admin/task_manager/view/52ebe3f21be1ce2af7b78b31

Validate Trac URLs so we don't get 404s or spurious NoneType errors during import.

Discussion

  • Cory Johns - 2014-02-03

    allura:cj/7124
    tracwikiimporter:cj/7124
    sfx:cj/7124

     
  • Cory Johns - 2014-02-03
    • status: in-progress --> code-review
    • Size: 1 --> 2
     
  • Dave Brondsema

    Dave Brondsema - 2014-02-04
    • QA: Dave Brondsema
     
  • Dave Brondsema

    Dave Brondsema - 2014-02-04
    • status: code-review --> in-progress
     
  • Dave Brondsema

    Dave Brondsema - 2014-02-04

    Can we de-dupe the block of code following # remove extraneous /wiki/[PageName], which is in both repos?

    It would be nice to validate more than a 200 response, since I could try to import from http://google.com and it'll let me. I don't see any HTML or HTTP header that would be useful to check though.

     
  • Cory Johns - 2014-02-04

    The URL checks for all the other importer types are all just doing a 200 check. I agree something more robust would be nice but this will catch all the errors we've seen so far.

    The URL normalization in tracwikiexporter.scripts.wiki_from_trac.extractors.WikiExporter should be redundant now, since it should happen in the controller, so we could just remove it, but I wasn't sure why it was put so deep in the first place so I left it.

    We could also switch the WikiExtractor code to use the validator to do the normalization, which would at least reduce the duplication.

     
  • Dave Brondsema

    Dave Brondsema - 2014-02-04

    I put the URL normalization in the WikiExporter because that class can be used from elsewhere besides the form controller. For example, the commandline script tracwikiimporter/scripts/wiki_from_trac/wiki_from_trac.py. Having it use the validator's normalization sounds good to me

     
  • Cory Johns - 2014-02-04

    Change pushed to tracwikiimporter:cj/7124

     
  • Cory Johns - 2014-02-04
    • status: in-progress --> code-review
     
  • Dave Brondsema

    Dave Brondsema - 2014-02-05
    • status: code-review --> in-progress
     
  • Dave Brondsema

    Dave Brondsema - 2014-02-05
    ERROR: tracwikiimporter.tests.test_importer:TestWikiExporter.test_page_list__special_chars
      vim +184 tracwikiimporter/tests/test_importer.py  # test_page_list__special_chars
        exp = WikiExporterFunnyPages('http://example.com/trac/', Mock())
      vim +98  tracwikiimporter/scripts/wiki_from_trac/extractors.py  # __init__
        self.base_url = TracURLValidator().to_python(base_url)
      vim +419 /var/local/env-allura/lib/python2.7/site-packages/FormEncode-1.2.4-py2.7.egg/formencode/api.py  # to_python
        value = tp(value, state)
      vim +43  /home/dbrondsema/dbrondsema-1019/forge/ForgeImporters/forgeimporters/trac/__init__.py  # _to_python
        raise fev.Invalid(self.message('unavailable', state), value, state)
    Invalid: This project is unavailable for import
    
    ERROR: tracwikiimporter.tests.test_importer:TestWikiExporter.test_url_canonicalization
      vim +169 tracwikiimporter/tests/test_importer.py  # test_url_canonicalization
        self.assertEqual(WikiExporter('http://foo.com/wiki/bar/', None).base_url,
      vim +98  tracwikiimporter/scripts/wiki_from_trac/extractors.py  # __init__
        self.base_url = TracURLValidator().to_python(base_url)
      vim +419 /var/local/env-allura/lib/python2.7/site-packages/FormEncode-1.2.4-py2.7.egg/formencode/api.py  # to_python
        value = tp(value, state)
      vim +43  /home/dbrondsema/dbrondsema-1019/forge/ForgeImporters/forgeimporters/trac/__init__.py  # _to_python
        raise fev.Invalid(self.message('unavailable', state), value, state)
    Invalid: This project is unavailable for import
    
     
  • Cory Johns - 2014-02-05
    • status: in-progress --> code-review
     
  • Cory Johns - 2014-02-05

    Test fixes pushed to:
    tracwikiimporter:cj/7124

    Unchanged:
    allura:cj/7124
    sfx:cj/7124

     
  • Dave Brondsema

    Dave Brondsema - 2014-02-05
    • status: code-review --> in-progress
     
  • Dave Brondsema

    Dave Brondsema - 2014-02-05

    Got this when trying to run a trac wiki import:

    21:43:56,170 ERROR [taskd:forgeimporters.base.import_tool:52f2b099d8be352dc973da38:allura.model.monq_model] Error "global name 'urlopen' is not defined" on job <MonQTask 52f2b099d8be352dc973da38 (busy) P:10 forgeimporters.base.import_tool h1v1019.sb.sf.net pid 12628 project:/p/testit/ app:admin user:admin1>
    Traceback (most recent call last):
      File "/home/dbrondsema/dbrondsema-1019/forge/Allura/allura/model/monq_model.py", line 267, in __call__
        self.result = func(*self.args, **self.kwargs)
      File "/home/dbrondsema/dbrondsema-1019/forge/ForgeImporters/forgeimporters/base.py", line 131, in import_tool
        mount_point=mount_point, mount_label=mount_label, **kw)
      File "/nfs/home/dbrondsema/tracwikiimporter/tracwikiimporter/importer.py", line 116, in import_tool
        WikiExporter(trac_url, options).export(f)
      File "/nfs/home/dbrondsema/tracwikiimporter/tracwikiimporter/scripts/wiki_from_trac/extractors.py", line 103, in export
        for title in self.page_list():
      File "/nfs/home/dbrondsema/tracwikiimporter/tracwikiimporter/scripts/wiki_from_trac/extractors.py", line 131, in page_list
        r = self.fetch(url)
      File "/nfs/home/dbrondsema/tracwikiimporter/tracwikiimporter/scripts/wiki_from_trac/extractors.py", line 126, in fetch
        return urlopen(url)
    NameError: global name 'urlopen' is not defined
    
     
  • Cory Johns - 2014-02-06
    • status: in-progress --> code-review
     
  • Cory Johns - 2014-02-06

    Forgot the aliasing of that in:
    tracwikiimporter:cj/7124

     
  • Dave Brondsema

    Dave Brondsema - 2014-02-07

    Found one other issue, unhandled errors from requests if an invalid domain is entered, for example. This should handle it: https://sourceforge.net/p/allura/pastebin/52f4fee43e5e8346ccf15780 If that's good by you, I can commit & merge the branches (there's some rebasing and conflicts that I've got resolved locally already)

     
  • Cory Johns - 2014-02-07

    +1

     
  • Dave Brondsema

    Dave Brondsema - 2014-02-07
    • status: code-review --> closed
     
  • Dave Brondsema

    Dave Brondsema - 2015-01-05
    • Milestone: unreleased --> asf_release_1.2.0
     

Log in to post a comment.