#7935 Forum importer for allura's own export format

v1.13.0
closed
Fahri
import (116)
General
2021-05-17
2015-07-21
No

Related

Git: 5e6c1d86913463000a44784e
Tickets: #8348

Discussion

  • Fahri - 2019-12-13

    Hey, I am currently working on this.

     
    • Dave Brondsema

      Dave Brondsema - 2019-12-15

      That is great! If you have any questions, feel free to ask and we'll see if we can help.

       
  • Dave Brondsema

    Dave Brondsema - 2020-01-17

    Hi Fahri,

    I saw your question on IRC about a forum export example, but then you disconnected. Here's a super simple one you can start with. I can give you more complex ones, with attachments, different users, etc, if you want. Also what problem are you getting with your own export? Does the taskd process log an error?

     
    • Fahri - 2020-01-18

      Hi Dave,

      Thanks for the example export of a discussion.

      Yes I do get an error. I get the following error after clicking the
      "Export"-Button at the admin section:

      Traceback (most recent call last):
      File "/home/fahri/env-allura/lib/python2.7/site-packages/tg/wsgiapp.py",
      line 120, in call
      response = self.wrapped_dispatch(controller, environ, context)
      File "/home/fahri/env-allura/lib/python2.7/site-packages/tg/wsgiapp.py",
      line 285, in _dispatch
      return controller(environ, context)
      File
      "/home/fahri/env-allura/lib/python2.7/site-packages/tg/controllers/dispatcher.py",
      line 119, in call
      response = self._perform_call(context)
      File "/home/fahri/src/fahri-allurafork/Allura/allura/lib/base.py", line
      42, in _perform_call
      return super(WsgiDispatchController, self)._perform_call(context)
      File
      "/home/fahri/env-allura/lib/python2.7/site-packages/tg/controllers/dispatcher.py",
      line 108, in _perform_call
      r = self._call(action, params, remainder=remainder, context=context)
      File
      "/home/fahri/env-allura/lib/python2.7/site-packages/tg/controllers/decoratedcontroller.py",
      line 131, in _call
      output = controller_caller(context_config, bound_controller_callable,
      remainder, params)
      File
      "/home/fahri/env-allura/lib/python2.7/site-packages/tg/decorators.py", line
      44, in _decorated_controller_caller
      return application_controller_caller(tg_config, controller, remainder,
      params)
      File
      "/home/fahri/env-allura/lib/python2.7/site-packages/tg/configuration/app_config.py",
      line 127, in call_controller
      return controller(remainder, params)
      File
      "/home/fahri/src/fahri-allurafork/Allura/allura/ext/admin/admin_main.py",
      line 680, in export
      "$project": {"_id": 0, "total_size": {"$divide": ["$total_size",
      1000000]
      }}
      File
      "/home/fahri/env-allura/lib/python2.7/site-packages/timermiddleware/init.py",
      line 120, in wrapper
      return self.run_and_log(func, inst, *args,
      kwargs)
      File
      "/home/fahri/env-allura/lib/python2.7/site-packages/timermiddleware/init.py",
      line 151, in run_and_log
      retval = func(
      args, kwargs)
      File
      "/home/fahri/env-allura/lib/python2.7/site-packages/pymongo/collection.py",
      line 1397, in aggregate
      "aggregate", self.__name,
      command_kwargs)
      File
      "/home/fahri/env-allura/lib/python2.7/site-packages/pymongo/database.py",
      line 345, in _command
      msg, allowable_errors)
      File
      "/home/fahri/env-allura/lib/python2.7/site-packages/pymongo/helpers.py",
      line 182, in _check_command_response
      raise OperationFailure(msg % errmsg, code, response)
      OperationFailure: command SON([('aggregate', u'attachment.files'),
      ('pipeline', [{'$match': {'_id': {'$in': []
      }}}, {'$group': {'total_size':
      {'$sum': '$length'}, '_id': 'total'}}, {'$project': {'total_size':
      {'$divide': ['$total_size', 1000000]}, '_id': 0}}])]) on namespace
      project-data.$cmd failed: The 'cursor' option is required, except for
      aggregate with the explain argument

       

      Last edit: Fahri 2020-01-18
  • Fahri - 2020-01-18

    Hi Dave,

    the current version of the discussion importer is able to import this example (This was the latest commit: https://forge-allura.apache.org/u/fahri/allurafork/ci/063507d5a23b611f3537f3434b4a7c172650b127/). I have added some screenshots as attachment.

    If I try to export my discussion at https://forge-allura.apache.org/u/fahri/discussion/ I receive the following E-Mail:

    The bulk export for project u/fahri is completed.

    The following tools were exported:
    - discussion

    Follow the instructions below to get your exported data.

    Sample instructions for u/fahri

    I would be glad if you send me more examples of a discussion export. Thanks in advance :)

     

    Last edit: Fahri 2020-01-18
  • Dave Brondsema

    Dave Brondsema - 2020-01-22

    Ok the error you got looks to be something that changed with mongodb 3.6 and we haven't updated for. I created ticket [#8348] for that. You could try a prior version of mongo, or work on code changes for [#8348] (if you do that, please make sure they are separate commits from import work, so it could be merged separately)

    Exports on https://forge-allura.apache.org aren't fully enabled. They run, but we don't have any way set up for users to access and download them.

    I'll get back to you again before long with more export examples. Exciting to see you got something working!

     

    Related

    Tickets: #8348

  • Dave Brondsema

    Dave Brondsema - 2020-01-29

    Here's another forum export, illustrating a number of different situations. Some things (like sticky & announcement posts) don't seem to be marked in export so guess you can't do anything about that during the import. Should be added to the export/API at some point.

     
  • Dave Brondsema

    Dave Brondsema - 2020-02-25

    If update to the latest 'master' code, that works with new versions of Mongodb now ([#8348] is fixed), so you can test your own exports if you want.

     

    Related

    Tickets: #8348

  • Fahri - 2020-02-26

    Thank you Dave!
    Unfortunately I have the next problem: If I try to clone or pull from my repository at forge allura I get this error:

    fahri@debian:~/git/fahri-allurafork$ git pull
    fatal: unable to access 'https://forge-allura.apache.org/git/u/fahri/allurafork/': The requested URL returned error: 500
    

    But it is still possible to clone the "master" code:
    git clone https://gitbox.apache.org/repos/asf/allura.git/ allura-git

     

    Last edit: Fahri 2020-02-26
    • Dave Brondsema

      Dave Brondsema - 2020-02-26

      Hi, sorry about that. I think its fixed now can you try again?

       
      • Fahri - 2020-02-27

        Yes, it works now. Thank you!

         
  • Dave Brondsema

    Dave Brondsema - 2020-03-04

    Hi,

    For the post with attachments, you don't necessarily need to reach the original URL. Forum exports may or may not include a copy of the attachments, in this case they did, so you can use "path": "discussion/5e32098a07ae3108f6c98c23/251636cb3f/be44/keepass.png", to locate the attachment file within the zip file.

    For the post with lots of replies, some of the comments should be nested replies. If you look at "slug": "0d1d/d421/f37d" for example in the json, that represents it is a nested child of "slug": "0d1d/d421" which is a nested child of "slug": "0d1d". If you can preserve the original slugs when creating the comments then I think that slug format should automatically make them be shown as nested.

    Otherwise looking pretty great!

    -Dave

     
    • Fahri - 2020-03-05

      Thanks for the suggestion! I have reworked the part where posts are added. I have also added a screenshot of the topic with lots of replies.
      But I have a question. Should it be possible to upload just a JSON File or should it be possible to upload an entire ZIP File? The current state of the importer only supports an upload of a JSON File.
      I have added screenshots of the upload mask as attachment.

      Edit: I have realized that nested posts weren't imported correctly: https://forge-allura.apache.org/p/allura/tickets/7935/#2dea

       

      Last edit: Fahri 2020-03-05
  • Fahri - 2020-03-05

    While creating tests for the controller I encountered another thing which I don't understand. I do use forge/test_tracker.py as an example for writing my tests. In the "test_index"-method of the "TestForgeTrackerImportController" class there is this path of a URL that I don't really understand: '/p/test/admin/bugs/_importer/'. How is this path assembled and from where does the "bugs" part come from?

        @with_tracker
        def test_index(self):
            r = self.app.get('/p/test/admin/bugs/_importer/')
            self.assertIsNotNone(r.html.find(attrs=dict(name="tickets_json")))
            self.assertIsNotNone(r.html.find(attrs=dict(name="mount_label")))
            self.assertIsNotNone(r.html.find(attrs=dict(name="mount_point")))
    

    I have used another path for my tests: "'/p/test/admin/ext/import/forge-discussion/'". When I use this path my tests pass. Is this also a valid solution?

    This is my test for the Controller:

    class TestForgeDiscussionController(TestController, TestCase):
    
         def setUp(self):
             super(TestForgeDiscussionController, self).setUp()
             from forgediscussion.forum_main import ForumAdminController
             ForumAdminController._importer = \                 discussion.ForgeDiscussionImportController(discussion.ForgeDiscussionImporter())
    
    
         @with_discussion
         def test_index(self):
             r = self.app.get('/p/test/admin/ext/import/forge-discussion/')
             self.assertIsNotNone(r.html.find(attrs=dict(name="discussions_json")))
             self.assertIsNotNone(r.html.find(attrs=dict(name="mount_label")))
             self.assertIsNotNone(r.html.find(attrs=dict(name="mount_point")))
    
    
         @with_discussion
         @mock.patch('forgeimporters.forge.discussion.save_importer_upload')
         @mock.patch('forgeimporters.base.import_tool')
         def test_create(self, import_tool, siu):
             project = M.Project.query.get(shortname='test')
             params = {
                 'discussions_json': webtest.Upload('discussions.json', b'{"key": "val"}'),
                 'mount_label': 'mylabel',
                 'mount_point': 'mymount',
             }
             r = self.app.post('/p/test/admin/ext/import/forge-discussion/create', params,
                               status=302)
             self.assertEqual(r.location, 'http://localhost/p/test/admin/')
             siu.assert_called_once_with(project, 'discussions.json', '{"key": "val"}')
             self.assertEqual(
                 'mymount', import_tool.post.call_args[1]['mount_point'])
             self.assertEqual(
                 'mylabel', import_tool.post.call_args[1]['mount_label'])
    
    
         @with_discussion
         @mock.patch('forgeimporters.forge.discussion.save_importer_upload')
         @mock.patch('forgeimporters.base.import_tool')
         def test_create_limit(self, import_tool, siu):
             project = M.Project.query.get(shortname='test')
             project.set_tool_data('ForgeDiscussionImporter', pending=1)
             ThreadLocalORMSession.flush_all()
             params = {
                 'discussions_json': webtest.Upload('discussions.json', b'{"key": "val"}'),
                 'mount_label': 'mylabel',
                 'mount_point': 'mymount',
             }
             r = self.app.post('/p/test/admin/ext/import/forge-discussion/create', params,
                               status=302).follow()
             self.assertIn('Please wait and try again', r)
             self.assertEqual(import_tool.post.call_count, 0)
    
     

    Last edit: Fahri 2020-03-06
  • Fahri - 2020-03-05

    Well, I have just realized that nested posts were still handled wrong by the importer. Here is a new screenshot of these posts which were imported by the new version of the importer.

     

    Last edit: Fahri 2020-03-05
  • Dave Brondsema

    Dave Brondsema - 2020-03-06
    • importing with a whole zip file would be nice to do, but could be a phase 2. It would take some additional work to be able to handle very large zip files, and existing importers don't handle them yet either. When it is added, it would be good to add it for discussions and tickets and share logic between them.
    • for tests, the /p/test project automatically is created. And @with_tracker creates a "bugs" ticket tracker on the /p/test project. Similarly, @with_discussion creates a "discussion" discussion tool. As for the _importer part of the URL, I can see in the setUp that it added in the setUp but I don't know offhand why it does it that way. Using the real URL like /p/test/admin/ext/import/forge-discussion definitely seems better to me. And then you don't need to do anything with _importer in the setUp
    • those nested posts look good now!

    Nice job so far, looks like you're getting quite close to having this all ready :D

     
  • Fahri - 2020-03-13

    Thank you for explaining this! I've just cleaned up my scripts so they match pep8 code style guidelines and made a merge request.

     

    Last edit: Fahri 2020-03-13
  • Dave Brondsema

    Dave Brondsema - 2020-03-20

    Hi, I haven't had a chance to look at this yet and it might take several days more as it is a big contribution. But I wanted to let you know I've seen it and will get to it when I can. Thanks for working on this!

     
    👍
    1
  • Dave Brondsema

    Dave Brondsema - 2020-03-20
    • status: open --> review
    • assigned_to: Fahri Korkmaz
    • Reviewer: Dave Brondsema
     
  • Dave Brondsema

    Dave Brondsema - 2020-04-13
    • status: review --> closed
     
  • Dave Brondsema

    Dave Brondsema - 2020-04-13

    Hi,

    After merging this I noticed a minor issue, but wanted to let you know and see if you were interested in doing a little bit more cleanup. PEP8 is a code style guideline for python and there are various tools to check and to enforce it. We're not super strict about it with Allura, but we like to follow the guidelines when possible. ForgeImporters/forgeimporters/tests/forge/test_discussion.py and ForgeImporters/forgeimporters/forge/discussion.py have quite a few places that don't follow PEP8 exactly. Do you want to see about tidying them up? Depending on your editor, you may have an option to automatically reformat the files. Or you can use a tool flake8 or autopep8. In all cases, please set the max line length to 119 (that is the setting we use for Allura, some tools may pick it up automatically but some might default to a smaller width can cause extra line wrapping which we don't like).

    Thanks!

     
    • Fahri - 2020-04-13

      Sure, I'll do that!

       
  • Dave Brondsema

    Dave Brondsema - 2021-05-17
    • Milestone: unreleased --> v1.13.0
     

Log in to post a comment.