#7837 Use repo directly instead of DiffInfoDoc

v1.3.0
closed
General
2015-08-20
2015-02-20
No

See [#7828] for analysis of where DiffInfoDoc is used. Goal is to remove it altogether, using SCM data directly. Then also remove the building of DiffInfoDoc records during repo refresh. (If there are really slow computations we could keep using DiffInfoDoc for a cache of those results)

Related

Tickets: #7828

Discussion

  • Dave Brondsema

    Dave Brondsema - 2015-02-23
    • labels: sf-current, indexless --> sf-current, indexless, sf-8
     
  • Igor Bondarenko

    Igor Bondarenko - 2015-02-24
    • Owner: Anonymous --> Igor Bondarenko
    • Labels: sf-current, indexless, sf-8 --> 42cc, sf-8, sf-current, indexless
    • Status: open --> in-progress
     
  • Igor Bondarenko

    Igor Bondarenko - 2015-03-05
    • status: in-progress --> review
     
  • Dave Brondsema

    Dave Brondsema - 2015-03-05

    After going through this for DiffInfoDoc, any new thoughts about how much work for other repo models? Or probably each is unique in how it is used, effort to make changes, architecture, performance, etc.

     
    • Igor Bondarenko

      Igor Bondarenko - 2015-03-06

      Yeah, I think each one is unique. My impression is DiffInfoDoc was the easiest, actually :) It was used only for getting list of files that changed in a commit. And others (like CommitDoc, for example) are used much more and sometimes depend on each other.

       
  • Dave Brondsema

    Dave Brondsema - 2015-03-12
    • Reviewer: Dave Brondsema
     
  • Dave Brondsema

    Dave Brondsema - 2015-03-13
    • status: review --> in-progress
     
  • Dave Brondsema

    Dave Brondsema - 2015-03-13

    The new form of paged_diffs which uses log looks much better than the old logic in refresh_commit_info which used info2. However the ClientErrors that the old implementation handled seem to still be something we need to address. I tried importing https://svn.code.sf.net/p/tango-ds/code/ since I knew it had those "non-existent in revision" errors. It is a big repo (took few hours to import and index); I'm sure you can find others that are smaller that exhibit it problem. If you run a refresh on master you can see the revisions it complains about. Then on your branch, go to the commit view for that revision + 1. It will show no changed files, but it should show some (compared to master). It does seem to know how many changes there are, because e.g. https://sf-dbrondsema-1015.sb.sf.net/p/project2/tango-ds/15818/ lists 3 pages (but no items on those pages).

    I dug into it a little bit and apparently p['action'] == 'R' so we need to figure out what to do with that. I'd also recommend adding this so any ClientError that might happen does not get swallowed:

             except pysvn.ClientError:
    +            log.info('Error getting paged_diffs log of %s on %s', commit_id, self._url, exc_info=True)
                 return result
    
     
  • Igor Bondarenko

    Igor Bondarenko - 2015-03-16
    • status: in-progress --> review
     
  • Igor Bondarenko

    Igor Bondarenko - 2015-03-16

    Closed #749. Updated ib/7873

     
  • Dave Brondsema

    Dave Brondsema - 2015-03-16
    • status: review --> in-progress
     
  • Dave Brondsema

    Dave Brondsema - 2015-03-16

    FYI your new branch is ib/7873 and previous was ib/7837 :)

    I think sometimes 'R' is used when it's not really a replacement, but we can only work with the data we get. I think the 'R' might come from situations like this http://svn.haxx.se/users/archive-2005-06/0711.shtml particularly when tagging repositories since trunk/tags/branches checkouts might be separate on the machine of the person doing the tagging. Perhaps we could do some heuristics to see if the file existed in the previous revision, but lets not go down that path yet :) I think we can live with this.

    I'm seeing an 'added' directory not showing up. Example: tango-ds r98. The paged_diffs method returns it {'total': 3, 'removed': [], 'added': [u'/Servers/Acquisition/Ccd/ExampleServer/tags/Release_2_1'], 'changed': [u'/Servers/Acquisition/Ccd/ExampleServer/tags/Release_2_1/Makefile', u'/Servers/Acquisition/Ccd/ExampleServer/tags/Release_2_1/Pilatus.cpp']} but the added folder does not show up in the template. (Perhaps some unrelated bug, but might as well fix now)

    I also note that SVN provides the copied from information (p.copyfrom_path) for that 'added' file. If that is present, can we use status 'copied' and set the .old and .new attributes so that repo/commit.html will show it?

     
    • Dave Brondsema

      Dave Brondsema - 2015-03-16

      Actually maybe the copyfrom_path would help us handle the "R" status better. Here's an example where files were copied to tag a new release. They come through as "R" but they do have "copy from" information. So it might work well have that take precedence over the R status, and thus they'd all be marked as 'copied'.

      $ svn log -r361 file:///svn/p/project2/tango-ds -v
      ------------------------------------------------------------------------
      r361 | jblume | 2009-04-09 11:21:21 +0000 (Thu, 09 Apr 2009) | 1 line
      Changed paths:
         A /Servers/Motion/OmsVme58/tags/Release_1_2 (from /Servers/Motion/OmsVme58/trunk:359)
         R /Servers/Motion/OmsVme58/tags/Release_1_2/Makefile (from /Servers/Motion/OmsVme58/trunk/Makefile:360)
         R /Servers/Motion/OmsVme58/tags/Release_1_2/OmsVme58.cpp (from /Servers/Motion/OmsVme58/trunk/OmsVme58.cpp:360)
         R /Servers/Motion/OmsVme58/tags/Release_1_2/OmsVme58.h (from /Servers/Motion/OmsVme58/trunk/OmsVme58.h:360)
         R /Servers/Motion/OmsVme58/tags/Release_1_2/OmsVme58Class.cpp (from /Servers/Motion/OmsVme58/trunk/OmsVme58Class.cpp:360)
      
      Tagging the Release_1_2 of the OmsVme58 project.
      ------------------------------------------------------------------------
      
       
  • Dave Brondsema

    Dave Brondsema - 2015-03-16

    Git can tell us about copied file info too, with the --find-copies and --find-copies-harder options. However --find-copies-harder can be expensive on large repos since it is looking through all the files. And just --find-copies doesn't seem too useful on its own since it only finds files where the source was modified in the same changeset and that seems quite rare. I think we can add --find-copies-harder at some future point with extra logic to make sure it performs well.

     
  • Igor Bondarenko

    Igor Bondarenko - 2015-03-30
    • status: in-progress --> review
     
  • Igor Bondarenko

    Igor Bondarenko - 2015-03-30

    Closed #750.

    This time it's ib/7837 again (force-push) :)

     
  • Dave Brondsema

    Dave Brondsema - 2015-03-30
    • status: review --> closed
     
  • Dave Brondsema

    Dave Brondsema - 2015-04-06
    • labels: 42cc, sf-8, sf-current, indexless --> 42cc, sf-8, indexless
     
  • Igor Bondarenko

    Igor Bondarenko - 2015-06-18
    • Milestone: unreleased --> asf_release_1.3.0
     

Log in to post a comment.