#5733 Improve performance of Commit._diffs_copied

v1.3.2
closed
nobody
General
nobody
2015-08-11
2013-02-01
Cory Johns
No

Commit._diffs_copied() is used to determine if a removed blob was actually moved or renamed, possibly with some changes. However, it is called every time a commit is viewed and hits every file removed from a commit, and it is slow enough to be a problem.

Some ideas for optimizing it:

  • Short-circuit identical blob comparisons by comparing the blob hashes first, as is already done with trees
  • Use SequenceMatcher.real_quick_ratio() to get an upper bound on the ratio and exclude obvious non-matches quickly, probably followed by quick_ratio() and/or ratio() to confirm a match
  • Raise the DIFF_SIMILARITY_THRESHOLD and break after the first match instead of continuing to test all files (though this could give false matches, so maybe skip this one)
  • Exclude binary or particularly large blobs
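
A rough sketch of how the hash short-circuit, the binary/size exclusion, and the tiered SequenceMatcher checks might fit together. This is not the existing _diffs_copied() code: the Blob class, SIMILARITY_THRESHOLD, MAX_BLOB_SIZE, looks_binary() and similarity() are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical stand-ins; the real DIFF_SIMILARITY_THRESHOLD may differ.
SIMILARITY_THRESHOLD = 0.5
MAX_BLOB_SIZE = 1_000_000  # bytes, arbitrary cutoff for "particularly large"


@dataclass
class Blob:
    """Hypothetical minimal blob: an object id (hash) plus raw content."""
    oid: str
    data: bytes


def looks_binary(data: bytes) -> bool:
    # Common heuristic: a NUL byte near the start suggests binary content.
    return b"\0" in data[:8192]


def similarity(removed: Blob, added: Blob,
               threshold: float = SIMILARITY_THRESHOLD) -> float:
    """Return the similarity ratio, or 0.0 if the pair is ruled out."""
    # 1. Identical hashes mean identical content; no diffing needed.
    if removed.oid == added.oid:
        return 1.0

    # 2. Skip binary or very large blobs entirely.
    for blob in (removed, added):
        if len(blob.data) > MAX_BLOB_SIZE or looks_binary(blob.data):
            return 0.0

    a = removed.data.decode("utf-8", "replace").splitlines()
    b = added.data.decode("utf-8", "replace").splitlines()
    matcher = SequenceMatcher(None, a, b)

    # 3. Cheapest upper bound first (based on sequence lengths only).
    if matcher.real_quick_ratio() < threshold:
        return 0.0
    # 4. Tighter upper bound (based on element counts, ignoring order).
    if matcher.quick_ratio() < threshold:
        return 0.0
    # 5. Only now pay for the full comparison.
    ratio = matcher.ratio()
    return ratio if ratio >= threshold else 0.0
```

Because real_quick_ratio() and quick_ratio() are both upper bounds on ratio(), a pair rejected by either one could not have passed the full check, so the early exits should not change which pairs count as matches.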

Finally, we should almost certainly move this computation to compute_diffs() instead of doing it every time the commit's diffs are used.

Also, children of removed trees (or of the removed side of moved/renamed trees) are currently left out of the diff to avoid hitting this performance issue too often. As a result, the added side of a moved/renamed tree looks like brand-new files. Once _diffs_copied() is reasonably fast and/or pre-computed, the removed-trees short-circuit in compute_diffs() should be taken out.

Related

Tickets: #5738
Tickets: #5783

Discussion

  • Cory Johns - 2013-02-01
    • labels: --> performance, scm
     
  • Dave Brondsema - 2014-04-01

    We should really consider getting diff info from the SCM directly instead of trying to do it ourselves. (see 'indexless' tickets)

     
  • Dave Brondsema - 2015-08-11
    • status: open --> closed
     
  • Dave Brondsema - 2015-08-11

    This was fixed (by using SCM directly instead of doing it ourselves) in [#7925]

     

    Related

    Tickets: #7925

  • Dave Brondsema - 2015-12-08
    • Milestone: unreleased --> v1.3.2
     
