See [#7828] for an analysis of where DiffInfoDoc is used. The goal is to remove it altogether and use SCM data directly, then also remove the building of DiffInfoDoc records during repo refresh. (If there are really slow computations, we could keep using DiffInfoDoc as a cache of those results.)
Closed #736, #737, #738.
git & svn:
allura:ib/7837
mercurial: https://sourceforge.net/p/forgehg/code/merge-requests/2/
After going through this for DiffInfoDoc, any new thoughts on how much work the other repo models would take? Or is each one probably unique in how it is used, the effort to make changes, architecture, performance, etc.?
Yeah, I think each one is unique. My impression is that `DiffInfoDoc` was the easiest, actually :) It was used only for getting the list of files that changed in a commit. The others (like `CommitDoc`, for example) are used much more and sometimes depend on each other.

The new form of `paged_diffs`, which uses `log`, looks much better than the old logic in `refresh_commit_info`, which used `info2`. However, the `ClientError`s that the old implementation handled still seem to be something we need to address. I tried importing https://svn.code.sf.net/p/tango-ds/code/ since I knew it had those "non-existent in revision" errors. It is a big repo (it took a few hours to import and index); I'm sure you can find smaller ones that exhibit the problem. If you run a refresh on master, you can see the revisions it complains about. Then, on your branch, go to the commit view for that revision + 1. It will show no changed files, but it should show some (compared to master). It does seem to know how many changes there are, because e.g. https://sf-dbrondsema-1015.sb.sf.net/p/project2/tango-ds/15818/ lists 3 pages (but no items on those pages).

I dug into it a little bit and apparently `p['action'] == 'R'`, so we need to figure out what to do with that. I'd also recommend adding this so that any `ClientError` that might happen does not get silently swallowed:

```diff
 except pysvn.ClientError:
+    log.info('Error getting paged_diffs log of %s on %s', commit_id, self._url, exc_info=True)
     return result
```
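A minimal sketch of a `log`-based classifier, assuming each changed-path entry is a dict with the `'action'` and `'path'` keys that pysvn's log entries expose. `classify_changed_paths` is a hypothetical name, and folding `'R'` into `'changed'` is an assumption, not a decision from this thread. It shows one plausible way to end up with pages that have no items: `'total'` counts every path, so an action code without its own branch would be counted but never listed.

```python
# Hypothetical sketch, not the branch's code. Entries are plain dicts that
# mimic pysvn changed-path records ('action' is 'A', 'D', 'M' or 'R').
# Treating 'R' (replaced) as 'changed' is an assumption.


def classify_changed_paths(changed_paths, start=0, end=None):
    """Split one revision's changed paths into added/removed/changed lists."""
    result = {'total': 0, 'added': [], 'removed': [], 'changed': []}
    paths = sorted(changed_paths, key=lambda p: p['path'])
    # 'total' drives the page count, so it covers every path in the revision...
    result['total'] = len(paths)
    for p in paths[start:end]:
        action = p['action']
        if action == 'A':
            result['added'].append(p['path'])
        elif action == 'D':
            result['removed'].append(p['path'])
        elif action in ('M', 'R'):
            # ...so without an explicit 'R' branch, replaced paths would be
            # counted in 'total' but appear in none of the lists.
            result['changed'].append(p['path'])
    return result


entries = [
    {'action': 'R', 'path': '/tags/x/a.c'},
    {'action': 'A', 'path': '/tags/x'},
]
print(classify_changed_paths(entries))
# {'total': 2, 'added': ['/tags/x'], 'removed': [], 'changed': ['/tags/x/a.c']}
```

Paging slices only the listed items; `'total'` always reflects the whole revision.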
Closed #749. Updated `ib/7873`.
FYI your new branch is ib/7873 and previous was ib/7837 :)
I think 'R' is sometimes used when it's not really a replacement, but we can only work with the data we get. The 'R' might come from situations like this one: http://svn.haxx.se/users/archive-2005-06/0711.shtml, particularly when tagging repositories, since the trunk/tags/branches checkouts might be separate on the machine of the person doing the tagging. Perhaps we could use some heuristics to see whether the file existed in the previous revision, but let's not go down that path yet :) I think we can live with this.
I'm seeing an 'added' directory not showing up. Example: tango-ds r98. The `paged_diffs` method returns it:
{'total': 3, 'removed': [], 'added': [u'/Servers/Acquisition/Ccd/ExampleServer/tags/Release_2_1'], 'changed': [u'/Servers/Acquisition/Ccd/ExampleServer/tags/Release_2_1/Makefile', u'/Servers/Acquisition/Ccd/ExampleServer/tags/Release_2_1/Pilatus.cpp']}
but the added folder does not show up in the template. (Perhaps it is an unrelated bug, but we might as well fix it now.)

I also note that SVN provides the copied-from information (`p.copyfrom_path`) for that 'added' entry. If that is present, can we use the status 'copied' and set the `.old` and `.new` attributes so that `repo/commit.html` will show it?

Actually, maybe `copyfrom_path` would help us handle the "R" status better. Here's an example where files were copied to tag a new release. They come through as "R", but they do have "copy from" information. So it might work well to have that take precedence over the R status, and thus they'd all be marked as 'copied'.

Git can tell us about copied files too, with the `--find-copies` and `--find-copies-harder` options. However, `--find-copies-harder` can be expensive on large repos since it looks through all the files. And `--find-copies` alone doesn't seem very useful, since it only finds copies whose source was modified in the same changeset, which seems quite rare. I think we can add `--find-copies-harder` at some future point, with extra logic to make sure it performs well.

Closed #750.
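A minimal sketch of the `copyfrom_path` precedence proposed in the comment above, assuming changed-path entries are plain dicts with the `'action'`, `'path'` and `'copyfrom_path'` keys that pysvn's log entries expose. `classify_with_copies` and the `{'old': ..., 'new': ...}` shape of each copied entry are hypothetical choices for illustration.

```python
# Hypothetical sketch: copy-from info wins over the 'A'/'R' action code,
# so paths copied to tag a release are reported as copies.


def classify_with_copies(changed_paths):
    """Classify paths, marking anything with copy-from info as 'copied'."""
    result = {'total': 0, 'added': [], 'removed': [], 'changed': [], 'copied': []}
    paths = sorted(changed_paths, key=lambda p: p['path'])
    result['total'] = len(paths)
    for p in paths:
        if p.get('copyfrom_path'):
            # Keep both endpoints so a template can show old -> new.
            result['copied'].append({'old': p['copyfrom_path'], 'new': p['path']})
        elif p['action'] == 'A':
            result['added'].append(p['path'])
        elif p['action'] == 'D':
            result['removed'].append(p['path'])
        else:
            # 'M', plus any 'R' that has no copy-from info
            # (an unknown action code would also land here).
            result['changed'].append(p['path'])
    return result


entries = [
    {'action': 'A', 'path': '/tags/r1', 'copyfrom_path': '/trunk'},
    {'action': 'R', 'path': '/tags/r1/a.c', 'copyfrom_path': '/trunk/a.c'},
    {'action': 'R', 'path': '/tags/r1/b.c', 'copyfrom_path': None},
]
print(classify_with_copies(entries)['copied'])
```

Each copied entry here is a plain dict. Jinja2's `c.old` syntax falls back to item lookup, so a Jinja2 template could read it either way, but that is worth confirming against `repo/commit.html`.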
This time it's `ib/7837` again (force-push) :)