{"thread": {"_id": "8fe1847f", "discussion_id": "545186e86d19cd63b88d1604", "subject": "", "limit": 25, "page": 0, "posts": [{"slug": "498b", "text": "- **labels**:  --> sf-2, sf-current\n", "subject": "#7925  Speed up diff processing with binary files", "author": "brondsem", "author_icon_url": "https://forge-allura.apache.org/u/brondsem/user_icon", "timestamp": "2015-07-13 15:53:20.175000", "last_edited": null, "attachments": [], "is_meta": true}, {"slug": "bd3e", "text": "- **status**: in-progress --> review\n", "subject": "#7925  Speed up diff processing with binary files", "author": "heiths", "author_icon_url": null, "timestamp": "2015-07-16 19:10:50.153000", "last_edited": null, "attachments": [], "is_meta": true}, {"slug": "b8a7", "text": "QA: **hs/7925**\r\n\r\nBinary files should no longer make XHR requests for diff processing.", "subject": "#7925  Speed up diff processing with binary files", "author": "heiths", "author_icon_url": null, "timestamp": "2015-07-16 19:10:50.511000", "last_edited": null, "attachments": [], "is_meta": false}, {"slug": "784a", "text": "- **labels**: sf-2, sf-current --> sf-2, sf-current, performance\n- **status**: review --> in-progress\n- **Reviewer**: Dave Brondsema\n", "subject": "#7925  Speed up diff processing with binary files", "author": "brondsem", "author_icon_url": "https://forge-allura.apache.org/u/brondsem/user_icon", "timestamp": "2015-07-21 21:29:39.009000", "last_edited": null, "attachments": [], "is_meta": true}, {"slug": "efe5", "text": "I think we need to skip binaries sooner, not just on the display side.  Server-side time plus the background task to \"refresh\" the repo takes forever still.  Like inside the `_diffs_copied` method.\r\n\r\nAlso we can do better text detection, using the existing `has_html_view` method.  It checks several things to determine if its text.  
Might want to rename the method, or alias it, though.", "subject": "#7925  Speed up diff processing with binary files", "author": "brondsem", "author_icon_url": "https://forge-allura.apache.org/u/brondsem/user_icon", "timestamp": "2015-07-21 21:29:39.911000", "last_edited": null, "attachments": [], "is_meta": false}, {"slug": "f3ef", "text": "- **status**: in-progress --> review\n", "subject": "#7925  Speed up diff processing with binary files", "author": "heiths", "author_icon_url": null, "timestamp": "2015-07-27 20:28:45.673000", "last_edited": null, "attachments": [], "is_meta": true}, {"slug": "0689", "text": "Good notes.  Based on your feedback I refactored paged_diffs to now rely on the SCM system.\r\n\r\nQA at: \r\n\r\nhs/7925\r\n&  \r\nhs/7925 on forgehg\r\n\r\n\r\nOther notes:\r\nGit has a few other interesting options for tweaking performance.  For example -- we could use a diff processing threshold when searching for copies.\r\n\r\nWe also could further improve the visual indicators when displaying copies vs renames etc (but that may be better in another ticket).", "subject": "#7925  Speed up diff processing with binary files", "author": "heiths", "author_icon_url": null, "timestamp": "2015-07-27 20:28:46.028000", "last_edited": null, "attachments": [], "is_meta": false}, {"slug": "0689/8ba2", "text": "We could specify the max number for the -C option.  We could also make this configurable via the ini.\r\n\r\n\r\n**git diff-tree:**\r\n\r\n**-l***\u003Cnum>*\r\n\r\n*The -M and -C options require O(n^2) processing time where n is the number of potential rename/copy targets. 
This option prevents rename/copy detection from running if the number of rename/copy targets exceeds the specified number.*", "subject": "#7925  Speed up diff processing with binary files", "author": "heiths", "author_icon_url": null, "timestamp": "2015-07-28 15:59:52.970000", "last_edited": null, "attachments": [], "is_meta": false}, {"slug": "2283", "text": "- **status**: review --> in-progress\n", "subject": "#7925  Speed up diff processing with binary files", "author": "brondsem", "author_icon_url": "https://forge-allura.apache.org/u/brondsem/user_icon", "timestamp": "2015-07-30 22:31:23.928000", "last_edited": null, "attachments": [], "is_meta": true}, {"slug": "1f05", "text": "The results here are great.  Including the repo refresh backend logic.  But it is several changes and some quite big changes, and so naturally there's a good handful of tweaks needed to polish it up: \r\n\r\n#### general\r\n* Now the commit view doesn't show binary diffs, good.  But the table listing all the files has binary files linked up still and the links don't go anywhere.\r\n* Can you add a test for the `has_html_view` method's new functionality for fast binary detection?\r\n* \"refresh\" logic is fast now too, yay!\r\n* I guess this should be a separate ticket, but it'd be nice to sort by filename across all change types, instead of showing adds, then removes, etc.  
Maybe same ticket as displaying copies vs renames better.\r\n* Down in the diff list, it says \"File was copied or renamed.\"  We should be able to say exactly which now.\r\n* A rename shows up as `{'new': u'README.txt', 'old': u'README', 'diff': '', 'ratio': 1}` in the diff section and also says `Can't load diff`\r\n    * Is it ok that we set diff to `''` in many places?\r\n\r\n#### hg & svn\r\n* The `[:]` slice would be better on the `for` loop than the `if` line right?\r\n\r\n#### hg\r\n* cleanup: move imports to top of file\r\n\r\n#### git\r\n* Testing with walrustech repo, in the 2nd commit, only the `Flan` dir shows up as having changes.  Nothing shown for `options.txt` or `bin/` or `mods/` but they did have changes.  You can see this with ?limit=1000.  And if you use the default limit, the pages at the end are all blank.\r\n* I think we don't want to use `--find-copies-harder`\r\n    * Performance wise on a big repo my timing measurement is 0m0.035s without it and 0m0.135s with it.  Noticeable but not huge\r\n    * A bigger impact is the semantics of it.  It can make an incorrect association of files being \"copied\" if the contents are common contents.  A very good example of common contents is no content, an empty file.  I've found a diff that says one `__init__.py` file was copied to another, but really it's just a new file.  And another file that is new but has a lot of test boilerplate so git thinks it's a 56% similar copy.  
Thus I think we should drop `--find-copies-harder`\r\n* After doing a straight copy or rename in git and committing it, I get:\r\n\r\n~~~~\r\nFile '/home/dbrondsema/dbrondsema-1019/forge/ForgeGit/forgegit/model/git_repo.py', line 682 in paged_diffs\r\n  for i in xrange(0, result['total'] + 1, 2)]\r\nIndexError: list index out of range\r\n~~~~\r\n\r\n", "subject": "#7925  Speed up diff processing with binary files", "author": "brondsem", "author_icon_url": "https://forge-allura.apache.org/u/brondsem/user_icon", "timestamp": "2015-07-30 22:31:24.351000", "last_edited": null, "attachments": [], "is_meta": false}, {"slug": "5f3b", "text": "And this ticket will also resolve [#7918] too. (Although I think the `[:]` loop issue needs causing a minor bug still)", "subject": "#7925  Speed up diff processing with binary files", "author": "brondsem", "author_icon_url": "https://forge-allura.apache.org/u/brondsem/user_icon", "timestamp": "2015-07-30 22:38:15.787000", "last_edited": null, "attachments": [], "is_meta": false}, {"slug": "ce9a", "text": "These are great notes.\r\n\r\nI was on the fence about *--find-copies-harder*.  I ended up using it because my testing showed slightly better results when detecting copies, but I did not consider (or test for) false positives.", "subject": "#7925  Speed up diff processing with binary files", "author": "heiths", "author_icon_url": null, "timestamp": "2015-07-31 14:50:57.593000", "last_edited": null, "attachments": [], "is_meta": false}, {"slug": "f4b5", "text": "- **status**: in-progress --> review\n- **Reviewer**: Dave Brondsema --> Heith Seewald\n", "subject": "#7925  Speed up diff processing with binary files", "author": "brondsem", "author_icon_url": "https://forge-allura.apache.org/u/brondsem/user_icon", "timestamp": "2015-08-05 16:09:40.314000", "last_edited": null, "attachments": [], "is_meta": true}, {"slug": "4b30", "text": "Fixes on `db/7925` on allura and forgehg repos.  
Followup ticket [#7949] for a few items.", "subject": "#7925  Speed up diff processing with binary files", "author": "brondsem", "author_icon_url": "https://forge-allura.apache.org/u/brondsem/user_icon", "timestamp": "2015-08-05 16:09:40.957000", "last_edited": null, "attachments": [], "is_meta": false}, {"slug": "7d62", "text": "- **labels**: sf-2, sf-current, performance --> sf-current, performance, sf-4\n- **status**: review --> closed\n", "subject": "#7925  Speed up diff processing with binary files", "author": "heiths", "author_icon_url": null, "timestamp": "2015-08-10 13:42:57.412000", "last_edited": null, "attachments": [], "is_meta": true}, {"slug": "8478", "text": "The changes you made looked really good and over all much cleaner.\r\n\r\nNice work!", "subject": "#7925  Speed up diff processing with binary files", "author": "heiths", "author_icon_url": null, "timestamp": "2015-08-10 13:42:58.220000", "last_edited": null, "attachments": [], "is_meta": false}, {"slug": "6f30", "text": "- **labels**: sf-current, performance, sf-4 --> performance, sf-4\n", "subject": "#7925  Speed up diff processing with binary files", "author": "brondsem", "author_icon_url": "https://forge-allura.apache.org/u/brondsem/user_icon", "timestamp": "2015-08-10 14:27:58.594000", "last_edited": null, "attachments": [], "is_meta": true}, {"slug": "8094", "text": "- **Milestone**: unreleased --> v1.3.2\n", "subject": "#7925  Speed up diff processing with binary files", "author": "brondsem", "author_icon_url": "https://forge-allura.apache.org/u/brondsem/user_icon", "timestamp": "2015-12-08 16:30:41.405000", "last_edited": null, "attachments": [], "is_meta": false}]}}