I saw that git pages on forge-allura.apache.org were slow, so I looked at stats.log and saw that the sidebar was the slowest part. I did some additional digging and found 2 specific areas for improvement:
1. In git_main.py, change `default_branch_name` to a `@LazyProperty`, since it is called many times inside a loop in `RepositoryApp.sidebar_menu`.
2. `sidebar_menu` only requests a certain number of branches; pass that "limit" all the way through to git_repo.py's `branches` method so that `is_valid()` is only called the minimum number of times needed.

In addition to those changes, generalize and apply the same approach to the tags. Also check ForgeHg to see if Mercurial can benefit in the same way.
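A minimal sketch of the two changes, with a simplified version of the `LazyProperty` decorator and an illustrative stand-in for the repo model (everything here except the names `default_branch_name`, `branches`, and the limit idea is an assumption, not Allura's actual code):

```python
class LazyProperty:
    """Simplified lazy-caching property: compute once, then cache the
    value on the instance so later accesses skip the descriptor."""
    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        obj.__dict__[self.func.__name__] = value  # shadows the descriptor
        return value


class Repo:
    """Illustrative stand-in for the git repo model in git_repo.py."""
    def __init__(self, refs, default='master'):
        self._refs = refs
        self._default = default
        self.lookups = 0  # counts how often the expensive lookup runs

    @LazyProperty
    def default_branch_name(self):
        # Expensive lookup; now computed at most once per instance.
        self.lookups += 1
        return self._default

    def branches(self, limit=None):
        # Pass the sidebar's limit through so we stop early instead of
        # walking (and validating) every ref in the repo.
        out = []
        for ref in self._refs:
            out.append(ref)
            if limit is not None and len(out) >= limit:
                break
        return out


repo = Repo(refs=['ref%d' % i for i in range(1000)])
# The sidebar loop touches default_branch_name once per branch, but the
# lazy property only computes it once.
menu = [(b, repo.default_branch_name) for b in repo.branches(limit=10)]
print(len(menu), repo.lookups)  # 10 1
```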
Last edit: Heith Seewald 2015-05-05
Review at: hs/7873
TTFB was significantly reduced by removing the logic to locate the `default_branch_name`. Time complexity for displaying the `master` branch went from O(n) to O(1).
It turns out that the issue was unique to the branches processing -- `tags` and ForgeHg are unaffected.
Last edit: Heith Seewald 2015-05-05
Found one small issue: if the default branch is in the first 10 branches returned by `self.repo.get_branches()`, you'll see two buttons for it in the sidebar (this isn't true when you have only one branch). And a test failure:
Also, I'm not sure there are any performance improvements. I've imported the Allura repo itself for testing. I'm looking at stats.log and don't see any improvement in `sidebar` timings.
On master (timings fluctuate around 4200 in most cases):
On a branch (sometimes even slightly worse; it seems to spike randomly: 4000, then almost ~6700):
(Hitting the `/p/allura/allura/ci/master/tree/` page.)
Could you describe how you measured it? Maybe I'm missing something?
These are great notes.
My performance improvements were based on the "time to first byte" from the chrome dev tools. I attached two files of the before and after based on my tests (using ~200 branches).
TTFB, while not the best metric for gauging specific functions, does give an indication of perceived latency.
I'm going to re-run the tests and check out the stats.log.
How many branches were your performance tests run with?
I tested with 1224 branches and 588 tags.
I've looked into TTFB: for me it's ~7.6 sec and stable across page reloads on master, and on the branch it spikes just like the stats: ~7 sec, then ~13 sec, then ~8.
It's strange that I don't see even a tiny performance improvement, while in your case it's almost 5 times faster. Could you try it with more branches and maybe some tags too?
You were right, Igor -- when adding thousands of branches/tags, things became tricky.
The number of branches and tags was high enough that their initial loading was killing the performance. Due to the nature of git and symbolic references, it was necessary to loop through each tag/branch like we had been doing... But by removing the validation for each branch, the number of calls can be significantly reduced.
I attached two images of the cProfile breakdown of this loop.
Notice the cumtime is significantly higher when checking validation. Given the low probability of a symbolic ref existing that doesn't point to a commit, this seems like a more than fair trade-off. Even if it does occur, it would have no major impact.
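For reference, a cumtime breakdown like the attached images can be produced with the stdlib `cProfile`/`pstats`. The `list_refs` and `is_valid` functions below are illustrative stand-ins, not Allura's actual code:

```python
import cProfile
import io
import pstats

def is_valid(ref):
    # Stand-in for the per-ref validation whose cost shows up in cumtime.
    return True

def list_refs(refs, validate=True):
    return [r for r in refs if not validate or is_valid(r)]

refs = ['ref%d' % i for i in range(10000)]

profiler = cProfile.Profile()
profiler.enable()
list_refs(refs, validate=True)
profiler.disable()

# Sort by cumulative time and show the top entries, like the screenshots.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats('cumulative').print_stats(5)
report = buf.getvalue()
print('cumtime' in report)  # True: the stats table has a cumtime column
```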
What do you think?
I can't see the attachments. Did you forget to add those? :)
Haha, looks like it. Here you go.
Ok, I see some timing improvements. TTFB mean: 9.7s -> 8.3s. It is not very much, but a lot of the time is taken by other stuff. E.g. if I change `RepositoryApp.sidebar_menu` to always return an empty list, TTFB is still pretty high (6-7s on a branch and 7-9s on master).
That said, `b.commit.hexsha` can raise an error if the ref is not valid, so we should handle it gracefully for the (hopefully rare) cases like this.
You can reproduce it like this:
and then refresh the page.
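The original reproduction snippet is not preserved here. One plausible way to create a dangling ref for testing (an assumption, not necessarily the snippet that was posted) is to write a nonexistent sha directly into a ref file:

```python
import os
import subprocess
import tempfile

# Throwaway repo; then point a branch ref at a sha that isn't in the
# object store.
repo = tempfile.mkdtemp()
subprocess.run(['git', 'init', '-q', repo], check=True)
bad_sha = 'deadbeef' * 5  # 40 hex chars, no such object exists
with open(os.path.join(repo, '.git', 'refs', 'heads', 'broken'), 'w') as f:
    f.write(bad_sha + '\n')

# Resolving the ref to a commit now fails, just as b.commit.hexsha would.
result = subprocess.run(
    ['git', '-C', repo, 'rev-parse', '--verify',
     'refs/heads/broken^{commit}'],
    capture_output=True, text=True)
print(result.returncode != 0)  # True: the ref is dangling
```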
Last edit: Igor Bondarenko 2015-06-09
Yeah, I see what you mean. My thought was that if they had a mismatched sha, then there are probably bigger data integrity problems to deal with. But you're right -- we should handle it to be safe.
I tried catching the BadObject exception in a few places -- but quickly realized that the existing location is the best place for validation (inside the branch/tags list).
So here's another angle:
What if we used git fsck to check for invalid refs on repo change (via hooks, maybe)? A single check when something changes would be much more performant than checking on every page view... the only question would be how to handle the invalid refs.
What are your thoughts on that?
I thought we could expand the list comprehension in `branches`/`tags` into a full loop and wrap the `Object` construction in a `try..except` block, and that this would be cheaper than calling `is_valid` on every branch/tag, since the exception handling logic would run only for repos with integrity problems, which are fairly rare. I'm not sure, though.
We could also cache the results of branches/tags and update the cache on repo change with proper `is_valid` checks, but we're moving from keeping repo metadata in mongo to direct access, so I'm not sure it's worth it.
The `git fsck` approach looks like overkill to me: we'd need to parse its output to determine what's wrong, and also save the results somewhere to skip invalid branches. Correct me if I'm mistaken.

So the `Object` construction in the branches/tags context is actually just a ming Object (aka a dict with object-like attr access). And `is_valid` is just doing a simple try/except itself -- so I don't think expanding the list comprehension for our own try/except would be any cheaper unless...
We use git fsck as the try/except and only check `is_valid` for the very rare cases where it finds an issue. I attached a profile of running `self._git.git.fsck(full=True)`, and it's very cheap even with thousands of branches.
This looks like a pretty good trade-off between handling invalid refs and performance. It would have the added benefit of informing us of issues instead of just silently returning valid results.
I pushed an optimized version of this idea if you want to take a look. hs/7873
The `fsck` operation checks all objects in a repo, so I think it's going to scale with the size of the repo overall, not just the number of branches. It seems overkill to me too, with the potential for being slow on big repos. (I also wonder how `fsck` could be faster than the `is_valid` checks, since they'd both be going through and checking each ref.)
I think a try/catch in the loop would be worth trying. I'd say change it from a list comprehension to a regular loop so you can put the try/catch around just the `ref.commit.hexsha` bit that triggers the exception.
Igor's suggestion of caching the results is also interesting. As he says, we're trying to do direct repo access rather than caching in mongo, but it would be much faster. I'm ok with using mongo storage if we treat it as a smart cache, not as a definitive datastore that must be populated by the "repo refresh" logic before repo webpages are usable. For example: the `branches` property would know how to get the info from the git repo and would cache the result in mongo and use that cache. Cache invalidation would happen whenever new changes are pushed. That'd be in the "repo refresh" functions, but would be fast and simple.

Check out these simple timings:
Of the time it took to get the list with `self._get_refs(self._git.branches)`, fsck only accounted for 0.121250049591 -- in that time, it checked 256 directories and 3498 objects (the result of which is cached via lazyprop so tags can make use of it too).
The only time the fsck method will be slower is the case where there is an actual error -- then it's the sum of both the fsck and is_valid times. But that is supposedly an edge case anyway, right?
If you guys think it'll cause issues down the road, I have no problem going another route, but it seems to scale better than our existing implementation.
Also, I really like the idea of using mongo as a 'smart cache' and invalidating via repo_refresh!
Having no is_valid/fsck at all would be even faster too, right?
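A minimal sketch of the smart-cache idea, with a plain dict standing in for the mongo collection (the class and method names here are illustrative, not Allura's actual ones):

```python
class CachedRepo:
    """Illustrative smart cache: the branches property reads through a
    cache, and repo refresh (run on push) invalidates it."""
    def __init__(self, read_refs_from_git):
        self._read_refs_from_git = read_refs_from_git  # slow direct-access path
        self._cache = {}   # stand-in for a mongo collection
        self.git_reads = 0

    @property
    def branches(self):
        if 'branches' not in self._cache:
            self.git_reads += 1
            self._cache['branches'] = self._read_refs_from_git()
        return self._cache['branches']

    def repo_refresh(self):
        # Called whenever new changes are pushed: just drop the cache.
        self._cache.pop('branches', None)


repo = CachedRepo(read_refs_from_git=lambda: ['master', 'dev'])
repo.branches        # first access reads from git
repo.branches        # cache hit, no git access
repo.repo_refresh()  # a push invalidates the cache
repo.branches        # re-reads from git
print(repo.git_reads)  # 2
```

The key design point matching Dave's comment: the cache is populated lazily by the property itself, so pages work even before any "repo refresh" has run.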
I am still skeptical about fsck. Can you time it on a large repo? The Allura repo might be a good start but there are much much bigger ones out there too.
In principle I don't like doing extra work to handle a corner case when try/except could handle it too. But if it is really fast all the time, it could be ok.
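For concreteness, a sketch of the try/except loop being suggested, with GitPython-style refs simulated by tiny stubs (only the `ref.commit.hexsha` attribute path and the `BadObject` name mirror GitPython; the rest is illustrative):

```python
class BadObject(Exception):
    """Stands in for GitPython's git.exc.BadObject on a dangling ref."""

class _Commit:
    def __init__(self, hexsha, valid=True):
        self._hexsha, self._valid = hexsha, valid
    @property
    def hexsha(self):
        if not self._valid:
            raise BadObject(self._hexsha)
        return self._hexsha

class _Ref:
    def __init__(self, name, sha, valid=True):
        self.name = name
        self.commit = _Commit(sha, valid)

def branches(refs, limit=None):
    # A regular loop instead of a list comprehension, so the try/except
    # wraps only the ref.commit.hexsha access that can raise.
    out = []
    for ref in refs:
        try:
            out.append((ref.name, ref.commit.hexsha))
        except BadObject:
            continue  # skip dangling refs instead of erroring the page
        if limit is not None and len(out) >= limit:
            break
    return out

refs = [_Ref('master', 'a' * 40),
        _Ref('broken', 'b' * 40, valid=False),
        _Ref('dev', 'c' * 40)]
print(len(branches(refs)))  # 2: 'broken' is skipped
```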
--
Dave Brondsema
Principal Software Engineer - sourceforge.net
Implemented a cache system that is cleared on repo_refresh. Much faster overall. Thanks for the tips :)