Once the volume ramps up, we probably won't want to be doing timeline aggregations on demand if we can help it. Find good spots to fire off aggregations in the background using taskd, so that when an activitystream page is requested, the cached timeline can be pulled straight from mongo without doing an aggregation.
For users, a good spot to do this might be on login. For projects, I'm not sure; it needs more thought.
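If we go the taskd route, something like the following might be the shape of it. This is a minimal sketch, not the actual implementation: it assumes Allura's `@task` decorator and `.post()` queueing convention, plus a hypothetical `director().create_timeline(node_id)` entry point in the activitystream lib.

```python
# Hedged sketch: assumes Allura's @task decorator lives at this import
# path and that the activitystream director exposes create_timeline();
# treat both as placeholders for the real APIs.
from allura.lib.decorators import task


@task
def create_timelines(node_id):
    """Aggregate the timeline for one node and cache it in mongo."""
    from activitystream import director  # hypothetical factory
    director().create_timeline(node_id)
```

On login, the handler would just queue the task, e.g. `create_timelines.post(user.node_id)`, and return immediately; taskd does the aggregation in the background.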
Do we need to worry about two aggregations for the same node running at the same time?
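One cheap way to guard against that, sketched below: take a per-node lock with a single atomic mongo update before aggregating, so a second aggregation task for the same node becomes a no-op. The collection and field names are made up for illustration, and it assumes a timeline document per node already exists.

```python
from datetime import datetime

from pymongo import MongoClient

db = MongoClient().activitystream


def acquire_aggregation_lock(node_id):
    """Atomically mark a node's timeline as being aggregated.

    Returns True if this caller got the lock; False means another
    aggregation for the same node is already in flight.
    """
    result = db.timelines.update_one(
        {'node_id': node_id, 'aggregating': {'$ne': True}},
        {'$set': {'aggregating': True,
                  'aggregation_started': datetime.utcnow()}})
    return result.modified_count == 1


def release_aggregation_lock(node_id):
    db.timelines.update_one(
        {'node_id': node_id}, {'$set': {'aggregating': False}})
```

In practice we'd also want a stale-lock timeout (based on `aggregation_started`) in case a task dies while holding the lock.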
Diff:
activitystream:tv/4397
allura:tv/4397
Before testing, run:
You can tell if an on-demand aggregation happened (when viewing a timeline) by tailing stats.log and searching for `create_timeline` calls. If there are none, the timeline was up-to-date and served directly from mongo.

Looking pretty good. I thought `needs_aggregation` might be doing a lot of work just to return a bool, so I turned up timermiddleware to DEBUG level to see what kind of queries were going on. I noticed a number of queries containing `'node_id': {'$in': []}`. We should short-circuit that either at the storage layer or higher. Full capture from one `create_timelines` task: https://sourceforge.net/p/allura/pastebin/52cb20e8c4d1042c11956f2f
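Roughly, the short-circuit could look like this at the storage layer; function and field names here are illustrative, not the actual activitystream code:

```python
# '$in': [] can never match anything, so skip the mongo round trip
# entirely when the node id list is empty.
def query_activities(coll, node_ids, since=None):
    if not node_ids:
        return []
    spec = {'node_id': {'$in': node_ids}}
    if since is not None:
        spec['published'] = {'$gte': since}
    return list(coll.find(spec))
```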
Back on `needs_aggregation`, the `since` param should keep the query relatively small, but you might as well add `limit=1` too, so it can quit after the first is found.
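Something like this sketch (field names illustrative): `since` keeps the candidate set small, and `limit(1)` lets mongo stop scanning at the first match.

```python
def needs_aggregation(coll, node_id, since):
    """Return True if any activity newer than `since` exists for the node."""
    cursor = coll.find(
        {'node_id': node_id, 'published': {'$gt': since}}).limit(1)
    return bool(list(cursor))
```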
To ssh://vansteenburgh@git.code.sf.net/p/activitystream/code
77abf95..3a30da4 tv/4397 -> tv/4397
To ssh://vansteenburgh@git.code.sf.net/p/activitystream/code
3a30da4..79a0fa1 tv/4397 -> tv/4397
The latest activitystream code changes remove the need for the mongo command to update pre-existing data.