Once the volume ramps up, we probably won't want to be doing timeline aggregations on demand if we can help it. Find good spots to fire off aggregations in the background using taskd, so that when an activitystream page is requested, the cached timeline can be pulled straight from mongo without doing an aggregation.
For users, a good spot to do this might be on login. For projects, I'm not sure; it needs more thought.
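If we go the taskd route, something like the following might be the shape of it. This is a minimal sketch, not the actual implementation: it assumes Allura's `@task` decorator and `.post()` queueing convention, plus a hypothetical `director().create_timeline(node_id)` entry point in the activitystream lib.

```python
# Hedged sketch: assumes Allura's @task decorator lives at this import
# path and that the activitystream director exposes create_timeline();
# treat both as placeholders for the real APIs.
from allura.lib.decorators import task


@task
def create_timelines(node_id):
    """Aggregate the timeline for one node and cache it in mongo."""
    from activitystream import director  # hypothetical factory
    director().create_timeline(node_id)
```

On login, the handler would just queue the task, e.g. `create_timelines.post(user.node_id)`, and return immediately; taskd does the aggregation in the background.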
Do we need to worry about two aggregations for the same node running at the same time?
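One cheap way to guard against that, sketched below: take a per-node lock with a single atomic mongo update before aggregating, so a second aggregation task for the same node becomes a no-op. The collection and field names are made up for illustration, and it assumes a timeline document per node already exists.

```python
from datetime import datetime

from pymongo import MongoClient

db = MongoClient().activitystream


def acquire_aggregation_lock(node_id):
    """Atomically mark a node's timeline as being aggregated.

    Returns True if this caller got the lock; False means another
    aggregation for the same node is already in flight.
    """
    result = db.timelines.update_one(
        {'node_id': node_id, 'aggregating': {'$ne': True}},
        {'$set': {'aggregating': True,
                  'aggregation_started': datetime.utcnow()}})
    return result.modified_count == 1


def release_aggregation_lock(node_id):
    db.timelines.update_one(
        {'node_id': node_id}, {'$set': {'aggregating': False}})
```

In practice we'd also want a stale-lock timeout (based on `aggregation_started`) in case a task dies while holding the lock.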
Diff:
activitystream:tv/4397
allura:tv/4397
Before testing, run:
You can tell if an on-demand aggregation happened (when viewing a timeline) by tailing stats.log and searching for `create_timeline` calls. If there are none, the timeline was up-to-date and served directly from mongo.

Looking pretty good. I thought `needs_aggregation` might be doing a lot of work just to return a bool, so I turned up timermiddleware to DEBUG level to see what kind of queries were going on. I noticed a number of queries containing `'node_id': {'$in': []}`. We should short-circuit that either at the storage layer or higher. Full capture from one `create_timelines` task: https://sourceforge.net/p/allura/pastebin/52cb20e8c4d1042c11956f2f
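Roughly, the short-circuit could look like this at the storage layer; function and field names here are illustrative, not the actual activitystream code:

```python
# '$in': [] can never match anything, so skip the mongo round trip
# entirely when the node id list is empty.
def query_activities(coll, node_ids, since=None):
    if not node_ids:
        return []
    spec = {'node_id': {'$in': node_ids}}
    if since is not None:
        spec['published'] = {'$gte': since}
    return list(coll.find(spec))
```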
Back on `needs_aggregation`, the `since` param should keep the query relatively small, but you might as well add `limit=1` too, so it can quit after the first is found.
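Something like this sketch (field names illustrative): `since` keeps the candidate set small, and `limit(1)` lets mongo stop scanning at the first match.

```python
def needs_aggregation(coll, node_id, since):
    """Return True if any activity newer than `since` exists for the node."""
    cursor = coll.find(
        {'node_id': node_id, 'published': {'$gt': since}}).limit(1)
    return bool(list(cursor))
```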
To ssh://vansteenburgh@git.code.sf.net/p/activitystream/code
77abf95..3a30da4 tv/4397 -> tv/4397
To ssh://vansteenburgh@git.code.sf.net/p/activitystream/code
3a30da4..79a0fa1 tv/4397 -> tv/4397
The latest activitystream code changes remove the need for the mongo command to update pre-existing data.