These queries are consistently very slow:
{u'active': True,
u'desc': u'conn',
u'lockType': u'read',
u'ns': u'pyforge.mailbox',
u'numYields': 2481,
u'op': u'query',
u'opid': -5041171,
u'query': {u'queue': {u'$ne': []}, u'type': u'direct'},
u'secs_running': 10,
u'threadId': u'0x670df940',
u'waitingForLock': False}
Mon Oct 1 16:43:53 [conn4012152] query pyforge.mailbox query: { queue: { $ne: {} }, type: "direct" } nscanned:1343637 nreturned:3 reslen:1135 24720ms
A previous attempt was made to fix this query, but it actually made things worse when run with production-sized data and load. It was reverted in [e3fd1b]. I wonder if we may need an additional field so that we can do indexed queries on it properly.
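A minimal sketch of the additional-field idea, for illustration only. It assumes a boolean flag named queue_empty (the name is borrowed from the migration script filename mentioned later in this ticket) that is kept in sync whenever the queue changes. The helper names and the index layout are hypothetical, not what Allura actually does.

```python
# Hypothetical helpers: `queue_empty` is an assumed field name and the
# compound index below is one possible layout, not a confirmed schema.

# Key spec for a compound index on (type, queue_empty), ascending.
MAILBOX_INDEX = [('type', 1), ('queue_empty', 1)]


def pending_direct_filter():
    """Equality-only filter for direct mailboxes that have queued items.

    Unlike {'queue': {'$ne': []}}, both conditions are equality matches,
    so an index on (type, queue_empty) can serve them.
    """
    return {'type': 'direct', 'queue_empty': False}


def push_update(item):
    """Update document that appends an item and marks the queue non-empty."""
    return {'$push': {'queue': item}, '$set': {'queue_empty': False}}


def clear_update():
    """Update document that empties the queue and marks it empty."""
    return {'$set': {'queue': [], 'queue_empty': True}}
```

With a pymongo-style collection, MAILBOX_INDEX could be passed to create_index() and pending_direct_filter() to find(). Whether this beats the $ne query would still need checking against production-sized data.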
http://docs.mongodb.org/manual/faq/indexes/#using-ne-and-nin-in-a-query-is-slow-why
Why are we querying all mboxes w/ non-empty queues, only to do a follow-up find_and_modify for each matching mbox, when we could just use a while find_and_modify, which would allow mongo to short-circuit after finding only a single match each iteration?
The response on there is probably less relevant since we expect the number of empty queues to vastly out-number the ones with items in the queue, and we want to process everything with a non-empty queue anyway. Still, I think the suggestion of moving processed items into an "archive" collection makes sense and would give us the best performance.
We don't have a construct for doing while find_and_modify, do we? The find_and_modify method only works on a single document.
allura:db/5023
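For reference, the while find_and_modify pattern discussed above could be sketched roughly as follows. This is an illustration under assumptions, not the code on the branch: mailboxes is assumed to be a pymongo-style collection exposing find_one_and_update(), deliver is a hypothetical callback, and clearing the whole queue in one update is a simplification.

```python
def drain_direct_mailboxes(mailboxes, deliver):
    """Claim one non-empty 'direct' mailbox per iteration until none remain.

    `mailboxes` is assumed to expose a pymongo-style find_one_and_update();
    `deliver` is a hypothetical callback taking (mailbox_id, queued_items).
    Returns the number of mailboxes processed.
    """
    processed = 0
    while True:
        # Atomically fetch a single matching mailbox and clear its queue.
        # pymongo returns the pre-update document by default, so `queue`
        # still holds the items that were pending, and None when no
        # document matches the filter.
        mbox = mailboxes.find_one_and_update(
            {'type': 'direct', 'queue': {'$ne': []}},
            {'$set': {'queue': []}},
        )
        if mbox is None:
            break
        deliver(mbox['_id'], mbox['queue'])
        processed += 1
    return processed
```

Each iteration lets the server stop at the first match instead of returning every non-empty mailbox up front. Note that the filter still uses $ne, so each call may still scan many documents unless the match can be served from an index.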
Post-push, need to run:
allurapaste script /var/local/config/production.ini ../scripts/migrations/029-set-mailbox-queue_empty.py
More elegant query on db/5023 now