These queries are consistently very slow:
{u'active': True, u'desc': u'conn', u'lockType': u'read', u'ns': u'pyforge.mailbox', u'numYields': 2481, u'op': u'query', u'opid': -5041171, u'query': {u'queue': {u'$ne': []}, u'type': u'direct'}, u'secs_running': 10, u'threadId': u'0x670df940', u'waitingForLock': False}
Mon Oct 1 16:43:53 [conn4012152] query pyforge.mailbox query: { queue: { $ne: {} }, type: "direct" } nscanned:1343637 nreturned:3 reslen:1135 24720ms
A previous attempt to fix this query actually made things worse when run against production-sized data and load, so it was reverted in [e3fd1b]. I wonder if we need an additional field so that we can query on it with an index.
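A minimal sketch of what such a field could look like, assuming a denormalized boolean named queue_empty (the name is borrowed from the 029-set-mailbox-queue_empty.py migration mentioned later in this ticket). The helpers below are hypothetical, not Allura's code; they only build the plain dicts pymongo accepts, so that every write to the queue also keeps the flag in sync.

```python
# Hypothetical helpers, assuming a boolean `queue_empty` field on each mailbox.
# They return plain update/query dicts in the shape pymongo accepts.


def enqueue_update(message_id):
    """Append one message and mark the mailbox non-empty in the same update."""
    return {'$push': {'queue': message_id}, '$set': {'queue_empty': False}}


def drain_update():
    """Clear the queue and mark the mailbox empty in the same update."""
    return {'$set': {'queue': [], 'queue_empty': True}}


# An equality match on the flag can be answered from an index, unlike the
# original {'queue': {'$ne': []}} condition, which is not selective.
PENDING_QUERY = {'type': 'direct', 'queue_empty': False}

# Compound index in pymongo's [(key, direction)] form; 1 means ascending.
PENDING_INDEX = [('type', 1), ('queue_empty', 1)]
```

With pymongo this would be used along the lines of mailbox.create_index(PENDING_INDEX) and mailbox.find(PENDING_QUERY). Since empty mailboxes are expected to vastly outnumber non-empty ones, queue_empty: False should be a highly selective condition.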
http://docs.mongodb.org/manual/faq/indexes/#using-ne-and-nin-in-a-query-is-slow-why
Why are we querying all mboxes with non-empty queues, only to do a follow-up find_and_modify for each matching mbox, when we could just use a while loop around find_and_modify, which would allow Mongo to short-circuit after finding a single match on each iteration?
The response there is probably less relevant, since we expect the number of empty queues to vastly outnumber the ones with items in the queue, and we want to process everything with a non-empty queue anyway. Still, I think the suggestion of moving processed items into an "archive" collection makes sense and would give us the best performance.
We don't have a construct for a while find_and_modify loop, do we? The find_and_modify method only works on a single document.
allura:db/5023
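Because find_and_modify touches only one document, the "loop" can simply be a plain Python while loop that calls it until nothing matches. The sketch below assumes pymongo 3.0+'s Collection.find_one_and_update (the successor to the older find_and_modify), which by default returns the document as it was before the update, so the caller still sees the messages it just cleared. drain_pending, FakeMailboxes and the queue_empty field are hypothetical names for illustration; FakeMailboxes is only an in-memory stand-in so the sketch runs without a server.

```python
from typing import Callable


def drain_pending(collection, handle: Callable[[dict], None], query: dict) -> int:
    """Claim and handle one matching mailbox per iteration until none remain.

    Each find_one_and_update call atomically clears one mailbox's queue and
    returns its pre-update state, so two workers never claim the same queue.
    Returns the number of mailboxes handled.
    """
    handled = 0
    while True:
        mbox = collection.find_one_and_update(
            query, {'$set': {'queue': [], 'queue_empty': True}})
        if mbox is None:  # no document matched: everything has been drained
            return handled
        handle(mbox)
        handled += 1


class FakeMailboxes:
    """In-memory stand-in that supports only flat equality queries and a
    single $set update, which is all drain_pending needs."""

    def __init__(self, docs):
        self.docs = docs

    def find_one_and_update(self, query, update):
        for doc in self.docs:
            if all(doc.get(k) == v for k, v in query.items()):
                before = dict(doc)  # snapshot, like ReturnDocument.BEFORE
                doc.update(update['$set'])
                return before
        return None
```

Each call stops at its first match. With the original {'queue': {'$ne': []}} query, though, a call may still scan many empty mailboxes before reaching that match, which is where an indexed flag helps. One trade-off of this shape: the queue is cleared when the mailbox is claimed, so a crash inside handle() would drop those messages.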
Post-push, need to run:
allurapaste script /var/local/config/production.ini ../scripts/migrations/029-set-mailbox-queue_empty.py
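For context, a backfill for a boolean queue_empty flag could look roughly like the sketch below. It is a hypothetical illustration assuming pymongo 3.0+'s Collection.update_many; it is not the contents of 029-set-mailbox-queue_empty.py. The 'queue.0' test splits documents cleanly: the path exists only when the queue array has at least one element.

```python
# Hypothetical backfill for a boolean `queue_empty` flag; illustrative only,
# NOT the contents of 029-set-mailbox-queue_empty.py.

BACKFILL_STEPS = [
    # 'queue.0' exists only when the queue array has at least one element.
    ({'queue.0': {'$exists': True}}, {'$set': {'queue_empty': False}}),
    # Everything else: empty arrays and documents with no queue field at all.
    ({'queue.0': {'$exists': False}}, {'$set': {'queue_empty': True}}),
]


def run_backfill(collection):
    """Apply each step with Collection.update_many (pymongo 3.0+) and return
    how many documents each step modified."""
    return [collection.update_many(flt, upd).modified_count
            for flt, upd in BACKFILL_STEPS]
```

This still walks the whole collection, but only once at migration time instead of on every poll.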
More elegant query on db/5023 now