Many of these occur in the mongo slow query logs for allura. They come and go.
Mon Jan 28 16:03:03 [conn7317124] query pyforge.project cursorid:3744805240514658737 ntoreturn:1024 ntoskip:772096 nscanned:773121 keyUpdates:0 numYields: 1283 locks(micros) r:2941245 nreturned:1024 reslen:1519021 1552ms
Found details via mongo db.currentOp(). That client IP maps to sfs-alluradaemon-1, which has an instance of
paster script /var/local/config/production.ini ../scripts/create-allura-sitemap.py
running, which I believe is the cause.
allura-sitemap.py uses utils.chunked_find, which uses skip & limit, and the skip requires a lot of scanning (in the log line above, nscanned:773121 for ntoskip:772096 plus ntoreturn:1024). Better to use a sort_key, and if it is present chunked_find should use $gt instead of skip (see sfpy's chunked_find for reference). Can we do this with the _id field for the key?
Also, we should skip the User neighborhood.
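A rough sketch of the $gt idea, not the actual utils.chunked_find or sfpy code. The names chunked_find_by_key, find_page and fake_find_page are made up for illustration, and the in-memory list only stands in for a collection so the snippet runs without MongoDB. It assumes the sort key is unique and indexed (true for _id); with a non-unique key, documents sharing the last key value of a page would be missed.

```python
def chunked_find_by_key(find_page, pagesize=1024, sort_key='_id'):
    """Yield lists of documents, paging by key range instead of skip.

    find_page(query, sort_key, limit) is a hypothetical callable that returns
    up to `limit` documents matching `query`, ordered ascending by `sort_key`.
    With pymongo it could be something like:
        lambda q, k, n: list(coll.find(q).sort(k, 1).limit(n))
    """
    query = {}
    while True:
        page = find_page(query, sort_key, pagesize)
        if not page:
            break
        yield page
        # Resume strictly after the last key seen. With an index on sort_key
        # the server can seek there instead of scanning every skipped doc.
        query = {sort_key: {'$gt': page[-1][sort_key]}}


# In-memory stand-in for a collection (assumption, for demonstration only).
_DOCS = [{'_id': i, 'shortname': 'p%d' % i} for i in range(1, 11)]


def fake_find_page(query, sort_key, limit):
    # Supports only the {'<key>': {'$gt': value}} query shape used above.
    lower = query.get(sort_key, {}).get('$gt')
    matched = [d for d in _DOCS if lower is None or d[sort_key] > lower]
    return sorted(matched, key=lambda d: d[sort_key])[:limit]


chunks = list(chunked_find_by_key(fake_find_page, pagesize=4))
chunk_sizes = [len(c) for c in chunks]
```

Each page costs roughly pagesize index entries no matter how far into the collection it is, whereas skip has to walk past everything before the offset on every query.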
Mongo "proof" of behavior:
Actually I changed my mind and think keeping the Users nbhd in the sitemap is fine. Those pages should get indexed too.
allura:db/5699
Today it ran from 10:00 UTC until 16:36. Tomorrow we can compare by looking at
ls /nfs/sitemaps/allura_sitemap/ -latr
timestamps.
Sitemap script took about the same amount of time to run, but the slow query is gone from the mongo logs.