Apache Allura™ / Tickets / #4947 Handle pymongo AutoReconnect during taskd restart

Dave Brondsema - 2012-09-17

summary: Handle pymongo AutoReconnect during pymongo restart --> Handle pymongo AutoReconnect during taskd restart

milestone: forge-backlog --> forge-oct-05

Dave Brondsema - 2012-09-17

So far, most of the exceptions we're seeing occur on idle workers in their monq_task get() method, which calls ming/pymongo's find_and_modify.

Here's a different trace that we should handle too. Not all of the errors are raised from cursor.next() calls in pymongo; this one happens during an insert. We'll have to figure out a way to wrap all the necessary ming methods.

File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/odmsession.py", line 58, in flush
  self.insert_now(obj, st)
File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/base.py", line 29, in inner
  result = func(obj, *args, **kwargs)
File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/odmsession.py", line 66, in insert_now
  mapper(obj).insert(obj, st, self, **kwargs)
File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/base.py", line 29, in inner
  result = func(obj, *args, **kwargs)
File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/mapper.py", line 55, in insert
  session.impl.insert(doc, validate=False)
File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/session.py", line 20, in wrapper
  return func(self, doc, *args, **kwargs)
File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/session.py", line 137, in insert
  bson = self._impl(doc).insert(data, safe=kwargs.get('safe', True))
File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 306, in insert
  continue_on_error, self.__uuid_subtype), safe)
File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/connection.py", line 748, in _send_message
  raise AutoReconnect(str(e))
AutoReconnect: [Errno 4] Interrupted system call
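
One way the wrapping mentioned above might look is a retry decorator applied to the relevant calls. This is only a minimal sketch under stated assumptions: the `retry_on_autoreconnect` name, its `attempts` and `delay` parameters, and the `flaky_read` demo function are hypothetical, and a local stand-in exception class is used so the example runs without pymongo (real code would catch `pymongo.errors.AutoReconnect`). Retrying is only clearly safe for idempotent operations; retrying an insert like the one in the trace above could write the document twice if the first attempt actually reached the server.

```python
import functools
import time


class AutoReconnect(Exception):
    """Stand-in for pymongo.errors.AutoReconnect so the sketch runs without pymongo."""


def retry_on_autoreconnect(attempts=3, delay=0.0):
    """Hypothetical helper: retry the wrapped call when AutoReconnect is raised."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except AutoReconnect:
                    # Give up and re-raise once the last attempt has failed.
                    if attempt == attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


calls = []


@retry_on_autoreconnect(attempts=3)
def flaky_read():
    """Demo function that fails twice with the error from the trace, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise AutoReconnect("[Errno 4] Interrupted system call")
    return "ok"


result = flaky_read()
```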

Dave Brondsema - 2012-09-21

size: --> 4

Cory Johns - 2012-09-25

status: open --> in-progress

assigned_to: Cory Johns

find_and_modify example

21:01:41,183 ERROR [allura.command] taskd error [Errno 4] Interrupted system call
Traceback (most recent call last):
  File "/var/local/allura/Allura/allura/command/taskd.py", line 93, in worker
    exclude=exclude)
  File "/var/local/allura/Allura/allura/model/monq_model.py", line 174, in get
    sort=sort)
  File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/mapper.py", line 269, in inner
    return method(self.mapped_class, *args, **kwargs)
  File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/odmsession.py", line 110, in find_and_modify
    obj = self.impl.find_and_modify(m.collection, *args, **kwargs)
  File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/session.py", line 104, in find_and_modify
    bson = self._impl(cls).find_and_modify(**options)
  File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 1168, in find_and_modify
    **kwargs)
  File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/database.py", line 350, in command
    result = self["$cmd"].find_one(command, **extra_opts)
  File "/var/local/env-allura/lib/python2.7/site-packages/TimerMiddleware-0.2.1-py2.7.egg/timermiddleware/__init__.py", line 102, in wrapper
    return self.run_and_log(func, inst, *args, **kwargs)
  File "/var/local/env-allura/lib/python2.7/site-packages/TimerMiddleware-0.2.1-py2.7.egg/timermiddleware/__init__.py", line 111, in run_and_log
    return func(*args, **kwargs)
  File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 514, in find_one
    for result in self.find(spec_or_id, *args, **kwargs).limit(-1):
  File "/var/local/env-allura/lib/python2.7/site-packages/TimerMiddleware-0.2.1-py2.7.egg/timermiddleware/__init__.py", line 102, in wrapper
    return self.run_and_log(func, inst, *args, **kwargs)
  File "/var/local/env-allura/lib/python2.7/site-packages/TimerMiddleware-0.2.1-py2.7.egg/timermiddleware/__init__.py", line 111, in run_and_log
    return func(*args, **kwargs)
  File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/cursor.py", line 749, in next
    if len(self.__data) or self._refresh():
  File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/cursor.py", line 700, in _refresh
    self.__uuid_subtype))
  File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/cursor.py", line 638, in __send_message
    **kwargs)
  File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/connection.py", line 811, in _send_message_with_response
    raise AutoReconnect(str(e))
AutoReconnect: [Errno 4] Interrupted system call

Cory Johns - 2012-09-25

From the documentation here and here, and the discussion here, it seems that installing a handler with signal.signal() overrides the default behavior of transparently restarting interrupted system calls, so that they are interrupted instead and fail with "[Errno 4] Interrupted system call".

I've added calls to signal.siginterrupt() in allura:cj/4947. While this is impossible to test properly, I have verified that the graceful handlers still work, and I was not able to reproduce the AutoReconnect error after a few minutes of trying, whereas I was getting it occasionally before the change. At the very least, it doesn't seem to hurt the handlers, and since the fix is at the system level, it seems like it should avoid the thorny issues with mongo and attempting to retry non-idempotent operations.

It might warrant further discussion, however, so leaving this ticket open for the moment.
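
For reference, the signal.siginterrupt() approach described above can be sketched roughly as follows. This is an illustration, not the code in allura:cj/4947: the `install_graceful_handler` and `graceful_stop` names and the choice of SIGUSR1 are assumptions, and the sketch is Unix-only. The order matters because signal.signal() implicitly makes system calls interruptible for the given signal, so siginterrupt() has to be called after the handler is installed.

```python
import os
import signal

shutdown_requested = []


def graceful_stop(signum, frame):
    """Hypothetical handler: record the request instead of stopping immediately."""
    shutdown_requested.append(signum)


def install_graceful_handler(signum, handler):
    """Install handler, then ask the OS to restart system calls it interrupts."""
    signal.signal(signum, handler)
    # signal.signal() resets the restart behavior to "interrupt" for signum,
    # so this call must come after it, not before.
    signal.siginterrupt(signum, False)


# SIGUSR1 is an assumption; the ticket does not say which signals taskd handles.
install_graceful_handler(signal.SIGUSR1, graceful_stop)

# Send the signal to this process to check that the handler still runs.
os.kill(os.getpid(), signal.SIGUSR1)
```

With the flag set to False, a system call that the signal interrupts is restarted rather than failing with EINTR, which is why nothing needs to be retried at the mongo level.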

Cory Johns - 2012-09-25

status: in-progress --> code-review

qa: Dave Brondsema

Dave Brondsema - 2012-09-26

status: code-review --> validation

Dave Brondsema - 2012-09-27

status: validation --> closed
