#4947 Handle pymongo AutoReconnect during taskd restart

v1.0.0
closed
General
2015-08-20
2012-09-14
No

A signal to taskd (e.g. SIGHUP for a graceful restart) will frequently cause an error if pymongo is in the middle of a database operation: http://pastie.org/private/mdiglqtzkcydtyfusja

AutoReconnect
Raised when a connection to the database is lost and an attempt to auto-reconnect will be made.

In order to auto-reconnect you must handle this exception, recognizing that the operation which caused it has not necessarily succeeded. Future operations will attempt to open a new connection to the database (and will continue to raise this exception until the first successful connection is made).
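pymongo's warning that the failed operation "has not necessarily succeeded" matters most for writes. A minimal sketch of handling it for an insert, assuming the document already carries a client-generated `_id` and that the collection exposes `insert()` and `find_one()` the way pymongo 2.x's `Collection` does (as in the traces on this ticket): after an AutoReconnect, check whether the write landed before retrying. `insert_once` and `FlakyCollection` are hypothetical names, and the in-memory collection only simulates a lost acknowledgement.

```python
try:
    from pymongo.errors import AutoReconnect
except ImportError:  # stand-in so the sketch runs without pymongo installed
    class AutoReconnect(Exception):
        pass


def insert_once(collection, doc, retries=3):
    """Insert doc, retrying on AutoReconnect without creating duplicates.

    Assumes doc already has a client-generated '_id', so after a failed
    attempt we can ask the server whether the write actually landed.
    """
    for attempt in range(retries + 1):
        try:
            collection.insert(doc)
            return doc['_id']
        except AutoReconnect:
            if attempt == retries:
                raise
            # The failed attempt may still have been applied on the server.
            try:
                if collection.find_one({'_id': doc['_id']}) is not None:
                    return doc['_id']
            except AutoReconnect:
                pass  # still reconnecting; fall through and retry the insert


class FlakyCollection(object):
    """Hypothetical in-memory collection: the first insert is stored, but
    its acknowledgement is 'lost', mimicking an interrupted system call."""

    def __init__(self):
        self.docs = {}
        self.inserts = 0

    def insert(self, doc):
        self.inserts += 1
        self.docs[doc['_id']] = dict(doc)
        if self.inserts == 1:
            raise AutoReconnect('[Errno 4] Interrupted system call')

    def find_one(self, spec):
        return self.docs.get(spec['_id'])
```

With `FlakyCollection`, the first insert raises but is found by `find_one`, so `insert_once` returns the `_id` without writing the document a second time.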


Could we implement an auto-retry option in ming, probably behind a setting that would initially be enabled only for taskd?
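One way such an option could look is a decorator that only retries when a setting is switched on. This is a sketch under assumptions: `AUTO_RETRY` and `auto_retry` are hypothetical names, not existing ming settings or APIs, and the exponential backoff is an illustrative choice. It is only safe for idempotent operations, since the failed attempt may already have been applied.

```python
import functools
import time

try:
    from pymongo.errors import AutoReconnect
except ImportError:  # stand-in so the sketch runs without pymongo installed
    class AutoReconnect(Exception):
        pass

# Hypothetical setting; the idea is to turn it on only for taskd at first.
AUTO_RETRY = {'enabled': False, 'retries': 3, 'delay': 0.5}


def auto_retry(func):
    """Retry func on AutoReconnect when AUTO_RETRY['enabled'] is true.

    The setting is read at call time, so it can be toggled at startup.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not AUTO_RETRY['enabled']:
            return func(*args, **kwargs)
        retries = AUTO_RETRY['retries']
        for attempt in range(retries + 1):
            try:
                return func(*args, **kwargs)
            except AutoReconnect:
                if attempt == retries:
                    raise
                # Back off while pymongo opens a new connection.
                time.sleep(AUTO_RETRY['delay'] * (2 ** attempt))
    return wrapper
```

taskd would set `AUTO_RETRY['enabled'] = True` at startup, while web processes keep the default and see the exception immediately.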

Discussion

  • Dave Brondsema - 2012-09-17
    • summary: Handle pymongo AutoReconnect during pymongo restart --> Handle pymongo AutoReconnect during taskd restart
    • milestone: forge-backlog --> forge-oct-05
     
  • Dave Brondsema - 2012-09-17

    So far, most of the exceptions we're seeing come from idle workers in their monq_task get() method, which calls ming/pymongo's find_and_modify.

     
  • Dave Brondsema - 2012-09-19

    Here's a different trace that we should handle too. Not all of the errors are raised from cursor.next() calls in pymongo; this one happens during an insert. We'll have to figure out a way to wrap all the necessary ming methods.

    File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/odmsession.py", line 58, in flush
      self.insert_now(obj, st)
    File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/base.py", line 29, in inner
      result = func(obj, *args, **kwargs)
    File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/odmsession.py", line 66, in insert_now
      mapper(obj).insert(obj, st, self, **kwargs)
    File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/base.py", line 29, in inner
      result = func(obj, *args, **kwargs)
    File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/mapper.py", line 55, in insert
      session.impl.insert(doc, validate=False)
    File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/session.py", line 20, in wrapper
      return func(self, doc, *args, **kwargs)
    File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/session.py", line 137, in insert
      bson = self._impl(doc).insert(data, safe=kwargs.get('safe', True))
    File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 306, in insert
      continue_on_error, self.__uuid_subtype), safe)
    File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/connection.py", line 748, in _send_message
      raise AutoReconnect(str(e))
    AutoReconnect: [Errno 4] Interrupted system call
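Wrapping "all the necessary ming methods" could be sketched as a helper that replaces named methods on a class with retrying versions. Everything here is an assumption for illustration: `wrap_methods`, `retry_on_autoreconnect`, and `FakeSession` are hypothetical, the method names `insert` and `find_and_modify` are taken from the `ming/session.py` frames in the tracebacks, and whether patching ming's own session class this way is safe is an open question. Retrying `insert` blindly can duplicate a write whose first attempt reached the server.

```python
import functools

try:
    from pymongo.errors import AutoReconnect
except ImportError:  # stand-in so the sketch runs without pymongo installed
    class AutoReconnect(Exception):
        pass


def retry_on_autoreconnect(func, retries=3):
    """Return a version of func that retries up to `retries` extra times."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(retries + 1):
            try:
                return func(*args, **kwargs)
            except AutoReconnect:
                if attempt == retries:
                    raise
    return wrapper


def wrap_methods(cls, names, decorator):
    """Replace each named method on cls with decorator(method), in place."""
    for name in names:
        setattr(cls, name, decorator(getattr(cls, name)))
    return cls


class FakeSession(object):
    """Hypothetical stand-in for a ming session; each method fails once."""

    def __init__(self):
        self.calls = {}

    def _maybe_fail(self, name):
        self.calls[name] = self.calls.get(name, 0) + 1
        if self.calls[name] == 1:
            raise AutoReconnect('[Errno 4] Interrupted system call')

    def insert(self, doc):
        self._maybe_fail('insert')
        return doc

    def find_and_modify(self, query):
        self._maybe_fail('find_and_modify')
        return query
```

After `wrap_methods(FakeSession, ['insert', 'find_and_modify'], retry_on_autoreconnect)`, both methods succeed on their second attempt instead of raising.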
    
     
  • Dave Brondsema - 2012-09-21
    • size: --> 4
     
  • Cory Johns - 2012-09-25
    • status: open --> in-progress
    • assigned_to: Cory Johns
     
  • Dave Brondsema - 2012-09-25

    find_and_modify example

    21:01:41,183 ERROR [allura.command] taskd error [Errno 4] Interrupted system call
    Traceback (most recent call last):
      File "/var/local/allura/Allura/allura/command/taskd.py", line 93, in worker
        exclude=exclude)
      File "/var/local/allura/Allura/allura/model/monq_model.py", line 174, in get
        sort=sort)
      File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/mapper.py", line 269, in inner
        return method(self.mapped_class, *args, **kwargs)
      File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/odm/odmsession.py", line 110, in find_and_modify
        obj = self.impl.find_and_modify(m.collection, *args, **kwargs)
      File "/var/local/env-allura/lib/python2.7/site-packages/Ming-0.3.2dev_20120912-py2.7.egg/ming/session.py", line 104, in find_and_modify
        bson = self._impl(cls).find_and_modify(**options)
      File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 1168, in find_and_modify
        **kwargs)
      File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/database.py", line 350, in command
        result = self["$cmd"].find_one(command, **extra_opts)
      File "/var/local/env-allura/lib/python2.7/site-packages/TimerMiddleware-0.2.1-py2.7.egg/timermiddleware/__init__.py", line 102, in wrapper
        return self.run_and_log(func, inst, *args, **kwargs)
      File "/var/local/env-allura/lib/python2.7/site-packages/TimerMiddleware-0.2.1-py2.7.egg/timermiddleware/__init__.py", line 111, in run_and_log
        return func(*args, **kwargs)
      File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 514, in find_one
        for result in self.find(spec_or_id, *args, **kwargs).limit(-1):
      File "/var/local/env-allura/lib/python2.7/site-packages/TimerMiddleware-0.2.1-py2.7.egg/timermiddleware/__init__.py", line 102, in wrapper
        return self.run_and_log(func, inst, *args, **kwargs)
      File "/var/local/env-allura/lib/python2.7/site-packages/TimerMiddleware-0.2.1-py2.7.egg/timermiddleware/__init__.py", line 111, in run_and_log
        return func(*args, **kwargs)
      File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/cursor.py", line 749, in next
        if len(self.__data) or self._refresh():
      File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/cursor.py", line 700, in _refresh
        self.__uuid_subtype))
      File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/cursor.py", line 638, in __send_message
        **kwargs)
      File "/var/local/env-allura/lib/python2.7/site-packages/pymongo-2.2.1-py2.7-linux-x86_64.egg/pymongo/connection.py", line 811, in _send_message_with_response
        raise AutoReconnect(str(e))
    AutoReconnect: [Errno 4] Interrupted system call
    
     
  • Cory Johns - 2012-09-25

    From the documentation here and here, and the discussion here, it seems that calling signal.signal() overrides the default behavior of transparently restarting interrupted system calls, so they fail with EINTR ([Errno 4] Interrupted system call) instead.

    I've added calls to signal.siginterrupt() in allura:cj/4947. It's impossible to test this properly, but I have verified that the graceful handlers still work, and I could not reproduce the AutoReconnect error after a few minutes of trying, whereas I was getting it occasionally before the change. At the very least, it doesn't seem to hurt the handlers, and since the fix is at the system level, it should avoid the thorny issues of retrying non-idempotent operations against mongo.
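A minimal sketch of the kind of change described above, assuming a plain SIGHUP handler; the handler and its body are hypothetical and this is not the code on allura:cj/4947. The important detail is ordering: siginterrupt() must be called after signal.signal(), because installing the handler resets the flag to "interrupt". Python 3.5 and later also retry system calls interrupted by signals on their own (PEP 475), so this mainly matters on Python 2, as in the traces above.

```python
import os
import signal

received = []


def handle_sighup(signum, frame):
    # Hypothetical graceful-restart handler; a real taskd would finish the
    # current task before restarting. Here we only record the signal.
    received.append(signum)


# signal.signal() implicitly calls siginterrupt(sig, True), so a blocking
# socket call inside pymongo fails with EINTR when the signal arrives.
# Setting the flag back to False asks the OS to restart the interrupted
# system call instead (SA_RESTART semantics).
signal.signal(signal.SIGHUP, handle_sighup)
signal.siginterrupt(signal.SIGHUP, False)

# Send ourselves the signal to confirm the handler still fires.
os.kill(os.getpid(), signal.SIGHUP)
```

This only runs on Unix, since signal.siginterrupt() and SIGHUP are not available on Windows.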

    It might warrant further discussion, however, so I'm leaving this ticket open for the moment.

     
  • Cory Johns - 2012-09-25
    • status: in-progress --> code-review
    • qa: Dave Brondsema
     
  • Dave Brondsema - 2012-09-26
    • status: code-review --> validation
     
  • Dave Brondsema - 2012-09-27
    • status: validation --> closed
     
