#5127 Taskd maintenance command

v1.0.0
closed
nobody
42cc (432)
General
2015-08-20
2012-10-16
No

If a taskd instance is killed (not gracefully), the monq_task document for its current task is incorrectly left in the 'busy' state. Also, a taskd process occasionally gets completely stuck and never finishes its current task. It can stay in this state for many days, which wastes that process.

We should have a taskd cleanup command that queries the monq_task docs for all 'busy' tasks assigned to processes on the current hostname. For each one, do something like this (may need some tweaking):

  • if the pid doesn't match a process running on this host (e.g. check with something like pgrep -f '/paster taskd'), set the task doc's state to error and put an explanation in the result field.
  • for all the processes that are running, send a USR1 signal to the pid (this logs the current task to allura.log), then watch that log (the entry may take a few seconds to appear) and check whether the task id matches. Some tasks finish quickly, so make sure we don't assume a miss just because the process already moved on to another task. If the task really is not found, consider the 'busy' task to have been killed previously: set its state to error and put an explanation in the result field.
  • if the process does not log anything at all to allura.log about its current task, consider it stuck. Have a command-line option to kill stuck tasks and update their state; otherwise just report the stuck process and task.
  • you can change taskd to write the USR1 status output to an additional file besides allura.log if that is easier to watch for. The allura.log file can be very busy.
  • print to stdout all the actions done, including proc & task details
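The per-task checks above can be sketched as a pure decision function, kept apart from MongoDB access and signalling so it is easy to test. This is a minimal sketch under stated assumptions, not the implementation from this ticket's branch: the `process` field format ("<hostname> pid <pid>"), the names `parse_pgrep_output` and `plan_cleanup`, and the action labels are all hypothetical. The caller is assumed to have already sent USR1 to each live pid and collected the task id it logged (or None if it logged nothing).

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (action, task_id, pid, explanation); the labels are hypothetical.
Action = Tuple[str, str, int, str]


def parse_pgrep_output(output: str) -> Set[int]:
    """Turn stdout of e.g. `pgrep -f '/paster taskd'` (one pid per line) into pids."""
    return {int(tok) for tok in output.split() if tok.isdigit()}


def pid_of(task: dict) -> int:
    # Assumption: the 'process' field ends with the pid, e.g. "web1 pid 1234".
    return int(task['process'].rsplit(' ', 1)[-1])


def plan_cleanup(busy_tasks: Iterable[dict],
                 running_pids: Set[int],
                 reported: Dict[int, Optional[str]]) -> List[Action]:
    """Decide what to do with each 'busy' task; nothing is modified here."""
    actions: List[Action] = []
    for task in busy_tasks:
        pid = pid_of(task)
        task_id = str(task['_id'])
        if pid not in running_pids:
            # The taskd process is gone, so the task was left 'busy' by a kill.
            actions.append(('error', task_id, pid, 'taskd process not running on this host'))
        elif reported.get(pid) is None:
            # Alive, but logged nothing in response to USR1: treat as stuck.
            actions.append(('stuck', task_id, pid, 'no status logged after USR1'))
        elif reported[pid] != task_id:
            # Alive, but working on a different task.
            actions.append(('error', task_id, pid, 'process reported a different task'))
        else:
            actions.append(('ok', task_id, pid, 'task is running'))
    return actions
```

Because a fast task can finish between the query and the USR1 probe, a caller would re-read each task doc and only apply an 'error' action if the doc is still 'busy'; 'stuck' entries would be killed only when the kill option is given, and every action printed to stdout.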

Related

Tickets: #5127

Discussion

  • Igor Bondarenko

    Igor Bondarenko - 2012-10-17
    • status: open --> in-progress
     
  • Igor Bondarenko

    Igor Bondarenko - 2012-10-17

    Created #192: [#5127] Taskd maintenance command

     

    Related

    Tickets: #5127

  • Igor Bondarenko

    Igor Bondarenko - 2012-10-17

    you can change taskd to write the USR1 status output to an additional file besides allura.log if that is easier to watch for. The allura.log file can be very busy.

    I didn't find any mention of allura.log in the configuration. On my development environment I just run taskd like this: $ paster taskd development.ini > taskd.log to get a separate log. Could you show me how to configure a separate taskd log?

     
  • Dave Brondsema

    Dave Brondsema - 2012-10-17

    Ah, I forgot we use a slightly different configuration. In our ini file, the [handler_console] section is set up like this:

    [handler_console]
    class = handlers.WatchedFileHandler
    args = ('/var/log/allura/allura.log', 'a') # make this a log file that is writable by the process
    level = NOTSET
    formatter = generic

    I think for this command, it will have to take the log file name as a parameter, since different Allura configurations may log to different locations. Also if you go with a separate log file solely for checking the taskd status, that'd need to be configurable too.
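A minimal sketch of such a command-line interface, with the log path as a required parameter instead of being hard-coded. The program name and the options `--status-log`, `--kill-stuck` and `--wait` are hypothetical; Allura's own commands run under paster, so argparse is used here only to keep the example self-contained.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical program and option names, for illustration only.
    parser = argparse.ArgumentParser(
        prog='taskd-cleanup',
        description="Find 'busy' monq_task docs on this host whose taskd "
                    'process is dead or stuck.')
    parser.add_argument('ini_file', help='Allura .ini file, e.g. development.ini')
    parser.add_argument('--status-log', required=True,
                        help='log file that taskd writes its USR1 status output to')
    parser.add_argument('--kill-stuck', action='store_true',
                        help='kill stuck taskd processes and set their tasks to error '
                             '(default: only report them)')
    parser.add_argument('--wait', type=float, default=5.0,
                        help='seconds to keep checking the log after each USR1 signal')
    return parser
```

Making `--status-log` required means each deployment points the command at wherever its own configuration actually logs.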

     
  • Igor Bondarenko

    Igor Bondarenko - 2012-10-19

    Closed #192. Branch 42cc_5127.

    For this to work you need to configure a separate log for taskd status. See the [handler_taskdstatus] and [logger_taskdstatus] sections of development.ini.
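The contents of those development.ini sections are not shown here. As a rough sketch of what a dedicated status logger could look like, the Python below loads a hypothetical ini fragment with `logging.config.fileConfig` and writes one status line through a `taskdstatus` logger. Only the two section names come from the comment above; the file path is passed in (so it can point somewhere writable), `propagate = 0` is an assumed way to keep status lines out of allura.log, and the status-line format is made up for illustration.

```python
import io
import logging
import logging.config
import os
import tempfile

# Hypothetical ini fragment; only the section names follow the ticket.
TASKD_STATUS_INI = """\
[loggers]
keys = root, taskdstatus

[handlers]
keys = taskdstatus

[formatters]
keys = generic

[logger_root]
level = WARNING
handlers =

[logger_taskdstatus]
level = INFO
handlers = taskdstatus
qualname = taskdstatus
propagate = 0

[handler_taskdstatus]
class = handlers.WatchedFileHandler
args = ({path!r}, 'a')
level = NOTSET
formatter = generic

[formatter_generic]
format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s
"""


def configure_taskd_status_log(path: str) -> logging.Logger:
    """Load the fragment above, pointing the status handler at `path`."""
    ini = TASKD_STATUS_INI.format(path=path)
    logging.config.fileConfig(io.StringIO(ini), disable_existing_loggers=False)
    return logging.getLogger('taskdstatus')


def log_current_task(pid: int, task_id: str) -> None:
    # Hypothetical status-line format a USR1 handler could emit.
    logging.getLogger('taskdstatus').info('taskd pid %d current task %s', pid, task_id)


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'taskd_status.log')
    configure_taskd_status_log(path)
    log_current_task(os.getpid(), 'example-task-id')
    with open(path) as f:
        print(f.read(), end='')
```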

     
  • Igor Bondarenko

    Igor Bondarenko - 2012-10-19
    • status: in-progress --> code-review
     
  • Dave Brondsema

    Dave Brondsema - 2012-10-30
    • qa: Dave Brondsema
     
  • Dave Brondsema

    Dave Brondsema - 2012-11-27
     
  • Dave Brondsema

    Dave Brondsema - 2012-11-27

    This is looking great. I like how the code is broken down into lots of little functions, perhaps as a necessity of the mocking for tests :), and I like the logging at each step.

    One small concern: I've seen taskd instances take many seconds to respond to a USR1 signal (I'm not sure exactly how to reproduce that, though), so running tail -n1 right afterwards might miss the response. One option would be to retry several times. Another option is to implement [#5328] (a new ticket, an idea I came up with last week); then you could see what each taskd is doing just from its process name, without sending a USR1 signal. However, that doesn't help with stuck jobs :( so we have to stick with the signal & logfile approach to detect those.
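The retry option could look roughly like this: remember the log's size before sending USR1, then poll only the newly appended lines a bounded number of times instead of running tail -n1 once. This is a sketch assuming each status line contains "pid <pid>" as a separate word; the function names and defaults are hypothetical, and `probe_taskd` only works on POSIX systems because it sends SIGUSR1.

```python
import os
import re
import signal
import time
from typing import Callable, List, Optional


def read_new_lines(path: str, offset: int) -> List[str]:
    """Return the lines appended to `path` after byte `offset`.

    If the file is now shorter than `offset` (e.g. it was rotated and
    recreated), read it from the start instead.
    """
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, 'rb') as f:
        f.seek(offset)
        return f.read().decode('utf-8', 'replace').splitlines()


def wait_for_status(pid: int, path: str, offset: int, attempts: int = 10,
                    delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll the status log until a line mentioning `pid` appears.

    Returns the newest matching line, or None if nothing showed up after
    `attempts` checks, in which case the caller can report the process as stuck.
    """
    # Assumed status-line format: contains "pid <pid>" as a whole word.
    pattern = re.compile(r'\bpid %d\b' % pid)
    for attempt in range(attempts):
        for line in reversed(read_new_lines(path, offset)):
            if pattern.search(line):
                return line
        if attempt < attempts - 1:
            sleep(delay)
    return None


def probe_taskd(pid: int, path: str, attempts: int = 10,
                delay: float = 1.0) -> Optional[str]:
    """Send USR1 to a taskd process and wait for its status line (POSIX only)."""
    offset = os.path.getsize(path)  # only consider lines written after the signal
    os.kill(pid, signal.SIGUSR1)
    return wait_for_status(pid, path, offset, attempts, delay)
```

Reading from a saved offset avoids matching an older status line from the same pid, and the bounded number of attempts keeps a slow responder from being mistaken for a stuck one too early.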

     

    Related

    Tickets: #5328

  • Dave Brondsema

    Dave Brondsema - 2012-11-30
    • status: code-review --> in-progress
     
  • Igor Bondarenko

    Igor Bondarenko - 2012-12-03
    • status: in-progress --> code-review
     
  • Igor Bondarenko

    Igor Bondarenko - 2012-12-03

    Created #227: [#5127] Taskd maintenance command improvements (1cp)

     

    Related

    Tickets: #5127

  • Igor Bondarenko

    Igor Bondarenko - 2012-12-03
    • status: code-review --> in-progress
     
  • Igor Bondarenko

    Igor Bondarenko - 2012-12-05

    Closed #227

    forge:42cc_5127a

     
  • Igor Bondarenko

    Igor Bondarenko - 2012-12-05
    • status: in-progress --> code-review
     
  • Dave Brondsema

    Dave Brondsema - 2012-12-06
    • status: code-review --> validation
    • milestone: forge-backlog --> forge-dec-14
     
  • Peter Hartmann

    Peter Hartmann - 2012-12-13

    Hi. I was doing a fresh reinstall of Allura from the newest master, and this part of the config results in an ugly traceback because there is no such file on my filesystem. Pretending to be a clueless user: the docs did not tell me to mkdir and set write permissions on /var/log/allura, or to adjust the .ini, before running paster setup-app.

    I can see that [handler_stats] uses a local file path by default. Shouldn't [handler_taskdstatus] mirror this behaviour?

     
  • Dave Brondsema

    Dave Brondsema - 2012-12-13
    • status: validation --> closed
     
