If a taskd instance is killed ungracefully, the monq_task document for its current task is incorrectly left in the 'busy' state. Also, a taskd process occasionally gets completely stuck and never finishes its current task. It can stay in this state for many days, which wastes the process.
We should have a taskd cleanup command that queries the monq_task docs for all 'busy' tasks assigned to the current hostname. For each one, do something like this (may need some tweaking):
- if the pid doesn't match a process running on this host (e.g. check with `pgrep -f '/paster taskd'`), set the task doc's state to error and put an explanation in the result field.
- for all the processes that are running, send a USR1 signal to the pid (this logs the current task to allura.log), then watch that log (it may take a few seconds for the entry to appear) and check whether the task id matches. Some tasks finish quickly, so we must not treat a mismatch as a miss when the process has simply moved on to another task. If the task really is not found, consider the 'busy' task to have been killed previously: set its state to error and put an explanation in the result field.
- if the process logs nothing at all to allura.log for its current task, consider it stuck. Add a command-line option to kill stuck tasks and update their state; otherwise just report the stuck process and task.
- you can change taskd to write the USR1 status output to an additional file besides allura.log if that is easier to watch. The allura.log file can be very busy.
- print to stdout all actions taken, including process and task details.
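The per-task decision described in the list above could be sketched roughly as follows. This is a sketch under assumptions, not the command that was eventually merged: `BusyTask`, `parse_pgrep_output`, `running_taskd_pids` and `classify` are hypothetical names, each 'busy' doc is assumed to expose the task id and the pid of the taskd that claimed it, and reading or updating the monq_task docs is left out. Re-reading the doc's state (`still_busy`) is how the sketch avoids reporting a miss when a quick task has simply finished in the meantime.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class BusyTask:
    """Hypothetical minimal view of a 'busy' monq_task doc on this host."""
    task_id: str
    pid: int


def parse_pgrep_output(output: str) -> set[int]:
    """pgrep prints one matching PID per line."""
    return {int(token) for token in output.split()}


def running_taskd_pids() -> set[int]:
    # No check=True: pgrep exits with status 1 when no process matches.
    result = subprocess.run(["pgrep", "-f", "/paster taskd"],
                            capture_output=True, text=True)
    return parse_pgrep_output(result.stdout)


def classify(task: BusyTask, running: set[int],
             logged_task_id: Optional[str], still_busy: bool) -> str:
    """Return 'ok', 'error' or 'stuck' for one busy task.

    logged_task_id: task id the process logged after USR1, or None if it
    logged nothing. still_busy: whether the doc is still 'busy' when re-read.
    """
    if not still_busy:
        # The task finished (or the process moved on) while we were checking.
        return "ok"
    if task.pid not in running:
        return "error"  # process is gone; the doc was left behind
    if logged_task_id is None:
        return "stuck"  # process alive but silent about its current task
    if logged_task_id != task.task_id:
        return "error"  # process alive but working on a different task
    return "ok"
```

With `--kill-stuck`-style behaviour, only the `"stuck"` results would lead to killing a process; `"error"` results just need the doc updated.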
Created #192: [#5127] Taskd maintenance command
Related Tickets: #5127
I didn't find any mention of allura.log in the configuration. In my development environment I just run taskd like this: `paster taskd development.ini > taskd.log` to get a separate log. Could you show me how to configure a separate taskd log?
Ah, I forgot we use a slightly different configuration. In our ini file, the `[handler_console]` section is set up like this:
I think this command will have to take the log file name as a parameter, since different Allura configurations may log to different locations. Also, if you go with a separate log file solely for checking taskd status, that will need to be configurable too.
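As a sketch of the parameters discussed here, assuming a standalone `argparse` front end (the real command may be a paster command with its own option parser), with hypothetical option names `--log-file` and `--kill-stuck`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for a hypothetical taskd cleanup command."""
    parser = argparse.ArgumentParser(
        prog="taskd_cleanup",  # hypothetical name
        description="Check 'busy' monq_task docs assigned to this host.")
    parser.add_argument("config", help="Allura .ini file, e.g. development.ini")
    # Different Allura setups log to different places, so the file that
    # taskd's USR1 status output goes to is passed explicitly.
    parser.add_argument("-l", "--log-file", required=True,
                        help="log file that receives taskd USR1 status lines")
    # Default is report-only; killing stuck processes must be requested.
    parser.add_argument("-k", "--kill-stuck", action="store_true",
                        help="kill stuck taskd processes and set their tasks to error")
    return parser
```

Without `--kill-stuck`, the command would only report stuck processes, matching the default behaviour described in the ticket.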
Closed #192. Branch 42cc_5127.
For this to work you need to configure a separate log for taskd status. See the `[handler_taskdstatus]` and `[logger_taskdstatus]` sections of `development.ini`.
This is looking great. I like how the code is broken down into lots of little functions, perhaps out of necessity for the mocking in the tests :), and I like the logging at each step.
One small concern: I've seen taskd instances take many seconds to respond to a USR1 signal (I'm not sure exactly how to reproduce that, though), so running `tail -n1` right afterwards might miss the status line. One option would be to retry several times. Another option is to implement [#5328] (a new ticket for an idea I had last week); then you could check what each taskd is doing just from its process name, without sending a USR1 signal. However, that doesn't help with stuck jobs :( For those, detection still has to rely on the signal and the log file.
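One way to address the slow-USR1 concern without a single `tail -n1` is to remember the log file's size before sending the signal and then poll several times for a new line from that pid. A minimal sketch, assuming the status line contains `pid <N>` (a hypothetical format; the real taskd status message may differ) and that the log is not rotated in between; `wait_for_status_line` and `request_status` are hypothetical names:

```python
from __future__ import annotations

import os
import re
import signal
import time
from typing import Optional


def wait_for_status_line(path: str, pattern: re.Pattern, start_offset: int,
                         retries: int = 10, delay: float = 1.0) -> Optional[str]:
    """Poll `path` for a complete line written after `start_offset` that matches `pattern`."""
    for attempt in range(retries):
        with open(path, "rb") as f:
            f.seek(start_offset)
            new_text = f.read().decode("utf-8", errors="replace")
        # The last element is either "" or a partially written line; skip it.
        for line in new_text.split("\n")[:-1]:
            if pattern.search(line):
                return line
        if attempt < retries - 1:
            time.sleep(delay)
    return None


def request_status(pid: int, path: str, retries: int = 10,
                   delay: float = 1.0) -> Optional[str]:
    """Send USR1 to a taskd process and wait for its status line in `path`."""
    start_offset = os.path.getsize(path)  # ignore anything logged before the signal
    os.kill(pid, signal.SIGUSR1)
    # Hypothetical line format: assumes the status line contains "pid <N>".
    pattern = re.compile(rf"\bpid {pid}\b")
    return wait_for_status_line(path, pattern, start_offset, retries, delay)
```

Reading only the bytes appended after the recorded offset avoids matching an older status line from the same process, and skipping the trailing partial line keeps a half-written entry from being misread. A `None` result after all retries is what the ticket would treat as a stuck process.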
Related Tickets: #5328
Created #227: [#5127] Taskd maintenance command improvements (1cp)
Related Tickets: #5127
Closed #227
forge:42cc_5127a
Hi. I was doing a fresh reinstall of Allura from the newest master, and this part of the config results in an ugly traceback because there is no such file on my filesystem. Pretending to be a clueless user: the docs didn't tell me to create /var/log/allura and give it write permissions, or to adjust the .ini, before running `paster setup-app`.
I can see that `[handler_stats]` uses a local file path by default. Shouldn't `[handler_taskdstatus]` mirror this behaviour?
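A sketch of that suggestion, assuming the logging sections follow the standard `logging.config.fileConfig` ini layout used by Paste configs. Only the section names come from this ticket; the handler class, the file name `taskd_status.log` and the formatter are assumptions. A relative path is resolved against the current working directory, so nothing under /var/log/allura has to exist before `paster setup-app`.

```python
import io
import logging
import logging.config

# Hypothetical logging sections; only the section names mirror the ticket.
LOGGING_INI = """
[loggers]
keys = root, taskdstatus

[handlers]
keys = console, taskdstatus

[formatters]
keys = generic

[logger_root]
level = INFO
handlers = console

[logger_taskdstatus]
level = INFO
qualname = taskdstatus
handlers = taskdstatus
propagate = 0

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[handler_taskdstatus]
class = FileHandler
args = ('taskd_status.log', 'a')
level = NOTSET
formatter = generic

[formatter_generic]
format = %(asctime)s %(levelname)s [%(name)s] %(message)s
"""


def configure_logging(ini_text: str) -> None:
    """Apply the ini logging sections; FileHandler opens its file immediately."""
    logging.config.fileConfig(io.StringIO(ini_text))
```

Because `FileHandler` opens its file when the configuration is loaded, an absolute path in a missing directory fails at that point, which matches the traceback reported above; a relative default avoids that for a fresh install.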