The clone and reclone_repo tasks wrap their work in a broad try/except and report any error through a repo_clone_task_failed event. This is a problem because the task ends up with a state of "complete" when it actually failed, and the failure message lives in a separate task that is neither easy to discover nor associated with the original task.
The repo_clone_task_failed event is still needed so that notifications can be sent, but we should consider re-raising the error afterwards so that the original task's state is accurate. I don't know offhand whether anything else needs to change as well.
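A minimal sketch of the re-raise idea, not the actual task code: `fire_event`, the `events` list, and the `do_clone` callable are hypothetical stand-ins for the real event dispatcher and clone logic. The point is only that the event is still sent, and then a bare `raise` lets the original exception reach the task runner.

```python
def fire_event(name, payload, events):
    """Hypothetical stand-in for the real event dispatcher."""
    events.append((name, payload))


def clone(do_clone, events):
    """Run a clone; on failure, send the event and then re-raise."""
    try:
        do_clone()
    except Exception as exc:
        # Still fire the event so notifications can be sent.
        fire_event('repo_clone_task_failed', {'error': str(exc)}, events)
        # Bare raise preserves the original traceback, so the task
        # runner sees the failure and can mark the task accordingly.
        raise
```

With this shape the failure is recorded in two places: the notification event and the original task itself.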
Re-raising will also let us handle "Too many open files" errors in the taskd system (see [#5330]) and exit when that happens. Currently the taskd process goes into a loop of claiming every available task without running any of them (each task gets its state and process fields updated, but time_start isn't set) and only then exits.
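One way the exit-on-error behaviour could look, as a sketch under assumptions: the loop below is a simplified, hypothetical worker, not taskd's real loop, and it assumes the error surfaces as an `OSError` with `errno.EMFILE` (the errno behind "Too many open files"). It stops claiming tasks as soon as that error is seen instead of continuing through the queue.

```python
import errno


def is_too_many_open_files(exc):
    """True if exc is the OS-level 'Too many open files' error."""
    return isinstance(exc, OSError) and exc.errno == errno.EMFILE


def run_tasks(tasks):
    """Hypothetical worker loop over (name, callable) pairs.

    Returns (completed_names, fatal_error). fatal_error is None unless
    the loop stopped early because of EMFILE.
    """
    completed = []
    for name, func in tasks:
        try:
            func()
            completed.append(name)
        except Exception as exc:
            if is_too_many_open_files(exc):
                # The process cannot recover; stop claiming more tasks.
                return completed, exc
            # Any other failure is specific to this task; keep going.
    return completed, None
```

A real daemon would presumably log the fatal error and call `sys.exit` with a non-zero status at that point, rather than returning it.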