#4927 Avoid dog-piling repo refreshes NEEDS PRE-CLEANUP

v1.0.0
closed
nobody
General
Cory Johns
2015-08-20
2012-09-13
No

If multiple refreshes for the same repo run at the same time, they can block the task queue unnecessarily, especially if the repo is very large.

The task can check the repo status and abort if it says it's refreshing, then put a while loop around the refresh to check for any new unrecognized commits. Note that the refresh has to execute at least once, even if there are no new commits, to cover the case of a user-started full refresh.

Related

Tickets: #4927

Discussion

  • Dave Brondsema - 2012-09-21
    • size: --> 2
     
  • Dave Brondsema - 2012-09-30

    Another case:

    Due to bug [#5022], thousands of curl requests to /auth/refresh_repo/p/projectname/code/ were issued. That resulted in thousands of tasks created like:

    {
      "_id" : ObjectId("50686e8971b75b10ebb20cb8"),
      "time_start" : null,
      "context" : {
        "project_id" : ObjectId("50682204b9363c76ee1d3aba"),
        "app_config_id" : ObjectId("50682204b9363c76ee1d3ae0"),
        "user_id" : null
      },
      "time_stop" : null,
      "process" : null,
      "args" : [ [ "forgegit/model/git_repo/Repository#50682204b9363c76ee1d3ae6" ] ],
      "result_type" : "forget",
      "priority" : 10,
      "state" : "ready",
      "result" : null,
      "task_name" : "allura.tasks.index_tasks.add_artifacts",
      "kwargs" : { },
      "time_queue" : ISODate("2012-09-30T16:08:41.579Z")
    }
    

    The only fields that vary are _id and time_queue.
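A minimal sketch of how such duplicates could be counted, assuming the task documents have already been loaded as plain Python dicts shaped like the one above. The helper names `task_signature` and `count_duplicates` are hypothetical and not part of Allura; the signature simply ignores the two fields that vary.

```python
from collections import Counter


def task_signature(task):
    """Build a hashable key from the fields that identify the work.

    _id and time_queue are deliberately left out, since they are the
    only fields that differ between the duplicated tasks.
    """
    return (
        task["task_name"],
        repr(task["args"]),
        repr(sorted(task["kwargs"].items())),
        task["state"],
    )


def count_duplicates(tasks):
    """Return {signature: count} for every signature seen more than once."""
    counts = Counter(task_signature(t) for t in tasks)
    return {sig: n for sig, n in counts.items() if n > 1}
```

Grouping on a signature like this only reports the duplication; deciding which of the queued copies to drop is left open here.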

     

    Related

    Tickets: #5022

  • Dave Brondsema - 2012-10-02
    • labels: stability --> stability, 42cc
     
  • Igor Bondarenko - 2012-10-03

    Created #185: [#4927] Avoid multiple repo refreshes (3cp)

     

    Related

    Tickets: #4927

  • Igor Bondarenko - 2012-10-03
    • status: open --> in-progress
     
  • Igor Bondarenko - 2012-10-08

    "then put a while loop around the refresh to check for any new unrecognized commits."

    What do you mean by this? Should we check for new commits when the repo refresh is finished and run the refresh again, or what? Could you explain this in more detail, please?

     
  • Cory Johns - 2012-10-08

    Yes, that's more or less correct. The problem that we're having is that, if many changes are made to a repo in a short period, or a refresh takes a long time to complete (which is relatively common), we'll end up with all the task workers trying to refresh a single repo on the same set of commits, thus blocking all other tasks.

    We want to add a check in allura.tasks.repo_tasks.refresh() such that, if the repo is currently in anything other than the 'ready' state, the task will abort and not double-queue the refresh.

    However, if the refresh takes a while, new commits may have come in that have not been processed and for which the corresponding refresh task has already aborted. So the task needs to check for new, unknown commits (see allura.model.repo_refresh.unknown_commits()) and re-queue itself to pick up those missed changes.

    A while loop was just an initial thought; re-posting the task to the queue is a better implementation as it will allow other tasks to be processed during the break.
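The approach described in this comment could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the 42cc_4927 branch: `Repo`, the `'refreshing'` status label, the `during_refresh` hook, and the plain `deque` standing in for the task queue are all hypothetical. A real implementation would also need the status check and update to be atomic.

```python
from collections import deque


class Repo:
    """Hypothetical stand-in for a repository model with a status field."""

    def __init__(self, name, pending_commits=()):
        self.name = name
        self.status = 'ready'
        self.pending = list(pending_commits)  # commits not yet processed
        self.processed = []

    def unknown_commits(self):
        # Stand-in for allura.model.repo_refresh.unknown_commits()
        return list(self.pending)


def refresh(repo, queue, during_refresh=None):
    """Refresh `repo`; return False if a refresh is already in progress."""
    if repo.status != 'ready':
        return False  # another worker owns the refresh; do not double-queue

    repo.status = 'refreshing'  # placeholder label for any non-ready state
    try:
        batch = repo.unknown_commits()
        if during_refresh is not None:
            during_refresh(repo)  # lets a caller simulate commits arriving mid-refresh
        repo.processed.extend(batch)
        repo.pending = [c for c in repo.pending if c not in batch]
    finally:
        repo.status = 'ready'

    # Commits that arrived while we were busy had their own refresh tasks
    # aborted above, so re-post the task instead of looping in place.
    if repo.unknown_commits():
        queue.append(repo)
    return True
```

Note that the refresh body runs once even when `unknown_commits()` is empty, which covers the user-started full refresh mentioned in the ticket description. Re-posting rather than looping gives other queued tasks a turn between passes.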

     
  • Igor Bondarenko - 2012-10-09

    Thanks for clarifying.

     
  • Igor Bondarenko - 2012-10-12

    Closed #185. Branch 42cc_4927.

     
  • Igor Bondarenko - 2012-10-12
    • status: in-progress --> code-review
     
  • Dave Brondsema - 2012-10-15
    • qa: Cory Johns
     
  • Cory Johns - 2012-10-15
    • summary: Avoid dog-piling repo refreshes --> Avoid dog-piling repo refreshes NEEDS PRE-CLEANUP
    • status: code-review --> validation
     
  • Cory Johns - 2012-10-15

    We currently have ~17 refresh tasks in the "busy" state, most of which were orphaned by hard restarts of taskd workers. They need to be cleaned up, or they will block the corresponding repos from being refreshed.

    The command to find the "busy" refresh tasks is:

    db.monq_task.find({state: 'busy', task_name: 'allura.tasks.repo_tasks.refresh'}, {process: 1})
    

    These will need to be compared to what the given workers are actually working on (obtained by sending a USR1 signal to the worker and watching allura.log for the response) to clear out any that are not actually busy.
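The comparison step might look something like the sketch below. It assumes the query results have been collected as dicts with `_id` and `process`, and that what each worker reported has been gathered by hand into a mapping from process identifier to the task `_id` it is working on. The helper `find_orphaned`, that mapping, and the sample process strings are hypothetical; the actual format of the `process` field and of the USR1 log output is not shown in this ticket.

```python
def find_orphaned(busy_tasks, active_by_process):
    """Return the busy tasks that no worker is actually running.

    busy_tasks: iterable of dicts with '_id' and 'process' keys.
    active_by_process: dict mapping a process identifier to the task _id
        that worker reported, or None if it reported being idle.
    """
    return [
        task for task in busy_tasks
        if active_by_process.get(task["process"]) != task["_id"]
    ]


# Usage with made-up identifiers:
busy = [
    {"_id": "t1", "process": "host1 pid 100"},
    {"_id": "t2", "process": "host1 pid 200"},
    {"_id": "t3", "process": "host2 pid 300"},
]
reported = {"host1 pid 100": "t1", "host1 pid 200": None}
orphans = find_orphaned(busy, reported)
```

A task counts as orphaned here if its recorded worker is missing from the mapping, is idle, or is working on a different task.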

     
  • Dave Brondsema - 2012-10-17
    • status: validation --> closed
    • size: 2 --> 0
     
