#2040 git is hanging and we need some debug code to figure out where

v1.0.0
closed
sf-4 (350)
General
nobody
2015-08-20
2011-04-28
Wolf
No

burley
/p/lauge/
so -- all the wsgi instances are in use and occupied by git stuff for those 2 projects
let me strace a few of the processes and see if they are all stuck on a poll on a pipe
(why can't it be a pole on a pipe)

12:11
Dave
lauge looks like a typical size, perhaps a hundred commits

12:11
burley
all are stuck on a poll
all are stuck on a pull on a pipe
poll on a pipe
trying to think of what else we can learn before kicking it

...

burley
unless anyone else has other ideas -- I think we just keep an eye out for more of this -- and I request that you guys add some debugging code to the git module to help us figure out where it's failing, if that's OK?

12:26
Wolf
sounds right to me

12:27
burley
any chance the debug foo could be added soon ( like by early next week )?
I'll ponder ways to perhaps actively monitor for this

12:28
Wolf
yes

12:28
burley
thx -- I'll leave that in your hands then and just inform SOG of what to look for
and hopefully we can go to the next step when we get some debug info
ticketing this up SOG side and then gonna kick this head so it works again
so a few more minutes for anyone to request more poking till I unwedge us :)

Discussion

  • Wolf - 2011-04-28

    so the real problem is probably in gitdb, but we need debugging foo locally to find it
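
A minimal sketch of the kind of debugging hook discussed above, assuming a Unix host and a process where signal handlers can be installed (some servers restrict this). The names `format_all_stacks` and `install_dump_handler` are made up for illustration and are not part of the git module or gitdb:

```python
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return a text dump of the current stack of every thread in this process."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (ident, names.get(ident, "unknown")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)


def install_dump_handler(signum):
    """Write all thread stacks to stderr whenever `signum` is received."""
    def handler(sig, frame):
        sys.stderr.write(format_all_stacks())
    # signal.signal must be called from the main thread.
    signal.signal(signum, handler)
```

With `install_dump_handler(signal.SIGUSR1)` called at startup, sending `kill -USR1 <pid>` to a wedged process would show where each thread is sitting, which complements the strace output.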

     
  • Dave Brondsema - 2011-04-28
    • size: --> 4
     
  • Dave Brondsema - 2011-04-28

    Idea from rick: tell mod_wsgi to kill the proc after N seconds. nginx times out after 30, iirc
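
This is not the mod_wsgi setting Rick suggests, only an application-level sketch of the same idea: bound any external command with a deadline shorter than the 30 seconds nginx waits. The helper name `run_with_deadline` is hypothetical, and it only helps if the hang is in a child process rather than in a pure-Python loop:

```python
import subprocess
import sys


def run_with_deadline(cmd, seconds):
    """Run cmd; return its stdout as bytes, or None if it exceeded the deadline."""
    try:
        done = subprocess.run(cmd, stdout=subprocess.PIPE, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point.
        return None
    return done.stdout
```

A caller could treat `None` as "give up and return an error page", which frees the wsgi instance instead of leaving it stuck.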

     
  • Dave Brondsema - 2011-04-28

    last insanehumaninv request was several hours before the outage:

    172.29.41.1 - - [28/Apr/2011:12:15:07 +0000] "GET /p/insanehumaninv/code/ci/cd039cccdb9f19640f5a4eb2a76e420b6c78bf0d/tree/ HTTP/1.1" 200 14394 "http://sourceforge.net/p/insanehumaninv/home/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16"
    

    lauge requests went right up until the outage. last successful one was

    172.29.41.1 - - [28/Apr/2011:15:46:53 +0000] "GET /p/lauge/code/ci/1a3f0c86e2c22fa3e3ab3b53d716725220f55d0e/tree/Media/Eye_closed_disabled_24x24.png?format=raw HTTP/1.1" 200 0 "http://sourceforge.net/p/lauge/code/ci/1a3f0c86e2c22fa3e3ab3b53d716725220f55d0e/tree/Media/Eye_closed_disabled_24x24.png" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 ( .NET CLR 3.5.30729; .NET4.0E)"
    

    first failures were:

    172.29.41.1 - - [28/Apr/2011:15:45:17 +0000] "GET /p/lauge/code/ci/1a3f0c86e2c22fa3e3ab3b53d716725220f55d0e/tree/ReleaseBuild/PathInno.txt?format=raw HTTP/1.1" 504 3981 "http://sourceforge.net/p/lauge/code/ci/1a3f0c86e2c22fa3e3ab3b53d716725220f55d0e/tree/ReleaseBuild/PathInno.txt" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 ( .NET CLR 3.5.30729; .NET4.0E)"
    172.29.41.1 - - [28/Apr/2011:15:45:30 +0000] "GET /p/lauge/code/ci/1a3f0c86e2c22fa3e3ab3b53d716725220f55d0e/tree/ReleaseBuild/PathSevenZip.txt?format=raw HTTP/1.1" 504 3981 "http://sourceforge.net/p/lauge/code/ci/1a3f0c86e2c22fa3e3ab3b53d716725220f55d0e/tree/ReleaseBuild/PathSevenZip.txt" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 ( .NET CLR 3.5.30729; .NET4.0E)"
    
     
  • Dave Brondsema - 2011-04-28

    And nothing seems unusual about those 3 lauge files.

     
  • Rick Copeland - 2011-05-03
    • status: open --> blocked
    • assigned_to: Rick Copéland
     
  • Dave Brondsema - 2011-05-06
    • status: blocked --> in-progress
    • assigned_to: Rick Copéland --> Dave Brondsema
     
  • Dave Brondsema - 2011-05-06

    forge:db/2040

    • make a file that ends without an \n
      • or use the above repo as an example
    • reading the git stream to show a diff should not go into an infinite loop
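
The test case above can be illustrated with a small sketch. This is not the change made in forge:db/2040, just an assumed example of a read loop that stops at end of stream even when the last line has no trailing `\n`; `read_lines` is a hypothetical name:

```python
import io


def read_lines(stream):
    """Yield lines from a binary stream, including a final line with no newline."""
    while True:
        line = stream.readline()
        if not line:
            # b"" means end of stream. Stop here rather than waiting for a
            # newline that will never arrive.
            break
        yield line


# A "file" whose last line is not newline-terminated.
lines = list(read_lines(io.BytesIO(b"first\nlast line without newline")))
```

A loop that instead keeps reading until it sees a line ending in `\n` never terminates on input like this, which matches the symptom described in the ticket.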
     
  • Dave Brondsema - 2011-05-06
    • status: in-progress --> code-review
    • assigned_to: Dave Brondsema --> Rick Copéland
     
  • Rick Copeland - 2011-05-09

    Looks good. Merged to dev.

     
  • Rick Copeland - 2011-05-09
    • status: code-review --> closed
     
