#6595 Prevent spiders from requesting tarballs

v1.0.1
closed
General
2015-08-20
2013-08-22
No

The following are examples of spiders requesting tarball creation. This is unnecessary and wastes resources, so we should make it impossible. We already have rel=nofollow, but that apparently isn't working. I think the best solution is to require a POST request for the tarball URL.

"GET /p/z-i/code-0/208/tarball HTTP/1.0" 200 16400 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
"GET /p/jhotdraw/svn/729/tarball HTTP/1.0" 200 17834 "-" "msnbot/0.01 (+http://search.msn.com/msnbot.htm)"
"GET /p/fourpane/git4pane/ci/ec65df3a5ff2ec7be011c0722286e766c2b76d94/tarball HTTP/1.0" 200 18137 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)"
"GET /u/lluct/me722-cm/ci/0aa649648a00979ad6ca9e9d61df4e44eb694259/tarball?path=/external/clang HTTP/1.0" 200 17918 "-" "YisouSpider"

Discussion

  • Tim Van Steenburgh

    • status: open --> in-progress
    • assigned_to: Tim Van Steenburgh
     
  • Tim Van Steenburgh

    • status: in-progress --> blocked
     
  • Tim Van Steenburgh

    Will push when the Apache LDAP problem is resolved.

    forge:tv/6595
    forgehg:tv/6595

     
  • Tim Van Steenburgh

    • status: blocked --> code-review
     
  • Dave Brondsema

    Dave Brondsema - 2013-08-22
    • QA: Dave Brondsema
     
  • Dave Brondsema

    Dave Brondsema - 2013-08-22

    I wonder if we'd want to be a bit more nuanced. If I click on "Download Snapshot" and then am waiting for the zip to be generated, I might hit refresh, and then I get a page that says "405 Method Not Allowed". Would it be practical to have a GET request check for status only? If there were no snapshot ready or in-progress, we'd probably need a message & link to POST a new request.

    That would also allow people to share URLs (e.g. in an email or webpage) directly to the code snapshot page still.
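
One way to picture the split suggested here is the sketch below: GET only reports status, and POST is the only method that starts work. It is a minimal illustration, not the actual controller; `SnapshotState`, `SnapshotStore`, `handle_snapshot`, and the message strings are all hypothetical.

```python
from enum import Enum


class SnapshotState(Enum):
    # Hypothetical states; the real status values may differ.
    NONE = "none"
    IN_PROGRESS = "in-progress"
    READY = "ready"


class SnapshotStore:
    """In-memory stand-in for whatever tracks snapshot jobs (assumption)."""

    def __init__(self):
        self._states = {}

    def state(self, rev):
        return self._states.get(rev, SnapshotState.NONE)

    def queue(self, rev):
        # Only start a job if none exists, so repeated POSTs are harmless.
        if self.state(rev) is SnapshotState.NONE:
            self._states[rev] = SnapshotState.IN_PROGRESS

    def mark_ready(self, rev):
        self._states[rev] = SnapshotState.READY


def handle_snapshot(method, rev, store):
    """Return (http_status, message) for a request to the snapshot URL."""
    method = method.upper()
    if method not in ("GET", "POST"):
        return 405, "Method Not Allowed"
    if method == "POST":
        # POST is the only method allowed to start snapshot generation.
        store.queue(rev)
    # Both methods then report status, so a refresh (GET) never returns 405
    # and never creates work for spiders.
    state = store.state(rev)
    if state is SnapshotState.READY:
        return 200, "Snapshot ready for download."
    if state is SnapshotState.IN_PROGRESS:
        return 200, "Snapshot is being generated..."
    return 200, "No snapshot yet; submit the form to request one."
```

With this shape, a shared URL still resolves via GET, while a crawler following the link can at most read the current status.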

     
  • Dave Brondsema

    Dave Brondsema - 2013-08-23
    • status: code-review --> in-progress
     
  • Dave Brondsema

    Dave Brondsema - 2013-08-23

    Merged tv/6595 for now, sending back for further improvement.

     
  • Dave Brondsema

    Dave Brondsema - 2013-08-23
    • Size: --> 2
     
  • Tim Van Steenburgh

    • status: in-progress --> code-review
     
  • Tim Van Steenburgh

    Made tarball controller handle GET and POST. Changes force-pushed to:

    forge:tv/6595
    forgehg:tv/6595

     
  • Dave Brondsema

    Dave Brondsema - 2013-08-26

    If I do a GET on a rev that has never had a tarball requested, it says "Checking snapshot status..." and does ajax checks which return 'na' over and over. We need some way to let the user request the snapshot (put a POST form button right on that page?).

    A smaller initial delay is great, but the comment "// Check tarball status every 5 seconds" should be removed since it's inaccurate now. The upper limit of 600,000ms seems pretty high too; it might be good to drop that down while you're in there.
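
The two points above (stop re-checking on 'na', and a short initial delay with a lower cap) could be sketched roughly as follows. This is an illustration in Python rather than the page's actual script; the function names, the 500 ms start, the doubling factor, the 60,000 ms cap, and the "ready" status are assumptions. Only 'na' comes from this ticket.

```python
from itertools import islice


def poll_delays_ms(initial_ms=500, factor=2.0, max_ms=60_000):
    """Yield successive polling delays: start small, grow, never exceed max_ms."""
    delay = initial_ms
    while True:
        yield int(min(delay, max_ms))
        delay = min(delay * factor, max_ms)


# 'na' is the value seen in this ticket; "ready" is a hypothetical
# name for a finished snapshot.
TERMINAL_STATUSES = {"na", "ready"}


def should_keep_polling(status):
    """Stop polling once the status can no longer change without user action."""
    return status not in TERMINAL_STATUSES
```

With these defaults, `list(islice(poll_delays_ms(), 5))` gives delays of 500, 1000, 2000, 4000, and 8000 ms, and a status of 'na' ends the loop so the page can show a request button instead.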

     
  • Dave Brondsema

    Dave Brondsema - 2013-08-26
    • status: code-review --> in-progress
     
  • Tim Van Steenburgh

    Changes pushed to forge:tv/6595

     
  • Tim Van Steenburgh

    • status: in-progress --> code-review
     
  • Dave Brondsema

    Dave Brondsema - 2013-08-28
    • status: code-review --> closed
     
