Here is what we believe to be happening:
1. The push begins; sfs-web-a is updated, but sfs-web-b and sfs-web-c still have the old code.
2. A user retrieves HTML from sfs-web-a (with a SERVER_NAME of s.fsdn.com) that references static resources at https://a.fsdn.com/allura/nf/*.
3. The user agent requests those static resources from the CDN, which in turn makes a request to sourceforge.net.
4. nginx rewrites the /allura/nf/ path to /nf/.
5. The request is routed to sfs-web-b or sfs-web-c, which still have the old code and do not recognize the path as a static resource.
6. Since the request is (assumed to be) for a dynamic resource, the https middleware is called. It detects an https:// scheme but no SFUSER cookie, so it generates a 302 redirect to the same path with an http:// scheme (http://s.fsdn.com/nf/*).
7. Nothing is mapped to /nf/* on s.fsdn.com, so the connection is dropped.
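A rough model of steps 5 and 6 may make the failure easier to see. This is a sketch under stated assumptions, not the actual Allura middleware: the function names, the /nf/<build_key>/<resource> URL layout, and the timestamp values are all hypothetical. Only the SFUSER cookie, the https-to-http 302, and the s.fsdn.com host come from the steps above.

```python
from http.cookies import SimpleCookie
from typing import Optional


def is_static(path: str, build_key: str) -> bool:
    # ASSUMPTION (hypothetical layout): static URLs look like /nf/<build_key>/<resource>.
    return path.startswith(f"/nf/{build_key}/")


def https_redirect_target(scheme: str, host: str, path: str,
                          cookie_header: str, build_key: str) -> Optional[str]:
    """Return a 302 Location, or None to let the request through (hypothetical)."""
    if is_static(path, build_key):
        return None  # would be served by the static middleware
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if scheme == "https" and "SFUSER" not in cookies:
        # Anonymous https request for a "dynamic" URL: bounce to plain http.
        return f"http://{host}{path}"
    return None


# sfs-web-b still has the old build key, so the new static URL looks dynamic.
print(https_redirect_target("https", "s.fsdn.com", "/nf/1300000002/css/site.css",
                            "", build_key="1300000001"))
# http://s.fsdn.com/nf/1300000002/css/site.css
```

With the new build key the same path is treated as static and no redirect is produced; the redirect only appears on a webhead whose key does not match the URL.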
A CDN purge effectively fixes this, but we need a better long-term solution. Current ideas are:
- bring all the webheads down before bringing any up with the new code, instead of the rolling restart we use during pushes today
- recognize requests that 'look like' static requests except for the timestamp, and serve 404s instead of letting them pass through to the https middleware
- move the build key out of production.ini and into mongo, where the static middleware can read it (this makes a build key change effectively atomic)
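The third idea can be sketched as every webhead reading the build key from one shared store on each request, rather than from its own production.ini loaded at startup. This is a minimal sketch assuming a plain dict stands in for the mongo document; the class and function names are hypothetical. A real version would presumably query mongo (for example with pymongo's `find_one`) and might cache the value briefly, which is not shown here.

```python
from typing import Callable, MutableMapping


class SharedBuildKey:
    """Hypothetical stand-in for a build key stored in mongo.

    All webheads read the same stored value, so a single write switches
    every webhead at once instead of one at a time during a rolling restart.
    """

    def __init__(self, store: MutableMapping[str, str], name: str = "build_key"):
        self._store = store
        self._name = name

    def current(self) -> str:
        return self._store[self._name]


def make_static_matcher(key_source: SharedBuildKey) -> Callable[[str], bool]:
    def is_static(path: str) -> bool:
        # ASSUMPTION (hypothetical layout): static URLs look like /nf/<build_key>/<resource>.
        return path.startswith(f"/nf/{key_source.current()}/")
    return is_static


shared = {"build_key": "1300000001"}   # one document, shared by all webheads
web_a = make_static_matcher(SharedBuildKey(shared))
web_b = make_static_matcher(SharedBuildKey(shared))

shared["build_key"] = "1300000002"     # single write during the push
print(web_a("/nf/1300000002/css/site.css"), web_b("/nf/1300000002/css/site.css"))
# True True
```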
We have a fix on dev that recognizes requests that 'look like' static requests and disables the http/https redirect logic for them. This should mostly fix the problem (the CDN does not cache 404s for long), pending an update to production.ini tracked by https://control.sog.geek.net/sog/trac/ticket/17094 .
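The dev fix can be approximated as: match anything shaped like a static URL regardless of its timestamp, answer 404 when the build key is unknown, and never hand such requests to the http/https redirect. This is a sketch, not the code on dev; the `/nf/<numeric timestamp>/` pattern and the `route` function are hypothetical.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION (hypothetical layout): static URLs look like /nf/<numeric timestamp>/<resource>.
STATIC_LIKE = re.compile(r"^/nf/\d+/")


def route(scheme: str, host: str, path: str, has_sfuser: bool,
          build_key: str) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a hypothetical request handler."""
    if path.startswith(f"/nf/{build_key}/"):
        return 200, None                    # current static resource
    if STATIC_LIKE.match(path):
        # Looks static but carries an unknown build key: answer 404
        # and skip the http/https redirect logic entirely.
        return 404, None
    if scheme == "https" and not has_sfuser:
        return 302, f"http://{host}{path}"  # redirect logic unchanged for dynamic URLs
    return 200, None


print(route("https", "s.fsdn.com", "/nf/1300000002/css/site.css", False, "1300000001"))
# (404, None)
print(route("https", "s.fsdn.com", "/projects/foo/", False, "1300000001"))
# (302, 'http://s.fsdn.com/projects/foo/')
```

The 404 relies on the point above that the CDN does not cache 404s for long, so a later request should reach a webhead that recognizes the new key.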
If I reload, the problem goes away...
status: closed --> in-progress
custom_field__size: --> 4
milestone: post-GA --> GA4