#8352 Convert ApacheAccessHandler.py from mod_python to mod_wsgi

unreleased
open
nobody
py3 (9)
General
nobody
2021-02-18
2020-02-26
No

mod_wsgi is how we run the main app; mod_python is very old school and we shouldn't be using it.

Recent versions of Ubuntu look like they are dropping mod_python anyway rather than supporting it on Python 3: https://bugs.launchpad.net/ubuntu/+source/libapache2-mod-python/+bug/1735368 mod_python does work with Python 3, though; it would just have to be built manually.

Discussion

  • Ingo

    Ingo - 2020-12-21

    @brondsem I haven't worked on it much yet. But I did some first tests, and I'd love to merge it with the modifications I needed to run it on my Allura installation.

    Special cases I needed to implement in the current access handler:

    • Support for SAML. That means I needed to match the Allura username to the user ID of the IdP (in my case, the e-mail address).
    • Renamed repositories, to get them flat in one directory. As we heavily use SVN, and Apache doesn't support recursive directories for WebDAV (as you know ;) ), I renamed the repositories so that the name already gives the fully qualified path. The neighborhood, project and repo are then separated by commas instead of slashes.
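
    The flattened naming can be sketched roughly as follows. This is only an illustration: the helper names, the separator constant and the three-part layout are assumptions, not code from the actual handler.

```python
# Hypothetical helpers illustrating the comma-separated flat repository names.
SEPARATOR = ","


def flatten_repo_path(neighborhood, project, repo):
    """Join neighborhood, project and repo into one flat directory name."""
    parts = (neighborhood, project, repo)
    for part in parts:
        # A part containing the separator or a slash could not be split back.
        if SEPARATOR in part or "/" in part:
            raise ValueError("unsupported character in %r" % (part,))
    return SEPARATOR.join(parts)


def split_repo_name(flat_name):
    """Recover (neighborhood, project, repo) from a flat directory name."""
    parts = flat_name.split(SEPARATOR)
    if len(parts) != 3:
        raise ValueError("expected three comma-separated parts: %r" % (flat_name,))
    return tuple(parts)
```

    With these helpers, flatten_repo_path("p", "myproject", "code") gives "p,myproject,code", and split_repo_name reverses it.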

    My idea was that the refactored access handler could also support my requirements.

    I would even propose that the new handler be a fork, so that everyone can decide whether they want to stay with mod_python or switch to WSGI. Especially because both can't coexist in one Apache instance, and someone might have reasons to stay with mod_python.

    What do you think?
    Should I investigate further, or are you on it already!?

     
  • Dave Brondsema

    Dave Brondsema - 2020-12-22

    Hi Ingo,

    The ApacheAccessHandler.py file has always been a bit rough in my opinion, so improving it and adding features would be great. I have not done anything yet to convert it to mod_wsgi (or python 3), so definitely your contributions would be welcome.

    SAML support sounds nice. I don't think we have any ticket about that, but it could probably be a nice config option that can help somebody else in the future too.

    Flattened SVN directory support would for sure be good. [#7940] is a related ticket.

    As for having a mod_python version of the script, I think that would be OK if it's not too much trouble. But I don't want us to do extra work maintaining 2 versions of a script. In my experience mod_python really isn't very popular any more, and since the main Allura app is a wsgi app, I think having this script use wsgi too would be best. If someone really needs to stay with mod_python and we stop providing a script that works with it, they could always grab a copy of the old/current mod_python script and use that.

    So sounds like several things that can change with this script. Maybe best to do them one at a time, so we're not dealing with a big merge request that overhauls everything at once.

    Thanks,
    Dave

     

    Related

    Tickets: #7940

  • Ingo

    Ingo - 2021-01-09

    I played around with it more.

    The good thing:
    Looking at it in detail, I found that the structure of the existing handler is not bad, as long as we want to keep the interface between the Apache instance and the Allura instance itself. So I would not change this fundamentally at the moment.

    The bad thing:
    I didn't find a way to allow optional anonymous access under WSGI the way we do it with mod_python.

    Background:
    With mod_python we are able to return HTTP error codes. So we can "simply" return "unauthorized" to trigger the authorization by the browser. But in WSGI we can only implement the functions "check_password" and "allow_access", which can basically only return True / False.
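
    For reference, a minimal sketch of the two mod_wsgi hooks mentioned above. The function names and signatures follow the mod_wsgi documentation; the credential table and the blocked-host set are made up for illustration, and a real handler would ask Allura instead.

```python
# Hypothetical data, for illustration only.
_USERS = {"alice": "secret"}
_BLOCKED_HOSTS = {"192.0.2.1"}


def check_password(environ, user, password):
    """Hook for WSGIAuthUserScript; runs after the client sent credentials.

    Returns True for a valid password, False for a wrong one, and
    None when the user is unknown to this provider.
    """
    if user not in _USERS:
        return None
    return _USERS[user] == password


def allow_access(environ, host):
    """Hook for WSGIAccessScript; no credentials are available here.

    The result is a plain allow/deny decision, so there is no way to
    answer "unauthorized" and trigger a login prompt from this hook.
    """
    return host not in _BLOCKED_HOSTS
```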

    Does anyone have a hint or idea how to work around that?

     
  • Dave Brondsema

    Dave Brondsema - 2021-01-11

    Hmm, I forgot that this integrates so closely with Apache that it won't be a regular WSGI app. The auth functions you reference are these, right? https://modwsgi.readthedocs.io/en/develop/user-guides/access-control-mechanisms.html#apache-authentication-provider I haven't dealt with those before.

    What happens when you return true/false from allow_access? It seems like that would be similar to returning unauthorized or not.

     
    • Ingo

      Ingo - 2021-01-12

      Yes, exactly those functions.

      "allow_access" can only return OK or Forbidden. Unauthorized is not possible
      with WSGI, as far as I can tell from the mod_wsgi source.

      I have no idea what the best solution will be.

      Maybe we should live with different URLs for anonymous and for authorized
      access? Because this would work with WSGI.

      Or should we use a reverse proxy, which is doing the authentication and
      permission checks!?

      Any more ideas? ;)

       
  • Ingo

    Ingo - 2021-02-04

    @brondsem Do you have any guidance?
    The only solution I came up with is the "two URL solution".
    For example:

    • "/svn/..." and "/git/..." use allow_access() to check whether the repo allows anonymous access
    • If anonymous access is not allowed, the request is forbidden
    • "/svn/restricted/..." and "/git/restricted/..." use check_password() to check the user login
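
    A rough sketch of how the URL prefixes in this scheme map to the two hooks. In practice the mapping would live in the Apache configuration rather than in Python; the constant and function names here are hypothetical.

```python
# Hypothetical constants mirroring the URL layout described above.
RESTRICTED_PREFIXES = ("/svn/restricted/", "/git/restricted/")
ANONYMOUS_PREFIXES = ("/svn/", "/git/")


def hook_for_path(path):
    """Return the name of the mod_wsgi hook that would guard this path."""
    # The restricted prefixes must be tested first, because they also
    # start with the anonymous prefixes.
    if path.startswith(RESTRICTED_PREFIXES):
        return "check_password"
    if path.startswith(ANONYMOUS_PREFIXES):
        return "allow_access"
    return None
```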

    Now we can have two scenarios:

    • easiest: we configure two access URLs for the tools, one with login and one for anonymous access
    • more complex: my reverse proxy watches for the access-denied response on "/svn/..." and "/git/..." and redirects automatically

    Is this the right track?

     

    Last edit: Ingo 2021-02-04
  • Dave Brondsema

    Dave Brondsema - 2021-02-08

    Have you tried WSGIAuthGroupScript yet? That seems to provide a way to list "groups" and then a "group" can be checked with a Require directive which is a normal httpd directive. And it seems both 401 and 403 statuses are options then. https://httpd.apache.org/docs/2.4/mod/mod_authz_core.html#authzsendforbiddenonfailure
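
    For context, a minimal sketch of the hook that WSGIAuthGroupScript calls. The function name and signature follow the mod_wsgi documentation; the group table is made up for illustration. The hook receives a user name, so it only runs once a user has been identified.

```python
# Hypothetical group membership table, for illustration only.
_GROUPS = {"alice": ["developers"]}


def groups_for_user(environ, user):
    """Return the list of groups the given user belongs to."""
    # The mod_wsgi documentation returns [''] for a user without groups.
    return _GROUPS.get(user, [""])
```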

    As for other ideas, the two URL solution obviously would work and be simple, but I feel like it is not very user-friendly and elegant. But maybe an option if the other choices are hard.

    The reverse-proxy idea sounds like it could work. Would we have to write a little proxy ourselves? Hopefully not. If we used an existing proxy (like an httpd module or wsgi package) it would have to be configurable enough that we could hook our access checks into it somehow? Looking around a little bit:

     
  • Ingo

    Ingo - 2021-02-12

    Hey Dave,
    Either I didn't understand your proposed approach, or it doesn't work.

    WSGIAuthGroupScript has the same problem as WSGIAuthUserScript: it forces the user to log in up front, because it relies on credentials. So when I configure either of those two methods, I am always prompted for a password first.

    And I guess this is what we want to avoid. 😉

    I played around with other Apache configurations, although I am not that confident with them yet. One solution that worked with two URLs, but without an additional proxy, could be:

        <LocationMatch "^/restricted/wsgi/">
            AuthType Basic
            AuthBasicAuthoritative off
            Require valid-user
            AuthBasicProvider wsgi
            WSGIAuthUserScript /ownforge/scripts/WSGIAuthUserScript.py
            [...]
        </LocationMatch>
        <LocationMatch "^/wsgi/">
            AuthType Basic
            AuthBasicAuthoritative off
            Require all granted
            AuthBasicProvider wsgi
            ErrorDocument 403 http://%{SERVER_NAME}:%{SERVER_PORT}/restricted/%{REQUEST_URI}
            WSGIAccessScript   /ownforge/scripts/WSGIAuthUserScript.py
            [...]
        </LocationMatch>
    

    The trick behind it:
    Host-based authorization works without a login, but you still have access to the request URI to decide whether anonymous access is allowed.
    If it isn't, you return "forbidden", and ErrorDocument redirects the request to another location, which then enforces a login.
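
    A minimal sketch of the access hook this trick relies on, assuming the environ passed to allow_access() carries REQUEST_URI as described. The hard-coded set of anonymous repository paths is hypothetical; a real handler would ask Allura.

```python
# Hypothetical set of repository paths that allow anonymous reads.
ANONYMOUS_REPOS = {"/wsgi/p/public/code"}


def allow_access(environ, host):
    """Return True to let the request through, False to answer 403.

    A 403 is what the ErrorDocument directive then turns into a
    redirect to the restricted location.
    """
    # Drop any query string before comparing paths.
    path = environ.get("REQUEST_URI", "").split("?", 1)[0]
    return any(
        path == repo or path.startswith(repo + "/")
        for repo in ANONYMOUS_REPOS
    )
```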

    What I didn't like is the full URL in the config. But when I used a local path (e.g. /restricted/%{REQUEST_URI}), the request didn't escape from the "LocationMatch", and therefore it didn't work.

    Apart from this aspect, I find the config and script structure comprehensible.

    Cheers,
    Ingo

     
  • Dave Brondsema

    Dave Brondsema - 2021-02-18

    My suggestions were based just on reading the documentation; I haven't actually tried anything. So I'm not too surprised that they didn't work out.

    I guess 2 URLs is the simplest way to make it work with mod_wsgi. I think simple is better than trying to deal with a proxy.

    Another idea, though: what if we just kept using mod_python? If it is already working nicely, is it really beneficial to switch to mod_wsgi? I thought it would be good to switch, because mod_python is not popular any more and wsgi is more standard. But mod_python really seems to be the better fit in this case, and then we wouldn't have to make extra changes. At some point we'd want to run mod_python with Python 3, and that would require compiling from source, but it seems to be maintained at https://github.com/grisha/mod_python/ and probably wouldn't be too hard to do in Docker.

     
