#7527 Email address associations need better user associations NEEDS MONGO MIGRATION

v1.2.0
closed
sf-2 (994)
General
2015-08-20
2014-07-02
No

A user can claim any address and even if they don't verify it, that still blocks someone else from trying to claim it. This can be fixed in auth.py like:

-                if M.EmailAddress.query.get(_id=new_addr['addr']):
+                if M.EmailAddress.query.get(_id=new_addr['addr'], confirmed=True):

However this leads to another problem, if multiple users have the same email address claimed (but not verified). One user sees a "verify" link, but the other sees "Unknown addr obj foo@bar.com", on the preferences page.

There probably are more issues when it gets to verification, too.

Related

Tickets: #7543

Discussion

  • Dave Brondsema

    Dave Brondsema - 2014-07-11
    • Size: --> 2
     
  • Alexander Luberg

    • status: open --> in-progress
    • assigned_to: Alexander Luberg
     
  • Alexander Luberg

    • status: in-progress --> code-review
     
  • Alexander Luberg

    allura:al/7527

     
  • Dave Brondsema

    Dave Brondsema - 2014-07-23
    • status: code-review --> in-progress
    • QA: Dave Brondsema
     
  • Dave Brondsema

    Dave Brondsema - 2014-07-23

    I think this is a good approach, but wow it has a pretty broad impact. I found several issues:

    You left a few pydevd lines.

    In verify_addr when other addresses are removed, we should log which were removed. In case there are issues and we need to check the logs to find out what happened. If [#7524] is merged yet, you can use those audit logs. Else just regular python logging is ok.

    by_email_address and identify_sender and password_recovery_hash methods use the user for the first record it finds. There could be multiple results though. Should limit it to confirmed address and I think that'll work then.

    These were true before this change, but might as well fix them now: claim_only_addresses() has a query on the claimed_by_user_id field which will run slowly since it's not the first field in any index. But claim_only_addresses isn't used anywhere, so you can just delete it :) And verify_addr has a query by nonce so we should add an index on nonce. Might as well make it a unique index too, so we never risk duplicate nonces.

    The upsert method behavior changed so it always just gets a new one and never returns an existing one. Perhaps that is the way it should work now, but probably good to review the logic of each upsert call, I am not so sure. If that's how it should be, it should be renamed to reflect what it does. We also have one reference in SF's internal repo to EmailAddress.upsert that I think needs to be changed (it assumes it'll get the existing record not a new one). That code also has a email_record._id which needs to be fixed.

    The migration fails for me with ming.schema.Invalid: _id:admin2@users.sf.net is not a valid ObjectId probably because its loading old data with the new model and so it doesn't validate. And I don't think a single update() call to change all directly in mongo would work either since we need to assign new ObjectIds. Looking around for ideas, I also came across the restriction that _id field can't be changed. So this may require some research. One approach we've done in the past for really big changes is to copy the old model definition and call it EmailAddressOld and then in EmailAddress change the __mongometa__ name = 'email_address' so that the data is stored in a new collection. Then the migration can copy from one to the other. Also, once you get something working, I'm curious how long it takes to handle a few million user records? I am thinking it might be necessary to also have on-the-fly migrations so that everything works for people using the site before the migration completes. If we do that, we'd still want the script to ensure all records get switched over to the new format. For on-the-fly migrations see http://merciless.sourceforge.net/tour.html#specifying-a-migration although that might not work with a 2-model approach.

    Lastly, and we may want to spin this off into a separate ticket, but giving a "Email address already claimed" message allows for "email enumeration" which we're trying to avoid (e.g. [#7543]). We should instead send a message to the given address and do something from there. I haven't thought through exactly what the process should be: email message says its already associated with username "foo", or allow new account to validate the address and remove it from old user, etc.

     

    Related

    Tickets: #7524
    Tickets: #7543

  • Dave Brondsema

    Dave Brondsema - 2014-07-25
    • Milestone: forge-jul-25 --> forge-aug-8
     
  • Alexander Luberg

    1) I have fixed the bugs & pushed changes related to forge to the same branch in that repo
    2) before_save approach for EmailAddress doesn't work for us, since it is a) called only when you actually save it to the database b) although it is getting called during flush, the actual saved data is not right in the resulting saved object.(looks like that changes I do to data does not reflect on the model) c) In some cases we need to generate an object without saving to DB & it won't be canonical, thus I've just renamed upsert -> create

    3)I have updated the migration script with raw mongo JS. The approach is the following:

    //1) Copy to the new collection with data updates - 8min for 3m records on the sandbox
    db.email_address.find().snapshot().forEach(function(e){
        e.email = e._id;
        e._id = new ObjectId();
        db.email_address_new.insert(e);
        db.email_address.update({'_id': e._id}, {'migrated': true})
    });
    
    //2) pull new code to production
    //3) Rename collections - takes no time
    db.email_address.renameCollection("email_address_old", {dropTarget: true})
    db.email_address_new.renameCollection("email_address", {dropTarget: true})
    
    //4) Post Migration - copy/update all the object which were created between 1)&2)
    // Should be quick, is done just in case if any data was pushed to email_address while we are deploying.
    db.email_address_old.find({'migrated': {'$not': false}}).snapshot().forEach(function(e){
        e.email = e._id;
        e._id = new ObjectId();
        db.email_address.insert(e);
        db.email_address_old.update({'_id': e._id}, {'migrated': true})
    });
    // Optional - we can remove the old collection in case the push is success.
    db.email_address_old.drop()
    
     
  • Alexander Luberg

    • status: in-progress --> code-review
     
  • Dave Brondsema

    Dave Brondsema - 2014-08-18
    • status: code-review --> in-progress
    • Milestone: forge-aug-8 --> forge-aug-22
     
  • Dave Brondsema

    Dave Brondsema - 2014-08-18

    Approach is looking good. In the migration .js file you should have $ne instead of $not. Can you also split this into 3 files? Something like this so that each file can be run on its own cleanly:

    • 030-email-address-_id-to-email--before-upgrade.js
    • 030-email-address-_id-to-email--after-upgrade.js
    • 030-email-address-_id-to-email--cleanup.js

    And then in the SF internal code, M.EmailAddress.create is creating too many records - a new copy each time you log in. Should just look up an existing record with the canonical form of the email address I think. Also need to ensure the email is marked as verified and primary.

     
  • Alexander Luberg

    forge-classic:al/7527 & allura fork:al/7527

     
  • Alexander Luberg

    • status: in-progress --> code-review
     
  • Dave Brondsema

    Dave Brondsema - 2014-08-28
    • summary: Email address associations need better user associations --> Email address associations need better user associations NEEDS MONGO MIGRATION
    • status: code-review --> closed
     
  • Dave Brondsema

    Dave Brondsema - 2015-01-05
    • Milestone: unreleased --> asf_release_1.2.0
     

Log in to post a comment.