Apache Allura™ / Tickets / #7527 Email address associations need better user associations NEEDS MONGO MIGRATION

Dave Brondsema - 2014-07-11

Size: --> 2
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexander Luberg - 2014-07-17

status: open --> in-progress

assigned_to: Alexander Luberg
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexander Luberg - 2014-07-22

status: in-progress --> code-review
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexander Luberg - 2014-07-22

allura:al/7527

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-07-23

status: code-review --> in-progress

QA: Dave Brondsema
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-07-23

I think this is a good approach, but wow it has a pretty broad impact. I found several issues:

You left a few pydevd lines.

In verify_addr when other addresses are removed, we should log which were removed. In case there are issues and we need to check the logs to find out what happened. If [#7524] is merged yet, you can use those audit logs. Else just regular python logging is ok.

by_email_address and identify_sender and password_recovery_hash methods use the user for the first record it finds. There could be multiple results though. Should limit it to confirmed address and I think that'll work then.

These were true before this change, but might as well fix them now: claim_only_addresses() has a query on the claimed_by_user_id field which will run slowly since it's not the first field in any index. But claim_only_addresses isn't used anywhere, so you can just delete it :) And verify_addr has a query by nonce so we should add an index on nonce. Might as well make it a unique index too, so we never risk duplicate nonces.

The upsert method behavior changed so it always just gets a new one and never returns an existing one. Perhaps that is the way it should work now, but probably good to review the logic of each upsert call, I am not so sure. If that's how it should be, it should be renamed to reflect what it does. We also have one reference in SF's internal repo to EmailAddress.upsert that I think needs to be changed (it assumes it'll get the existing record not a new one). That code also has a email_record._id which needs to be fixed.

The migration fails for me with ming.schema.Invalid: _id:admin2@users.sf.net is not a valid ObjectId probably because its loading old data with the new model and so it doesn't validate. And I don't think a single update() call to change all directly in mongo would work either since we need to assign new ObjectIds. Looking around for ideas, I also came across the restriction that _id field can't be changed. So this may require some research. One approach we've done in the past for really big changes is to copy the old model definition and call it EmailAddressOld and then in EmailAddress change the __mongometa__ name = 'email_address' so that the data is stored in a new collection. Then the migration can copy from one to the other. Also, once you get something working, I'm curious how long it takes to handle a few million user records? I am thinking it might be necessary to also have on-the-fly migrations so that everything works for people using the site before the migration completes. If we do that, we'd still want the script to ensure all records get switched over to the new format. For on-the-fly migrations see http://merciless.sourceforge.net/tour.html#specifying-a-migration although that might not work with a 2-model approach.

Lastly, and we may want to spin this off into a separate ticket, but giving a "Email address already claimed" message allows for "email enumeration" which we're trying to avoid (e.g. [#7543]). We should instead send a message to the given address and do something from there. I haven't thought through exactly what the process should be: email message says its already associated with username "foo", or allow new account to validate the address and remove it from old user, etc.

Related

Tickets: ~~#7524~~
Tickets: ~~#7543~~

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-07-25

Milestone: forge-jul-25 --> forge-aug-8
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

1) I have fixed the bugs & pushed changes related to forge to the same branch in that repo
2) before_save approach for EmailAddress doesn't work for us, since it is a) called only when you actually save it to the database b) although it is getting called during flush, the actual saved data is not right in the resulting saved object.(looks like that changes I do to data does not reflect on the model) c) In some cases we need to generate an object without saving to DB & it won't be canonical, thus I've just renamed upsert -> create

3)I have updated the migration script with raw mongo JS. The approach is the following:

//1) Copy to the new collection with data updates - 8min for 3m records on the sandbox
db.email_address.find().snapshot().forEach(function(e){
    e.email = e._id;
    e._id = new ObjectId();
    db.email_address_new.insert(e);
    db.email_address.update({'_id': e._id}, {'migrated': true})
});

//2) pull new code to production
//3) Rename collections - takes no time
db.email_address.renameCollection("email_address_old", {dropTarget: true})
db.email_address_new.renameCollection("email_address", {dropTarget: true})

//4) Post Migration - copy/update all the object which were created between 1)&2)
// Should be quick, is done just in case if any data was pushed to email_address while we are deploying.
db.email_address_old.find({'migrated': {'$not': false}}).snapshot().forEach(function(e){
    e.email = e._id;
    e._id = new ObjectId();
    db.email_address.insert(e);
    db.email_address_old.update({'_id': e._id}, {'migrated': true})
});
// Optional - we can remove the old collection in case the push is success.
db.email_address_old.drop()

Alexander Luberg - 2014-07-29

status: in-progress --> code-review
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-08-18

status: code-review --> in-progress

Milestone: forge-aug-8 --> forge-aug-22
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-08-18

Approach is looking good. In the migration .js file you should have $ne instead of $not. Can you also split this into 3 files? Something like this so that each file can be run on its own cleanly:

030-email-address-_id-to-email--before-upgrade.js

030-email-address-_id-to-email--after-upgrade.js

030-email-address-_id-to-email--cleanup.js

And then in the SF internal code, M.EmailAddress.create is creating too many records - a new copy each time you log in. Should just look up an existing record with the canonical form of the email address I think. Also need to ensure the email is marked as verified and primary.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexander Luberg - 2014-08-27

forge-classic:al/7527 & allura fork:al/7527

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Alexander Luberg - 2014-08-27

status: in-progress --> code-review
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2014-08-28

summary: Email address associations need better user associations --> Email address associations need better user associations NEEDS MONGO MIGRATION

status: code-review --> closed
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2015-01-05

Milestone: unreleased --> asf_release_1.2.0
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Apache Allura™

Forge software for hosting software projects

Milestone

Searches

Help

#7527 Email address associations need better user associations NEEDS MONGO MIGRATION

Related

Discussion

Related