When a more or less big tracker import (on the order of 100 tickets) is run against a sandbox (with an nginx frontend), it ends up with:
    $ time python allura_import_test.py data/sf100.json -u https://sf-psokolovsky-3014.sb.sf.net -a tckd18490839ab2c834b16b -s 7c43a517b310ab7477f251922282456fa184c6440f2769c76fecfece04b1a58125063030b166d4e8
    Importing 100 tickets
    Traceback (most recent call last):
      File "allura_import_test.py", line 93, in <module>
        res = cli.call(url, doc=doc_txt, options=json.dumps(import_options))
      File "allura_import_test.py", line 64, in call
        raise e
    urllib2.HTTPError: HTTP Error 504: Gateway Time-out

    real    0m31.727s
    user    0m0.300s
    sys     0m0.048s
I.e., nginx or some other intermediate component times out at 30s. So the current approach of providing import data in ForgePlucker format as one big JSON and then executing the import synchronously (so that status can be returned to the client) is not viable and needs replacement or augmentation. A few alternative choices can be proposed.
Choice 1 is by far the easiest to implement: it would reuse what [#1767] already queued up; we would just need to split the chunks down to size 1 (sketched below).
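To make choice 1 concrete, here's a rough client-side sketch. The endpoint path and the 'doc' parameter name are illustrative guesses, not the actual Allura API, and a real call would also have to be signed with the API key/secret the way allura_import_test.py does:

    import urllib
    import urllib2

    def import_one_by_one(base_url, single_ticket_docs):
        # One POST per single-ticket doc: each request is small and
        # returns quickly, so nothing comes near the ~30s frontend
        # timeout seen in the traceback above.
        for doc_txt in single_ticket_docs:
            data = urllib.urlencode({'doc': doc_txt})
            urllib2.urlopen(base_url + '/rest/tracker/perform_import',
                            data).read()

    # e.g.: import_one_by_one('https://sf-example.sb.sf.net', docs)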
Well, solutions 2 & 3 do have the drawback that there's still a maximum size limit on the import doc (the request body).
Any guesses how much overhead #1 would have? It's a separate request/response for each ticket instead of just one. But it sidesteps the issue you mentioned with 2 & 3 (file size limit)
I also like 1 because we can have just one API, not two. We already have an OAuth API for POST to /mytickets/new. I'm not sure if it's syntactically similar to ForgePlucker, but I don't think that is necessarily important. We'd just have to add a field or two that can only be set during migration (e.g. 'reported_by').
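For illustration only (the real form/field names would need checking against the existing API), reusing that endpoint for migration might look like:

    import urllib
    import urllib2

    def create_ticket(base_url, oauth_headers, summary, reported_by=None):
        fields = {'summary': summary}
        if reported_by is not None:
            # hypothetical migration-only field: honored only during
            # imports, so normal API clients can't spoof the reporter
            fields['reported_by'] = reported_by
        req = urllib2.Request(base_url + '/mytickets/new',
                              urllib.urlencode(fields), oauth_headers)
        return urllib2.urlopen(req).read()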
Discussed this with Dave, and we agreed that choice 1 gives the best solution to the issue. Implementation-wise, the existing infrastructure for the ForgePlucker format should be used; the docs submitted should just contain a single ticket per call. This essentially reduces to [#1767] with a chunk size of 1. My tests show a big improvement in import time: 1K tickets are now imported in ~14 minutes locally. SOLR indexing still takes about an hour, but the background indexing tasks are now properly queued/scheduled, and there's no bottleneck for import API calls.
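A sketch of that chunk-size-1 split, assuming a ForgePlucker-style doc with a top-level 'trackers' map whose trackers hold 'artifacts' lists (the exact schema keys are assumptions):

    import json

    def single_ticket_docs(big_doc):
        # Chunk size 1: every yielded doc keeps the tracker metadata
        # but carries exactly one artifact, so each import API call
        # stays small and returns quickly.
        for name, tracker in big_doc['trackers'].items():
            for ticket in tracker['artifacts']:
                single = dict(tracker, artifacts=[ticket])
                yield json.dumps({'trackers': {name: single}})

    big_doc = json.load(open('data/sf100.json'))
    for doc_txt in single_ticket_docs(big_doc):
        pass  # hand each doc_txt to the existing ForgePlucker import call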
So, closing this ticket; changes will go against [#1767].
Related Tickets: #1767