For upcoming solr schema changes, we will reindex everything in Allura, via the ReindexCommand. This will take quite a while. [#6191] will help, but the ReindexCommand code itself will need some improvements to handle this process. Evaluate these concerns, do further research & testing at small scales where needed, and make code changes.
- The command takes `--project` and `--neighborhood` options. We'll want to do some medium-sized batches, so allowing `--project` to be a regex will give us more flexibility in the batches we run. (I think we do this on some other script or command too.) A small sketch follows this list.
- The `ref_ids` list could get extremely large even for a single project, and too big to save in mongo (when `add_artifacts.post` records the task). Consider using the `BatchIndexer` class from forge-classic/sfx/lib/migrate.py, or at least its `_post` method technique for splitting tasks into smaller sizes. You can move `BatchIndexer` into Allura if it makes sense to use it.
- We could run with `--solr` to skip that other stuff, and that would make it go faster. However, I recall things not working completely right when doing that. Research and test this risk.
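To make the regex idea concrete, here is a minimal sketch of how a regex-valued `--project` option could select several projects at once. The helper and variable names are illustrative assumptions, not the actual `ReindexCommand` code.

```python
import re

def matching_projects(project_pattern, all_shortnames):
    """Treat the --project value as a regular expression and return every
    project shortname it matches, so one run can cover a medium-sized batch.
    (Illustrative helper only; not Allura's real option handling.)
    """
    rx = re.compile(project_pattern)
    return [name for name in all_shortnames if rx.match(name)]

if __name__ == '__main__':
    shortnames = ['forge-alpha', 'forge-beta', 'sandbox', 'test-project']
    # e.g. --project 'forge-.*' would select the first two projects
    print(matching_projects(r'forge-.*', shortnames))
```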
Created the following tickets for now (covering, among other things, the `--project` regex and the `--skip-solr-delete` option): #335, #336, #337.

Related tickets: [#6192]

Closed #335 (branch `je/42cc_6192`).

Closed #336 (branch `je/42cc_6192`).
I don't see the benefit of using the entire `BatchIndexer`: it just extends `ArtifactSessionExtension` and splits tasks into chunks when artifacts have been added or changed (and therefore need to be re-indexed). During execution of the reindex command we never change artifacts and, in fact, we never flush the artifact session, so that code never gets called. However, using the `BatchIndexer._post` technique for splitting tasks is a good fit here, because we're creating these tasks manually and can split them right away.

Also, I've tested the `--solr` option a little and it seems to work fine. Do you recall what kind of problems it was causing (invalid 'related artifacts' links, missing index entries, etc.)? It would be helpful to know which direction to dig in for #337. Thanks.
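A minimal sketch of that splitting idea, in the spirit of the `_post` technique: instead of posting one task whose `ref_ids` argument might be too large to store in mongo, post one task per chunk. `post_in_chunks`, `chunked`, and the chunk size are assumptions for illustration, not the real Allura API.

```python
def chunked(seq, size):
    """Yield successive slices of seq, each at most `size` items long."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def post_in_chunks(post_task, ref_ids, chunk_size=1000):
    """Post one indexing task per chunk so no single task document has to
    hold the entire ref_ids list."""
    for chunk in chunked(ref_ids, chunk_size):
        post_task(chunk)

if __name__ == '__main__':
    posted = []
    post_in_chunks(posted.append, ['ref%d' % i for i in range(2500)], chunk_size=1000)
    print([len(c) for c in posted])  # -> [1000, 1000, 500]
```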
JFYI: the comment in `BatchIndexer.flush` is wrong: `cls.to_add` also contains index ids, just like `cls.to_delete`, not ObjectIds. So the chunks for `cls.to_add` should be the same size as the chunks for `cls.to_delete`. Bigger chunks still work, because `_post` splits them anyway, but it's a little misleading. You may want a low-priority ticket to fix that comment and the corresponding code. Just letting you know :)

I don't remember exactly what errors I had with `--solr`. If it works well for you now, we can leave it at that and do further work if errors come up in the future.

OK, I'm going to do a little more testing of this in #337 (maybe an hour or so) to make sure.
Closed #337.
I've tested a little more and it seems like `--solr` works fine. You can now review what's in `je/42cc_6192`.