Projects should be indexed into solr. They are not artifacts, but they should gain the index and index_id methods that artifacts have, following a similar pattern. Include a reasonable set of top-level project fields that would be useful in searches (name, neighborhood, description, labels, categories, registration date, etc.). (Later use cases will be an admin search page for projects and a public project directory.)
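A minimal sketch of what Project.index() and Project.index_id() might return. The Project class below is a hypothetical stand-in for the real model. The exact field names (neighborhood_name_s, short_description_t, labels_ss, category_ss, registration_dt) and the "Project#" id prefix are assumptions chosen to follow the Artifact.index() suffix conventions. Only type_s, url_s and project_id_s come from this ticket.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Project:
    """Hypothetical stand-in for the real Project model (illustration only)."""
    _id: str
    shortname: str
    name: str
    neighborhood: str
    short_description: str
    registration_date: datetime
    labels: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    url: str = ""

    def index_id(self) -> str:
        # Assumed prefix so project ids cannot collide with artifact ids
        # when both live in the same solr core.
        return f"Project#{self._id}"

    def index(self) -> Dict[str, Any]:
        """Return the solr document for this project.

        Field names use dynamic-field type suffixes (_s, _t, _ss, _dt),
        following the Artifact.index() conventions. last_updated is
        deliberately left out so the index does not go stale on every edit.
        """
        return {
            "id": self.index_id(),
            "type_s": "Project",  # primary way to tell record types apart
            "project_id_s": str(self._id),
            "url_s": self.url,
            "name_s": self.name,
            "shortname_s": self.shortname,
            "neighborhood_name_s": self.neighborhood,
            "short_description_t": self.short_description,
            "labels_ss": list(self.labels),
            "category_ss": list(self.categories),
            "registration_dt": self.registration_date.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
```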
Also add a method to ProjectRegistrationProvider so that those extensions can add more fields to what gets indexed. Provide a default empty method so that existing providers don't have to be changed. Include good docstrings for Project.index and the ProjectRegistrationProvider method.
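The provider hook could look roughly like the sketch below, assuming a method named index_project and a plain dict merge. Both the name and the merge order are assumptions, and the classes are simplified stand-ins. The important part is the default implementation, which returns an empty dict so that existing providers need no changes.

```python
from typing import Any, Dict


class ProjectRegistrationProvider:
    """Simplified stand-in for the real provider base class (illustration only)."""

    def index_project(self, project: Any) -> Dict[str, Any]:
        """Return extra solr fields to store for ``project``.

        Called whenever a project is (re)indexed. The returned dict is merged
        into the project's own index() output. The default adds nothing, so
        existing providers keep working unchanged. Keys should follow the
        solr dynamic-field suffix conventions (e.g. ``_s``, ``_t``).
        """
        return {}


class TrustedProvider(ProjectRegistrationProvider):
    """Hypothetical provider that contributes one extra field."""

    def index_project(self, project: Any) -> Dict[str, Any]:
        return {"trust_level_s": getattr(project, "trust_level", "unknown")}


def build_project_doc(base_fields: Dict[str, Any], project: Any,
                      provider: ProjectRegistrationProvider) -> Dict[str, Any]:
    # Copy the core fields, then let the provider add (or override) keys.
    doc = dict(base_fields)
    doc.update(provider.index_project(project))
    return doc
```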
After project changes that affect the indexed fields, fire off a background task to update the index for that project. (This is somewhat similar to the add_artifacts task, but much simpler, since artifact references don't apply here.)
Create a ScriptTask to index all the projects. Use utils.chunked_find for the loop. Batch the solr calls into convenient chunks as well, so that many documents are sent to solr at once.
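The batching idea can be sketched as follows. The sketch does not call utils.chunked_find, because it needs a database. A generic chunked() helper over any iterable stands in for it, on the assumption that chunked_find likewise yields lists of objects. send_to_solr is a hypothetical callable that makes one solr add request per batch.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most ``size`` items (stand-in for chunked_find)."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def reindex_all(projects: Iterable[Any],
                send_to_solr: Callable[[List[Dict[str, Any]]], None],
                batch_size: int = 100) -> int:
    """Hypothetical ScriptTask body: one solr call per batch, not per project."""
    total = 0
    for batch in chunked(projects, batch_size):
        docs = [p.index() for p in batch]
        send_to_solr(docs)
        total += len(docs)
    return total
```

Sending a list per request keeps the number of HTTP round trips to solr proportional to the number of batches rather than the number of projects.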
We will do this later for Users too, following a very similar pattern, so make the new logic somewhat generic so it can be reused then. For example, the background task that adds to solr could work for any non-artifact object that has an index() method.
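Here is a sketch of such a generic task body. It assumes that only ids are passed to the task, since background task arguments are typically serialized, and that a caller-supplied load function turns them back into objects. index_objects_task and FakeSolr are hypothetical names. FakeSolr is an in-memory stand-in, not a real solr client. The task relies only on index() returning a document with an id key, so it is not tied to Project.

```python
from typing import Any, Callable, Dict, Iterable, List, Protocol


class Indexable(Protocol):
    """Anything with an index() method returning a solr document."""

    def index(self) -> Dict[str, Any]: ...


class FakeSolr:
    """In-memory stand-in for a solr client; only a batch add() is modeled."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}

    def add(self, docs: List[Dict[str, Any]]) -> None:
        for doc in docs:
            # Re-adding a doc with the same id replaces it, like a solr update.
            self.docs[doc["id"]] = doc


def index_objects_task(ids: List[Any],
                       load: Callable[[List[Any]], Iterable[Indexable]],
                       solr: FakeSolr) -> int:
    """Hypothetical generic task: reindex whatever objects ``load`` returns.

    The same body could serve Project now and User later.
    """
    docs = [obj.index() for obj in load(ids) if obj is not None]
    if docs:
        solr.add(docs)  # one batched call for all the ids
    return len(docs)
```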
Forgot to mention that these can be saved in the same solr core as all the artifacts. Just continue to follow the Artifact.index() naming conventions, such as type suffixes. Use the same names where possible, like url_s and project_id_s, and especially make sure to set type_s so that it can remain our primary way to distinguish record types.
Closed #562, #563.
je/42cc_7257
Looking very good. I like the architecture of this.
Branch db/7257 is rebased against master (few minor conflicts) and I added a few minor improvements.
A slightly larger issue is that add_projects index tasks (often several of them) fire on any artifact change. For example, if I add a comment to a forum, 4 add_projects tasks fire. That's going to cause a lot of unnecessary task and solr activity. Can you figure out why, and see what can be done to avoid it? I'm only guessing, but I wonder if it is the Artifact.before_save hook that sets the project.last_updated field. If so, perhaps we should specifically make that change not trigger an add_projects task. (I also think it would be fine to omit the last_updated field from solr, so that it isn't out of date all the time.)
Closed #587. Force-pushed je/42cc_7275
You were right about Artifact.before_save causing this. I've added the ability for an artifact to decide whether its index should be updated, and implemented it for Project.
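One way such a per-class decision could work is sketched below. The method name should_update_index, the old/new field dicts, and the on_save hook are all hypothetical. The ticket does not show the actual implementation. The default keeps the previous behavior, and Project ignores changes that touch only last_updated, which Artifact.before_save sets on every artifact edit.

```python
from typing import Any, Dict, List, Set, Tuple


class IndexedObject:
    """Hypothetical mixin: each class decides whether a change needs reindexing."""

    def should_update_index(self, old: Dict[str, Any], new: Dict[str, Any]) -> bool:
        # Default keeps the old behavior: every save triggers a reindex.
        return True


class Project(IndexedObject):
    # Set by Artifact.before_save on nearly every artifact edit and not
    # stored in solr, so a change to it alone should not trigger add_projects.
    NOT_INDEXED: Set[str] = {"last_updated"}

    def __init__(self, project_id: str) -> None:
        self.project_id = project_id

    def should_update_index(self, old: Dict[str, Any], new: Dict[str, Any]) -> bool:
        changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
        return bool(changed - self.NOT_INDEXED)


def on_save(obj: IndexedObject, old: Dict[str, Any], new: Dict[str, Any],
            queued: List[Tuple[str, Any]]) -> None:
    """Hypothetical save hook: queue a reindex task only when it is needed."""
    if obj.should_update_index(old, new):
        queued.append(("add_projects", getattr(obj, "project_id", None)))
```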