#7257 Index all projects in solr

asf_release_1.2.0
closed
nobody
42cc (378)
General
2015-03-10
2014-03-07
Dave Brondsema
No

Projects should be indexed in solr. They are not artifacts, but they should gain the same methods that artifacts use, such as index and index_id, so that a similar pattern applies. Index a reasonable set of top-level project fields that would be useful in searches (name, neighborhood, description, labels, categories, registration date, etc.). Later use cases for this include an admin search page for projects and a public project directory.
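A rough sketch of what such an index document might contain. This is an illustration only, not Allura's actual schema: the function name, the input dict keys, the `id` format, and every field name other than `type_s` are assumptions, and the suffixes (`_s`, `_t`, `_dt`) assume solr dynamic fields are configured for them.

```python
from datetime import datetime


def project_index_doc(project):
    """Build a flat solr document from a project-like dict (hypothetical shape)."""
    return {
        'id': 'Project#%s' % project['_id'],  # hypothetical unique-id format
        'type_s': 'Project',                  # distinguishes projects from other records
        'name_s': project['name'],
        'neighborhood_name_s': project['neighborhood'],
        'short_description_t': project['description'],
        'labels_t': ' '.join(project['labels']),
        # solr date fields expect ISO 8601 in UTC with a trailing Z
        'registration_dt': project['registered'].strftime('%Y-%m-%dT%H:%M:%SZ'),
    }


example_doc = project_index_doc({
    '_id': 'abc123',
    'name': 'Demo',
    'neighborhood': 'Projects',
    'description': 'A demo project',
    'labels': ['search', 'solr'],
    'registered': datetime(2014, 3, 7),
})
```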

Also add a method to ProjectRegistrationProvider so that those extensions can add more fields to what gets indexed. Provide a default empty method so that existing providers don't have to be changed. Include good docstrings for Project.index and the ProjectRegistrationProvider method.
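One possible shape for the provider hook, as a sketch under assumptions. The classes below are simplified stand-ins for the real ones, and the hook name `project_index_fields` and the example field `is_sponsored_b` are made up for illustration.

```python
class ProjectRegistrationProvider:
    """Simplified stand-in for the real provider base class."""

    def project_index_fields(self, project):
        """Return extra fields to add to the project's solr document.

        Hypothetical hook name. The default returns an empty dict, so
        existing providers do not need to change.
        """
        return {}


class CustomProvider(ProjectRegistrationProvider):
    """Example extension that contributes one extra (made-up) field."""

    def project_index_fields(self, project):
        return {'is_sponsored_b': project.name.startswith('sponsored-')}


class Project:
    """Simplified stand-in for the real Project model."""

    def __init__(self, name, provider):
        self.name = name
        self.provider = provider

    def index(self):
        """Return the dict of fields to send to solr for this project."""
        fields = {'type_s': 'Project', 'name_s': self.name}
        # Provider fields are merged last, so an extension can add
        # (or override) fields without touching Project.index itself.
        fields.update(self.provider.project_index_fields(self))
        return fields
```

With the default provider, `Project.index()` returns only the core fields; with `CustomProvider`, the extra field is included as well.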

After project changes occur that would affect the indexed fields, fire off a background task to update the index for the project. This is somewhat similar to the add_artifacts task, but much simpler, since artifact references aren't applicable here.

Create a ScriptTask to index all the projects. Use utils.chunked_find for the loop. Batch up the solr calls in convenient chunks too, so many documents get sent to solr at once.
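The batching idea can be sketched as follows. This is not the ScriptTask itself: `chunked` is a generic stand-in for utils.chunked_find (whose signature is not shown here), and `solr_add` is a hypothetical callable that accepts a list of documents.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items (stand-in for utils.chunked_find)."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def index_all(projects, solr_add, batch_size=500):
    """Send every project's index document to solr, one call per batch.

    Returns the number of documents sent.
    """
    total = 0
    for chunk in chunked(projects, batch_size):
        docs = [project.index() for project in chunk]
        solr_add(docs)  # one solr request carries the whole batch
        total += len(docs)
    return total
```

The batch size of 500 is an arbitrary default; a real script would likely expose it as an option.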

We will do this later for Users, too, following a very similar pattern, so keep the new logic generic enough that it can be reused then. For example, the background task that adds to solr could work for any non-artifact object that has an index() method.
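A minimal sketch of that generic approach, assuming the only contract is an index() method returning a dict. The names `Indexable` and `add_to_solr`, the `solr_add` callable, and the toy `Project` and `User` classes are all hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Indexable(Protocol):
    """Anything that can describe itself as a solr document."""

    def index(self) -> dict: ...


def add_to_solr(objects, solr_add):
    """Index a mixed list of objects; reject anything lacking index()."""
    docs = []
    for obj in objects:
        # For a runtime-checkable Protocol, isinstance only checks that
        # an `index` attribute is present, not its signature.
        if not isinstance(obj, Indexable):
            raise TypeError('%r has no index() method' % (obj,))
        docs.append(obj.index())
    if docs:
        solr_add(docs)
    return len(docs)


class Project:
    def __init__(self, name):
        self.name = name

    def index(self):
        return {'type_s': 'Project', 'name_s': self.name}


class User:
    def __init__(self, username):
        self.username = username

    def index(self):
        return {'type_s': 'User', 'username_s': self.username}
```

Because the task depends only on index(), adding User indexing later would not require a second task.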

Related

Tickets: #7278
Tickets: #7589

Discussion

  • Dave Brondsema
    2014-03-07

    Forgot to mention that these can be saved in the same solr core as all the artifacts. Just continue to follow the Artifact.index() naming conventions, such as type suffixes. Use the same names when possible, like url_s and project_id_s, and especially make sure to set type_s so that it can continue to be our primary way to distinguish record types.

     
    • status: open --> code-review
     
  • Closed #562, #563. je/42cc_7257

     
  • Dave Brondsema
    2014-05-06

    • QA: Dave Brondsema
     
  • Dave Brondsema
    2014-05-09

    Looking very good. I like the architecture of this.

    Branch db/7257 is rebased against master (few minor conflicts) and I added a few minor improvements.

    A slightly larger issue is that add_projects index tasks (often several of them) fire upon any artifact change. For example, if I add a comment to a forum, 4 add_projects tasks fire. That's going to be a lot of unnecessary task and solr activity. Can you figure out why, and see what can be done to avoid that? Just guessing now, but I wonder if it is the Artifact.before_save hook that sets the project.last_updated field. If so, perhaps we want to make that specifically not trigger an add_projects task. (And I think it would be fine to omit the last_updated field from solr so that it isn't out of date all the time.)

     
  • Dave Brondsema
    2014-05-09

    • status: code-review --> in-progress
     
    • status: in-progress --> code-review
     
  • Closed #587. Force-pushed je/42cc_7275

    You were right about Artifact.before_save causing this. I've added the ability for an artifact to decide whether its index should be updated, and implemented it for Project.

     
  • Dave Brondsema
    2014-05-21

    • status: code-review --> closed
    • Milestone: forge-backlog --> forge-may-30
     
  • Dave Brondsema
    2015-01-05

    • Milestone: unreleased --> asf_release_1.2.0
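
The fix discussed in the 2014-05-09 comments, letting an object decide whether its index should be updated, could look roughly like the sketch below. The function name, the `INDEXED_FIELDS` set, and the dict-based before/after states are assumptions for illustration, not the code that was merged.

```python
# Hypothetical set of fields that end up in solr. last_updated is left
# out on purpose, so a change to it alone never triggers a reindex.
INDEXED_FIELDS = frozenset(['name', 'short_description', 'labels'])


def should_update_index(old_state, new_state, indexed_fields=INDEXED_FIELDS):
    """Return True only if at least one indexed field actually changed."""
    return any(
        old_state.get(field) != new_state.get(field)
        for field in indexed_fields
    )
```

With this check in place, an artifact save that only bumps last_updated on the project would skip the add_projects task, while a rename would still fire it.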