#2835 Better standard tool search results

v1.0.0
closed
nobody
General
2015-08-20
2011-09-20
No

Specifically the wiki search results should look better, but let's see how much improvement we can get through the generic search result template. E.g. https://sourceforge.net/p/allura/wiki/search/?q=faq A few suggestions

  • an option to only include primary artifacts (no comments)
  • artifact type (WikiPage) doesn't need to be shown in that case - best to stop including that in the title field when indexing?
  • exclude deleted pages
  • better relevance - e.g. a title match should be better than a body text match. I also think we are matching across too many fields in the first place. E.g. a search for "forgewiki" or "WikiPage" or the mount point shouldn't match the "internal" fields in the solr document.
  • show a preview of the content
  • change <hr> for something better
  • add some metadata on the title line, like last modified date
  • if you choose to search history, the results links need "?version=N"
  • change sort order (relevance, date, etc) and direction
  • add search syntax help and examples (like we do for tickets)

Here's a mockup of how https://sourceforge.net/p/allura/wiki/search/?q=faq could look: http://screencast.com/t/06dC4NjT60

Related

Tickets: #2835
Tickets: #4097
Tickets: #4807
Tickets: #4816
Tickets: #5686

Discussion

<< < 1 2 (Page 2 of 2)
  • Igor Bondarenko - 2013-03-25

    Closed #291 (Search: wiki syntax help and examples)

    je/42cc_2835_content_preview

    Now every app can provide it's own text to be shown on click on the 'Help' button. I've also added some text for wiki, but I guess, you would like to change it, since my English is not so good :)

     
  • Dave Brondsema

    Dave Brondsema - 2013-03-26

    Merged je/42cc_2835 plus one commit [aced859] so that results still show reasonably without requiring all projects to be reindexed.

     
  • Dave Brondsema

    Dave Brondsema - 2013-03-26

    Merged je/42cc_2835 plus one commit [aced85] so that results still show reasonably without requiring all projects to be reindexed.

    That caused a conflict when rebasing je/42cc_2835_content_preview. I resolved that and pushed to db/42cc_2835_content_preview if that's helpful.

     
  • Dave Brondsema

    Dave Brondsema - 2013-03-27

    I've pushed 2 more commits to db/42cc_2835_content_preview which include help text changes and a --tasks param to reindex so that they run in the background and macros run properly.

    I see one security problem with solarize(). If you put &lt;script&gt;alert(1)&lt;/script&gt; in a wiki page it is indexed as HTML (striptags decodes the entities - see its source). And then it's rendered as safe. Can you see about a better way for solarize() to strip the tags? And add a test for it :)

    To add another layer of security, I'd rather not have |safe when we render text_match and title_match. Then if solarize() has some other subtle hole that we don't notice now, we're still secure. The only reason we need |safe when rendering is for the <strong>. That comes from the solr highlighter which makes it tricky. What about specifying placeholders instead of <strong> in the hl params. Using python-interpolation placeholders like %(start)s would make it easy to escape first and then safely replace the placeholders with the strong tags. Example:

    >>> match = 'foo %(start)sbar%(end)s <script>alert(1)</script>'
    >>> print jinja2.escape(match) % dict(start=Markup('<strong>'), end=Markup('</strong>'))
    foo <strong>bar</strong> &lt;script&gt;alert(1)&lt;/script&gt;
    

    Then you end up with a Markup object to return to the template and |safe isn't needed.

     
  • Cory Johns - 2013-03-27

    NB: if the text before placeholders contains % characters, we would need to pre-escape them (with %% in this case).

     
  • Dave Brondsema

    Dave Brondsema - 2013-03-27

    Other %s in the original content might cause other interpolation issues too. Might be easiest to avoid that altogether and use a unique placeholder string and then do a simple replacement instead of interpolation. E.g .replace('#ALLURA-HIGHLIGHT-START#', Markup('<strong>'), 1)

     
  • Igor Bondarenko - 2013-03-28

    Created #308: [#2835] Search: better text preprocessing in solarize (1cp) for that

     

    Related

    Tickets: #2835

  • Igor Bondarenko - 2013-03-28

    Closed 292. je/42cc_2835_search_for_all_apps

    Used dismax parser for all apps. Also, changed entire project search to a new template.

    However, there is some strange behavior with entire project search. When I'm using it from project that was created as a new style (allura) project - all is ok, but when I'm trying to access the entire project search page for a migrated project (e.g. /p/project3/search/) I'm getting an 404 error. Seems like this is not related to recent code, 'cause on old master (before any commits from this ticket) it appears to behave exactly the same.

     
  • Dave Brondsema

    Dave Brondsema - 2013-03-28

    Ouch, I guess we don't install the 'search' tool when we upgrade a project. We'll probably need to put that in the anchored tools as a way to get all projects to gain it.

     
  • Igor Bondarenko - 2013-04-02

    Closed #308.

    b1573c2..9f48e48 t308_solarize_fix -> je/42cc_2835_search_for_all_apps

    All search changes now in this branch and only one ticket left in our backlog (#304 'nice' comments links in search results)

     
  • Igor Bondarenko - 2013-04-04
    • status: in-progress --> code-review
     
  • Igor Bondarenko - 2013-04-04

    Closed #304. All changes are in je/42cc_2835_search_for_all_apps

     
  • Dave Brondsema

    Dave Brondsema - 2013-04-24
    • status: code-review --> closed
    • Milestone: forge-backlog --> forge-may-03
     
  • Dave Brondsema

    Dave Brondsema - 2013-04-24

    I've closed this but had to make a few final tweaks. Definitely take a look at commit 62912a96ea805872dc60c67bb2e19f4f811ecaf0 which was necessary to escape the returned text from solr properly.

     
  • Igor Bondarenko - 2013-04-25

    Hm, this is exactly what I've done at first, but then I've changed it to Markup() in [1bdaa1]. AFAIR, that caused a double-escaping of values (solarize escapes them when making the index). I think I've not tested that with old index, which needed escaping.

     
  • Dave Brondsema

    Dave Brondsema - 2013-04-25

    You're right. Thanks for checking me on that. I was testing against data that wasn't escaped with solarize(). I was also thinking that somehow data might get into solr that wasn't escaped, so we'd want to escape it on the way out for extra security. But we can't escape it both places.

     
  • Dave Brondsema

    Dave Brondsema - 2013-04-26

    Tim made a good point about keeping the solr text as original text (not escaped) so that search hits are accurate (not trying to match search terms against escaped content but original text). We also could want to use the solr text in some other context (e.g. email instead of web) where the escaping isn't needed. And if any security hole is found in the future, it'll be faster to deploy a fix if the escaping is on the way out (won't have to reindex everything). But we do still need to strip the markdown via conversion to HTML & then strip those HTML tags, since storing Markdown has its own problems (search term matching would be weird, highlight results have partial markdown which would be basically impossible to strip out). So I've made a change to remove the escaping from solarize() and keep it in the search_app helper. I've also added a test. That's all in [9eceea6680fe97e2af57523452ade27d2a3b34dc] on db/2835. Could you review it?

     
  • Igor Bondarenko - 2013-04-29

    Yes, escaping only on the way out makes sense. I've reviewed your changes and tested on sandbox, it's looks good to me.

     
<< < 1 2 (Page 2 of 2)

Log in to post a comment.