Apache Allura™ / Tickets / #2835 Better standard tool search results

Igor Bondarenko - 2013-03-25

Closed #291 (Search: wiki syntax help and examples)

je/42cc_2835_content_preview

Now every app can provide it's own text to be shown on click on the 'Help' button. I've also added some text for wiki, but I guess, you would like to change it, since my English is not so good :)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2013-03-26

Merged je/42cc_2835 plus one commit [aced859] so that results still show reasonably without requiring all projects to be reindexed.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2013-03-26

Merged je/42cc_2835 plus one commit [aced85] so that results still show reasonably without requiring all projects to be reindexed.

That caused a conflict when rebasing je/42cc_2835_content_preview. I resolved that and pushed to db/42cc_2835_content_preview if that's helpful.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2013-03-27

I've pushed 2 more commits to db/42cc_2835_content_preview which include help text changes and a --tasks param to reindex so that they run in the background and macros run properly.

I see one security problem with solarize(). If you put <script>alert(1)</script> in a wiki page it is indexed as HTML (striptags decodes the entities - see its source). And then it's rendered as safe. Can you see about a better way for solarize() to strip the tags? And add a test for it :)

To add another layer of security, I'd rather not have |safe when we render text_match and title_match. Then if solarize() has some other subtle hole that we don't notice now, we're still secure. The only reason we need |safe when rendering is for the <strong>. That comes from the solr highlighter which makes it tricky. What about specifying placeholders instead of <strong> in the hl params. Using python-interpolation placeholders like %(start)s would make it easy to escape first and then safely replace the placeholders with the strong tags. Example:

>>> match = 'foo %(start)sbar%(end)s <script>alert(1)</script>' >>> print jinja2.escape(match) % dict(start=Markup('<strong>'), end=Markup('</strong>')) foo <strong>bar</strong> <script>alert(1)</script>

Then you end up with a Markup object to return to the template and |safe isn't needed.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Cory Johns - 2013-03-27

NB: if the text before placeholders contains % characters, we would need to pre-escape them (with %% in this case).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2013-03-27

Other %s in the original content might cause other interpolation issues too. Might be easiest to avoid that altogether and use a unique placeholder string and then do a simple replacement instead of interpolation. E.g .replace('#ALLURA-HIGHLIGHT-START#', Markup('<strong>'), 1)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2013-03-28

Created #308: [#2835] Search: better text preprocessing in solarize (1cp) for that

Related

Tickets: ~~#2835~~

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2013-03-28

Closed 292. je/42cc_2835_search_for_all_apps

Used dismax parser for all apps. Also, changed entire project search to a new template.

However, there is some strange behavior with entire project search. When I'm using it from project that was created as a new style (allura) project - all is ok, but when I'm trying to access the entire project search page for a migrated project (e.g. /p/project3/search/) I'm getting an 404 error. Seems like this is not related to recent code, 'cause on old master (before any commits from this ticket) it appears to behave exactly the same.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2013-03-28

Ouch, I guess we don't install the 'search' tool when we upgrade a project. We'll probably need to put that in the anchored tools as a way to get all projects to gain it.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2013-04-02

Closed #308.

b1573c2..9f48e48 t308_solarize_fix -> je/42cc_2835_search_for_all_apps

All search changes now in this branch and only one ticket left in our backlog (#304 'nice' comments links in search results)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2013-04-04

status: in-progress --> code-review
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2013-04-04

Closed #304. All changes are in je/42cc_2835_search_for_all_apps

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2013-04-24

status: code-review --> closed

Milestone: forge-backlog --> forge-may-03
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2013-04-24

I've closed this but had to make a few final tweaks. Definitely take a look at commit 62912a96ea805872dc60c67bb2e19f4f811ecaf0 which was necessary to escape the returned text from solr properly.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2013-04-25

Hm, this is exactly what I've done at first, but then I've changed it to Markup() in [1bdaa1]. AFAIR, that caused a double-escaping of values (solarize escapes them when making the index). I think I've not tested that with old index, which needed escaping.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2013-04-25

You're right. Thanks for checking me on that. I was testing against data that wasn't escaped with solarize(). I was also thinking that somehow data might get into solr that wasn't escaped, so we'd want to escape it on the way out for extra security. But we can't escape it both places.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Dave Brondsema - 2013-04-26

Tim made a good point about keeping the solr text as original text (not escaped) so that search hits are accurate (not trying to match search terms against escaped content but original text). We also could want to use the solr text in some other context (e.g. email instead of web) where the escaping isn't needed. And if any security hole is found in the future, it'll be faster to deploy a fix if the escaping is on the way out (won't have to reindex everything). But we do still need to strip the markdown via conversion to HTML & then strip those HTML tags, since storing Markdown has its own problems (search term matching would be weird, highlight results have partial markdown which would be basically impossible to strip out). So I've made a change to remove the escaping from solarize() and keep it in the search_app helper. I've also added a test. That's all in [9eceea6680fe97e2af57523452ade27d2a3b34dc] on db/2835. Could you review it?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Igor Bondarenko - 2013-04-29

Yes, escaping only on the way out makes sense. I've reviewed your changes and tested on sandbox, it's looks good to me.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Apache Allura™

Forge software for hosting software projects

Milestone

Searches

Help

#2835 Better standard tool search results

Related

Discussion

Related