Specifically the wiki search results should look better, but let's see how much improvement we can get through the generic search result template. E.g. https://sourceforge.net/p/allura/wiki/search/?q=faq A few suggestions
<hr>
for something betterHere's a mockup of how https://sourceforge.net/p/allura/wiki/search/?q=faq could look: http://screencast.com/t/06dC4NjT60
Tickets: #2835
Tickets: #4097
Tickets: #4807
Tickets: #4816
Tickets: #5686
Closed #291 (Search: wiki syntax help and examples)
je/42cc_2835_content_preview
Now every app can provide it's own text to be shown on click on the 'Help' button. I've also added some text for wiki, but I guess, you would like to change it, since my English is not so good :)
Merged je/42cc_2835 plus one commit [aced859] so that results still show reasonably without requiring all projects to be reindexed.
Merged je/42cc_2835 plus one commit [aced85] so that results still show reasonably without requiring all projects to be reindexed.
That caused a conflict when rebasing je/42cc_2835_content_preview. I resolved that and pushed to db/42cc_2835_content_preview if that's helpful.
I've pushed 2 more commits to db/42cc_2835_content_preview which include help text changes and a
--tasks
param toreindex
so that they run in the background and macros run properly.I see one security problem with
solarize()
. If you put<script>alert(1)</script>
in a wiki page it is indexed as HTML (striptags decodes the entities - see its source). And then it's rendered as safe. Can you see about a better way forsolarize()
to strip the tags? And add a test for it :)To add another layer of security, I'd rather not have
|safe
when we rendertext_match
andtitle_match
. Then if solarize() has some other subtle hole that we don't notice now, we're still secure. The only reason we need|safe
when rendering is for the<strong>
. That comes from the solr highlighter which makes it tricky. What about specifying placeholders instead of<strong>
in thehl
params. Using python-interpolation placeholders like%(start)s
would make it easy to escape first and then safely replace the placeholders with the strong tags. Example:Then you end up with a
Markup
object to return to the template and|safe
isn't needed.NB: if the text before placeholders contains % characters, we would need to pre-escape them (with %% in this case).
Other
%
s in the original content might cause other interpolation issues too. Might be easiest to avoid that altogether and use a unique placeholder string and then do a simple replacement instead of interpolation. E.g.replace('#ALLURA-HIGHLIGHT-START#', Markup('<strong>'), 1)
Created #308: [#2835] Search: better text preprocessing in solarize (1cp) for that
Related
Tickets:
#2835Closed 292.
je/42cc_2835_search_for_all_apps
Used dismax parser for all apps. Also, changed entire project search to a new template.
However, there is some strange behavior with entire project search. When I'm using it from project that was created as a new style (allura) project - all is ok, but when I'm trying to access the entire project search page for a migrated project (e.g.
/p/project3/search/
) I'm getting an 404 error. Seems like this is not related to recent code, 'cause on old master (before any commits from this ticket) it appears to behave exactly the same.Ouch, I guess we don't install the 'search' tool when we upgrade a project. We'll probably need to put that in the anchored tools as a way to get all projects to gain it.
Closed #308.
b1573c2..9f48e48 t308_solarize_fix -> je/42cc_2835_search_for_all_apps
All search changes now in this branch and only one ticket left in our backlog (#304 'nice' comments links in search results)
Closed #304. All changes are in
je/42cc_2835_search_for_all_apps
I've closed this but had to make a few final tweaks. Definitely take a look at commit 62912a96ea805872dc60c67bb2e19f4f811ecaf0 which was necessary to escape the returned text from solr properly.
Hm, this is exactly what I've done at first, but then I've changed it to
Markup()
in [1bdaa1]. AFAIR, that caused a double-escaping of values (solarize
escapes them when making the index). I think I've not tested that with old index, which needed escaping.You're right. Thanks for checking me on that. I was testing against data that wasn't escaped with
solarize()
. I was also thinking that somehow data might get into solr that wasn't escaped, so we'd want to escape it on the way out for extra security. But we can't escape it both places.Tim made a good point about keeping the solr text as original text (not escaped) so that search hits are accurate (not trying to match search terms against escaped content but original text). We also could want to use the solr text in some other context (e.g. email instead of web) where the escaping isn't needed. And if any security hole is found in the future, it'll be faster to deploy a fix if the escaping is on the way out (won't have to reindex everything). But we do still need to strip the markdown via conversion to HTML & then strip those HTML tags, since storing Markdown has its own problems (search term matching would be weird, highlight results have partial markdown which would be basically impossible to strip out). So I've made a change to remove the escaping from
solarize()
and keep it in thesearch_app
helper. I've also added a test. That's all in [9eceea6680fe97e2af57523452ade27d2a3b34dc] on db/2835. Could you review it?Yes, escaping only on the way out makes sense. I've reviewed your changes and tested on sandbox, it's looks good to me.