https://bleach.readthedocs.io/en/latest/goals.html#bleach-vs-html5lib has some reasons. Also, html5lib hasn't had much activity or many releases for a while, and bleach is more actively maintained. Regarding their claim that sanitize_css is broken, I found these issues, which seem to indicate it's not a huge risk, but not correct either:
We have customized behavior with our ForgeHTMLSanitizerFilter class, so it'll take careful work to make sure the right logic is still applied.
https://github.com/yourcelf/bleach-whitelist has a list of tags/attrs/styles that could be handy (doesn't bleach have its own safe list?)
The https://github.com/yourcelf/bleach-whitelist list doesn't look very good to me (very limited on tags, not limited enough on CSS rules). The new version of Pypeline I've been working on will use bleach and will come with a ruleset that should work well for Allura.
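
For reference, a minimal sketch of how a custom ruleset could be passed to bleach. The tag, attribute, and protocol lists here are hypothetical placeholders for illustration, not the Pypeline ruleset and not the logic in ForgeHTMLSanitizerFilter. bleach also ships its own defaults in `bleach.sanitizer` (`ALLOWED_TAGS`, `ALLOWED_ATTRIBUTES`, `ALLOWED_PROTOCOLS`), which could serve as a starting point. CSS handling is left out on purpose because the relevant `bleach.clean` parameter differs between bleach versions.

```python
# Hypothetical allowlists, for illustration only. These are NOT the rules
# Pypeline or Allura actually use; a real ruleset would need review.
ALLOWED_TAGS = {
    "a", "b", "blockquote", "code", "em", "i",
    "li", "ol", "p", "pre", "strong", "ul",
}
ALLOWED_ATTRIBUTES = {
    "a": ["href", "title"],  # per-tag attribute allowlist
}
ALLOWED_PROTOCOLS = {"http", "https", "mailto"}


def sanitize(html):
    """Clean untrusted HTML with bleach using the allowlists above."""
    # Imported lazily so the rule definitions can be inspected
    # without bleach installed.
    import bleach

    return bleach.clean(
        html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        protocols=ALLOWED_PROTOCOLS,
        strip=True,  # drop disallowed tags instead of escaping them
    )
```

With `strip=True`, disallowed tags are removed rather than escaped; whether that matches the current sanitizer's behavior would need to be checked against our existing tests.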