#2812 Generate static sitemap xml files for Allura

v1.0.0
closed
sf-2 (994)
General
nobody
2015-08-20
2011-09-15
No

Refactor [#2786] into a script that will generate all the xml files in one shot.

Related

Tickets: #2786

Discussion

    • status: open --> in-progress
     
  • forge:tv/2812

    $ paster script production.ini ../scripts/create-allura-sitemap.py
    

    This will create a new folder (./allura_sitemap) containing all the sitemap xml files for Allura.
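For readers unfamiliar with the layout, here is a minimal sketch of how a set of sitemap files plus an index could be written. It is not the code in create-allura-sitemap.py: the function names, the `sitemap-N.xml` file naming, and the use of Python 3 with `xml.etree.ElementTree` are all assumptions made for illustration. The 50,000 URL limit per file comes from the sitemap protocol.

```python
import os
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file limit defined by the sitemap protocol


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` entries each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_urlset(locs):
    """Build a <urlset> element holding one <url><loc> per location."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in locs:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
    return urlset


def build_index(sitemap_locs):
    """Build a <sitemapindex> element pointing at each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return index


def write_sitemaps(locs, output_dir, base_url, max_urls=MAX_URLS_PER_FILE):
    """Write sitemap-N.xml files plus a sitemap.xml index; return the file names."""
    os.makedirs(output_dir, exist_ok=True)
    names = []
    for n, chunk in enumerate(chunked(list(locs), max_urls)):
        name = "sitemap-%d.xml" % n  # hypothetical naming scheme
        path = os.path.join(output_dir, name)
        ET.ElementTree(build_urlset(chunk)).write(
            path, encoding="utf-8", xml_declaration=True)
        names.append(name)
    index = build_index(
        ["%s/%s" % (base_url.rstrip("/"), name) for name in names])
    ET.ElementTree(index).write(
        os.path.join(output_dir, "sitemap.xml"),
        encoding="utf-8", xml_declaration=True)
    return names
```

Under these assumptions, `write_sitemaps(urls, "./allura_sitemap", "http://sourceforge.net/allura_sitemap")` would leave `sitemap.xml` as the index that robots.txt points to.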

    Once QA'd, set to validation so we can:
    1. Add new entry to robots.txt:
    Sitemap: http://sourceforge.net/allura_sitemap/sitemap.xml
    2. Have SOG put this on cron to generate new sitemap files daily and copy to webserver.
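Step 1 could be made repeatable with a small helper like the one below. This is only a sketch: `ensure_sitemap_entry` is a hypothetical name, and it works on the robots.txt contents as a string rather than assuming where the file lives.

```python
def ensure_sitemap_entry(robots_txt, sitemap_url):
    """Return robots.txt text that contains a Sitemap line for sitemap_url.

    The text is returned unchanged if the entry is already present.
    """
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt
    # Make sure the new entry starts on its own line.
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + "Sitemap: " + sitemap_url + "\n"
```

Because the function returns its input untouched when the entry exists, running it on every deploy should not duplicate the line.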

     
    • status: in-progress --> code-review
    • assigned_to: Tim Van Steenburgh --> John Hoffmann ☠
     
  • Dave Brondsema - 2011-09-19
    • size: --> 2
     
  • John Hoffmann - 2011-09-19
    • status: code-review --> validation
    • assigned_to: John Hoffmann ☠ --> Tim Van Steenburgh
     
  • Dave Brondsema - 2011-09-20
    • status: validation --> open
     
  • Dave Brondsema - 2011-09-20

    We shouldn't hardcode <changefreq>daily</changefreq>, better to let Google figure it out if we don't know. Maybe we have data that we could use to populate <lastmod>.
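As an illustration of that suggestion, a `<url>` entry can be built so that `<lastmod>` and `<changefreq>` appear only when a value is actually known (both are optional in the sitemap protocol). The helper name `url_element` and its signature are hypothetical, not taken from the script.

```python
import datetime
import xml.etree.ElementTree as ET


def url_element(loc, lastmod=None, changefreq=None):
    """Build a <url> entry, emitting optional children only when provided."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        # The date-only form YYYY-MM-DD is a valid W3C Datetime value.
        ET.SubElement(url, "lastmod").text = lastmod.strftime("%Y-%m-%d")
    if changefreq is not None:
        ET.SubElement(url, "changefreq").text = changefreq
    return url


entry = url_element("http://example.com/p/demo/", datetime.date(2011, 9, 20))
print(ET.tostring(entry, encoding="unicode"))
```

Where the `lastmod` value would come from (for example a project's last-updated timestamp) is left open here, as it is in the comment above.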

    We need to ask SOG what the OUTPUT_DIR should be (or make it a cmd param). I note that 'allura_sitemap' is hardcoded into one path too. The script shouldn't quit if the dir already exists. I think we want to have this overwrite existing files, rather than making a new dir (don't want missing sitemap files for even a few minutes).
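A rough sketch of those three requests (output dir as a parameter, tolerate an existing dir, overwrite without a gap) might look like this. It assumes Python 3 and plain `argparse`; a script run through `paster script` may receive its arguments differently, and the `--output-dir` option name and the 0o644 file mode are assumptions.

```python
import argparse
import os
import tempfile


def parse_args(argv=None):
    """Parse a hypothetical --output-dir option, defaulting to ./allura_sitemap."""
    parser = argparse.ArgumentParser(description="Generate sitemap xml files")
    parser.add_argument("-o", "--output-dir", default="allura_sitemap",
                        help="directory to write sitemap files into")
    return parser.parse_args(argv)


def write_atomically(path, data):
    """Replace `path` with `data` (bytes) so readers never see a missing file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)  # do not fail if the dir exists
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # mkstemp creates the file as 0600; assume a web server must read it.
        os.chmod(tmp_path, 0o644)
        # Renaming within one directory swaps the file in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Writing to a temporary file in the same directory and renaming it over the old one means the previous sitemap stays in place until the new one is complete.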

     
  • I think changefreq is okay as is, since the sitemap spec indicates it's regarded as a hint for the crawler and not a command.

    WRT the way the script works, I was envisioning the files being created locally and then copied to the web servers. The 'allura_sitemap' that's hardcoded is the URL path, not the FS path.
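To make the URL path vs FS path distinction concrete, here is a small sketch that keeps the two separate. The function name and parameters are hypothetical; only the 'allura_sitemap' URL path and the sourceforge.net sitemap URL come from this ticket.

```python
import os
import posixpath


def sitemap_locations(filename, output_dir, site_url, url_path="allura_sitemap"):
    """Return (local filesystem path, public URL) for one sitemap file."""
    # Where the file is written locally before being copied to the web servers.
    fs_path = os.path.join(output_dir, filename)
    # Where crawlers will fetch it; independent of the local directory name.
    public_url = (site_url.rstrip("/") + "/"
                  + posixpath.join(url_path.strip("/"), filename))
    return fs_path, public_url


fs_path, public_url = sitemap_locations(
    "sitemap.xml", "/tmp/build", "http://sourceforge.net")
print(public_url)
```

With this split, changing the local output directory does not affect the URLs written into the sitemap index.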

    I've opened https://control.sog.geek.net/sog/trac/ticket/19111 with SOG and will make changes to the script as necessary to get it working the way they prefer.

     
  • Need a code review on a couple changes. Chris Everest has run this against prod and given it the ok.

    forge:tv/2812
    79d5a22a0fb65944f366302d856967f64e10ce6c - Added cmd line configurable output dir.
    4aba6ece0942f25642ed977fde07eb51e6e32c2c - Fixed mem leak and handle errors gracefully.
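The commit messages do not say how the memory leak or the error handling were addressed, so the following is only a generic sketch of one common pattern, not a description of those commits: yield URLs lazily instead of building one large list, and log and skip any item that fails. `iter_locs` and `url_for` are hypothetical names.

```python
import logging

log = logging.getLogger(__name__)


def iter_locs(projects, url_for):
    """Yield one URL per project lazily, skipping projects that raise."""
    for project in projects:
        try:
            yield url_for(project)
        except Exception:
            # Record the failure and continue instead of aborting the run.
            log.exception("Skipping %r: could not build sitemap URL", project)
```

A single bad project then costs one log entry rather than the whole sitemap run.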

     
    • status: open --> code-review
    • assigned_to: Tim Van Steenburgh --> Dave Brondsema
     
  • Dave Brondsema - 2011-09-26
    • status: code-review --> closed
     
