We shouldn't hardcode <changefreq>daily</changefreq>; it's better to let Google figure it out if we don't know. Maybe we have data that we could use to populate <lastmod>.
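As a rough sketch of that suggestion: a <url> entry can leave out <changefreq> entirely and emit <lastmod> only when a modification time is actually known. The helper name `url_entry`, its parameters, and the example URL and date are hypothetical, not taken from the script under review.

```python
from datetime import datetime
from xml.sax.saxutils import escape


def url_entry(loc, last_modified=None):
    """Build one sitemap <url> element (hypothetical helper).

    <changefreq> is deliberately omitted; <lastmod> is written only
    when the caller knows the modification time.
    """
    parts = ["<url>", "<loc>%s</loc>" % escape(loc)]
    if last_modified is not None:
        # The sitemap protocol accepts W3C Datetime; a plain date is valid.
        parts.append("<lastmod>%s</lastmod>" % last_modified.strftime("%Y-%m-%d"))
    parts.append("</url>")
    return "".join(parts)


# Placeholder URL and date, for illustration only.
print(url_entry("http://sourceforge.net/p/example/"))
print(url_entry("http://sourceforge.net/p/example/", datetime(2011, 9, 1)))
```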
We need to ask SOG what the OUTPUT_DIR should be (or make it a cmd param). I note that 'allura_sitemap' is hardcoded into one path too. The script shouldn't quit if the dir already exists. I think we want this to overwrite existing files rather than make a new dir (we don't want sitemap files missing for even a few minutes).
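One way to get the "overwrite in place, never missing" behaviour is to write each file to a temporary name in the same directory and then rename it over the old one. This is only a sketch: `write_atomically` and the file name `sitemap-0.xml` are hypothetical, and the 0o644 mode is an assumption about what the web server needs.

```python
import os
import tempfile


def write_atomically(output_dir, filename, data):
    """Write data to output_dir/filename, replacing any existing file in one step."""
    # exist_ok=True: an already existing directory is not an error.
    os.makedirs(output_dir, exist_ok=True)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=output_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
        # mkstemp creates owner-only files; assume the web server needs read access.
        os.chmod(tmp_path, 0o644)
        # os.replace overwrites the destination; on POSIX the swap is atomic.
        os.replace(tmp_path, os.path.join(output_dir, filename))
    except BaseException:
        # Clean up the partial temp file; the old sitemap is still in place.
        os.unlink(tmp_path)
        raise


# Demo in a throwaway directory: the second call overwrites the first.
with tempfile.TemporaryDirectory() as demo_dir:
    write_atomically(demo_dir, "sitemap-0.xml", "<urlset/>")
    write_atomically(demo_dir, "sitemap-0.xml", "<urlset></urlset>")
    print(os.listdir(demo_dir))
```

Readers of the old file keep seeing the complete previous version until the rename happens, so there is no window in which the sitemap is missing or half-written.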
I think changefreq is okay as is, since the sitemap spec indicates it's regarded as a hint for the crawler and not a command.
WRT the way the script works, I was envisioning the files being created locally and then copied to the web servers. The 'allura_sitemap' that's hardcoded is the URL path, not the FS path.
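To illustrate the URL path versus FS path distinction, here is a minimal sketch in which the <loc> values of a sitemap index always use the public URL path, while the local output directory is free to vary. The names `sitemap_index` and `local_path`, the file name `sitemap-0.xml`, and the directory `/tmp/build` are hypothetical; the base URL is taken from the Sitemap: line in this ticket.

```python
import os
from xml.sax.saxutils import escape

BASE_URL = "http://sourceforge.net"  # from the Sitemap: line in the ticket
URL_PATH = "allura_sitemap"          # public URL path, independent of the local dir


def sitemap_index(filenames):
    """Return sitemap index XML whose <loc> values use the URL path."""
    entries = []
    for name in filenames:
        loc = "%s/%s/%s" % (BASE_URL, URL_PATH, name)
        entries.append("<sitemap><loc>%s</loc></sitemap>" % escape(loc))
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            + "".join(entries) + "</sitemapindex>")


def local_path(output_dir, name):
    """Where the file is written locally before being copied to the web servers."""
    return os.path.join(output_dir, name)


print(local_path("/tmp/build", "sitemap-0.xml"))
print(sitemap_index(["sitemap-0.xml"]))
```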
Need a code review on a couple changes. Chris Everest has run this against prod and given it the ok.
forge:tv/2812
79d5a22a0fb65944f366302d856967f64e10ce6c - Added cmd line configurable output dir.
4aba6ece0942f25642ed977fde07eb51e6e32c2c - Fixed mem leak and handle errors gracefully.
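For reviewers who want a picture of what those two changes could look like, here is a minimal sketch using argparse. It is not the code from the commits above: the flag name `--output-dir`, the `generate` placeholder, and the exit codes are assumptions; only the default `./allura_sitemap` comes from the ticket.

```python
import argparse
import os
import sys


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Generate sitemap files (sketch).")
    # Hypothetical flag name; the option added in the actual commit may differ.
    parser.add_argument("-o", "--output-dir", default="./allura_sitemap",
                        help="directory to write the sitemap files into")
    return parser.parse_args(argv)


def generate(output_dir):
    """Placeholder for the real generation step."""
    # An existing directory is fine; files in it would simply be overwritten.
    os.makedirs(output_dir, exist_ok=True)


def main(argv=None):
    args = parse_args(argv)
    try:
        generate(args.output_dir)
    except OSError as exc:
        # Report the problem and return a non-zero status instead of a traceback.
        print("sitemap generation failed: %s" % exc, file=sys.stderr)
        return 1
    return 0


print(parse_args(["--output-dir", "/tmp/sitemaps"]).output_dir)
```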
forge:tv/2812
This will create a new folder (./allura_sitemap) containing all the sitemap xml files for Allura.
Once QA'd set to validation so we can:
Sitemap: http://sourceforge.net/allura_sitemap/sitemap.xml
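Assuming the Sitemap: line above is meant to go into robots.txt, a quick check that a robots.txt body advertises it could look like the sketch below. The User-agent and Disallow lines are illustrative only, and `RobotFileParser.site_maps()` requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt body; only the Sitemap: line comes from the ticket.
robots_txt = """\
User-agent: *
Disallow:
Sitemap: http://sourceforge.net/allura_sitemap/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the list of advertised sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```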
I've opened https://control.sog.geek.net/sog/trac/ticket/19111 with SOG and will make changes to the script as necessary to get it working the way they prefer.