This will create a new folder (./allura_sitemap) containing all the sitemap xml files for Allura.
Once QA'd, set to validation so we can:
1. Add new entry to robots.txt:
Sitemap: http://sourceforge.net/allura_sitemap/sitemap.xml
2. Have SOG put this on cron to generate new sitemap files daily and copy to webserver.
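For reference, the sitemap.xml referenced in the robots.txt entry above would typically be a sitemap index pointing at the individual generated files; a minimal sketch (the sitemap-N.xml filenames here are hypothetical, not necessarily what the script emits):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- one <sitemap> entry per generated file; names are illustrative -->
  <sitemap>
    <loc>http://sourceforge.net/allura_sitemap/sitemap-0.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://sourceforge.net/allura_sitemap/sitemap-1.xml</loc>
  </sitemap>
</sitemapindex>
```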
We shouldn't hardcode <changefreq>daily</changefreq>; better to let Google figure it out if we don't know. Maybe we have data that we could use to populate <lastmod>.
We need to ask SOG what the OUTPUT_DIR should be (or make it a cmd param). I note that 'allura_sitemap' is hardcoded into one path too. The script shouldn't quit if the dir already exists. I think we want it to overwrite existing files rather than making a new dir (we don't want missing sitemap files for even a few minutes).
I think changefreq is okay as is, since the sitemap spec indicates it's regarded as a hint for the crawler and not a command.
WRT the way the script works, I was envisioning the files being created locally and then copied to the web servers. The 'allura_sitemap' that's hardcoded is the URL path, not the FS path.
I've opened https://control.sog.geek.net/sog/trac/ticket/19111 with SOG and will make changes to the script as necessary to get it working the way they prefer.
Need a code review on a couple changes. Chris Everest has run this against prod and given it the ok.
forge:tv/2812
79d5a22a0fb65944f366302d856967f64e10ce6c - Added cmd line configurable output dir.
4aba6ece0942f25642ed977fde07eb51e6e32c2c - Fixed mem leak and handle errors gracefully.
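The configurable output dir from the first commit could look roughly like this sketch (the option names and default are assumptions, not necessarily what the commit actually uses):

```python
from optparse import OptionParser  # argparse wasn't stdlib until Python 2.7

def parse_options(argv=None):
    """Parse command-line options; -o/--output-dir overrides the
    previously hardcoded OUTPUT_DIR."""
    parser = OptionParser(usage='%prog [options]')
    parser.add_option('-o', '--output-dir', dest='output_dir',
                      default='./allura_sitemap',
                      help='directory to write sitemap files into '
                           '[default: %default]')
    options, args = parser.parse_args(argv)
    return options
```

Invocation would then be something like `sitemap.py -o /var/www/allura_sitemap`, with the old hardcoded path kept as the default so existing usage keeps working.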