Creating a Google Sitemap

From WormBaseWiki

Revision as of 18:34, 21 September 2009

Deprecated

The following documentation refers to Google's sitemap_gen script. It has been supplanted by the Google Site Map Generator usage described above.

Synopsis

To facilitate the indexing of WormBase content, a Google Sitemap file must be created. Google uses this file to find the URLs of dynamic content and to index them on a specified schedule.

The scripts in wormbase/util/google_indexing help create this file.

A new sitemap should be created on the primary production server with each new release of the database. Currently, I run the sitemap script regularly under cron (see "Running under cron" below).

Object classes included in the Sitemap

See dump_urls.pl for a full list of all classes exported.

Outline of procedure

1. Create a list of URLs of the most common objects in the database

The dump_urls.pl script writes the list to url_lists/VERSION-urllist.txt and updates symlinks as appropriate.

   todd> dump_urls.pl /path/to/database/version
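The symlink update is not spelled out here. As a minimal sketch of the idea, assuming a hypothetical link name (current-urllist.txt) and a made-up release name (WS999), neither of which comes from dump_urls.pl itself, a stable name can be repointed at the newest list like this:

```shell
#!/bin/sh
# Sketch only: point a stable name at the newest URL list.
# "current-urllist.txt" and the version "WS999" are hypothetical;
# dump_urls.pl may use different names.
set -eu

workdir=$(mktemp -d)                 # scratch directory so the sketch is safe to run
trap 'rm -rf "$workdir"' EXIT        # clean up when the shell exits
version="WS999"

mkdir -p "$workdir/url_lists"
: > "$workdir/url_lists/$version-urllist.txt"   # stand-in for the dumped list

# -s: symbolic link, -f: replace an existing link, -n: do not follow an existing link
ln -sfn "$version-urllist.txt" "$workdir/url_lists/current-urllist.txt"

link_target=$(readlink "$workdir/url_lists/current-urllist.txt")
echo "current-urllist.txt -> $link_target"
```

Because the link target is relative, the url_lists directory can be moved or copied without breaking the link.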

2. Use the sitemap_gen.py script to generate the site map.

To capture dynamic pages, this script uses the file created above. To capture static pages, the configuration file (wormbase_config.xml) also specifies paths to select directories and their corresponding URLs.

Test the script by running it with the --testing flag:

  todd> python sitemap_gen/sitemap_gen.py \
            --config=sitemap_gen/wormbase_config.xml --testing

The --testing flag prevents the script from contacting Google. If everything looks good, run the script again without the flag:

   todd> python sitemap_gen/sitemap_gen.py  --config=sitemap_gen/wormbase_config.xml

The script will automatically contact Google and let them know that we have a new sitemap.

Running under cron

   # Generate a new sitemap three times a week (04:00 on Sunday, Tuesday, and Thursday)
   0 4 * * 0,2,4 /usr/local/wormbase/util/google_indexing/create_sitemap.sh

The create_sitemap.sh script dumps the URLs, creates the sitemap, and sends the appropriate HTTP request to Google.
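The contents of create_sitemap.sh are not shown on this page. The following is only a sketch of how such a wrapper might chain the two steps from the outline. It assumes that both scripts live under the directory used in the cron entry above, that the database path is still the placeholder from step 1, and it adds a hypothetical DRY_RUN switch so the commands can be inspected without running them.

```shell
#!/bin/sh
# Sketch of a cron wrapper in the spirit of create_sitemap.sh (not the real script).
# Assumptions: BASE matches the cron entry; DB_PATH is a placeholder;
# DRY_RUN is a hypothetical switch added for this sketch.
set -eu

BASE="${BASE:-/usr/local/wormbase/util/google_indexing}"
DB_PATH="${DB_PATH:-/path/to/database/version}"
DRY_RUN="${DRY_RUN:-1}"       # 1 = print the commands, 0 = execute them

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

create_sitemap() {
    # Step 1: dump the URL list for the current database version.
    run "$BASE/dump_urls.pl" "$DB_PATH" || return 1
    # Step 2: build the sitemap; without --testing, sitemap_gen.py contacts Google.
    run python "$BASE/sitemap_gen/sitemap_gen.py" \
        "--config=$BASE/sitemap_gen/wormbase_config.xml" || return 1
}

plan=$(create_sitemap)
echo "$plan"
```

With the default DRY_RUN=1 the sketch only prints the two commands. Setting DRY_RUN=0 executes them, and the second step is skipped if the URL dump fails.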

TODO

Indexing of the FTP site and RSS feeds.

Specific classes (like ?Sequence and ?Protein) should probably be restricted to select objects.


--Tharris 23:54, 2 February 2006 (EST)