Alyda

Just a Girl Weblog

Remembering Who I Really Am

Use XSLT to Transform Your RSS Feed into a Sitemap

XSLT is a language for transforming XML documents into other XML documents. An RSS feed is an XML document, and a Sitemap is an XML file that lists the URLs for a site, so we can use XSLT to transform an RSS document into an XML Sitemap.

In this article I'll describe how to use PHP to transform your RSS document into an XML Sitemap and specify its location in your robots.txt file for 'Autodiscovery.'

The Feed (feed.xml)

  • <?xml version="1.0" encoding="utf-8"?>
  • <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  • <channel>
  • <description>Characterization or summary of the feed</description>
  • <link>http://blog.example.com/</link>
  • <title>Title of the Weblog</title>
  • <item>
  • <title>Title of the Item</title>
  • <link>http://blog.example.com/title-of-the-item/</link>
  • <pubDate>Tue, 19 Oct 2010 12:45:42 -1000</pubDate>
  • <category><![CDATA[Photoshop]]></category>
  • <category><![CDATA[Tips]]></category>
  • <description><![CDATA[A summary of the item's content.]]></description>
  • <content:encoded><![CDATA[<p>Define the full content of the item, suitable for presentation as XHTML.</p>]]></content:encoded>
  • </item>
  • </channel>
  • </rss>

The XSL Stylesheet (sitemap.xslt)

  • <?xml version="1.0" encoding="utf-8"?>
  • <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:php="http://php.net/xsl" exclude-result-prefixes="php">
  • <xsl:output method="xml" omit-xml-declaration="no" indent="no" encoding="utf-8" />
  • <xsl:template match="/">
  • <urlset>
  • <xsl:apply-templates select="//item" />
  • </urlset>
  • </xsl:template>
  • <xsl:template match="item">
  • <url>
  • <xsl:apply-templates select="link" />
  • <xsl:apply-templates select="pubDate" />
  • <changefreq>never</changefreq>
  • </url>
  • </xsl:template>
  • <xsl:template match="link">
  • <loc><xsl:value-of select="." /></loc>
  • </xsl:template>
  • <xsl:template match="pubDate">
  • <xsl:if test="function-available('php:function')">
  • <xsl:variable name="dt" select="php:function('strtotime', string(.))" />
  • <lastmod><xsl:value-of select="php:function('date', 'c', number($dt))" /></lastmod>
  • </xsl:if>
  • </xsl:template>
  • </xsl:stylesheet>

The Transformation (sitemap.php)

  • <?php
  • // Apply the stylesheet file $docXSL to the XML file $docXML and return the result as a string.
  • function transform_XML($docXSL, $docXML)
  • {
  • $xsl = new DOMDocument;
  • $xsl->load($docXSL);
  • $xml = new DOMDocument;
  • $xml->load($docXML);
  • $xp = new XSLTProcessor();
  • // Let the stylesheet call PHP functions (strtotime, date) through php:function().
  • $xp->registerPHPFunctions();
  • $xp->importStylesheet($xsl);
  • return $xp->transformToXML($xml);
  • }
  • $xml = transform_XML('sitemap.xslt', 'feed.xml');
  • // Reload the result so the namespace attributes can be added and the output indented.
  • $sitemap = new DOMDocument;
  • $sitemap->formatOutput = true;
  • $sitemap->loadXML($xml);
  • $el = $sitemap->getElementsByTagName('urlset')->item(0);
  • $el->setAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
  • $el->setAttribute('xmlns:xsi', 'http://www.w3.org/2001/XMLSchema-instance');
  • $el->setAttribute('xsi:schemaLocation', 'http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd');
  • $sitemap->save('sitemap.xml');
  • ?>

The Sitemap (sitemap.xml)
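
For the sample feed, the generated file should look roughly like the listing below. This is reconstructed from the stylesheet and script rather than copied from a real run, and the lastmod value is an assumption: date('c') formats the timestamp in the server's default time zone, shown here as if that were Pacific/Honolulu.

```xml
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://blog.example.com/title-of-the-item/</loc>
    <lastmod>2010-10-19T12:45:42-10:00</lastmod>
    <changefreq>never</changefreq>
  </url>
</urlset>
```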

If you have more than 50,000 URLs, you will have to create multiple Sitemap files. You can submit these separately or list them in a Sitemap Index file (both are equally effective).
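
As a rough sketch of the Sitemap Index option, the function below splits a list of URLs into chunks and builds an index document with one entry per chunk. The function name build_sitemap_index, the sitemap-N.xml file names, and the sample post URLs are assumptions made for illustration; writing each chunk to its own Sitemap file is left out.

```php
<?php
// Hypothetical helper: split $urls into chunks of at most $limit entries and
// build a Sitemap Index that points at one Sitemap file per chunk.
function build_sitemap_index(array $urls, string $baseUrl, int $limit = 50000): array
{
    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    $chunks = array_chunk($urls, $limit);

    $index = new DOMDocument('1.0', 'utf-8');
    $index->formatOutput = true;
    $root = $index->appendChild($index->createElementNS($ns, 'sitemapindex'));

    foreach ($chunks as $i => $chunk) {
        // The sitemap-1.xml, sitemap-2.xml, ... naming is an assumption of this sketch.
        $entry = $root->appendChild($index->createElementNS($ns, 'sitemap'));
        $loc = $index->createElementNS($ns, 'loc');
        // A text node is used so that characters such as & are escaped on output.
        $loc->appendChild($index->createTextNode($baseUrl . 'sitemap-' . ($i + 1) . '.xml'));
        $entry->appendChild($loc);
    }

    // Return the chunks (to be written out as individual Sitemaps) and the index XML.
    return [$chunks, $index->saveXML()];
}

// Usage with a tiny limit so the split is easy to see: 5 URLs, 2 per file.
$urls = [];
for ($n = 1; $n <= 5; $n++) {
    $urls[] = 'http://blog.example.com/post-' . $n . '/';
}
[$chunks, $indexXml] = build_sitemap_index($urls, 'http://blog.example.com/', 2);
echo $indexXml;
```

With the default limit of 50,000 the same call would only start producing a second file once the list grows past that size.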

Specifying the Sitemap Location in Your robots.txt File

The sitemap location should be the complete URL to the Sitemap, such as: http://blog.example.com/sitemap.xml. This directive is independent of the user‑agent line, so it doesn't matter where it is placed in the file.

  • Sitemap: http://blog.example.com/sitemap.xml
  • User-agent: *
  • Disallow:
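
If you would rather add the directive from a script than by hand, something like the following could work. It is a minimal sketch: add_sitemap_directive is a hypothetical helper, it operates on the robots.txt contents as a string, and reading and writing the actual file is left to the caller.

```php
<?php
// Hypothetical helper: return robots.txt content that declares $sitemapUrl,
// appending a Sitemap directive only if that exact URL is not already listed.
function add_sitemap_directive(string $robots, string $sitemapUrl): string
{
    foreach (preg_split('/\R/', $robots) as $line) {
        // The directive name is matched case-insensitively in this sketch.
        if (preg_match('/^\s*sitemap\s*:\s*(\S+)/i', $line, $m) && $m[1] === $sitemapUrl) {
            return $robots; // already declared, nothing to do
        }
    }
    $robots = rtrim($robots);
    // Placement does not matter, so the directive is simply added at the end.
    return ($robots === '' ? '' : $robots . "\n") . 'Sitemap: ' . $sitemapUrl . "\n";
}

$robots = "User-agent: *\nDisallow:\n";
echo add_sitemap_directive($robots, 'http://blog.example.com/sitemap.xml');
```

Calling the function a second time on its own output leaves the text unchanged, so it is safe to run on every deploy.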
