Dynamic Sitemap with Next.js

Static sitemaps are easy to implement in Next.js, but updating them by hand every week or so becomes tedious really fast.

In this tutorial we're gonna have a look at how to create a sitemap dynamically, and as efficiently as possible, in Next.js. Let's dive in 😊

The Base

Next.js automatically serves every file in the public folder, so for a static sitemap, adding a sitemap.txt or sitemap.xml file there is all it takes.

Creating a dynamic sitemap, however, looks pretty strange at first and comes with some restrictions that apply whenever Next.js serves anything other than a normal React page.

Folder Structure

Everything in the pages folder is served as a page by Next.js. We can use that to add a sitemap.xml/index.tsx file at the root of the pages folder:

pages/sitemap.xml/index.tsx

This creates our /sitemap.xml route and serves as the base for all our sitemap efforts 🥳

The Page

The basic idea here is to serve an empty page and manipulate the response so that it delivers an XML file containing our sitemap instead. Since we don't need any content on the page, we can 'render' an empty React component:

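import { GetServerSideProps } from 'next';

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  // All sitemap logic will live here; for now, just return the props Next.js expects
  return { props: {} };
};

// Default export to prevent Next.js errors: every file in pages must export a React component
const SitemapXML: React.FC = () => {
  return null;
};

export default SitemapXML;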

We will use getServerSideProps to request our data from the CMS and to manipulate the response sent to the client. Everything sitemap-related happens in that function.

We can't use the other Next.js data fetching goodies like getStaticProps here. Since we need to manipulate the server response, getServerSideProps is the only data fetching function that lets us do that, because it is the only one that receives the response object (res).

Create the Sitemap

In the end we want one big string with valid XML syntax that we can serve to the client. We start by getting all the data we need from our data source. This depends heavily on what you are using, but the basic idea is to get one big array with all our pages and then map over it. In our case, let's just say we have a function that does that for us and returns an array of objects:

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  const data = await getAllData();
};
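The shape of that array depends entirely on your CMS. As a minimal sketch, assume each entry looks like the hypothetical Page type below; this getAllData just returns hard-coded sample pages, and the noindex and redirectTo fields are invented for illustration.

```typescript
// Hypothetical page shape: adjust the fields to whatever your CMS returns.
interface Page {
  url: string;
  // ISO 8601 timestamp of the last publication, or null if unknown.
  last_publication_date: string | null;
  // Hypothetical flags, used later to exclude pages from the sitemap.
  noindex?: boolean;
  redirectTo?: string;
}

// Hypothetical stand-in for a real CMS query. In production this would
// page through your CMS API until every document has been fetched.
const getAllData = async (): Promise<Page[]> => [
  { url: 'https://example.com/', last_publication_date: '2021-05-01T10:00:00Z' },
  { url: 'https://example.com/blog/hello', last_publication_date: null },
  { url: 'https://example.com/private', last_publication_date: null, noindex: true },
];
```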

After that we want to transform this data into something easily digestible that carries sitemap-related meta info like lastmod and priority:

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  const data = await getAllData();

  const transformedData = data.reduce((filtered, page) => {
    // exclude documents that should not be in the sitemap e.g. noindex etc.
    const isExcluded = excludeDocument(page);
    if (isExcluded) return filtered;

    filtered.push({
      loc: page.url,
      lastmod: page.last_publication_date || undefined,
      priority: 0.7,
      changefreq: 'daily',
    });

    return filtered;
  }, []);
};

You can see that we are not only transforming our page data into objects with the appropriate meta info, but also filtering out documents that should not be in the sitemap, for example pages set to noindex, redirected pages, etc.
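What counts as "excluded" is up to your CMS. A minimal sketch of excludeDocument, assuming hypothetical noindex and redirectTo fields on each page (map them to whatever flags your CMS actually exposes):

```typescript
// Minimal, hypothetical page shape: only the fields this check needs.
interface PageFlags {
  url: string;
  noindex?: boolean;
  redirectTo?: string;
}

// Returns true when a page should NOT appear in the sitemap.
// noindex and redirectTo are assumed field names, not a real CMS API.
const excludeDocument = (page: PageFlags): boolean => {
  if (page.noindex) return true; // the page asks search engines not to index it
  if (page.redirectTo) return true; // redirected URLs should not be listed
  return false;
};
```

Keeping the rule in one function means the reduce callback stays unchanged when you add new exclusion criteria later.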

Right now we have a filtered array with all the meta information about our pages, and we only have to turn it into a string containing our sitemap.xml content. We will use two utility functions for that:

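// Each object becomes one <url> entry; its keys become the child element names
type SitemapFields = Array<Record<string, string | number | undefined>>;

// Escape the characters the sitemap protocol requires to be entity-encoded (e.g. & in URLs)
const escapeXml = (value: string): string =>
  value
    .replace(/&/g, '&amp;')
    .replace(/'/g, '&apos;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');

// Turn every field object into a <url> element, skipping empty values
export const buildSitemapXml = (fields: SitemapFields): string => {
  const content = fields
    .map((fieldData) => {
      const field = Object.entries(fieldData).map(([key, value]) => {
        // Only skip missing values; 0 is still a valid priority
        if (value === undefined || value === '') return '';
        return `<${key}>${escapeXml(String(value))}</${key}>`;
      });

      return `<url>${field.join('')}</url>\n`;
    })
    .join('');

  return withXMLTemplate(content);
};

// Wrap the <url> entries in the XML declaration and the <urlset> root element
const withXMLTemplate = (content: string): string => {
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:mobile="http://www.google.com/schemas/sitemap-mobile/1.0" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">\n${content}</urlset>`;
};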

A huge shoutout and thank you to Vishnu Sankar for providing this open-source code in his next-sitemap project. next-sitemap is a great project if you don't want to implement everything yourself, but for this solution I needed to adjust the response with custom headers and make some smaller changes to the logic, so I didn't use it here.

You can see we are just mapping over the provided transformedData fields and concatenating one big string with all the fields we need in the sitemap. In the context of our getServerSideProps function, this looks like this:

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  const data = await getAllData();

  const transformedData = data.reduce((filtered, page) => {
    // exclude documents that should not be in the sitemap e.g. noindex etc.
    const isExcluded = excludeDocument(page);
    if (isExcluded) return filtered;

    filtered.push({
      loc: page.url,
      lastmod: page.last_publication_date || undefined,
      priority: 0.7,
      changefreq: 'daily',
    });

    return filtered;
  }, []);

  const sitemapContent = buildSitemapXml(transformedData);
};
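The keys of each object passed to buildSitemapXml become the child elements of a <url> entry, so they have to follow the sitemap protocol (sitemaps.org): loc is required and must be an absolute URL, lastmod uses the W3C Datetime format, changefreq is one of a fixed set of words, and priority ranges from 0.0 to 1.0. A sketch of a type that encodes those rules, plus a small runtime check for values coming from an untyped CMS response (SitemapField, RawField and isValidField are names introduced here for illustration):

```typescript
// Allowed <changefreq> values according to the sitemap protocol.
const CHANGE_FREQUENCIES = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'] as const;
type ChangeFrequency = (typeof CHANGE_FREQUENCIES)[number];

// One <url> entry. Only loc is required by the protocol.
type SitemapField = {
  loc: string; // absolute URL of the page
  lastmod?: string; // W3C Datetime, e.g. '2021-05-01' or '2021-05-01T10:00:00+00:00'
  changefreq?: ChangeFrequency;
  priority?: number; // 0.0 to 1.0; the protocol default is 0.5
};

// Loosely typed input, e.g. straight from a CMS.
type RawField = { loc: string; lastmod?: string; changefreq?: string; priority?: number };

// Type guard: true only if the entry satisfies the rules above.
const isValidField = (field: RawField): field is SitemapField => {
  const hasAbsoluteLoc = /^https?:\/\//.test(field.loc);
  const priorityOk = field.priority === undefined || (field.priority >= 0 && field.priority <= 1);
  const changefreqOk =
    field.changefreq === undefined ||
    (CHANGE_FREQUENCIES as readonly string[]).indexOf(field.changefreq) !== -1;
  return hasAbsoluteLoc && priorityOk && changefreqOk;
};
```

Filtering with isValidField before calling buildSitemapXml keeps a single bad CMS entry from producing an entry search engines would reject.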

We are getting closer 😉 The only thing missing is manipulating the response and serving our sitemap content to the client.

The Response

For this, all we have to do is set the content type to XML and write our sitemap content string to the response:

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  const data = await getAllData();

  const transformedData = data.reduce((filtered, page) => {
    // exclude documents that should not be in the sitemap e.g. noindex etc.
    const isExcluded = excludeDocument(page);
    if (isExcluded) return filtered;

    filtered.push({
      loc: page.url,
      lastmod: page.last_publication_date || undefined,
      priority: 0.7,
      changefreq: 'daily',
    });

    return filtered;
  }, []);

  const sitemapContent = buildSitemapXml(transformedData);

  res.setHeader('Content-Type', 'text/xml');
  res.write(sitemapContent);

  res.end();

  // Empty since we don't render anything
  return {
    props: {},
  };
};

Pretty easy, right?! 😄 The empty props return looks kinda funky, but again, Next.js expects getServerSideProps to return props for the React page. Since we never actually render that page, they can be left empty.
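If you serve more than one XML route, the three response steps can be pulled into a small helper. This is a sketch: sendXml and the XmlResponse interface are names introduced here, and the interface declares only the three methods of Node's response object that the helper uses, which the res passed to getServerSideProps already provides and which makes the helper easy to test with a fake.

```typescript
// The subset of Node's http.ServerResponse that the helper needs
// (hypothetical interface name).
interface XmlResponse {
  setHeader(name: string, value: string): unknown;
  write(chunk: string): unknown;
  end(): unknown;
}

// Hypothetical helper: send an XML string, optionally with a Cache-Control header.
const sendXml = (res: XmlResponse, xml: string, cacheControl?: string): void => {
  if (cacheControl) res.setHeader('Cache-Control', cacheControl);
  res.setHeader('Content-Type', 'text/xml');
  res.write(xml);
  res.end(); // nothing may be written after this
};
```

Inside getServerSideProps this would be called as sendXml(res, sitemapContent) followed by the usual return { props: {} }.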

And with this we're already (kinda) done 😇

This code builds your sitemap and serves it to the client on every request. You might be thinking: this works OK for smaller sites, but if we have to request thousands of documents here, building it could take minutes. Well yeah, you're right.

Let's talk about how we can optimize the loading time.

Performance

This is what I struggled with the longest. There are a bunch of possible solutions:

  1. Building the whole sitemap before Next.js starts and just adding or removing entries when documents change. This could be achieved with a webhook that fires a request to your Next.js instance, which then adjusts the sitemap accordingly. Storing the sitemap entries in a database would also speed things up here.
  2. Caching the result of the sitemap and updating the sitemap in the background when the page is requested.
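The first option could look roughly like the sketch below. It rests on assumptions: the entries live in memory (a real setup would use a database, since process memory is lost on restart and is not shared between serverless instances), and a hypothetical webhook handler calls upsert or remove whenever a document is published, changed or unpublished. SitemapStore and its methods are names introduced here.

```typescript
// One <url> entry, keyed by its loc (the page URL).
type Entry = { loc: string; lastmod?: string; priority: number; changefreq: string };

// Hypothetical store of sitemap entries, kept up to date by webhooks
// instead of being rebuilt from the CMS on every request.
class SitemapStore {
  private entries = new Map<string, Entry>();

  // Called once at startup with the full list of pages.
  seed(all: Entry[]): void {
    this.entries = new Map(all.map((e): [string, Entry] => [e.loc, e]));
  }

  // Called by the webhook when a document is published or changed.
  upsert(entry: Entry): void {
    this.entries.set(entry.loc, entry);
  }

  // Called by the webhook when a document is deleted, unpublished or set to noindex.
  remove(loc: string): void {
    this.entries.delete(loc);
  }

  // Snapshot in insertion order, ready to be turned into XML.
  list(): Entry[] {
    return Array.from(this.entries.values());
  }
}
```

Each request then only serializes the stored entries, so the cost of querying the CMS moves from request time to publish time.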

I went with the second option because our Next.js instance was already deployed on Vercel, which has superb caching functionality that is super easy to control. If you deploy your Next.js server somewhere else, this code would change slightly, but most providers let you set some kind of cache control for the response:

const sitemapContent = buildSitemapXml(transformedData);

/** Set Cache Control in Vercel @see https://vercel.com/docs/edge-network/caching#stale-while-revalidate */
res.setHeader('Cache-Control', 's-maxage=30, stale-while-revalidate');

res.setHeader('Content-Type', 'text/xml');
res.write(sitemapContent);

res.end();

// Empty since we don't render anything
return {
  props: {},
};

For 30 seconds, every request to the route is served the cached sitemap. The first request after that still gets the cached (now stale) sitemap but also triggers a revalidation in the background, which rebuilds the sitemap. Once the revalidation completes, the updated sitemap is served and the cycle starts again. This means our sitemap is built at most once every 30 seconds, and there is no downtime for users while it is being updated because the old sitemap is still served in the meantime. This solution doesn't give us a real-time sitemap, but sitemaps rarely need to be real-time, and I'm OK with a visitor seeing a 30-second-old sitemap if it means a massive performance increase.
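If your host has no CDN that honors stale-while-revalidate, a similar effect can be approximated inside a long-running Node.js server. This is a sketch under assumptions, not the Vercel setup described above: createSwrCache is a name introduced here, the cache lives in the memory of a single process (so it does little on serverless platforms where instances start cold), and build stands in for your sitemap-building function.

```typescript
// Hypothetical stale-while-revalidate cache for a single value (e.g. the sitemap XML).
// - Fresh (younger than maxAgeMs): return the cached value.
// - Stale: return the cached value immediately and rebuild it in the background.
// - Empty: build once and wait for the result.
const createSwrCache = (
  build: () => Promise<string>,
  maxAgeMs: number,
  now: () => number = Date.now,
) => {
  let value: string | undefined;
  let builtAt = 0;
  let pending: Promise<string> | undefined;

  const refresh = (): Promise<string> => {
    // Reuse an in-flight rebuild so concurrent requests trigger only one build.
    if (!pending) {
      pending = build().then(
        (result) => {
          value = result;
          builtAt = now();
          pending = undefined;
          return result;
        },
        (error: unknown) => {
          pending = undefined;
          throw error;
        },
      );
    }
    return pending;
  };

  return async (): Promise<string> => {
    if (value === undefined) return refresh(); // first request waits for the build
    if (now() - builtAt >= maxAgeMs) {
      // Stale: rebuild in the background; on failure keep serving the old value.
      refresh().catch(() => undefined);
    }
    return value;
  };
};
```

The now parameter exists only so the clock can be faked in tests; in production you would call createSwrCache(buildSitemap, 30000) once at module level and await the returned function in each request.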

So this is it! It was fun working on this and finding solutions for this problem, and I hope this helps some of you. Here is the full code (getAllData and excludeDocument stand in for your own data-source helpers):

import { GetServerSideProps } from 'next';
import { buildSitemapXml } from '../../util/Sitemap';
// getAllData and excludeDocument are your own data-source helpers; this import path is a placeholder
import { getAllData, excludeDocument } from '../../util/cms';

// Shape of one <url> entry in the sitemap
type SitemapEntry = {
  loc: string;
  lastmod?: string;
  priority: number;
  changefreq: string;
};

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  const data = await getAllData();

  // Typed accumulator so that push() accepts our entries
  const transformedData = data.reduce<SitemapEntry[]>((filtered, page) => {
    // exclude documents that should not be in the sitemap e.g. noindex etc.
    const isExcluded = excludeDocument(page);
    if (isExcluded) return filtered;

    filtered.push({
      loc: page.url,
      lastmod: page.last_publication_date || undefined,
      priority: 0.7,
      changefreq: 'daily',
    });

    return filtered;
  }, []);

  const sitemapContent = buildSitemapXml(transformedData);

  /** Set Cache Control in Vercel @see https://vercel.com/docs/edge-network/caching#stale-while-revalidate */
  res.setHeader('Cache-Control', 's-maxage=30, stale-while-revalidate');

  res.setHeader('Content-Type', 'text/xml');
  res.write(sitemapContent);

  res.end();

  // Empty since we don't render anything
  return {
    props: {},
  };
};

// Default export to prevent Next.js errors
const SitemapXML: React.FC = () => {
  return null;
};

export default SitemapXML;

See ya ✌️
