Creating a Static NextJs Sitemap with correct Last Modified Properties using GraphQL & NextJS

NextJs
Web Dev
Sitemap

Tue Feb 21 2023

Problem

Having used several sitemap packages with www.developright.co.uk, I was aware that as the number of blog posts grew, the last modified property for each post within the sitemap became even more important. This is because Google can use the last modified date to decide whether to recrawl a particular page or to use that opportunity to crawl a newer page on your site.

With the previous sitemap implementation for www.developright.co.uk, each static NextJs build regenerated every blog post in the sitemap as if it were new, so all posts ended up with the same last mod property.

As a result, the site hit the crawl limit within Google, and Google chose not to crawl newer pages. I decided to investigate and fix this problem.

 

What will this post explain?

In this post I am going to share how this site dynamically creates a Sitemap with correct last mod (Last Modified) properties, so that Google can focus its crawling on new or recently updated pages.

Because the site has recently moved over to the "next-sitemap" package and to GraphQL, the post refers to both. I will point out where to swap in your own equivalent to retrieve the Last Modified time for each page.

 

Installing the package

Install the "next-sitemap" package and add it to the scripts within your package.json:

npm install next-sitemap --save

Inside your package.json, add the following to the scripts object:

"sitemap": "next-sitemap"

 

Add and define the next-sitemap config file

This is the standard config you should need. By default the package looks for a file named next-sitemap.config.js in the project root. If you run "npm run sitemap" and your application is static, it will build your sitemap and add it to the public folder.

const config = {
  siteUrl: "https://www.yourdomain.com",
  generateRobotsTxt: true,
  changefreq: 'weekly',
  priority: 0.7,
};

module.exports = config;

 

However, if you have dynamic pages that are generated from some sort of data store, you will see that these pages all have the same last mod value. This is because the package does not know when each page was actually created or updated in the data store, so it falls back to a timestamp taken when the sitemap is generated rather than the date the content changed.
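
To illustrate why every entry ends up with the same value, the sketch below mimics that behaviour in plain JavaScript: with no per-page date available, each path is stamped with the time the sitemap is generated. The function name defaultLikeTransform, the sample paths, and the pretend build time are made up for illustration; this is not the package's source code.

```javascript
// Illustrative sketch only: a stand-in for a transform that has no per-page date.
// Every path receives the generation time, so all lastmod values match.
function defaultLikeTransform(path, generatedAt) {
  return {
    loc: path,
    lastmod: generatedAt.toISOString(),
  };
}

const generatedAt = new Date("2023-02-21T09:00:00.000Z"); // pretend build time
const paths = ["/", "/posts/first-post", "/posts/second-post"]; // hypothetical pages
const entries = paths.map((p) => defaultLikeTransform(p, generatedAt));

// All entries share one lastmod, regardless of when each post actually changed.
const distinctDates = new Set(entries.map((e) => e.lastmod));
console.log(distinctDates.size); // 1
```

Three pages, one distinct date: this is the pattern that makes every post look freshly modified after each build.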

 

Modifying the Sitemap to use Last Mod from your data store

In the following example a transform function has been set up. The function takes two parameters, config and path. config is the config defined in the initial configuration, whilst path is the full path of the page being processed (e.g. /posts/example, /posts/, etc).

 

In the example, the last mod property is modified within the "path.includes("/posts/")" condition.

The logic within that condition does the following:

  • Retrieves the page name by using path.split, meaning /posts/example returns example
  • Calls getGraphQLQuery, which queries this site's data store. In your case you will need to swap it out for whatever retrieves the dynamic page's updated_at or equivalent property
  • In the returned object, passes the updated_at property to a Date object and calls toISOString() so that the Last Mod property uses the expected ISO date-time format. If updated_at is missing, it falls back to the current time

const config = {
  siteUrl: "https://www.yourdomain.com",
  generateRobotsTxt: true,
  changefreq: 'weekly',
  priority: 0.7,
  transform: async (config, path) => {
    let locPath = path;
    // Ensure everything but / (index) page ends with .html
    if (path !== "/") {
      locPath = `${path}.html`;
    }

    // Apply the following logic only to pages whose path contains /posts/
    if (path.includes("/posts/")) {
      // Retrieves the path of the page excluding /posts/. eg: /posts/example returns example
      const postPage = path.split('/posts/')[1];
      // Retrieves information about the dynamic page from the data store (including updated_at).
      // getGraphQLQuery and GetPageDocument are this site's own helpers; swap in your own lookup here.
      const page = await getGraphQLQuery("getPage", GetPageDocument, {
        slug: postPage,
      });

      // Returns data to set for this page, uses updated_at as the lastmod value
      return {
        loc: locPath,
        changefreq: 'weekly',
        priority: 0.7,
        lastmod: page?.updated_at ? new Date(page.updated_at).toISOString() : new Date().toISOString(),
        alternateRefs: config.alternateRefs ?? [],
      };
    }

    return {
      loc: locPath,
      changefreq: config.changefreq,
      priority: config.priority,
      lastmod: config.autoLastmod ? new Date().toISOString() : undefined,
      alternateRefs: config.alternateRefs ?? [],
    };
  }
};

module.exports = config;
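
If your posts do not live behind GraphQL, the lookup is the only part that needs to change. The sketch below isolates the two small steps the transform relies on, extracting the slug and formatting the date, and uses a hypothetical in-memory posts object in place of getGraphQLQuery. The helper names slugFromPath, toLastmod, findPost and lastmodForPath are illustrative and are not part of next-sitemap. Unlike the config above, toLastmod returns undefined rather than the current time when no valid date is found. That is one possible choice, made so that a missing date is not reported as a fresh modification.

```javascript
// Hypothetical stand-in for a data store; replace with your own lookup.
const posts = {
  "example": { updated_at: "2023-01-15T10:30:00Z" },
  "no-date": {},
};

// "/posts/example" -> "example"; returns null for paths outside /posts/
// and for the bare "/posts/" index path.
function slugFromPath(path) {
  const prefix = "/posts/";
  if (!path.startsWith(prefix)) return null;
  const slug = path.slice(prefix.length);
  return slug.length > 0 ? slug : null;
}

// Converts a stored date to the ISO 8601 string used for lastmod.
// Returns undefined when the value is missing or is not a valid date.
function toLastmod(updatedAt) {
  if (!updatedAt) return undefined;
  const date = new Date(updatedAt);
  return Number.isNaN(date.getTime()) ? undefined : date.toISOString();
}

// Async to mirror a real network or database call.
async function findPost(slug) {
  return posts[slug] ?? null;
}

// Combines the steps: path -> slug -> stored page -> lastmod string.
async function lastmodForPath(path) {
  const slug = slugFromPath(path);
  if (slug === null) return undefined;
  const page = await findPost(slug);
  return toLastmod(page?.updated_at);
}
```

Inside a transform function you could then write something like lastmod: await lastmodForPath(path), keeping the rest of the returned object as it is in the config above.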