Build XML sitemaps that tell Google exactly which pages exist in each language version, with hreflang annotations that prevent duplicate content confusion.
Elena Vasquez owns a Mexican food export company that sells salsas, tortillas, and dried chiles to distributors across North America and Europe. Her website had grown quickly to include separate Spanish, English, and French versions. But her developer had generated a single XML sitemap that listed every URL from every language version without hreflang annotations. Google was treating the Spanish and English versions of the same product page as entirely separate, unconnected pages, and many of them were flagged as duplicate content. Her indexing coverage was a mess -- some pages indexed in the wrong language, others not indexed at all.
An XML sitemap is a file that lists the URLs on your website and can carry metadata about each one, such as when it was last updated. The sitemap protocol also defines change-frequency and priority fields, but Google states that it ignores both. For multi-language export sites, the sitemap plays an additional critical role: it can include xhtml:link annotations that tell Google which URLs are equivalent translations of each other. Google treats hreflang in a sitemap as equivalent to hreflang in HTML link tags or HTTP headers, but the sitemap approach is often the easiest to maintain on a large site because all annotations live in one place. Implementing it correctly is one of the highest-impact technical SEO actions an exporter can take.
In this lesson, you will learn how to structure sitemaps for multi-language sites, include proper hreflang annotations, and submit and monitor your sitemaps in Google Search Console. You will also learn automated approaches that keep your sitemaps up to date as you add new language versions and products.
While you can put all of your URLs into a single sitemap with hreflang annotations, the cleaner approach for export sites with many language versions is to create a separate sitemap index that points to individual sitemaps for each language. A typical structure would be a root sitemap index that references /sitemap-en.xml, /sitemap-es.xml, /sitemap-fr.xml, and so on. This makes it easier to update individual language sitemaps without regenerating the entire set, and it gives you clearer visibility into each language's coverage in Search Console.
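The index-plus-children layout described above can be sketched in Python with the standard library. This is a minimal illustration, not a definitive generator: the function name build_sitemap_index, the example.com domain, and the sitemap-LANG.xml naming are assumptions chosen to match the structure in this lesson.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, languages):
    """Return sitemap index XML that references one child sitemap per language.

    Hypothetical helper: assumes children are named sitemap-<lang>.xml
    and live at the site root.
    """
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for lang in languages:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        loc = ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc")
        loc.text = f"{base_url.rstrip('/')}/sitemap-{lang}.xml"
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap_index("https://example.com", ["en", "es", "fr"]))
```

Adding a new language then only means adding one entry to the list and generating one new child sitemap; the existing children are left untouched.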
Each language-specific sitemap should include only the URLs for that language version. Within each sitemap, use the xhtml:link element to specify the alternate language versions for each URL. For example, the English sitemap entry for a product page would include xhtml:link tags pointing to the Spanish and French equivalents. Google uses these cross-references to understand that the three URLs represent the same content in different languages, which prevents duplicate content issues and helps serve the correct version to users in each market.
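Hreflang annotations in sitemaps take the form of xhtml:link elements within each URL entry. A typical entry in the English sitemap for a product page contains a loc element holding the English URL, followed by one xhtml:link element for each language version, each carrying rel="alternate", an hreflang value such as "en", "es", or "fr", and an href pointing to that version's URL. The urlset element at the top of the file must also declare the xhtml namespace (xmlns:xhtml="http://www.w3.org/1999/xhtml") so that the xhtml:link elements are valid.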
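To make the shape of an annotated entry concrete, here is a hedged Python sketch that builds one language sitemap with xhtml:link alternates. The function name build_language_sitemap, the input format (one dict per piece of content, mapping hreflang value to URL), and the example.com URLs are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_language_sitemap(lang, pages):
    """Return sitemap XML for one language.

    pages: list of dicts, each mapping an hreflang value to the URL of
    that language version of the same content (hypothetical input format).
    """
    ET.register_namespace("", SITEMAP_NS)
    # Use the conventional "xhtml" prefix for the alternate links.
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for variants in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> holds this sitemap's own language version of the page.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = variants[lang]
        # One alternate per language version, including the page itself.
        for hreflang, href in variants.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": hreflang, "href": href},
            )
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    pages = [{
        "en": "https://example.com/en/salsa-verde",
        "es": "https://example.com/es/salsa-verde",
        "fr": "https://example.com/fr/salsa-verde",
    }]
    print(build_language_sitemap("en", pages))
```

Because every language sitemap is generated from the same list of variants, each version automatically carries the same set of alternates.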
Every URL in the sitemap must have a self-reference -- that is, the English entry must also include an xhtml:link with hreflang="en" pointing back to itself. This is a requirement that many implementations get wrong. If the self-reference is missing, Google may ignore the hreflang annotations entirely. The complete set of annotations for each URL must include every language version, and the annotations must be identical across all versions. If the English sitemap lists the Spanish version as an alternate, the Spanish sitemap must list the English version as an alternate with the same hreflang value.
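The self-reference and reciprocity rules above lend themselves to an automated check. The following is a minimal sketch, assuming you have already parsed your sitemaps into a dict that maps each URL to the hreflang annotations declared in its entry; the function name find_hreflang_problems and that input shape are hypothetical.

```python
def find_hreflang_problems(annotations):
    """Return a list of human-readable hreflang problems.

    annotations: dict mapping each sitemap URL to a dict of
    hreflang value -> alternate URL declared in that URL's entry
    (hypothetical structure produced by your own sitemap parser).
    """
    problems = []
    for url, alternates in annotations.items():
        # Rule 1: every entry must list itself among its alternates.
        if url not in alternates.values():
            problems.append(f"{url}: missing self-reference")
        for target in alternates.values():
            if target == url:
                continue
            target_alternates = annotations.get(target)
            if target_alternates is None:
                # The alternate is not listed in any sitemap entry.
                problems.append(f"{url}: alternate {target} has no entry")
            elif target_alternates != alternates:
                # Rule 2: all versions must carry the identical set.
                problems.append(f"{url}: annotation set differs from {target}")
    return problems


if __name__ == "__main__":
    en = "https://example.com/en/salsa-verde"
    es = "https://example.com/es/salsa-verde"
    broken = {en: {"en": en, "es": es}, es: {"en": en}}
    for problem in find_hreflang_problems(broken):
        print(problem)
```

Running a check like this after every sitemap regeneration catches missing return links before Google does.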
For export sites targeting specific countries rather than just languages, use the country-and-language format: hreflang="en-GB" for English speakers in the United Kingdom, hreflang="en-US" for the United States, and hreflang="es-MX" for Spanish speakers in Mexico. This tells Google to serve the appropriate page based on both the user's language and their geographic location. The x-default value should point to the version you want served to users whose language or country does not have a specific variant, typically your main English site.
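As a small illustration of the value formats above, the sketch below checks that an hreflang value has a plausible shape and adds an x-default entry to a set of variants. It is deliberately simplified: it accepts only a two-letter language code with an optional two-letter region, it does not handle script subtags, and it does not verify that the codes are real ISO codes (so "en-UK" would pass even though the ISO region code for the United Kingdom is GB). The function names are hypothetical.

```python
import re

# Two-letter language, optionally followed by a two-letter region.
HREFLANG_PATTERN = re.compile(r"[a-z]{2}(-[A-Z]{2})?")


def is_plausible_hreflang(value):
    """Shape check only; does not validate codes against the ISO lists."""
    return value == "x-default" or HREFLANG_PATTERN.fullmatch(value) is not None


def add_x_default(variants, fallback_hreflang):
    """Return a copy of variants with x-default pointing at the fallback version."""
    annotated = dict(variants)
    annotated["x-default"] = variants[fallback_hreflang]
    return annotated


if __name__ == "__main__":
    variants = {
        "en-US": "https://example.com/us/",
        "en-GB": "https://example.com/uk/",
        "es-MX": "https://example.com/mx/",
    }
    annotated = add_x_default(variants, "en-US")
    for hreflang, href in annotated.items():
        print(hreflang, is_plausible_hreflang(hreflang), href)
```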
Once your sitemaps are correctly structured, submit the sitemap index file in Google Search Console under the Sitemaps section. You do not need to submit each language sitemap separately, because Google will discover and crawl the child sitemaps listed in the index. You can also reference your sitemap in robots.txt using the Sitemap directive, which helps other search engines like Bing and Yandex discover your content as well.
Monitor the Sitemaps report in Google Search Console for errors. Common issues include URLs that return 4xx or 5xx status codes, URLs blocked by robots.txt, and URLs that redirect (sitemaps should only contain canonical, non-redirecting URLs). Each error should be investigated and fixed promptly, because a URL that Googlebot cannot fetch wastes crawl requests and cannot be indexed from the sitemap.
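Many of these problems can be caught before Google reports them. The sketch below classifies a single sitemap URL into the error categories listed above. It assumes you fetch each URL yourself without following redirects and pass in the status code, the final URL, and the lines of your robots.txt; the function name audit_sitemap_url and its labels are hypothetical, and the robots.txt check relies on Python's standard-library parser, which may not match Googlebot's behavior in every edge case.

```python
from urllib.robotparser import RobotFileParser


def audit_sitemap_url(url, status, final_url, robots_lines, user_agent="Googlebot"):
    """Classify one sitemap URL.

    status and final_url are supplied by the caller (for example from an
    HTTP client); robots_lines is the robots.txt content split into lines.
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)
    if not robots.can_fetch(user_agent, url):
        return "blocked by robots.txt"
    if 400 <= status < 500:
        return "client error (4xx)"
    if 500 <= status < 600:
        return "server error (5xx)"
    # A 3xx status, or a final URL that differs, means the sitemap
    # lists a redirecting URL instead of the canonical one.
    if 300 <= status < 400 or final_url != url:
        return "redirect"
    return "ok"


if __name__ == "__main__":
    robots_lines = ["User-agent: *", "Disallow: /private/"]
    page = "https://example.com/en/salsa-verde"
    print(audit_sitemap_url(page, 200, page, robots_lines))
    print(audit_sitemap_url(page, 301, "https://example.com/en/green-salsa", robots_lines))
```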
From the Sitemaps report you can also open the Page indexing report filtered to a specific sitemap, which shows how many of the submitted URLs were actually indexed. If there is a large gap between submitted and indexed, for example 500 submitted but only 200 indexed, investigate why. The difference may be due to quality issues, duplicate content, or technical problems that prevent indexing. The Page indexing report lists the reasons pages were excluded, so you can address the underlying causes. Export sites with large discrepancies often have crawl budget or duplicate content issues that need attention.
A single XML sitemap can contain a maximum of 50,000 URLs and must be no larger than 50 MB uncompressed. If your export site has more URLs than this -- which is common for large product catalogs across multiple languages -- you must split the URLs into multiple sitemaps and use a sitemap index file to reference them. Each language version with over 50,000 URLs would need its own set of multiple sitemaps.
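The splitting step can be sketched as a simple chunking function. This is an illustration under stated assumptions: the function names and the sitemap-LANG-N.xml file naming are hypothetical, and the code enforces only the 50,000-URL limit, not the 50 MB size limit, which you would need to check separately on the generated files.

```python
MAX_URLS_PER_SITEMAP = 50_000


def split_into_sitemaps(urls, max_urls=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks that each fit in one sitemap file."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


def sitemap_filenames(lang, chunk_count):
    """Hypothetical naming scheme for the child sitemaps of one language."""
    return [f"sitemap-{lang}-{n}.xml" for n in range(1, chunk_count + 1)]


if __name__ == "__main__":
    urls = [f"https://example.com/es/producto-{n}" for n in range(120_001)]
    chunks = split_into_sitemaps(urls)
    for name, chunk in zip(sitemap_filenames("es", len(chunks)), chunks):
        print(name, len(chunk))
```

Each resulting file name is then listed as its own entry in the sitemap index.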
Do not leave any language version out of your sitemaps. Every indexable page on your site should be included in at least one sitemap, regardless of language. The hreflang annotations within the sitemap tell Google how the different language versions relate to each other, so excluding any language version from the sitemap would prevent Google from discovering those pages through the sitemap and break the hreflang relationship signals.
Update your sitemaps every time you add, remove, or substantially change a page. For active export sites adding new products or language versions weekly, the sitemap should be regenerated automatically on the same cadence. Google does not specify a minimum update frequency, but a sitemap that has not been updated in months delays the discovery of new and changed pages. Keep the last-modified dates accurate as well, since Google uses them only when they are consistently and verifiably accurate.