There are several solutions to fetch the list of URLs from an XML sitemap directly in a Google Spreadsheet. My favorite one uses the IMPORTXML
function with xPath.
How to extract all URLs from an XML sitemap with the IMPORTXML
function?
If you are eager to get the function, here it is:
=IMPORTXML("https://arthur.camberlein.com/sitemap_blogs_1.xml", "//*[local-name() ='url']/*[local-name() ='loc']")
The step-by-step process would be:
- Open a new Google Spreadsheet
- Define a cell with your XML sitemap URL (in my case
A1 = https://arthur.camberlein.com/sitemap_blogs_1.xml
) - Insert the following formula into another cell:
=IMPORTXML(A1, "//*[local-name() ='url']/*[local-name() ='loc']")
- Make sure there is nothing below the cell containing the formula, or you'll get an error
- Voilà
This solution can be copied from tab to tab, or even between files, as you will only need to change the content of one cell (A1 in this case).
A quick explanation of the formula:
- The
IMPORTXML
function works with xPath from a URL - A1 is the cell with the URL (here the XML sitemap URL)
- The xPath tracks any URLs inside
loc
tags and extracts them using[local-name() ='loc']
Note: This article covers Google Spreadsheet as an all-in-one solution.