There are several solutions to fetch the list of URLs from an XML sitemap directly in a Google Spreadsheet. My favorite one uses the IMPORTXML function with xPath.
How to extract all URLs from an XML sitemap with the IMPORTXML function?
If you are eager to get the function, here it is:
=IMPORTXML("https://arthur.camberlein.com/sitemap_blogs_1.xml", "//*[local-name() ='url']/*[local-name() ='loc']")
The step-by-step process would be:
- Open a new Google Spreadsheet
- Define a cell with your XML sitemap URL (in my case
A1 = https://arthur.camberlein.com/sitemap_blogs_1.xml) - Insert the following formula into another cell:
=IMPORTXML(A1, "//*[local-name() ='url']/*[local-name() ='loc']") - Make sure there is nothing below the cell containing the formula, or you'll get an error
- Voilà
This solution can be copied from tab to tab, or even between files, as you will only need to change the content of one cell (A1 in this case).
A quick explanation of the formula:
- The
IMPORTXMLfunction works with xPath from a URL - A1 is the cell with the URL (here the XML sitemap URL)
- The xPath tracks any URLs inside
loctags and extracts them using[local-name() ='loc']
Note: This article covers Google Spreadsheet as an all-in-one solution.